-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consolidate all sites that can hash a value inside a vector #9963
Consolidate all sites that can hash a value inside a vector #9963
Conversation
This pull request was exported from Phabricator. Differential Revision: D57892204 |
✅ Deploy Preview for meta-velox canceled.
|
…incubator#9963) Summary: This commit aims to guarantee consistent evaluation and hashing of NaN (Not-a-Number) values for floating-point types across multiple sites, including SimpleVector, VectorHasher, RowContainer, and ContainerRowSerde. Unit tests will be added incrementally to make it easier to review and to include additional changes that might be required for it. The changes to these classes would affect the following: - Multiple aggregates that have 'distinct' like semantics. Only for the codepaths that consume complex types as inputs. Aggregates include: multimap_agg, map_union_sum, map_union, count(distinct) - In-predicate - Map_subscript, only one optimization code path for constant/dictionary encoded map - Hash Partitioning operator (HashPartition) - Hash Join operator (HashProbe, HashTable) - Local partitioning operator (LocalPartition) - Hash Aggregation operator (GroupingSet) Differential Revision: D57892204
40ae068
to
4c96c65
Compare
This pull request was exported from Phabricator. Differential Revision: D57892204 |
…incubator#9963) Summary: This commit aims to guarantee consistent evaluation and hashing of NaN (Not-a-Number) values for floating-point types across multiple sites, including SimpleVector, VectorHasher, RowContainer, and ContainerRowSerde. Unit tests will be added incrementally to make it easier to review and to include additional changes that might be required for it. The changes to these classes would affect the following: - Multiple aggregates that have 'distinct' like semantics. Only for the codepaths that consume complex types as inputs. Aggregates include: multimap_agg, map_union_sum, map_union, count(distinct) - In-predicate - Map_subscript, only one optimization code path for constant/dictionary encoded map - Hash Partitioning operator (HashPartition) - Hash Join operator (HashProbe, HashTable) - Local partitioning operator (LocalPartition) - Hash Aggregation operator (GroupingSet) Differential Revision: D57892204
4c96c65
to
908fd36
Compare
This pull request was exported from Phabricator. Differential Revision: D57892204 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Since many functions or operators are affected and you plan to add unit tests for them incrementally, do you plan to land this PR first or after those unit tests are added?
vectorHasher->decode(*vector, allRows); | ||
vectorHasher->hash(allRows, false, hashes); | ||
std::vector<uint64_t> expected = { | ||
hasher(1.0), hasher(-1.0), kNaNHash, kNaNHash, hasher(0.0)}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we also test -0.0 here?
908fd36
to
c242103
Compare
…incubator#9963) Summary: This commit aims to guarantee consistent evaluation and hashing of NaN (Not-a-Number) values for floating-point types across multiple sites, including SimpleVector, VectorHasher, RowContainer, and ContainerRowSerde. Unit tests will be added incrementally to make it easier to review and to include additional changes that might be required for it. The changes to these classes would affect the following: - Multiple aggregates that have 'distinct' like semantics. Only for the codepaths that consume complex types as inputs. Aggregates include: multimap_agg, map_union_sum, map_union, count(distinct) - In-predicate - Map_subscript, only one optimization code path for constant/dictionary encoded map - Hash Partitioning operator (HashPartition) - Hash Join operator (HashProbe, HashTable) - Local partitioning operator (LocalPartition) - Hash Aggregation operator (GroupingSet) Reviewed By: kagamiori Differential Revision: D57892204
This pull request was exported from Phabricator. Differential Revision: D57892204 |
…incubator#9963) Summary: This commit aims to guarantee consistent evaluation and hashing of NaN (Not-a-Number) values for floating-point types across multiple sites, including SimpleVector, VectorHasher, RowContainer, and ContainerRowSerde. Unit tests will be added incrementally to make it easier to review and to include additional changes that might be required for it. The changes to these classes would affect the following: - Multiple aggregates that have 'distinct' like semantics. Only for the codepaths that consume complex types as inputs. Aggregates include: multimap_agg, map_union_sum, map_union, count(distinct) - In-predicate - Map_subscript, only one optimization code path for constant/dictionary encoded map - Hash Partitioning operator (HashPartition) - Hash Join operator (HashProbe, HashTable) - Local partitioning operator (LocalPartition) - Hash Aggregation operator (GroupingSet) Reviewed By: kagamiori Differential Revision: D57892204
c242103
to
7470b81
Compare
This pull request was exported from Phabricator. Differential Revision: D57892204 |
This pull request has been merged in e99972b. |
Conbench analyzed the 1 benchmark run on commit There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |
Summary: Ensure NaNs values of any binary representations are treated as equal and can be identified as keys in a map. Summary of changes: - For primitive type keys: Use a map that employs the correct hashing and equality operators for floating types. - For complex type keys: It uses GenericView as a key that uses BaseVector's hashing and equality operators. Those recursively call SimpleVector's respective functions that were addressed in facebookincubator#9963 Differential Revision: D57681657
Summary: Ensure NaNs values of any binary representations are treated as equal and can be identified as keys in a map. Summary of changes: - For primitive type keys: Use a map that employs the correct hashing and equality operators for floating types. - For complex type keys: It uses GenericView as a key that uses BaseVector's hashing and equality operators. Those recursively call SimpleVector's respective functions that were addressed in facebookincubator#9963 Differential Revision: D57681657
Summary: Pull Request resolved: #9893 Ensure NaNs values of any binary representations are treated as equal and can be identified as keys in a map. Summary of changes: - For primitive type keys: Use a map that employs the correct hashing and equality operators for floating types. - For complex type keys: It uses GenericView as a key that uses BaseVector's hashing and equality operators. Those recursively call SimpleVector's respective functions that were addressed in #9963 Reviewed By: spershin Differential Revision: D57681657 fbshipit-source-id: 0ce281f303dd0633e698e4b386627c3831567d92
Summary: Ensures that NaNs with different binary representations are considered equal and deduplicated. Summary of changes: For primitive types, we ensure that the Set data structure used to deduplicate uses hash and equality functions that handle NaN. For complex input it uses BaseVector::hashValueAt() and ContainerRowSerde::compare() as a hash and equality function, respectively which were fixed to handle NaNs properly in facebookincubator#9963 Differential Revision: D58161333
Summary: Ensures that NaNs with different binary representations are considered equal and deduplicated. Summary of changes: For primitive types, we ensure that the Set data structure used to deduplicate uses hash and equality functions that handle NaN. For complex input it uses BaseVector::hashValueAt() and ContainerRowSerde::compare() as a hash and equality function, respectively which were fixed to handle NaNs properly in facebookincubator#9963 Differential Revision: D58161333
Summary: Ensures that NaNs with different binary representations are considered equal and deduplicated when used as keys in a map. Summary of changes: For primitive types, we ensure that the Map data structure used to accumulate uses hash and equality functions that handle NaN. For complex input it uses BaseVector::hashValueAt() and ContainerRowSerde::compare() as a hash and equality function, respectively which were fixed to handle NaNs properly in facebookincubator#9963 Differential Revision: D58209683
Summary: The map data structure used in multimap_agg that handles complex keys uses BaseVector::hashValueAt() and ContainerRowSerde::compare() as its hash and equals functions respectively. These were fixed in facebookincubator#9963 to handle NaN values correctly. Therefore, this change adds a unit to verify handling of complex keys that can contain NaN values. Differential Revision: D58212803
…ebookincubator#10074) Summary: The map data structure used in multimap_agg that handles complex keys uses BaseVector::hashValueAt() and ContainerRowSerde::compare() as its hash and equals functions respectively. These were fixed in facebookincubator#9963 to handle NaN values correctly. Therefore, this change adds a unit to verify handling of complex keys that can contain NaN values. Differential Revision: D58212803
…ncubator#10047) Summary: Ensures that NaNs with different binary representations are considered equal and deduplicated. Summary of changes: All these aggs use SetAccumulator where: For primitive types, we ensure that the Set data structure used to deduplicate uses hash and equality functions that handle NaN. For complex input it uses BaseVector::hashValueAt() and ContainerRowSerde::compare() as a hash and equality function, respectively which were fixed to handle NaNs properly in facebookincubator#9963 Differential Revision: D58161333
…ncubator#10047) Summary: Ensures that NaNs with different binary representations are considered equal and deduplicated. Summary of changes: All these aggs use SetAccumulator where: For primitive types, we ensure that the Set data structure used to deduplicate uses hash and equality functions that handle NaN. For complex input it uses BaseVector::hashValueAt() and ContainerRowSerde::compare() as a hash and equality function, respectively which were fixed to handle NaNs properly in facebookincubator#9963 Differential Revision: D58161333
…incubator#9963) Summary: Pull Request resolved: facebookincubator#9963 This commit aims to guarantee consistent evaluation and hashing of NaN (Not-a-Number) values for floating-point types across multiple sites, including SimpleVector, VectorHasher, RowContainer, and ContainerRowSerde. Unit tests will be added incrementally to make it easier to review and to include additional changes that might be required for it. The changes to these classes would affect the following: - Multiple aggregates that have 'distinct' like semantics. Only for the codepaths that consume complex types as inputs. Aggregates include: multimap_agg, map_union_sum, map_union, count(distinct) - In-predicate - Map_subscript, only one optimization code path for constant/dictionary encoded map - Hash Partitioning operator (HashPartition) - Hash Join operator (HashProbe, HashTable) - Local partitioning operator (LocalPartition) - Hash Aggregation operator (GroupingSet) Reviewed By: kagamiori Differential Revision: D57892204 fbshipit-source-id: 3729aec02c597379c8fbb1d2eec190614aa57d43
Summary: Pull Request resolved: facebookincubator#9893 Ensure NaNs values of any binary representations are treated as equal and can be identified as keys in a map. Summary of changes: - For primitive type keys: Use a map that employs the correct hashing and equality operators for floating types. - For complex type keys: It uses GenericView as a key that uses BaseVector's hashing and equality operators. Those recursively call SimpleVector's respective functions that were addressed in facebookincubator#9963 Reviewed By: spershin Differential Revision: D57681657 fbshipit-source-id: 0ce281f303dd0633e698e4b386627c3831567d92
…incubator#9963) Summary: Pull Request resolved: facebookincubator#9963 This commit aims to guarantee consistent evaluation and hashing of NaN (Not-a-Number) values for floating-point types across multiple sites, including SimpleVector, VectorHasher, RowContainer, and ContainerRowSerde. Unit tests will be added incrementally to make it easier to review and to include additional changes that might be required for it. The changes to these classes would affect the following: - Multiple aggregates that have 'distinct' like semantics. Only for the codepaths that consume complex types as inputs. Aggregates include: multimap_agg, map_union_sum, map_union, count(distinct) - In-predicate - Map_subscript, only one optimization code path for constant/dictionary encoded map - Hash Partitioning operator (HashPartition) - Hash Join operator (HashProbe, HashTable) - Local partitioning operator (LocalPartition) - Hash Aggregation operator (GroupingSet) Reviewed By: kagamiori Differential Revision: D57892204 fbshipit-source-id: 3729aec02c597379c8fbb1d2eec190614aa57d43
Summary: Pull Request resolved: facebookincubator#9893 Ensure NaNs values of any binary representations are treated as equal and can be identified as keys in a map. Summary of changes: - For primitive type keys: Use a map that employs the correct hashing and equality operators for floating types. - For complex type keys: It uses GenericView as a key that uses BaseVector's hashing and equality operators. Those recursively call SimpleVector's respective functions that were addressed in facebookincubator#9963 Reviewed By: spershin Differential Revision: D57681657 fbshipit-source-id: 0ce281f303dd0633e698e4b386627c3831567d92
Summary: Ensure NaN values of different binary representation for floating point types are considered as equal. Summary of changes: - Primitive type Input: NaN of different binary representations are denormalized to the same representation before adding to the in-list and before being compared. - Complex Type input: Uses a set that employs hash and equality functions via BaseVector that have been fixed in facebookincubator#9963 to handle NaN values. Differential Revision: D58301120
Summary: Pull Request resolved: facebookincubator#10115 Ensure NaN values of different binary representation for floating point types are considered as equal. Summary of changes: - Primitive type Input: NaN of different binary representations are denormalized to the same representation before adding to the in-list and before being compared. - Complex Type input: Uses a set that employs hash and equality functions via BaseVector that have been fixed in facebookincubator#9963 to handle NaN values. Differential Revision: D58301120
Summary: Pull Request resolved: facebookincubator#10115 Ensure NaN values of different binary representation for floating point types are considered as equal. Summary of changes: - Primitive type Input: NaN of different binary representations are denormalized to the same representation before adding to the in-list and before being compared. - Complex Type input: Uses a set that employs hash and equality functions via BaseVector that have been fixed in facebookincubator#9963 to handle NaN values. Differential Revision: D58301120
Summary: Hash join and hash aggregation employ data structures like VectorHasher, RowContainer and ContainerRowSerde apart from the usual BaseVector to implement a custom hash table and store intermediate state. All these have been updated in facebookincubator#9963 to handle NaN values where all binary representations of the same are considered equal and hash to the same value. Therefore, this change only adds unit tests to verify the expected NaN behavior. Differential Revision: D58365384
Summary: Pull Request resolved: facebookincubator#9893 Ensure NaNs values of any binary representations are treated as equal and can be identified as keys in a map. Summary of changes: - For primitive type keys: Use a map that employs the correct hashing and equality operators for floating types. - For complex type keys: It uses GenericView as a key that uses BaseVector's hashing and equality operators. Those recursively call SimpleVector's respective functions that were addressed in facebookincubator#9963 Reviewed By: spershin Differential Revision: D57681657 fbshipit-source-id: 0ce281f303dd0633e698e4b386627c3831567d92
…ncubator#10125) Summary: Pull Request resolved: facebookincubator#10125 Hash join and hash aggregation employ data structures like VectorHasher, RowContainer and ContainerRowSerde apart from the usual BaseVector to implement a custom hash table and store intermediate state. All these have been updated in facebookincubator#9963 to handle NaN values where all binary representations of the same are considered equal and hash to the same value. Therefore, this change only adds unit tests to verify the expected NaN behavior. Differential Revision: D58365384
…ncubator#10125) Summary: Pull Request resolved: facebookincubator#10125 Hash join and hash aggregation employ data structures like VectorHasher, RowContainer and ContainerRowSerde apart from the usual BaseVector to implement a custom hash table and store intermediate state. All these have been updated in facebookincubator#9963 to handle NaN values where all binary representations of the same are considered equal and hash to the same value. Therefore, this change only adds unit tests to verify the expected NaN behavior. Differential Revision: D58365384
Summary: Pull Request resolved: #10125 Hash join and hash aggregation employ data structures like VectorHasher, RowContainer and ContainerRowSerde apart from the usual BaseVector to implement a custom hash table and store intermediate state. All these have been updated in #9963 to handle NaN values where all binary representations of the same are considered equal and hash to the same value. Therefore, this change only adds unit tests to verify the expected NaN behavior. Reviewed By: kagamiori Differential Revision: D58365384 fbshipit-source-id: 11d7d6d760295b12b42cd24a7c2fcf567a26bd24
) Summary: Pull Request resolved: #10074 The map data structure used in multimap_agg that handles complex keys uses BaseVector::hashValueAt() and ContainerRowSerde::compare() as its hash and equals functions respectively. These were fixed in #9963 to handle NaN values correctly. Therefore, this change adds a unit to verify handling of complex keys that can contain NaN values. Reviewed By: kagamiori Differential Revision: D58212803 fbshipit-source-id: 81cbaa823baea1f4d42bbc32bb7ca5602634506c
) Summary: Pull Request resolved: #10071 Ensures that NaNs with different binary representations are considered equal and deduplicated when used as keys in a map. Summary of changes: For primitive types, we ensure that the Map data structure used to accumulate uses hash and equality functions that handle NaN. For complex input it uses BaseVector::hashValueAt() and ContainerRowSerde::compare() as a hash and equality function, respectively which were fixed to handle NaNs properly in #9963 Reviewed By: kgpai Differential Revision: D58209683 fbshipit-source-id: f8697f248a0346456597ca71105eb5bcf1b71f8b
…ncubator#10047) Summary: Pull Request resolved: facebookincubator#10047 Ensures that NaNs with different binary representations are considered equal and deduplicated. Summary of changes: All these aggs use SetAccumulator where: For primitive types, we ensure that the Set data structure used to deduplicate uses hash and equality functions that handle NaN. For complex input it uses BaseVector::hashValueAt() and ContainerRowSerde::compare() as a hash and equality function, respectively which were fixed to handle NaNs properly in facebookincubator#9963 Differential Revision: D58161333
…ncubator#10047) Summary: Pull Request resolved: facebookincubator#10047 Ensures that NaNs with different binary representations are considered equal and deduplicated. Summary of changes: All these aggs use SetAccumulator where: For primitive types, we ensure that the Set data structure used to deduplicate uses hash and equality functions that handle NaN. For complex input it uses BaseVector::hashValueAt() and ContainerRowSerde::compare() as a hash and equality function, respectively which were fixed to handle NaNs properly in facebookincubator#9963 Reviewed By: kgpai Differential Revision: D58161333
…ncubator#10047) Summary: Pull Request resolved: facebookincubator#10047 Ensures that NaNs with different binary representations are considered equal and deduplicated. Summary of changes: All these aggs use SetAccumulator where: For primitive types, we ensure that the Set data structure used to deduplicate uses hash and equality functions that handle NaN. For complex input it uses BaseVector::hashValueAt() and ContainerRowSerde::compare() as a hash and equality function, respectively which were fixed to handle NaNs properly in facebookincubator#9963 Reviewed By: kgpai Differential Revision: D58161333
Summary: Pull Request resolved: #10047 Ensures that NaNs with different binary representations are considered equal and deduplicated. Summary of changes: All these aggs use SetAccumulator where: For primitive types, we ensure that the Set data structure used to deduplicate uses hash and equality functions that handle NaN. For complex input it uses BaseVector::hashValueAt() and ContainerRowSerde::compare() as a hash and equality function, respectively which were fixed to handle NaNs properly in #9963 Reviewed By: kgpai Differential Revision: D58161333 fbshipit-source-id: d7db9ec2e86d0f87bc0a60b6e238ba95135d7eb1
Summary: Pull Request resolved: facebookincubator#10115 Ensure NaN values of different binary representation for floating point types are considered as equal. Summary of changes: - Primitive type Input: NaN of different binary representations are denormalized to the same representation before adding to the in-list and before being compared. - Complex Type input: Uses a set that employs hash and equality functions via BaseVector that have been fixed in facebookincubator#9963 to handle NaN values. Reviewed By: kagamiori Differential Revision: D58301120
Summary: Pull Request resolved: #10115 Ensure NaN values of different binary representation for floating point types are considered as equal. Summary of changes: - Primitive type Input: NaN of different binary representations are denormalized to the same representation before adding to the in-list and before being compared. - Complex Type input: Uses a set that employs hash and equality functions via BaseVector that have been fixed in #9963 to handle NaN values. Reviewed By: kagamiori Differential Revision: D58301120 fbshipit-source-id: 80b6958a8d9062c80e67ba2159a9a9093b236849
Summary:
This commit aims to guarantee consistent evaluation and hashing of NaN
(Not-a-Number) values for floating-point types across multiple sites,
including SimpleVector, VectorHasher, RowContainer, and ContainerRowSerde.
Unit tests will be added incrementally to make it easier to review and to
include additional changes that might be required for it.
The changes to these classes would affect the following:
the codepaths that consume complex types as inputs. Aggregates
include: multimap_agg, map_union_sum, map_union, count(distinct)
constant/dictionary encoded map
Differential Revision: D57892204