Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement more traits on Datum #704

Merged
merged 1 commit into from Sep 22, 2022

Conversation

syvb
Copy link
Contributor

@syvb syvb commented Sep 22, 2022

Implements From<bool>, Eq, and Hash on Datum. All of these traits are also implemented on usize, so this makes upgrading to pgx 0.5 (where Datum is now a struct) easier.

@workingjubilee
Copy link
Member

...Okay, From<bool> seems fine, but I gotta know. Are you really hashing Datums?

@syvb
Copy link
Contributor Author

syvb commented Sep 22, 2022

Yep, in my case I have code that hashes Datums to put them into a HyperLogLog, with a custom Hasher that can hash Datums right. I just verified that a Datum is hashed the same as a usize with the same content. This might not be the best way to make a Datum hash though.

@workingjubilee
Copy link
Member

workingjubilee commented Sep 22, 2022

So I am trying to understand the purpose of the hash that you are generating, because when you hash a Datum, you are either hashing a value (and so the value may be something other than 8 bytes in size and you are hashing in additional zeros which may or may not be intended) or you are hashing a pointer, and thus probably not hashing the actual thing you may want to hash (and would hash if it was in fact &T).

And I see you appear to be using this to hash what seem to be in fact double-precision floats...?

https://github.com/timescale/timescaledb-toolkit/blob/b7433344f90b142094e73e84c332385498db9335/extension/src/time_vector/pipeline/aggregation.rs#L539-L547

I assume you do not support executing code on 32-bit or 128-bit architectures, so I will not quibble with you over the correctness of hashing doubles as usizes, but...? Floats don't impl Hash either because the wonkiness of floats has odd consequences regarding NaN (especially for the count-distinct problem!). I am reluctant to approve Eq or Hash as a result: you can easily cast as a .value() if you need these implementations and want to ignore the questions about the validity of the underlying bits and what they mean.

@syvb
Copy link
Contributor Author

syvb commented Sep 22, 2022

I updated the PR to only implement From<bool>.

Just to note though: DatumHashBuilder considers the type of thing it's hashing, and calls internal Postgres functions to have Postgres handle hashing arbitrary types. I don't think it really makes sense for DatumHashBuilder to implement Hasher though, since it's not really a generic hasher: it can only handle data that's 8 bytes long and represents a Datum of the type specified at the creation of the hasher.

Another note: The Postgres hashing system will hash all NaNs the same in PG15; prior to PG15 NaNs hashed the same if their bits hashed the same. I should note that in the toolkit documentation.

@workingjubilee
Copy link
Member

Ahh that makes a bit more sense. Yeah, I think it's fine to hash Datums with a specialized construct, it just doesn't necessarily make sense for that to use the Hasher/Hash traits that Rust provides, IMO: those traits aren't necessarily suitable for all hashes and their name is a bit dubious, as they're really specifically about questions like "so, can I use it as the K of a HashMap<K, V>?" e.g. cryptographic hashers use their own specially-crafted types and traits instead.

@workingjubilee workingjubilee merged commit f4edcf4 into pgcentralfoundation:develop Sep 22, 2022
@syvb syvb deleted the sv/datum-bool branch September 22, 2022 22:39
workingjubilee pushed a commit to workingjubilee/pgrx that referenced this pull request Sep 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants