feat(postgres): provide translation for hash ops
#9348
| @@ -1501,10 +1501,13 @@ def test_distinct_on_keep_is_none(backend, on): | |||
| "trino", # checksum returns varbinary | |||
| ] | |||
| ) | |||
| def test_hash(backend, alltypes): | |||
| @pytest.mark.parametrize( | |||
| "dtype", ["smallint", "int", "bigint", "float", "double", "string"] | |||
I'm just not testing macaddr here, since alltypes doesn't have that column. Should I just add a macaddr column?
On top of that, I think something is off with macaddr to begin with: PostgresType.from_ibis(dt.macaddr) returns DataType(this=Type.VARCHAR), and test_macaddr_literal in ibis/backends/tests/test_network.py expects a text type to be returned. I think the change should be bigger, but wanted to get an opinion first. :)
adding a column to alltypes will be a headache -- I think we could add a standalone macaddr hash test, but that can also happen in a followup
Oh, sorry, what I had in mind was that we can just add the column to the alltypes table inside the test, but a standalone test would also work.
In any case, yes, happy to do it in a follow-up, just because I'm not sure about some of the other behaviors mentioned for macaddr (and also because I think that's a much less important use case). 😅
| if arg_dtype.is_int16(): | ||
| return self.f.hashint2extended(arg, 0) | ||
| elif arg_dtype.is_int32(): | ||
| return self.f.hashint4extended(arg, 0) | ||
| elif arg_dtype.is_int64(): | ||
| return self.f.hashint8extended(arg, 0) | ||
| elif arg_dtype.is_float32(): | ||
| return self.f.hashfloat4extended(arg, 0) | ||
| elif arg_dtype.is_float64(): | ||
| return self.f.hashfloat8extended(arg, 0) |
Would all of these and also decimal types be covered by hashnumericextended?
and you could dispatch on those with is_numeric()
Oh, true...
- if arg_dtype.is_int16():
-     return self.f.hashint2extended(arg, 0)
- elif arg_dtype.is_int32():
-     return self.f.hashint4extended(arg, 0)
- elif arg_dtype.is_int64():
-     return self.f.hashint8extended(arg, 0)
- elif arg_dtype.is_float32():
-     return self.f.hashfloat4extended(arg, 0)
- elif arg_dtype.is_float64():
-     return self.f.hashfloat8extended(arg, 0)
+ if arg_dtype.is_numeric():
+     return self.f.hash_numeric_extended(arg, 0)
@gforsyth Actually, doesn't seem to work. Maybe it would require an explicit cast.
I find these functions pretty poorly documented, so I will go ahead and revert this for now to get it green again.
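For reference, the "explicit cast" idea mentioned above could look roughly like the sketch below. It only renders SQL text and has not been run against Postgres; the helper name render_numeric_hash and its cast flag are hypothetical and are not part of this PR or of Ibis. Whether the resulting hash values would match those of the type-specific functions is not something this thread establishes.

```python
def render_numeric_hash(column: str, seed: int = 0, cast: bool = True) -> str:
    """Render a call to hash_numeric_extended as SQL text.

    Hypothetical helper for illustration only. With cast=True the argument
    is explicitly cast to numeric first, which is the workaround floated in
    the comment above; with cast=False the column is passed through as-is.
    """
    arg = f"CAST({column} AS numeric)" if cast else column
    return f"hash_numeric_extended({arg}, {seed})"


# Example: an integer column wrapped in an explicit cast.
print(render_numeric_hash("int_col"))
```

The revert keeps the per-type dispatch instead, so no cast is emitted in the merged code path discussed here.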
d3f5e96 to 23fbbc9
Co-authored-by: Phillip Cloud <417981+cpcloud@users.noreply.github.com>
Description of changes
Postgres supports hashing on a per-type basis, with "extended" versions providing 64-bit integer output. To list these, run \df hash*extended in psql.
This PR implements as many of them as possible, from the set of available Ibis datatypes in ibis/backends/sql/datatypes.py. No attempt to cast is made (e.g. hashing booleans is unsupported, rather than casting to an integer and then using the available hash method).
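As a rough illustration of the per-type dispatch described above, here is a minimal, standalone sketch that maps the type names used in the test parametrization to Postgres extended hash functions and renders the call as SQL text. The table EXTENDED_HASH_FUNCS and the helper render_hash are hypothetical names, not the PR's implementation; the "string" entry (hashtextextended) is an assumption, since that branch is not shown in the diff excerpts here.

```python
# Hypothetical lookup table: type name -> Postgres extended hash function.
# The numeric entries mirror the branches shown in the diff; the "string"
# entry is an assumption not confirmed by the excerpts in this thread.
EXTENDED_HASH_FUNCS = {
    "smallint": "hashint2extended",
    "int": "hashint4extended",
    "bigint": "hashint8extended",
    "float": "hashfloat4extended",
    "double": "hashfloat8extended",
    "string": "hashtextextended",
}


def render_hash(column: str, type_name: str, seed: int = 0) -> str:
    """Render the extended hash call for a column, with no implicit casting."""
    try:
        func = EXTENDED_HASH_FUNCS[type_name]
    except KeyError:
        # Unsupported types (e.g. boolean) fail loudly instead of being cast.
        raise NotImplementedError(
            f"no extended hash function for type {type_name!r}"
        ) from None
    return f"{func}({column}, {seed})"


print(render_hash("bigint_col", "bigint"))
```

Raising for unmapped types reflects the "no attempt to cast" choice in the description: a boolean column is rejected rather than silently converted to an integer.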