Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(postgres): provide translation for hash ops #9348

Merged
merged 5 commits into from
Jun 12, 2024

Conversation

deepyaman
Copy link
Contributor

@deepyaman deepyaman commented Jun 11, 2024

Description of changes

Postgres supports hashing on a per-type basis, with "extended" versions providing 64-bit integer output. To list these, run \df hash*extended:

Schema Name Result data type Argument data types Type
pg_catalog hash_aclitem_extended bigint aclitem, bigint func
pg_catalog hash_array_extended bigint anyarray, bigint func
pg_catalog hash_multirange_extended bigint anymultirange, bigint func
pg_catalog hash_numeric_extended bigint numeric, bigint func
pg_catalog hash_range_extended bigint anyrange, bigint func
pg_catalog hash_record_extended bigint record, bigint func
pg_catalog hashbpcharextended bigint character, bigint func
pg_catalog hashcharextended bigint "char", bigint func
pg_catalog hashenumextended bigint anyenum, bigint func
pg_catalog hashfloat4extended bigint real, bigint func
pg_catalog hashfloat8extended bigint double precision, bigint func
pg_catalog hashinetextended bigint inet, bigint func
pg_catalog hashint2extended bigint smallint, bigint func
pg_catalog hashint4extended bigint integer, bigint func
pg_catalog hashint8extended bigint bigint, bigint func
pg_catalog hashmacaddr8extended bigint macaddr8, bigint func
pg_catalog hashmacaddrextended bigint macaddr, bigint func
pg_catalog hashnameextended bigint name, bigint func
pg_catalog hashoidextended bigint oid, bigint func
pg_catalog hashoidvectorextended bigint oidvector, bigint func
pg_catalog hashtextextended bigint text, bigint func
pg_catalog hashtidextended bigint tid, bigint func
pg_catalog hashvarlenaextended bigint internal, bigint func

This PR implements as many of them as possible, from the set of available Ibis datatypes in ibis/backends/sql/datatypes.py. No attempt to cast is made (e.g. hashing booleans is unsupported, rather than casting to an integer and then using the available hash method).

@deepyaman deepyaman marked this pull request as draft June 11, 2024 00:05
@deepyaman deepyaman marked this pull request as ready for review June 11, 2024 00:50
@@ -1501,10 +1501,13 @@ def test_distinct_on_keep_is_none(backend, on):
"trino", # checksum returns varbinary
]
)
def test_hash(backend, alltypes):
@pytest.mark.parametrize(
"dtype", ["smallint", "int", "bigint", "float", "double", "string"]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just not testing macaddr here, since alltypes doesn't have that column. Should I just add a macaddr column?

On top of that, I also think something is off with macaddr to begin with, though, since PostgresType.from_ibis(dt.macaddr) returns DataType(this=Type.VARCHAR), and test_macaddr_literal in ibis/backends/tests/test_network.py expects a text type to be returned; I think the change should be bigger, but wanted to get an opinion first. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding a column to alltypes will be a headache -- I think we could add a standalone macaddr hash test, but that can also happen in a followup

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, sorry, what I had in mind was that can just add the column to the alltypes table inside the test, but a standalone test would also work.

In any case, yes, happy to do it in a follow-up, just because I'm not sure about some of the other behaviors mentioned for macaddr (and also because I think that's a much less important use case). 😅

@deepyaman deepyaman requested a review from jcrist June 11, 2024 04:35
Comment on lines +595 to +604
if arg_dtype.is_int16():
return self.f.hashint2extended(arg, 0)
elif arg_dtype.is_int32():
return self.f.hashint4extended(arg, 0)
elif arg_dtype.is_int64():
return self.f.hashint8extended(arg, 0)
elif arg_dtype.is_float32():
return self.f.hashfloat4extended(arg, 0)
elif arg_dtype.is_float64():
return self.f.hashfloat8extended(arg, 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would all of these and also decimal types be covered by hashnumericextended?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and you could dispatch on those with is_numeric()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, true...

Suggested change
if arg_dtype.is_int16():
return self.f.hashint2extended(arg, 0)
elif arg_dtype.is_int32():
return self.f.hashint4extended(arg, 0)
elif arg_dtype.is_int64():
return self.f.hashint8extended(arg, 0)
elif arg_dtype.is_float32():
return self.f.hashfloat4extended(arg, 0)
elif arg_dtype.is_float64():
return self.f.hashfloat8extended(arg, 0)
if arg_dtype.is_numeric():
return self.f.hash_numeric_extended(arg, 0)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gforsyth Actually, doesn't seem to work. Maybe it would require an explicit cast.

I find these functions pretty poorly documented, so I will go ahead and revert this for now to get it green again.

@@ -1501,10 +1501,13 @@ def test_distinct_on_keep_is_none(backend, on):
"trino", # checksum returns varbinary
]
)
def test_hash(backend, alltypes):
@pytest.mark.parametrize(
"dtype", ["smallint", "int", "bigint", "float", "double", "string"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding a column to alltypes will be a headache -- I think we could add a standalone macaddr hash test, but that can also happen in a followup

Co-authored-by: Phillip Cloud <417981+cpcloud@users.noreply.github.com>
@cpcloud cpcloud added this to the 9.1 milestone Jun 12, 2024
@cpcloud cpcloud added feature Features or general enhancements postgres The PostgreSQL backend labels Jun 12, 2024
@cpcloud cpcloud enabled auto-merge (squash) June 12, 2024 10:48
@cpcloud cpcloud merged commit 57e2348 into ibis-project:main Jun 12, 2024
79 checks passed
@deepyaman deepyaman deleted the feat/postgres/hash branch June 12, 2024 11:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Features or general enhancements postgres The PostgreSQL backend
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants