Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make geometries hashable (again) #985

Closed
wants to merge 5 commits into from
Closed

Conversation

mwtoews
Copy link
Member

@mwtoews mwtoews commented Sep 16, 2020

It just occurred to me that with #960 to remove mutable geometries that we can restore hashable geometries.

This PR reverts #320 / e7ab27e (which also had tests that always passed).

Is there anything new to consider?

See also #209 with xrefs to relevant open issues to geopandas

@coveralls
Copy link

coveralls commented Sep 16, 2020

Coverage Status

Coverage remained the same at 82.019% when pulling 1592303 on mwtoews:hash into 1d734de on Toblerity:shapely-2.0.

@mwtoews
Copy link
Member Author

mwtoews commented Sep 16, 2020

The hash strategy used before, and again here, just uses the object hash. This means that two equal geometries would have a different hash, which may not be expected. Should we consider improving the hash strategy on (e.g.) WKB?

@jorisvandenbossche
Copy link
Member

The hash strategy used before, and again here, just uses the object hash. This means that two equal geometries would have a different hash, which may not be expected. Should we consider improving the hash strategy on (e.g.) WKB?

Yes, I think equal geometries (but not identical objects) should have an equal hash, ideally. So using WKB seems to be the way to go, although this also gives problems like linearring/linestring not being distinguishable (although we could hash the type id together with the wkb).

Now, this WKB-based hashing is already implemented in the base pygeos Geometry: https://github.com/pygeos/pygeos/blob/a12477f93f9cdb1ed80731ac3fc26059ec1096f2/src/pygeom.c#L116-L150
So if we keep this PR for after #983, the geometry classes will automatically inherit this hashability. And then the update to the tests of this PR is still useful.

@sgillies
Copy link
Contributor

@jorisvandenbossche I think it would be a mistake for shapely geometry object hashes to be special and unlike ordinary Python object hashes. Geometry objects might "inhabit" the same part of space but be different things and we'll get collisions when constructing dicts and sets if we treat spatially-coincident objects as the same object.

@jorisvandenbossche
Copy link
Member

a mistake for shapely geometry object hashes to be special and unlike ordinary Python object hashes.

It depends to what you compare. Most python objects don't use object.__hash__. For example, two tuples that are equal also have an equal hash:

In [12]: t1 = (1, 2) 

In [13]: t2 = (1, 2)

In [14]: t1 == t2  
Out[14]: True

In [15]: hash(t1)== hash(t2)
Out[15]: True

In the same line, I think you can expect Point(1, 2) and Point(1, 2), which compare equal, also to have an equal hash.

Otherwise I think that hash based functions (eg set([..]) to get the unique elements of a sequence) will give surprising / confusing results).

Geometry objects might "inhabit" the same part of space but be different things and we'll get collisions when constructing dicts and sets if we treat spatially-coincident objects as the same object.

Can you explain a bit more the worry about collisions here?
I think the WKB of two geometries is only equal if both geometries are "really" equal (same coordinates, structural equality), and not "spatially" equal?

@mwtoews mwtoews marked this pull request as draft September 16, 2020 21:51
@mwtoews
Copy link
Member Author

mwtoews commented Sep 16, 2020

I think the hashing strategy needs a bit more attention. Should this be discussed as part of shapely/shapely-rfc#1 ? Or is here sufficient?

@jorisvandenbossche
Copy link
Member

Personally, I would say this is more an API detail discussion we can have in Shapely itself (here on the PR, or in a dedicated issue for it), while the RFC discussed the general principles of mutability/hashability. But also happy to start a discussion on the RFC if others prefer that.

@mwtoews
Copy link
Member Author

mwtoews commented Dec 4, 2021

Superseded by #1250

@mwtoews mwtoews closed this Dec 4, 2021
@mwtoews mwtoews deleted the hash branch April 8, 2022 09:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants