Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Exclude same input geometry from output of nearest_all #324

Open
brendan-ward opened this issue Mar 25, 2021 · 6 comments
Open

ENH: Exclude same input geometry from output of nearest_all #324

brendan-ward opened this issue Mar 25, 2021 · 6 comments
Labels
enhancement New feature or request to-shapely This issue needs to be transferred to shapely

Comments

@brendan-ward
Copy link
Contributor

In this issue in Shapely, there is a need to find the nearest items in a tree constructed of the same geometries that are being used to query the tree, but return the non-self nearest neighbor(s).

One way to approach this would be to provide a new function that does not take new geometry inputs, but instead uses the underlying tree geometries in such a way that we can compare addresses of a geometry to the results returned within the nearest callback function, so as to avoid a more expensive test for equality. We would then return an arbitrarily large distance for the self-pair, to force the tree to look for the next nearest neighbor.

/cc @liunx7594

@brendan-ward brendan-ward added the enhancement New feature or request label Mar 25, 2021
@caspervdw
Copy link
Member

@sgillies
Copy link

@brendan-ward that's a good insight about the cost of an equality test. In https://github.com/Toblerity/Shapely/pull/1166/files#diff-6108807fae8f58167beefb19a5c3270a442c26cc1e5b3b62e0ce25ebbe4673c3R254 I'm promising an exclusive keyword argument to make the set of possible results and the input exclusive. One thing that's not great about using the GEOS distance is that all shapes that contain a shape have the same distance: 0.

@brendan-ward
Copy link
Contributor Author

One thing that's not great about using the GEOS distance is that all shapes that contain a shape have the same distance: 0

This is partly why we return all equidistant results via nearest_all, though that does present a bit of an awkward API ("nearest" implies a singular result). We leave it up to the caller to determine how to then ordinate the results where distance is 0; perhaps they could do so by comparing the distance between centroids or some other metric to better refine that (but it all seems very use specific so I don't know that we can do more here).

It looks like your implementation would extend nicely to nearest_all here as something broadcastable to the shape of the inputs, but I think we'd want to work out whether or not we want to express this in terms of avoiding self-nearest neighbors (proposed above, using pointer addresses), avoiding input nearest neighbors (via exclusive, using equality test), or both. I didn't do a performance test on the equality test, so the gains of using the pointer approach may or may not be negligible; it just seemed like it would involve fewer operations.

@jorisvandenbossche
Copy link
Member

I didn't do a performance test on the equality test, so the gains of using the pointer approach may or may not be negligible; it just seemed like it would involve fewer operations.

Performance might be a reason to choose the pointer method, but the two methods also can give a different result (there can be multiple geometries in the tree that are not identical, but are spatially equal: the one method only excludes the identical geometry, the other method excludes all equal geometries). Although I don't really have a good sense which of the two behaviours would be most preferable (probably also use case dependent)

@brendan-ward
Copy link
Contributor Author

the two methods also can give a different result

Agreed. I think some of this comes down to the degree of duplication of geometries present within the source geometries, and whether or not those are valid results for your analysis (fully equivalent vs functionally equivalent). Though as precedent, the shapely method uses equality of geometries.

I think it depends, like you say, on what the user is trying to accomplish:

  • what is the nearest geometry to me that is not me (pointer equality)
  • what is the nearest geomtry to me that is not in the exact same place (geometric equality)

The user could use a query_bulk with intersects predicate followed by an equals test to identify all the geometries that would fall into the gap between these two methods (non-self but equal geometry). (side note: maybe we should consider adding the equals predicates to query / query_bulk to help with this case?)

But unless they did a spatial deduplication step prior to constructing the tree, they wouldn't be able to easily force nearest to find the nearest geometry that is not spatially equal.

So - long story short - the equals method as established by Shapely is probably the best here, and perhaps we just alert the user to performance issues of equality tests for complex geometries that are spatially equal but not the same input geometry (should be pretty rare, right?).

@caspervdw
Copy link
Member

caspervdw commented Sep 30, 2021

I agree that we should stick to the “equals” and not go for the identity (pointer equality).

I don’t see a pygeos geometry as something that has an identity. They are like integers or strings: immutable and only their value gives them meaning.

Within the context of a tree, we could only do an exclusion by identity if one would specify the query geometry with an index into the tree. (e.g. tree.query(geom_idx) ) but something like that will have a very limited usability.

@caspervdw caspervdw added the to-shapely This issue needs to be transferred to shapely label Dec 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request to-shapely This issue needs to be transferred to shapely
Projects
None yet
Development

No branches or pull requests

4 participants