Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PROPOSAL: add STRtree::nearest functions #110

Closed
brendan-ward opened this issue Mar 10, 2020 · 3 comments · Fixed by #272
Closed

PROPOSAL: add STRtree::nearest functions #110

brendan-ward opened this issue Mar 10, 2020 · 3 comments · Fixed by #272

Comments

@brendan-ward
Copy link
Contributor

Given a tree constructed with target geometries, we should be able to get a sorted array of the indexes and distances of the N nearest neighbors, within some maximum distance. Any geometries that intersect get a distance of 0 and compete for the nearest position (not sure how to sort that out yet).

Something like:

>> tree = pygeos.STRtree(pygeos.points(np.arange(10), np.arange(10)))
>> tree.nearest(pygeos.point(0.25, 0.25), neighbors=2, max_distance=1000)
[array([0, 1]), array([0.353553, 1.06066])]

Likewise, a bulk version of this (similar to #108) could return 3 arrays:

>> tree = pygeos.STRtree(pygeos.points(np.arange(10), np.arange(10)))
>> tree.nearest([pygeos.point(0.25, 0.25), pygeos.point(2,2)], neighbors=2, max_distance=1000)
[array([0, 0, 1, 1]), array([0, 1, 2, 1]), array([0.353553, 1.06066, 0, 1.4142])]

There is an existing GEOSSTRtree_nearest_generic in GEOS, but it doesn't do what we want here, so I think we should implement our own approach in C; it is focused on returning the nearest geometry instead of the index of the nearest geometry. GEOSSTRtree_nearest_generic takes a callback function that gets passed the index of the geometries in the tree, and the source geometry. While it is easy to calculate the distance between them, it is much less easy to get the GEOS geoemetry object from an index in the tree, without also having a clean way to pass in the tree or geometries into the callback function. Ultimately, we want the indexes and distances of more than just the nearest neighbor.

I think we can do the following:

  1. calculate the envelope of the source geometry
  2. expand this up to max_distance as a new envelope
  3. query the tree using this envelope
  4. iterate over results from query and sort by distance from source geometry
  5. extract N nearest results and return arrays of indexes and distances
@brendan-ward
Copy link
Contributor Author

Correction: GEOSSTRtree_nearest_generic returns the index of the target geometry in the tree, and it is possible to pass the tree in as the userdata argument of the callback (but harder to pass in the tree and a dynamically resizable vector to store things like distance).

Not sure if it is helpful for us to surface this variant of nearest as well. With some wrapping around GEOSSTRtree_nearest_generic we could return the index of the nearest geometry reasonably easily.

@brendan-ward
Copy link
Contributor Author

Thought about this some more, k nearest neighbor functionality should probably come from GEOS, not implemented on top.

It is likely a matter of porting locationtech/jts#75 to GEOS, then wrapping that here.

For now, we should limit scope on nearest neighbor to what is readily available from GEOS in #111

@srenoes
Copy link

srenoes commented Jul 8, 2020

Yeps interesting extension. Though looking for the nearest neighbour function that should be as a method in GEOS for strtree and is if I understood right already in use in shapely:
https://shapely.readthedocs.io/en/latest/manual.html?highlight=nearest%20neighbour#strtree.STRtree.strtree.nearest

class structure of geos. Though there is no documentation of the nearest neighbour function, maybe shapely provides insight
https://geos.osgeo.org/doxygen/classgeos_1_1index_1_1strtree_1_1STRtree.html

isnt there also the GEOSSTRtree_nearest_generic from above?

also like the idea above about n nearest ....
extending this idea... isnt the centroid a natural reference point for distances? At least it should exist for any geometry,

otherwise there is the nearest_points function in shapely which could serve for determining the distance (between the two nearest points). Eventually that was also originally the idea yet for intersecting polygons (closed lines is somehow different) this maybe doesnt make sense. What would be a useful reference point? maybe centroid and in what cases? Or is anyway 0 distance useful as it literally is inside or extends inside.

Anyway GEOS has tons of other indeces which could be also useful to implement, especially delaunay since it is often used for nearest neighbour regression. Not sure if there is any other index that could do that

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants