ENH: implement sindex.nearest_N_neigbours for both rtree and strtree in a consistent way (equal results) #1509

srenoes · 2020-07-09T12:33:38Z

Not quite sure what the plan is. Most of the discussion is in pull request #1271, but that is not an issue. Nearest-neighbour indexing is partly addressed in #1455. The original related issue seems to be #1096.

My suggestion would be to make all the changes in geopandas, so that the differences between GEOS and rtree are taken into account there.

There were some issues with that in pygeos: only one nearest neighbour is returned, not all of them when several are at the same distance.
See the discussion in the issue (pygeos/pygeos#110) and the pygeos pull request (pygeos/pygeos#111).

Part of the idea, it seems, was to have a method with two additional inputs, N_neighbours and maxdistance (#1271).
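To make the intended semantics concrete, here is a minimal brute-force sketch for plain 2D points, using only the standard library. The function name `nearest_n` and its parameter names are hypothetical, not an existing geopandas, pygeos or rtree API. It keeps every neighbour tied at the N-th distance, which is the behaviour that differs between the backends, so something like it could serve as a reference when checking that rtree and strtree give equal results.

```python
from math import dist


def nearest_n(query, candidates, n_neighbours=1, max_distance=float("inf")):
    """Brute-force reference: indices of the n nearest candidates, ties kept."""
    if n_neighbours < 1:
        return []
    # Distance from the query point to every candidate within the cut-off,
    # sorted by distance and then by index.
    scored = sorted(
        (dist(query, pt), i)
        for i, pt in enumerate(candidates)
        if dist(query, pt) <= max_distance
    )
    if len(scored) <= n_neighbours:
        return [i for _, i in scored]
    # Keep everything at or below the n-th smallest distance, so that
    # equidistant neighbours are all returned rather than an arbitrary one.
    cutoff = scored[n_neighbours - 1][0]
    return [i for d, i in scored if d <= cutoff]


points = [(1.0, 0.0), (0.0, 1.0), (3.0, 0.0), (0.0, -5.0)]
print(nearest_n((0.0, 0.0), points))  # both points at distance 1 are returned
print(nearest_n((0.0, 0.0), points, n_neighbours=3))
print(nearest_n((0.0, 0.0), points, n_neighbours=4, max_distance=4.0))
```

With real geometries the exact tie test would need a tolerance, since computed distances are floating-point values.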

This is relatively easy to implement with existing methods in geopandas, and through those it would work with both rtree and strtree. For pygeos, an implementation could look like this:

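import numpy as np
import pandas as pd

def nearest_n_neighbours(gdf, gdf2, N_neighbours, max_distance):
    # Candidate pairs as rows of (gdf2 position, gdf position): every gdf
    # geometry that intersects a gdf2 geometry buffered by max_distance.
    pairs = gdf.sindex.query_bulk(gdf2.geometry.buffer(max_distance), predicate='intersects').T

    # Reset the indexes so that distance() compares the series row by row.
    geom1 = gdf.geometry.iloc[pairs[:, 1]].reset_index(drop=True)
    geom2 = gdf2.geometry.iloc[pairs[:, 0]].reset_index(drop=True)
    distance = geom1.distance(geom2)

    df_for_sorting = pd.DataFrame({'geom2_idx': pairs[:, 0], 'geom1_idx': pairs[:, 1], 'distance': distance})
    # Rank the candidates of each gdf geometry by distance; method='min'
    # gives tied distances the same rank, so equidistant neighbours are kept.
    rank = df_for_sorting.groupby('geom1_idx')['distance'].rank(method='min')
    kept = df_for_sorting[rank <= N_neighbours]
    # Same layout as query_bulk: first row gdf2 positions, second row gdf positions.
    return kept[['geom2_idx', 'geom1_idx']].to_numpy().T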

For rtree it can (but does not have to) be different, perhaps usefully so.

Performance-wise this might be a problem with a big maxdistance (too many intersections to calculate distances for). In rtree the candidates can first be filtered by the N nearest neighbours and then by distance.
What about the other performance issues with building the tree mentioned in #1271 @martinfleis? Or is this solved by using the strtree from pygeos in case of big geometries?
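The two filter orders mentioned above (distance cut-off first, as in the buffer approach, versus N nearest first, as rtree would allow) can be compared with a small standard-library sketch. The function names are hypothetical, the points are plain 2D tuples, and ties are simply broken by index here. Because this is brute force, it only shows that the two orders select the same neighbours; the speed benefit would come from a spatial index producing the N nearest without visiting every candidate.

```python
from heapq import nsmallest
from math import dist


def distance_then_n(query, candidates, n, max_distance):
    # Buffer-style order: keep everything within max_distance,
    # then take the n nearest of what is left.
    within = [
        (dist(query, pt), i)
        for i, pt in enumerate(candidates)
        if dist(query, pt) <= max_distance
    ]
    return [i for _, i in nsmallest(n, within)]


def n_then_distance(query, candidates, n, max_distance):
    # rtree-style order: take the n nearest first,
    # then drop those that lie beyond max_distance.
    nearest = nsmallest(n, ((dist(query, pt), i) for i, pt in enumerate(candidates)))
    return [i for d, i in nearest if d <= max_distance]


points = [(2.0, 0.0), (0.0, 1.0), (3.0, 0.0), (0.0, -5.0)]
origin = (0.0, 0.0)
print(distance_then_n(origin, points, n=3, max_distance=2.5))
print(n_then_distance(origin, points, n=3, max_distance=2.5))
```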

Any opinions about that?

adriangb · 2020-08-26T01:18:26Z

I still think this should be implemented using the algorithm proposed in pygeos/pygeos#111 (comment) or something similar, ideally in pygeos/shapely, since it will be much faster there.

Regarding the other two inputs proposed in #1271, I think those only make sense in the context of the rtree solution.

adriangb mentioned this issue Mar 2, 2021

ENH: sjoin_nearest #1865

Merged
