
ENH: Implement partitioning based on hilbert distance #71

Closed
wants to merge 6 commits into from

Conversation

tastatham
Contributor

Implements partitioning of Dask-GeoPandas data based on calculated Hilbert distances, building on the hilbert_distance function from #70.

Details:

  • Calculates the Hilbert distance
  • Adds a column using the calculated distances
  • Sets the index / re-shuffles the data using the new column

@martinfleis
Member

Let's talk about this tonight. I would maybe prefer a different, more generalisable API.

Let's assume we have hilbert_distance and geohash functions based on which we can spatially repartition gdf. I would then prefer to have a single method (spatial_shuffle?) that can consume both.

# using hilbert
ddf = ddf.spatial_shuffle('hilbert')

# using geohash
ddf = ddf.spatial_shuffle('geohash')

This API can be extended with other methods without needing two methods per algorithm (especially since the repartition_ methods will be largely the same).
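The single-method dispatch proposed above could be sketched as follows. The method and helper names (spatial_shuffle, hilbert_distance, geohash) come from this discussion, but the standalone class below is purely illustrative, not the actual dask-geopandas implementation:

```python
class GeoDataFrameSketch:
    """Stand-in for a dask_geopandas GeoDataFrame (illustration only)."""

    def hilbert_distance(self):
        # Placeholder: the real method returns a series of Hilbert distances.
        return "shuffled-by-hilbert"

    def geohash(self):
        # Placeholder: the real method returns a series of geohashes.
        return "shuffled-by-geohash"

    def spatial_shuffle(self, by="hilbert"):
        # Map the algorithm name to the function computing the partitioning
        # key; a new algorithm only needs a new entry here, not a new method
        # pair on the public API.
        algorithms = {
            "hilbert": self.hilbert_distance,
            "geohash": self.geohash,
        }
        if by not in algorithms:
            raise ValueError(f"unknown partitioning method: {by!r}")
        return algorithms[by]()


ddf = GeoDataFrameSketch()
result = ddf.spatial_shuffle("hilbert")  # -> "shuffled-by-hilbert"
```

The point of the dispatch table is that the shuffle machinery is written once, and each algorithm only contributes a key-computing function.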

@jorisvandenbossche
Member

I think we might also need APIs at two different levels: a high-level one like the spatial_shuffle above, which encapsulates both calculating the information (e.g. Hilbert distance, geohash) and doing the actual shuffle, and a "lower-level" repartitioning function that works from pre-calculated data. For the Hilbert case, the low-level part might simply be the existing Dask set_index, but for geohash this is basically repartitioning based on a discrete attribute (-> #61).
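The two-level split described above can be illustrated with a minimal eager sketch. All names here are hypothetical, the "spatial key" is a toy stand-in (x + y) rather than a real Hilbert distance, and plain sorted lists emulate what Dask's set_index would do lazily:

```python
def repartition_by_key(rows, key, npartitions):
    """Low level: repartition rows by a pre-computed key.

    In Dask this role could be played by set_index on the key column;
    here we emulate it eagerly for illustration.
    """
    ordered = sorted(rows, key=key)
    size = -(-len(ordered) // npartitions)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def spatial_shuffle(rows, npartitions=2):
    """High level: compute the spatial key, then delegate the shuffle."""
    # x + y is a toy stand-in for the real Hilbert-distance calculation.
    return repartition_by_key(rows, key=lambda p: p[0] + p[1],
                              npartitions=npartitions)


points = [(3, 3), (0, 0), (2, 2), (1, 1)]
parts = spatial_shuffle(points)
# parts[0] holds the points nearest the origin: [(0, 0), (1, 1)]
```

The high-level function owns the key computation; the low-level function never needs to know which algorithm produced the key, which is what makes it reusable for geohash-style discrete attributes as well.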

@martinfleis
Member

Replaced by #104
