Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Follow ups for the hilbert_distance function #85

Open
3 of 7 tasks
tastatham opened this issue Jul 21, 2021 · 4 comments
Open
3 of 7 tasks

ENH: Follow ups for the hilbert_distance function #85

tastatham opened this issue Jul 21, 2021 · 4 comments
Assignees
Labels
GSoC Google Summer of Code project

Comments

@tastatham
Copy link
Contributor

tastatham commented Jul 21, 2021

The hilbert_distance function will shortly be merged into Dask-GeoPandas.
There was several enhancements that were discussed as follow up items, which include:

  • To pass the total_bounds argument to hilbert_distance function if this is already pre-computed: ENH: allow to specify known total_bounds for Hilbert/Morton distance #145
  • Use the bounds of the spatial_partitions if present. -> ENH: use total_bounds of spatial_partitions if available for Hilbert/Morton distance #161
  • Calculating the mid-points for (Multi)Point geometries in _continuous_to_discrete_coords is not necessary:
    • To add a check to see whether the passed geometries are (Multi)Points
    • Only compute the mid-points for non-(Multi)Point geometries.
  • Currently the _continuous_to_discrete function clips discrete integer values using base Python because numpy.clip is not currently supported by Numba: (no longer using numba)
    • To add numpy.clip when it's supported by Numba - v0.54
  • Data type optimisation should be assessed
    • Total bit-width in _continuous_to_discrete - to use numpy.int32 instead of numpy.int64
  • Provide guidance on p parameter that controls precision of resulting Hilbert curve; need to help user understand how to set this themselves / when this can be useful to set this themselves. (also check hackmd notes)
  • To add a notebook

Note: Will add permalinks when merged.

@martinfleis
Copy link
Member

One more thing to add is to ensure that codecov sees the coverage of numba jitted functions. We'll probably need to run that bit again without that using env variable.

@martinfleis martinfleis added the GSoC Google Summer of Code project label Aug 4, 2021
@tastatham
Copy link
Contributor Author

tastatham commented Aug 7, 2021

Another follow up is to centralise/ generalise several functions that are used by both hilbert_distanceandmorton_distance`.

  • Firstly, the _continuous_to_discrete_coords function is imported from hilbert_distance.py in morton_distance.py - centralise
  • Secondly, the _distances_from_coordinates function is used in both hilbert_distance.py in morton_distance.py but are defined in slightly different ways - generalise

@TomAugspurger
Copy link
Contributor

I didn't follow this implementation closely, but is the plan to rely on the hilbertcurve package long-term? If so, it needs to be added to the install_requires:

dask-geopandas/setup.py

Lines 6 to 11 in 99fa1a8

install_requires = [
"geopandas",
"dask>=2.18.0,!=2021.05.1",
"distributed>=2.18.0,!=2021.05.1",
"numba",
]
.

@martinfleis
Copy link
Member

martinfleis commented Aug 18, 2021

@TomAugspurger We don't depend on it. It is used only in CI. We use the custom numba implementation ported from spatialpandas.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
GSoC Google Summer of Code project
Projects
None yet
Development

No branches or pull requests

3 participants