Use vectorized selection for rioxarray #14

Closed
spatialthoughts opened this issue Sep 7, 2022 · 5 comments

Comments

@spatialthoughts
Contributor

Hi,

I came across this benchmark while comparing rasterstats and rioxarray. Looking into rioxarray/extract_points.py, I found that it iterates over the points one at a time instead of using the much faster vectorized selection. Switching to vectorized selection results in a 1000x speedup.

I have opened a PR that shows this improvement. Would be good to test and merge.
#13
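
To illustrate the difference, here is a minimal sketch of the two approaches. It is not the code from the PR: the raster is a small synthetic `xarray.DataArray` standing in for a file opened with rioxarray, and the function names `extract_loop` and `extract_vectorized` are made up for this example. The vectorized version passes `DataArray` indexers that share a common dimension (here called `points`, an arbitrary name) so that xarray looks up all points in a single call.

```python
import numpy as np
import xarray as xr

# Synthetic 10x10 raster (assumption: stands in for a raster opened with rioxarray).
# Coordinates are pixel centres; y is descending, as in most north-up rasters.
raster = xr.DataArray(
    np.arange(100, dtype="float64").reshape(10, 10),
    coords={"y": np.arange(9.5, -0.5, -1.0), "x": np.arange(0.5, 10.5, 1.0)},
    dims=("y", "x"),
)

# Random query points inside the raster extent.
rng = np.random.default_rng(0)
px = rng.uniform(0.0, 10.0, size=200)
py = rng.uniform(0.0, 10.0, size=200)


def extract_loop(da, xs, ys):
    """One .sel() call per point: simple, but slow for many points."""
    return np.array(
        [da.sel(x=x, y=y, method="nearest").item() for x, y in zip(xs, ys)]
    )


def extract_vectorized(da, xs, ys):
    """A single .sel() call using indexers that share the 'points' dimension."""
    x_idx = xr.DataArray(xs, dims="points")
    y_idx = xr.DataArray(ys, dims="points")
    return da.sel(x=x_idx, y=y_idx, method="nearest").values


loop_vals = extract_loop(raster, px, py)
vec_vals = extract_vectorized(raster, px, py)
```

Both functions return the same values; the vectorized one avoids the per-point Python overhead, which is where the speedup comes from.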

@kadyb
Owner

kadyb commented Sep 7, 2022

I merged this PR, thank you. I have two questions:

  1. Can you confirm that this operation after vectorization still uses 1 thread?
  2. Can we do something similar for {rasterio}?

@sgillies

sgillies commented Sep 7, 2022

@kadyb @spatialthoughts rasterio got a lot faster in 1.3.0 (see rasterio/rasterio#2338) and is even faster when points are sorted. I haven't compared it to rioxarray.

@kadyb
Owner

kadyb commented Sep 7, 2022

After updating {rasterio} to version 1.3.0, I see the time drop from 18.17 s to 15.29 s for unsorted points.

BTW: @sgillies, could you check issue #10?

@spatialthoughts
Contributor Author

@kadyb Responses below

  1. The dask backend is not enabled, so it should use a single process as before. I checked the resource use and saw a single process for both the old and the new code.
  2. I am not aware of anything faster than rasterio.sample.

@kadyb
Owner

kadyb commented Dec 15, 2022

I just updated the benchmark. The time for {rioxarray} changed from 69.7 s to 0.08 s for ~68k points. So it looks like {rioxarray} and {terra} are the two fastest packages. In the next iteration I will add more points to the tests to make the differences more visible. Thanks again for the suggestions!

@kadyb kadyb closed this as completed Dec 15, 2022