REF: use STRtree.query_bulk in _area_tables_binning #110
Codecov Report
@@ Coverage Diff @@
## master #110 +/- ##
==========================================
- Coverage 41.34% 34.62% -6.73%
==========================================
Files 11 11
Lines 549 491 -58
==========================================
- Hits 227 170 -57
+ Misses 322 321 -1
Timings:
from tobler.area_weighted import area_interpolate
import geopandas
p = ("https://geographicdata.science/book/_downloads/"\
"f2341ee89163afe06b42fc5d5ed38060/sandiego_tracts.gpkg")
src = geopandas.read_file(p)
p = ("https://geographicdata.science/book/_downloads/"\
"d740a1069144baa1302b9561c3d31afe/sd_h3_grid.gpkg")
tgt = geopandas.read_file(p).to_crs(src.crs)
%timeit estimates = area_interpolate(src, tgt, ['total_pop'])
master -
+1 on going ahead with this approach. After several experiments and some thinking, my sense is that the second test case not only had a different balance of polygons, but its polygons were also distributed differently over space, giving some bins very few polygons to check and others a lot, which makes the binning approach less efficient. Think of the case of western MSAs in the US: if we bin uniformly, the coastal buckets will have a lot more polygons than the inland ones... Not sure if that's the case, but it's a hunch. More generally, what we need for this operation is a spatial index and, given that the ecosystem now has very good off-the-shelf options, my vote would be to jump on that wagon and enjoy further improvements down the line.
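To illustrate the hunch about uniform binning, here is a minimal pure-Python sketch. It is not the code from `_area_tables_binning`; the function name `bin_counts` and the "coastal"/"inland" centroids are made up for the example. It only shows how a uniform grid can end up with a few crowded cells and many empty ones when geometries cluster along one edge.

```python
from collections import Counter


def bin_counts(centroids, bounds, n_bins):
    """Count centroids per cell of a uniform n_bins x n_bins grid (hypothetical helper)."""
    minx, miny, maxx, maxy = bounds
    dx = (maxx - minx) / n_bins
    dy = (maxy - miny) / n_bins
    counts = Counter()
    for x, y in centroids:
        # Clamp so points on the upper boundary land in the last cell.
        i = min(int((x - minx) / dx), n_bins - 1)
        j = min(int((y - miny) / dy), n_bins - 1)
        counts[(i, j)] += 1
    return counts


# Made-up data: 90 "coastal" centroids packed into the strip x < 1,
# and 8 "inland" centroids spread over the rest of the extent.
coastal = [((k % 10) * 0.1, k // 10 + 0.5) for k in range(90)]
inland = [(x + 0.5, 5.5) for x in range(2, 10)]

counts = bin_counts(coastal + inland, bounds=(0, 0, 10, 10), n_bins=5)
coastal_column = sum(n for (i, _), n in counts.items() if i == 0)

print("occupied cells:", len(counts), "of", 5 * 5)
print("largest bucket:", max(counts.values()))
print("centroids in the coastal column:", coastal_column)
```

With this layout most of the 25 cells stay empty while the first grid column holds nearly all centroids, so the per-bin work is very uneven. A tree-based spatial index adapts to where the geometries actually are, which is the motivation given in the comment.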
@knaaptime there is a |
Docs for `STRTree` area interpolation
@martinfleis yep, not sure how that got there... In general though this is great! It dramatically simplifies the code too. I'm all for it.
Excellent improvement!
xref #104
This is the latest version of the single-core refactoring of `area_interpolate`. It turned out that using the pygeos-backed `STRtree` and its vectorized `query_bulk` in geopandas is even faster than the numba version @darribas mentioned in #104.

While pygeos is not strictly necessary and geopandas can use `rtree` for this operation, pygeos brings a speedup of an order of magnitude, so I am adding it to the requirements to make sure users have it installed.
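For readers unfamiliar with the pattern, the following is a rough pure-Python sketch of the idea, not the implementation in this PR. Axis-aligned boxes stand in for polygons, and a brute-force `candidate_pairs` function (a hypothetical name) stands in for the bulk spatial-index query, which in the real code returns matching (source, target) index pairs in one vectorized call. The weighting shown, intersection area divided by source area, is an assumption about how an extensive variable such as `total_pop` is apportioned.

```python
def box_area(b):
    """Area of an axis-aligned box given as (minx, miny, maxx, maxy)."""
    return (b[2] - b[0]) * (b[3] - b[1])


def intersection_area(a, b):
    """Overlap area of two axis-aligned boxes, 0.0 if they do not overlap."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def candidate_pairs(sources, targets):
    """Brute-force stand-in for a bulk index query: (source_idx, target_idx)
    pairs whose boxes intersect (touching edges included)."""
    pairs = []
    for i, s in enumerate(sources):
        for j, t in enumerate(targets):
            if s[0] <= t[2] and t[0] <= s[2] and s[1] <= t[3] and t[1] <= s[3]:
                pairs.append((i, j))
    return pairs


def area_interpolate_boxes(sources, values, targets):
    """Apportion each source value to targets by share of source area covered."""
    estimates = [0.0] * len(targets)
    for i, j in candidate_pairs(sources, targets):
        share = intersection_area(sources[i], targets[j]) / box_area(sources[i])
        estimates[j] += values[i] * share
    return estimates


# Made-up example: two source zones with populations, three target zones.
sources = [(0, 0, 2, 2), (2, 0, 4, 2)]
values = [100.0, 200.0]
targets = [(0, 0, 1, 2), (1, 0, 3, 2), (3, 0, 4, 2)]

pairs = candidate_pairs(sources, targets)
estimates = area_interpolate_boxes(sources, values, targets)
print(pairs)
print(estimates)
```

Only the pairs returned by the query need an exact intersection computed, so the cost of the whole operation depends mostly on how quickly those pairs can be found. That is where a vectorized tree query pays off compared with a Python-level loop like the one above.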