add hexify function #111

Merged · 13 commits merged into pysal:master on Dec 27, 2020
Conversation

@knaaptime (Member) commented on Dec 23, 2020:

Following the example of some similar code from the gdsbook, this adds a function to generate a GeoDataFrame of H3 hexagons covering the footprint of a given source GeoDataFrame. I think it's useful to have around, and I imagine it would be handy for generating a target geometry for interpolation.
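A minimal sketch of the idea (not the exact code in this PR), assuming the h3-py 3.x API (h3.polyfill and h3_to_geo_boundary), a source GeoDataFrame whose unioned footprint is a single polygon, and a hypothetical helper name hexify_sketch:

```python
import geopandas as gpd
import h3
from shapely.geometry import Polygon

def hexify_sketch(source, resolution=6):
    """Return a GeoDataFrame of H3 hexagons covering the source footprint."""
    # union of all source geometries, as a GeoJSON-like dict in lon/lat
    footprint = source.to_crs(4326).unary_union.__geo_interface__
    # all H3 cells whose centroids fall inside the footprint
    hexids = list(h3.polyfill(footprint, resolution, geo_json_conformant=True))
    # turn each cell id back into a shapely polygon
    polys = [Polygon(h3.h3_to_geo_boundary(h, geo_json=True)) for h in hexids]
    return gpd.GeoDataFrame({"hex_id": hexids}, geometry=polys, crs=4326)
```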

@knaaptime changed the title from "add hexify" to "add hexify function" on Dec 23, 2020
@codecov-io commented on Dec 23, 2020:

Codecov Report

Merging #111 (ea73d91) into master (a5d8649) will increase coverage by 4.53%.
The diff coverage is 84.00%.


@@            Coverage Diff             @@
##           master     #111      +/-   ##
==========================================
+ Coverage   37.20%   41.74%   +4.53%     
==========================================
  Files          11       12       +1     
  Lines         508      551      +43     
==========================================
+ Hits          189      230      +41     
- Misses        319      321       +2     
Impacted Files | Coverage Δ
tobler/util/util.py | 68.25% <71.42%> (+11.11%) ⬆️
tobler/tests/test_utils.py | 100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a5d8649...ea73d91.

@darribas (Member) left a comment:

I LOVE this idea, I think it's great. I've left a few comments in terms of suggestions, some are more for consideration than anything else.

Thinking through it, one of the values of putting this in tobler is not stopping at generating the hex geometries but having methods to take your data into H3 (or other indexing systems like S2 down the line, as I suggest in the review). You have a bunch of data on different shapefiles with different geometries for the same region, and tobler helps you "align" them on a standard system like H3. In this model, you could choose how each column/geometry is transferred to H3, which would give us an excuse to integrate "entry-point" methods that dispatch to each of the techniques that tobler covers (area weighted, dasymetric, model-based...). A little bit like the meta-API we've discussed for model in PySAL. Thinking about this, it could be done for any two arbitrary geographies, and then if you only specify the input one, the output could go into H3. Some food for thought (and discussion).

Otherwise, the code itself looks good to me!
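To make the "entry-point" idea above concrete, a purely hypothetical sketch of such a meta method (every name and signature here is illustrative, not tobler's actual API; hexify is the function added in this PR):

```python
from tobler.util.util import hexify  # the function added in this PR

def transfer(source, target=None, columns=None, method="area_weighted",
             resolution=6, **kwargs):
    """Hypothetical meta entry point: move `columns` from `source` onto `target`.

    If no target is given, build an H3 grid over the source footprint so data
    from different shapefiles ends up aligned on a common index.
    """
    if target is None:
        target = hexify(source, resolution=resolution)
    # dispatch to the relevant tobler technique by name
    if method == "area_weighted":
        ...  # area-weighted interpolation of columns onto target
    elif method == "dasymetric":
        ...  # dasymetric / masked interpolation
    elif method == "model":
        ...  # model-based interpolation
    return target
```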

@@ -102,14 +106,76 @@ def project_gdf(gdf, to_crs=None, to_latlong=False):

# calculate the centroid of the union of all the geometries in the
# GeoDataFrame
- avg_longitude = gdf['geometry'].unary_union.centroid.x
+ avg_longitude = gdf["geometry"].unary_union.centroid.x
Member:

@martinfleis will know better, but I think gdf.geometry is a safer, more general approach for picking the geometry column in a GeoDataFrame (the active geometry could be a column called "geom", but gdf.geometry will always pick it up correctly).

Member:

Yes. Unless you create "geometry" column by yourself, you cannot assume that it is there and called as such. gdf.geometry automatically picks the active geometry no matter its name.
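For illustration, a small example of that behavior (the column name "geom" here is just hypothetical):

```python
import geopandas as gpd
from shapely.geometry import Point

gdf = gpd.GeoDataFrame({"geom": [Point(0, 0), Point(1, 1)]}, geometry="geom")

# gdf["geometry"]  # would raise a KeyError: no column literally named "geometry"
gdf.geometry       # returns the active geometry column, whatever its name
print(gdf.geometry.unary_union.centroid.x)
```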

@knaaptime (Member, Author):

Yep, as I mentioned when @martinfleis raised this a bit ago, I'm confident this pattern appears elsewhere too, so I still need to find and resolve all the instances.

Moving forward, what I'd actually prefer is to pin to a version of geopandas that implements this function natively and remove this implementation altogether.


# project the GeoDataFrame to the UTM CRS
projected_gdf = gdf.to_crs(utm_crs)

return projected_gdf


def hexify(source, resolution=6, clip=False):
Member:

I'd suggest renaming it to h3fy. It isn't just any (or a random) hexagonal grid, it's the H3 implementation. This would also open the door for sister methods in the future, like s2fy or OSfy (for the Ordnance Survey standard grid, for example).

@knaaptime (Member, Author):

Cool, I figured there'd be some suggestions on the name.

source : geopandas.GeoDataFrame
    GeoDataFrame to transform into a hexagonal grid
resolution : int, optional
    resolution of output h3 hexgrid.
Member:

Document default value

@knaaptime (Member, Author):

Cool. I also wasn't sure whether 6 was the ideal resolution, but I'll go ahead and leave it as-is.

----------
source : geopandas.GeoDataFrame
    GeoDataFrame to transform into a hexagonal grid
resolution : int, optional
@darribas (Member) commented on Dec 26, 2020:

I think this is fine for now, so not a comment to implement before approval. Going forward, it'd be great to also allow str options, something like:

  • auto: some "magic" that gives the "correct" one, if we can come up with some heuristics
  • balanced (or some alternative name): for the resolution that'll give you the closest number of hexagons to the number of geometries passed in source
  • compacted: one that allocates to an H3 Polyfill compacted version
  • ...

@knaaptime (Member, Author):

Yeah, these are great ideas. I'd like to merge this version first, then raise an issue for these options as an enhancement.

@knaaptime (Member, Author):

One option might be to roughly match the number of input geoms; another might be to match their size, e.g. by taking the best match between the mean area of the input geoms and the hexagon sizes listed at https://h3geo.org/docs/core-library/restable
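A rough sketch of that size-matching idea (hypothetical, not code from this PR), assuming the h3-py 3.x h3.hex_area helper and a geopandas version that provides estimate_utm_crs:

```python
import h3

def match_resolution_by_area(source):
    """Pick the H3 resolution whose average cell area is closest to the
    mean area of the source geometries (a hypothetical heuristic)."""
    # measure source areas in km^2 using a projected (UTM) CRS
    mean_area_km2 = source.to_crs(source.estimate_utm_crs()).area.mean() / 1e6
    # H3 resolutions run from 0 (coarsest) to 15 (finest)
    return min(
        range(16),
        key=lambda res: abs(h3.hex_area(res, unit="km^2") - mean_area_km2),
    )
```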

clip : bool, optional
    if True, hexagons are clipped to the precise boundary of the source gdf.
    Otherwise, hexagons along the boundary will be left intact.
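For illustration, the clipping step could boil down to a single geopandas call; a hypothetical sketch, assuming the hexagon and source frames share a CRS:

```python
import geopandas as gpd

def clip_to_source(hex_gdf, source):
    # trim hexagons along the edge so the grid matches the source footprint exactly
    return gpd.clip(hex_gdf, source)
```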

Member:

I'd add an option like return_geoms or similar that controls whether the actual polygons are returned. I'm not sure; we probably have to create them to do the apportioning. But given that tools like Kepler.gl can ingest H3 ids and render them, and that some folks might use this for fusing data, if we can find ways of apportioning without creating the geometries we might be more efficient, and avoid creating/returning a bunch of hex polygons that'll be discarded as soon as the table is created. Food for thought.

name="hex_id",
)

polys = hexids.apply(
Member:

This could potentially be processed in parallel for faster execution on modern hardware: chunk hexids, send a chunk to each core (similar to how we're thinking about it in #112), and collect the results later. It should scale almost linearly.

@knaaptime (Member, Author):

good idea. I wonder if we could also outsource some of that logic to something like modin?

Member:

> I wonder if we could also outsource some of that logic to something

That should ideally be dask-geopandas, which is under development.

@knaaptime (Member, Author):

+1, though in this case the operation is just an apply, which shouldn't need dask-geopandas (and as far as I understand, dask doesn't do apply?)

So in a perfect world, maybe dask-geopandas could end up as an additional backend for modin or something, depending on the operation?

Member:

Dask does row-wise apply very efficiently. It cannot do column-wise apply, but that is not an issue here.

It is true that we can have hexids as a dask Series, and then the apply remains the same; below we'd just call compute() before creating the GeoDataFrame. That should be relatively straightforward to implement with a vanilla dask.DataFrame even now.
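A hypothetical sketch of that pattern (hexids, hex_to_polygon, and the partition count are illustrative names, not code from this PR):

```python
import pandas as pd
import dask.dataframe as dd

def parallel_hex_to_polys(hexids: pd.Series, hex_to_polygon, npartitions=4):
    # split the pandas Series of H3 ids into partitions, one per worker
    dask_hexids = dd.from_pandas(hexids, npartitions=npartitions)
    # same row-wise apply as before; `meta` tells dask what the output looks like
    return dask_hexids.apply(hex_to_polygon, meta=("hex_id", object)).compute()
```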

@knaaptime (Member, Author):

Sweet. Would the idea be to check whether dask is installed and use the multicore handling with a fallback if not, or to make dask a full dependency?

Member:

I would add a keyword to control that, similar to this implementation we have in momepy: https://github.com/martinfleis/momepy/blob/19e3a1a6eb9577a2c160710bd139034a459a4f97/momepy/elements.py#L433.

There may be cases when single-core is faster than dask (typically small areas) due to the scheduler overhead.

@knaaptime (Member, Author):

Cool, thanks for the tip. I'm going to go ahead and merge this implementation so we all have it available, then we can hack on the multicore enhancement.

@knaaptime (Member, Author) commented:
Sweet.

> Thinking through it, one of the values of putting this in tobler is not stopping at generating the hex geometries but having methods to take your data into H3 (or other indexing systems like S2 down the line, as I suggest in the review). You have a bunch of data on different shapefiles with different geometries for the same region, and tobler helps you "align" them on a standard system like H3.

Definitely agree, and I had the same idea, particularly when toying around with this notebook in binder. I think the way rasters are handled is probably a lot faster than the way we're doing it now, so happy to keep exploring this thread.

@martinfleis (Member) left a comment:

Minor code suggestion re CRS handling. Requires newer pyproj but I think that the minimal version geopandas requires is enough.

Two review threads on tobler/util/util.py (outdated, resolved).
@knaaptime (Member, Author) commented:
Anybody got any idea why the tests aren't passing? I can see h3 is in the environment, but I'm still getting an import error?

@knaaptime (Member, Author) commented:
Got it: on conda, h3 is the C library and h3-py is the package of Python bindings (whereas the PyPI package is just h3).

@knaaptime merged commit 88ca482 into pysal:master on Dec 27, 2020
@darribas (Member) commented:
> Sweet.
>
> > Thinking through it, one of the values of putting this in tobler is not stopping at generating the hex geometries but having methods to take your data into H3 (or other indexing systems like S2 down the line, as I suggest in the review). You have a bunch of data on different shapefiles with different geometries for the same region, and tobler helps you "align" them on a standard system like H3.
>
> Definitely agree, and I had the same idea, particularly when toying around with this notebook in binder. I think the way rasters are handled is probably a lot faster than the way we're doing it now, so happy to keep exploring this thread.

Let's move this to a separate issue to figure out ways forward. That notebook seems to use rasters as DataFrame objects? In any case, I think @martinfleis and I may have ideas. I'll leave it to you to open the issue, @knaaptime, as I think you'll have a better sense of what level would be best to discuss it at ("transferring" data to H3? a meta method that "transfers" data from one geography to another and defaults to H3 and other indexing systems? Something else?).

Labels: none · Projects: none · Linked issues: none · 4 participants