Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hex and Square Grid generators #467

Closed
wants to merge 10 commits into from

Conversation

pramsey
Copy link
Member

@pramsey pramsey commented Aug 22, 2019

  • ST_HexagonGrid(size float8 DEFAULT 1.0, bounds geometry DEFAULT 'LINESTRING(0 0, 100 100)')
  • ST_SquareGrid(size float8 DEFAULT 1.0, bounds geometry DEFAULT 'LINESTRING(0 0, 100 100)')

Can be easily extended to triangles too, I'm quite pleased that the SRF machinery is shared between them. Documentation will take some work to explain how to generate nesting hierarchies, but it actually only requires doubling/halving the size parameter, as everything lines up at the origin neatly.

screenshot_285
screenshot_286

The single-valued functions really just come along for the ride, not sure how useful they will be, but it makes sense to be able to generate a single geometry at a particular cell location.

  • ST_Hexagon(size float8, origin geometry DEFAULT 'POINT(0 0)', cell_i int4 DEFAULT 0, cell_j int4 DEFAULT 0)
  • ST_Square(size float8, origin geometry DEFAULT 'POINT(0 0)', cell_i int4 DEFAULT 0, cell_j int4 DEFAULT 0)

@bowguy
Copy link
Contributor

bowguy commented Aug 23, 2019

This is a strange request but it would be nice to add a value about proximity or adjacency. For example, every tiling scheme can be colored so no two adjacent tiles have the same color. The minimum is 4 - https://en.m.wikipedia.org/wiki/Four_color_theorem
This would help in distribution of a database in a cluster configuration. You don't want adjacent data on the same node - otherwise a local analysis would only utilize that node and the others stay idle.

@strk
Copy link
Member

strk commented Aug 23, 2019

I don't see the usefulness of default arguments for the grid functions, would rather drop them.
The "bounds" parameter could be confusing: I guess the function will ensure the given extent will be covered, but that means very often that the real boundary will be larger, right ?

The single-cell version I don't like (what's i/j for the hexagon part?). I'm sure you can call the grid generator to produce single-cell anyway.

@Komzpa
Copy link
Member

Komzpa commented Aug 23, 2019

Would be good if the argument will just accept a geometry, and push out only the hexes/cells that ST_Intersects with it, instead of Grid. This would allow to use it both ways: to join to a table with index lookups, or just seq scan a table and output a coverage for each shape, and after that GROUP BY the hexagons.

@strk
Copy link
Member

strk commented Aug 23, 2019 via email

/* Ratio of hexagon edge length to height = 2*cos(pi/6) */
#define HXR 1.7320508075688774

static const double hex_x[] = {0.0, 1.0, 1.5, 1.0, 0.0, -0.5, 0.0};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels unsafe - hexagons may not collide on the boundary of jump to a next double exponent value, getting a small gap or overlap between them, if you try to get to it via subtraction from next or addition from previous. It's bulletproof to build a hexagon from repeated points from adjacent hexes though.

A good test is to ST_Union a worldful of hexagons.

static LWGEOM *
square(double origin_x, double origin_y, double E, int cell_i, int cell_j, uint32_t srid)
{
POINT4D pt;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
POINT4D pt;
POINT4D pt = {0,0,0,0};

This way no need to set it later.

-- Availability: 1.0.0
CREATE TYPE geometry_grid AS (
geom geometry,
i integer,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are i and j needed here? what common scenario breaks if it was just a geometry?

@dbaston
Copy link
Member

dbaston commented Aug 23, 2019

I don't see the usefulness of default arguments for the grid functions, would rather drop them.

Would be good if the argument will just accept a geometry, and push out only the hexes/cells that ST_Intersects with it

Agree with both of these comments. I think these functions will be a nice addition to core.

@Komzpa
Copy link
Member

Komzpa commented Aug 26, 2019

There should be an option or example how to make square grid coincide with tile boundaries.

static const double sqr_y[] = {0.0, 0.0, 1.0, 1.0, 0.0};

static LWGEOM *
square(double origin_x, double origin_y, double E, int cell_i, int cell_j, uint32_t srid)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some issue with the parameter naming:

  • origin_x and origin_y: They require looking at the code to know if the origin is used as top-left, bottom-right or whatever.
  • E: interesting name for the size, I have no idea where it might come from.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E = edge, in my hand-written notes on the math of hexagon generating, so.
the problem isn't so much the word "origin" as the "are my cell coordinates cartesian or image coordinates" which is really at the root of it. right now they are cartesian, so "origin" means "cartesian origin", however... the comment about wanting to generate equivalent Google grids raises the idea of having them be image coordinates instead. yuck.

@Algunenano
Copy link
Member

Apart from the things that are missing (tests and documentation) and without having reviewed the code too much, I'm wondering a similar thing as @Komzpa mentioned: How would I use this to intersect with other geometries?

Let's say I have a table with N points and I want to group them by hexagons of a certain size.

  • The brute force method would be to first generate all the hexagons of that size (for the whole world), and then count the intersections of the 2 tables. This is highly inefficient and might require lots of memory.
  • An option that might help with that is to first calculate the extent of the first table /query, then generate the hexagons that cover that extent and finally intersect the 2 layers. The worst case is still as inefficient as the brute force method.

On the other hand, if instead I had a function that given a point / geometry and the hexagon grid properties it returns the x/y hexagon (or list) that intersect with it, then I could first group by the simpler x/y values and finally generate the polygons based on x/y. I'm not sure if this approach could be better, but it feels like it.

postgis/Makefile.in Outdated Show resolved Hide resolved
postgis/lwgeom_hexgrid.c Outdated Show resolved Hide resolved
postgis/lwgeom_hexgrid.c Outdated Show resolved Hide resolved
postgis/lwgeom_hexgrid.c Outdated Show resolved Hide resolved
int cell_height;
GBOX bounds;
double size;
uint32_t srid;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I remember correctly, srids can be negative and use a int32_t.

postgis/lwgeom_hexgrid.c Outdated Show resolved Hide resolved
@pramsey
Copy link
Member Author

pramsey commented Sep 3, 2019

So, thinking about the expanded functionality requests...

  • given a shape return the set of grid coordinates covered by that shape
  • given a shape return the hex/squares that cover that shape (and their grid coordinates)

The arbitrary shape thing, I think works OK, with a few extras

  • we then need, in addition to the shape to fill with hex/squares, the origin from which to calculate the grid, since the shape doesn't necessarily define the grid domain anymore

We also have some real unpleasant fun with supporting other shape types

  • for polygons, we need some notion of what hexagons/squares a polygons can be said to "cover", which for consistency in the polygon coverage case I would tend to lean towards saying "contains center" means "covers", so that a coverage will tend to have hexes assigned to only one covering polygon at a time (that implies somewhat more complex "containment" code, to handle the "center is on a polygon edge" case, as we neither want those double-counted nor zero-counted)
  • for points, the situation inverts, and we "just" return the one hex that literally contains the point (the PDAL hexer code might have some efficient math for this to copy)
  • for lines, we end up with some real fun... the idea that generating grid coordinates will be "more efficient" starts to get worrisome... this begins to look like a rasterizer

I feel this spinning out of control a little bit. Yes, generating hexes and joining them can be inefficient, but (a) it works out of the box and (b) Carto has been doing this for almost a decade and people just seem to suck up the pain?

Maybe the generic shape case can be deferred?

@Komzpa
Copy link
Member

Komzpa commented Sep 4, 2019

The case of polygon coverage, the notion should be "hex intersects polygon". This lets you have all the hexes, and normalize values by some count(*) over (partition by geom). If you want to count them one-hex-per-polygon, it's trivial to run this binning over ST_Centroid(geom) instead of geom.

The optimal performance doesn't have to be there right away, but better to catch better API now.

@pramsey
Copy link
Member Author

pramsey commented Sep 4, 2019

If the poly/hex case is a full intersection then the optimization power of being able to quickly return a grid coordinate goes away: I'd be doing a full geometry intersects anyways, which is just what a user would be doing in materializing the grid and then doing an intersects on it. It's possible to do point-in-hexgrid with a little math (looked it up in the pdal hexgrid implementation) but full intersection is full intersection.

I'm still kind of intrigued by the "build a hex cover for this shape" idea, but it does imply an extra origin parameter. Optional? Required? Default origin of hex cover to 0,0 (all hex covers will align magically), or to the LL of the input shape?

@Algunenano
Copy link
Member

What about aggregating points / geometries on a grid (hex or square)?

As far as I can tell, the proposed API is extremely awful since you need to generate the whole tile grid (which can be billions of hex/squares depending on their size and extent) to then join the 2 tables.
For extremely dense inputs you can optimize this using the input table extent, but for sparse datasets (single point in Canada, single point in Australia) this method does not scale.
IIRC, CARTO's implementation has this same issue and we've discussed some ideas already: CartoDB/cartodb-postgresql#314

Do you see a better approach to do this aggregation with this API?

@pramsey
Copy link
Member Author

pramsey commented Sep 4, 2019

Well, it seems like being able do the arbitrary geometry input is the answer, then you just group by your i, j output coordinates, and ignore the generated geometry. It requires an origin point or an assumed 0,0 origin, but would seem to work...

I guess I should start on the arbitrary geometry form of this, seems to be desired, if quite complex. The question of what rules to use for (a) polygons (hex intersects polygon, or hex centre intersects polygon) and (b) lines (hex... ? line)

ST_HexagonGrid(float8 size, geometry shape_to_cover, [geometry lower_left_origin]) returns setof geometry_grid

@Algunenano
Copy link
Member

The question of what rules to use for (a) polygons (hex intersects polygon, or hex centre intersects polygon) and (b) lines (hex... ? line)

When talking about API, I think it makes sense to return all grid geometries that intersect with shape_to_cover so that the end user can decide what to do with them. For example, I might want to then calculate what percentage of the area of shape_to_cover is inside each hex to aggregate some value proportionally.
As Darafei said it's trivial to use the centroid / first point / any point of a geometry if you just want one.

ST_HexagonGrid(float8 size, geometry shape_to_cover, [geometry lower_left_origin]) returns setof geometry_grid

I'd say yes to the optional origin, but can you clarify what's lower_left of an hexagon? Is it the bottom left point of the hex shape (only valid if the hex has parallel lines to the equator, which it appear to do based on the example images)?

@pramsey
Copy link
Member Author

pramsey commented Sep 4, 2019

For the purpose of constructing the hex grid the lower left is the left-most point of the bottom edge of the hexagon (hexagons oriented with top/bottom lines horizontal).

The centroid trick doesn't work, BTW, because you don't have the hexagons to run centroid on until the calculation is finished... I mean, you can post-process them, if that's what you're talking about?

@Algunenano
Copy link
Member

The centroid trick doesn't work, BTW, because you don't have the hexagons to run centroid on until the calculation is finished... I mean, you can post-process them, if that's what you're talking about?

I meant that if you want just one grid geometry per polygon you can take the polygon's centroid / oint on surface / "whatever you consider proper" and use that as input instead of the whole polygon. Same idea for lines.

@Algunenano
Copy link
Member

Algunenano commented Sep 4, 2019

Another thing: Seeing Uber's H3 I've noticed that the orientation of the hexagons is different depending on the zoom level. Should we consider using a geometry (which includes size, origin and orientation) instead of size + origin coordinates? How complex would that API be?

Also, if the function accepts a geometry (which could be either a square or a hex) we would only need a single function for both, right?

@pramsey
Copy link
Member Author

pramsey commented Sep 4, 2019

Re poly/hex I mean the opposite: I want N hexes per polygon, but I don't want any one hex to fall in more than one polygon.

@pramsey
Copy link
Member Author

pramsey commented Sep 4, 2019

Re altering orientations, I feel like that's not really a needed knob.

@dr-jts
Copy link
Contributor

dr-jts commented Sep 4, 2019

+1 for a way of getting the cell indices which cover a geometry. Actually ++1 for handling points, since that's easy.

Lines and polygons are of course harder. Seems like that is effect building a rasterizer. But at least rasterization can be done more efficiently than an external table intersection. I.e. for polygons can compute the set of cells intersected by the border, which then determines the "scanlines" of cells which intersect the interior.

@strk strk closed this Oct 27, 2019
@strk
Copy link
Member

strk commented Oct 28, 2019

Please rebase to master branch

@strk strk reopened this Oct 28, 2019
@strk strk changed the base branch from svn-trunk to master October 28, 2019 06:55
@pramsey pramsey closed this Oct 28, 2019
@pramsey pramsey deleted the svn-trunk-hexgrid branch October 28, 2019 16:42
@pramsey pramsey restored the svn-trunk-hexgrid branch October 28, 2019 16:51
@pramsey pramsey reopened this Oct 28, 2019
@robe2
Copy link
Member

robe2 commented Nov 8, 2019

are we going anywhere with this or are you all just going to chat about it forever?

@pramsey
Copy link
Member Author

pramsey commented Nov 8, 2019

People requested lots of changes and blah blah blah, so I can't just flat commit it, and also I don't have time right now to address comments, so just cool your jets, it'll happen.

@robe2
Copy link
Member

robe2 commented Nov 8, 2019

@pramsey I'm just doing my part of being the human stalebot pointer outer :)

@nyurik
Copy link
Contributor

nyurik commented Nov 14, 2019

@pramsey is your hex algo the same as created by Uber's H3 (code)? To my knowledge, that's the most well known implementation, but I have also seen geohex.net (java code) and I might have seen another triangle-based one somewhere I can't find at the moment.

Bounds is no longer the domain of the grid, instead the grid always
starts at origin 0,0 in whatever SRS the bounds are. Bounds is the
rectangle-to-be-filled with shapes, and only those shapes needed to
fill the bounds are instantiated, along with their cell coordinates.
@pramsey
Copy link
Member Author

pramsey commented Jan 16, 2020

No, this isn't H3. It's a simple, configurable planar tiling. H3 is a set tiling of the globe, with fixed keys. Good for our buddies with key/value stores.

@nyurik
Copy link
Contributor

nyurik commented Jan 16, 2020

No, this isn't H3. It's a simple, configurable planar tiling. H3 is a set tiling of the globe, with fixed keys. Good for our buddies with key/value stores.

:) Would it make sense to use a globe-focused hexing schema in PostGIS? Or are there benefits of using a planar one? Triangle-grid sounds interesting too btw, but triangles are a bit easier to wrap around the globe.

@pramsey
Copy link
Member Author

pramsey commented Jan 16, 2020

As a proof-of-utility, try the existing extension https://github.com/bytesandbrains/h3-pg and see if it's useful. It has casts to postgis geometry, so it should be useful in and of itself.

@pramsey
Copy link
Member Author

pramsey commented Jan 16, 2020

Merged to master

@pramsey pramsey closed this Jan 16, 2020
Copy link
Member

@Algunenano Algunenano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments. I guess that you decided to rush and merge this to avoid it stagnating again.

*
**********************************************************************
*
* Copyright 2018 Paul Ramsey <pramsey@cleverelephant.ca>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2020?

static const double hex_y[] = { 0.0, -1*H, -1*H, 0.0, H, H, 0.0};

static LWGEOM *
hexagon(double origin_x, double origin_y, double size, int cell_i, int cell_j, uint32_t srid)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

srid should be int32_t since it can be negative

/* ********* ********* ********* ********* ********* ********* ********* ********* */

static LWGEOM *
square(double origin_x, double origin_y, double size, int cell_i, int cell_j, uint32_t srid)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same with the srid type


/* ********* ********* ********* ********* ********* ********* ********* ********* */

typedef enum
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd better to move the internal function to liblwgeom and leave only the PG implementation here, but I don't have a strong opinion on it.

{
state->done = true;
}
return !state->done;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return value is never used, so I guess it isn't necessary.

switch (state->cell_shape)
{
case SHAPE_HEXAGON:
lwgeom = hexagon(0.0, 0.0, state->size, state->i, state->j, state->srid);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be worth it to support setting the origin also in the ST_*Grid functions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I contemplated, but it didn't seem necessary so I leave as a future option if someone requires it.


/* stuff done on every call of the function */
funcctx = SRF_PERCALL_SETUP();
newcontext = funcctx->multi_call_memory_ctx;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I'm missing something but I don't know why this assignment is necessary if newcontext isn't used after it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it might be copied from LWGEOM_dumppoints which also does a similar dead store.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, that's true

_COST_MEDIUM;

-- Availability: 3.0.0
CREATE OR REPLACE FUNCTION ST_HexagonGrid(size float8, bounds geometry, OUT geom geometry, OUT i integer, OUT j integer)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the OUT parameters in ST_HexagonGrid and ST_SquareGrid needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's my new trick to avoid having to declare "utility relations" just to support tuple-valued outputs. Works like a hot damn: declare output as anonymous record but use OUT parameters to provide name and types for the record members. If we want, we could excise all these old friends like geometry_dump valid_detail and such.

@Algunenano
Copy link
Member

Also, NEWs are missing.

@pramsey
Copy link
Member Author

pramsey commented Jan 17, 2020

See #538

@pramsey pramsey deleted the svn-trunk-hexgrid branch May 7, 2020 22:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
9 participants