ENH: quantize polygons/snap shapes to grid? #1727

ljwolf · 2020-12-08T00:11:05Z

Is your feature request related to a problem?

In spatial applications, "quantization" entails removing the significant digits from coordinates and removing the duplicated points. This is often an analogue to simplification: quantization makes it easy to find vertices to remove, and snaps the geometries to a grid.

Quantization (as I understand it) is functionally the same as ST_SnapToGrid in postgis.

Describe the solution you'd like

I'm thinking of a quantize method on a GeoSeries that mirrors the functionality of simplify

API breaking implications

There should be no API breakage here.

Describe alternatives you've considered

pygeos may be the reasonable place to implement the actual algorithm or wrap existing GEOS (xref discussion in #1724), since that's a natural analogue to . Regardless of where the implementation lives, it'd be good to expose it on the GeoSeries.

Additional context

An example implementation without pygeos.snap & without de-duplication:

transforms the input geometries to a grid with minimum side length n_bins
rounds the coordinates to the nearest integer
re-transforms the data back to the original scale.

An implementation of this would be:

def digitize(geoms, n_bins=100, validate=True):
    n_bins = n_bins
    bounds = pygeos.total_bounds(geoms)
    mapscale = numpy.array([bounds[2]-bounds[0], 
                            bounds[3]-bounds[1]])
    aspect = numpy.divide(*mapscale)
    
    # constrain the resolution in the shortest dimension
    if aspect < 1: # if tall, constrain width
        digibounds = (0,0, n_bins, int(n_bins/aspect))
    else: # elif high or equidistant, constrain height
        digibounds = (0,0, int(n_bins*aspect), n_bins)
    digiscale = numpy.array([digibounds[2]-digibounds[0], 
                             digibounds[3]-digibounds[1]])
    
    def transformer(target_scale=digiscale, 
                    target_location=0,
                    source_scale=mapscale, 
                    source_location=bounds[0:2]):
        return lambda geom: (geom - source_location)/source_scale * target_scale + target_location
    
    forward_transform = transformer()
    reverse_transform = transformer(target_scale=mapscale, 
                                    target_location = bounds[0:2], 
                                    source_scale=digiscale,
                                    source_location=0)
    
    gridded = pygeos.apply(pygeos.apply(geoms, forward_transform),
                           numpy.round)
    digitized = pygeos.apply(gridded, reverse_transform)
    return pygeos.make_valid(digitized) if validate else digitized

which yields results like this:

import libpysal, geopandas, pygeos, numpy
df = geopandas.read_file(libpysal.examples.get_path('NAT.shp'))

f,ax = plt.subplots(2,2,figsize=(10,10))
for i in range(4):
    df.assign(gridded=digitized)\
           .set_geometry('gridded').plot(facecolor='snow',
                                         edgecolor='k', 
                                         linewidth=.25, ax=ax.flatten()[i])
    df.plot(ax=ax.flatten()[i], facecolor='none', edgecolor='teal', linewidth=.25)
    ax.flatten()[i].axis('off')
ax[0,1].axis([-92, -89, 32, 35])
ax[0,1].set_title("MS-LA Border")
ax[1,0].axis([-90, -87, 40, 43])
ax[1,0].set_title("Chicago, IL")
ax[1,1].axis([-78, -75, 38, 40])
ax[1,1].set_title("Washington, DC")

The text was updated successfully, but these errors were encountered:

brendan-ward · 2020-12-08T02:31:56Z

I think the set_precision function once available in pygeos (>= 0.9) could definitely help with this.

As I understand it, the precision defines a grid size used by GEOS, from the origin (0,0) of the planar projection. It can be a floating-point or integer value, and will cause rounding of coordinates (and subsequent geometry collapse). It is very fast - but making geometries valid again after can be slow.

Given a df similar to above (US counties, in CONUS Albers projection, reduced to an arbitrary precision that came sort of close to above; sorry didn't have time to directly replicate the above)

>>> precision = 10000
>>> df.geometry = pygeos.make_valid(pygeos.set_precision(df.geometry.values.data, precision))

(Note: real implementation would need to remove empty geometries, etc)

similar area of MS - LA Border

The missing part compared to above is figuring out the appropriate precision based on the target number of bins, but that seems primarily a scaling problem.

In terms of API, I think we could at least expose set_precision in GeoPandas. This would be easy where we can leverage pygeos; the absence of it in Shapely at the moment is a bit more of a problem (perhaps it can be conditional and only available with pygeos backend, until Shapely 2.0?).

A companion quantize function could then take an input grid size (number of bins on smallest dimension) and calculate the appropriate precision, then call set_precision.

martinfleis · 2020-12-08T09:39:04Z

On top of this it would be nice to also expose topoquantize from topojson (https://mattijn.github.io/topojson/example/settings-tuning.html#topoquantize) alongside topological simplification we discussed in #1387.

ljwolf · 2020-12-08T14:04:33Z

Great if set_precision can do this (which it looks very similar)! We'd need to figure out the relationship between the two.

On topoquantize, the quantize I've written above is exactly their strategy but using pygeos, rather than operating directly on point arrays. I'd hoped that'd be more performant?

Both rescale the geometries to a grid, round coordinates to the nearest integer, and then re-transform the coordinates back to the original space.

I haven't yet dug into dropping adjacent points, but that should be pretty immediate from here.

My concern with topojson was mattijn/topojson#110, which I have yet to reproduce successfully. I do know that I haven't seen this behavior as I have implemented these methods in pygeos?

martinfleis · 2020-12-08T16:48:19Z

On topoquantize, the quantize I've written above is exactly their strategy but using pygeos, rather than operating directly on point arrays.

It is not. The point of topoquantize is that you first generate topology and then quantize edges. That way you ensure that topology is preserved after quantize. There may be cases when this gets broken (probably not many though). It is the same as simplify vs toposimplify. But I see that is is currently a bit buggy.

ljwolf added the enhancement label Dec 8, 2020

This was referenced Dec 12, 2020

Investigate the effects of low topoquantization epsilon values mattijn/topojson#110

Closed

Investigate speedier options for quantization mattijn/topojson#118

Closed

mattijn mentioned this issue Sep 8, 2022

enh: include features that are possible with shapely 2.0 mattijn/topojson#195

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: quantize polygons/snap shapes to grid? #1727

ENH: quantize polygons/snap shapes to grid? #1727

ljwolf commented Dec 8, 2020

brendan-ward commented Dec 8, 2020

martinfleis commented Dec 8, 2020

ljwolf commented Dec 8, 2020 •

edited

martinfleis commented Dec 8, 2020

ENH: quantize polygons/snap shapes to grid? #1727

ENH: quantize polygons/snap shapes to grid? #1727

Comments

ljwolf commented Dec 8, 2020

Is your feature request related to a problem?

Describe the solution you'd like

API breaking implications

Describe alternatives you've considered

Additional context

brendan-ward commented Dec 8, 2020

martinfleis commented Dec 8, 2020

ljwolf commented Dec 8, 2020 • edited

martinfleis commented Dec 8, 2020

ljwolf commented Dec 8, 2020 •

edited