Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: quantize polygons/snap shapes to grid? #1727

Open
ljwolf opened this issue Dec 8, 2020 · 4 comments
Open

ENH: quantize polygons/snap shapes to grid? #1727

ljwolf opened this issue Dec 8, 2020 · 4 comments

Comments

@ljwolf
Copy link
Member

ljwolf commented Dec 8, 2020

Is your feature request related to a problem?

In spatial applications, "quantization" entails removing the significant digits from coordinates and removing the duplicated points. This is often an analogue to simplification: quantization makes it easy to find vertices to remove, and snaps the geometries to a grid.

Quantization (as I understand it) is functionally the same as ST_SnapToGrid in postgis.

Describe the solution you'd like

I'm thinking of a quantize method on a GeoSeries that mirrors the functionality of simplify

API breaking implications

There should be no API breakage here.

Describe alternatives you've considered

pygeos may be the reasonable place to implement the actual algorithm or wrap existing GEOS (xref discussion in #1724), since that's a natural analogue to . Regardless of where the implementation lives, it'd be good to expose it on the GeoSeries.

Additional context

An example implementation without pygeos.snap & without de-duplication:

  1. transforms the input geometries to a grid with minimum side length n_bins
  2. rounds the coordinates to the nearest integer
  3. re-transforms the data back to the original scale.

An implementation of this would be:

def digitize(geoms, n_bins=100, validate=True):
    n_bins = n_bins
    bounds = pygeos.total_bounds(geoms)
    mapscale = numpy.array([bounds[2]-bounds[0], 
                            bounds[3]-bounds[1]])
    aspect = numpy.divide(*mapscale)
    
    # constrain the resolution in the shortest dimension
    if aspect < 1: # if tall, constrain width
        digibounds = (0,0, n_bins, int(n_bins/aspect))
    else: # elif high or equidistant, constrain height
        digibounds = (0,0, int(n_bins*aspect), n_bins)
    digiscale = numpy.array([digibounds[2]-digibounds[0], 
                             digibounds[3]-digibounds[1]])
    
    def transformer(target_scale=digiscale, 
                    target_location=0,
                    source_scale=mapscale, 
                    source_location=bounds[0:2]):
        return lambda geom: (geom - source_location)/source_scale * target_scale + target_location
    
    forward_transform = transformer()
    reverse_transform = transformer(target_scale=mapscale, 
                                    target_location = bounds[0:2], 
                                    source_scale=digiscale,
                                    source_location=0)
    
    gridded = pygeos.apply(pygeos.apply(geoms, forward_transform),
                           numpy.round)
    digitized = pygeos.apply(gridded, reverse_transform)
    return pygeos.make_valid(digitized) if validate else digitized

which yields results like this:

import libpysal, geopandas, pygeos, numpy
df = geopandas.read_file(libpysal.examples.get_path('NAT.shp'))

f,ax = plt.subplots(2,2,figsize=(10,10))
for i in range(4):
    df.assign(gridded=digitized)\
           .set_geometry('gridded').plot(facecolor='snow',
                                         edgecolor='k', 
                                         linewidth=.25, ax=ax.flatten()[i])
    df.plot(ax=ax.flatten()[i], facecolor='none', edgecolor='teal', linewidth=.25)
    ax.flatten()[i].axis('off')
ax[0,1].axis([-92, -89, 32, 35])
ax[0,1].set_title("MS-LA Border")
ax[1,0].axis([-90, -87, 40, 43])
ax[1,0].set_title("Chicago, IL")
ax[1,1].axis([-78, -75, 38, 40])
ax[1,1].set_title("Washington, DC")

quantize

@brendan-ward
Copy link
Member

I think the set_precision function once available in pygeos (>= 0.9) could definitely help with this.

As I understand it, the precision defines a grid size used by GEOS, from the origin (0,0) of the planar projection. It can be a floating-point or integer value, and will cause rounding of coordinates (and subsequent geometry collapse). It is very fast - but making geometries valid again after can be slow.

Given a df similar to above (US counties, in CONUS Albers projection, reduced to an arbitrary precision that came sort of close to above; sorry didn't have time to directly replicate the above)

>>> precision = 10000
>>> df.geometry = pygeos.make_valid(pygeos.set_precision(df.geometry.values.data, precision))

(Note: real implementation would need to remove empty geometries, etc)

similar area of MS - LA Border
image

The missing part compared to above is figuring out the appropriate precision based on the target number of bins, but that seems primarily a scaling problem.

In terms of API, I think we could at least expose set_precision in GeoPandas. This would be easy where we can leverage pygeos; the absence of it in Shapely at the moment is a bit more of a problem (perhaps it can be conditional and only available with pygeos backend, until Shapely 2.0?).

A companion quantize function could then take an input grid size (number of bins on smallest dimension) and calculate the appropriate precision, then call set_precision.

@martinfleis
Copy link
Member

On top of this it would be nice to also expose topoquantize from topojson (https://mattijn.github.io/topojson/example/settings-tuning.html#topoquantize) alongside topological simplification we discussed in #1387.

@ljwolf
Copy link
Member Author

ljwolf commented Dec 8, 2020

Great if set_precision can do this (which it looks very similar)! We'd need to figure out the relationship between the two.

On topoquantize, the quantize I've written above is exactly their strategy but using pygeos, rather than operating directly on point arrays. I'd hoped that'd be more performant?

Both rescale the geometries to a grid, round coordinates to the nearest integer, and then re-transform the coordinates back to the original space.

I haven't yet dug into dropping adjacent points, but that should be pretty immediate from here.

My concern with topojson was mattijn/topojson#110, which I have yet to reproduce successfully. I do know that I haven't seen this behavior as I have implemented these methods in pygeos?
topoquantize_and_quantize

@martinfleis
Copy link
Member

On topoquantize, the quantize I've written above is exactly their strategy but using pygeos, rather than operating directly on point arrays.

It is not. The point of topoquantize is that you first generate topology and then quantize edges. That way you ensure that topology is preserved after quantize. There may be cases when this gets broken (probably not many though). It is the same as simplify vs toposimplify. But I see that is is currently a bit buggy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants