'GIS-like' operations #38

Closed
carsonfarmer opened this issue Oct 16, 2013 · 5 comments

Comments

@carsonfarmer
Contributor

Currently, GeoPandas has several 'overlay' operations (such as intersect, difference, union, etc.) which, in their current implementation, perform one-to-one spatial overlays on the aligned GeoSeries. While this is useful (particularly if the indexes of the two GeoSeries have some useful meaning), many GIS users may find this behavior slightly confusing. Instead, a user may expect a one-to-many overlay between the calling GeoSeries and the input GeoSeries (e.g., for each geometry in the calling GeoSeries, 'difference' it with all 'other' geometries and return the resulting 'differenced' geometry). For this, a spatial index is important to avoid unnecessary overlay comparisons.

A potential API for this type of operation may be something like:

def difference(self, other, by_geom=True):
    """
    Return the set-theoretic difference of each geometry with
    *other*. This function is used to 'erase' parts of the
    input geometries that overlap the other geometries.

    Parameters
    ----------
    other : GeoSeries or BaseGeometry
        Geometry or geometries to subtract.
    by_geom : bool, default True
        Should the difference be applied by individual geometries,
        or across the whole GeoSeries? `by_geom=False` is the
        normal GIS-type operation between 'layers', whereas
        `by_geom=True` is equivalent to `geoms - other`.
    """
...

These types of operations would be greatly enhanced (enabled) by a comprehensive spatial index implementation. This would enable something like obj1.geo_align(obj2) to be applied before the overlay operation when by_geom=False.

@kjordahl
Member

This would be powerful. I might call the parameter something other than by_geom; perhaps something like all_geoms? We could implement this API immediately (albeit inefficiently) by using unary_union on the other geometry, couldn't we?
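To make the two behaviors concrete, here is a minimal sketch of the proposed signature combined with the union-first shortcut. It is a toy model, not GeoPandas code: geometries are replaced by frozensets of unit grid cells, a "series" is a plain dict keyed by index label, and `cells`, `union_all`, and this `difference` are hypothetical helpers written only for illustration. In real code, `unary_union` on the other geometries would play the role of `union_all`.

```python
# Toy stand-in for a geometry: a frozenset of (x, y) unit grid cells.
def cells(x0, y0, x1, y1):
    """Cells covered by the half-open box [x0, x1) x [y0, y1)."""
    return frozenset((x, y) for x in range(x0, x1) for y in range(y0, y1))


def union_all(geoms):
    """Toy analogue of a unary union: merge every geometry into one."""
    return frozenset().union(*geoms)


def difference(series, other, by_geom=True):
    """Hypothetical sketch of the proposed API on dicts of {index: cells}."""
    if by_geom:
        # One-to-one: pair geometries that share an index label.
        # (Simplification: labels missing from `other` are dropped.)
        return {k: g - other[k] for k, g in series.items() if k in other}
    # One-to-many: erase everything covered by any geometry in `other`.
    eraser = union_all(other.values())
    return {k: g - eraser for k, g in series.items()}


layer1 = {"a": cells(0, 0, 4, 4), "b": cells(4, 0, 8, 4)}
layer2 = {"a": cells(2, 0, 6, 2), "b": cells(6, 2, 8, 4)}

aligned = difference(layer1, layer2, by_geom=True)   # index-aligned pairs
erased = difference(layer1, layer2, by_geom=False)   # layer-style erase

for key in layer1:
    print(key, len(aligned[key]), len(erased[key]))
```

In this example geometry "b" keeps more cells under `by_geom=True` than under `by_geom=False`, because the aligned version never compares it with `layer2["a"]`, which also overlaps it.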

@carsonfarmer
Contributor Author

@kjordahl yes and no: The standard GIS 'layer-style' way of working is based on something like this:

for each geom in layer1:
    for each geom in layer2:
        difference layer1 geom by layer2 geom
        update layer1 geom

which is equivalent to using unary_union on the other geometry first. However, when it comes to things like intersect or union, things are slightly different. Now, additional geometries might be created, because parts of layer1 geom might intersect with several layer2 geoms, leading to a larger number of geoms in the output:

for each geom in layer1:
    for each geom in layer2:
        intersect layer1 geom with layer2 geom
        add intersect to output
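The two loops above can be sketched in runnable form to check both claims. As an assumption for illustration only, geometries are modeled as frozensets of unit grid cells rather than real polygons, and all function names here are hypothetical. The sketch compares the nested-loop difference with a union-first difference, and shows one input geometry yielding two intersection pieces.

```python
# Toy stand-in for a geometry: a frozenset of (x, y) unit grid cells.
def cells(x0, y0, x1, y1):
    """Cells covered by the half-open box [x0, x1) x [y0, y1)."""
    return frozenset((x, y) for x in range(x0, x1) for y in range(y0, y1))


def layer_difference(layer1, layer2):
    """Nested-loop 'layer-style' difference: erase each layer2 geom in turn."""
    out = []
    for g1 in layer1:
        for g2 in layer2:
            g1 = g1 - g2          # update the layer1 geom
        out.append(g1)
    return out


def union_first_difference(layer1, layer2):
    """Merge layer2 into a single eraser first, then subtract it once."""
    eraser = frozenset().union(*layer2)
    return [g1 - eraser for g1 in layer1]


def layer_intersection(layer1, layer2):
    """Nested-loop intersection: every non-empty piece is a separate output."""
    out = []
    for g1 in layer1:
        for g2 in layer2:
            piece = g1 & g2
            if piece:
                out.append(piece)
    return out


layer1 = [cells(0, 0, 6, 2)]                      # one long strip
layer2 = [cells(0, 0, 2, 2), cells(4, 0, 6, 2)]   # two blocks at its ends

same = layer_difference(layer1, layer2) == union_first_difference(layer1, layer2)
pieces = layer_intersection(layer1, layer2)
print(same, len(layer1), len(pieces))
```

For difference, the output has one geometry per input geometry either way. For intersection, the single strip produces two separate pieces, which is why the output can no longer be aligned one-to-one with the calling series.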

Having said all that, since we aren't necessarily working in the realm of layers with GeoPandas, perhaps we should drop this conceptual model and adopt an API that is equivalent to a unary_union on other first (with more efficient use of spatial indices down the line). That way instead of saying "intersect this layer with this other layer", we are saying "intersect each of these geometries with these other ones", which I think fits the pandas way of doing things better. Alternatively, if we wanted to stick with a layer concept, then we could always just return a geometry collection for things like intersect and union? The only problem with this is that fewer operations support this, and it would definitely make i/o more difficult for the user.
Long story short: yes, we can do this now if we are willing to do things differently from a GIS (which I think we should be willing to do). Also, having the control afforded by all_geoms means we should probably still be able to do the GIS 'layer approach' with some clever index alignment?

@carsonfarmer
Contributor Author

If we decide which way to go, I'll try to submit a pull request for some of this over the weekend.

@perrygeo
Contributor

@cfarmer Do the spatial join and overlay functionality cover everything in this issue? Or are there additional "GIS-like" operations/concepts that we should look at?

@perrygeo
Contributor

Closing for now. If something's missing from sjoin and overlay, let's bring up a new issue.


3 participants