Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Functions to Clip Point, Line and Polygon Data Like Those that exist in other spatial GIS tools #821

Closed
lwasser opened this issue Sep 14, 2018 · 35 comments · Fixed by #1128
Milestone

Comments

@lwasser
Copy link
Contributor

lwasser commented Sep 14, 2018

Hi geopandas dev team:

The Issue / Question That I have

I am trying to come up with a consistent way to "clip" data that is in line with a typical GIS (think arcgis or qgis) workflow as follows:

  • Clip takes a point, line or polygon feature to be clipped and a boundary polygon as the clipping extent
  • Clip outputs a point, line or polygon feature that has ONLY features contained in the boundary polygon. Each feature if it's a line or poly is thus "CLIPPED" to that extent if that is required.
  • Clip maintains the attributes of the clipped point, line or polygon features

On the back end - clipping points, lines or polygons is actually quite different. On the front end however - to a user - it is the "same" input and output.
input: some (shapefile) i want to clip to an extent
output: some clipped shapefile

What I tried initially - overlay()

Initially i tried to use overlay() with the how="intersection" argument. however that approach only seems to support polygon features. Did I do something wrong? I spent some time on this.

What I did / Tried

I am trying to create a simpler way for my students to "clip" a point, line or polygon layer using a boundary polygon object that exposes less of the back end differences between these object types when processed.

Below i've created a set of functions to make up a parent function that clips points, lines or polygons. I wondered if you could have a look at my code to help me determine if my approach is the most efficient one. I haven't built any tests -- yet but will.

Question 2 -- If applicable would this fit in geopandas? Or is there a better approach?

My next question if you don't mind and if i am able to find an approach that fits the project -- is if the geopandas project is open to such a function housed in geopandas? It is a task I and my colleagues do often!

Any feedback is appreciated. I do still owe you another PR for more documentation surrounding dissolve as well.

The Functions I Wrote

# Create function to clip point data using geopandas


def clip_points(shp, clip_obj):
    '''
    Docs Here
    '''

    poly = clip_obj.geometry.unary_union
    return(shp[shp.geometry.intersects(poly)])

# Create function to clip line and polygon data using geopandas


def clip_line_poly(shp, clip_obj):
    '''
    docs
    '''

    # Create a single polygon object for clipping
    poly = clip_obj.geometry.unary_union
    spatial_index = shp.sindex

    # Create a box for the initial intersection
    bbox = poly.bounds
    # Get a list of id's for each road line that overlaps the bounding box and subset the data to just those lines
    sidx = list(spatial_index.intersection(bbox))
    shp_sub = shp[shp.index.isin(sidx)]

    # Clip the data - with these data
    ne_roads_sub['geometry'] = ne_roads_sub.intersection(poly)

    # Return the clipped layer with no null geometry values
    return(ne_roads_sub.geometry.notnull())

## I just updated the above as I realized I could do this with one less line by updating the geometry col

# Final clip function that handles points, lines and polygons


def clip_shp(shp, clip_obj):
    '''
    '''
    if shp["geometry"][0].type == "Point":
        return(clip_points(shp, clip_obj))
    else:
        return(clip_line_poly(shp, clip_obj))
@lwasser lwasser changed the title Functions to Clip Data Functions to Clip Point, Line and Polygon Data Like Those that exist in other spatial GIS tools Sep 14, 2018
@jdmcbr
Copy link
Member

jdmcbr commented Sep 14, 2018

@lwasser I've not run the above code, so maybe I'm missing something, but it isn't clear where this differs from performing an overlay with how="intersection".

@lwasser
Copy link
Contributor Author

lwasser commented Sep 14, 2018

Hi @jdmcbr I added some clarification above as well. I should have mentioned this too. I spent some time looking at overlay. It appears however that overlay only supports intersecting polygons. And yet i want to be able to do this with points and lines as well and for the implementation (on the user end) of a "clip" operation to be uniform. Please let me know if I missed something!

If i implemented overlay incorrectly please do let me know. I looked through the docs and code and it seems it wants polygons only.

@waldnerf
Copy link

Hey @lwasser
Thanks for sharing, I've had the same issue.
I tried running your clip_line_poly function but ne_roads_sub is not defined so it does not run...
Could you update it?
Cheers
Franz

@lwasser
Copy link
Contributor Author

lwasser commented Aug 19, 2019

hey @waldnerf i ended up including the code to clip here in earthpy:

https://github.com/earthlab/earthpy/blob/master/earthpy/clip.py
vignette example in the docs:
https://earthpy.readthedocs.io/en/latest/gallery_vignettes/plot_clip.html#sphx-glr-gallery-vignettes-plot-clip-py

it handles points, lines and polygons and should handle multi features too.
i think we should close this issue as well to get it off of the geopandas maintainers plates so i will do so but let me know if that implementation doesn't make sense to you!! cheers!!

@lwasser lwasser closed this as completed Aug 19, 2019
@martinfleis
Copy link
Member

I will reopen this issue as it is something I feel geopandas should support. Preferably as an enhancement of overlay to work on all types of geometries.

@martinfleis martinfleis reopened this Aug 19, 2019
@lwasser
Copy link
Contributor Author

lwasser commented Aug 19, 2019

that would be great @martinfleis !! if you guys implement it please let me know so i can remove it from earthpy !! then i don't have to support it! i put it over there as we needed it in our classes and it hadn't been addressed. feel free to have a look at the earthpy code. we tested out a few different approaches and found that an rtree approach was the fastest ... however you may have other approaches. clip slows down a lot with large datasets and would probably be optimal in parallel if that were available.

the biggest issue coming from a GIS / User perspective that i see if that overlay only works on polygons. And yet most GIS type users (people working with spatial data) will want to clip not only polygons but also lines and points. Normally in gis tools clip is not specific to the geometry type - it works on any geometry type. that is what a GIS user might expect. from a developer perspective however, clip does have to be managed different for lines and points sometimes.. and then ofcourse multi features...

@martinfleis
Copy link
Member

@lwasser check this branch https://github.com/martinfleis/geopandas/tree/overlay_geoms
I have removed all blocks requiring Polygons to be passed. So it takes LineStrings, Points.. The main issue blocking it so far is use of .buffer(0) to fix geometries at some places, which makes empty geometries on Linestrings and Points. But I don't think we actually need that, or we can apply it only on (Multi)Polygons. Easy so far.

I'll play with it a bit more to figure what could be the issue. You are welcome to do so as well.

@jorisvandenbossche
Copy link
Member

I think there are two separate issues here:

  • overlay can definitely be improved to work with other geometries than polygons. See also the last bullet point in Overview issue: overlay operation #706, and I seem to remember we were recently talking about this (maybe on gitter?).

  • Apart from overlay, I think it is still valuable to have a separate clip function. It is conceptually simpler as a full overlay (you "just" want to clip the left geodataframe by the right polygon, and not "combine" it with the right geodataframe). See also Clipping example for docs #956 and Fast spatial filtering #921.
    @lwasser I think the code (and example) you show above from earthpy looks really good (that is also more or less how I would implement a clip function). Would you like to contribute this clip module to geopandas?

@martinfleis
Copy link
Member

we were recently talking about this

I remember that. It does makes sense to extend its functionality. I have found some corner cases these changes bring, I'll open a PR so we can discuss that in relation to the actual code.

it is still valuable to have a separate clip function

You are right about having clip. However, I think it will be better to make it based on _overlay_intersection we already have and which I assume will be significantly faster than earthpy's implementation (due to unary_union). I have found out that it is, in fact, faster to even split clipping geometry in smaller parts in some cases than making union, when using rtree.

@jorisvandenbossche
Copy link
Member

However, I think it will be better to make it based on _overlay_intersection

The problem with the overlay one is that it will also split your original geometries in the left dataframe if they are crossed by an internal border of the right dataframe (so not just by the extent). Which is something we don't want for the clip behaviour (at least, that is how I currently imagine the clip behaviour: it's only clipping the original geometries by the full extent of the other geometry/geodataframe), so we would need to do a dissolve again after the _overlay_intersection.

But the performance issue is interesting to check to see how it compares (the earthpy variant should also benefit the optimized vectorized operations. And in general we should try to include spatial index in those default operations as well, but that's another issue :)).

@martinfleis
Copy link
Member

martinfleis commented Aug 20, 2019

Hmm, you are right. Dissolve is a way, but it changes original geometry to multi geometry. This snippet is adapted _overlay_intersection.

def clip(left, right):
    """
    Clip left gdf by right gdf.
    """
    df1 = left.copy()
    df2 = right.copy()
    # Spatial Index to create intersections
    spatial_index = df2.sindex
    bbox = df1.geometry.apply(lambda x: x.bounds)
    sidx = bbox.apply(lambda x: list(spatial_index.intersection(x)))
    # Create pairs of geometries in both dataframes to be intersected
    nei = []
    for i, j in enumerate(sidx):
        for k in j:
            nei.append([i, k])
    if nei != []:
        pairs = pd.DataFrame(nei, columns=['__idx1', '__idx2'])
        left = df1.geometry.take(pairs['__idx1'].values)
        left.reset_index(drop=True, inplace=True)
        right = df2.geometry.take(pairs['__idx2'].values)
        right.reset_index(drop=True, inplace=True)
        intersections = left.intersection(right)
        intersections = intersections.apply(
         lambda g: g.buffer(0) if g.type in ['Polygon', 'MultiPolygon']
         else g)

        # only keep actual intersecting geometries
        pairs_intersect = pairs[~intersections.is_empty]
        geom_intersect = intersections[~intersections.is_empty]

        # merge data for intersecting geometries
        df1 = df1.reset_index(drop=True)
        df2 = df2.reset_index(drop=True)
        dfinter = pairs_intersect.merge(
            df1.drop(df1._geometry_column_name, axis=1),
            left_on='__idx1', right_index=True)
    
        gdf = GeoDataFrame(dfinter, geometry=geom_intersect, crs=df1.crs)
        result = gdf.dissolve('__idx1')
        result.reset_index(drop=True, inplace=True)
        result.drop(['__idx2'], axis=1, inplace=True)

        return result
    else:
        return GeoDataFrame(
            [],
            columns=list(set(df1.columns)),
            crs=df1.crs)
polys1 = gpd.GeoSeries([Polygon([(0,0), (2,0), (2,2), (0,2)]),
                        Polygon([(2,2), (4,2), (4,4), (2,4)]),
                        Polygon([(2,0), (4,0), (4,2), (2,2)])])
df1 = df1 = gpd.GeoDataFrame({'geometry': polys1, 'df1':[1,2,3]})
lines = gpd.GeoSeries([LineString([(-1, -1), (6, 6)]), LineString([(2.5, -1), (2.5, 6)])])
df2 = gpd.GeoDataFrame({'geometry': lines, 'df2':['a', 'b']})

clip(df2, df1)
                                           geometry df2
0          MULTILINESTRING ((0 0, 2 2), (2 2, 4 4))   a
1  MULTILINESTRING ((2.5 0, 2.5 2), (2.5 2, 2.5 4))   b

@jorisvandenbossche
Copy link
Member

Also, I think for an initial version figuring out the desired functionality is more important (and for that the earthpy's code seems good to reason about), and we can still profile / optimize in a second step.

@martinfleis
Copy link
Member

Trying earthpy, it seems that the desired functionality is the one @lwasser implemented, not the one above. Geometry should be kept intact if it's fully within clipping polygon.

@lwasser
Copy link
Contributor Author

lwasser commented Aug 20, 2019

@jorisvandenbossche i'd be so happy to contribute our clip function to geopandas! i spent a lot of time optimizing it and we just recently added support for multi part features. i can also provide tests and a vignette. How would you like for me to submit this?

PR with the code and tests and vignette?

and then i do think a future implementation would be to support parallel processing as it is still slow with big datasets BUT i did test multiple approaches the one in the code is the fastest. this is awesome!

also want to ping @mbjoseph and @nkorinek who worked on the modules as well :)

@jorisvandenbossche
Copy link
Member

Yes, a PR with all three of code, tests and docs is perfect!

@lwasser
Copy link
Contributor Author

lwasser commented Aug 20, 2019

wonderful.
and just for reference -- this issue: earthlab/earthpy#112 is where we tested clip the way i implemented it vs using overlay. @mbjoseph ran the tests because i wasn't sure if the rtree approach was fastest. it really was as the data got larger and larger.

@jorisvandenbossche i'll have a PR to you in the next few days and will ping you when it's submitted. i have a few things to work on and a lot of meetings this week but i'll get it to you!

more soon!

@lwasser
Copy link
Contributor Author

lwasser commented Sep 16, 2019

awesome @jorisvandenbossche et al... can you please tell me what the review process is here for my pr? i'm still working on it although all tests are passing now so i'm getting close. Should i just respond to all comments? Or should I wait because other maintainers may have other comments?

Thank you so much!!

@maning
Copy link

maning commented Jan 21, 2020

I'm trying both solutions above. My polygon are composed several adjacent polygons. I want to preserve the geometry that intersects with each polygon. For example, in a line intersects with two polys, 2 linestring will be created. Is there a solution that avoids dissolving the polygon before clipping?

@maning
Copy link

maning commented Jan 21, 2020

What I have in mind is the Intersection process in QGIS.
Screenshot 2020-01-21 17 41 25

@jorisvandenbossche
Copy link
Member

@maning
Copy link

maning commented Jan 21, 2020

@jorisvandenbossche but the overlay function only works for polygon which is why I landed in this ticket searching for a solution.

@jorisvandenbossche
Copy link
Member

Sorry, missed that part of your description. There is an open PR that adds that functionality to overlay: #1110. It would be very welcome to try that out and give feedback!

@maning
Copy link

maning commented Jan 21, 2020

Wow! I tested the line with poly intersection and it works! The output is the same as what I get in QGIS on a small area. Thanks!

@jorisvandenbossche
Copy link
Member

@maning Thanks for testing!

@DchemistRae
Copy link

DchemistRae commented Feb 4, 2020

Thank you for this useful tool. Am currently stuck at getting a return GeoDataFrame without the attributes from the clip_obj data frame. Can someone help out??? I want the information from the clip_obj not just its geometry. <<<<#The returned GeoDataFrame is a clipped subset of shp that intersects with clip_obj.# <<<<<<----- This here is the problem. Can I get a Geodataframe with attributes from the clip_obj as well?

@martinfleis
Copy link
Member

Can I get a Geodataframe with attributes from the clip_obj as well?

@DchemistRae Then you are looking for overlay, not clip.

@DchemistRae
Copy link

Thank you for the insight. However, overlay cannot be applied on linestrings. And using buffer messes up my geometry

@martinfleis
Copy link
Member

@DchemistRae There is a PR #1110 allowing all geometry types in overlay, you can try that and let us know if it works as expected.

@DchemistRae
Copy link

DchemistRae commented Feb 4, 2020

@DchemistRae There is a PR #1110 allowing all geometry types in overlay, you can try that and let us know if it works as expected.

Please how do I use this new feature? I updated the pandas package. But I still got the same multipolygon error.

@martinfleis
Copy link
Member

@DchemistRae You'll have to install dev version of GeoPandas from that specific branch.

pip install git+https://github.com/martinfleis/geopandas.git@overlay_geoms

@DchemistRae
Copy link

How can i install since I use conda? And I suppose I will have to install Git and set path?

@martinfleis
Copy link
Member

You will have to install git, conda install git and then you can use pip to install dev version of geopandas in conda environment. Ideally, you should follow instruction in our contributing guide to set-up environment properly. Then in step 4 install geopandas using pip as above.

@DchemistRae
Copy link

All installed, overlay gave an attribute error. Both GeoDataFrames have geometry type Linestrings.

AttributeError: module 'pandas.api.indexers' has no attribute 'check_bool_array_indexer'

@martinfleis
Copy link
Member

@DchemistRae This is not the right place for this discussion, but you cannot clip one Linestring layer with another Linestring layer, that does not make geometrically much sense as you expect area for clip. Can you report that AttributeError with a few more details (pandas version, actual code used, sample data) in #1110?

@lwasser
Copy link
Contributor Author

lwasser commented Feb 7, 2020

yay closed via #1128 !!! :) 🎆

@jorisvandenbossche jorisvandenbossche added this to the 0.7.0 milestone Feb 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants