Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shapes function returns invalid geometries #1126

Closed
nickwg03 opened this issue Aug 24, 2017 · 9 comments
Closed

Shapes function returns invalid geometries #1126

nickwg03 opened this issue Aug 24, 2017 · 9 comments

Comments

@nickwg03
Copy link
Contributor

nickwg03 commented Aug 24, 2017

Expected behavior and actual behavior.

I expect to convert an array to a set of polygon, but the results contain invalid geometries. In some cases, I'd think the geometries should be MultiPolygon instead of Polygon.

Steps to reproduce the problem.

Loading up the supplied raster, I simply run rasterio.features.shape and then apply shapely.geometry.shape to create shapely geometries:

src = rasterio.open("subset_simple.tif")
band = src.read(1)
result = rasterio.features.shapes(band, transform=src.transform, connectivity=8)
l = [shapely.geometry.shape(x[0]) for x in result]

One of the troublesome polygons I know to be at index 203, so I load that into a geodataframe:
gdf = gpd.GeoDataFrame(geometry=[l[203]])

Plotting that polygon, it looks like this:
image

If I look at the geometry in jupyter notebook, it gives me a notice that the geometry has a self-intersection:
image

My usual go-to is to run a .buffer(0), and that does make the geometry valid, but it cuts off a large part of the polygon:
image

So I'm wondering if instead of the shapes function returning a Polygon, perhaps a MultiPolygon is more appropriate for this particular set of pixels and would avoid invalid geometry results.

I tried re-creating this issue in QGIS, and I found that when using a connectivity of 8 in the built-in QGIS polygonize function, the resulting geometry is still invalid, but is indeed a MultiPolygon geometry type. So then, when I run the .buffer(0) function on the invalid geometry, it fixes the geometry and all polygons remain in the result as I was hoping.

The attached raster file can be used to re-create this problem.
subset_simple.tif.zip

Operating system

Linux

Rasterio version and provenance

'1.0a9', installed from conda, python 3

@sgillies
Copy link
Member

sgillies commented Sep 6, 2017

@nickwg03 thanks for the excellent report. Rasterio uses GDALPolygonize() and changing that function to multipolygons is not something I see us doing in the near future. I have a potential workaround for you: calling shapes() with connectivity=4. That will prevent the corner-connected parts and the resulting self-intersections. I expect that your example polygon above would become 10 polygons with no self-intersections. The price to pay is an increase in the number of features. Would you be willing to give that a try and report back here?

@geowurster
Copy link
Contributor

@sgillies I did not know about that workaround. Might be worth adding to the docs?

@nickwg03
Copy link
Contributor Author

nickwg03 commented Sep 6, 2017

@sgillies @geowurster That was actually the work around that I used, and it does indeed handle the cases where a Multipolygon would be better suited than a Polygon. However, the results still contain Self-intersections for 8 of the some 419 polygons that are produced using connectivity=4.

I tested both using Rasterio as well as QGIS, and unsurprisingly the polygons with self-intersections are the same between the Rasterio method and the QGIS method. So it appears the GDALPolygonize() function produces polygons with self-intersections.

So it doesn't appear that this is an issue in Rasterio as much an issue in the GDAL function.

Red polygons in the image below have self-intersections
image

@sgillies
Copy link
Member

sgillies commented Sep 6, 2017

@nickwg03 ah, I see that the pincer-like touches remain. A very small negative buffer distance would probably erode all of those, but I don't know if that's going to be acceptable in your case.

@nickwg03
Copy link
Contributor Author

nickwg03 commented Sep 6, 2017

@sgillies I've had success just using .buffer(0) on the polygons to make them valid.

@sgillies
Copy link
Member

sgillies commented Sep 6, 2017

@nickwg03 do you know which code is being run to polygonize in QGIS? Is it this: https://github.com/qgis/QGIS/blob/8d615543b7b0d2f79f4f19724fb45202167048fe/python/plugins/processing/algs/gdal/polygonize.py ? I'd like to see Rasterio do this as well for you as QGIS does, but realistically we won't be able to do that until after 1.0.

BTW, I think you've raised the bar for use of geometry visualizations in the Rasterio issue tracker 🎩

@nickwg03
Copy link
Contributor Author

nickwg03 commented Sep 7, 2017

@sgillies I think there is some distinction between the QGIS Polygonize method I was using and the one you linked to above. The one you linked to looks to be the Polygonize function available in the processing plugin, while I was just using the built-in Polygonize function. I do, however, think that both methods are just GUIs that end up essentially running a command line script utilizing gdal_polygonize.py. In fact, when I simply run gdal_polygonize.py from the command line, I end up with the same self-intersecting polygons.

So just to be clear, I think there are a couple of things going on here:

  1. GDAL Polygonize (regardless if run in Rasterio, QGIS, or directly on command line using gdal_polygonize.py) will produce polygons with self-intersections. This appears to be out of the control of Rasterio.
  2. Unlike Rasterio, where necessary, gdal_polygonize.py (either in QGIS or on command line) will create MultiPolygons. This likely only occurs when connectivity=8. These MultiPolygons are then able to handle the redrawing (using .buffer(0)) to produce valid MultiPolygons and not dump any non-adjacent pieces of the MultiPolygon. Since Rasterio appears to always return Polygons, attempts to redraw will end up dumping some pieces of the non-adjacent Polygon, and will produce holes in the vector result that a MultiPolygon would have kept intact.

I think the workaround is good enough for the time being, and certainly waiting until after 1.0 sounds reasonable to me.

@davidanderson-1701
Copy link

davidanderson-1701 commented Sep 28, 2017

I just bumped up against this problem myself. For me the workaround will not work as I am trying to build patches for a patch size analysis where the 8 way connectivity defines a patch. I ended up doing some crufty workarounds to decompose the returned polygon (that is really a multipolygon) into the component pieces then calculating area on each component piece. My workarounds were needed because the area attribute on the incorrect multipolygon is not returning the true area. For the case of corner touching only the area for the first polygon in the multi polygon is being returned. For the case with interior polygons the interior hole area is not being deducted.
If there was a way for Shapely to easily convert a polygon feature that is really a multiplygon to a true multipolygon feature the problem would be solved. I was not able to figure out how to do that.

Here is the not very elegant code I wrote as a workaround to get the correct area.
I apologize for the bad formatting. The mark up is working very oddly with this snippet. I gave up after 30 minutes of futzing around.

`for shape,value in rio_feat.shapes(rio_raster,transform=t,connectivity=8,mask=warm_dry_mask):
new_shape = shapely.geometry.shape(shape)
if len(shape["coordinates"])>1:
temp_shape = copy.copy(shape)
temp_area = 0
temp_shapes = []

for coor in shape["coordinates"]:
    temp_shape["coordinates"]=[coor]
    new_temp_shape = shapely.geometry.shape(temp_shape)
                        
    temp_shapes.append(new_temp_shape)
    inside_polygons = []
    inside_polygons_index = []

for x,y in itertools.combinations(enumerate(temp_shapes),2):
    if x[1].contains(y[1]):
        inside_polygons.append(y[1])
        inside_polygons_index.append(y[0])

for i in sorted(inside_polygons_index,reverse=True):
    del temp_shapes[i]
inside_area = sum([abs(ip.area) for ip in inside_polygons])
new_shape_area = sum([abs(y.area) for y in temp_shapes]) - inside_area`

@sgillies
Copy link
Member

I'm going to mark this as a wontfix only because the GDAL function we're calling doesn't return valid geometries and shape() is going to stay low level with the same behavior. We may yet offer a wrapper of some kind if people aren't already solving the problem with GeoPandas or something else.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants