Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Filter holes from alpha shape returns #457

Merged
merged 9 commits into from Mar 3, 2022

Conversation

ljwolf
Copy link
Member

@ljwolf ljwolf commented Feb 1, 2022

Noted by @cornhundred in #456, we've got some surprising behavior for alpha shapes with holes. This attempts to fix this in the alpha_shape() function by (1) checking if there are any "holes" in the shapely polygons and, if there are, (2) we only keep polygons that contain an input point.

This solves the problem OK, but I think there's a bit of an issue:
Figure_1

Note that this results in two separate polygons-with-holes on the left, but on the right, we retain one of the holes because a point falls within it... I think this happens because the "correct" polygon for that hole would have a zero-area "cut in" that only includes that point. When we use shapely.ops.polygonize, we lose that cut in, and thus omit the vertex from the alpha shape.

Consider this box with a "cut in" from [1,1] to [.5,.5]:

lines = [
         LineString([[0, 0], [0, 1]]), # 1
         LineString([[0, 1], [1, 1]]), # 2
         LineString([[0, 0], [1, 0]]), # 6
         LineString([[1, 0], [1, 1]]), # 3
         LineString([[1, 1], [0.5, 0.5]]), # 4 
         LineString([[0.5, 0.5], [1, 1]]), # 5
        ]

upon polygonization, we lose that cut-in:

from shapely.ops import polygonize
geom = polygonize(lines)
geom.wkt

geom is now 'POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))' with no mention of the cut-in.
This is good, since a polygon with this kind of cut-in is an invalid polygon, but it complicates our testing here... not all holes are empty.

The other way to do this would be to do an all-pairs within check for the polygons using geoms.sindex.query_bulk(). I was hoping the point-in-poly woudl be faster, but... but you never know. I'll check that & re-submit.

@codecov
Copy link

codecov bot commented Feb 2, 2022

Codecov Report

Merging #457 (ac17fc5) into master (8aa08b1) will increase coverage by 0.1%.
The diff coverage is 100.0%.

Impacted file tree graph

@@           Coverage Diff            @@
##           master    #457     +/-   ##
========================================
+ Coverage    78.7%   78.8%   +0.1%     
========================================
  Files         122     122             
  Lines       12884   12910     +26     
========================================
+ Hits        10142   10177     +35     
+ Misses       2742    2733      -9     
Impacted Files Coverage Δ
libpysal/cg/alpha_shapes.py 61.6% <100.0%> (+2.1%) ⬆️
libpysal/cg/tests/test_ashapes.py 97.2% <100.0%> (+0.7%) ⬆️
libpysal/_version.py 40.7% <0.0%> (+2.7%) ⬆️

@ljwolf
Copy link
Member Author

ljwolf commented Feb 2, 2022

OK, I've switched to a zorder search method:

  1. Extract the exterior ring of all shapes geoms. Call this shells.
  2. Compute shells.sindex.query_bulk(geoms, predicate='within') to determine which shells each geometry falls within.
  3. Construct the "within" graph adjacency matrix from these edges.
  4. Compute the row sum of this adjacency matrix. This represents the number of shells in which a geometry falls, self-inclusive, also known as the "z-order".
  5. Select all geometries with an odd z-order, because all holes have an even z-order and all non-holes are odd. Reprising the code comment:
   # A z-order of 1 means the polygon is only inside if its own exterior. This means it's not a hole.
   # A z-order of 2 means the polygon is inside of exactly one other exterior. Because
   #   the hull generation method is restricted to be planar, this means the polygon is a hole. 
   # In general, an even z-order means that the polygon is always exactly matched to one exterior, 
   #   plus some number of intermediate exterior-hole pairs. Therefore, the polygon is a hole.
   # In general, an odd z-order means that there is an uneven number of exteriors. 
   #   This means the polygon is not a hole. 

Figure_1

@ljwolf
Copy link
Member Author

ljwolf commented Feb 2, 2022

@cornhundred, this should be good to go, and solves your reference case. Is there any other known pattern you'd want to test on?

@cornhundred
Copy link

Thanks @ljwolf! The random matrix with holes should cover our use case. Does it return a MultiPolygon or a list of Polygons?

Also, I started to scope out how to allow alternate alpha shape metrics (radius, distance, etc) https://github.com/cornhundred/libpysal/tree/alt_alpha_metrics I'll make a pull request when I've cleaned it up more and meet the pull request requirements (testing etc). For our case we would like to set the threshold based on max triangle length.

@ljwolf
Copy link
Member Author

ljwolf commented Feb 2, 2022

It keeps the return type as a GeoSeries. Would you prefer a single (multi)polygon?

The alpha_shape_auto returns the single shapely polygon, so keeping the return type consistent might be a good idea. However, that'd involve changing the return type of alpha_shape... which would break things.

@cornhundred
Copy link

Ok, keeping it the way it is should be fine. It is easy enough to calculate a MultiPolygon from it. We also want to calculate properties of the individual polygons (area, number of holes, etc) so that would be easier if it returns a list of discontiguous polygons.

@darribas
Copy link
Member

darribas commented Feb 4, 2022

Hello,

Thanks for this @ljwolf for the PR and @cornhundred for bringing it up and chipping in!

I've looked over and it seems all fine, very clever trick!

A couple of thoughts:

  1. Would it be worth to add a test to test both options in alpha_shape?
  2. Right now the holes are discarded, which is probably fine in most applications. Would it be worth considering using the holes to generate "holed" shapes? I'm not sure that's the name, but a MultiPolygon where inside the hole is not part of the polygon? That could also be an option (filter_holes=False/'remove'/'include', or something like that).
  3. It'd be great to add documentation for the filter_holes parameter in alpha_shape.

@cornhundred
Copy link

cornhundred commented Feb 5, 2022

Hey @darribas, those are great ideas, I'm happy to try and help address them. We definitely care about the holes but I'm not sure if it makes sense to pass them as separate polygons rather than derive them later from polygons.

For some background, we're working with single-cell spatial transcriptomics data - see mouse liver data release and example image below

Vizgen MERFISH Mouse Liver Map

The holes in our case can be blood vessels (or artifacts from sectioning) and alpha shapes seem like a great way to quantify them. Also here's a Colab notebook from a brain dataset where we interactively visualize our data.

@martinfleis
Copy link
Member

I'm not sure that's the name, but a MultiPolygon where inside the hole is not part of the polygon?

That can still be a Polygon as long as it has a single part (with N holes in it but not another part within that hole).

@ljwolf
Copy link
Member Author

ljwolf commented Feb 7, 2022

holed shapes

The shapes we return are already holed. We currently return polygons with holes and also the holes themselves as separate extra polygons... Below shows our current behavior:
Figure_1
Currently, we return all of these shapes mixed together in a single GeoSeries. We don't differentiate which is the "hull" (with holes) and which are extra hole polygons. Further, you can't immediately tell which polygons represent the "hull" and which are just extra "holes," hence the z-order search.

False/'ignore'/'include'

Upon reflection, I actually we should remove the option and always filter. Two reasons:

  1. 80% use case: People looking for the alpha_shape() are probably expecting one (or more) polygons that represent the alpha hull of the input point set. In this case, returning duplicate hole polygons (with or without labelling) is giving the user "extra" information they didn't ask for, aren't interested in, and (unless labelled) can't tell apart from what they are interested in: the alpha hull.
  2. Promote "good" patterns: If they do want them, then we should push users towards standard geopandas functionality (e.g. result.interiors.explode()) to extract holes from polygons. The information they want is already there in the holey polygons in the filtered geoseries.

@ljwolf
Copy link
Member Author

ljwolf commented Feb 7, 2022

I've added a test_holes() case to the test_ashapes.py now. If we are OK w/ filtering (no option) then I will add a note/discussion in the docstring about holes and commit. Otherwise, I have the docstring for the option ready to go, and can modify the test easily to check the options.

@cornhundred
Copy link

Hi @ljwolf, I agree with the always filtering logic. Since the holes are properties of specific polygons within the alpha shape we would want users to obtain holes with metadata about their 'parent' polygon, but that should be outside the scope of the alpha shape method.

@cornhundred
Copy link

Hi, just checking on the status of this pull request. I can try to make an additional pull request to have the option for different alpha shape distance thresholds this week so maybe it makes sense to wait on that before making a potential new release.

@ljwolf ljwolf requested a review from martinfleis March 2, 2022 15:03
@ljwolf
Copy link
Member Author

ljwolf commented Mar 2, 2022

A determination about the proper behavior of the option from a reviewer (e.g. @darribas @martinfleis @sjsrey) would get this moving again. Happy to raise this on our monthly developer meeting friday if there is no assent by then!

@cornhundred
Copy link

Great thanks @ljwolf appreciate it.

Copy link
Member

@martinfleis martinfleis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @ljwolf. I believe that we should return the polygons that represent the alpha shape and not those that represent holes. As said, those are already included as interiors so we would be just duplicating the same information.

Apart from that one note in the docstring, this seems to be ready to be merged.

Comment on lines 385 to 386
no CRS included in the object. Also note that holes are included in addition
to exteriors.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably not the case with _filter_holes anymore.

Copy link
Member Author

@ljwolf ljwolf Mar 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, so, this is slightly confusing.
There are two functions:

  • alpha_shape(xys, alpha), which returns a GeoSeries for a given alpha and input points xys.
  • alpha_geoms(alpha, triangles, radii, xys), which does the same, but using the delaunay triangulation and its corresponding circumcircle radii.

I have only filtered the alpha_shape() output. alpha_geoms() still can return holes.

I didn't want to add the z-order search into alpha_geoms() because it's used repeatedly in a performance-critical loop by alpha_shape_auto(). That function constructs a single output Polygon that covers the input point set. For its particular case, the holes returned by alpha_geoms() won't matter because of the search procedure used in alpha_shape_auto(). So, if the z-order search were introduced in alpha_geoms, it would introduce an unnecessary (but slight) performance penalty.

Thinking about this, the "right" solution would be to make alpha_geoms() private to alpha_shape_auto(), but we don't know if anyone's using that function.

I propose to solve this:

  • make an unfiltered private version of alpha_geoms() to preserve alpha_shape_auto() speed
  • add a filter to alpha_geoms(), too.
  • deprecate the public alpha_geoms() method.

Thoughts?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems reasonable to me to make alpha_geoms() private since users will be directed to use alpha_shape().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alpha_geoms is not a part of public API as far as I can tell. It is not mentioned in the API documentation and it is not listed in __all__ in this file. So I'd say that we do not have to deprecate it and can change as we wish.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome. Then I'll add an underscore to the name to be extra careful, and then merge this.

Expect a point release with these changes tomorrow UK time @cornhundred

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!!

@ljwolf
Copy link
Member Author

ljwolf commented Mar 3, 2022

tests are now failing in another part of the code. Looks like a networkx issue in Python 3.7. will investigate & merge if I can figure it out.

@ljwolf
Copy link
Member Author

ljwolf commented Mar 3, 2022

OK, well, networkx dropped support for python 3.7; we'll need to drop that too, then, or skip the test.

@ljwolf
Copy link
Member Author

ljwolf commented Mar 3, 2022

Or, I suppose we could pin to networkx 2.6 for Python 3.7. Will investigate.

@ljwolf ljwolf merged commit 0a4b079 into pysal:master Mar 3, 2022
@cornhundred
Copy link

Thanks!

@ljwolf
Copy link
Member Author

ljwolf commented Mar 3, 2022

No problem! 4.6.1 should be live on pypi!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants