-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: helper functions part 1 #465
Conversation
Codecov Report
@@ Coverage Diff @@
## main #465 +/- ##
======================================
Coverage ? 96.1%
======================================
Files ? 13
Lines ? 2974
Branches ? 0
======================================
Hits ? 2857
Misses ? 117
Partials ? 0
|
Happy to discuss the new processes! |
Could you summarise what this does? Need to know the goal to properly review it :). |
Hey @martinfleis @jGaboardi
This is trying to add a line from the center point of the roundabout to the closest end of the (previously selected) incoming lines. Anyway, one thing fixed/improved and something else breaks ! |
hey sorry, I hope to get to check the code this week. if you can ensure CI is green in the meantime, that'd be fab! |
Sure, I'm on it ! |
@gregmaya Also, you can take care of the formatting issues/failure either installing
Either option will require you to install |
xref #396 |
Hopefully, by keeping the PR as 'draft' I wasn't spamming you in every single commit 🙈 @jGaboardi thanks for the advice. Formatting is definitely giving me a hard time, lately. |
All good. I don't mind it at all. |
Ok so before you go ahead and review this code I actually kept thinking that the hausdorff_distance bit (lines 860-869) was sub-optimal. |
Generally yes but if waiting for a review should block you then go ahead and make a bigger change |
Hey, @martinfleis and @jGaboardi just flagging that this is ready for review. Also, are you guys thinking to include this as part of GSOC2023? I saw NumFOCUS there but couldn't find momepy in their list |
Thanks! Will have a look.
We haven't discussed this yet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. A few notes in the code.
momepy/preprocessing.py
Outdated
@@ -756,6 +757,57 @@ def _polygonize_ifnone(edges, polys): | |||
return polys | |||
|
|||
|
|||
def _create_shape_index(gdf): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There has been a development around this in the meantime leading to dropping of this specific formula for the shape index. I don't want to go into details now but the mathematical derivation of this one was flawed.
We now use this one in the paper (in progress)
from esda import shape
polygons["circular_compactness"] = shape.minimum_bounding_circle_ratio(polygons)
polygons["circular_compactness_index"] = polygons["circular_compactness"] * area
I suggest trying the same, but the threshold values will obviously change.
For sure, we should use esda
to measure the actual index to avoid calling pygeos directly here (we'll need to get rid of pygeos soon).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just reviewed this and the reality is that we only use reock
as a selection metric for roundabouts. By using shape.minimum_bounding_circle_ratio(polygons)
we would be simplifying the way we get those numbers (which is great) but the actual values are the same.
Either I'm missing something or we could just make those changes and forget about the extra step to create the INDEX that in reality never gets used.
momepy/preprocessing.py
Outdated
polygons_gdf["geometry"] = polygons_gdf["geometry"].apply( | ||
lambda x: make_valid(x) if not x.is_valid else x | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be done as a vectorised call, make_valid
is now available as GeoSeries.make_valid
.
momepy/preprocessing.py
Outdated
max_dists = [] | ||
for i, row in rab_adj.iterrows(): | ||
xs, ys = row.geometry.exterior.coords.xy | ||
poly = np.array(list(map(lambda x, y: (x, y), xs, ys))) | ||
|
||
cx, cy = row.savedgeom.centroid.coords.xy | ||
c_point = np.array(list(map(lambda x, y: (x, y), cx, cy))) | ||
|
||
# list of distances between centroid and each vertex of polygon | ||
ds = map(np.linalg.norm, c_point - poly) | ||
d_max = max(ds) | ||
max_dists.append(d_max) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this just a max distance between a centroid and a node? If so, you can avoid the loop and depend on shapely here for sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All I could think of was this...
max_dists = []
for _, row in rab_adj.iterrows():
dists = [
row.savedgeom_right.centroid.distance(Point(x, y))
for x, y in row.geometry.exterior.coords
]
max_dists.append(max(dists))
rab_adj["max_dist"] = max_dists
but you probably had something else in mind.
Thanks, Martin. |
I tried to implement most of the suggestions. However, I must say that I couldn't figure out a way to avoid the for loop for calculating the max_distance to all vertex of the adjacent polygons. I did change it to using the distance() function but couldn't figure out how to avoid the loop. |
@martinfleis @jGaboardi just raising some awareness of this as I believe the updated functions work as long as |
I've added esda to CI in 7548af0. Let's see how it goes, I haven't tested it locally. |
Hey! Any idea why the pre-commit checks remain 'pending'? |
@gregmaya I managed to make it work :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make sure the test pass?
Counts the number of linestrings that formed the polygons | ||
""" | ||
# Check if the polygons are valid geometries and fix them if not | ||
polygons_gdf["geometry"] = polygons_gdf.geometry.make_valid() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will fail for older geopandas (which we still formally support).
Also, we should call make_valid
only if there are actually some invalid polygons.
area_threshold_val = gdf.area.quantile(area_threshold) | ||
rab = rab[rab[area_col] < area_threshold_val] | ||
if area_threshold_val > 2000: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason why 2000? Should this be controllable via a keyword?
@gregmaya can you update the code to depend on shapely 2 instead of pygeos? We've just merged it elsewhere. There are also some to-do's when shapely 2 is available that can be tackled now. |
Oh wow! |
Closing due to inactivity and needs of updating. The code will not go anywhere, so we can refer to it when we want to revisit this. |
This is the correction that I really wanted to make before moving back to fixing the tri_like_junctions.
Apologies for any code clumsiness. I'm afraid some of my python skills might need refreshing.