-
Notifications
You must be signed in to change notification settings - Fork 821
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend clean_intersections to return a new graph, topologically equal to the input graph, but with clusters of nodes merged into a single centroid node. #249
Conversation
…o its input, but with clusters of nodes merged into a single centroid node.
This is a feature I was looking forward to! I will try it out and return back with any issues |
Thanks for trying it out, @samuelduchesne! I'm sure that there a few glitches that I may have missed and room for improvement overall. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@PedrosWits I've left you some preliminary comments. Also, would you mind providing a comment in this PR documenting the logic of how this new functionality works in a few sentences?
@PedrosWits also, can you explain a bit more about why this is the case:
|
Take for instance the network below. If we only care about intersections where we have multiple turning options (and ignore u-turns), then a lot of these nodes are redundant as the only option is moving forward, along the same geometry. I realise that this is a gross oversimplification of the road network, but I'm guessing it can be useful when analysing major road infrastructure. On the other hand, it may be important to keep these nodes as they represent entry/exit points from the major network. I'm not a 100% sure on this one. However, identifying these nodes is a bit tricky as not all nodes of degree 2 are redundant, especially the ones near the edge of the graph. |
Main logic behind the method:
|
Hi @PedrosWits, when I use the new function and then apply a
These values come from to same example you provided in tests. |
Thanks for noticing this, @samuelduchesne. I don't think it is a ndarray, but just a nested list. I wasn't sure whether to flatten the lists when merging attributes of several candidate edges between the centroids, but now I have my answer! I wasn't sure because we lose the mapping of attributes to its prior edges, that is:
So, we could potentially get the previous edges back from each new edge, and reverse the clean_intersections transformation if we wanted to. But this is probably not a big deal, as we don't care about the reverse transformation, i.e. going from the cleaned intersections graph back to the original one. We just have the two graphs in memory or store them to disk. I'll just flatten the resulting lists and submit a new commit. |
I also want to do a quick speed benchmark to see how the method performs with increasing size of G. |
@PedrosWits would you mind resolving the branch conflicts so we can run the CI tests? |
Whooops, I'm sorry, I didn't realise this was something that I had to do. I thought I had made sure that the two master branches didn't diverge, but I guess this wasn't the case. And this right was front of me for so long.. erm.. anyway.. it should be ok now. |
This comment has been minimized.
This comment has been minimized.
@PedrosWits
I receive the following error
|
Hi @schoeller, thanks for trying out the new method! Issue: Solution: But other than that it looks like the method is doing the job right.
|
That sounds like a good solution. |
Oops, that solution didn't actually work as we were iterating over every k,v pair. |
…o its input, but with clusters of nodes merged into a single centroid node.
One of the conditions was causing a ValueError exception for nodes containing attributes whose value are lists
@gboeing , so I just had a look into these two points that I've highlighted previously. So.. I've computed the shortest paths distances for all pairs in both the original and cleaned graphs and then, for each pair, I took the absolute difference between the two. One thing that I've noticed is that some nodes in the original graph had a shortest distant greater than 2 times the threshold, but after cleaning, they belong to the same cluster! It's obvious to me now why the second bullet point quoted above doesn't make sense. Two nodes might be close together on a straight line, and hence be assigned to the same cluster, but that doesn't imply that their shortest distance along the network, is anywhere close to that value. That obviously depends on the topology of the network. Please see example below. So that solves that mystery. But that makes it really difficult to test for correctness. By cleaning the graph we're effectively throwing away information, so there will always be some differences between paths computed in the original vs cleaned graphs. But can we do better than just visual inspection or is that enough? Example
(4550426634, 13800765): {'new_edge': (277, 277), 'value': 455.99300000000005} |
So I rerun the performance test for the new method: Params:
Overall times:
Times inside the method:
Overall, the So these both of the issues that I've highlighted previously don't appear to be issues at all. |
@ppintosilva thanks! Can you resolve the merge conflict and I'll take a final reviewing pass then merge? |
In cases where there is a single edge between the centroid and its neighbor, we can keep rather than discard parallel edges
This comment has been minimized.
This comment has been minimized.
@ppintosilva thanks! |
@gboeing thanks for the support, patience and guidance getting this PR through! Cheers 🙌 |
Hi @gboeing, it seems that the new function is not integrated/merged with the master (the latest version of Osmnx still have the version that return a Geoseries). Is there a reason for that? |
@psohouenou it was merged into a feature branch that went stale due to some unresolved issues (see discussion in #299). A new PR for similar functionality is now open (see #339) but needs a little more work before it's ready. |
All right. Thank you very much for your prompt reply! |
Superseded by #450 |
Hi @gboeing. I'm happy to put forward the first version of the new
clean_intersections()
method. The method now returns the simplified graph instead of just the intersection centroids.I've added comments to the code, which hopefully will be easy to follow, but I'm happy to write down the main steps of the algorithm. The method doesn't appear to be (too) slow, but there is probably room for improvement.
There are a few nuances to look out for, e.g. when choosing between multiple geometries, but I think that the final decision was sensible.
Finally, the simplified graph can end up having a lot of nodes of degree 2 (in its undirected version). Therefore, further simplification can be achieved by removing these nodes. I'm working on this, but maybe this should be a separate method.
I'm looking forward to your input. Cheers, Pedro.