
Extend clean_intersections to return a new graph, topologically equal to the input graph, but with clusters of nodes merged into a single centroid node. #249

Merged
29 commits merged into gboeing:clean_intersections from ppintosilva:master on Jun 10, 2019

Conversation

ppintosilva
Contributor

Hi @gboeing. I'm happy to put forward the first version of the new clean_intersections() method. The method now returns the simplified graph instead of just the intersection centroids.

I've added comments to the code, which hopefully will be easy to follow, but I'm happy to write down the main steps of the algorithm. The method doesn't appear to be (too) slow, but there is probably room for improvement.

There are a few nuances to look out for, e.g. when choosing between multiple geometries, but I think that the final decision was sensible.

Finally, the simplified graph can end up having a lot of nodes of degree 2 (in its undirected version). Therefore, further simplification can be achieved by removing these nodes. I'm working on this, but maybe this should be a separate method.

I'm looking forward to your input. Cheers, Pedro.

@ppintosilva
Contributor Author

ppintosilva commented Jan 28, 2019

Before and after pics:

  • 2700 Shattuck Ave, Berkeley, CA
  • Newcastle main road network (secondary and higher ranked roads)
[Before/after plots: shattuck_before → shattuck_after; nwc_main_before → nwc_main_after]

Edit: visuals

@coveralls

coveralls commented Jan 28, 2019


Coverage increased (+0.1%) to 93.685% when pulling 877ba56 on ppintosilva:master into 54f5f76 on gboeing:master.

@samuelduchesne
Contributor

This is a feature I was looking forward to! I will try it out and report back with any issues.

@ppintosilva
Contributor Author

ppintosilva commented Jan 30, 2019

Thanks for trying it out, @samuelduchesne! I'm sure there are a few glitches I may have missed, and room for improvement overall.

Owner

@gboeing gboeing left a comment


@PedrosWits I've left you some preliminary comments. Also, would you mind providing a comment in this PR documenting the logic of how this new functionality works in a few sentences?

@gboeing
Owner

gboeing commented Jan 30, 2019

@PedrosWits also, can you explain a bit more about why this is the case:

Finally, the simplified graph can end up having a lot of nodes of degree 2 (in its undirected version).

@ppintosilva
Contributor Author

@PedrosWits also, can you explain a bit more about why this is the case:

Take for instance the network below. If we only care about intersections where we have multiple turning options (and ignore u-turns), then a lot of these nodes are redundant, as the only option is moving forward along the same geometry. I realise that this is a gross oversimplification of the road network, but I'm guessing it can be useful when analysing major road infrastructure. On the other hand, it may be important to keep these nodes as they represent entry/exit points from the major network. I'm not 100% sure on this one.

However, identifying these nodes is a bit tricky as not all nodes of degree 2 are redundant, especially the ones near the edge of the graph.
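
For reference, here is a minimal sketch (not the PR code; it assumes networkx plus the OSMnx API of the time) of how the degree-2 candidates could be listed:

import networkx as nx
import osmnx as ox

# Hypothetical sketch: list the nodes of degree 2 in the undirected version
# of the graph; these are the candidates discussed above, some of which
# (e.g. near the edge of the graph) may not actually be redundant.
G = ox.graph_from_address('2700 Shattuck Ave, Berkeley, CA',
                          distance=1000, network_type='drive')
G_undirected = nx.Graph(G)  # collapse the MultiDiGraph to a simple undirected graph
degree2_candidates = [node for node, degree in G_undirected.degree() if degree == 2]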

[Image: candidate_nodes_degree2_to_remove]

@ppintosilva
Contributor Author

Main logic behind the method (a rough code sketch follows the list):

  1. For each node in G, we find the closest centroid (the cluster to which that node belongs)
  2. We create the new graph G' and add the nodes, labelled 1 to n, where n is the number of centroids.
  3. To create the edges in the new graph, we iterate through each centroid:
    a. We get the list of nodes that belong to that centroid (we call these 'elements')
    b. For each element of our current centroid, we get its outgoing edges.
    c. We map the second node of each out-edge (its successor) onto nodes of G'.
    d. We compute the successors of the current centroid as the set of (mapped) successors of its elements, thus removing duplicates.
    e. For each successor centroid, we merge the existing attributes of the outgoing edges to that particular centroid.
    f. The 'geometry' with the largest length is kept and "fixed" to avoid gaps in the resulting graph (is there a better way of choosing the geometry?)
    g. Finally, we add the edge (centroid, successor) to G'. There is no need to handle predecessors, as the current centroid will be a successor to other centroids.
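
A stripped-down sketch of those steps (illustrative only, not the PR code: it skips attribute merging and geometry selection, and nearest_centroid is a placeholder name for the node-to-centroid mapping from step 1):

import networkx as nx

def merge_nodes_into_centroids(G, nearest_centroid):
    """Illustrative only: build a new graph H whose nodes are centroid labels
    and whose edges connect centroids whose member nodes share an edge in G.
    nearest_centroid maps each node of G to its centroid label (step 1)."""
    H = nx.MultiDiGraph(**G.graph)
    # Step 2: one node per centroid, labelled 1 to n.
    H.add_nodes_from(set(nearest_centroid.values()))
    # Step 3: for each original out-edge, add an edge between the two centroids.
    # The real method also merges edge attributes and picks/"fixes" a geometry.
    for u, v, data in G.edges(data=True):
        c_u, c_v = nearest_centroid[u], nearest_centroid[v]
        if c_u != c_v:
            H.add_edge(c_u, c_v, **data)
    return H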

@samuelduchesne
Contributor

samuelduchesne commented Feb 1, 2019

Hi @PedrosWits, when I use the new function and then apply ox.get_undirected() to the new graph, I get an error inside the is_duplicate_edge() function. Your method seems to add osmid attributes as ndarrays instead of lists (I have highlighted the array below in data_other). The error occurs on line 326 in the set generator.

data= {'lanes': '1', 'bridge': 'yes', 'highway': 'primary', 'oneway': True, 'geometry': <shapely.geometry.linestring.LineString object at 0x1a296900b8>, 'to': 94, 'length': 271.10933596417567, 'ref': 'A695', 'from': 98, 'osmid': [5016040, 95817240], 'name': 'Derwenthaugh Road', 'maxspeed': '70 mph'}

data_other= {'lanes': ['1', '1'], 'bridge': 'yes', 'highway': ['primary', 'primary'], 'oneway': [True, True], 'from': 94, 'to': 98, 'length': 406.7024179676412, 'geometry': <shapely.geometry.linestring.LineString object at 0x1a29693128>, 'ref': ['A695', 'A695'], 'osmid': [[114882696, 60893148], 505407462], 'name': 'Derwenthaugh Road', 'maxspeed': ['70 mph', '70 mph']}

These values come from the same example you provided in the tests.

@ppintosilva
Contributor Author

Thanks for noticing this, @samuelduchesne. I don't think it is an ndarray, just a nested list. I wasn't sure whether to flatten the lists when merging attributes of several candidate edges between the centroids, but now I have my answer! I wasn't sure because we lose the mapping of attributes to their prior edges, that is:

[114882696, 60893148] -> osmids of previous edge A
505407462 -> osmid of previous edge B

So, we could potentially get the previous edges back from each new edge, and reverse the clean_intersections transformation if we wanted to. But this is probably not a big deal, as we don't care about the reverse transformation, i.e. going from the cleaned intersections graph back to the original one. We just have the two graphs in memory or store them to disk.

I'll just flatten the resulting lists and submit a new commit.
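
For illustration, flattening the merged values could look something like this (a hedged sketch, not the actual commit):

def flatten_attribute(value):
    """Flatten one level of nesting in a merged edge attribute, e.g.
    [[114882696, 60893148], 505407462] -> [114882696, 60893148, 505407462]."""
    if not isinstance(value, list):
        return value
    flat = []
    for item in value:
        if isinstance(item, (list, tuple)):
            flat.extend(item)
        else:
            flat.append(item)
    return flat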

@ppintosilva
Contributor Author

I also want to do a quick speed benchmark to see how the method performs with increasing size of G.

@gboeing
Owner

gboeing commented Feb 6, 2019

@PedrosWits would you mind resolving the branch conflicts so we can run the CI tests?

@ppintosilva
Contributor Author

@PedrosWits would you mind resolving the branch conflicts so we can run the CI tests?

Whoops, I'm sorry, I didn't realise this was something I had to do. I thought I had made sure the two master branches didn't diverge, but I guess that wasn't the case. And this was right in front of me for so long... erm... anyway, it should be ok now.


@schoeller

@PedrosWits
When trying

G = ox.graph_from_address('Bandung, Jawa, Indonesia', distance=3000, network_type='drive', infrastructure='way["highway"~"trunk|primary|secondary|tertiary"]', clean_periphery=True, simplify=True)
G = ox.project_graph(G, to_crs='+proj=utm +zone=48 +ellps=WGS84 +datum=WGS84 +units=m +no_defs +south')
G = ox.clean_intersections(G, tolerance=50, dead_ends=False)
gdf_nodes, gdf_edges = ox.graph_to_gdfs(G)
G = ox.gdfs_to_graph(gdf_nodes, gdf_edges)

I receive the following error

  File "<ipython-input-6-22a16ce1b815>", line 1, in <module>
    G = ox.gdfs_to_graph(gdf_nodes, gdf_edges)

  File "C:\ProgramData\Miniconda3\lib\site-packages\osmnx\save_load.py", line 610, in gdfs_to_graph
    attribute_values = {k:v for k, v in attributes[attribute_name].items() if pd.notnull(v)}

  File "C:\ProgramData\Miniconda3\lib\site-packages\osmnx\save_load.py", line 610, in <dictcomp>
    attribute_values = {k:v for k, v in attributes[attribute_name].items() if pd.notnull(v)}

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

@ppintosilva
Contributor Author

I receive the following error
...

attribute_values = {k:v for k, v in attributes[attribute_name].items() if pd.notnull(v)}

Hi @schoeller, thanks for trying out the new method!

Issue:
The 'osmid' attribute for nodes in the new cleaned graph can be a list as opposed to a single value. This is because the centroids (nodes in the cleaned graph) can be composed of multiple nodes (of the original graph), each with its own osmid. Thus, when the condition pd.notnull(v) is evaluated and v is a list of osmids, we get an array of booleans instead of a single true/false value, which causes the ValueError exception.

Solution:
We can store the entire list in a separate attribute, say 'osmids', and store a single value from that list under the 'osmid' key. Does this sound reasonable @gboeing?


But other than that it looks like the method is doing the job right.

[Before/after plots: bandung_before → bandung_after]

@gboeing
Owner

gboeing commented Mar 21, 2019

That sounds like a good solution.

@ppintosilva
Contributor Author

That sounds like a good solution.

Oops, that solution didn't actually work, as we were iterating over every k,v pair.
A few lines below in gdfs_to_graph, the condition (isinstance(v, list) or pd.notnull(v)) is used instead of pd.notnull(v), and that works for cases where v is a list. Matching the two seems to solve the issue.
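
For clarity, the filter that works with list-valued attributes looks roughly like this (a paraphrase of the condition quoted above, using attribute values from the earlier comment; not the exact osmnx source):

import pandas as pd

def keep_value(v):
    # A list (e.g. of osmids) passes through untouched; calling pd.notnull on
    # a list yields an array of booleans, which is ambiguous as a condition.
    return isinstance(v, list) or pd.notnull(v)

attributes = {'osmid': [5016040, 95817240], 'name': 'Derwenthaugh Road', 'ref': None}
attribute_values = {k: v for k, v in attributes.items() if keep_value(v)}
# -> {'osmid': [5016040, 95817240], 'name': 'Derwenthaugh Road'}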

@ppintosilva
Contributor Author

* [ ]  Routes and their corresponding distances should remain similar for the original and cleaned graphs. Is this the case?

* [ ]  Is the choice of geometry for the new edges adequate? It seems to be the case by looking at the plots. However, I notice many instances of new edges with length greater than the expected maximum: max length (among the candidate edges) + 2*tolerance (the maximum distance of an old node to its centroid, for the _u_ and _v_ centroids). So far I haven't come up with an explanation for this.

@gboeing, so I just had a look into the two points I highlighted previously.

I computed the shortest-path distances for all pairs in both the original and cleaned graphs and then, for each pair, took the absolute difference between the two.

One thing I noticed is that some nodes in the original graph had a shortest-path distance greater than 2 times the threshold, yet after cleaning they belong to the same cluster! It's obvious to me now why the second bullet point quoted above doesn't make sense. Two nodes might be close together on a straight line, and hence be assigned to the same cluster, but that doesn't imply that their shortest distance along the network is anywhere close to that value. That obviously depends on the topology of the network. Please see the example below.

So that solves that mystery. But that makes it really difficult to test for correctness. By cleaning the graph we're effectively throwing away information, so there will always be some differences between paths computed in the original vs cleaned graphs. But can we do better than just visual inspection or is that enough?
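
For context, the comparison described above can be sketched roughly as follows (node_to_centroid is an assumed mapping from original node ids to cleaned-graph node ids, and both graphs are assumed to carry 'length' edge attributes):

import networkx as nx

def path_length_difference(G, G_clean, node_to_centroid, source, target):
    """Absolute difference between shortest-path lengths for one (source, target)
    pair in the original graph G and in the cleaned graph G_clean."""
    d_orig = nx.shortest_path_length(G, source, target, weight='length')
    c_u, c_v = node_to_centroid[source], node_to_centroid[target]
    # Nodes merged into the same cluster are zero distance apart in G_clean,
    # even if their network distance in G was large (the case shown below).
    d_clean = nx.shortest_path_length(G_clean, c_u, c_v, weight='length')
    return abs(d_orig - d_clean)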


Example

  • The dictionary key represents the ids of the nodes in the original graph (source, target) of the shortest path.
  • new_edge is the ids of the nodes in the cleaned graph
  • value is the shortest distance difference between the two (zero in the cleaned graph because they belong to the same cluster)

(4550426634, 13800765): {'new_edge': (277, 277), 'value': 455.99300000000005}

[Image: close_but_far]

@ppintosilva
Contributor Author

ppintosilva commented Jun 6, 2019

* [ ]  More detailed performance profiling

So I reran the performance test for the new method:

Params:

  • address = '2700 Shattuck Ave, Berkeley, CA'
  • distance = 10000
  • tolerance = 30

Overall times:

  • Got all network data: 24.20 sec
  • Created graph: 4.21 sec
  • Simplified graph: 4.17 sec
  • Projected graph: 48.28 sec
  • Cleaned graph: 20.25 sec

Times inside the method:

  • checkpoint 1 (copy graph, remove dead ends, calculate centroids): 15.01 sec
  • checkpoint 2 (calling get_nearest_nodes and computing the dicts nearest_centroid and centroid_elements): 0.49 sec
  • checkpoint 3 (main loop where new graph is built): 4.71 sec

Overall, the clean_intersections method does not take much time compared to the other existing methods. My code only adds the time measured at checkpoints 2 and 3; everything up to checkpoint 1 is the same code that clean_intersections ran before this pull request.
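
As a rough guide, the overall times above correspond to a pipeline like this (a sketch assuming the OSMnx API at the time of this PR and the parameters listed above; timings will obviously vary by machine):

import time
import osmnx as ox

start = time.time()
# graph_from_address downloads, builds and simplifies the network in one call
G = ox.graph_from_address('2700 Shattuck Ave, Berkeley, CA',
                          distance=10000, network_type='drive')
print('Built graph: {:.2f} sec'.format(time.time() - start))

start = time.time()
G_proj = ox.project_graph(G)
print('Projected graph: {:.2f} sec'.format(time.time() - start))

start = time.time()
G_clean = ox.clean_intersections(G_proj, tolerance=30, dead_ends=False)
print('Cleaned graph: {:.2f} sec'.format(time.time() - start))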

So both of the issues I highlighted previously don't appear to be issues at all.
I think we're in a good position to merge this PR, @gboeing.

@gboeing
Owner

gboeing commented Jun 7, 2019

@ppintosilva thanks! Can you resolve the merge conflict and I'll take a final reviewing pass then merge?

In cases where there is a single edge between the centroid and its neighbor, we can keep rather than discard parallel edges

@gboeing gboeing changed the base branch from master to clean_intersections June 10, 2019 13:43
@gboeing gboeing merged commit 51fbb6f into gboeing:clean_intersections Jun 10, 2019
@gboeing
Owner

gboeing commented Jun 10, 2019

@ppintosilva thanks!

@ppintosilva
Contributor Author

@ppintosilva thanks!

@gboeing thanks for the support, patience and guidance getting this PR through! Cheers 🙌

@psohouenou

Hi @gboeing, it seems that the new function is not integrated/merged into master (the latest version of OSMnx still has the version that returns a GeoSeries). Is there a reason for that?

@gboeing
Owner

gboeing commented Jan 20, 2020

@psohouenou it was merged into a feature branch that went stale due to some unresolved issues (see discussion in #299). A new PR for similar functionality is now open (see #339) but needs a little more work before it's ready.

@psohouenou

All right. Thank you very much for your prompt reply!

@gboeing
Owner

gboeing commented May 12, 2020

Superseded by #450
