get_nearest_node and shortest_path performance issue #30

sephib · 2017-02-02T12:47:49Z

I'm trying to calculate routes from thousands of points.
Running the get_nearest_node and shortest_path on the following network takes a long time.

{
	'circuity_avg' : 1.084601551622338,
	'count_intersections' : 58456,
	'edge_density_km' : None,
	'edge_length_avg' : 119.35486740722354,
	'edge_length_total' : 19018004.572667,
	'intersection_density_km' : None,
	'k_avg' : 4.536693003060716,
	'm' : 159340,
	'n' : 70245,
	'node_density_km' : None,
	'self_loop_proportion' : 0.004022844232458893,
	'street_density_km' : None,
	'street_length_avg' : 123.93589363482089,
	'street_length_total' : 12220326.984180609,
	'street_segments_count' : 98602,
	'streets_per_node_avg' : 2.80183642963912,
	'streets_per_node_counts' : {
		0 : 0,
		1 : 11789,
		2 : 499,
		3 : 48094,
		4 : 9584,
		5 : 265,
		6 : 13,
		7 : 1
	},
	'streets_per_node_proportion' : {
		0 : 0.0,
		1 : 0.16782689159370773,
		2 : 0.007103708448999929,
		3 : 0.6846608299523098,
		4 : 0.13643675706455977,
		5 : 0.003772510498967898,
		6 : 0.00018506655277955727,
		7 : 1.4235888675350559e-05
	}
}

In order to increase the performance of the function I think the function should select a subset of nodes (based on some parameter), there after run the great_circle() method.
I'm thinking of using nx.ego_graph - do you think this is a good direction?

In the meantime since I have some points that repeat themselves I've used some caching techniques. Do you think it is relevant to add to the core.py?

from functools import lru_cache

@lru_cache(maxsize=None)
def get_nearst_node_(graph, coordinate):
    return ox.get_nearest_node(graph, coordinate)

@lru_cache(maxsize=None)
def shortest_path_(graph, origin_node, destination_node):
    return nx.shortest_path(graph, origin_node, destination_node)

The text was updated successfully, but these errors were encountered:

gboeing · 2017-02-03T20:48:55Z

Using an LRU cache is an interesting idea. It that preferable to using an LFU cache in your use case?

Yes I think using nx.ego_graph could be a smart way to speed up route calculation when an approximate max radius is known. You could pass it a radius and optionally a distance (that is, the weight parameter length) to induce a small subgraph to then calculate the route on.

sephib · 2017-02-04T20:23:34Z

Regarding LRU - used it since it is easier to implement (since I"m not limiting the maxsize then I'm not sure how much of a difference it is). Will check later on regarding LFU .
Will update regarding nx.ego_graph

sephib · 2017-02-05T10:45:36Z

Hi
I'm still having performance issues with the get_nearest_node method.
running the code :

origin_node = get_nearst_node_(graph, origin)

takes between 3-30 seconds per point

Trying to run the function on the entire dataset did not increase the performance (Here i'm running on a subset of the dataset).

dfgp.loc[dfgp['group_id'] == id, 'origin_node'] = dfgp.loc[dfgp['group_id'] == id].apply(lambda x: ox.get_nearest_node(graph, point=(x['lat'], x['long'])), axis=1)

not sure if the ego_graph is relevant since I need a node to run the function.
Is there a way to modify get_nearest_node function to take a maximum distance? I did not see any such thing with the 'great_circle' function.

Any ideas?

gboeing · 2017-02-08T18:44:45Z

It could be modified to, say, take a point and a radius, then create a circle around the point, then do a spatial intersection with geopandas to retain only those nodes that lie within the circle, and then create a subgraph of only those nodes and their incident edges.

But, you also said:

not sure if the ego_graph is relevant since I need a node to run the function.

You can get the node nearest an arbitrary point with the get_nearest_node function. Then you can use ego_graph to induce a subgraph of nodes within some network distance from this node.

gboeing · 2017-02-22T21:30:29Z

I've had some time to look further into this. In your example, I see you have 70245 nodes in your graph. I created a similarly sized graph for the city of Los Angeles (n=63922) but I am not seeing the performance issues you described. However, I did find a way to improve the performance time of get_nearest_node by 5x to 10x.

Testing the performance

import osmnx as ox, networkx as nx
ox.config(log_console=True, use_cache=True)
G = ox.graph_from_place('Los Angeles, California, USA', network_type='drive_service')
print(len(G))

Next I define origin and destination lat-long points on opposite sides of LA:

orig_point = (34.234, -118.535)
dest_point = (33.939, -118.242)

And then get the nearest node to each:

%%time
orig_node = ox.get_nearest_node(G, orig_point)

Wall time: 805 ms

%%time
dest_node = ox.get_nearest_node(G, dest_point)

Wall time: 814 ms

So, get_nearest_node is finding the node nearest to an arbitrary point (in a large graph) in less than one second. Inducing a subgraph is also fast:

%%time
eg = nx.ego_graph(G, orig_node, radius=2000, distance='length')

Wall time: 119 ms

This produces a subgraph with the 383 nodes within 2 km of orig_node.

But I'm more interested in the performance of the full graph. So, next I calculate the shortest path between these two nodes using the full graph:

%%time
route = nx.shortest_path(G, source=orig_node, target=dest_node, weight='length')

Wall time: 703 ms

So, it was able to calculate the path in less than a second.

route_length_km = sum([G.edge[u][v][0]['length'] for u, v in zip(route, route[1:])]) / 1000.

There are 341 nodes in the path and it has a total length of 51.54 km. Lastly, we can plot it to get a sense of the scale:

fig, ax = ox.plot_graph_route(G, route, fig_height=20, edge_linewidth=0.3, node_size=0)

Can we make this faster with vectorization?

In these tests, get_nearest_node performs reasonably fast. However, I also played around with ways to make it faster. The main performance constraint is how it calculates the great-circle distance from the arbitrary point to the coordinates of each node in the graph, then finds the minimum. We can vectorize this calculation. It seems that a sufficiently large graph of hundreds of thousands or millions of nodes would benefit from this vectorization approach.

I will test and commit these updates shortly.

gboeing · 2017-02-22T23:58:22Z

Fast vectorized great circle calculations are added in 75a7dc3

sephib · 2017-02-23T06:42:30Z

Thank you for the review and the great work !
I'll try and check my running environment in addition to your code improvement.
(Sorry didn't mean to close this issue without your permission).

simmonssong · 2018-05-09T07:36:43Z

Hi, when I use shortest_path, I got TypeError: unhashable type: 'list'.
The parameter G loaded from graphml that I saved recently
save:
ox.save_graphml(roads, 'graph.graphml', folder='/data/sqy/')
load:
graph = ox.load_graphml('graph.graphml', '/data/sqy/')

The version of osmnx is 0.8, networkx is 2.0

gboeing · 2018-05-09T13:53:40Z

@SqySimmon please provide a complete working example of code to reproduce this issue and we'll take a look. If this is a new issue, please open a new issue.

simmonssong · 2018-05-10T01:31:13Z

Sorry, I made a mistake in using:
nodes, edges = ox.graph_to_gdfs(graph) graph = ox.gdfs_to_graph(nodes, edges)
I screwed up the orders of nodes and edges
Thank you for you answer.

ovdavid28 · 2020-03-12T11:52:28Z

Is there any error in this code:

route_length_km = sum([G.edge[u][v][0]['length'] for u, v in zip(route, route[1:])]) / 1000.

Not working for me
Thank you

gboeing · 2020-03-13T17:43:43Z

@ovdavid28 if you have general usage questions, please ask on StackOverflow per the contributing guidelines.

gboeing referenced this issue Feb 22, 2017

vectorize all great circle calculations

75a7dc3

sephib closed this as completed Feb 23, 2017

gboeing added the enhancement label Feb 25, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

get_nearest_node and shortest_path performance issue #30

get_nearest_node and shortest_path performance issue #30

sephib commented Feb 2, 2017 •

edited by gboeing

gboeing commented Feb 3, 2017

sephib commented Feb 4, 2017

sephib commented Feb 5, 2017 •

edited

gboeing commented Feb 8, 2017

gboeing commented Feb 22, 2017

gboeing commented Feb 22, 2017

sephib commented Feb 23, 2017 •

edited

simmonssong commented May 9, 2018

gboeing commented May 9, 2018

simmonssong commented May 10, 2018

ovdavid28 commented Mar 12, 2020

gboeing commented Mar 13, 2020

get_nearest_node and shortest_path performance issue #30

get_nearest_node and shortest_path performance issue #30

Comments

sephib commented Feb 2, 2017 • edited by gboeing

gboeing commented Feb 3, 2017

sephib commented Feb 4, 2017

sephib commented Feb 5, 2017 • edited

gboeing commented Feb 8, 2017

gboeing commented Feb 22, 2017

Testing the performance

Can we make this faster with vectorization?

gboeing commented Feb 22, 2017

sephib commented Feb 23, 2017 • edited

simmonssong commented May 9, 2018

gboeing commented May 9, 2018

simmonssong commented May 10, 2018

ovdavid28 commented Mar 12, 2020

gboeing commented Mar 13, 2020

sephib commented Feb 2, 2017 •

edited by gboeing

sephib commented Feb 5, 2017 •

edited

sephib commented Feb 23, 2017 •

edited