Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

get_nearest_node and shortest_path performance issue #30

Closed
sephib opened this issue Feb 2, 2017 · 12 comments
Closed

get_nearest_node and shortest_path performance issue #30

sephib opened this issue Feb 2, 2017 · 12 comments

Comments

@sephib
Copy link
Contributor

sephib commented Feb 2, 2017

I'm trying to calculate routes from thousands of points.
Running the get_nearest_node and shortest_path on the following network takes a long time.

{
	'circuity_avg' : 1.084601551622338,
	'count_intersections' : 58456,
	'edge_density_km' : None,
	'edge_length_avg' : 119.35486740722354,
	'edge_length_total' : 19018004.572667,
	'intersection_density_km' : None,
	'k_avg' : 4.536693003060716,
	'm' : 159340,
	'n' : 70245,
	'node_density_km' : None,
	'self_loop_proportion' : 0.004022844232458893,
	'street_density_km' : None,
	'street_length_avg' : 123.93589363482089,
	'street_length_total' : 12220326.984180609,
	'street_segments_count' : 98602,
	'streets_per_node_avg' : 2.80183642963912,
	'streets_per_node_counts' : {
		0 : 0,
		1 : 11789,
		2 : 499,
		3 : 48094,
		4 : 9584,
		5 : 265,
		6 : 13,
		7 : 1
	},
	'streets_per_node_proportion' : {
		0 : 0.0,
		1 : 0.16782689159370773,
		2 : 0.007103708448999929,
		3 : 0.6846608299523098,
		4 : 0.13643675706455977,
		5 : 0.003772510498967898,
		6 : 0.00018506655277955727,
		7 : 1.4235888675350559e-05
	}
}

In order to increase the performance of the function I think the function should select a subset of nodes (based on some parameter), there after run the great_circle() method.
I'm thinking of using nx.ego_graph - do you think this is a good direction?

In the meantime since I have some points that repeat themselves I've used some caching techniques. Do you think it is relevant to add to the core.py?

from functools import lru_cache

@lru_cache(maxsize=None)
def get_nearst_node_(graph, coordinate):
    return ox.get_nearest_node(graph, coordinate)

@lru_cache(maxsize=None)
def shortest_path_(graph, origin_node, destination_node):
    return nx.shortest_path(graph, origin_node, destination_node)
@gboeing
Copy link
Owner

gboeing commented Feb 3, 2017

Using an LRU cache is an interesting idea. It that preferable to using an LFU cache in your use case?

Yes I think using nx.ego_graph could be a smart way to speed up route calculation when an approximate max radius is known. You could pass it a radius and optionally a distance (that is, the weight parameter length) to induce a small subgraph to then calculate the route on.

@sephib
Copy link
Contributor Author

sephib commented Feb 4, 2017

  1. Regarding LRU - used it since it is easier to implement (since I"m not limiting the maxsize then I'm not sure how much of a difference it is). Will check later on regarding LFU .

  2. Will update regarding nx.ego_graph

@sephib
Copy link
Contributor Author

sephib commented Feb 5, 2017

Hi
I'm still having performance issues with the get_nearest_node method.
running the code :

origin_node = get_nearst_node_(graph, origin) 

takes between 3-30 seconds per point

Trying to run the function on the entire dataset did not increase the performance (Here i'm running on a subset of the dataset).

dfgp.loc[dfgp['group_id'] == id, 'origin_node'] = dfgp.loc[dfgp['group_id'] == id].apply(lambda x: ox.get_nearest_node(graph, point=(x['lat'], x['long'])), axis=1)

not sure if the ego_graph is relevant since I need a node to run the function.
Is there a way to modify get_nearest_node function to take a maximum distance? I did not see any such thing with the 'great_circle' function.

Any ideas?

@gboeing
Copy link
Owner

gboeing commented Feb 8, 2017

It could be modified to, say, take a point and a radius, then create a circle around the point, then do a spatial intersection with geopandas to retain only those nodes that lie within the circle, and then create a subgraph of only those nodes and their incident edges.

But, you also said:

not sure if the ego_graph is relevant since I need a node to run the function.

You can get the node nearest an arbitrary point with the get_nearest_node function. Then you can use ego_graph to induce a subgraph of nodes within some network distance from this node.

@gboeing
Copy link
Owner

gboeing commented Feb 22, 2017

I've had some time to look further into this. In your example, I see you have 70245 nodes in your graph. I created a similarly sized graph for the city of Los Angeles (n=63922) but I am not seeing the performance issues you described. However, I did find a way to improve the performance time of get_nearest_node by 5x to 10x.

Testing the performance

import osmnx as ox, networkx as nx
ox.config(log_console=True, use_cache=True)
G = ox.graph_from_place('Los Angeles, California, USA', network_type='drive_service')
print(len(G))

Next I define origin and destination lat-long points on opposite sides of LA:

orig_point = (34.234, -118.535)
dest_point = (33.939, -118.242)

And then get the nearest node to each:

%%time
orig_node = ox.get_nearest_node(G, orig_point)

Wall time: 805 ms

%%time
dest_node = ox.get_nearest_node(G, dest_point)

Wall time: 814 ms

So, get_nearest_node is finding the node nearest to an arbitrary point (in a large graph) in less than one second. Inducing a subgraph is also fast:

%%time
eg = nx.ego_graph(G, orig_node, radius=2000, distance='length')

Wall time: 119 ms

This produces a subgraph with the 383 nodes within 2 km of orig_node.

But I'm more interested in the performance of the full graph. So, next I calculate the shortest path between these two nodes using the full graph:

%%time
route = nx.shortest_path(G, source=orig_node, target=dest_node, weight='length')

Wall time: 703 ms

So, it was able to calculate the path in less than a second.

route_length_km = sum([G.edge[u][v][0]['length'] for u, v in zip(route, route[1:])]) / 1000.

There are 341 nodes in the path and it has a total length of 51.54 km. Lastly, we can plot it to get a sense of the scale:

fig, ax = ox.plot_graph_route(G, route, fig_height=20, edge_linewidth=0.3, node_size=0)

Can we make this faster with vectorization?

In these tests, get_nearest_node performs reasonably fast. However, I also played around with ways to make it faster. The main performance constraint is how it calculates the great-circle distance from the arbitrary point to the coordinates of each node in the graph, then finds the minimum. We can vectorize this calculation. It seems that a sufficiently large graph of hundreds of thousands or millions of nodes would benefit from this vectorization approach.

I will test and commit these updates shortly.

@gboeing
Copy link
Owner

gboeing commented Feb 22, 2017

Fast vectorized great circle calculations are added in 75a7dc3

@sephib
Copy link
Contributor Author

sephib commented Feb 23, 2017

Thank you for the review and the great work !
I'll try and check my running environment in addition to your code improvement.
(Sorry didn't mean to close this issue without your permission).

@simmonssong
Copy link

Hi, when I use shortest_path, I got TypeError: unhashable type: 'list'.
The parameter G loaded from graphml that I saved recently
save:
ox.save_graphml(roads, 'graph.graphml', folder='/data/sqy/')
load:
graph = ox.load_graphml('graph.graphml', '/data/sqy/')

The version of osmnx is 0.8, networkx is 2.0

@gboeing
Copy link
Owner

gboeing commented May 9, 2018

@SqySimmon please provide a complete working example of code to reproduce this issue and we'll take a look. If this is a new issue, please open a new issue.

@simmonssong
Copy link

Sorry, I made a mistake in using:
nodes, edges = ox.graph_to_gdfs(graph) graph = ox.gdfs_to_graph(nodes, edges)
I screwed up the orders of nodes and edges
Thank you for you answer.

@ovdavid28
Copy link

Is there any error in this code:

route_length_km = sum([G.edge[u][v][0]['length'] for u, v in zip(route, route[1:])]) / 1000.

Not working for me
Thank you

@gboeing
Copy link
Owner

gboeing commented Mar 13, 2020

@ovdavid28 if you have general usage questions, please ask on StackOverflow per the contributing guidelines.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants