### Introduction

The source graph for this notebook was prepared using the map taken from: https://github.com/pszufe/OpenStreetMapX.jl/blob/master/test/data/reno_east3.osm.

In order to follow the notebook you need to make sure you have the `folium` package installed. You can add it to your Python environment e.g. using the following command `conda install -c conda-forge folium` (or similar, depending on the Python configuration you use).

In [None]:
## path to datasets
datadir='../Datasets/'

In [None]:
import folium as flm 
import igraph as ig
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In this notebook we analyze real data: the graph represents the road network of Reno, NV, USA.

We are interested in finding which intersections can be expected to be the busiest. To perform this analysis we assume that citizens want to travel between any two randomly picked intersections via a shortest path linking them.

In the notebook, in order to highlight how sensitive the results of the analysis are to the level of detail reflected by the graph, we present three scenarios:
* assuming that each edge in the road graph has the same length and travel time;
* assigning real road length but assuming that the driving speed on each edge is the same;
* assuming real road lengths and differentiating driving speeds between roads (e.g. you can drive faster using a highway than using a gravel road).
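
The three scenarios differ only in the weight attached to each edge. A minimal sketch on invented data may help to fix ideas; the toy edge tuples and the helper name `edge_weights` are assumptions made for illustration and are not part of the dataset or of any library:

```python
# Toy edges as (from, to, length, speed); the numbers are invented for illustration.
toy_edges = [
    (0, 1, 300.0, 40.0),
    (1, 2, 1200.0, 120.0),
    (0, 2, 900.0, 40.0),
]

def edge_weights(edges, scenario):
    """Return one weight per edge for the chosen scenario (hypothetical helper)."""
    if scenario == "unit":
        # scenario 1: every edge costs the same
        return [1.0 for _ in edges]
    if scenario == "length":
        # scenario 2: cost is the road length
        return [length for _, _, length, _ in edges]
    if scenario == "time":
        # scenario 3: cost is the travel time, i.e. length divided by speed
        return [length / speed for _, _, length, speed in edges]
    raise ValueError(f"unknown scenario: {scenario}")

unit_w = edge_weights(toy_edges, "unit")
length_w = edge_weights(toy_edges, "length")
time_w = edge_weights(toy_edges, "time")
```

Note that the long edge `(1, 2)` is the most expensive one by length but not by travel time, which is the kind of difference the three scenarios are meant to expose.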

Of course, our analysis still lacks many real-life details that are potentially important in practice. Here are some major things we ignore in the analysis:
* non-uniform distribution of source and destination locations of travel;
* number of lanes on each road;
* relationship between traffic on a road and effective average driving speed;
* road usage restrictions for certain classes of vehicles;
* effect of traffic lights;
* restrictions on turning at intersections.

We left these details out of the analysis to keep the example simple. However, we encourage readers to experiment and extend the presented model with some of these details to check how they would influence the results.

### Ingesting the data

We first read in the source data. It is stored in two files:
* `nodeloc.csv` that for each node of the graph (intersection) contains information on its geographic location;
* `weights.csv` that for each edge of the graph (road) contains information on its length (weight), and speed a car can drive on a given road.

In [None]:
## build directed weighted graph
g_edges = pd.read_csv(datadir+'Reno/weights.csv')
nv = 1 + max(max(g_edges["from"]), max(g_edges["to"]))

g = ig.Graph(directed=True)
g.add_vertices(nv)

for i in range(len(g_edges)):
    g.add_edge(g_edges["from"][i], g_edges["to"][i])

g.es['weight'] = g_edges['w']
g.es['speed'] = g_edges['speed']

In [None]:
## read lat/lon position of nodes (intersections)
meta = pd.read_csv(datadir+'Reno/nodeloc.csv')

g.vs['longitude'] = list(meta['lon'])
g.vs['latitude'] = list(meta['lat'])
g.vs['layout'] = [(v['longitude'],v['latitude']) for v in g.vs]

g.vs['color'] = "black"

Check that the graph is connected:

In [None]:
g.is_connected()
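
Because the graph is directed, the relevant notion here is strong connectivity: every intersection must be reachable from every other one while respecting edge directions. To our understanding, `is_connected()` in igraph checks strong connectivity by default. The idea can be sketched in plain Python on a toy edge list; the helpers `reachable` and `is_strongly_connected` are hypothetical names used only for this illustration:

```python
from collections import deque

def reachable(adj, start):
    """Return the set of nodes reachable from start by breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def is_strongly_connected(n, edges):
    """Check that node 0 reaches every node and is reached from every node."""
    if n == 0:
        return True
    fwd = {u: [] for u in range(n)}   # edges as given
    rev = {u: [] for u in range(n)}   # edges with direction reversed
    for u, v in edges:
        fwd[u].append(v)
        rev[v].append(u)
    return len(reachable(fwd, 0)) == n and len(reachable(rev, 0)) == n

cycle = [(0, 1), (1, 2), (2, 0)]   # a one-way loop: strongly connected
dead_end = [(0, 1), (1, 2)]        # node 2 cannot get back to node 0
```

Here `is_strongly_connected(3, cycle)` holds while `is_strongly_connected(3, dead_end)` does not. This matters for the analysis because a shortest path exists between every ordered pair of intersections only in a strongly connected graph.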

Check the number of nodes and edges:

In [None]:
print(g.vcount(),'nodes and',g.ecount(),'edges')

Verify the degree distribution of nodes:

In [None]:
pd.Series(g.indegree()).value_counts(normalize=True, sort=False)

In [None]:
pd.Series(g.outdegree()).value_counts(normalize=True, sort=False)

Note that, interestingly, nodes with in- and out-degree 1, 2, and 3 have similar frequencies, while in- and out-degree 4 is less frequent.

Finally, let us visualize our graph.

First, we do it using standard igraph plotting:

In [None]:
ly = ig.Layout(g.vs['layout'])
ly.mirror(1)
ig.plot(g, layout=ly, vertex_size=3, edge_arrow_size=0.01, edge_arrow_width=0.01, edge_curved=0)

Let us also learn how we can nicely overlay a graph on top of a map using the `folium` package:

In [None]:
MAP_BOUNDS = ((39.5001-0.001, -119.802-0.001), (39.5435+0.001, -119.7065+0.001))
m_plot = flm.Map()

for v in g.vs:
    flm.Circle(
        (v['latitude'], v['longitude']),
        radius=1, weight=1,
        color=v['color'], fill=True, fill_color=v['color']).add_to(m_plot)

for e in g.es:
    v1 = g.vs[e.source]
    v2 = g.vs[e.target]
    flm.PolyLine(
        [(v1['latitude'], v1['longitude']), (v2['latitude'], v2['longitude'])],
        color="black", weight=1).add_to(m_plot)

flm.Rectangle(MAP_BOUNDS, color="blue",weight=4).add_to(m_plot)
m_plot.fit_bounds(MAP_BOUNDS)
m_plot

Observe that the plot produced by `folium` is interactive: you can zoom it and move it around.

This plot confirms that the nodes are indeed correctly aligned with intersections on a map of Reno, NV, USA. For instance, we see that there are no roads crossing the airport.

Let us now show how to plot nodes of different in- and out- degrees using different colors:

In [None]:
# in-degree:
# yellow - 1
# blue - 2
# red - 3
# green - 4
ig.plot(g, layout=ly, vertex_color=list(np.array(['yellow', 'blue', 'red', 'green'])[np.array(g.indegree())-1]),
        vertex_size=5, edge_arrow_size=0.01, edge_arrow_width=0.01, edge_curved=0)

In [None]:
# out-degree:
# yellow - 1
# blue - 2
# red - 3
# green - 4
ig.plot(g, layout=ly, vertex_color=list(np.array(['yellow', 'blue', 'red', 'green'])[np.array(g.outdegree())-1]),
        vertex_size=5, edge_arrow_size=0.01, edge_arrow_width=0.01, edge_curved=0)

Notice in particular that most nodes lying on the highway have in- and out-degree 1. This is because the highway has only a few entry/exit points, but its representation in OpenStreetMap consists of many road segments.

On the plots we also see large differences in road lengths in our graph. Let us investigate this.

In [None]:
plt.hist(g_edges["w"], 50);

Indeed, most of the roads are short, but some of them are very long.

Similarly, we see that there are different road classes in our graph: there are highways, but there are also many local roads.

In [None]:
pd.Series(g_edges["speed"]).value_counts(normalize=True)

Indeed, we see that the vast majority of the roads allow only the lowest speed.

Let us check if roads allowing speed `120` coincide with the highways on the map. This is easy to do visually using the following code:

In [None]:
MAP_BOUNDS = ((39.5001-0.001, -119.802-0.001), (39.5435+0.001, -119.7065+0.001))
m_plot = flm.Map()

for v in g.vs:
    flm.Circle(
        (v['latitude'], v['longitude']),
        radius=1, weight=1,
        color=v['color'], fill=True, fill_color=v['color']).add_to(m_plot)

for e in g.es:
    v1 = g.vs[e.source]
    v2 = g.vs[e.target]
    if e["speed"] == 120:
        w = 3
        c = 'green'
    else:
        w = 1
        c = 'black'
    flm.PolyLine(
        [(v1['latitude'], v1['longitude']), (v2['latitude'], v2['longitude'])],
        color=c, weight=w).add_to(m_plot)

flm.Rectangle(MAP_BOUNDS, color="blue",weight=4).add_to(m_plot)
m_plot.fit_bounds(MAP_BOUNDS)
m_plot

Indeed, we see that the thick green edges cover a highway.

Having checked our input data, we can turn to the analysis and try to answer which intersections can be expected to be the busiest on the map.

## Basic analysis - each edge has weight 1

We use betweenness centrality to identify how busy a given intersection is, as it measures the number of shortest paths in the graph that pass through a given node.
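
To make the definition concrete, here is a minimal pure-Python sketch of betweenness for an unweighted directed graph, following the well-known Brandes accumulation scheme. The function name `toy_betweenness` and the toy graphs are assumptions for illustration only; the notebook itself relies on igraph's `betweenness()`. When several shortest paths link a pair of nodes, each intermediate node is credited with the fraction of those paths that pass through it.

```python
from collections import deque

def toy_betweenness(n, edges):
    """Betweenness of each node in an unweighted directed graph (Brandes scheme)."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
    bc = [0.0] * n
    for s in range(n):
        # breadth-first search from s, counting shortest paths (sigma)
        dist = [-1] * n
        sigma = [0] * n
        preds = [[] for _ in range(n)]
        dist[s], sigma[s] = 0, 1
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if dist[v] < 0:                # v discovered for the first time
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:     # u lies on a shortest path to v
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        # walk back from the farthest nodes, accumulating path fractions
        delta = [0.0] * n
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc

two_way_street = [(0, 1), (1, 0), (1, 2), (2, 1)]   # 0 <-> 1 <-> 2
diamond = [(0, 1), (0, 2), (1, 3), (2, 3)]          # two equally short routes 0 -> 3

bc_street = toy_betweenness(3, two_way_street)
bc_diamond = toy_betweenness(4, diamond)
```

In the two-way street the middle node lies on the shortest paths for both ordered pairs (0, 2) and (2, 0), so it scores 2. In the diamond the two middle nodes share the single pair (0, 3) and score 0.5 each.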

In the plots we distinguish 3 types of nodes with respect to their betweenness centrality:
* very heavy ones (big circle): betweenness at or above the 99th percentile;
* heavy ones (small circle): betweenness between the 90th and the 99th percentile;
* others (very small circle).

In [None]:
## compute betweenness and plot distribution
bet = g.betweenness()
plt.hist(bet, 50);

In [None]:
## size w.r.t. 3 types of nodes
very_heavy_usage = np.quantile(bet, 0.99)
heavy_usage = np.quantile(bet, 0.9)

g.vs['size'] = [1 if b < heavy_usage else 7 if b < very_heavy_usage else 14 for b in bet]
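
The same percentile-based sizing is repeated for every scenario, so it could be factored into a helper. The sketch below is written in plain Python so that it runs without any dependencies. The names `quantile` and `size_classes` are hypothetical, and the interpolation rule is our understanding of the default behavior of `np.quantile` (linear interpolation between the two nearest order statistics).

```python
import math

def quantile(values, q):
    """Linearly interpolated quantile of a non-empty sequence, 0 <= q <= 1."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)        # fractional index into the sorted data
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def size_classes(scores, heavy_q=0.9, very_heavy_q=0.99, sizes=(1, 7, 14)):
    """Map each score to a marker size: others, heavy, very heavy."""
    heavy = quantile(scores, heavy_q)
    very_heavy = quantile(scores, very_heavy_q)
    return [
        sizes[0] if b < heavy else sizes[1] if b < very_heavy else sizes[2]
        for b in scores
    ]

toy_scores = list(range(101))          # invented scores 0, 1, ..., 100
toy_sizes = size_classes(toy_scores)
```

With these toy scores the thresholds fall at 90 and 99, so 90 nodes get the smallest marker, 9 get the medium one, and 2 get the largest.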

In [None]:
## plot highlighting intersections with high betweenness
ly = ig.Layout(g.vs['layout'])
ly.mirror(1)
ig.plot(g, layout=ly, vertex_size=g.vs['size'], vertex_color=g.vs['color'], edge_arrow_size=0.01, edge_arrow_width=0.01, edge_curved=0)

In [None]:
MAP_BOUNDS = ((39.5001-0.001, -119.802-0.001), (39.5435+0.001, -119.7065+0.001))
m_plot = flm.Map()

for v in g.vs:
    flm.Circle(
        (v['latitude'], v['longitude']),
        radius=1, color=v['color'], weight= v['size'],
        fill=True, fill_color=v['color']).add_to(m_plot)

for e in g.es:
    v1 = g.vs[e.source]
    v2 = g.vs[e.target]
    flm.PolyLine(
        [(v1['latitude'], v1['longitude']), (v2['latitude'], v2['longitude'])],
        color="black", weight=1).add_to(m_plot)

flm.Rectangle(MAP_BOUNDS, color="blue",weight=4).add_to(m_plot)
m_plot.fit_bounds(MAP_BOUNDS)
m_plot

In this simple analysis the busiest nodes lie around the center of the map. We also have a set of relatively busy intersections in the densest regions of the map.

However, this analysis seems too simple: when computing betweenness we ignore how far apart the nodes are. Let us include the road lengths in our model.

## Using betweenness with road length


In [None]:
## compute betweenness and plot distribution
bet = g.betweenness(weights=g.es['weight'])
plt.hist(bet, 50);

In [None]:
## size w.r.t. 3 types of nodes
very_heavy_usage = np.quantile(bet, 0.99)
heavy_usage = np.quantile(bet, 0.9)

g.vs['size'] = [1 if b < heavy_usage else 7 if b < very_heavy_usage else 14 for b in bet]

In [None]:
## plot highlighting intersections with high betweenness
ly = ig.Layout(g.vs['layout'])
ly.mirror(1)
ig.plot(g, layout=ly, vertex_size=g.vs['size'], vertex_color=g.vs['color'], edge_arrow_size=0.01, edge_arrow_width=0.01, edge_curved=0)

In [None]:
MAP_BOUNDS = ((39.5001-0.001, -119.802-0.001), (39.5435+0.001, -119.7065+0.001))
m_plot = flm.Map()

for v in g.vs:
    flm.Circle(
        (v['latitude'], v['longitude']),
        radius=10, color=v['color'], weight= v['size'],
        fill=True, fill_color=v['color']).add_to(m_plot)

for e in g.es:
    v1 = g.vs[e.source]
    v2 = g.vs[e.target]
    flm.PolyLine(
        [(v1['latitude'], v1['longitude']), (v2['latitude'], v2['longitude'])],
        color="black", weight=1).add_to(m_plot)

flm.Rectangle(MAP_BOUNDS, color="blue",weight=4).add_to(m_plot)
m_plot.fit_bounds(MAP_BOUNDS)
m_plot

This time we see that the busiest intersections lie on the main roads. However, surprisingly, the highways do not seem to be used much. This is clearly related to the fact that we ignore the speed at which cars can drive on different roads. Let us then add this dimension to our analysis.
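
A small example shows why a highway can be ignored when only length counts. The sketch below runs Dijkstra's algorithm on an invented three-node network in which a short local road competes with a longer but faster highway detour. The function `shortest_path` and all numbers are assumptions for illustration, not taken from the Reno data.

```python
import heapq

def shortest_path(n, edges, weights, source, target):
    """Dijkstra's algorithm; returns the nodes on one shortest path, or None."""
    adj = [[] for _ in range(n)]
    for (u, v), w in zip(edges, weights):
        adj[u].append((v, w))
    dist = [float("inf")] * n
    prev = [None] * n
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                      # stale queue entry
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (d + w, v))
    if dist[target] == float("inf"):
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]

toy_edges = [(0, 2), (0, 1), (1, 2)]      # local road 0->2, highway 0->1->2
toy_length = [900.0, 600.0, 600.0]        # the highway detour is longer...
toy_speed = [40.0, 120.0, 120.0]          # ...but much faster
toy_time = [l / s for l, s in zip(toy_length, toy_speed)]

path_by_length = shortest_path(3, toy_edges, toy_length, 0, 2)
path_by_time = shortest_path(3, toy_edges, toy_time, 0, 2)
```

With length as the weight the shortest path takes the local road directly, while with travel time it switches to the highway through node 1. Only in the second case does node 1 gain betweenness.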

## Using betweenness with travel time

In [None]:
## compute betweenness and plot distribution
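## travel time on each edge = road length / allowed speed
travel_time = [w / s for w, s in zip(g.es['weight'], g.es['speed'])]
bet = g.betweenness(weights=travel_time)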
plt.hist(bet, 50);

In [None]:
## size w.r.t. 3 types of nodes
very_heavy_usage = np.quantile(bet, 0.99)
heavy_usage = np.quantile(bet, 0.9)

g.vs['size'] = [1 if b < heavy_usage else 7 if b < very_heavy_usage else 14 for b in bet]

In [None]:
## plot highlighting intersections with high betweenness
ly = ig.Layout(g.vs['layout'])
ly.mirror(1)
ig.plot(g, layout=ly, vertex_size=g.vs['size'], vertex_color=g.vs['color'], edge_arrow_size=0.01, edge_arrow_width=0.01, edge_curved=0)

In [None]:
MAP_BOUNDS = ((39.5001-0.001, -119.802-0.001), (39.5435+0.001, -119.7065+0.001))
m_plot = flm.Map()

for v in g.vs:
    flm.Circle(
        (v['latitude'], v['longitude']),
        radius=10, color=v['color'], weight= v['size'],
        fill=True, fill_color=v['color']).add_to(m_plot)

for e in g.es:
    v1 = g.vs[e.source]
    v2 = g.vs[e.target]
    flm.PolyLine(
        [(v1['latitude'], v1['longitude']), (v2['latitude'], v2['longitude'])],
        color="black", weight=1).add_to(m_plot)

flm.Rectangle(MAP_BOUNDS, color="blue",weight=4).add_to(m_plot)
m_plot.fit_bounds(MAP_BOUNDS)
m_plot

We finally get what we would expect in practice: the busiest intersections lie along the highway, as it is the fastest way to travel.

In this experiment we observed that relatively small changes to the problem setting can lead to significantly different conclusions. Fortunately, in this case, the most realistic assumptions lead to the most realistic outcome!