# Urban Networks II

Overview of today's topics:
  - Network modeling and analysis in a study site
  - Simulating commutes
  - Network efficiency
  - Network perturbation
  - Comparative network analysis
  - Urban accessibility

In [None]:
import geopandas as gpd
import matplotlib.pyplot as plt
import networkx as nx
import numpy as np
import osmnx as ox
import pandana
import pandas as pd
from shapely.geometry import Point

# consistent randomization
np.random.seed(0)

# configure OSMnx
cache_folder = '../../data/cache2'
ox.config(log_console=True, use_cache=True, cache_folder=cache_folder)

## 1. Model a study site

First, we will identify a study site, model its street network, and calculate some simple indicators.

In [None]:
# create a study site: geocode city hall, convert coords to shapely geometry,
# project geometry to UTM, buffer by 5km, project back to lat-lng
latlng_coords = ox.geocode('Los Angeles City Hall')
latlng_point = Point(latlng_coords[1], latlng_coords[0])
latlng_point_proj, crs = ox.projection.project_geometry(latlng_point)
polygon_proj = latlng_point_proj.buffer(5000)
polygon, crs = ox.projection.project_geometry(polygon_proj, crs=crs, to_latlong=True)
polygon
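
Why project before buffering? A buffer distance is expressed in the geometry's own coordinate units, so buffering an unprojected point would be measured in degrees, and a degree of longitude spans fewer meters than a degree of latitude away from the equator. The sketch below shows the size of that mismatch. It assumes a spherical Earth and an approximate latitude for downtown Los Angeles; both are illustrative assumptions, and `meters_per_degree` is a hypothetical helper, not an OSMnx function.

```python
import math

EARTH_RADIUS_M = 6_371_009  # mean Earth radius in meters (spherical approximation)

def meters_per_degree(lat_deg):
    """Return approximate (north-south, east-west) meters spanned by one degree at a latitude."""
    m_per_deg_lat = math.pi * EARTH_RADIUS_M / 180
    m_per_deg_lng = m_per_deg_lat * math.cos(math.radians(lat_deg))
    return m_per_deg_lat, m_per_deg_lng

# approximate latitude of downtown Los Angeles (assumed for illustration)
ns, ew = meters_per_degree(34.05)
print(f'1 degree of latitude  ~ {ns / 1000:.1f} km')
print(f'1 degree of longitude ~ {ew / 1000:.1f} km')
```

Because the two scales differ, a "circle" buffered in degrees would be stretched on the ground, which is why the cell above buffers in projected meters and only then converts back to lat-lng.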

In [None]:
# model the street network within study site
# your parameterization makes assumptions about your interests here
G = ox.graph_from_polygon(polygon, network_type='drive', truncate_by_edge=True)
fig, ax = ox.plot_graph(G, node_size=0, edge_color='w', edge_linewidth=0.3)

In [None]:
# add speeds and travel times
G = ox.add_edge_speeds(G)
G = ox.add_edge_travel_times(G)

In [None]:
# study site area in km^2
polygon_proj.area / 1e6

In [None]:
# how many intersections does it contain?
street_counts = pd.Series(dict(G.nodes(data='street_count')))
intersect_count = len(street_counts[street_counts > 2])
intersect_count

In [None]:
# what's the intersection density?
intersect_count / (polygon_proj.area / 1e6)

In [None]:
# now clean up the intersections and re-calculate
clean_intersects = ox.consolidate_intersections(ox.project_graph(G),
                                                rebuild_graph=False,
                                                tolerance=10)
clean_intersect_count = len(clean_intersects)
clean_intersect_count

In [None]:
# what's the cleaned intersection density?
clean_intersect_count / (polygon_proj.area / 1e6)

## 2. Simulate commutes

We'll use a random sample of LEHD LODES data to get home/work coordinates. This is an imperfect proxy for "true" work locations from a payroll enumeration. You can read more about LODES and its limitations [here](https://doi.org/10.1080/21681376.2018.1455535). These data are processed in a separate [notebook](process-lodes.ipynb) to keep the data easy on your CPU and memory for this lecture. Our trip simulation will use naive assumptions about travel time (e.g., free flow, no congestion, rough imputation of speed limits) for simplicity, but these can be enriched with effort.
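
The free-flow assumption boils down to simple arithmetic: each edge's travel time is its length divided by its (imputed) speed, with no delay for congestion or signals. Here is a minimal sketch with made-up edge values. The lengths and speeds are hypothetical, and `free_flow_seconds` is an illustrative helper rather than an OSMnx function.

```python
def free_flow_seconds(length_m, speed_kph):
    """Seconds to traverse an edge at a constant speed, ignoring congestion and signals."""
    meters_per_second = speed_kph * 1000 / 3600
    return length_m / meters_per_second

# hypothetical route: (length in meters, speed in km per hour) for each edge
route_edges = [(250, 40), (1200, 60), (400, 30)]
total_seconds = sum(free_flow_seconds(length, speed) for length, speed in route_edges)
print(round(total_seconds, 1))
```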

In [None]:
od = pd.read_csv('../../data/od.csv').sample(1000)
od.shape

In [None]:
od

In [None]:
# get home/work network nodes
home_nodes = ox.get_nearest_nodes(G, X=od['home_lng'], Y=od['home_lat'], method='balltree')
work_nodes = ox.get_nearest_nodes(G, X=od['work_lng'], Y=od['work_lat'], method='balltree')

In [None]:
def calc_path(G, orig, dest, weight='travel_time'):
    try:
        return ox.shortest_path(G, orig, dest, weight)
    except nx.exception.NetworkXNoPath:
        # if path cannot be solved
        return None

In [None]:
%%time
paths = [calc_path(G, orig, dest) for orig, dest in zip(home_nodes, work_nodes)]
len(paths)

In [None]:
# filter out any nulls (i.e., routes that could not be solved)
paths = [path for path in paths if path is not None]
len(paths)

In [None]:
# plot 100 routes
fig, ax = ox.plot_graph_routes(G,
                               routes=paths[0:100],
                               node_size=0,
                               edge_linewidth=0.2,
                               orig_dest_size=0,
                               route_colors='c',
                               route_linewidth=2,
                               route_alpha=0.2)

In [None]:
# now it's your turn
# how do these routes change if we minimize distance traveled instead?
# what kinds of streets get more/fewer trips assigned to them?
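
As a hint for the prompt above, here is a toy sketch (not the notebook's street network) of how the chosen weight can change the route. The four-edge network and its values are hypothetical, and `shortest_path` is a small standard-library Dijkstra written for this illustration, not the OSMnx function of the same name.

```python
import heapq

# hypothetical toy network: (from, to): {'length': meters, 'travel_time': seconds}
edges = {
    ('home', 'a'): {'length': 500, 'travel_time': 60},  # slow local street
    ('a', 'work'): {'length': 500, 'travel_time': 60},  # slow local street
    ('home', 'b'): {'length': 800, 'travel_time': 40},  # fast arterial
    ('b', 'work'): {'length': 800, 'travel_time': 40},  # fast arterial
}

def shortest_path(edges, orig, dest, weight):
    """Dijkstra's algorithm over a dict of directed edges; returns a list of nodes or None."""
    adjacency = {}
    for (u, v), data in edges.items():
        adjacency.setdefault(u, []).append((v, data[weight]))
    queue = [(0, orig, [orig])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dest:
            return path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, w in adjacency.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (cost + w, neighbor, path + [neighbor]))
    return None

fastest = shortest_path(edges, 'home', 'work', weight='travel_time')
shortest = shortest_path(edges, 'home', 'work', weight='length')
print(fastest, shortest)
```

In the notebook itself, the analogous change would be passing `weight='length'` to `calc_path` and re-plotting the routes.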


## 3. Network efficiency

How "efficient" are our commuters' routes? That is, how does their distance traveled compare to straight-line distances from home to work?
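
To make the ratio concrete, here is a standalone sketch for a single hypothetical trip. It uses a hand-written haversine formula on a spherical Earth. The coordinates and the network distance are made up, and `great_circle_m` is an illustrative helper rather than the OSMnx calculator.

```python
import math

def great_circle_m(lat1, lng1, lat2, lng2, earth_radius=6_371_009):
    """Haversine great-circle distance in meters between two lat-lng points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lng2 - lng1)
    h = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * earth_radius * math.asin(math.sqrt(h))

# hypothetical trip: origin/destination coordinates and network distance traveled
straight_line = great_circle_m(34.05, -118.25, 34.06, -118.24)
network_distance = 1900  # meters along the street network (made up)

efficiency = straight_line / network_distance  # share of the trip that is "direct"
circuity = network_distance / straight_line    # how many times longer than a straight line
print(round(straight_line), round(efficiency, 2), round(circuity, 2))
```

Efficiency and circuity are reciprocals, so either one can be reported depending on whether you want "how direct" or "how much longer."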

In [None]:
def calc_efficiency(G, route, attr='length'):
    # sum the edge lengths in the route
    trip_distance = sum(ox.utils_graph.get_route_edge_attributes(G,
                                                                 route=route,
                                                                 attribute=attr))
    # fast vectorized great-circle distance calculator
    gc_distance = ox.distance.great_circle_vec(lat1=G.nodes[route[0]]['y'],
                                               lng1=G.nodes[route[0]]['x'],
                                               lat2=G.nodes[route[-1]]['y'],
                                               lng2=G.nodes[route[-1]]['x'])
    return gc_distance / trip_distance

# calculate each trip's efficiency and make a pandas series
trip_efficiency = pd.Series([calc_efficiency(G, path) for path in paths])

In [None]:
# the straight-line distance is what % of each network distance traveled?
trip_efficiency

In [None]:
trip_efficiency.describe()

In [None]:
# now it's your turn
# what if I were instead interested in how much longer trips are than straight-line would be?


## 4. Network perturbation

Oh no! There's been an earthquake!

The earthquake has knocked out 10% of the street network. Let's simulate that perturbation and see how routes have to change.
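
Before running this on the real network, the idea can be sketched on a toy grid using only the standard library. Everything here is a hypothetical stand-in: the 10 x 10 grid, the `grid_graph` and `hop_count` helpers, and the choice to spare the trip's endpoints (the notebook instead re-snaps trips to the nearest surviving nodes).

```python
import random
from collections import deque

def grid_graph(size):
    """Adjacency dict for a size x size grid of 4-connected nodes."""
    adjacency = {}
    for x in range(size):
        for y in range(size):
            neighbors = [(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))]
            adjacency[(x, y)] = [n for n in neighbors if 0 <= n[0] < size and 0 <= n[1] < size]
    return adjacency

def hop_count(adjacency, orig, dest, removed=frozenset()):
    """Breadth-first search; returns edges traversed, or None if dest is unreachable."""
    if orig in removed or dest in removed:
        return None
    queue = deque([(orig, 0)])
    seen = {orig}
    while queue:
        node, hops = queue.popleft()
        if node == dest:
            return hops
        for neighbor in adjacency[node]:
            if neighbor not in seen and neighbor not in removed:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return None

random.seed(0)  # consistent randomization
adjacency = grid_graph(10)
orig, dest = (0, 0), (9, 9)

# knock out 10% of the nodes at random, sparing the trip's endpoints
candidates = [n for n in adjacency if n not in (orig, dest)]
removed = frozenset(random.sample(candidates, k=int(len(adjacency) * 0.10)))

before = hop_count(adjacency, orig, dest)
after = hop_count(adjacency, orig, dest, removed)
print(before, after)
```

On a grid, many alternative routes have the same length, so a random knock-out often leaves the trip no longer at all; a `None` result means the destination was cut off entirely.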

In [None]:
# randomly knock-out 10% of the network's nodes
frac = 0.10
n = int(len(G.nodes) * frac)
nodes_to_remove = pd.Series(G.nodes).sample(n).index
G_per = G.copy()
G_per.remove_nodes_from(nodes_to_remove)

In [None]:
# get home/work network nodes again, calculate routes, drop nulls
home_nodes_per = ox.get_nearest_nodes(G_per, X=od['home_lng'], Y=od['home_lat'], method='balltree')
work_nodes_per = ox.get_nearest_nodes(G_per, X=od['work_lng'], Y=od['work_lat'], method='balltree')
paths_per = [calc_path(G_per, orig, dest) for orig, dest in zip(home_nodes_per, work_nodes_per)]
paths_per = [path for path in paths_per if path is not None]
len(paths_per)

In [None]:
# calculate each trip's efficiency and make a pandas series
trip_efficiency_per = pd.Series([calc_efficiency(G_per, path) for path in paths_per])
trip_efficiency_per.describe()

How many routes are now disconnected? How did trip efficiency change?

In [None]:
# what % of formerly solvable routes are now unsolvable?
1 - (len(paths_per) / len(paths))

In [None]:
# knocking out x% of the network made (solvable) trips what % less efficient?
1 - (trip_efficiency_per.mean() / trip_efficiency.mean())

In [None]:
# plot n routes apiece, before (cyan) and after (yellow) perturbation
n = 100
all_paths = paths[:n] + paths_per[:n]
colors = ['c'] * n + ['y'] * n

# shuffle the order, so you don't just plot new atop old
paths_colors = pd.DataFrame({'path': all_paths, 'color': colors}).sample(frac=1)

fig, ax = ox.plot_graph_routes(G,
                               routes=paths_colors['path'],
                               node_size=0,
                               edge_linewidth=0.2,
                               orig_dest_size=0,
                               route_colors=paths_colors['color'],
                               route_linewidth=2,
                               route_alpha=0.3)

Central LA performs relatively well because its relatively dense, gridlike network offers many redundant route options.

  1. What if you conduct this analysis in a disconnected, dendritic suburb on the urban fringe?
  2. What if you model a walkable network rather than a drivable one?
  3. What if the network perturbation isn't a spatially random process?

Take these questions as prompts for self-paced exercise. For example, let's say the LA river has flooded. Use OSMnx to attach elevations to all the nodes in our street network, then knock out the 10% at the lowest elevation (i.e., around the river). How does that change network characteristics like connectivity and efficiency? Or, model a coastal town like Miami Beach, then knock out the network nodes below some sea-level rise threshold. What happens? What neighborhoods are most affected? What communities live in those vulnerable places?
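
One way to start on the flooding prompt, once each node has an elevation, is to replace the random sample with a sort. The sketch below assumes a plain dict of made-up node IDs and elevations; how you attach real elevations is left to you.

```python
# hypothetical node elevations in meters, keyed by node ID
elevations = {101: 87.0, 102: 75.5, 103: 92.3, 104: 71.2, 105: 110.8,
              106: 79.9, 107: 73.4, 108: 95.0, 109: 84.1, 110: 70.6}

frac = 0.10
n = int(len(elevations) * frac)

# sort node IDs from lowest to highest elevation, then keep the lowest n
nodes_by_elevation = sorted(elevations, key=elevations.get)
nodes_to_remove = nodes_by_elevation[:n]
print(nodes_to_remove)
```

The resulting list could then be passed to `remove_nodes_from` on a copy of the graph, just as in the random knock-out above.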

In [None]:
# now it's your turn
# use the prompts above to conduct a self-directed analysis of network perturbation
# either using elevation/flooding or any of the 3 prompts above


## 5. Compare places to each other

Here we'll model and analyze a set of sub-sites within a study area to compare their characteristics.

In [None]:
# study area within 1/2 mile of SF Civic Center
latlng_coords = ox.geocode('Civic Center, San Francisco, CA, USA')
latlng_point = Point(latlng_coords[1], latlng_coords[0])
latlng_point_proj, crs = ox.projection.project_geometry(latlng_point)
polygon_proj = latlng_point_proj.buffer(800)
sf_polygon, crs = ox.projection.project_geometry(polygon_proj, crs=crs, to_latlong=True)

In [None]:
# get the tracts that intersect the study area polygon
tracts = gpd.read_file('../../data/tl_2020_06_tract/').set_index('GEOID')
mask = tracts.intersects(sf_polygon)
cols = ['ALAND', 'geometry']
sf_tracts = tracts.loc[mask, cols]
sf_tracts.head()

Let's use a custom filter to model "surface streets." You get to pick what to include and exclude, using the [Overpass Query Language](https://wiki.openstreetmap.org/wiki/Overpass_API/Overpass_QL).
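
One detail to keep in mind when writing such filters: Overpass regular-expression matches are unanchored, so a pattern matches any tag value that contains it unless you add `^` and `$`. The sketch below illustrates this with Python's `re` module as a stand-in for the Overpass regex engine (an assumption that holds for simple alternations like this one). The list of tag values is a small hand-picked sample.

```python
import re

# the highway pattern from the custom filter, unanchored and anchored
pattern = 'residential|living_street|tertiary|secondary|primary'
anchored = f'^({pattern})$'

# a few OSM highway tag values to test against
highway_values = ['residential', 'primary', 'primary_link', 'motorway', 'service']

unanchored_matches = [v for v in highway_values if re.search(pattern, v)]
anchored_matches = [v for v in highway_values if re.search(anchored, v)]
print(unanchored_matches)
print(anchored_matches)
```

Whether you want `primary_link` and similar ramp segments in a "surface streets" model is a judgment call; anchoring the pattern is one way to exclude them.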

In [None]:
# build a custom filter
cf1 = '["highway"~"residential|living_street|tertiary|secondary|primary"]'
cf2 = '["service"!~"alley|driveway|emergency_access|parking|parking_aisle|private"]'
cf3 = '["area"!~"yes"]'
custom_filter = cf1 + cf2 + cf3
custom_filter

In [None]:
# model the street network across all the study sub-sites
G_all = ox.graph_from_polygon(sf_tracts.unary_union, custom_filter=custom_filter)
len(G_all.nodes)

In [None]:
%%time
# calculate clean intersection counts per tract
intersect_counts = {}
for label, geom in zip(sf_tracts.index, sf_tracts['geometry']):
    G_tmp = ox.graph_from_polygon(geom, custom_filter=custom_filter)
    clean_intersects = ox.consolidate_intersections(ox.project_graph(G_tmp),
                                                    rebuild_graph=False)
    intersect_counts[label] = len(clean_intersects)

In [None]:
# calculate intersection density per km^2
sf_tracts['intersect_count'] = pd.Series(intersect_counts)
sf_tracts['intersect_density'] = sf_tracts['intersect_count'] / (sf_tracts['ALAND'] / 1e6)
sf_tracts['intersect_density'].describe()

In [None]:
# plot the tracts and the network
plt.style.use('dark_background')
fig, ax = plt.subplots(figsize=(6, 6))
ax.axis('off')
ax.set_title('Intersection density (per km2)')
ax = sf_tracts.plot(ax=ax, column='intersect_density', cmap='Reds_r',
                    legend=True, legend_kwds={'shrink': 0.8})
fig, ax = ox.plot_graph(G_all, ax=ax, node_size=0, edge_color='#111111')
fig.savefig('map.png', dpi=300, facecolor='#111111', bbox_inches='tight')

Our simplified, naive assumptions in this analysis have some shortcomings that result in analytical problems. How would you improve it?
  1. Periphery effects?
  2. Incorrect study site sizes?
  3. What are we counting and not counting here?

In [None]:
# now it's your turn
# how would you improve this analysis to make it more meaningful and interpretable?


## 6. Urban accessibility

If you're interested in isochrone mapping, see the [OSMnx examples](https://github.com/gboeing/osmnx-examples) for a demonstration.

Here, we'll analyze food deserts in central LA using OSMnx and [Pandana](https://udst.github.io/pandana/). Pandana uses contraction hierarchies for imprecise but very fast shortest path calculation.

In [None]:
# specify some parameters for the analysis
walk_time = 20  # max walking horizon in minutes
walk_speed = 4.5  # km per hour

In [None]:
# model the walkable network within our original study site
G_walk = ox.graph_from_polygon(polygon, network_type='walk')
fig, ax = ox.plot_graph(G_walk, node_size=0, edge_color='w', edge_linewidth=0.3)

In [None]:
# set a uniform walking speed on every edge
for u, v, data in G_walk.edges(data=True):
    data['speed_kph'] = walk_speed
G_walk = ox.add_edge_travel_times(G_walk)

In [None]:
# extract node/edge GeoDataFrames, retaining only necessary columns (for pandana)
nodes = ox.graph_to_gdfs(G_walk, edges=False)[['x', 'y']]
edges = ox.graph_to_gdfs(G_walk, nodes=False).reset_index()[['u', 'v', 'travel_time']]

In [None]:
# get all the "fresh food" stores on OSM within the study site
# you could load any amenities DataFrame, but we'll get ours from OSM
tags = {'shop': ['grocery', 'greengrocer', 'supermarket']}
amenities = ox.geometries_from_bbox(north=nodes['y'].max(),
                                    south=nodes['y'].min(),
                                    east=nodes['x'].max(),
                                    west=nodes['x'].min(),
                                    tags=tags)
amenities.shape

In [None]:
# construct the pandana network model
network = pandana.Network(node_x=nodes['x'],
                          node_y=nodes['y'], 
                          edge_from=edges['u'],
                          edge_to=edges['v'],
                          edge_weights=edges[['travel_time']])

In [None]:
# extract (approximate, unprojected) centroids from the amenities' geometries
centroids = amenities.centroid

In [None]:
# specify a max travel time (in seconds) for this analysis
# then set the amenities' locations on the network
maxdist = walk_time * 60  # minutes -> seconds, to match travel_time units
network.set_pois(category='grocery',
                 maxdist=maxdist,
                 maxitems=3,
                 x_col=centroids.x, 
                 y_col=centroids.y)

In [None]:
# calculate travel time to nearest amenity from each node in network
distances = network.nearest_pois(distance=maxdist,
                                 category='grocery',
                                 num_pois=3)
distances.astype(int).head()

In [None]:
# plot walking time (in seconds) to nearest amenity
fig, ax = ox.plot_graph(G_walk, node_size=0, edge_linewidth=0.1,
                        edge_color='gray', show=False, close=False)

sc = ax.scatter(x=nodes['x'],
                y=nodes['y'], 
                c=distances[1],
                s=1,
                cmap='inferno_r')

ax.set_title('Walking time (seconds) to nearest grocery store')
plt.colorbar(sc, shrink=0.7).outline.set_edgecolor('none')

This tells us about the travel time to the nearest amenities, from each node in the network. What if we're instead interested in how many amenities we can reach within our time horizon?
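
Conceptually, that count is a shortest-path search from each node that stops at the time horizon and tallies the amenities it has reached. The sketch below shows this for a single source node on a tiny hypothetical network using only the standard library. It is meant to illustrate the idea, not how Pandana implements it, and `travel_times_within` is an illustrative helper.

```python
import heapq

# hypothetical undirected walking network: node -> [(neighbor, travel time in seconds)]
adjacency = {
    'a': [('b', 300), ('c', 700)],
    'b': [('a', 300), ('d', 400)],
    'c': [('a', 700), ('d', 600)],
    'd': [('b', 400), ('c', 600), ('e', 900)],
    'e': [('d', 900)],
}
stores = {'c': 1, 'd': 2, 'e': 1}  # hypothetical number of stores snapped to each node

def travel_times_within(adjacency, source, cutoff):
    """Dijkstra's algorithm, keeping only nodes reachable within cutoff seconds."""
    best = {}
    queue = [(0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node in best:
            continue
        best[node] = cost
        for neighbor, weight in adjacency[node]:
            if neighbor not in best and cost + weight <= cutoff:
                heapq.heappush(queue, (cost + weight, neighbor))
    return best

cutoff = 20 * 60  # 20-minute walking horizon in seconds
reachable = travel_times_within(adjacency, 'a', cutoff)
store_count = sum(stores.get(node, 0) for node in reachable)
print(reachable, store_count)
```

Node `e` lies beyond the 20-minute horizon from `a`, so its store is not counted.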

In [None]:
# set a variable on the network, using the amenities' nodes
node_ids = network.get_node_ids(centroids.x, centroids.y)
network.set(node_ids, name='grocery')

# aggregate the variable to all the nodes in the network
# when counting, the decay doesn't matter (but would for summing)
access = network.aggregate(distance=maxdist,
                           type='count',
                           decay='linear',
                           name='grocery')

# let's cap it at 5, assuming no further utility from a larger choice set
access = access.clip(upper=5)
access.describe()

In [None]:
# plot amenity count within your walking horizon
fig, ax = ox.plot_graph(G_walk, node_size=0, edge_linewidth=0.1,
                        edge_color='gray', show=False, close=False)

sc = ax.scatter(x=nodes['x'],
                y=nodes['y'], 
                c=access,
                s=1,
                cmap='inferno')

ax.set_title(f'Grocery stores within a {walk_time} minute walk')
plt.colorbar(sc, shrink=0.7).outline.set_edgecolor('none')

In [None]:
# now it's your turn
# map walking time to nearest school in our study site, capped at 30 minutes
# what kinds of communities have better/worse walking access to schools?
# see documentation at https://wiki.openstreetmap.org/wiki/Tag:amenity=school
