In [1]:
%run ./resources/library.py

# Notebook 1 - Advanced: Analyzing the John Snow Cholera Outbreak Using OpenStreetMaps and Networkx - Part 1

## Review of Notebooks 1 and 2 - Basic Notebooks

**Note:** This is an advanced notebook and optional for GEEKS Tier 1.

Picking up from Notebook 2, from the Cholera Case Study, Basic Notebooks, we calculated the mean center of all death points on the map. The result is shown on Figure 1 below for your review.

<img src='images/choleramap-level1-1.png'/>

**Figure 1.** Mean center of all death points in relation to location of deaths and pumps.

## Learning Objectives

By the end of this notebook, you should be able to:
1. Describe in your own words Dr. John Snow's goal for creating the second map of the catchment area for the Broad Street pump.
2. Explain how the Python package `osmnx` applied to street network data helps visualize Dr. John Snow's catchment area map.
3. Explain how the translation of data into graph and network structure (nodes and edges) can help solve problems in public health.

## Transportation and Water Supply in 18th Century London

London, in the 18th Century was practically a walking city. Pedestrians walked on narrow streets with animals, different forms of horse-drawn carriages, and with various carts and wagons used to transport goods. In that same time period, running water and toilets did not exist. People used town wells and communal pumps to get drinking water and dumped their untreated sewage either into the Thames River or in cesspools (open pits).

## Catchment area for the Broad Street Pump

Dr. John Snow suspected that many of the cases he found used the pump at Broad Street. He drew a second map that illustrated the catchment area for the Broad Street pump.

![](./images/voronoi-snow.jpg)

**Figure 2.** Second map drawn by Dr. John Snow showing catchment area of the Broad Street Pump

The following diagram is a more modern representation of Dr. John Snow's second map above. Shown enclosed in the black polygon is the catchment area for the Broad Street pump. An adjustment to the polygon (gray line) was made to accommodate the catchment area for the greyed out pump.

![](./images/7-Figure4-1.png)

**Figure 3.** Modern version of Figure 2 above showing correction (extended area beyong the gray line) [Shiode, 2012]

The following diagram is a color-coded map of the catchment areas of the Soho pumps.
The pump catchment areas are represented by the colors of the streets between the pumps.![](./images/6-Figure3-1.png)

**Figure 4.** Catchment areas from Figure 3 in color [Shiode, 2012]

## Using OpenStreetMap and NetworkX (OSMNx)

In this notebook, we will produce a map visualization that illustrates the catchment area for the Broad Street pump but from a walkability perspective. We will use two new packages, `osmnx` (OpenStreetMap and Networkx) and `networkx`. 

**OpenStreetMap-Networkx (OSMnx)**: A Python package for downloading administrative boundary shapes and street networks from OpenStreetMap, and allows you to easily construct, project, visualize, and analyze complex street networks in Python with NetworkX (see References). 

**Networkx**: A Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks (See References).

Using the graph representation of the street networks of Soho District, we will calculate the mean walkable distance of all death points to each of the pumps (See Figure 5 for an illustration of point-to-point calculations). 

![](./images/choleramap-level1-2.png)

**Figure 5.** The computation approach to calculating mean distance values between each pump and all death coordinates on the Soho district map.

We will carry out the following steps:

**Notebook 3:**
1. Read the street network graph of Soho district using OSMnx. 
2. Plot the street network graph on a `folium` map.

**Notebook 4:**
1. Prepare the original data sets from Notebook 1.

2. To represent the pumps and deaths points (coordinates) from the Notebook 1 in OSMnx graph format, we have to find the nearest OSMnx nodes to those points. We will add new columns to the pumps and deaths dataframes to accomodate new information coming from OSMnx. We will also store the short distances between original points to the nearest OSMnx points and store it in the respective dataframes.

3. To calculate mean distances from deaths points to pumps points we will create a double loop through records of both pumps and deaths dataframes so we can do pairwise distance calculations between each pumps point and each deaths point. We will add the short distances from #1 to the pump point to death point distance and store this in a new dataframe called `routes_df`.

4. We will then create the map representation pump-to-death-points mean distances using `folium` and superimpose this from the death dataframe markers generated in Notebook 1.

5. We will illustrate the degrees of  walkability using an isochrone map.

First, let's `import osmnx`, provide some configuration parameters and print its version. The default setting for `osmnx` is to show debugging information. You can turn this off by setting everything to `False` below.

In [2]:
import osmnx as ox
# some configuration parameters for osmnx
# set log to True to for debugging

ox.config(
    log_console=True, 
    use_cache=True, 
    log_file=True,
    overpass_endpoint='https://overpass-api.de/api/interpreter',
    overpass_rate_limit=True,
    timeout=240
)

ox.__version__

2022-01-31 09:57:36 Configured OSMnx 1.1.2


'1.1.2'

2022-01-31 09:57:36 HTTP response caching is on


## Step 1. Read street network information from OSMnx

### Reading Graph, `G`, from `SOHO_COORDINATES`

The <font color='red'>`ox.graph_from_point()`</font> dot function below loads a street network graph into variable `G`. The `ox.plot_graph()` for function plots a static street network graph using graph `G`.

> **Note:** The code below might take some time to execute (several seconds). We will use the `%%time` magic to time the code execution.

In [3]:
%%time
SOHO_COORDINATES = (51.513578, -0.136722)

G = ox.graph_from_point(SOHO_COORDINATES, dist=500)

2022-01-31 09:57:36 Created bbox 500 m from (51.513578, -0.136722): 51.51807460167747,51.509081398322536,-0.12949656048175065,-0.14394743951824937
2022-01-31 09:57:36 Projected GeoDataFrame to +proj=utm +zone=30 +ellps=WGS84 +datum=WGS84 +units=m +no_defs +type=crs
2022-01-31 09:57:36 Projected GeoDataFrame to epsg:4326
2022-01-31 09:57:36 Projected GeoDataFrame to +proj=utm +zone=30 +ellps=WGS84 +datum=WGS84 +units=m +no_defs +type=crs
2022-01-31 09:57:36 Projected GeoDataFrame to epsg:4326
2022-01-31 09:57:36 Requesting data within polygon from API in 1 request(s)
2022-01-31 09:57:36 Retrieved response from cache file "cache/1936a0e117f0e2d017e1ea47911c13bd3880af6b.json"
2022-01-31 09:57:36 Got all network data within polygon from API in 1 request(s)
2022-01-31 09:57:36 Creating graph from downloaded OSM data...
2022-01-31 09:57:36 Created graph with 8833 nodes and 16451 edges
2022-01-31 09:57:36 Added length attributes to graph edges
2022-01-31 09:57:36 Identifying all nodes that li

Typing `G` by itself exposes the object type of `G`  a `networkx.classes.multidigraph.MultiDiGraph`. 

In [4]:
G

<networkx.classes.multidigraph.MultiDiGraph at 0x1ab368400>

We used the `walk`-type street network. There are several OSMnx network types:

1. `drive` - get drivable public streets (but not service roads)
2. `drive_service` - get drivable streets, including service roads
3. <font color='red'>`walk`</font> - get all streets and paths that pedestrians can use (this network type ignores one-way directionality)
4. `bike` - get all streets and paths that cyclists can use
5. `all` - download all non-private OSM streets and paths
6. `all_private` - download all OSM streets and paths, including private-access ones

Because we are after walkability to the pump, we will use "walk" as our network type.

Let's print some graph statistics using a Python 3 built-in pretty printing function, `pprint`.

In [5]:
from pprint import pprint

basic_stats = ox.basic_stats(G)
pprint(basic_stats)

2022-01-31 09:57:38 Created edges GeoDataFrame from graph
2022-01-31 09:57:38 Converted MultiDiGraph to undirected MultiGraph
{'circuity_avg': 1.0328785598833348,
 'edge_length_avg': 30.02694600804139,
 'edge_length_total': 52276.91300000006,
 'intersection_count': 648,
 'k_avg': 4.407594936708861,
 'm': 1741,
 'n': 790,
 'self_loop_proportion': 0.0,
 'street_length_avg': 31.807450319051974,
 'street_length_total': 34892.773000000016,
 'street_segment_count': 1097,
 'streets_per_node_avg': 2.869620253164557,
 'streets_per_node_counts': {0: 0, 1: 142, 2: 9, 3: 465, 4: 163, 5: 6, 6: 5},
 'streets_per_node_proportions': {0: 0.0,
                                  1: 0.17974683544303796,
                                  2: 0.01139240506329114,
                                  3: 0.5886075949367089,
                                  4: 0.20632911392405062,
                                  5: 0.007594936708860759,
                                  6: 0.006329113924050633}}


### Saving Graph, `G`, as GraphML file

OSMnx has the ability to save street network graphs as GraphML files. Let's save the graph to a GraphML file. To learn more about GraphML, click [here](http://graphml.graphdrawing.org/). The GraphML primer can be found [here](http://graphml.graphdrawing.org/primer/graphml-primer.html).

In [6]:
ox.save_graphml(G, filepath='outputs/soho.graphml')

2022-01-31 09:57:39 Saved graph as GraphML file at "outputs/soho.graphml"


> ### Notes on GraphML
> From the GraphML [web site](http://graphml.graphdrawing.org/):
>   
> `GraphML` (graph markup language) is a comprehensive and easy-to-use file format for **graphs**. A **graph** (see Figure 6 below) is made up of nodes (yellow circles) and edges (connecting lines). It consists of a language core to describe the structural properties of a graph and a flexible extension mechanism to add application-specific data.*
>   
> ![](images/simple.png)

> **Figure 6.** A simple graph with nodes and edges 

In [7]:
!ls -la outputs

total 4448
drwxr-xr-x  39 hermantolentino  staff     1248 Jan 30 23:49 [34m.[m[m
drwxr-xr-x  21 hermantolentino  staff      672 Jan 31 09:52 [34m..[m[m
-rw-r--r--   1 hermantolentino  staff       10 Nov  5 11:51 deaths.cpg
-rw-r--r--   1 hermantolentino  staff    21412 Nov  5 11:51 deaths.dbf
-rw-r--r--   1 hermantolentino  staff      143 Nov  5 11:51 deaths.prj
-rw-r--r--   1 hermantolentino  staff     7100 Nov  5 11:51 deaths.shp
-rw-r--r--   1 hermantolentino  staff     2100 Nov  5 11:51 deaths.shx
-rw-r--r--   1 hermantolentino  staff     8855 Nov  5 11:51 deaths_df.pickle
drwxr-xr-x   3 hermantolentino  staff       96 Nov  5 11:51 [34mgraph[m[m
-rw-r--r--   1 hermantolentino  staff  1225896 Jan 30 23:49 graph.html
-rw-r--r--   1 hermantolentino  staff       10 Nov  5 11:51 isopoly_1.cpg
-rw-r--r--   1 hermantolentino  staff       78 Nov  5 11:51 isopoly_1.dbf
-rw-r--r--   1 hermantolentino  staff      143 Nov  5 11:51 isopoly_1.prj
-rw-r--r--   1 hermantolentino  staff   

## Step 2. View the street network graph, `G`, with Folium

In [8]:
from IPython.display import IFrame
import pandas as pd
import folium
import numpy as np

# let's import the folium plugins
from folium import plugins

In [9]:
pd.__version__, folium.__version__

('1.3.4', '0.12.1.post1')

### Create `folium` map, `map1`, using `osmnx`, then overlay graph, `G`

`osmnx` allows us to create a `folium` map from graph `G`. This enables us to combine this graph with other data (deaths and pumps). Let's reuse the same map name from Notebooks 1 and 2, `map1`.

In [10]:
map1 = ox.plot_graph_folium(G, \
                 edge_width=2, \
                 tiles='cartodbpositron', \
                 edge_color='gray', \
                 popup_attribute=None,\
                 zoom = 17, \
                 edge_opacity=0.5)

2022-01-31 09:57:39 Created edges GeoDataFrame from graph


In [17]:
map1

### Load pumps and deaths data points

In [12]:
deaths_mean_center_df = pd.read_pickle('outputs/mean_center_df.pickle')
pumps_df = pd.read_pickle('outputs/pumps_df.pickle')

Note that we used the version of `deaths_df` with values from mean center calculations. 

In [13]:
deaths_mean_center_df.dtypes

FID              int64
DEATHS           int64
LON            float64
LAT            float64
product_LAT    float64
product_LON    float64
dtype: object

### Plot pumps and data points 

We copy (re-use) some of the Python code we wrote in Notebooks 1 and 2. The code below should look familiar to you. The main difference is we changed the `folium` map name from `map1` to `graph_map`.

In [14]:
deaths_mean_center_df.head()

Unnamed: 0,FID,DEATHS,LON,LAT,product_LAT,product_LON
0,0,3,-0.13793,51.513418,154.540254,-0.41379
1,1,2,-0.137883,51.513361,103.026722,-0.275766
2,2,1,-0.137853,51.513317,51.513317,-0.137853
3,3,1,-0.137812,51.513262,51.513262,-0.137812
4,4,4,-0.137767,51.513204,206.052816,-0.551068


In [15]:
locationlist = deaths_mean_center_df[["LAT","LON"]].values.tolist()
radiuslist = deaths_mean_center_df[["DEATHS"]].values.tolist()

for i in range(0, len(locationlist)):
    popup = folium.Popup('Location: '+'('+str(locationlist[i][0])+\
                         ', '+str(locationlist[i][1])+')'+\
                         '<br/>'+\
                        'Deaths: '+ str(radiuslist[i][0]))
    # add each location with deaths to map1
    folium.CircleMarker(
                    location=locationlist[i], \
                    radius=radiuslist[i], \
                    popup=popup, \
                    color='black', \
                    weight=1, \
                    fill=True, \
                    fill_color='red', \
                    fill_opacity=1).add_to(map1)
    
for each in pumps_df.iterrows():
    popup = folium.Popup('Location: '+'('+str(each[1]['LAT'])+', '+str(each[1]['LON'])+')')
    #add each water pump to map1
    folium.RegularPolygonMarker([each[1]['LAT'],each[1]['LON']], \
                                color='black', \
                                weight=1,\
                                fill_opacity=1, \
                                fill_color='blue', \
                                number_of_sides=4, \
                                popup=popup, \
                                radius=10).add_to(map1)
    
# let's use the "Fullscreen" plugin
# add the button to the top right corner
plugins.Fullscreen(
    position='topright',
    title='Expand me',
    title_cancel='Exit me',
    force_separate_button=True
).add_to(map1)

<folium.plugins.fullscreen.Fullscreen at 0x1aacf99d0>

### Add mean center point to `graph_map`

In [16]:
mean_LON = np.sum(deaths_mean_center_df['product_LON'])/np.sum(deaths_mean_center_df['DEATHS'])
mean_LAT = np.sum(deaths_mean_center_df['product_LAT'])/np.sum(deaths_mean_center_df['DEATHS'])

mean_center_POINT = (mean_LAT, mean_LON)

folium.CircleMarker(
            location=mean_center_POINT, \
            color='black', \
            weight=2, \
            fill_opacity=1, \
            fill_color="yellowgreen", \
            popup=folium.Popup('Mean Center Point: '+ \
                str(mean_center_POINT)), \
            radius=10).add_to(map1)
map1

You can zoom in to see the map at a higher resolution.

### Summary

In this notebook, explored a few topics:
1. A short review of the Cholera Basic notebooks and learned about the cholera outbreak in 1854 in Soho District and the maps that Dr. John Snow created to explain his theory that cholera is not spread by bad air.
2. Used `folium` and `pandas` for basic representation of data points (pumps and deaths) in code and visualizing those data points on a folium map.
3. Used **mean center point** algorithm to determine where the centroid is of all death data points in relation to pump locations.
4. Used a powerful Python package called `osmnx` to display the street network graph of Soho District.
5. Overlay the street network graph with cholera data points on a `folium` map

In Notebooks 4 and 5, we will calculate "walkability" characteristics of the Soho district street network graph using point-to-point calculations of mean distance values. 

### Discussion questions

1. With your rudimentary knowledge of graphs, what kinds of data problems can we represent with graphs and graph structure?

## Congratulations!

You have completed Notebook 3. Please proceed to Notebook 4 to go over some more exciting applications of `osmnx`.

## References

1. Boeing, Geoff. OSMnx: Python for Street Networks. URL: https://geoffboeing.com/2016/11/osmnx-python-street-networks/
2. Networkx. URL: https://networkx.github.io/
3. Shiode S. Revisiting John Snow's map: network-based spatial demarcation of cholera area. International Journal of Geographical Information Science Volume 26, 2012 - Issue 1. URL: https://www.tandfonline.com/doi/abs/10.1080/13658816.2011.577433.

*For case study suggestions for improvement, please contact Herman Tolentino, Jan MacGregor, James Tobias or Zhanar Haimovich.*