# Lab 7: Advance spatial accessibility measurements

In this lab, you will advance spatial accessibility measurements by considering **travel time and distance decay**. In particular, you will **reuse the data from Lab 6** and incorporate these two advancements into the measurements. Then, you will compare the results of Lab 6, which were based on travel distance, with the results of Lab 7 and investigate how measures of spatial accessibility **can be biased** under the influence of travel time and distance decay. <br>

Again, you will choose **your own study area** and the data can be obtained from the links below.
* Supply: <a href=https://hifld-geoplatform.opendata.arcgis.com/> Homeland Infrastructure Foundation-Level Data (HIFLD) </a> - <a href=https://hifld-geoplatform.opendata.arcgis.com/datasets/geoplatform::hospitals-1/>Hospitals</a>.
* Demand: US Census Bureau - <a href="https://data.census.gov/cedsci/table?q=population">Decennial Census - Race </a>
* Reference geographic area: <a href=https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.2020.html>Topologically Integrated Geographic Encoding and Referencing (TIGER) data </a>

The example below measures hospital accessibility in Harris County, Texas. You can follow the instructions or choose your own study area. 


In [None]:
import geopandas as gpd
import pandas as pd
import osmnx as ox
import time
from tqdm import tqdm, trange
from shapely.geometry import Point, MultiPoint
import networkx as nx
import matplotlib.pyplot as plt
from scipy.stats import pearsonr
import numpy as np
import warnings
warnings.filterwarnings("ignore")

## 1. (1 point) Data preparation

Bring in the data you used in Lab 6. You can reuse your code to format the data from scratch or simply import the data you already cleaned. <br>
**1.1.** (0.25 point) Import the population data with geometry and the hospital data under the variable names `demand` and `supply`, respectively. <br>
**1.2.**  (0.25 point) Be sure both are stored as a `GeoDataFrame` and use the `NAD83 / Conus Albers` coordinate system (epsg=5070). 
<br><br>
**Supply** GeoDataFrame should look like the below.
![](./data/supply_screenshot.jpg)

**Demand** GeoDataFrame should look like the below.
![](./data/demand_screenshot.jpg)

In [None]:
# Your code here
supply = 
demand = 

In [None]:
""" Test code for the previous code. This cell should NOT give any errors when it is run."""
assert type(supply) == gpd.GeoDataFrame
assert type(demand) == gpd.GeoDataFrame
assert supply.crs.name == 'NAD83 / Conus Albers'
assert demand.crs.name == 'NAD83 / Conus Albers'

print('Success!')

**1.3.** (0.25 point) Download the road network data from OpenStreetMap using `ox.graph_from_place()`. Set the `network_type` argument to `'drive'` and the `simplify` argument to `True` to speed up your analysis.  
**1.4.** (0.25 point) Project the road network to the same coordinate system (epsg=5070) by using `ox.project_graph()`.

In [None]:
# Your code here
##  Obtain network dataset based on text


## Change the projection of the network dataset


ox.plot_graph(G)

Use the code below to make a backup of your network dataset. The commented-out second line restores the network from the backup. 

```python
    G_ = G.copy()  # Make a copy for a backup
    # G = G_.copy()  # Restore network dataset from the backup
```

In [None]:
G_ = G.copy()
# G = G_.copy()

Also, run the following code to trim your network dataset. 

In [None]:
def remove_uncenessary_nodes(network):
    _nodes_removed = len([n for (n, deg) in network.out_degree() if deg == 0])
    network.remove_nodes_from([n for (n, deg) in network.out_degree() if deg == 0])
    for component in list(nx.strongly_connected_components(network)):
        if len(component) < 10:
            for node in component:
                _nodes_removed += 1
                network.remove_node(node)

    print("Removed {} nodes ({:2.4f}%) from the OSMNX network".format(_nodes_removed, 100 * _nodes_removed / float(_nodes_removed + network.number_of_nodes())))
    print("Number of nodes: {}".format(network.number_of_nodes()))
    print("Number of edges: {}".format(network.number_of_edges()))

    return network


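# Simplify Graph: Remove edges that are not major roads.
# Iterate with keys so the matching parallel edge of the MultiDiGraph is removed.
for u, v, key, data in G.copy().edges(keys=True, data=True):
    if data['highway'] not in ['motorway', 'motorway_link', 
                               'trunk', 'trunk_link',
                               'primary', 'primary_link', 
#                                'secondary', 'secondary_link'  
                              ]:
        G.remove_edge(u, v, key)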

# Simplify Graph: Remove nodes
G.remove_nodes_from(list(nx.isolates(G)))
G = remove_uncenessary_nodes(G)
ox.plot_graph(G)

Once all the data are ready, find the nearest OSM node for each feature with the following function. Be sure that you have assigned the hospital dataset to `supply`, the population dataset to `demand`, and the network dataset to `G`. 

In [None]:
def find_nearest_osm(network, gdf):
    """
    # This function finds the nearest OSM node for each feature in a given GeoDataFrame.
    # If the geometry type is Point, it is used without modification, but
    # if the geometry type is Polygon or MultiPolygon, its centroid is used to find the nearest node.
    
    Input: 
    - network (NetworkX MultiDiGraph): Network Dataset obtained from OSMnx
    - gdf (GeoDataFrame): stores locations in its `geometry` column 
    
    Output:
    - gdf (GeoDataFrame): will have `nearest_osm` column, which describes the nearest OSM node 
                          that was computed based on its geometry column
      
    """
    for idx, row in tqdm(gdf.iterrows(), total=gdf.shape[0]):
        if row.geometry.geom_type == 'Point':
            nearest_osm = ox.distance.nearest_nodes(network, 
                                                    X=row.geometry.x, 
                                                    Y=row.geometry.y
                                                   )
        elif row.geometry.geom_type == 'Polygon' or row.geometry.geom_type == 'MultiPolygon':
            nearest_osm = ox.distance.nearest_nodes(network, 
                                        X=row.geometry.centroid.x, 
                                        Y=row.geometry.centroid.y
                                       )
        else:
            print(row.geometry.geom_type)
            continue

        gdf.at[idx, 'nearest_osm'] = nearest_osm

    return gdf

supply = find_nearest_osm(G, supply)
demand = find_nearest_osm(G, demand)

## 2. (1.5 point) Advancement 1: Calculate the estimated travel time for each edge

The first advancement is to create a catchment area based on travel time instead of distance. By using the `length` and `maxspeed` attributes in the network dataset (`G`), you can calculate the estimated travel time for each edge. 

**2.1.** (0.25 point) Investigate the contents of each edge of the network dataset. You can iterate over the edges with the loop below, assuming `G` is the variable holding the network dataset. For more information, visit <a href=https://networkx.org/documentation/stable/reference/classes/generated/networkx.Graph.edges.html>G.edges()</a>.
```python
for u, v, data in G.edges(data=True):
    print(data)
```

**2.2.** (0.25 point) Create a list `collect_type`, and append the data type of `maxspeed` to the list by iterating through rows of the network dataset `G`. Given that not every row has the key `maxspeed`, you probably need to use the `if` statement. 

In [None]:
# Your code here
collect_type = []
for u, v, data in G.edges(data=True):
    print(data)


In [None]:
""" Check your answer here. This cell should only give you the data types (e.g., str, list)."""

print(set(collect_type))

**2.3.** (0.25 point) Slice only the numerical portion from the entry in `maxspeed` column. You can use `str.split()` function and list slicing to accomplish this task. <br><br>
**Note**: Most of the time, the entries of `maxspeed` are of type `str` or `list`. However, an entry may have a different data type, such as `dict`, given that OpenStreetMap is Volunteered Geographic Information (VGI). If you encounter a different data type, consult the instructor. 

**2.4.** (0.25 point) Assign the numerical portion of the `maxspeed` back to the original column (i.e., `data['maxspeed']`). 
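
As a rough illustration of steps 2.3 and 2.4, the sketch below parses toy `maxspeed` entries using `str.split()`. It assumes entries look like `'35 mph'` or a list of such strings. The helper name `parse_maxspeed`, the toy edge dictionaries, and the choice to keep the lowest value of a list are assumptions made for this example, not requirements of the lab.

```python
def parse_maxspeed(value):
    """Return the numeric part of an OSM-style maxspeed entry as an int.

    Hypothetical helper: for a list, the lowest posted speed is kept
    (an assumption); entries that cannot be parsed return None.
    """
    if isinstance(value, list):
        speeds = [parse_maxspeed(v) for v in value]
        speeds = [s for s in speeds if s is not None]
        return min(speeds) if speeds else None
    if isinstance(value, str):
        tokens = value.split()  # e.g. '35 mph' -> ['35', 'mph']
        if tokens and tokens[0].isdigit():
            return int(tokens[0])
    return None


# Toy dicts standing in for the `data` dicts yielded by G.edges(data=True)
toy_edges = [
    {'highway': 'primary', 'maxspeed': '35 mph'},
    {'highway': 'motorway', 'maxspeed': ['55 mph', '65 mph']},
    {'highway': 'trunk'},  # no maxspeed key
]

for data in toy_edges:
    if 'maxspeed' in data:
        data['maxspeed'] = parse_maxspeed(data['maxspeed'])

parsed = [d.get('maxspeed') for d in toy_edges]
print(parsed)
```

Your own network may contain other formats, so inspect `collect_type` and the raw values before settling on a parsing rule.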

In [None]:
for u, v, data in G.edges(data=True):
    if 'maxspeed' in data.keys():
        # Your code here
        
        

In [None]:
""" Check your answer here. This cell should only give you numbers or nan values."""

# Extract the nodes and edges of the network dataset for the future analysis. 
nodes, edges = ox.graph_to_gdfs(G, nodes=True, edges=True, node_geometry=True)
print(edges['maxspeed'].unique())

**2.5.** (0.25 point) Investigate the maximum speed of each edge based on their `highway` type. Then update the values of `max_speed_per_type` dictionary with the observed travel speed. 

For more information about the `highway` attribute, visit <a href=https://wiki.openstreetmap.org/wiki/Key:highway>OSM wiki</a>.

In [None]:
max_speed_per_type = {'motorway': 60, 
                      'motorway_link': 45, 
                      'trunk': 60,
                      'trunk_link': 45, 
                      'primary': 50,
                      'primary_link': 35, 
                      'secondary': 45,
                      'secondary_link': 35,
                      'tertiary': 40, 
                      'tertiary_link': 20,
                      'residential': 30,
                      'living_street': 20,
                      'unclassified': 20
         }

for highway_type in max_speed_per_type.keys():
    speed_per_type = edges.loc[edges['highway'] == highway_type]['maxspeed'].unique()
    print(highway_type, speed_per_type)

**2.6.** (0.25 point) Finish the missing portion of the following `for` loop. The purpose of the loop is to fill in the `maxspeed` attribute based on the `highway` attribute when an edge does not have a `maxspeed` attribute. 

**Note**: The `highway` attribute can also be a `list`. Come up with a solution that distinguishes `str` from `list` to get the road type of each edge.
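
One possible shape for this fallback is sketched below on toy data. The helper `road_type` and the rule of taking the first entry of a list are assumptions for illustration; the lookup table is a trimmed copy of `max_speed_per_type` from the previous cell.

```python
# Trimmed copy of the lab's lookup table (mph)
max_speed_per_type = {'motorway': 60, 'motorway_link': 45, 'primary': 50}


def road_type(highway):
    """Return a single road type from a str or list `highway` entry.

    Assumption: when several types are listed, the first one is used.
    """
    return highway[0] if isinstance(highway, list) else highway


# Toy dicts standing in for the `data` dicts yielded by G.edges(data=True)
toy_edges = [
    {'highway': 'motorway', 'maxspeed': 65},
    {'highway': 'primary'},
    {'highway': ['motorway_link', 'primary']},
]

for data in toy_edges:
    if 'maxspeed' not in data:
        data['maxspeed'] = max_speed_per_type[road_type(data['highway'])]

filled = [d['maxspeed'] for d in toy_edges]
print(filled)
```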

In [None]:
for u, v, data in G.edges(data=True):
    if 'maxspeed' in data.keys():
        pass
    
    else:
        # Your code here
        

In [None]:
""" Check your answer here. This cell should only give you numbers."""

# Extract the nodes and edges of the network dataset for the future analysis. 
nodes, edges = ox.graph_to_gdfs(G, nodes=True, edges=True, node_geometry=True)
print(edges['maxspeed'].unique())

When all the materials are ready, run the following cell to properly assign the estimated travel time to `time` attribute of each edge. <br>

**Be sure** the `edges` dataframe has `maxspeed_meters` and `time` columns. Each column should have all the records populated. 
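
To make the unit conversion concrete, here is a small worked example, assuming speeds are in miles per hour and lengths in meters as in the cell below. The function name `travel_time_minutes` is hypothetical.

```python
METERS_PER_MILE = 1609.344
MINUTES_PER_HOUR = 60


def travel_time_minutes(length_m, maxspeed_mph):
    """Travel time in minutes for an edge of `length_m` meters at `maxspeed_mph`."""
    meters_per_minute = maxspeed_mph * METERS_PER_MILE / MINUTES_PER_HOUR
    return length_m / meters_per_minute


# One mile driven at 60 mph takes about one minute.
one_mile = travel_time_minutes(1609.344, 60)
print(round(one_mile, 4))
```

The factor `1609.344 / 60` is roughly 26.82 meters per minute for each mile per hour, which is the constant applied to `maxspeed` in the cell below.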

In [None]:
for u, v, data in G.edges(data=True):
    data['maxspeed_meters'] = int(data['maxspeed']) * 26.8224  # mph * 1609.344 (meters per mile) / 60; Unit: meters per minute
    data['time'] = float(data['length'] / data['maxspeed_meters'])  # Unit: minutes
    
nodes, edges = ox.graph_to_gdfs(G, nodes=True, edges=True, node_geometry=True)
edges

## 3. (1 point) Advancement 2: Apply distance decay functions for catchment areas

Here, we will create a function that divides a catchment area into three subzones. Then, apply a weight based on a distance decay function. For example, we will assign a high weight if the subzone is close to the supply facility, and a low weight if the subzone is far from the supply facility. 
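
The idea of subzone weights can be sketched without any geometry. The example below assumes the three thresholds and the weights used later in this lab (`minutes` and `weights` in Section 4); the function `zone_weight` is a hypothetical helper that maps a travel time to the weight of the subzone it falls in.

```python
from bisect import bisect_left

minutes = [5, 10, 15]                 # upper bound (in minutes) of each subzone
weights = {5: 1, 10: 0.68, 15: 0.22}  # weights used later in this lab


def zone_weight(travel_time):
    """Weight of the subzone containing `travel_time`; 0 beyond the last threshold."""
    i = bisect_left(minutes, travel_time)
    return weights[minutes[i]] if i < len(minutes) else 0


for t in (3, 7, 15, 20):
    print(t, zone_weight(t))
```

In the lab itself, the subzones are polygons rather than raw travel times, but the weight assigned to each polygon follows the same lookup.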

**3.1.** (0.2 point) Update the missing information of <a href=https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.shortest_paths.weighted.single_source_dijkstra_path_length.html>`nx.single_source_dijkstra_path_length()`</a> function below to collect the OSM ID of the accessible nodes. <br>
**3.2.** (0.2 point) Properly slice `nodes` GeoDataFrame, which stores all the nodes of the network dataset (`G`) so that you can create a convex hull from the `accessible_nodes`. The result should be stored as `convex_hull` and its type will be `shapely.geometry.polygon.Polygon`.
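
To see what a shortest-path search with a cutoff returns, the sketch below implements the idea in plain Python on a toy graph. It is a stand-in for illustration, not the NetworkX implementation: the function `reachable_within` and the dictionary-based `toy_graph` are assumptions, and edge weights play the role of the `time` attribute.

```python
import heapq


def reachable_within(graph, source, cutoff):
    """Shortest travel time to every node reachable within `cutoff`.

    `graph` is a plain dict: node -> list of (neighbor, minutes).
    """
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > best.get(node, float('inf')):
            continue  # stale heap entry; a shorter path was already found
        for neighbor, edge_minutes in graph.get(node, []):
            new_dist = dist + edge_minutes
            if new_dist <= cutoff and new_dist < best.get(neighbor, float('inf')):
                best[neighbor] = new_dist
                heapq.heappush(heap, (new_dist, neighbor))
    return best


toy_graph = {
    'A': [('B', 4), ('C', 9)],
    'B': [('C', 3), ('D', 8)],
    'C': [('D', 10)],
    'D': [],
}

access = reachable_within(toy_graph, 'A', cutoff=10)
print(access)
```

Node `D` is left out because its shortest path from `A` takes 12 minutes, which exceeds the cutoff. The keys of the returned dictionary are the node IDs you would use to slice `nodes`.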

In [None]:
# Your code here
# Calculate accessible nodes in the network dataset from a given location 
access_nodes = nx.single_source_dijkstra_path_length(G= , 
                                                     source= , 
                                                     cutoff= , 
                                                     weight=
                                                    )
# Extract the locations (or coordinates) of accessible nodes based on the OSMID, then create convex hull.
convex_hull = nodes.loc[]

convex_hull

In [None]:
""" Test code for the previous code. This cell should NOT give any errors when it is run."""
import shapely

assert type(access_nodes) == dict
assert type(convex_hull) == shapely.geometry.polygon.Polygon

print('Success!')

Now we will use a `for` loop to create a series of convex hulls, which are `shapely` `Polygon` objects. Then, we will store them in a GeoDataFrame with the travel time as the index. 

**3.3.** (0.2 point) Use the code from steps 3.1 and 3.2 to complete the `for` loop. In each iteration, a different `shapely` polygon (i.e., convex hull) is assigned to the `geometry` column of the `polygons` GeoDataFrame, indexed by its threshold travel time `minute`. 
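
For intuition about what a convex hull of accessible nodes is, the following plain-Python sketch computes one with Andrew's monotone chain algorithm on made-up coordinates. It only illustrates the concept; in the lab the hull comes from the geometry objects themselves, and the `convex_hull` function and `node_xy` points here are hypothetical.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b makes a counter-clockwise turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


# Made-up node coordinates: four corners plus two interior nodes
node_xy = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1), (1, 2)]
hull = convex_hull(node_xy)
print(hull)
```

The two interior nodes do not appear in the hull, which is why a convex hull built from a larger travel-time threshold always contains the hull built from a smaller one.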

In [None]:
minutes = [5, 10, 15]

polygons = gpd.GeoDataFrame(geometry=[None] * len(minutes), index=minutes, crs='EPSG:5070')  # start with an empty geometry column so the CRS can be set

# Your code here
for minute in minutes:
    # Calculate accessible nodes in the network dataset from a given location 
    access_nodes = nx.single_source_dijkstra_path_length(G= , 
                                                         source= , 
                                                         cutoff= , 
                                                         weight=
                                                        )
    # Extract the locations (or coordinates) of accessible nodes based on the OSMID, then create convex hull.
    convex_hull = nodes.loc[]

    # Insert a convex hull to `polygons` GeoDataFrame
    polygons.at[minute, 'geometry'] = convex_hull


In [None]:
""" Test code for the previous code. This cell should NOT give any errors when it is run."""
assert type(polygons) == gpd.GeoDataFrame
assert 'geometry' in polygons.columns
assert type(polygons.at[5, 'geometry']) == shapely.geometry.polygon.Polygon

print('Success!')

**3.4.** (0.2 point) Complete the following for loop to subtract a polygon (drawn from a shorter threshold travel time) from a bigger polygon (drawn with a longer threshold travel time). <a href=https://geopandas.org/en/stable/docs/reference/api/geopandas.overlay.html>`gpd.overlay()`</a> will help you to finish this task. 

**Note** Be sure to make a copy of the `polygons` GeoDataFrame, and then assign the subtraction result to the copied GeoDataFrame. 

In [None]:
minutes = [5, 10, 15]

polygons_ = polygons.copy(deep=True)

# Your code here
for idx, minute in enumerate(minutes):


polygons_

In [None]:
""" Test code for the previous code. This cell should NOT give any errors when it is run."""

assert not polygons_.at[5, 'geometry'].within(polygons_.at[10, 'geometry'])
assert not polygons_.at[5, 'geometry'].within(polygons_.at[15, 'geometry'])
assert not polygons_.at[10, 'geometry'].within(polygons_.at[15, 'geometry'])

print('Success!')

**3.5.** (0.2 point) Complete the following function `calculate_catchment_area` with the code you worked on in the previous steps. This function will be used in the next section. It should return a GeoDataFrame that has one polygon without a hole and two polygons with holes. 

In [None]:
def calculate_catchment_area(network, nearest_osm, minutes, distance_unit='time'):
    polygons = gpd.GeoDataFrame(geometry=[None] * len(minutes), index=minutes, crs='EPSG:5070')

    # Your code here
    # Create convex hull for each travel time (minutes), respectively.
    for minute in minutes:
        
  

    # Calculate the differences between the convex hulls created in the previous step.
    polygons_ = polygons.copy(deep=True)
    for idx, minute in enumerate(minutes):
        
        
    return polygons_.copy(deep=True)

In [None]:
# Test your function here. The output should be a GeoDataFrame (index = minutes, column=geometry)
catchment_areas = calculate_catchment_area(network= G, 
                                           nearest_osm = supply.at[5, 'nearest_osm'], 
                                           minutes = [5, 10, 15], 
                                           distance_unit = 'time'
                                          )
catchment_areas

In [None]:
""" Check your answer here. This cell should NOT give any errors when it is run."""

assert type(catchment_areas) == gpd.GeoDataFrame
assert catchment_areas.shape == (3, 1)

fig, ax = plt.subplots(1, 1, figsize=(10, 10))

catchment_areas.boundary.plot(ax=ax, color='black', zorder=2)
demand.plot(ax=ax, color='#999999', zorder=1)

## 4. (1.5 point) Measure spatial accessibility with the advanced features

The following code is the first step of the original two-step floating catchment area (2SFCA) method. You need to incorporate the advanced features (highlighted in blue in the equation), which are implemented in the `calculate_catchment_area` function, into the measurement. 

$$\huge R_j = \frac{S_j}{\sum_{k\in {\left\{\color{blue}{t_{kj}} \le \color{blue}{t_0} \right\}}}^{}{P_k}\color{blue}{W_k}}$$
where<br>
$R_j$: the supply-to-demand ratio of location $j$. <br>
$S_j$: the degree of supply (e.g., number of doctors) at location $j$. <br>
$P_k$: the degree of demand (e.g., population) at location $k$. <br>
$\color{blue}{t_{kj}}$: the travel <font color='blue'>time</font> between locations $k$ and $j$. <br>
$\color{blue}{t_0}$: the threshold travel <font color='blue'>time</font> of the analysis. <br>
$\color{blue}{W_k}$: Weight based on a distance decay function
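
As a worked instance of the equation above, the sketch below computes $R_j$ for a single hypothetical hospital. The bed count and the populations per subzone are made-up numbers; the weights are the ones used later in this lab, and the factor of 100,000 mirrors the scaling in the lab code.

```python
weights = {5: 1, 10: 0.68, 15: 0.22}

# Hypothetical hospital with 200 beds and the population found in each subzone
beds = 200
pop_by_zone = {5: 10_000, 10: 25_000, 15: 40_000}

# Denominator of R_j: population in each subzone multiplied by its weight
weighted_pop = sum(pop * weights[zone] for zone, pop in pop_by_zone.items())

# Supply-to-demand ratio, expressed as beds per 100,000 weighted residents
ratio = beds / weighted_pop * 100_000
print(round(weighted_pop), round(ratio, 1))
```

Without weights, the denominator would be the full 75,000 residents, so the distance decay raises the ratio for this hospital by discounting distant demand.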

```python

step1 = supply.copy(deep=True)
step1['ratio'] = 0

for i in trange(supply.shape[0]):
    
    # Create a catchment area from a given location
    ## Get the list of accessible nodes ID
    access_nodes = nx.single_source_dijkstra_path_length(G=G, 
                                                         source=supply.loc[i, 'nearest_osm'], 
                                                         cutoff=15, 
                                                         weight='time'
                                                        )
    ## Create a convex hull based on the points
    convex_hull = nodes.loc[nodes.index.isin(access_nodes.keys()), 'geometry'
                           ].unary_union.convex_hull

    # Calculate the population within the catchment area
    temp_demand = demand.loc[demand['geometry'].centroid.within(convex_hull), 'pop'].sum()

    # Calculate the number of hospital beds in each hospital
    temp_supply = supply.loc[i, 'BEDS']
    
    # Calculate the supply-to-demand ratio of each supply location
    step1.at[i, 'ratio'] = temp_supply / temp_demand * 100000
    
    print(f'Hospital {i}: {temp_supply} BEDS / Surrounding population: {temp_demand} / Ratio: {step1.at[i, "ratio"]}')


```


**4.1.** (0.5 point) Find the location where you can put the function `calculate_catchment_area` within this `for loop`. Since we have not applied any weight based on distance decay, the result should be the same as the original 2SFCA method. 

In [None]:
# Extract the nodes and edges of the network dataset for the future analysis. 
nodes, edges = ox.graph_to_gdfs(G, nodes=True, edges=True, node_geometry=True)

In [None]:
step1 = supply.copy(deep=True)
step1['ratio'] = 0

for i in trange(supply.shape[0]):

    # Create a catchment area from a given location
    ## Get the list of accessible nodes ID
    access_nodes = nx.single_source_dijkstra_path_length(G=G, 
                                                         source=supply.loc[i, 'nearest_osm'], 
                                                         cutoff=15, 
                                                         weight='time'
                                                        )
    ## Create a convex hull based on the points
    convex_hull = nodes.loc[nodes.index.isin(access_nodes.keys()), 'geometry'
                           ].unary_union.convex_hull

    # Calculate the population within the catchment area
    temp_demand = demand.loc[demand['geometry'].centroid.within(convex_hull), 'pop'].sum()

    # Calculate the number of hospital beds in each hospital
    temp_supply = supply.loc[i, 'BEDS']

    # Calculate the supply-to-demand ratio of each supply location
    step1.at[i, 'ratio'] = temp_supply / temp_demand * 100000

    print(f'Hospital {i}: {temp_supply} BEDS / Surrounding population: {temp_demand} / Ratio: {step1.at[i, "ratio"]}')
    

Once you implement `calculate_catchment_area` and it still gives the same result, we are ready to incorporate the distance decay into the function. 

**4.2.** (0.5 point) Use the following `minutes` list and `weights` dictionary to apply distance decay within the first step of the 2SFCA method.  

```python
minutes = [5, 10, 15]
weights = {5: 1, 10: 0.68, 15: 0.22}
```

In [None]:
minutes = [5, 10, 15]
weights = {5: 1, 10: 0.68, 15: 0.22}

step1 = supply.copy(deep=True)
step1['ratio'] = 0

# Your code here 
for i in trange(supply.shape[0]):

    # Create catchment areas from a given location (use `calculate_catchment_area` function)
    catchment_areas = 
    
    # Calculate the population within the catchment areas
    catchment_pop = 0
    for idx, row in catchment_areas.iterrows():
    
        
    # Calculate the number of hospital beds in each hospital
    temp_supply = supply.loc[i, 'BEDS']
    
    # Calculate the supply-to-demand ratio of each supply location
    step1.at[i, 'ratio'] = temp_supply / catchment_pop * 100000
        
    print(f'Hospital {i}: {temp_supply} BEDS / Surrounding population: {catchment_pop} / Ratio: {step1.at[i, "ratio"]}')

step1

In the same manner, the following code is the second step of the original 2SFCA method. Modify it to use `calculate_catchment_area`. 

$$\huge A_i = \sum_{j\in {\left\{\color{blue}{t_{ij}} \le \color{blue}{t_0} \right\}}} R_j\color{blue}{W_j}$$
where<br>
$A_i$: the accessibility measures at location $i$. <br>
$R_j$: the supply-to-demand ratio of location $j$. <br>
$\color{blue}{W_j}$: Weight based on a distance decay function<br>
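
A worked instance of this second step, for one hypothetical demand location, might look like the sketch below. The ratios per subzone are made-up values standing in for the `ratio` column of `step1`.

```python
weights = {5: 1, 10: 0.68, 15: 0.22}

# Hypothetical step-1 ratios of the hospitals found in each subzone
# around a single demand location
ratios_by_zone = {5: [120.0], 10: [80.0, 50.0], 15: []}

# A_i: sum of the ratios, each multiplied by the weight of its subzone
access = sum(weights[zone] * sum(ratios) for zone, ratios in ratios_by_zone.items())
print(round(access, 1))
```

Here the nearest hospital contributes its full ratio, while the two hospitals in the 10-minute ring contribute 68% of theirs.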

```python
step2 = demand.copy(deep=True)
step2['access'] = 0

for j in trange(demand.shape[0]):
    
    # Create a catchment area from a given location
    ## Get the list of accessible nodes ID
    access_nodes = nx.single_source_dijkstra_path_length(G=G, 
                                                         source=demand.loc[j, 'nearest_osm'], 
                                                         cutoff=15, 
                                                         weight='time'
                                                        )
    
    ## Create a convex hull based on the points
    convex_hull = nodes.loc[nodes.index.isin(access_nodes.keys()), 'geometry'
                           ].unary_union.convex_hull

    # Sum the supply-to-demand ratios of the supply locations within the catchment area
    temp_ratio = step1.loc[step1['geometry'].centroid.within(convex_hull), 'ratio'].sum()
    
    # Assign the accumulated ratio of accessible supply facilities to each demand location
    step2.at[j, 'access'] = temp_ratio

```


**4.3.** (0.5 point) Use the `calculate_catchment_area` function with the following `minutes` list and `weights` dictionary to apply distance decay within the second step of the 2SFCA method. 

```python
minutes = [5, 10, 15]
weights = {5: 1, 10: 0.68, 15: 0.22}
```

In [None]:
minutes = [5, 10, 15]
weights = {5: 1, 10: 0.68, 15: 0.22}

step2 = demand.copy(deep=True)
step2['access'] = 0

## Your code here
for j in trange(demand.shape[0]):
    # Create catchment areas from a given location
    catchment_areas = 
    
    # Sum the weighted supply-to-demand ratios within the catchment areas
    catchment_ratio = 0
    for idx, row in catchment_areas.iterrows():
        
        
    # Assign the accumulated ratio of accessible supply facilities to each demand location
    step2.at[j, 'access'] = catchment_ratio
    
step2

In [None]:
""" Check your answer here. This cell should NOT give any errors when it is run."""
fig, ax = plt.subplots(figsize=(10,10))

step1 = step1[~step1.isin([np.nan, np.inf, -np.inf]).any(axis=1)]
step2 = step2[~step2.isin([np.nan, np.inf, -np.inf]).any(axis=1)]

step1.plot(markersize='BEDS', ax=ax, zorder=2, color='black')
step2.plot('access', ax=ax, legend=True, cmap='Blues', scheme='FisherJenks', zorder=1)
step2.loc[step2['access'] == 0].plot(ax=ax, color='grey', zorder=1)
step2.boundary.plot(ax=ax, linestyle='dotted', lw=0.5, color='black', zorder=2)

plt.show()

print('Success!')

### *You have finished Lab 7: Advance Spatial Accessibility Measurements*

Please name your Jupyter notebook `GEOG489_Lab7_[YOUR_NET_ID].ipynb`, and upload it to https://learn.illinois.edu **ALONG WITH YOUR DATA**.