# GEOG489 SP22 Final

## Instructions
Your final exam consists of three major parts. <br>
**First**, you will prepare supply, demand, and mobility data for measuring spatial accessibility to healthcare resources in Champaign County. <br>
**Second**, you will measure spatial accessibility considering distance decay. <br>
**Third**, you will calculate spatial autocorrelation based on the accessibility measures.
<br><br>
**When you finish the tasks, please save/download your Jupyter notebook and submit it to learn.illinois.edu.**

In [None]:
import geopandas as gpd
import pandas as pd
import osmnx as ox
import networkx as nx
import matplotlib.pyplot as plt
import esda
import libpysal

# 1. Data preprocessing (3 points)
## 1.1. Supply (1 point)
* Load `healthcare.shp` from the data folder and name the GeoDataFrame `supply`. 
* Create a column named `weight` and assign weights based on the facility's `TYPE` (10 for `Hospital` and 5 for `Urgent Care`). 
* Reproject the GeoDataFrame to the State Plane Coordinate System, Illinois East (NAD83) (epsg:26971).
<br><br>

**Note**: The expected result is shown below. 
<img src="./images/supply.jpg" width="60%"/>
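A minimal sketch of the weighting step, assuming the `TYPE` values are spelled exactly `Hospital` and `Urgent Care` as listed above. A toy pandas DataFrame stands in for `healthcare.shp`, and the helper name `add_supply_weights` is hypothetical. With the real data you would load the file with `gpd.read_file(...)` and reproject with `.to_crs(epsg=26971)`.

```python
import pandas as pd

# Supply weight per facility type, as specified in the task
TYPE_WEIGHTS = {'Hospital': 10, 'Urgent Care': 5}

def add_supply_weights(df, type_col='TYPE'):
    """Return a copy of df with a `weight` column derived from the facility type.

    Hypothetical helper; it works on a DataFrame or a GeoDataFrame.
    Types missing from TYPE_WEIGHTS become NaN, which makes them easy to spot.
    """
    out = df.copy()
    out['weight'] = out[type_col].map(TYPE_WEIGHTS)
    return out

# Toy stand-in for healthcare.shp (geometry omitted)
toy = pd.DataFrame({'NAME': ['A', 'B', 'C'],
                    'TYPE': ['Hospital', 'Urgent Care', 'Hospital']})
toy = add_supply_weights(toy)
print(toy['weight'].tolist())  # [10, 5, 10]
```

Using `Series.map` with a dictionary, rather than a row-by-row loop, keeps the step vectorized and leaves unexpected types visible as NaN instead of silently assigning a default.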

In [None]:
# Your code here



## 1.2. Demand (1 point)
* Using `census_block_groups.shp` and `pop_census.csv` in the data folder, create a GeoDataFrame named `demand` by merging the two datasets on a column they share.
* Drop the `GEO_ID` column after the merge. 
* Reproject the GeoDataFrame to the State Plane Coordinate System, Illinois East (NAD83) (epsg:26971).
<br><br>

**Note**: The expected result is shown below. 
<img src="./images/demand.jpg" width="40%"/>
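The merge pattern can be sketched as follows. The column names `GEOID`, `GEO_ID`, `area_km2`, and `pop`, the toy values, and the helper `merge_population` are all hypothetical; identify the actual shared column in your files. Calling `merge` on the table that holds the geometry (on the left) keeps the result a GeoDataFrame when geopandas is used.

```python
import pandas as pd

def merge_population(block_groups, population, left_key, right_key):
    """Attach population columns to block groups, then drop the CSV's key column.

    Hypothetical helper. validate='one_to_one' raises a MergeError if a key
    repeats on either side, which catches duplicated rows early.
    """
    merged = block_groups.merge(population, how='left', left_on=left_key,
                                right_on=right_key, validate='one_to_one')
    return merged.drop(columns=[right_key])

# Hypothetical toy tables standing in for the shapefile and the CSV
block_groups = pd.DataFrame({'GEOID': ['bg1', 'bg2'], 'area_km2': [1.2, 0.8]})
population = pd.DataFrame({'GEO_ID': ['bg1', 'bg2'], 'pop': [850, 1200]})

merged = merge_population(block_groups, population, left_key='GEOID', right_key='GEO_ID')
print(merged)
```

If the two key columns are formatted differently (for example, one carries a prefix), normalize them before merging; otherwise the left merge keeps every block group but fills the population columns with NaN.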

In [None]:
# Your code here



## 1.3. Mobility (1 point)

* Use the `OSMnx` package to download the road network for `Champaign County` and assign the result to a variable `G`.
* Project the road network to the State Plane Coordinate System, Illinois East (NAD83) (epsg:26971).
* Use the `remove_uncenessary_nodes` function below to remove unnecessary nodes from the road network: dead-end nodes with no outgoing edges, and strongly connected components with fewer than 30 nodes. 
```python
def remove_uncenessary_nodes(network):
    # Keep the original node count so the removal share is reported correctly.
    _n_original = network.number_of_nodes()

    # Remove dead-end nodes that have no outgoing edges.
    _dead_ends = [n for (n, deg) in network.out_degree() if deg == 0]
    _nodes_removed = len(_dead_ends)
    network.remove_nodes_from(_dead_ends)

    # Remove strongly connected components with fewer than 30 nodes.
    for component in list(nx.strongly_connected_components(network)):
        if len(component) < 30:
            for node in component:
                _nodes_removed += 1
                network.remove_node(node)

    print("Removed {} nodes ({:2.4f}%) from the OSMnx network".format(_nodes_removed, 100 * _nodes_removed / float(_n_original)))
    print("Number of nodes: {}".format(network.number_of_nodes()))
    print("Number of edges: {}".format(network.number_of_edges()))

    return network
```
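To see what the function removes, the sketch below applies the same two rules to a tiny synthetic graph. `prune_network` is a hypothetical helper that mirrors `remove_uncenessary_nodes` but works on a copy. With the real data, the network would typically come from `ox.graph_from_place("Champaign County, Illinois, USA", network_type="drive")` followed by `ox.project_graph(G, to_crs="epsg:26971")`; the place string and `network_type="drive"` are assumptions, not part of the task text.

```python
import networkx as nx

def prune_network(network, min_component_size=30):
    """Hypothetical helper mirroring remove_uncenessary_nodes, applied to a copy."""
    pruned = network.copy()
    # Step 1: drop dead ends, i.e. nodes with no outgoing edges
    dead_ends = [n for n, deg in pruned.out_degree() if deg == 0]
    pruned.remove_nodes_from(dead_ends)
    # Step 2: drop strongly connected components too small to route through
    small = [n for comp in nx.strongly_connected_components(pruned)
             if len(comp) < min_component_size for n in comp]
    pruned.remove_nodes_from(small)
    return pruned

toy = nx.MultiDiGraph()
nx.add_cycle(toy, range(30))          # one-way loop of 30 nodes: kept
nx.add_cycle(toy, [100, 101, 102])    # isolated 3-node loop: removed
toy.add_edge(0, 99)                   # dead end hanging off the loop: removed

pruned = prune_network(toy)
print(pruned.number_of_nodes(), pruned.number_of_edges())  # 30 30
```

Removing these nodes matters for the later Dijkstra searches: a facility or block group snapped to an isolated fragment would otherwise reach almost nothing within the travel-time thresholds.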

In [None]:
def remove_uncenessary_nodes(network):
    # Keep the original node count so the removal share is reported correctly.
    _n_original = network.number_of_nodes()

    # Remove dead-end nodes that have no outgoing edges.
    _dead_ends = [n for (n, deg) in network.out_degree() if deg == 0]
    _nodes_removed = len(_dead_ends)
    network.remove_nodes_from(_dead_ends)

    # Remove strongly connected components with fewer than 30 nodes.
    for component in list(nx.strongly_connected_components(network)):
        if len(component) < 30:
            for node in component:
                _nodes_removed += 1
                network.remove_node(node)

    print("Removed {} nodes ({:2.4f}%) from the OSMnx network".format(_nodes_removed, 100 * _nodes_removed / float(_n_original)))
    print("Number of nodes: {}".format(network.number_of_nodes()))
    print("Number of edges: {}".format(network.number_of_edges()))

    return network

# Your code here




# 2. Measuring accessibility to healthcare resources (5 points)

## 2.1. Find the nearest OSM node to each `supply` and `demand` location (1 point)

* Use the `find_nearest_osm` function below to find the nearest OSM node for each row of the `supply` and `demand` GeoDataFrames.
```python
def find_nearest_osm(network, gdf):
    """
    Find the nearest OSM node for each row of a GeoDataFrame.
    Point geometries are used as-is; Polygon and MultiPolygon geometries are
    represented by their centroids. Rows with any other geometry type are
    skipped after printing the type.

    Input:
    - network (NetworkX MultiDiGraph): network dataset obtained from OSMnx
    - gdf (GeoDataFrame): stores locations in its `geometry` column

    Output:
    - gdf (GeoDataFrame): with a `nearest_osm` column holding the ID of the
                          nearest OSM node, computed from the `geometry` column
    """
    for idx, row in gdf.iterrows():
        if row.geometry.geom_type == 'Point':
            nearest_osm = ox.distance.nearest_nodes(network, 
                                                    X=row.geometry.x, 
                                                    Y=row.geometry.y
                                                   )
        elif row.geometry.geom_type in ('Polygon', 'MultiPolygon'):
            nearest_osm = ox.distance.nearest_nodes(network,
                                                    X=row.geometry.centroid.x,
                                                    Y=row.geometry.centroid.y
                                                   )
        else:
            print(row.geometry.geom_type)
            continue

        gdf.at[idx, 'nearest_osm'] = nearest_osm

    return gdf
```
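Conceptually, "nearest node" here means the smallest straight-line distance in the projected coordinate system, which is why both the network and the GeoDataFrames should be in EPSG:26971 (meters) before the search. The brute-force NumPy sketch below illustrates that idea on toy coordinates; `nearest_node_ids` is a hypothetical helper, not part of OSMnx. In practice `ox.distance.nearest_nodes` uses a spatial index and also accepts arrays for `X` and `Y`, so it can be called once for all rows instead of inside `iterrows()`.

```python
import numpy as np

def nearest_node_ids(node_ids, node_xy, query_xy):
    """Brute-force nearest-node search in projected (planar) coordinates.

    node_ids: sequence of n node identifiers
    node_xy:  (n, 2) array of node x/y in meters
    query_xy: (m, 2) array of query x/y (points or polygon centroids)
    Returns a list of m node identifiers.
    """
    node_xy = np.asarray(node_xy, dtype=float)
    query_xy = np.asarray(query_xy, dtype=float)
    # Squared Euclidean distance from every query to every node: shape (m, n)
    d2 = ((query_xy[:, None, :] - node_xy[None, :, :]) ** 2).sum(axis=-1)
    return [node_ids[i] for i in d2.argmin(axis=1)]

ids = [11, 22, 33]
nodes_xy = [(0, 0), (1000, 0), (0, 1000)]
queries = [(900, 50), (10, 10), (100, 800)]
print(nearest_node_ids(ids, nodes_xy, queries))  # [22, 11, 33]
```

The (m, n) distance matrix grows quickly with network size, so this version is only for intuition; a spatial index avoids comparing every query with every node.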

In [None]:
# Your code here

def find_nearest_osm(network, gdf):
    """
    Find the nearest OSM node for each row of a GeoDataFrame.
    Point geometries are used as-is; Polygon and MultiPolygon geometries are
    represented by their centroids. Rows with any other geometry type are
    skipped after printing the type.

    Input:
    - network (NetworkX MultiDiGraph): network dataset obtained from OSMnx
    - gdf (GeoDataFrame): stores locations in its `geometry` column

    Output:
    - gdf (GeoDataFrame): with a `nearest_osm` column holding the ID of the
                          nearest OSM node, computed from the `geometry` column
    """
    for idx, row in gdf.iterrows():
        if row.geometry.geom_type == 'Point':
            nearest_osm = ox.distance.nearest_nodes(network, 
                                                    X=row.geometry.x, 
                                                    Y=row.geometry.y
                                                   )
        elif row.geometry.geom_type in ('Polygon', 'MultiPolygon'):
            nearest_osm = ox.distance.nearest_nodes(network,
                                                    X=row.geometry.centroid.x,
                                                    Y=row.geometry.centroid.y
                                                   )
        else:
            print(row.geometry.geom_type)
            continue

        gdf.at[idx, 'nearest_osm'] = nearest_osm

    return gdf



## 2.2. Calculate estimated travel time for edges in the road network (1 point)

* Compute a `time` attribute (travel time in minutes) for every edge of the road network `G`. This involves the subtasks below. 
* If an edge already has a `maxspeed` value, keep it. 
* If `maxspeed` is missing, assign a `maxspeed` value based on the edge's road type (its `highway` attribute), using the `max_speed_per_type` dictionary below.
```python
max_speed_per_type = {'motorway': 60, 
                      'motorway_link': 45, 
                      'trunk': 60,
                      'trunk_link': 45, 
                      'primary': 50,
                      'primary_link': 35, 
                      'secondary': 40,
                      'secondary_link': 35,
                      'tertiary': 40, 
                      'tertiary_link': 35,
                      'residential': 20,
                      'living_street': 20,
                      'unclassified': 20,
                      'road': 20,
                      'busway': 20
         }
```

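**Note**: The `length` attribute of `G` is in meters, but `maxspeed` is in miles per hour (MPH). Multiply `maxspeed` by 26.8224 to convert it to meters per minute (1 mile = 1609.344 m, and 1609.344 / 60 ≈ 26.8224), then divide `length` by that speed to get the travel time in minutes. OSM `maxspeed` values are often strings such as `'35 mph'` (or lists of such strings), so convert them to numbers first.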
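A sketch of the edge-level computation on a toy graph, under several assumptions that go beyond the task text: when an edge carries a list of speeds, the lowest one is used; when `highway` is a list, the first type is used; and road types missing from the dictionary fall back to 20 MPH (`fallback_mph` is a hypothetical parameter). The helpers `parse_mph` and `add_edge_time` are hypothetical names.

```python
import math
import networkx as nx

# Excerpt of max_speed_per_type from above; use the full dictionary in practice
max_speed_per_type = {'motorway': 60, 'tertiary': 40, 'residential': 20}

MPH_TO_M_PER_MIN = 1609.344 / 60  # 1 mph is about 26.8224 meters per minute

def parse_mph(value):
    """Convert an OSM maxspeed tag ('35 mph', '35', or a list of these) to a float, else None."""
    if isinstance(value, list):
        # Assumption: use the lowest posted speed when merged edges disagree
        speeds = [s for s in (parse_mph(v) for v in value) if s is not None]
        return min(speeds) if speeds else None
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None
    parts = str(value).split()
    try:
        return float(parts[0]) if parts else None
    except ValueError:
        return None

def add_edge_time(graph, speed_per_type, fallback_mph=20):
    """Store a numeric `maxspeed` (mph) and `time` (minutes) on every edge, in place."""
    for _, _, data in graph.edges(data=True):
        speed = parse_mph(data.get('maxspeed'))
        if not speed:  # missing, unparseable, or zero
            highway = data.get('highway')
            if isinstance(highway, list):
                highway = highway[0]  # assumption: first listed road type
            speed = speed_per_type.get(highway, fallback_mph)
        data['maxspeed'] = speed
        data['time'] = data['length'] / (speed * MPH_TO_M_PER_MIN)
    return graph

toy = nx.MultiDiGraph()
toy.add_edge('a', 'b', length=1609.344, maxspeed='60 mph', highway='motorway')
toy.add_edge('b', 'c', length=536.448, highway='residential')
toy.add_edge('c', 'a', length=1000.0, maxspeed=['25 mph', '35 mph'],
             highway=['residential', 'tertiary'])
toy.add_edge('c', 'd', length=536.448, highway='service')  # not in the dictionary
add_edge_time(toy, max_speed_per_type)
print([round(d['time'], 3) for _, _, d in toy.edges(data=True)])
```

A mile of motorway at 60 MPH and 536.448 m of residential street at 20 MPH both take one minute, which is a quick sanity check on the unit conversion.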

In [None]:
# Your code here
max_speed_per_type = {'motorway': 60, 
                      'motorway_link': 45, 
                      'trunk': 60,
                      'trunk_link': 45, 
                      'primary': 50,
                      'primary_link': 35, 
                      'secondary': 40,
                      'secondary_link': 35,
                      'tertiary': 40, 
                      'tertiary_link': 35,
                      'residential': 20,
                      'living_street': 20,
                      'unclassified': 20,
                      'road': 20,
                      'busway': 20
         }

# Your code here



## 2.3. Measure accessibility (Enhanced two-step floating catchment area method) (2 points)

Now, you will translate the following two equations into code.

### First step:

$$ R_j = \frac{S_j}{\sum_{k \in \{t_{kj} \le t_0\}} P_k W_k} $$
where<br>
$R_j$: the supply-to-demand ratio of location $j$. <br>
$S_j$: the degree of supply (e.g., number of doctors) at location $j$. <br>
$P_k$: the degree of demand (e.g., population) at location $k$. <br>
$t_{kj}$: the travel time between locations $k$ and $j$. <br>
$t_0$: the threshold travel time of the analysis. <br>
$W_k$: the distance-decay weight of location $k$, set by the travel-time subzone of $j$ in which $k$ falls.

### Second step:
$$ A_i = \sum_{j \in \{t_{ij} \le t_0\}} R_j W_j $$
where<br>
$A_i$: the accessibility measure at location $i$. <br>
$R_j$: the supply-to-demand ratio of location $j$. <br>
$t_{ij}$: the travel time between locations $i$ and $j$. <br>
$W_j$: the distance-decay weight of location $j$, set by the travel-time subzone of $i$ in which $j$ falls.<br>
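A hypothetical worked example (made-up numbers, not the exam data) shows how the two steps combine, using the subzone weights 1, 0.68, and 0.22 for 10, 20, and 30 minutes. Suppose facility $j$ has $S_j = 10$, a block group with $P = 1000$ lies in its 10-minute subzone, and a block group with $P = 500$ lies in its 20-minute subzone:

$$ R_j = \frac{10}{1000 \times 1 + 500 \times 0.68} = \frac{10}{1340} \approx 0.00746 $$

Now suppose block group $i$ lies within 10 minutes of facility $j$ and within 30 minutes of a second facility $j'$ with $R_{j'} = 0.01$:

$$ A_i = 0.00746 \times 1 + 0.01 \times 0.22 \approx 0.00966 $$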

### 2.3.1. Step 1: Calculate the supply-to-demand ratio of each healthcare facility (1 point)

In this step, you will calculate the supply-to-demand ratio ($R_j$) of each healthcare facility and store it in the `ratio` column of the `supply` GeoDataFrame. The population counted in the denominator should be discounted by travel time using the weights provided below. <br>
In other words, each facility has a catchment area made up of three subzones. The inner subzone covers locations within 10 minutes of travel time and has a weight of 1. The middle subzone covers 10 to 20 minutes and has a weight of 0.68. The outer subzone covers 20 to 30 minutes and has a weight of 0.22. 

```python
minutes = [10, 20, 30]
weights = {10: 1, 20: 0.68, 30: 0.22}
```
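As a swapped-in alternative to the convex-hull polygons used below, each network node can be classified directly by its shortest-path travel time from a source node. The sketch assumes edges carry the `time` attribute from Section 2.2. The helper `subzone_weights` is hypothetical; looking up a block group's `nearest_osm` in the returned dictionary would give its weight $W$. Results can differ from the hull approach, because a convex hull may also cover areas that are not reachable within the threshold.

```python
import networkx as nx

minutes = [10, 20, 30]
weights = {10: 1, 20: 0.68, 30: 0.22}

def subzone_weights(network, source, minutes, weights, weight_attr='time'):
    """Map each node reachable within max(minutes) of `source` to its subzone weight."""
    travel = nx.single_source_dijkstra_path_length(
        network, source, cutoff=max(minutes), weight=weight_attr)
    node_weight = {}
    for node, t in travel.items():
        for minute in sorted(minutes):
            if t <= minute:  # innermost subzone that contains this node
                node_weight[node] = weights[minute]
                break
    return node_weight

# Toy one-way street: cumulative times from node 0 are 0, 5, 15, 25, 35 minutes
toy = nx.DiGraph()
toy.add_edge(0, 1, time=5)
toy.add_edge(1, 2, time=10)
toy.add_edge(2, 3, time=10)
toy.add_edge(3, 4, time=10)
print(subzone_weights(toy, 0, minutes, weights))
```

Node 4 is 35 minutes away, beyond the 30-minute cutoff, so it is absent from the result; a node at exactly 10 minutes counts as the inner subzone, matching the $t \le t_0$ convention in the equations.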

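The `calculate_catchment_area` function below computes the three subzones for a given start node. For each threshold, it builds the convex hull of all network nodes reachable within that travel time, then subtracts the next-smaller hull so that the middle and outer subzones are rings. It reads node locations from the global `nodes` GeoDataFrame, which is created from `G` in the first code cell below. 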

```python
def calculate_catchment_area(network, nearest_osm, minutes, distance_unit='time'):
    polygons = gpd.GeoDataFrame()

    # Create convex hull for each travel time (minutes), respectively.
    for minute in minutes:
        access_nodes = nx.single_source_dijkstra_path_length(G=network, 
                                                             source=nearest_osm, 
                                                             cutoff=minute, 
                                                             weight=distance_unit
                                                            )
        convex_hull = nodes.loc[
                                nodes.index.isin(access_nodes.keys()), 'geometry'
                               ].unary_union.convex_hull

        polygons.at[minute, 'geometry'] = convex_hull
  

    # Subtract each smaller convex hull from the next larger one to form ring-shaped subzones.
    polygons_ = polygons.copy(deep=True)
    for idx, minute in enumerate(minutes):
        if idx != 0:
            current_polygon = polygons.loc[[minute]]
            previous_polygons = polygons.loc[[minutes[idx-1]]]
            diff_polygon = gpd.overlay(current_polygon, previous_polygons, how="difference")
            if diff_polygon.shape[0] != 0:
                polygons_.at[minute, 'geometry'] = diff_polygon['geometry'].values[0]

    if polygons_.shape[0]:
        polygons_ = polygons_.set_crs(epsg=26971)
                
    return polygons_.copy(deep=True)

```

In [None]:
# Extract the nodes and edges of the network dataset for later analysis. 
nodes, edges = ox.graph_to_gdfs(G, nodes=True, edges=True, node_geometry=True)

In [None]:
def calculate_catchment_area(network, nearest_osm, minutes, distance_unit='time'):
    # Note: node locations come from the global `nodes` GeoDataFrame created in the previous cell.
    polygons = gpd.GeoDataFrame()

    # Create convex hull for each travel time (minutes), respectively.
    for minute in minutes:
        access_nodes = nx.single_source_dijkstra_path_length(G=network, 
                                                             source=nearest_osm, 
                                                             cutoff=minute, 
                                                             weight=distance_unit
                                                            )
        convex_hull = nodes.loc[
                                nodes.index.isin(access_nodes.keys()), 'geometry'
                               ].unary_union.convex_hull

        polygons.at[minute, 'geometry'] = convex_hull
  

    # Subtract each smaller convex hull from the next larger one to form ring-shaped subzones.
    polygons_ = polygons.copy(deep=True)
    for idx, minute in enumerate(minutes):
        if idx != 0:
            current_polygon = polygons.loc[[minute]]
            previous_polygons = polygons.loc[[minutes[idx-1]]]
            diff_polygon = gpd.overlay(current_polygon, previous_polygons, how="difference")
            if diff_polygon.shape[0] != 0:
                polygons_.at[minute, 'geometry'] = diff_polygon['geometry'].values[0]

    if polygons_.shape[0]:
        polygons_ = polygons_.set_crs(epsg=26971)
                
    return polygons_.copy(deep=True)

**Note**: The expected result is shown below. 
<img src="./images/step1.jpg" height="60%"/>
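Once you know which subzone of each facility's catchment contains each block group (for example, from a spatial join of block-group centroids against the rings returned by `calculate_catchment_area`), Step 1 reduces to a weighted group-by. The sketch assumes a hypothetical long table `zone_pairs` with columns `supply_id`, `demand_id`, and `minute`, a population column named `pop`, and that the `weight` column from Section 1.1 plays the role of $S_j$. All names and numbers are illustrative.

```python
import pandas as pd

weights = {10: 1, 20: 0.68, 30: 0.22}

def supply_to_demand_ratio(supply_df, demand_df, zone_pairs, weights):
    """R_j = S_j / sum_k(P_k * W_k) over block groups k in facility j's catchment."""
    pairs = zone_pairs.merge(demand_df[['demand_id', 'pop']], on='demand_id')
    pairs['weighted_pop'] = pairs['pop'] * pairs['minute'].map(weights)
    weighted_demand = pairs.groupby('supply_id')['weighted_pop'].sum()
    out = supply_df.copy()
    # Facilities with no population in reach would get NaN; 0 is one possible choice
    out['ratio'] = (out['weight'] / out['supply_id'].map(weighted_demand)).fillna(0)
    return out

# Hypothetical toy inputs (not exam data)
supply_df = pd.DataFrame({'supply_id': ['A', 'B', 'C'], 'weight': [10, 5, 5]})
demand_df = pd.DataFrame({'demand_id': [1, 2, 3], 'pop': [1000, 500, 200]})
# One row per (facility, block group) pair: the facility subzone containing the block group
zone_pairs = pd.DataFrame({'supply_id': ['A', 'A', 'B', 'B'],
                           'demand_id': [1, 2, 2, 3],
                           'minute': [10, 20, 10, 30]})
result = supply_to_demand_ratio(supply_df, demand_df, zone_pairs, weights)
print(result)
```

Facility A reproduces the hand calculation $10 / 1340$; facility C has no block group in reach, so its ratio is set to 0 here.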

In [None]:
supply['ratio'] = 0.0  # float column, since the ratios are fractional

minutes = [10, 20, 30]
weights = {10: 1, 20: 0.68, 30: 0.22}

# Your code here 



### 2.3.2. Step 2: Aggregate the supply-to-demand ratios for each census block group (1 point)

In this step, you will aggregate the supply-to-demand ratios calculated in Step 1 for each census block group (`demand`), weighting each ratio by its subzone weight. Store the aggregated result in the `access` column of the `demand` GeoDataFrame. You can reuse the `calculate_catchment_area` function, this time drawing the subzones around each block group. 

**Note**: The expected result is shown below. 
<img src="./images/step2.jpg" height="60%"/>
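Step 2 mirrors Step 1 with the roles swapped: the ratios are summed per block group, weighted by the subzone of the block group's catchment that contains each facility. The table layout (`zone_pairs`, `supply_id`, `demand_id`, `minute`) and the helper `aggregate_access` are hypothetical. On a directed road network, travel times are not always symmetric, so pairs drawn around block groups may differ from the Step 1 pairs.

```python
import pandas as pd

weights = {10: 1, 20: 0.68, 30: 0.22}

def aggregate_access(demand_df, zone_pairs, ratios, weights):
    """A_i = sum_j(R_j * W_j) over facilities j within block group i's catchment."""
    pairs = zone_pairs.merge(ratios[['supply_id', 'ratio']], on='supply_id')
    pairs['weighted_ratio'] = pairs['ratio'] * pairs['minute'].map(weights)
    access = pairs.groupby('demand_id')['weighted_ratio'].sum()
    out = demand_df.copy()
    # Block groups with no facility in reach get an accessibility of 0
    out['access'] = out['demand_id'].map(access).fillna(0)
    return out

# Hypothetical toy inputs (not exam data)
ratios = pd.DataFrame({'supply_id': ['A', 'B'], 'ratio': [10 / 1340, 5 / 544]})
demand_df = pd.DataFrame({'demand_id': [1, 2, 3, 4], 'pop': [1000, 500, 200, 300]})
# One row per (block group, facility) pair: the block-group subzone containing the facility
zone_pairs = pd.DataFrame({'supply_id': ['A', 'A', 'B', 'B'],
                           'demand_id': [1, 2, 2, 3],
                           'minute': [10, 20, 10, 30]})
result = aggregate_access(demand_df, zone_pairs, ratios, weights)
print(result)
```

Block group 4 has no facility within 30 minutes, so its `access` is 0; these are the locations drawn in grey in the map of Section 2.4.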

In [None]:
demand['access'] = 0.0  # float column, since the accessibility scores are fractional

# Your code here



## 2.4. Plot the measures of accessibility (1 point)

Reproduce the map shown below as closely as you can. It shows accessibility to healthcare resources in Champaign County. <br>
To do this, you need to 
1) Plot the locations of the healthcare facilities (`supply`). <br>
2) Plot a choropleth map of the `access` column in `demand`. <br>
3) Use grey for block groups without access (`access` of 0). <br>
4) Hide the x- and y-axes of the figure. 

**Note**: The expected result is shown below. 
<img src="./images/access.jpg" height="60%"/>
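One possible structure for the map, sketched under assumptions: `demand` and `supply` are GeoDataFrames in the same CRS, and the `YlGnBu` colormap, red markers, and white outlines are arbitrary styling choices rather than values taken from the expected figure. The helper names are hypothetical. Only the pandas split runs in the toy example; `plot_access` uses standard `GeoDataFrame.plot` keywords and Matplotlib's `ax.set_axis_off()`.

```python
import pandas as pd

def split_by_access(demand):
    """Separate block groups with positive accessibility from those without (0 or NaN)."""
    has_access = demand['access'] > 0
    return demand[has_access], demand[~has_access]

def plot_access(demand, supply, ax, cmap='YlGnBu'):
    """Draw the accessibility map on an existing Matplotlib axis."""
    with_access, without_access = split_by_access(demand)
    # Grey for block groups outside every catchment
    without_access.plot(ax=ax, color='lightgrey', edgecolor='white', linewidth=0.3)
    # Choropleth of the E2SFCA scores, with a colorbar legend
    with_access.plot(ax=ax, column='access', cmap=cmap, legend=True,
                     edgecolor='white', linewidth=0.3)
    # Facilities drawn last so they sit on top
    supply.plot(ax=ax, color='red', markersize=20)
    ax.set_axis_off()  # hide both axes
    return ax

demo = pd.DataFrame({'GEOID': ['bg1', 'bg2', 'bg3'], 'access': [0.0, 0.004, 0.0]})
with_access, without_access = split_by_access(demo)
print(len(with_access), len(without_access))  # 1 2
```

With the notebook's data, a call such as `fig, ax = plt.subplots(figsize=(8, 8))` followed by `plot_access(demand, supply, ax)` would draw the three layers in order.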

In [None]:
# Your code here



# 3. Calculate spatial autocorrelation based on the accessibility measure (2 points)

Calculate **Moran's I** and **Local Moran's I** based on the accessibility measures. If you fail to finish the accessibility measurements, you can use `step2.shp` in the data folder for this task. 

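* Compute the spatial weights (`w`) used for the spatial autocorrelation statistics with `libpysal.weights.DistanceBand`. 
* Use a fixed distance threshold of 10000 (meters, since the data are in EPSG:26971) and a distance-decay exponent `alpha` of -1. Note that `alpha` only takes effect when `binary=False`.  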

For reference, see <a href=https://pysal.org/libpysal/generated/libpysal.weights.DistanceBand.html>`libpysal.weights.DistanceBand()`</a>, <a href=https://pysal.org/esda/generated/esda.Moran.html>`esda.Moran()`</a>, <a href=https://pysal.org/esda/generated/esda.Moran_Local.html>`esda.Moran_Local()`</a>. 
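For intuition, the sketch below computes the weights that an inverse-distance band with `alpha=-1` describes: $w_{ij} = d_{ij}^{-1}$ for neighbors within the threshold and 0 otherwise. It is a plain NumPy illustration with a hypothetical helper name, not libpysal's implementation. In the notebook itself you would build `w` with libpysal, for example `libpysal.weights.DistanceBand.from_dataframe(demand, threshold=10000, binary=False, alpha=-1)`, which measures distances between points derived from the geometries (centroids for polygons).

```python
import numpy as np

def inverse_distance_band(coords, threshold=10000.0, alpha=-1.0):
    """Dense weights w_ij = d_ij ** alpha for 0 < d_ij <= threshold, else 0."""
    xy = np.asarray(coords, dtype=float)
    diff = xy[:, None, :] - xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Exclude self-pairs (and coincident points, where d ** alpha would be infinite)
    neighbors = (dist > 0) & (dist <= threshold)
    w = np.zeros_like(dist)
    w[neighbors] = dist[neighbors] ** alpha
    return w

coords = [(0, 0), (5000, 0), (20000, 0)]  # projected meters (EPSG:26971)
w = inverse_distance_band(coords)
print(w)
```

Points 0 and 1 are 5000 m apart and receive a weight of 1/5000; point 2 is more than 10000 m from both, so with this threshold it has no neighbors (an island).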


## 3.1. Calculate Moran's I of accessibility measure (1 point)

Use `esda.moran.Moran()` and print Moran's I (the `I` attribute of the result).
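Global Moran's I is $I = \frac{n}{S_0} \frac{z^\top W z}{z^\top z}$, where $z$ holds deviations from the mean and $S_0$ is the sum of all weights. `esda.Moran` row-standardizes the weights by default (`transformation='r'`), so the NumPy sketch below does the same; it is a hypothetical illustration for checking intuition, while the exam answer should come from `esda.Moran(demand['access'], w).I`.

```python
import numpy as np

def morans_i(y, w):
    """Global Moran's I with row-standardized weights (sketch)."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    row_sums = w.sum(axis=1, keepdims=True)
    # Row-standardize; rows without neighbors (islands) stay all zero
    w_r = np.divide(w, row_sums, out=np.zeros_like(w), where=row_sums > 0)
    z = y - y.mean()
    n = len(y)
    s0 = w_r.sum()
    return (n / s0) * (z @ w_r @ z) / (z @ z)

# Four locations in a row, each a neighbor of the adjacent ones
w_line = np.array([[0, 1, 0, 0],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [0, 0, 1, 0]])
print(morans_i([1, 2, 3, 4], w_line))  # smooth trend: positive autocorrelation
print(morans_i([1, 4, 1, 4], w_line))  # alternating values: negative autocorrelation
```

Positive values indicate that similar accessibility scores cluster in space, negative values indicate a checkerboard-like pattern, and values near $-1/(n-1)$ are what spatial randomness would produce on average.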

In [None]:
# Your code here



## 3.2. Calculate Local Moran's I (1 point)

Use `esda.moran.Moran_Local()` and map the Local Moran's I result as shown below. Use the following dictionaries to color each block group by its cluster type when the classification is statistically significant (p-value < 0.05), and color the remaining block groups as `Not_Sig`. 

```python
lm_dict = {1: 'HH', 2: 'LH', 3: 'LL', 4: 'HL'}
lisa_color = {'HH': 'red', 'LL': 'blue', 'HL': 'orange', 'LH': 'skyblue', 'Not_Sig': 'lightgrey'}
```

**Note**: The map may differ slightly from run to run, because the p-values are derived from random permutations.

<img src="./images/lisa.jpg" height="50%"/>
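The labeling step can be sketched with the two dictionaries above. It assumes the quadrant codes and pseudo p-values come from a `Moran_Local` result (its `q` and `p_sim` attributes, e.g. `lm = esda.Moran_Local(demand['access'], w)`); toy arrays stand in for them here, and `lisa_labels` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

lm_dict = {1: 'HH', 2: 'LH', 3: 'LL', 4: 'HL'}
lisa_color = {'HH': 'red', 'LL': 'blue', 'HL': 'orange', 'LH': 'skyblue', 'Not_Sig': 'lightgrey'}

def lisa_labels(q, p_values, alpha=0.05):
    """Return quadrant labels, replacing non-significant ones with 'Not_Sig'."""
    labels = pd.Series(q).map(lm_dict)
    labels[np.asarray(p_values) >= alpha] = 'Not_Sig'
    return labels

# Toy stand-ins for lm.q and lm.p_sim
labels = lisa_labels([1, 2, 3, 4], [0.01, 0.20, 0.03, 0.04])
print(labels.tolist())                   # ['HH', 'Not_Sig', 'LL', 'HL']
print(labels.map(lisa_color).tolist())   # ['red', 'lightgrey', 'blue', 'orange']
```

With real data, the colors could be passed to the map as, for example, `demand.plot(color=lisa_labels(lm.q, lm.p_sim).map(lisa_color).values, ax=ax)`.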

In [None]:
# Your code here

