# Suitability Analysis: Best ZCTA Within the Boston Region to Live Without a Car 
UEP-239 Final Project\
By: Justina Cheng

The purpose of this suitability analysis is to find the best Zip Code Tabulation Area (ZCTA) within the Boston region to live without a car. The original, self-serving idea was to find the best ZCTA for a Tufts University student and a Boston University (BU) student to live together without a car (i.e., somewhere equally convenient for both), but the scope was broadened to a more general analysis. The locations of Tufts and BU were still used as a guide and as landmarks for the analysis.

---

## Import Dependencies

In [None]:
import numpy as np
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import seaborn as sns
import math
import statistics
from scipy import stats

import folium
from folium import Choropleth, Circle, Marker
from folium.plugins import HeatMap, MarkerCluster

import osmnx as ox
import networkx as nx
from geopy.geocoders import Nominatim
from pyproj import CRS
from shapely.geometry import LineString, Point, Polygon, box

import rasterio
from rasterio.plot import show
from rasterio import features

import richdem as rd
from scipy import ndimage
from rasterstats import zonal_stats

---

## Create and View Base Map of Boston Region Zip Code Tabulation Areas (ZCTAs)
To create a GeoDataFrame of the Boston Region ZCTAs, the following steps were used:
1. Massachusetts outline with detailed coastline was imported from MassGIS as a GeoDataFrame.
1. Massachusetts ZCTAs were imported from the Census Bureau as a GeoDataFrame.
1. The outline and ZCTAs GeoDataFrames were converted to the coordinate reference system (CRS) for the Massachusetts Mainland EPSG 6491.
1. Boundaries for the Boston Region Metropolitan Planning Organization (MPO) were imported from MassDOT as a GeoDataFrame, and the CRS was converted to EPSG 6491.
1. The Boston Region was extracted from the MPO.
1. Massachusetts ZCTAs within the Boston Region were extracted using the centroid of the ZCTAs.
1. Function `convert_n_clip` was created to convert a GDF to the CRS of another GDF and clip to the other's extent.
1. Function `read_n_clip` was created to read in a shapefile and use `convert_n_clip` to convert it to the CRS of another GDF and clip to the other's extent.
1. Massachusetts Surface Water data from MassGIS was processed with `read_n_clip` with the extent of Boston ZCTAs. 
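
Selecting ZCTAs by centroid assigns each ZCTA to the region only if its "middle" falls inside, so a ZCTA that merely touches or slightly overlaps the boundary is not pulled in. The sketch below illustrates that idea in plain Python, assuming simple polygons given as coordinate lists; the zone names and coordinates are hypothetical, and the hand-written shoelace centroid and ray-casting test stand in for the GeoPandas `centroid` and `within` calls used in the notebook.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon [(x, y), ...] via the shoelace formula."""
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3.0 * area2), cy / (3.0 * area2)


def point_in_polygon(point, ring):
    """Ray-casting test: True if the point lies inside the ring."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y0 > y) != (y1 > y):
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical region and zones (not real ZCTA geometry).
region = [(0, 0), (10, 0), (10, 10), (0, 10)]
zones = {
    "mostly_inside": [(7, 2), (11, 2), (11, 6), (7, 6)],   # centroid at (9, 4)
    "mostly_outside": [(9, 7), (15, 7), (15, 9), (9, 9)],  # centroid at (12, 8)
    "fully_inside": [(1, 1), (3, 1), (3, 3), (1, 3)],      # centroid at (2, 2)
}

selected = [name for name, ring in zones.items()
            if point_in_polygon(polygon_centroid(ring), region)]
print(selected)
```

Both overlapping zones intersect the region, but only the one whose centroid lies inside is kept.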

### Massachusetts Coastline

In [None]:
# Import outline of detailed Massachusetts coastline.
outline_25k = gpd.read_file("./data/outline25k/OUTLINE25K_POLY.shp")
outline_25k.info()

In [None]:
# View CRS and plot.
print(outline_25k.crs)
outline_25k.plot(figsize=(12,12))
plt.title('Massachusetts Detailed Coastline', fontsize=16)
plt.show()

### Massachusetts ZCTAs

In [None]:
# Import Zip Code Tabulation Areas within Massachusetts.
ma_zcta = gpd.read_file("./data/tl_2010_25_zcta500/tl_2010_25_zcta500.shp")
ma_zcta.info()

In [None]:
# View CRS and plot.
print(ma_zcta.crs)
ma_zcta.plot(figsize=(12,12))
plt.title('Massachusetts ZCTAs', fontsize=16)
plt.show()

In [None]:
# Convert CRSs to Massachusetts Mainland EPSG 6491.
outline_25k = outline_25k.to_crs('epsg:6491')
ma_zcta = ma_zcta.to_crs('epsg:6491')
# Confirm CRSs match.
outline_25k.crs == ma_zcta.crs

In [None]:
# Clip ZCTA GDF to 25k MA outline.
ma_zcta_25k = gpd.clip(ma_zcta, outline_25k)
ma_zcta_25k.info()

### Boston Region Metropolitan Planning Organization (MPO)

In [None]:
# Import boundaries from Boston Region Metropolitan Planning Organization.
mpo = gpd.read_file("./data/MPO_Boundaries/MPO_Boundaries.shp")
mpo.info()

In [None]:
# View MPO dataset.
mpo

In [None]:
# Convert MPO CRS to EPSG 6491 and plot.
mpo = mpo.to_crs('epsg:6491')
mpo.plot(figsize=(12,12))
plt.title('Boston Region MPO Boundaries', fontsize=16)
plt.show()

In [None]:
# Extract only Boston Region from MPO.
boston_region = mpo.loc[mpo.MPO == 'Boston Region'].reset_index()
boston_region

In [None]:
# Extract ZCTAs within the Boston Region using the centroid of the ZCTAs.
boston_zcta = ma_zcta_25k[ma_zcta_25k.centroid.within(boston_region.geometry[0])].reset_index()
boston_zcta.info()

In [None]:
# View the Boston Region ZCTAs.
boston_zcta

In [None]:
# Plot the Boston Region ZCTAs.
boston_zcta.plot(figsize=(12,12))
plt.title('ZCTAs within Boston Region', fontsize=16)
plt.show()

### Define Functions `convert_n_clip` and `read_n_clip`
These two functions will be used to read in, convert the CRS, and clip the extent of GDFs throughout the analysis.

`convert_n_clip` takes two GeoDataFrames (GDFs): one to process (`orig_gdf`) and one whose extent will be used to clip (`extent_gdf`). The function converts `orig_gdf` to the coordinate reference system (CRS) of `extent_gdf` and clips it to the extent of `extent_gdf`.

In [None]:
def convert_n_clip(orig_gdf, extent_gdf):
    """
    Takes two GeoDataFrames (GDF): one to process (orig_gdf) and one whose extent will be used to clip (extent_gdf).
    Converts the coordinate reference system (CRS) of the orig_gdf to the CRS of extent_gdf.
    Clips orig_gdf to the extent of extent_gdf.
    Returns clipped GDF.
    Requires GeoPandas to run.
    
    Inputs:
    orig_gdf = GDF to process
    extent_gdf = GDF whose extent to use
    
    Example:
    ma_schools = convert_n_clip(usa_schools, ma_boundary)
    """
    orig_gdf = orig_gdf.to_crs(extent_gdf.crs)
    clipped_gdf = gpd.clip(orig_gdf, extent_gdf)
    return clipped_gdf   

`read_n_clip` takes a filepath for a shapefile and a GDF whose extent will be used to clip the shapefile. The function reads in the shapefile and uses `convert_n_clip` to convert the coordinate reference system (CRS) of the original GDF and clip it to the extent of the extent GDF.

In [None]:
def read_n_clip(filepath, extent_gdf):
    """
    Takes a filepath for a shapefile and a GeoDataFrame (GDF).
    Reads in the file.
    Uses convert_n_clip function to convert to the coordinate reference system (CRS)
    of the GDF and clip to the extent of the GDF. 
    Returns clipped GDF.
    Requires GeoPandas to run.
    
    Inputs:
    filepath = relative filepath for shapefile to read
    extent_gdf = GDF whose extent to use
    
    Example:
    ma_water = read_n_clip('./data/usa/water.shp', ma_boundary)
    """
    shapefile = gpd.read_file(filepath)
    clipped_shapefile = convert_n_clip(shapefile, extent_gdf)
    return clipped_shapefile

### Boston Region Surface Water

In [None]:
# read_n_clip Boston surface water.
boston_water = read_n_clip('./data/hydro25k/HYDRO25K_POLY.shp', boston_zcta)
print(boston_water.crs)
boston_water.info()

In [None]:
# Plot the Boston Region ZCTAs with surface water.
ax = boston_zcta.plot(color='lightgrey', edgecolor='grey', figsize=(12,12))
boston_water.plot(ax=ax)
plt.title('ZCTAs and Waterways within Boston Region', fontsize=16)
plt.show()

---

## Find Tufts University and Boston University Locations
To find the locations of Tufts University and Boston University (BU), the Massachusetts Colleges and Universities shapefile was processed with `read_n_clip` to read the shapefile and clip it to the extent of `boston_zcta`. Tufts and BU were then extracted into a GeoDataFrame.

In [None]:
# read_n_clip Boston MPO colleges.
colleges = read_n_clip('./data/colleges/COLLEGES_PT.shp', boston_zcta)
print(colleges.crs)
colleges.info()

In [None]:
# Plot Boston MPO colleges.
ax = boston_zcta.plot(color='lightgrey', edgecolor='grey', figsize=(12,12))
boston_water.plot(ax=ax, alpha=0.3)
colleges.plot(ax=ax, color='maroon')
plt.title('Colleges in Boston Region MPO', fontsize=16)
plt.show()

In [None]:
# List all college names.
college_list = list(colleges.COLLEGE.unique())
college_list

In [None]:
# Select only names matching Tufts University or Boston University.
colleges_select = colleges.loc[colleges.COLLEGE.isin(['Tufts University', 'Boston University'])]
colleges_select

In [None]:
# Select only the Medford/Somerville Tufts Campus and the main BU Campus.
tufts_bu = colleges_select.iloc[[0, 1]]
tufts_bu

In [None]:
# Plot Tufts and BU on top of the base map.
ax = boston_zcta.plot(color='lightgrey', edgecolor='grey', figsize=(12,12))
boston_water.plot(ax=ax, alpha=0.3)
tufts_bu.plot(ax=ax, color='aquamarine', markersize=50)
plt.title('Tufts University and Boston University Over Boston Region MPO', fontsize=16)
plt.show()

---

## Import Mass Transit Stops and Routes

### MBTA Bus Stops

In [None]:
# read_n_clip MBTA bus stops, check CRS, and view info.
bos_bus = read_n_clip('./data/MBTA_Bus_Routes_and_Stops/MBTA_Bus_Routes_and_Stops.shp', boston_zcta)
print(bos_bus.crs)
bos_bus.info()

### MBTA Rapid Transit (T) Stops and Routes
Stops were read in and processed with `read_n_clip`. However, because routes often go over or under water, routes were read in with `gpd.read_file` and converted to the proper CRS.

In [None]:
# read_n_clip MBTA rapid transit (T) stops, check CRS, and view info.
bos_rt_node = read_n_clip('./data/mbta_rapid_transit/MBTA_NODE.shp', boston_zcta)
print(bos_rt_node.crs)
bos_rt_node.info()

In [None]:
# Read in MBTA rapid transit (T) routes, convert CRS, check CRS, and view info.
bos_rt_route = gpd.read_file('./data/mbta_rapid_transit/MBTA_ARC.shp')
bos_rt_route = bos_rt_route.to_crs(boston_zcta.crs)
print(bos_rt_route.crs)
bos_rt_route.info()

### Commuter Rail Stops and Routes
Stops were read in and processed with `read_n_clip`. Routes were originally read in with `gpd.read_file` and converted to the proper CRS to preserve areas over or under water, but the extent greatly exceeded that of the Boston Region MPO. Therefore, routes were read in and processed with `read_n_clip`.

In [None]:
# read_n_clip Commuter Rail stops, check CRS, and view info.
bos_train_node = read_n_clip('./data/trains/TRAINS_NODE.shp', boston_zcta)
print(bos_train_node.crs)
bos_train_node.info()

In [None]:
# read_n_clip Commuter Rail routes, check CRS, and view info.
bos_train_route = read_n_clip('./data/trains/TRAINS_RTE_TRAIN.shp', boston_zcta)
print(bos_train_route.crs)
bos_train_route.info()

### Map Mass Transit with Base Map

In [None]:
ax = boston_zcta.plot(color='lightgrey', edgecolor='grey', figsize=(12,12))
boston_water.plot(ax=ax, alpha=0.3)
bos_bus.plot(ax=ax, color='red', markersize=5, label='Bus Stop')
bos_rt_node.plot(ax=ax, color='green', markersize=15, label='Rapid Transit Stop')
bos_rt_route.plot(ax=ax, color='green', label='Rapid Transit Route')
bos_train_node.plot(ax=ax, color='purple', markersize=20, label='Train Stop')
bos_train_route.plot(ax=ax, color='purple', label='Train Route')
tufts_bu.plot(ax=ax, color='aquamarine', markersize=50)
plt.title('Mass Transit in Boston Region MPO', fontsize=16)
plt.legend()
plt.show()

---

## Limit Study Area to Extent of Rapid Transit
Judging by the vast extent of the commuter rail, the locations of Tufts and BU, and the density of bus and T stops, the outer ZCTAs within the Boston Region MPO are less tenable for regular commutes to work or to campus. A mass transit density analysis that includes the outer ZCTAs would skew the results for those within range of the bus and the T, so the outer ZCTAs were excluded from the study.

### Limit with Rectangular Bounds
To limit the study area to the extent of the T for a more realistic comparison of ZCTAs, the following steps were used:
1. Extract the rectangular bounds of MBTA Rapid Transit (T) stops.
1. Create a bounding box with `shapely.geometry.box`.
1. Add a buffer to the bounding box and store as a new extent.
1. Extract Boston Region ZCTAs whose centroids are within the extent.
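
The steps above can be sketched in plain Python. This is a minimal illustration with hypothetical coordinates, not the notebook's Shapely code: it relies on the fact that a mitred buffer (`join_style=2`) around a rectangle is simply a larger rectangle. Note that the buffer distance is expressed in CRS units, so in a metre-based CRS such as EPSG:6491 a distance of `0.1` widens the box by only 0.1 m.

```python
def total_bounds(points):
    """Return (minx, miny, maxx, maxy) for a list of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def buffer_box(bounds, distance):
    """Grow a rectangle by `distance` on every side (equivalent to a mitred buffer)."""
    minx, miny, maxx, maxy = bounds
    return minx - distance, miny - distance, maxx + distance, maxy + distance


def within_box(point, bounds):
    """True if the point lies strictly inside the rectangle."""
    x, y = point
    minx, miny, maxx, maxy = bounds
    return minx < x < maxx and miny < y < maxy


# Hypothetical stop locations and zone centroids in projected units.
stops = [(2.0, 1.0), (5.0, 4.0), (3.0, 6.0)]
zone_centroids = {"A": (3.0, 3.0), "B": (5.5, 5.0), "C": (9.0, 2.0)}

bounds = total_bounds(stops)
extent = buffer_box(bounds, 1.0)

kept_no_buffer = [z for z, c in zone_centroids.items() if within_box(c, bounds)]
kept = [z for z, c in zone_centroids.items() if within_box(c, extent)]
print(kept_no_buffer, kept)
```

Zone B sits just outside the raw bounds and is only captured once the buffer is large enough to matter in the CRS units in use.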

In [None]:
# Extract bounds of Boston Rapid Transit (T) nodes.
rt_bounds = bos_rt_node.geometry.total_bounds
rt_bounds

In [None]:
# Creating bounding box with shapely.geometry.box
# shapely.geometry.box(minx, miny, maxx, maxy, ccw=True)
rt_bound_box = box(rt_bounds[0], rt_bounds[1], rt_bounds[2], rt_bounds[3])
rt_bound_box

In [None]:
# Store the extent as a Shapely Polygon in a variable called graph_extent.
graph_extent = rt_bound_box.buffer(0.1, join_style=2)
graph_extent

In [None]:
# Extract Boston Region ZCTAs within the graph extent using the centroid of the ZCTAs.
rt_zcta_box = boston_zcta[boston_zcta.centroid.within(graph_extent)]
rt_zcta_box.info()

In [None]:
# View first five rows of rt_zcta_box.
rt_zcta_box.head()

In [None]:
# Plot the Boston Region ZCTAs within graph extent with the T to confirm success.
ax = rt_zcta_box.plot(color='lightgrey', edgecolor='grey', figsize=(12,12))
bos_rt_node.plot(ax=ax, color='green', markersize=15, label='Rapid Transit Stop')
bos_rt_route.plot(ax=ax, color='green', label='Rapid Transit Route')
plt.title('ZCTAs within Boston Region Rapid Transit Extent', fontsize=16)
plt.show()

### Limit with Convex Hull
There are a number of ZCTAs that are within the defined rectangular bounds but are not close to a T stop. The extent was further limited to the convex hull of T stops with the following steps:
1. Create a convex hull of T stops with `unary_union.convex_hull`.
1. Add a buffer to the convex hull and store as a new extent.
1. Extract Boston Region ZCTAs that intersect with the extent.
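
To show why a convex hull gives a tighter extent than a bounding box, the sketch below builds a hull with Andrew's monotone chain algorithm in plain Python. This is an illustrative stand-in for Shapely's `convex_hull`, not its implementation, and the stop coordinates are hypothetical. For brevity it tests a single point against the hull, whereas the notebook tests whether each ZCTA polygon intersects the buffered hull.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Convex hull as a counter-clockwise list of vertices (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(ring):
    """Unsigned polygon area via the shoelace formula."""
    n = len(ring)
    total = sum(ring[i][0] * ring[(i + 1) % n][1] - ring[(i + 1) % n][0] * ring[i][1]
                for i in range(n))
    return abs(total) / 2.0


def inside_convex(point, hull):
    """True if the point is inside or on a counter-clockwise convex polygon."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], point) >= 0 for i in range(n))


# Hypothetical stop locations; (3, 3) is interior and drops out of the hull.
stops = [(0, 0), (4, 0), (8, 4), (4, 8), (0, 4), (3, 3)]
hull = convex_hull(stops)
bbox_area = (8 - 0) * (8 - 0)

print(hull, polygon_area(hull), bbox_area)
print(inside_convex((7.5, 0.5), hull))  # a corner of the bounding box
```

A location in the corner of the bounding box falls outside the hull, which is the kind of area the convex-hull limit is meant to exclude.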

In [None]:
# Create a convex hull from T stops.
convex_bounds = bos_rt_node.unary_union.convex_hull
convex_bounds

In [None]:
# Store the extent as a Shapely Polygon in a variable called convex_graph_extent.
convex_graph_extent = convex_bounds.buffer(0.1)
convex_graph_extent

In [None]:
# Extract Boston Region ZCTAs that intersect with the graph extent.
rt_zcta = boston_zcta[boston_zcta.intersects(convex_graph_extent)]
rt_zcta.info()

In [None]:
# Plot the Boston Region ZCTAs within convex graph extent to confirm success.
ax = rt_zcta.plot(color='lightgrey', edgecolor='grey', figsize=(12,12))
bos_rt_node.plot(ax=ax, color='green', markersize=15, label='Rapid Transit Stop')
bos_rt_route.plot(ax=ax, color='green', label='Rapid Transit Route')
plt.title('ZCTAs within Boston Region Rapid Transit Extent', fontsize=16)
plt.show()

### Clip All Relevant GDFs
The following GDFs were clipped to the new `rt_zcta` extent:
- `boston_water`
- `bos_bus`
- `bos_train_node`
- `bos_train_route`

Because `bos_rt_node` was used to create the extent and `bos_rt_route` connects all T stops, `bos_rt_route` does not need to be clipped.

In [None]:
# Clip all relevant GDFs.
boston_water = gpd.clip(boston_water, rt_zcta)
bos_bus = gpd.clip(bos_bus, rt_zcta)
bos_train_route = gpd.clip(bos_train_route, rt_zcta)
bos_train_node = gpd.clip(bos_train_node, rt_zcta)

In [None]:
# Plot new extent with mass transit and schools.
ax = rt_zcta.plot(color='lightgrey', edgecolor='grey', figsize=(12,12))
boston_water.plot(ax=ax, alpha=0.3)
bos_bus.plot(ax=ax, color='red', markersize=1, label='Bus Stop')
bos_rt_node.plot(ax=ax, color='green', markersize=15, label='Rapid Transit Stop')
bos_rt_route.plot(ax=ax, color='green', label='Rapid Transit Route')
bos_train_node.plot(ax=ax, color='purple', markersize=20, label='Train Stop')
bos_train_route.plot(ax=ax, color='purple', label='Train Route')
tufts_bu.plot(ax=ax, color='aquamarine', markersize=50, zorder=5)
plt.title('Mass Transit within Boston Region Rapid Transit Extent', fontsize=16)
plt.legend()
plt.show()

#### TEST CODE

bos_bus.head()

bos_bus_latlong = bos_bus.to_crs('epsg:4326')
bos_bus_latlong

bos_bus['latitude'] = bos_bus_latlong.geometry.y
bos_bus['longitude'] = bos_bus_latlong.geometry.x
bos_bus.head()

m_1 = folium.Map(location=[42.32, -71.0589], tiles='openstreetmap', zoom_start=20)
mc_1 = MarkerCluster()
for idx, row in bos_bus.iterrows():
    mc_1.add_child(folium.Marker([row['latitude'], row['longitude']], icon=folium.Icon(color='red', icon='bus')))

mc_1.add_to(m_1)
m_1

---

## Mass Transit Accessibility
As the study's premise is living without a car, it is crucial that the home's location be easily accessible by mass transit. Transit stops were selected as an indicator for convenience of mass transit. While the routes are also important to consider, the stops are the on-off points to transit lines and necessary to accessing the transit systems.

The density of mass transit stops was calculated with the following steps:
1. Function `count_records` was created to count the number of records in a GDF within polygons of another GDF (e.g. number of bus stops within a ZCTA).
1. `count_records` was used on the GDFs for bus stops, T stops, and train stops in the limited `rt_zcta` extent.
1. Function `multimerge` was created to merge multiple DataFrames on the same column or list of columns. 
1. `multimerge` was used to add all mass transit stop counts to `rt_zcta`.
1. Total stops and stop density were calculated for all ZCTAs and mapped.

### Count Public Transit Nodes per ZCTA

#### Define Function `count_records`
`count_records` takes a GeoDataFrame and counts the number of records within another GDF of polygons. It outputs a DataFrame with the specified polygon column values and counts column. The optional argument `op` for `gpd.sjoin` defaults to `'within'` unless otherwise specified.

In [None]:
def count_records(records_gdf, polygon_gdf, polygon_col, count_col, op='within'):
    """
    Takes a GeoDataFrame and counts the number of records within another GDF of polygons. 
    Outputs a DataFrame with the specified polygon column values and counts.
    Optional argument op defaults to 'within' unless otherwise specified.
    Requires Pandas and GeoPandas to run.
    
    Inputs:
        records_gdf = GDF of records to count
        polygon_gdf = GDF of polygons to count from
        polygon_col = name of column in polygon_gdf
            e.g. 'name'
        count_col = name of column for counts in output
            e.g. 'tree_count'
        op = op argument for sjoin; defaults to 'within' unless otherwise specified
    
    Example:
    >>> tree_count = count_records(trees, towns, 'name', 'tree_count')
    >>> tree_count
        name          tree_count
    0   Plainsville   68
    1   Springfield   40
    2   Fairfield     81
    3   Greenville    105
    """
    
    # Conduct a spatial join of records and polygons.
    spatial_join = gpd.sjoin(records_gdf, polygon_gdf, how='left', op=op)
    
    # Count the number of records within each polygon.
    records_count = spatial_join[polygon_col].value_counts().reset_index()
    
    # Rename the columns of the new DF.
    records_count.columns = [polygon_col, count_col]
    
    return records_count
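
One consequence of counting joined records is that polygons containing no records never appear in the result, which is why the later left merge onto `rt_zcta` is followed by `fillna(0)`. The plain-Python analogue below uses `collections.Counter` in place of `value_counts`; the ZCTA labels are hypothetical stand-ins for the output of a spatial join.

```python
from collections import Counter

# Hypothetical spatial-join output: the ZCTA that each stop fell within.
stop_zcta = ["02144", "02144", "02155", "02144", "02155"]

# All ZCTAs in the study area, including one with no stops.
all_zctas = ["02144", "02155", "02215"]

# Counting only sees labels that occur at least once.
counts = Counter(stop_zcta)
missing = [z for z in all_zctas if z not in counts]

# A left join onto the full ZCTA list, with gaps filled by zero.
filled = {z: counts.get(z, 0) for z in all_zctas}

print(dict(counts))
print(missing)
print(filled)
```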

#### Count MBTA Bus Stops per ZCTA

In [None]:
# count_records for bus stops within rt_zcta.
zcta_bus_count = count_records(bos_bus, rt_zcta, 'ZCTA5CE00', 'bus_stop_count')
zcta_bus_count.describe()

#### Count MBTA T Stops per ZCTA

In [None]:
# count_records for T stops within rt_zcta.
zcta_rt_count = count_records(bos_rt_node, rt_zcta, 'ZCTA5CE00', 'rt_stop_count')
zcta_rt_count.describe()

#### Count Commuter Rail Stops per ZCTA

In [None]:
# count_records for Commuter Rail stops within rt_zcta.
zcta_train_count = count_records(bos_train_node, rt_zcta, 'ZCTA5CE00', 'train_stop_count')
zcta_train_count.describe()

### Define Function `multimerge`
`multimerge` takes a base DataFrame and merges it with each DataFrame in a list of DataFrames on the specified column or list of columns and with the specified `how`. The function assumes the specified column(s) exist(s) across all DataFrames.

In [None]:
def multimerge(left_df, df_list, on_col, how):
    """
    Takes a base DataFrame and merges with each DataFrame in a list of DataFrames 
    on the specified column or list of columns and with the specified 'how'.
    Assumes on_col exists across all DFs.
    Requires Pandas to run merge method.
    
    Inputs:
        left_df = base DF
        df_list = list of DFs
            e.g. [df1, df2, df3]
        on_col = bracketed column name or list of columns (same across DFs)
            e.g. ['name'], ['name', 'address', 'zip_code']
        how = how argument
            e.g. 'left'
    
    Example:
    >>> town_schools = multimerge(town, [elem, middle, high], ['town_name'], 'left')
    """
    
    # Create a copy of the base DataFrame.
    merge_df = left_df.copy()
    
    # Merge each DataFrame within the list.
    for i in range(len(df_list)):
        merge_df = merge_df.merge(df_list[i], on=on_col, how=how)
        
    return merge_df

### Calculate Mass Transit Density per ZCTA

In [None]:
# Merge rt_zcta with all transit stop counts.
count_list = [zcta_bus_count, zcta_rt_count, zcta_train_count]
zcta_nodes = multimerge(rt_zcta, count_list, ['ZCTA5CE00'], 'left').fillna(0)
zcta_nodes.info()

In [None]:
# View first five rows of new GDF.
zcta_nodes.head()

In [None]:
# Calculate total transit stops in each ZCTA.
zcta_nodes['nodes_count'] = zcta_nodes.bus_stop_count + zcta_nodes.rt_stop_count + zcta_nodes.train_stop_count
zcta_nodes

In [None]:
# Map the number of transit stops in each ZCTA.
ax = zcta_nodes.plot(column='nodes_count',
                      legend=True,
                      edgecolor='black',
                      cmap='OrRd',
                      figsize=(12, 12),
                      legend_kwds={'label': "Number of Mass Transit Stops"})
plt.title('Number of Mass Transit Stops by ZCTA in Boston Region', fontsize=16)
plt.show()

In [None]:
# Calculate node density in nodes/sqkm.
zcta_nodes['nodes_density'] = zcta_nodes.nodes_count/zcta_nodes.area*(10**6)
zcta_nodes.sort_values(by='nodes_density', ascending=False).head()
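
The factor `10**6` converts stops per square metre into stops per square kilometre, assuming `area` is reported in square metres (as it is for a metre-based projected CRS such as EPSG:6491). A small arithmetic sketch with hypothetical counts and areas also shows how a very small ZCTA can produce a very high density from only a couple of stops:

```python
import math


def density_per_sqkm(count, area_sq_m):
    """Convert a count and an area in square metres to a density per square kilometre."""
    return count / area_sq_m * 10**6


# Hypothetical ZCTAs: (stop count, area in square metres).
typical = density_per_sqkm(12, 3_000_000)  # 12 stops over 3 sq km
tiny = density_per_sqkm(2, 25_000)         # 2 stops over 0.025 sq km

print(f"typical: {typical:.1f} stops/sqkm, tiny: {tiny:.1f} stops/sqkm")
assert math.isclose(typical, 4.0) and math.isclose(tiny, 80.0)
```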

In [None]:
# View statistics for nodes_density.
zcta_nodes.nodes_density.describe()

The top value for `nodes_density` greatly exceeds the next value despite a low `nodes_count`, indicating that it is an outlier. The ZCTA in question, ZCTA 02222, appears to contain only TD Garden and North Station. Setting the `nodes_density` value for ZCTA 02222 to the median to prevent it from skewing the analysis was contemplated, but the outlier was ultimately kept because quintile reclassification largely negates the skew. The unused median substitution is kept below for reference.

##### ZCTA 02222 value for nodes_density set to median value.
zcta_nodes.loc[61, 'nodes_density'] = zcta_nodes.nodes_density.median()
zcta_nodes.sort_values(by='nodes_density', ascending=False).head()

##### View statistics for nodes_density.
zcta_nodes.nodes_density.describe()

In [None]:
# Map the density of transit nodes in each ZCTA.
ax = zcta_nodes.plot(column='nodes_density',
                      legend=True,
                      edgecolor='black',
                      cmap='YlGnBu',
                      figsize=(12, 12),
                      legend_kwds={'label': "Mass Transit Stops per sqkm"})
plt.title('Density of Mass Transit Stops by ZCTA in Boston Region', fontsize=16)
plt.show()

### Reclassify Mass Transit Density
To reclassify indicators, quantile values need to be found for each indicator and used to reclassify values in roughly equal segments. The `quantiles` function was created to find any number of quantiles, while the `reclass_5` function was created to reclassify indicators into five classes.

#### Define Functions `quantiles` and `reclass_5`
`quantiles` takes a DataFrame, a column name, and a list of quantile thresholds (between 0 and 1) and outputs a list of quantile values for the column.

In [None]:
def quantiles(df, col, threshold_list):
    """
    Takes a DataFrame, a column name, and a list of quantile thresholds 
    (between 0 and 1) and outputs a list of quantile values for the column.
    Requires NumPy to run numpy.quantile function.
    
    Inputs:
        df = DataFrame variable name
        col = column name as a string
            e.g. 'mean'
        threshold_list = bracketed list of quantile thresholds between 0 and 1
            e.g. [0.25, 0.5, 0.75], [0.2, 0.4, 0.6, 0.8]
    
    Example:
    >>> quarts = [0.25, 0.5, 0.75]
    >>> quantiles(student, 'grades', quarts)
    [65.6, 80.5, 88.0]
    
    """
    # Create empty list.
    quant_list = []
    
    # Calculate and append quantiles.
    for i in range(len(threshold_list)):
        quant_list.append(np.quantile(df[col], threshold_list[i]))
        
    return quant_list                                       

`reclass_5` takes a value and reclassifies it into 1 of 5 classes given a list of values for class thresholds and order preference.

In [None]:
# Create a function that reclassifies a value into 5 classes.
def reclass_5(val, class_list, order):
    """
    Takes a value and reclassifies it into 1 of 5 classes given a list
    of values for class thresholds and order preference.
    Assumes no overall minimum or maximum.
    Requires NumPy to handle np.nan values.
    
    Inputs:
        val = value to classify
        class_list = numeric list of class thresholds in any order
            e.g. [1.5, 3, 6, 4.5] or [100, 200, 400, 800]
        order = 'low' or 'high' for which values are preferable
            e.g. 'low' indicates lower values are preferable
    
    Example:
    >>> thresholds = [200, 800, 400, 600]
    >>> reclass_5(693, thresholds, 'high')
    4
    
    """
    
    # Assert class_list is a list, has four values, and all values are numeric.
    assert type(class_list)==list, "class_list must be a list."
    assert len(class_list)==4, "class_list must have four values."
    assert all(isinstance(x, (int, float)) for x in class_list), \
        "class_list must be comprised of only numbers."
    
    # Sort class_list descending.
    class_sort = sorted(class_list, reverse=True)
    
    # Return np.nan if value is np.nan.
    if np.isnan(val):
        return np.nan
    
    # Reclassify if lower values are preferred.
    elif order=='low':
        if val >= class_sort[0]:
            return 1
        elif val >= class_sort[1]:
            return 2
        elif val >= class_sort[2]:
            return 3
        elif val >= class_sort[3]:
            return 4
        else:
            return 5
    
    # Reclassify if higher values are preferred.
    elif order=='high':
        if val >= class_sort[0]:
            return 5
        elif val >= class_sort[1]:
            return 4
        elif val >= class_sort[2]:
            return 3
        elif val >= class_sort[3]:
            return 2
        else:
            return 1
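
Because quintile thresholds depend on the ranks of the values rather than their magnitudes, a single extreme maximum does not move them. The sketch below illustrates this in plain Python with hypothetical densities. `linear_quantile` is a hand-written helper intended to mirror the linear interpolation that `np.quantile` uses by default, and `classify_high` is a compact equivalent of `reclass_5` with `order='high'`; both names are assumptions made for this example.

```python
def linear_quantile(values, q):
    """Quantile with linear interpolation between the two nearest ranks."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def classify_high(val, thresholds):
    """Class 1-5 where higher values are preferable: one class per threshold met."""
    return 1 + sum(val >= t for t in thresholds)


quintiles = [0.2, 0.4, 0.6, 0.8]

# Hypothetical densities; the last value is an extreme outlier in the first list.
with_outlier = [10, 20, 30, 40, 50, 60, 70, 80, 90, 1000]
without_outlier = [10, 20, 30, 40, 50, 60, 70, 80, 90, 95]

classes_with = [classify_high(v, [linear_quantile(with_outlier, q) for q in quintiles])
                for v in with_outlier]
classes_without = [classify_high(v, [linear_quantile(without_outlier, q) for q in quintiles])
                   for v in without_outlier]

print(classes_with)
print(classes_with == classes_without)
```

The outlier still lands in the top class, but it no longer stretches the scale for every other value the way it does on a continuous colour ramp.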

In [None]:
# Calculate threshold values that separate nodes_density into quintiles.
quintiles = [0.2, 0.4, 0.6, 0.8]
nodes_quints = quantiles(zcta_nodes, 'nodes_density', quintiles)
nodes_quints

In [None]:
# Reclassify nodes_density.
zcta_nodes['nodes_density_reclass'] = zcta_nodes['nodes_density'].apply(lambda x: reclass_5(x, nodes_quints, 'high'))

In [None]:
# View top and bottom five transit-dense ZCTAs.
zcta_nodes.sort_values(by='nodes_density_reclass', ascending=False)

In [None]:
# Map the reclassified mass transit density in each ZCTA.
ax = rt_zcta.plot(color='lightgrey', edgecolor='grey', figsize=(12,12))
zcta_nodes.plot(column='nodes_density_reclass',
                 legend=True,
                 edgecolor='black',
                 cmap='plasma_r',
                 legend_kwds={'label': "Reclassified mass transit density"},
                 ax=ax)
plt.title('Reclassified Density of Mass Transit by ZCTA in Boston Region', fontsize=16)
plt.show()

---

## Rent Affordability
Rent is a crucial expense for any non-homeowner and a large cost-burden, especially in an urban area. Rent affordability in this study was judged by the median rent for a two- or three-bedroom home in each ZCTA. 

Data was obtained from [Jeff Kaufman's Apartment Price Map](https://www.jefftk.com/apartment_prices/details), which scrapes data from [Padmapper](https://www.padmapper.com/). Though the dataset does not cover the entire defined extent, it was the best source of point data found for the Boston Region.

Zillow data was also considered as a source, but it was much more limited in extent. Only 37 of the 62 ZCTAs in question were available in the Zillow dataset. Zillow analysis has been included as an appendix at the end of the study for those who are curious. 

The following steps were used to analyze rental data:
1. Read in CSV file with point data on rental listings.
1. Conduct a spatial join to match listings with ZCTAs.
1. Calculate statistics on rental prices, including `len`, `min`, `max`, `median`, `mean`, `std`. 
1. Reclassify median rent prices in quintiles.
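
The grouping and summary step can be sketched without Pandas. The listings below are hypothetical, and the standard library's `statistics` functions stand in for the aggregations applied in the notebook; `stdev` here is the sample standard deviation.

```python
from collections import defaultdict
from statistics import mean, median, stdev

# Hypothetical joined listings: (ZCTA, monthly rent).
listings = [
    ("02144", 2400), ("02144", 2600), ("02144", 3100),
    ("02215", 3000), ("02215", 3400),
]

# Group prices by ZCTA.
prices_by_zcta = defaultdict(list)
for zcta, price in listings:
    prices_by_zcta[zcta].append(price)

# Summarise each group with the same statistics named in the steps above.
rent_stats = {
    zcta: {
        "len": len(prices),
        "min": min(prices),
        "max": max(prices),
        "median": median(prices),
        "mean": mean(prices),
        "std": stdev(prices),
    }
    for zcta, prices in prices_by_zcta.items()
}

for zcta, row in rent_stats.items():
    print(zcta, row)
```

The median is the value carried forward for reclassification; unlike the mean, it is not pulled upward by a few unusually expensive listings.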

In [None]:
# Read in CSV file of Boston region rental data, obtained from Padmapper via Jeff Kaufman.
rent_df = pd.read_csv('./data/20200919_rental_data.csv')
# Convert to GeoDataFrame, setting CRS to EPSG:4326
rent = gpd.GeoDataFrame(rent_df, geometry=gpd.points_from_xy(rent_df.longitude, rent_df.latitude))
rent = rent.set_crs('epsg:4326')

In [None]:
# View original rent dataset.
rent

In [None]:
# convert_n_clip rental data to CRS and extent of study area rt_zcta.
rent = convert_n_clip(rent, rt_zcta)
rent

In [None]:
# Narrow down listings dataset to 2 or 3 bedrooms.
rent_br = rent.loc[rent.rooms.isin([2, 3])]
rent_br

In [None]:
# Plot rental listings with schools.
ax = rt_zcta.plot(color='lightgrey', edgecolor='grey', figsize=(12,12))
boston_water.plot(ax=ax, alpha=0.3)
rent_br.plot(ax=ax, column='price', cmap='YlGnBu', markersize=1, label='2-3BR Rentals')
bos_rt_node.plot(ax=ax, color='green', markersize=15, label='Rapid Transit Stop')
bos_rt_route.plot(ax=ax, color='green', label='Rapid Transit Route')
tufts_bu.plot(ax=ax, color='maroon', markersize=50)
plt.title('2-3BR Rentals within Boston Region Rapid Transit Extent', fontsize=16)
plt.legend()
plt.show()

### Conduct Spatial Join to Match Rents with ZCTAs

In [None]:
rent_br_zcta = gpd.sjoin(rent_br, rt_zcta, how='left', op='within')
rent_br_zcta

### Calculate Statistics for Rent Prices 

In [None]:
# Calculate statistics on price by ZCTA.
zcta_rent_stats = rent_br_zcta.groupby('ZCTA5CE00').price.agg([len, min, max, np.median, np.mean, np.std])
zcta_rent_stats

In [None]:
# View statistics for median rent prices.
zcta_rent_stats['median'].describe()

In [None]:
# Merge statistics with spatial ZCTA rental GDF.
zcta_rent = rt_zcta.merge(zcta_rent_stats, on='ZCTA5CE00', how='left')
zcta_rent.info()

In [None]:
# Rename columns to include rent (important for merging data later).
zcta_rent = zcta_rent.rename(columns={'len':'rent_len', 'min':'rent_min', 'max':'rent_max', 'median':'rent_median', 'mean':'rent_mean', 'std':'rent_std'})

In [None]:
# Map the median rent price in each ZCTA.
ax = rt_zcta.plot(color='lightgrey', edgecolor='grey', figsize=(12,12))
zcta_rent.plot(column='rent_median',
                 legend=True,
                 edgecolor='black',
                 cmap='YlGnBu',
                 legend_kwds={'label': "Median rental price"},
                 ax=ax)
plt.title('Median Rental Price for 2- or 3-BR Homes by ZCTA in Boston Region', fontsize=16)
plt.show()

### Reclassify Rent Prices

In [None]:
# Calculate threshold values that separate median rent into quintiles.
# Use same quintiles list of thresholds from reclassifying transit.
rent_quants = quantiles(zcta_rent_stats, 'median', quintiles)
rent_quants

In [None]:
# Reclassify median rent prices with reclass_5 function and quintile values.
zcta_rent['rent_median_reclass'] = zcta_rent['rent_median'].apply(lambda x: reclass_5(x, rent_quants, 'low'))

In [None]:
# View top and bottom five median rental ZCTAs.
zcta_rent.sort_values(by='rent_median_reclass', ascending=False)

In [None]:
# Map the reclassified median rent price in each ZCTA.
ax = rt_zcta.plot(color='lightgrey', edgecolor='grey', figsize=(12,12))
zcta_rent.plot(column='rent_median_reclass',
                 legend=True,
                 edgecolor='black',
                 cmap='plasma_r', #reverse colormap to indicate lower values are preferable
                 legend_kwds={'label': "Reclassified median rental price"},
                 ax=ax)
plt.title('Reclassified Median Rental Price for 2- or 3-BR Homes by ZCTA in Boston Region', fontsize=16)
plt.show()

---

## Locations and Density of Necessities
Accessibility of necessities and amenities is crucial to living anywhere. For the purposes of this study, necessities were defined as follows:
- Food
    - Groceries (e.g. supermarkets and food purveyors, like greengrocers, butchers, etc.)
    - Prepared food (e.g. restaurants, cafes, etc.)
    - Farmers markets
- Health Services
    - Community health centers
    - Hospitals
    - Healthcare facilities (e.g. doctors' offices, pharmacies, dentists, etc.)
- Public Services
    - Fire stations
    - Police stations
    - USPS Post Offices
    - Libraries

Some datasets were imported from sources such as MassGIS, while others were retrieved from OpenStreetMap (OSM) using `OSMnx`'s `geometries_from_polygon` function.

### Create Extent in Latitude-Longitude to Use With `OSMnx`
To use `OSMnx`'s `geometries_from_polygon`, a polygon needs to be created in latitude-longitude coordinates. This was accomplished with the following steps:
1. Convert the T stops GeoDataFrame to `EPSG:4326` for lat-long and take the convex hull of all stops.
1. Add a buffer of 0.1 degrees with mitred corners (`join_style=2`) to the convex hull and store the result as a new extent.
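
As a minimal sketch of building a buffered lat-long extent with `shapely`, the example below uses made-up stop coordinates (not the project data). It shows both a rectangular bounding box and a convex hull of the points, then buffers the hull the same way as the extent cell in this section.

```python
from shapely.geometry import MultiPoint, box

# Made-up (longitude, latitude) pairs standing in for rapid transit stops.
stops = MultiPoint([(-71.12, 42.40), (-71.06, 42.36),
                    (-71.10, 42.35), (-71.00, 42.39)])

# Option 1: rectangular bounding box of the stops.
bbox = box(*stops.bounds)

# Option 2: convex hull, the smallest convex polygon containing every stop.
hull = stops.convex_hull

# Buffer the hull by 0.1 degrees.
# join_style=2 (mitre) keeps corners angular instead of rounded.
extent = hull.buffer(0.1, join_style=2)

print(extent.geom_type, extent.bounds)
```

Because the buffer distance is in degrees, it is not the same ground distance in every direction: at Boston's latitude, 0.1 degrees is roughly 11 km north-south and 8 km east-west.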

In [None]:
# Extract the convex hull of the T stops in lat-long coordinates.
rt_bounds_latlong = bos_rt_node.to_crs('epsg:4326').unary_union.convex_hull
rt_bounds_latlong

In [None]:
# Buffer the hull and store the extent as a Shapely Polygon in a variable called graph_extent_latlong.
graph_extent_latlong = rt_bounds_latlong.buffer(0.1, join_style=2)
print(type(graph_extent_latlong))
graph_extent_latlong

### Food Within Extent
Groceries and prepared food establishments within this study were found using `OSMnx` and the appropriate OSM tags. 
- Groceries were defined as shops where one can purchase ingredients or products to prepare one's own meals. 
- Prepared food was defined as establishments where one can purchase already-prepared food to consume, e.g. restaurants, cafes, fast food restaurants. 
- Specialty shops, such as coffee, ice cream, or alcohol shops, were excluded from these categories. 

Farmers markets were obtained from MassGIS.

#### Groceries

In [None]:
# Retrieve groceries features within graph_extent_latlong from OSMnx and view info.
grocery_tags = {'shop':['supermarket', 'grocery', 'greengrocer', 'bakery', 'butcher', 'deli', 'dairy', 'farm', 'seafood']}
grocery = ox.geometries_from_polygon(graph_extent_latlong, grocery_tags)
grocery = convert_n_clip(grocery, rt_zcta)
grocery.info()

In [None]:
# View first five rows of groceries GDF.
grocery.head()

In [None]:
# Use count_records on groceries GDF.
zcta_grocery_count = count_records(grocery, rt_zcta, 'ZCTA5CE00', 'grocery_count')
zcta_grocery_count

#### Prepared food

In [None]:
# Retrieve prepared food features within graph_extent_latlong from OSMnx and view info.
prep_food_tags = {'amenity':['cafe', 'restaurant', 'fast_food']}
prep_food = ox.geometries_from_polygon(graph_extent_latlong, prep_food_tags)
prep_food = convert_n_clip(prep_food, rt_zcta)
prep_food.info()

In [None]:
# View first five rows of prepared food GDF.
prep_food.head()

In [None]:
# Use count_records on prepared food GDF.
zcta_prep_food_count = count_records(prep_food, rt_zcta, 'ZCTA5CE00', 'prep_food_count')
zcta_prep_food_count

#### Farmers Markets

In [None]:
# read_n_clip farmers markets shapefile from MassGIS and view info.
farmer_mrkt = read_n_clip('./data/farmersmarkets/FARMERSMARKETS_PT.shp', rt_zcta)
farmer_mrkt.info()

In [None]:
# View first five rows of farmer_mrkt.
farmer_mrkt.head()

In [None]:
# Use count_records on farmer_mrkt.
zcta_farmer_mrkt_count = count_records(farmer_mrkt, rt_zcta, 'ZCTA5CE00', 'farmer_mrkt_count')
zcta_farmer_mrkt_count

#### Define Function `calc_density`
This function was created to automate density calculations based on counts of records in a dataset (e.g. number of food establishments per square kilometer in each ZCTA).

The function takes a base GeoDataFrame of polygons and, using the function `multimerge`, merges it with a list of GeoDataFrames on the column or columns listed in `on_col`, using the specified `how`. Each GeoDataFrame in the list is expected to have a count column, and the names of those columns are passed as a list. The function then sums the counts across the specified columns and divides the total by each polygon's area to calculate the density.
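
The arithmetic can be illustrated with plain `pandas` before looking at the function itself. This is a minimal sketch under stated assumptions: the zone IDs, areas, and counts are made up, and an explicit `area_m2` column stands in for the polygon areas that a GeoDataFrame would provide.

```python
import pandas as pd

# Hypothetical polygons: an ID and an area in square meters.
zones = pd.DataFrame({'zone_id': ['A', 'B', 'C'],
                      'area_m2': [2_000_000.0, 500_000.0, 4_000_000.0]})

# Hypothetical count tables; zones with no records are simply absent.
grocery = pd.DataFrame({'zone_id': ['A', 'B'], 'grocery_count': [4, 1]})
cafes = pd.DataFrame({'zone_id': ['A', 'C'], 'cafe_count': [6, 2]})

# Left-merge each count table onto the base so every zone is kept,
# then treat missing counts as zero.
merged = zones
for counts in (grocery, cafes):
    merged = merged.merge(counts, on='zone_id', how='left')
merged = merged.fillna(0)

# Sum the count columns, then convert records per sq m to records per sq km.
merged['total_count'] = merged[['grocery_count', 'cafe_count']].sum(axis=1)
merged['density'] = merged['total_count'] / merged['area_m2'] * 10**6
print(merged)
```

Zone A has 10 records in 2 sq km (5 per sq km), zone B has 1 record in 0.5 sq km (2 per sq km), and zone C has 2 records in 4 sq km (0.5 per sq km).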

In [None]:
def calc_density(orig_gdf, gdf_count_list, on_col, how, count_cols, total_count_col, density_col):
    """
    The purpose of this function is to calculate the density of a set of records
    in a GeoDataFrame of polygons.
    Takes a base GeoDataFrame and, using function multimerge, merges it with a list of GDFs 
    on the specified column or list of columns and with the specified 'how'.
    The GDFs within the list are expected to have columns with counts.
    Adds together the counts in specified columns, then calculates the density.
    Assumes on_col exists across all DFs.
    Assumes units of the GDF are in meters, with density output of per sqkm.
    Requires Pandas to run merge method (within multimerge) and fillna().
    
    Inputs:
        orig_gdf = base GDF
        gdf_count_list = list of GDFs with counts to calculate from
            e.g. [gdf1, gdf2, gdf3]
        on_col = bracketed column name or list of columns (same across GDFs)
            e.g. ['name'], ['name', 'address']
        how = how argument
            e.g. 'left'
        count_cols = bracketed list of column names with counts, order does not matter
            e.g. ['gdf1_count', 'gdf2_count', 'gdf3_count']
        total_count_col = string name for new column with total counts
            e.g. 'total_count'
        density_col = string name for new column with density (per sqkm)
            e.g. 'bike_density'
    
    Example:
    >>> schools = [elem, middle, high]
    >>> on_col = ['town_name', 'state']
    >>> count_cols = ['elem_classrooms', 'middle_classrooms', 'high_classrooms']
    >>> town_schools = calc_density(town,
                                    schools,
                                    on_col,
                                    'left',
                                    count_cols,
                                    'total_classrooms',
                                    'classroom_density')
    """
    
    # Run multimerge on the GDFs.
    density_gdf = multimerge(orig_gdf, gdf_count_list, on_col, how).fillna(0)
    
    # Create new total_count_col with dtype float (fill in with 0.0).
    density_gdf[total_count_col] = 0.0
    
    # Add count columns.
    for col in count_cols:
        density_gdf[total_count_col] = density_gdf[total_count_col] + density_gdf[col]
    
    # Calculate density.
    density_gdf[density_col] = density_gdf[total_count_col]/density_gdf.area*(10**6)
    
    return density_gdf

#### Calculate Food Density per ZCTA

In [None]:
# Use calc_density to calculate density of food establishments per ZCTA.
food_gdfs = [zcta_grocery_count, zcta_prep_food_count, zcta_farmer_mrkt_count]
food_cols = ['grocery_count', 'prep_food_count', 'farmer_mrkt_count']
zcta_food = calc_density(rt_zcta, food_gdfs, ['ZCTA5CE00'], 'left', food_cols, 'food_count', 'food_density')

In [None]:
# View five most food-dense ZCTAs.
zcta_food.sort_values(by='food_density', ascending=False).head()

In [None]:
# View stats for food_density
zcta_food.food_density.describe()

In [None]:
# Map the density of food establishments in each ZCTA.
ax = rt_zcta.plot(color='lightgrey', edgecolor='grey', figsize=(12,12))
zcta_food.plot(column='food_density',
               legend=True,
               edgecolor='black',
               cmap='YlGnBu',
               legend_kwds={'label': "Food establishments per sqkm"},
               ax=ax)
plt.title('Density of Food Establishments by ZCTA in Boston Region', fontsize=16)
plt.show()

#### Reclassify Food Density

In [None]:
# Calculate values to separate food density into five quantiles.
food_quints = quantiles(zcta_food, 'food_density', quintiles)
food_quints

In [None]:
# Reclassify food density with reclass_5 function and quintile values.
zcta_food['food_density_reclass'] = zcta_food['food_density'].apply(lambda x: reclass_5(x, food_quints, 'high'))

In [None]:
# View top and bottom five food-dense ZCTAs.
zcta_food.sort_values(by='food_density_reclass', ascending=False)

In [None]:
# Map the reclassified food density in each ZCTA.
ax = rt_zcta.plot(color='lightgrey', edgecolor='grey', figsize=(12,12))
zcta_food.plot(column='food_density_reclass',
               legend=True,
               edgecolor='black',
               cmap='plasma_r',
               legend_kwds={'label': "Reclassified food density"},
               ax=ax)
plt.title('Reclassified Density of Food Establishments by ZCTA in Boston Region', fontsize=16)
plt.show()

### Health Services Within Extent
Health services comprised Community Health Centers and Hospitals from MassGIS, plus healthcare facilities (such as clinics and doctors' offices) retrieved from OSM using the appropriate healthcare and amenity tags. Social services facilities and veterinary services were omitted.
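
OSMnx documents its tag dictionary as a union: a feature is returned if it matches at least one entry, where `True` accepts any value for a key and a list accepts any of the listed values. The pure-Python sketch below mimics that rule on made-up features. `matches_tags` and the sample features are hypothetical and are not part of OSMnx.

```python
def matches_tags(feature_tags, query_tags):
    """Return True if a feature's tags satisfy at least one query entry."""
    for key, wanted in query_tags.items():
        if key not in feature_tags:
            continue
        if wanted is True:
            # Any value of this key qualifies.
            return True
        if isinstance(wanted, str):
            # A single key-value pair.
            wanted = [wanted]
        if feature_tags[key] in wanted:
            # One of several listed values.
            return True
    return False

health_tags = {'healthcare': True,
               'amenity': ['clinic', 'doctors', 'dentist', 'health_post', 'pharmacy']}

# Made-up features with OSM-style tags.
features = [
    {'name': 'Corner Pharmacy', 'amenity': 'pharmacy'},
    {'name': 'Physio Studio', 'healthcare': 'physiotherapist'},
    {'name': 'Animal Clinic', 'amenity': 'veterinary'},
]
selected = [f['name'] for f in features if matches_tags(f, health_tags)]
print(selected)
```

Here the veterinary clinic is excluded because `veterinary` is not among the listed `amenity` values and the feature has no `healthcare` tag.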

#### Community Health Centers

In [None]:
# read_n_clip Community Health Centers shapefile from MassGIS and view info.
comm_health = read_n_clip('./data/chcs/CHCS_PT.shp', rt_zcta)
comm_health.info()

In [None]:
# View first five rows of comm_health.
comm_health.head()

In [None]:
# Use count_records on comm_health.
zcta_comm_health_count = count_records(comm_health, rt_zcta, 'ZCTA5CE00', 'comm_health_count')
zcta_comm_health_count

#### Hospitals

In [None]:
# read_n_clip Hospitals shapefile from MassGIS and view info.
hospitals = read_n_clip('./data/acute_care_hospitals/HOSPITALS_PT.shp', rt_zcta)
hospitals.info()

In [None]:
# View first five rows of hospitals.
hospitals.head()

In [None]:
# Use count_records on hospitals.
zcta_hospitals_count = count_records(hospitals, rt_zcta, 'ZCTA5CE00', 'hospitals_count')
zcta_hospitals_count

#### Healthcare

In [None]:
# Retrieve healthcare features within graph_extent_latlong from OSMnx and view info.
health_tags = {'healthcare':True, 'amenity':['clinic', 'doctors', 'dentist', 'health_post', 'pharmacy']}
healthcare = ox.geometries_from_polygon(graph_extent_latlong, health_tags)
healthcare = convert_n_clip(healthcare, rt_zcta)
healthcare.info()

In [None]:
# View first five rows of healthcare GDF.
healthcare.head()

In [None]:
# Use count_records on healthcare.
zcta_healthcare_count = count_records(healthcare, rt_zcta, 'ZCTA5CE00', 'healthcare_count')
zcta_healthcare_count

#### Calculate Health Services Density per ZCTA

In [None]:
# Use calc_density to calculate density of health services per ZCTA.
health_gdfs = [zcta_comm_health_count, zcta_hospitals_count, zcta_healthcare_count]
health_cols = ['comm_health_count', 'hospitals_count', 'healthcare_count']
zcta_health = calc_density(rt_zcta, health_gdfs, ['ZCTA5CE00'], 'left', health_cols, 'health_count', 'health_density')

In [None]:
# View five most health-service-dense ZCTAs.
zcta_health.sort_values(by='health_density', ascending=False).head()

In [None]:
# View stats for health_density
zcta_health.health_density.describe()

In [None]:
# Map the density of health services in each ZCTA.
ax = rt_zcta.plot(color='lightgrey', edgecolor='grey', figsize=(12,12))
zcta_health.plot(column='health_density',
                 legend=True,
                 edgecolor='black',
                 cmap='YlGnBu',
                 legend_kwds={'label': "Health services per sqkm"},
                 ax=ax)
plt.title('Density of Health Services by ZCTA in Boston Region', fontsize=16)
plt.show()

#### Reclassify Health Services Density

In [None]:
# Calculate values to separate health density into five quantiles.
health_quints = quantiles(zcta_health, 'health_density', quintiles)
health_quints

In [None]:
# Reclassify health density with reclass_5 function and quintile values.
zcta_health['health_density_reclass'] = zcta_health['health_density'].apply(lambda x: reclass_5(x, health_quints, 'high'))

In [None]:
# View top and bottom five health-service-dense ZCTAs.
zcta_health.sort_values(by='health_density_reclass', ascending=False)

In [None]:
# Map the reclassified health services density in each ZCTA.
ax = rt_zcta.plot(color='lightgrey', edgecolor='grey', figsize=(12,12))
zcta_health.plot(column='health_density_reclass',
                 legend=True,
                 edgecolor='black',
                 cmap='plasma_r',
                 legend_kwds={'label': "Reclassified health density"},
                 ax=ax)
plt.title('Reclassified Density of Health Services by ZCTA in Boston Region', fontsize=16)
plt.show()

### Public Services Within Extent
Public services were defined in terms of safety (fire and police) and availability of public resources (USPS Post Offices and libraries). Fire stations and police stations were obtained from MassGIS. USPS data was obtained from [USPS](https://uspstools.maps.arcgis.com/apps/webappviewer/index.html?id=1fc1c26bb31246b39087606c65b83020) as a CSV of Destination Delivery Units (DDUs) in a specified map extent. Libraries were found using OSM.

#### Fire Stations

In [None]:
# read_n_clip firestations shapefile from MassGIS and view info.
fire = read_n_clip('./data/firestations_pt/FIRESTATIONS_PT_MEMA.shp', rt_zcta)
fire.info()

In [None]:
# View first five rows of fire GDF.
fire.head()

In [None]:
# Use count_records on fire.
zcta_fire_count = count_records(fire, rt_zcta, 'ZCTA5CE00', 'fire_count')
zcta_fire_count

#### Police Stations

In [None]:
# read_n_clip police stations shapefile from MassGIS and view info.
police = read_n_clip('./data/policestations/POLICESTATIONS_PT_MEMA.shp', rt_zcta)
police.info()

In [None]:
# View first five rows of police GDF.
police.head()

In [None]:
# Use count_records on police.
zcta_police_count = count_records(police, rt_zcta, 'ZCTA5CE00', 'police_count')
zcta_police_count

#### USPS Post Offices

In [None]:
# Read in CSV file of USPS DDUs (Destination Delivery Units) obtained from USPS.
usps_df = pd.read_csv('./data/USPS DDUs.csv')
# Convert to GeoDataFrame, setting CRS to WGS 84 Pseudo Mercator EPSG:3857.
usps = gpd.GeoDataFrame(usps_df, geometry=gpd.points_from_xy(usps_df.x, usps_df.y))
usps = usps.set_crs('epsg:3857')

In [None]:
# View usps GDF.
usps

In [None]:
# convert_n_clip usps using rt_zcta.
usps = convert_n_clip(usps, rt_zcta)
usps.info()

In [None]:
# View first five rows of usps GDF.
usps.head()

In [None]:
# Use count_records on usps.
zcta_usps_count = count_records(usps, rt_zcta, 'ZCTA5CE00', 'usps_count')
zcta_usps_count

#### Libraries

In [None]:
# Retrieve library features within graph_extent_latlong from OSMnx and view info.
library_tags = {'amenity':['library']}
library = ox.geometries_from_polygon(graph_extent_latlong, library_tags)
library = convert_n_clip(library, rt_zcta)
library.info()

In [None]:
# View first five rows of library GDF.
library.head()

In [None]:
# Use count_records on library.
zcta_library_count = count_records(library, rt_zcta, 'ZCTA5CE00', 'library_count')
zcta_library_count

#### Calculate Public Services Density per ZCTA

In [None]:
# Use calc_density to calculate density of public services per ZCTA.
public_service_gdfs = [zcta_fire_count, zcta_police_count, zcta_usps_count, zcta_library_count]
public_service_cols = ['fire_count', 'police_count', 'usps_count', 'library_count']
zcta_public_service = calc_density(rt_zcta, public_service_gdfs, ['ZCTA5CE00'], 'left', public_service_cols, 'public_service_count', 'public_service_density')

In [None]:
# View five most public-service-dense ZCTAs.
zcta_public_service.sort_values(by='public_service_density', ascending=False).head()

In [None]:
# View stats for public_service_density
zcta_public_service.public_service_density.describe()

In [None]:
# Map the density of public services in each ZCTA.
ax = rt_zcta.plot(color='lightgrey', edgecolor='grey', figsize=(12,12))
zcta_public_service.plot(column='public_service_density',
                         legend=True,
                         edgecolor='black',
                         cmap='YlGnBu',
                         legend_kwds={'label': "Public services per sqkm"},
                         ax=ax)
plt.title('Density of Public Services by ZCTA in Boston Region', fontsize=16)
plt.show()

#### Reclassify Public Services Density

In [None]:
# Calculate values to separate public service density into five quantiles.
public_service_quints = quantiles(zcta_public_service, 'public_service_density', quintiles)
public_service_quints

In [None]:
# Reclassify public service density with reclass_5 function and quintile values.
zcta_public_service['public_service_density_reclass'] = zcta_public_service['public_service_density'].apply(lambda x: reclass_5(x, public_service_quints, 'high'))

In [None]:
# View top and bottom five public-service-dense ZCTAs.
zcta_public_service.sort_values(by='public_service_density_reclass', ascending=False)

In [None]:
# Map reclassified public service density in each ZCTA.
ax = rt_zcta.plot(color='lightgrey', edgecolor='grey', figsize=(12,12))
zcta_public_service.plot(column='public_service_density_reclass',
                         legend=True,
                         edgecolor='black',
                         cmap='plasma_r',
                         legend_kwds={'label': "Reclassified public service density"},
                         ax=ax)
plt.title('Reclassified Density of Public Services by ZCTA in Boston Region', fontsize=16)
plt.show()

In [None]:
# Map the locations of necessities over the ZCTAs.
# public_service is assumed here to be the fire, police, USPS, and library GDFs combined.
public_service = pd.concat([fire, police, usps, library], ignore_index=True)
ax = rt_zcta.plot(color='lightgrey', edgecolor='grey', figsize=(12,12))
boston_water.plot(ax=ax, alpha=0.3)
grocery.plot(ax=ax, markersize=15, color='yellow', label='Groceries')
healthcare.plot(ax=ax, markersize=15, color='aquamarine', label='Healthcare')
comm_health.plot(ax=ax, markersize=15, color='coral', label='Community Health Centers')
hospitals.plot(ax=ax, markersize=20, color='orangered', label='Hospitals')
public_service.plot(ax=ax, markersize=10, color='dodgerblue', label='Public Services')
plt.title('Necessary Amenities within Boston Region Rapid Transit Extent', fontsize=16)
plt.legend()
plt.show()

---

## Calculate Weighted Suitability Index
The reclassification maps for each indicator (mass transit, rent affordability, food establishments, health services, and public services) show significant variation in which ZCTAs are desirable. A weighted index was created to compare overall suitability across all indicators.

### Add Reclassification Columns to ZCTAs

In [None]:
# Add analysis and reclass columns to base rt_zcta using function multimerge and view info to confirm success.
indicators_list = [zcta_nodes, zcta_rent, zcta_food, zcta_health, zcta_public_service]
on_cols = ['index', 'STATEFP00', 'ZCTA5CE00', 'GEOID00', 'CLASSFP00', 'MTFCC00', 'FUNCSTAT00', 'ALAND00', 'AWATER00', 'INTPTLAT00', 'INTPTLON00', 'PARTFLG00', 'geometry']
zcta_index = multimerge(rt_zcta, indicators_list, on_cols, 'left')
zcta_index.info()

### Calculate Weighted Index
A weighted index is highly subjective. Some may value availability of food over cost of rent, while others may value convenience of public transit over all else. For the purposes of this study (ostensibly concerned with living as a student without a car), rent affordability was given the most weight (35%) while transit and food were given equal weights (20% each). Health services were valued next (15%) and public services last (10%). 

Because rental data was available for only a subset of ZCTAs, `NaN` values were preserved in the index calculation to ensure a fair comparison: a ZCTA without rental data receives no weighted score.
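
The weighting and the `NaN` behavior can be checked on a small example. The sketch below uses the weights stated above with made-up reclassified scores for three hypothetical ZCTAs; the column names mirror the reclass columns, but the values are illustrative only.

```python
import pandas as pd

# Weights stated in the text; they sum to 1, so the index stays on the
# same scale as the reclassified scores.
weights = {'nodes_density_reclass': 0.20,
           'rent_median_reclass': 0.35,
           'food_density_reclass': 0.20,
           'health_density_reclass': 0.15,
           'public_service_density_reclass': 0.10}

# Made-up reclassified scores; the third ZCTA has no rent data.
scores = pd.DataFrame({'nodes_density_reclass': [5, 3, 4],
                       'rent_median_reclass': [2, 4, float('nan')],
                       'food_density_reclass': [5, 3, 4],
                       'health_density_reclass': [4, 2, 5],
                       'public_service_density_reclass': [3, 3, 5]},
                      index=['zcta_a', 'zcta_b', 'zcta_c'])

# Multiply each column by its weight and add across columns.
# skipna=False keeps NaN, so a ZCTA missing one indicator gets no index value.
weighted = scores.mul(pd.Series(weights), axis=1).sum(axis=1, skipna=False)
print(weighted)
```

Adding the weighted columns with `+`, as in the cell that follows, propagates `NaN` in the same way.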

In [None]:
# Calculate weighted index and view info.
zcta_index['weighted'] = (zcta_index.nodes_density_reclass*0.2
                          + zcta_index.rent_median_reclass*0.35
                          + zcta_index.food_density_reclass*0.2
                          + zcta_index.health_density_reclass*0.15
                          + zcta_index.public_service_density_reclass*0.1)
zcta_index.info()

In [None]:
# View top and bottom five weighted ZCTAs.
zcta_index.sort_values(by='weighted', ascending=False)

In [None]:
# Map the weighted index in each ZCTA.
ax = rt_zcta.plot(color='lightgrey', edgecolor='grey', figsize=(12,12))
boston_water.plot(ax=ax, color='white', alpha=0.3, zorder=10)
tufts_bu.plot(ax=ax, color='aquamarine', markersize=50, zorder=5)
zcta_index.plot(column='weighted',
                 legend=True,
                 edgecolor='black',
                 cmap='plasma_r',
                 legend_kwds={'label': "Weighted suitability index"},
                 ax=ax)
plt.title('Weighted Suitability of Living in Boston Region ZCTAs', fontsize=16)
plt.show()

## Bike Facilities

## Leisure

### Find Leisure Features with OSMnx

In [None]:
# Retrieve leisure features within graph_extent_latlong from OSMnx and view info.
leisure_tags = {'leisure':True, 'amenity':['bar', 'biergarten', 'ice_cream', 'pub', 'bicycle_rental', 'arts_centre', 'cinema', 'nightclub', 'planetarium', 'social_centre', 'theatre', 'bbq']}
leisure = ox.geometries_from_polygon(graph_extent_latlong, leisure_tags)
leisure = convert_n_clip(leisure, rt_zcta)
leisure.info()

In [None]:
# View first five rows of leisure GDF.
leisure.head()

### Calculate Leisure Density

In [None]:
# Use count_records on leisure.
zcta_leisure_count = count_records(leisure, rt_zcta, 'ZCTA5CE00', 'leisure_count')
zcta_leisure_count

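In [None]:
# Calculate leisure density in features/sqkm with calc_density.
# 'leisure_total' is a new column name chosen here so that 'leisure_count' is not overwritten.
zcta_leisure = calc_density(rt_zcta, [zcta_leisure_count], ['ZCTA5CE00'], 'left', ['leisure_count'], 'leisure_total', 'leisure_density')
zcta_leisure.sort_values(by='leisure_density', ascending=False).head()

In [None]:
# View statistics for leisure_density.
zcta_leisure.leisure_density.describe()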

---

# Appendix

## Zillow Data Analysis
Zillow data was analyzed but ultimately not used for the following reasons:
- The dataset is national and has limited information: only zipcodes, region names, and average monthly rental values.
- Fewer zipcodes were represented in Zillow data than Padmapper data.
- Zillow data represents only an average across entire zipcodes. Analyzing specific home types (such as defining numbers of bedrooms) was not possible with this data.

The analysis was preserved in the Appendix as an example and to compare with Padmapper data for those interested.

### Read In and Pre-Process National Zillow Data

In [None]:
# Read in CSV file of Zillow rental data.
zillow_df = pd.read_csv('./data/Zip_ZORI_AllHomesPlusMultifamily_SSA.csv')
zillow_df

In [None]:
# View columns.
zillow_df.columns

In [None]:
# Drop monthly columns for 2014-2017, keeping data from January 2018 onward.
zillow_df = zillow_df.drop(columns=['2014-01', '2014-02',
       '2014-03', '2014-04', '2014-05', '2014-06', '2014-07', '2014-08',
       '2014-09', '2014-10', '2014-11', '2014-12', '2015-01', '2015-02',
       '2015-03', '2015-04', '2015-05', '2015-06', '2015-07', '2015-08',
       '2015-09', '2015-10', '2015-11', '2015-12', '2016-01', '2016-02',
       '2016-03', '2016-04', '2016-05', '2016-06', '2016-07', '2016-08',
       '2016-09', '2016-10', '2016-11', '2016-12', '2017-01', '2017-02',
       '2017-03', '2017-04', '2017-05', '2017-06', '2017-07', '2017-08',
       '2017-09', '2017-10', '2017-11', '2017-12'])
zillow_df

In [None]:
# Before merging, view datatypes for zip code columns in zillow_df and rt_zcta.
print(rt_zcta.ZCTA5CE00.dtype)
print(zillow_df.RegionName.dtype)

In [None]:
# Convert Zillow's zipcode column to string and fill in leading zeroes.
zillow_df.RegionName = zillow_df.RegionName.astype('string').str.zfill(5)
zillow_df
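
Massachusetts zip codes begin with 0, so a zip code column read from a CSV as integers loses its leading zero and will not match the string codes in `ZCTA5CE00`. The sketch below shows the same conversion on a made-up three-row frame; only the column name `RegionName` is taken from the Zillow data.

```python
import pandas as pd

# Made-up rows: zip codes parsed as integers have lost their leading zero.
df = pd.DataFrame({'RegionName': [2139, 2215, 10001]})

# Cast to pandas' string dtype, then left-pad with zeros to five characters.
df['RegionName'] = df['RegionName'].astype('string').str.zfill(5)
print(df['RegionName'].tolist())
```

Codes that already have five digits are left unchanged by `zfill(5)`.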

In [None]:
# Merge zillow_df and rt_zcta on zipcode columns and view info.
zcta_zillow = rt_zcta.merge(zillow_df, left_on='ZCTA5CE00', right_on='RegionName', how='left')
zcta_zillow.info()

In [None]:
# View first and last five rows of Zillow data.
zcta_zillow

### Calculate Average Rent Across Years

In [None]:
# Define column lists for years.
cols_rent = ['2018-01', '2018-02',
             '2018-03', '2018-04', '2018-05', '2018-06', '2018-07', '2018-08',
             '2018-09', '2018-10', '2018-11', '2018-12', '2019-01', '2019-02',
             '2019-03', '2019-04', '2019-05', '2019-06', '2019-07', '2019-08',
             '2019-09', '2019-10', '2019-11', '2019-12', '2020-01', '2020-02',
             '2020-03', '2020-04', '2020-05', '2020-06', '2020-07', '2020-08',
             '2020-09', '2020-10', '2020-11', '2020-12', '2021-01', '2021-02',
             '2021-03']
cols_2018 = ['2018-01', '2018-02', 
             '2018-03', '2018-04', '2018-05', '2018-06', '2018-07', '2018-08',
             '2018-09', '2018-10', '2018-11', '2018-12']
cols_2019 = ['2019-01', '2019-02',
             '2019-03', '2019-04', '2019-05', '2019-06', '2019-07', '2019-08',
             '2019-09', '2019-10', '2019-11', '2019-12']
cols_2020 = ['2020-01', '2020-02',
             '2020-03', '2020-04', '2020-05', '2020-06', '2020-07', '2020-08',
             '2020-09', '2020-10', '2020-11', '2020-12']
cols_2021 = ['2021-01', '2021-02', '2021-03']

In [None]:
# Calculate yearly and overall average rental prices.
zcta_zillow['avg_2018'] = zcta_zillow[cols_2018].mean(axis=1)
zcta_zillow['avg_2019'] = zcta_zillow[cols_2019].mean(axis=1)
zcta_zillow['avg_2020'] = zcta_zillow[cols_2020].mean(axis=1)
zcta_zillow['avg_2021'] = zcta_zillow[cols_2021].mean(axis=1)
zcta_zillow['avg_rent'] = zcta_zillow[cols_rent].mean(axis=1)
zcta_zillow

In [None]:
# Map the average Zillow rent price in each ZCTA.
ax = rt_zcta.plot(color='lightgrey', edgecolor='grey', figsize=(12,12))
zcta_zillow.plot(column='avg_rent',
                 legend=True,
                 edgecolor='black',
                 cmap='OrRd',
                 legend_kwds={'label': "Average rental price"},
                 ax=ax)
bos_rt_node.plot(ax=ax, color='green', markersize=15, label='Rapid Transit Stop')
bos_rt_route.plot(ax=ax, color='green', label='Rapid Transit Route')
plt.title('Average Zillow Rental Price by ZCTA in Boston Region', fontsize=16)
plt.show()