# UEP 239 Final Project
#### Jess Wilson - May 2021

---

### Goal of Project:
- Identify the most suitable ZIP Code Tabulation Areas (ZCTAs) in Metro Boston for young professionals to move to.

### Suitability Indicator Variables: 
1. Mean rent payment (USD)
2. Population density (population per square km)
3. Proximity to farmer's markets (km)
4. Transit stop density (stops per square km)
5. Proximity to supermarkets (km)

---

#### Instructions on How to Run:
- All instructions are in README.md, including how to create and activate the environment and how to download the required datasets.
- The environment file (environment.yml) and the data directory (uep239-final-project-data) are located in the repository.

---

#### Importing Relevant Libraries:

In [2]:
import numpy as np               # Load numpy, for scientific computing
import pandas as pd              # Load pandas, for data frame manipulation 
import geopandas as gpd          # Load geopandas, for pandas manipulation with geospatial components
import matplotlib.pyplot as plt  # Load matplotlib, for plotting and mapping data
import seaborn as sns            # Load seaborn, for additional plotting features 
import folium                    # Load folium, for interactive maps
import os                        # Load OS, for operating system work
import contextily as cx          # Load contextily, for basemaps

---

#### Setting Working Directory:

In [3]:
# Pass the raw string (r) path of the directory in which you downloaded the project data
os.chdir(os.path.dirname(r"C:/Users/tiago/OneDrive/Documentos/GPwP/uep239-final-project/uep239-final-project-data/"))
# Print working directory
print("Path changed to: "+os.getcwd())

Path changed to: C:\Users\tiago\OneDrive\Documentos\GPwP\uep239-final-project\uep239-final-project-data


---

#### Reading in Relevant Data:

In [None]:
#Read in csv tabular data:

# ZCTA population:
population_raw = pd.read_csv(r"tabular/Population/population.csv")
# Median monthly rent:
rent_raw = pd.read_csv(r"tabular/Rent/rent.csv")
# Grocery stores:
grocery_raw = pd.read_csv(r"tabular/Supermarkets/supermarkets.csv")


# Read in shp geospatial data:

# MPO boundaries:
mpo_raw = gpd.read_file(r"vector/MPO_Boundaries/mpo_boundaries.shp")
# MA town boundaries:
town_raw = gpd.read_file(r"vector/Town_Boundaries/town_boundaries.shp")
# ZCTA boundaries:
zcta_raw = gpd.read_file(r"vector/ZCTA_Boundaries/zcta.shp")
# Bus stops:
bus_raw = gpd.read_file(r"vector/Bus_Stops/bus_stops.shp")
# Transit stops:
transit_raw = gpd.read_file(r"vector/Transit_Stops/transit_stops.shp")
# Farmer's markets:
market_raw = gpd.read_file(r"vector/Farmers_Markets/farmers_markets.shp")

---

#### Data Cleaning:

In [None]:
population_raw.info()
population_raw.head()
population = population_raw.copy()  # placeholder: cleaning steps still to be added

In [None]:
rent_raw.info()
rent_raw.head()
rent = rent_raw.copy()  # placeholder: cleaning steps still to be added

In [None]:
grocery_raw.info()
grocery_raw.head()
grocery = grocery_raw.copy()  # placeholder: cleaning steps still to be added

In [None]:
mpo_raw.info()
mpo_raw.head()
mpo = mpo_raw.copy()  # placeholder: cleaning steps still to be added

In [None]:
town_raw.info()
town_raw.head()
town = town_raw.copy()  # placeholder: cleaning steps still to be added

In [None]:
zcta_raw.info()
zcta_raw.head()
zcta = zcta_raw.copy()  # placeholder: cleaning steps still to be added

In [None]:
bus_raw.info()
bus_raw.head()
bus = bus_raw.copy()  # placeholder: cleaning steps still to be added

In [None]:
transit_raw.info()
transit_raw.head()
transit = transit_raw.copy()  # placeholder: cleaning steps still to be added

In [None]:
market_raw.info()
market_raw.head()
market = market_raw.copy()  # placeholder: cleaning steps still to be added

---

#### Data Manipulation, Joining, and Reprojecting:

In [4]:
# join MPO with zcta (intersect) and join zcta with town (keep all)
# population and rent joined to zcta 
# join transit with bus (keep all)
# convert grocery to gpd using lat/lon
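
The tabular part of the joins listed above could look roughly like the sketch below. It uses small made-up tables; the column names `zcta`, `population`, and `median_rent` are assumptions, since the actual column names in the CSV files are not shown in this notebook.

```python
import pandas as pd

# Hypothetical stand-ins for the ZCTA attribute table and the two CSV tables.
zcta_demo = pd.DataFrame({"zcta": ["02108", "02139", "02144"]})
population_demo = pd.DataFrame({
    "zcta": ["02108", "02139", "02144"],
    "population": [4000, 36000, 24000],
})
rent_demo = pd.DataFrame({
    "zcta": ["02108", "02139"],
    "median_rent": [2900, 2400],
})

# Left joins keep every ZCTA, even when a table has no matching row.
zcta_joined = (
    zcta_demo
    .merge(population_demo, on="zcta", how="left")
    .merge(rent_demo, on="zcta", how="left")
)
print(zcta_joined)
```

A left join leaves `NaN` where a ZCTA has no rent record, which makes missing data visible before any scoring. For the supermarket table, `gpd.points_from_xy` can build point geometries from longitude and latitude columns, provided the correct CRS is assigned afterwards.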

---

#### Creation of Basemap:

In [None]:
# zcta map feat. town and mpo
# use contextily (cx)

---

#### Analysis of Suitability Indicator Variables:

1. Mean rent payment (USD)
2. Population density (population per square km)
3. Proximity to farmer's markets (km)
4. Transit stop density (stops per square km)
5. Proximity to supermarkets (km)

---

**1. Mean rent payment (USD):**

In [None]:
# Visualize spatial data - median rent per ZCTA:

In [None]:
# Summarize indicator values - mean/median rent per ZCTA:

In [None]:
# Visualize mean/med rent per ZCTA

In [None]:
# Produce ZCTA ranking based on mean/med rent, report highest and lowest ranking ZCTAs (e.g. top 5 and bottom 5)
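
One possible way to produce such a ranking with pandas is sketched below. The rent values and the column names `zcta` and `median_rent` are hypothetical, and the sketch assumes that lower rent should rank first.

```python
import pandas as pd

# Hypothetical rent values per ZCTA (not taken from the project data).
rent_demo = pd.DataFrame({
    "zcta": ["02108", "02139", "02144", "02155", "02148", "02151"],
    "median_rent": [2900, 2400, 2100, 1900, 1700, 1600],
})

# Rank 1 = lowest rent; ties share the smallest rank.
rent_demo["rank"] = rent_demo["median_rent"].rank(method="min").astype(int)

n = 2  # the notebook suggests 5; a smaller n fits this tiny sample
lowest_rent = rent_demo.nsmallest(n, "median_rent")
highest_rent = rent_demo.nlargest(n, "median_rent")
print(lowest_rent)
print(highest_rent)
```

The same pattern should carry over to the other indicators by swapping the column name and, where higher values are preferable, reversing which end counts as the top.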

In [None]:
# Convert indicator values into suitability score by normalizing values into suitability index ranging from zero to one
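
A minimal sketch of the normalization step, assuming min-max scaling is the intended method. The helper `min_max_score` and its `higher_is_better` flag are hypothetical names introduced here; for rent the scale is inverted on the assumption that cheaper ZCTAs are more suitable.

```python
import pandas as pd

def min_max_score(values: pd.Series, higher_is_better: bool = True) -> pd.Series:
    """Rescale values to [0, 1]; optionally invert so low values score high."""
    span = values.max() - values.min()
    if span == 0:
        # All values identical: no basis for ranking, so give a neutral score.
        return pd.Series(0.5, index=values.index)
    scaled = (values - values.min()) / span
    return scaled if higher_is_better else 1 - scaled

# Hypothetical rents for three ZCTAs labelled A, B, and C.
rent_values = pd.Series([1600, 2000, 2400], index=["A", "B", "C"])
rent_score = min_max_score(rent_values, higher_is_better=False)
print(rent_score)
```

Distance indicators would likely use `higher_is_better=False` as well, while the direction for the density indicators depends on how the project defines suitability.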

---

**2. Population density (population per square km):**

In [None]:
# Visualize spatial data - population per ZCTA

In [None]:
# Summarize indicator values - population per square km per ZCTA:
# Density per ZCTA function
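
The density function mentioned above might take a form like the following sketch. The helper name `density_per_km2` and the column names are hypothetical, and areas are assumed to be in square metres, as they would be if computed from geometries in a metre-based projected CRS.

```python
import pandas as pd

def density_per_km2(counts: pd.Series, area_m2: pd.Series) -> pd.Series:
    """Return counts per square kilometre, given areas in square metres."""
    area_km2 = area_m2 / 1_000_000  # 1 square km = 1,000,000 square m
    return counts / area_km2

# Hypothetical populations and areas for two ZCTAs.
demo = pd.DataFrame(
    {"population": [4000, 36000], "area_m2": [2_000_000, 4_500_000]},
    index=["02108", "02139"],
)
demo["pop_density"] = density_per_km2(demo["population"], demo["area_m2"])
print(demo)
```

Because the function only needs a count and an area, it could be reused for transit stops by passing the number of stops per ZCTA instead of the population.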

In [None]:
# Visualize pop density per ZCTA

In [None]:
# Produce ZCTA ranking based on population density, report highest and lowest ranking ZCTAs (e.g. top 5 and bottom 5)

In [None]:
# Convert indicator values into suitability score by normalizing values into suitability index ranging from zero to one

---

**3. Proximity to farmer's markets (km):**

In [None]:
# Visualize spatial data - farmer's market locations:

In [None]:
# Summarize indicator values - closest farmer's market per ZCTA (euc dis):
# Closest POI function
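
A rough sketch of a closest-POI function using Euclidean distance follows. It assumes distances are measured from one representative point per ZCTA (for example its centroid) and that all coordinates are in a metre-based projected CRS; the function name and sample coordinates are hypothetical.

```python
import numpy as np

def nearest_distance_km(origins_m: np.ndarray, pois_m: np.ndarray) -> np.ndarray:
    """Euclidean distance (km) from each origin to its nearest point of interest.

    Both inputs are (n, 2) arrays of x/y coordinates in metres.
    """
    # Pairwise coordinate differences, shape (n_origins, n_pois, 2).
    diff = origins_m[:, None, :] - pois_m[None, :, :]
    dist_m = np.sqrt((diff ** 2).sum(axis=2))
    # Keep the smallest distance per origin and convert metres to kilometres.
    return dist_m.min(axis=1) / 1000.0

# Hypothetical ZCTA centroids and market locations in projected metres.
centroids = np.array([[0.0, 0.0], [10_000.0, 0.0]])
markets = np.array([[3_000.0, 4_000.0], [10_000.0, 2_000.0]])
nearest_km = nearest_distance_km(centroids, markets)
print(nearest_km)
```

Euclidean distance on unprojected latitude/longitude degrees would not give kilometres, so reprojecting before this step matters.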

In [None]:
# Visualize closest farmer's market per ZCTA

In [None]:
# Produce ZCTA ranking based on distance to the closest farmer's market, report highest and lowest ranking ZCTAs (e.g. top 5 and bottom 5)

In [None]:
# Convert indicator values into suitability score by normalizing values into suitability index ranging from zero to one

---

**4. Transit stop density (stops per square km):**

In [None]:
# Visualize spatial data - bus and transit stop locations:

In [None]:
# Summarize indicator values - stop per square km per ZCTA:
# Density per ZCTA function

In [None]:
# Visualize transit stop density per ZCTA

In [None]:
# Produce ZCTA ranking based on transit stop density, report highest and lowest ranking ZCTAs (e.g. top 5 and bottom 5)

In [None]:
# Convert indicator values into suitability score by normalizing values into suitability index ranging from zero to one

---

**5. Proximity to supermarkets (km):**

In [None]:
# Visualize spatial data - grocery store locations:

In [None]:
# Summarize indicator values - closest supermarket per ZCTA (euc dis):
# Closest POI function

In [None]:
# Visualize closest supermarket per ZCTA

In [None]:
# Produce ZCTA ranking based on distance to the closest supermarket, report highest and lowest ranking ZCTAs (e.g. top 5 and bottom 5)

In [None]:
# Convert indicator values into suitability score by normalizing values into suitability index ranging from zero to one

---

#### Unweighted Suitability Index:

- Mean rent payment (USD) = 20%
- Population density (population per square km) = 20%
- Proximity to farmer's markets (km) = 20%
- Transit stop density (stops per square km) = 20%
- Proximity to supermarkets (km) = 20%

In [None]:
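# Sum the five normalized scores with equal weight; assumes each table has a 'score' column and shares a ZCTA index
unweighted = (rent['score'] + population['score'] + market['score'] + transit['score'] + grocery['score']).to_frame('score')
unweighted_index_top = unweighted.sort_values('score', ascending=False).head()
unweighted_index_bottom = unweighted.sort_values('score', ascending=False).tail()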

In [None]:
#plot unweighted

---

#### Weighted Suitability Index:

- Mean rent payment (USD) = 47.5%
- Population density (population per square km) = 6.5%
- Proximity to farmer's markets (km) = 3.3%
- Transit stop density (stops per square km) = 13.5%
- Proximity to supermarkets (km) = 29.2%

*Weights assigned using the Analytic Hierarchy Process (AHP): https://bpmsg.com/ahp/ahp-calc.php*
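
For readers unfamiliar with AHP, the sketch below shows the principal-eigenvector approach commonly used to turn pairwise comparisons into weights. The 3-criterion matrix is purely hypothetical and is not the comparison matrix behind the weights listed above, which is not recorded in this notebook; `ahp_weights` is likewise an illustrative helper.

```python
import numpy as np

def ahp_weights(pairwise: np.ndarray) -> np.ndarray:
    """Priority weights from the principal eigenvector of a pairwise matrix."""
    eigenvalues, eigenvectors = np.linalg.eig(pairwise)
    principal = np.argmax(eigenvalues.real)  # index of the largest eigenvalue
    vector = np.abs(eigenvectors[:, principal].real)
    return vector / vector.sum()  # normalize so the weights sum to 1

# Hypothetical comparison matrix (NOT the one used for this project):
# entry [i, j] says how much more important criterion i is than criterion j.
pairwise_demo = np.array([
    [1.0, 2.0, 6.0],
    [1 / 2, 1.0, 3.0],
    [1 / 6, 1 / 3, 1.0],
])
weights_demo = ahp_weights(pairwise_demo)
print(weights_demo.round(3))
```

Whatever matrix is used, the resulting weights should sum to 1, which the five percentages listed above do.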


In [None]:
# Weighted sum of the five normalized scores; assumes each table has a 'score' column and shares a ZCTA index
weighted = (rent['score'] * 0.475 + population['score'] * 0.065 + market['score'] * 0.033 + transit['score'] * 0.135 + grocery['score'] * 0.292).to_frame('score')
weighted_index_top = weighted.sort_values('score', ascending=False).head()
weighted_index_bottom = weighted.sort_values('score', ascending=False).tail()

In [None]:
#plot weighted