# Estimating Parking Cost and Spatial Autocorrelation Analysis of Parking Data

Goals:
   1. Join CSV cost data with spatial data for parking lots
   2. Make maps to show data
   3. Estimate monthly-to-daily (M/D), daily-to-hourly (D/H), and monthly-to-hourly (M/H) rate ratios and use them to fill missing rate values.
   4. Global and Local Spatial Autocorrelation (Moran's I and Getis-Ord Gi*)
       - If local spatial autocorrelation is found, try these interpolation methods:
           - IDW for points to polygons (TAZs)
           - KNN for points to polygons (TAZs)

For inflation adjustment: https://www.inflationtool.com/us-dollar/2010-to-present-value    

In [1]:
import geopandas as gpd
import pandas as pd
import numpy as np
import pysal
from osgeo import gdal
import folium # interactive mapping
import branca.colormap as cm # for interactive mapping
import osmnx as ox
import networkx as nx
import copy
import libpysal as lps
from libpysal import weights
from esda.moran import Moran #spatial autocorrelation
from esda.moran import Moran_Local
import scipy
from splot.esda import moran_scatterplot
from splot.esda import plot_moran
import keplergl
from libpysal.weights.util import WSP

## Bring In Data

1. Lot Rates
2. Lot Points
3. Join Points and Rates
4. Filter Lots (must have at least one rate)
5. TAZs

In [2]:
# bring in data (raw strings so Windows backslashes are not read as escape sequences)
base = r"J:\Shared drives\TMD_TSA\Data\Parking\WebScraped_ParkingCost\required_inputs"
# parking costs
rates = pd.read_csv(base + r"\parking_cost_fullrec_NAP_F16.csv")

# spatial points (drop lots that failed to geocode)
points = gpd.read_file(base + r"\GeocodedParkingLots\DKedits_parking_cost_fullrec_NAP.shp")
points = points.dropna(subset=["geometry"])

# join cost to points, keeping only the columns needed downstream
lots = points[['IN_SingleL','geometry','USER_month','USER_lot_u']].merge(rates[['IN_SingleLine','USER_lot_url',
                                                                                'MR','DR','HR']],
                                                                         left_on='USER_lot_u',right_on='USER_lot_url')
# reproject for easier mapping (from Massachusetts State Plane to WGS84)
lots = lots.to_crs("EPSG:4326")

# filter out customer-only parking (no rate in any category)
lots = lots[lots[['MR','DR','HR']].notna().any(axis=1)]

In [3]:
# bring in relevant TAZs
base2 = r"J:\Shared drives\TMD_TSA\Data\GIS Data\TAZ"
alltazs = gpd.read_file(base2 + r"\candidate_CTPS_TAZ_STATEWIDE_2019_wgs84.shp")
# filter to just relevant municipalities
districts = alltazs[(alltazs['town'].isin(["BOSTON","CAMBRIDGE","SOMERVILLE","BROOKLINE","NEWTON"])) & (alltazs['id'] < 200000)][["id","town","geometry"]]

## Estimate and Fill Missing Monthly Rates

For every lot that has both a monthly and a daily rate, divide the monthly rate by the daily rate to get a lot-level Monthly/Daily ratio, then average these ratios across all lots to get a single region-wide ratio. Multiplying this region-wide ratio by each lot's daily rate gives an estimated monthly rate. A new column then holds the observed monthly rate where it exists and falls back to the estimated monthly rate otherwise (available only if the lot has a daily rate).

The same steps are repeated for D/H and M/H; M/D is used as the worked explanation because it is the easiest to follow.

In [4]:
# spatial join - for every lot get district
lotdist = lots.sjoin(districts.reset_index(), how="left")

### Monthly to Daily

In [5]:
# copy from prior section
estmonth = lotdist.copy()
# get ratio at the lot level (a daily rate of 0 would give inf, so treat it as missing)
estmonth['Monthly_to_Daily'] = (estmonth['MR']/estmonth['DR']).replace([np.inf, -np.inf], np.nan)
estmonth[estmonth['Monthly_to_Daily'] > 0]

| | IN_SingleL | geometry | USER_month | USER_lot_u | IN_SingleLine | USER_lot_url | MR | DR | HR | index_right | index | id | town | Monthly_to_Daily |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7 | 11 Stillings St Boston, MA 02210, US | POINT (-71.04717 42.35030) | Monthly: $150-$480 | https://www.parkme.com/lot/32201/stillings-gar... | 11 Stillings St Boston, MA 02210, US | https://www.parkme.com/lot/32201/stillings-gar... | 480.0 | 25.0 | 12.0 | 175.0 | 2200.0 | 136.0 | BOSTON | 19.200000 |
| 8 | 16 Charles St Boston, MA 02108, US | POINT (-71.06826 42.35432) | Monthly: $150-$400 | https://www.parkme.com/lot/32172 | 16 Charles St Boston, MA 02108, US | https://www.parkme.com/lot/32172 | 400.0 | 28.0 | 10.0 | 558.0 | 4409.0 | 71.0 | BOSTON | 14.285714 |
| 9 | 40 Beach St Boston, MA 02111, US | POINT (-71.06159 42.35165) | Monthly: $165-$335 | https://www.parkme.com/lot/75578/40-beach-st?e... | 40 Beach St Boston, MA 02111, US | https://www.parkme.com/lot/75578/40-beach-st?e... | 335.0 | 28.0 | 12.0 | 152.0 | 1902.0 | 76.0 | BOSTON | 11.964286 |
| 12 | 660 Washington St Boston, MA 02111, US | POINT (-71.06253 42.35172) | Monthly: $335-$395 | https://www.parkme.com/lot/15258/660-washingto... | 660 Washington St Boston, MA 02111, US | https://www.parkme.com/lot/15258/660-washingto... | 395.0 | 27.0 | 13.0 | 152.0 | 1902.0 | 76.0 | BOSTON | 14.629630 |
| 13 | 19 Lancaster St Boston, MA 02114, US | POINT (-71.06214 42.36409) | Monthly: $465 | https://www.parkme.com/lot/24108 | 19 Lancaster St Boston, MA 02114, US | https://www.parkme.com/lot/24108 | 465.0 | 25.0 | 13.0 | 520.0 | 4369.0 | 3.0 | BOSTON | 18.600000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 454 | 1 Sawyer Rd Waltham, MA 02453, US | POINT (-71.26032 42.36179) | Monthly: $70 | https://www.parkme.com/lot/215055 | 1 Sawyer Rd Waltham, MA 02453, US | https://www.parkme.com/lot/215055 | 70.0 | 4.0 | | | | | | 17.500000 |
| 455 | 200 Webster St Newton, MA 02466, US | POINT (-71.23432 42.34777) | Monthly: $70 | https://www.parkme.com/lot/215060 | 200 Webster St Newton, MA 02466, US | https://www.parkme.com/lot/215060 | 70.0 | 4.0 | | 640.0 | 5640.0 | 1010.0 | NEWTON | 17.500000 |
| 456 | Auburn St Newton, MA 02466, US | POINT (-71.25122 42.34573) | Monthly: $105 | https://www.parkme.com/lot/215043 | Auburn St Newton, MA 02466, US | https://www.parkme.com/lot/215043 | 105.0 | 6.0 | | 364.0 | 4124.0 | 1012.0 | NEWTON | 17.500000 |
| 457 | 91 Wyman St Newton, MA 02468, US | POINT (-71.23017 42.32636) | Monthly: $157.50 | https://www.parkme.com/lot/215119 | 91 Wyman St Newton, MA 02468, US | https://www.parkme.com/lot/215119 | 157.5 | 9.0 | | 325.0 | 4073.0 | 1033.0 | NEWTON | 17.500000 |


In [6]:
# estimate monthly from daily and the mean regional ratio (the mean uses only lots with both values)
estmonth['Est_Monthly'] = estmonth['DR'] * estmonth['Monthly_to_Daily'].mean()

# combine estimated monthly with actual monthly where possible
estmonth['Monthly_Rate_wEst'] = np.where(estmonth['MR'].isna(),
                                         estmonth['Est_Monthly'],
                                         estmonth['MR'])

### Daily to Hourly

In [7]:
# get D to H ratio at the lot level
# if the hourly rate is 0, set the ratio to NaN so it is excluded from the mean
estmonth['Daily_to_Hourly'] = np.where(estmonth['HR'] == 0, np.nan, estmonth['DR']/estmonth['HR'])

In [8]:
# estimate Daily from Hourly and mean regional ratio (using only where both values)
estmonth['Est_Daily'] = estmonth['HR'] * estmonth['Daily_to_Hourly'].mean()

# combine estimated daily with actual daily where possible
estmonth['Daily_Rate_wEst'] = np.where(estmonth['DR'].isna(),
                                         estmonth['Est_Daily'],
                                         estmonth['DR'])

### Monthly to Hourly 
(Derived from Daily to Hourly Ratio and Monthly to Daily Ratio)
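
The monthly-to-hourly ratio in the next cell relies on the daily rate cancelling out. Written per lot l (M, D, H are the monthly, daily and hourly rates, D and H nonzero), with L_MDH the set of lots where all three rates are observed, the ratio, its regional mean, and the hourly estimate are:

```latex
\frac{M_\ell}{H_\ell} = \frac{M_\ell}{D_\ell}\cdot\frac{D_\ell}{H_\ell},
\qquad
\overline{M/H} = \frac{1}{\lvert L_{MDH}\rvert}\sum_{\ell \in L_{MDH}} \frac{M_\ell}{D_\ell}\cdot\frac{D_\ell}{H_\ell},
\qquad
\hat{H}_\ell = \frac{M_\ell}{\overline{M/H}}
```

Because the mean is taken over per-lot products, the regional M/H ratio is generally not equal to the product of the mean M/D and mean D/H ratios.
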

In [9]:
# get Monthly to Hourly ratio at the lot level: (M/D) * (D/H) = M/H
# the product is NaN for any lot missing one of the three rates
estmonth['Monthly_to_Hourly'] = estmonth['Daily_to_Hourly']*estmonth['Monthly_to_Daily']

In [10]:
# estimate Hourly from Monthly and mean regional ratio (using only where both values)
estmonth['Est_Hourly'] = estmonth['MR'] / estmonth['Monthly_to_Hourly'].mean()

# combine estimated hourly with actual hourly where possible
estmonth['Hourly_Rate_wEst'] = np.where(estmonth['HR'].isna(),
                                         estmonth['Est_Hourly'],
                                         estmonth['HR'])


### Round 2 of Estimation of Missing Parking Lot Rates

In [11]:
# estimate monthly from daily and mean regional ratio (using only where both values)
estmonth['Est_Monthly2'] = estmonth['Daily_Rate_wEst'] * estmonth['Monthly_to_Daily'].mean()

# combine estimated monthly with actual monthly where possible
estmonth['Monthly_Rate_wEst2'] = np.where(estmonth['Monthly_Rate_wEst'].isna(),
                                         estmonth['Est_Monthly2'],
                                         estmonth['Monthly_Rate_wEst'])

# estimate Daily from Hourly and mean regional ratio (using only where both values)
estmonth['Est_Daily2'] = estmonth['Hourly_Rate_wEst'] * estmonth['Daily_to_Hourly'].mean()

# combine estimated daily with actual daily where possible
estmonth['Daily_Rate_wEst2'] = np.where(estmonth['Daily_Rate_wEst'].isna(),
                                         estmonth['Est_Daily2'],
                                         estmonth['Daily_Rate_wEst'])

# estimate Hourly from Monthly and mean regional ratio (using only where both values)
estmonth['Est_Hourly2'] = estmonth['Monthly_Rate_wEst'] / estmonth['Monthly_to_Hourly'].mean()

# combine estimated hourly with actual hourly where possible
estmonth['Hourly_Rate_wEst2'] = np.where(estmonth['Hourly_Rate_wEst'].isna(),
                                         estmonth['Est_Hourly2'],
                                         estmonth['Hourly_Rate_wEst'])
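
Why a second round is needed: in round 1 every estimate is computed from the observed columns, so a lot observed with only one rate type gains only one extra rate. Round 2 reads the round-1 columns, which lets the estimates chain (for example hourly to daily in round 1, then daily to monthly in round 2). The sketch below uses hypothetical ratios and a hypothetical helper `fill_round` to show that two rounds fill all three rates for lots that start with any single rate:

```python
import numpy as np
import pandas as pd

# Hypothetical regional mean ratios (illustrative values, not the notebook's results)
md, dh, mh = 17.5, 2.0, 35.0   # monthly/daily, daily/hourly, monthly/hourly

# Three toy lots, each observed with a single rate type
toy = pd.DataFrame({"MR": [350.0, np.nan, np.nan],
                    "DR": [np.nan, 20.0, np.nan],
                    "HR": [np.nan, np.nan, 10.0]})

def fill_round(m, d, h):
    """One estimation round: each column is filled only from the columns passed in."""
    m_new = m.fillna(d * md)   # monthly from daily
    d_new = d.fillna(h * dh)   # daily from hourly
    h_new = h.fillna(m / mh)   # hourly from monthly
    return m_new, d_new, h_new

m1, d1, h1 = fill_round(toy["MR"], toy["DR"], toy["HR"])  # round 1: observed inputs
m2, d2, h2 = fill_round(m1, d1, h1)                        # round 2: round-1 inputs

print(int(pd.concat([m1, d1, h1], axis=1).isna().sum().sum()))  # → 3 gaps after round 1
print(int(pd.concat([m2, d2, h2], axis=1).isna().sum().sum()))  # → 0 gaps after round 2
```

A lot that starts with no rate at all stays empty in both rounds, which is why lots without any rate were filtered out at the start.
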


In [12]:
estmonth.to_csv(r"J:\Shared drives\TMD_TSA\Data\Parking\WebScraped_ParkingCost\estmonth333.csv")

## Results of Local Spatial Autocorrelation

Because of limited experience running spatial autocorrelation in Python, Getis-Ord Gi* was run in QGIS with the following specifications:
1. KNN = 8 (every lot has 8 neighbors)
2. Row standardization
3. Run on the rate columns that include estimates (nulls)
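
For reference, the usual z-score form of the Getis-Ord Gi* statistic for lot i is shown below, where x_j is the rate at lot j, w_ij is the spatial weight between lots i and j (for Gi* the lot itself is included, j = i), and n is the number of lots. Whether the QGIS tool computes exactly this form, or a permutation-based variant, is an assumption not confirmed by this notebook.

```latex
G_i^{*} = \frac{\sum_{j=1}^{n} w_{ij} x_j - \bar{X} \sum_{j=1}^{n} w_{ij}}
               {S \sqrt{\dfrac{n \sum_{j=1}^{n} w_{ij}^{2} - \left(\sum_{j=1}^{n} w_{ij}\right)^{2}}{n-1}}},
\qquad
\bar{X} = \frac{1}{n}\sum_{j=1}^{n} x_j,
\qquad
S = \sqrt{\frac{1}{n}\sum_{j=1}^{n} x_j^{2} - \bar{X}^{2}}
```

A large positive value marks a hot spot (high rates surrounded by high rates); a large negative value marks a cold spot.
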

The results show significant local spatial autocorrelation for all three rate types (hourly, daily, monthly), with hot clusters downtown and cold clusters (more spread out) farther from downtown. Given this, we aggregate the lot rates to TAZs using the lots nearest to each TAZ centroid (Euclidean distance). The weighting described for this step is inverse distance squared, 1/(distance)^2, for each lot-TAZ (OD) pair over the 8 nearest neighbors; note that the aggregation code below keeps the 16 nearest lots and sets every weight to 1, so each TAZ receives a simple average of those lots.
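
A minimal sketch contrasting inverse-distance-squared weights (w = 1/d^2, as described above) with equal weights (w = 1, as set in the aggregation code) for a single TAZ centroid. The helper name `knn_weighted_average` and the sample distances are hypothetical, and the rates are assumed to contain no NaN:

```python
import numpy as np

def knn_weighted_average(dist, rates, k=8, power=2.0):
    """Weighted average of the k nearest lots' rates for one TAZ centroid.

    power=2.0 gives inverse-distance-squared weights (w = 1/d**2);
    power=0.0 gives equal weights (w = 1), i.e. a plain k-nearest-neighbour mean.
    Assumes `rates` has no NaN values.
    """
    dist = np.asarray(dist, dtype=float)
    rates = np.asarray(rates, dtype=float)
    nearest = np.argsort(dist)[:k]            # indices of the k closest lots
    d = np.maximum(dist[nearest], 1e-9)       # avoid 1/0 when a lot sits on the centroid
    w = d ** -power
    return float(np.sum(w * rates[nearest]) / np.sum(w))

dist = [0.25, 0.5, 1.0, 2.0]      # made-up distances (miles) from one TAZ centroid
rates = [20.0, 10.0, 5.0, 100.0]  # made-up daily rates
print(knn_weighted_average(dist, rates, k=3, power=2.0))  # IDW: 365/21, about 17.38
print(knn_weighted_average(dist, rates, k=3, power=0.0))  # equal weights: 35/3, about 11.67
```

IDW pulls the TAZ value toward its closest lot, while equal weights treat all k lots alike; the fourth lot is outside k = 3, so its high rate does not enter either average.
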

To identify spatial outliers (high-low and low-high lots) and keep them from skewing the averages assigned to TAZs, Local Moran's I was run with the same specifications as Getis-Ord Gi* above. The results are below; the map legends are the same as those of the graphs.
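
For readers who want to reproduce this step in Python, here is a numpy-only sketch of Local Moran's I with row-standardized k-nearest-neighbor weights (k = 8 matches the specification above). The function name `local_moran_labels` is hypothetical, coordinates are assumed to be projected (e.g. meters in EPSG:26986), and no permutation-based significance test is performed, so every lot gets a quadrant label; `esda.Moran_Local` (imported in the first cell) adds pseudo p-values so that only significant lots are flagged.

```python
import numpy as np

def local_moran_labels(xy, y, k=8):
    """Local Moran's I with row-standardized k-nearest-neighbour weights.

    Returns (I, labels); labels are HH, LL, HL, LH from the sign of the lot's own
    deviation from the mean and the sign of its spatial lag. No significance test.
    """
    xy = np.asarray(xy, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    # pairwise Euclidean distances; a point is not its own neighbour
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    nbrs = np.argsort(d, axis=1)[:, :k]          # k nearest neighbours of each point
    W = np.zeros((n, n))
    W[np.arange(n)[:, None], nbrs] = 1.0
    W /= W.sum(axis=1, keepdims=True)            # row standardization
    z = y - y.mean()
    lag = W @ z                                  # average deviation of the neighbours
    I = (n - 1) * z * lag / np.sum(z * z)        # one common scaling of local I
    labels = np.where(z > 0, np.where(lag > 0, "HH", "HL"),
                             np.where(lag > 0, "LH", "LL"))
    return I, labels

# made-up example: a cheap lot (x = 2) inside a cluster of expensive lots
xy = [(x, 0.0) for x in [0, 1, 2, 3, 4, 100, 101, 102, 103, 104]]
y = [10, 10, 0, 10, 10, 1, 1, 1, 1, 1]
I, labels = local_moran_labels(xy, y, k=2)
print(list(labels))
```

Lots labeled "LH" (low rate among high-rate neighbors) or "HL" (high rate among low-rate neighbors) are the spatial outliers that the aggregation step excludes; other tools may scale I differently, which does not change the signs used for the labels.
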


In [13]:
# import the results of Local Moran's I
estmonthLM = gpd.read_file(r"J:\Shared drives\TMD_TSA\Data\Parking\WebScraped_ParkingCost\required_inputs\estmonth_April14_HR_DR_MR_LM.geojson")
estmonth = estmonth.to_crs(estmonthLM.crs)
# drop index_right from the earlier spatial join so sjoin_nearest can add its own
estmonth = estmonth.drop(columns=["index_right"])
# attach to each lot the COType_* values (hourly, daily, monthly) of the nearest Local Moran's I result point
estmonth = estmonth.sjoin_nearest(estmonthLM[["COType_HR","COType_DR","COType_MR","geometry"]], how="left")

In [14]:
# points 1 and 13 are very close to each other (see index_right), so sjoin_nearest duplicated some lots;
# keep the first match so each lot index is unique for the filtering below
estmonth = estmonth[~estmonth.index.duplicated(keep='first')]

# Aggregate Rates to TAZs



In [15]:
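# get Euclidean distance matrix from TAZ centroids to lots
# reproject to Massachusetts State Plane (EPSG:26986, meters) so that distances are correct
rdg83 = alltazs.to_crs("EPSG:26986").set_index("id")  # TAZ ids become the column names below
estmonth83 = estmonth.to_crs("EPSG:26986")              # lot index becomes the row names
taz_centroids = rdg83.centroid

# one row per lot, one column per TAZ: distance from the lot to each TAZ centroid
# (rdg83.distance(g) would measure to the TAZ polygon edge, which is 0 for lots inside the TAZ)
eucdist = estmonth83.geometry.apply(lambda g: taz_centroids.distance(g))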

In [16]:
# convert to miles
eucdistmi = eucdist/1609.34

In [17]:
# keep only the 16 lots closest to each TAZ centroid (Euclidean distance)
n_nearest = 16
for col in eucdistmi.columns:
    kth_dist = eucdistmi[col].nsmallest(n_nearest).max()  # distance to the 16th-closest lot
    eucdistmi.loc[eucdistmi[col] > kth_dist, col] = np.nan
# set the remaining distances (weights) to 1 so all kept lots have equal weight
eucdistmi[eucdistmi.notna()] = 1
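
The loop above can also be written without iterating over TAZ columns. This is a sketch with a hypothetical helper `knn_equal_weights`; it ranks distances within each column, and because `method="first"` breaks ties by row order it keeps exactly k lots per TAZ, whereas the loop keeps every lot tied with the k-th distance:

```python
import numpy as np
import pandas as pd

def knn_equal_weights(dist: pd.DataFrame, k: int = 16) -> pd.DataFrame:
    """Rows = lots, columns = TAZs; weight 1.0 for each TAZ's k nearest lots, NaN elsewhere."""
    # rank distances within each TAZ column; method="first" breaks ties by row order
    rank = dist.rank(axis=0, method="first")
    # rank is NaN where the distance is NaN, and NaN <= k is False, so those cells stay NaN
    return pd.DataFrame(np.where(rank <= k, 1.0, np.nan),
                        index=dist.index, columns=dist.columns)

# made-up distances (miles): 4 lots x 2 TAZs, with a tie at 0.2 in taz2
toy_dist = pd.DataFrame({"taz1": [0.5, 0.1, 0.3, 0.9],
                         "taz2": [0.2, 0.2, 0.8, 0.1]})
toy_w = knn_equal_weights(toy_dist, k=2)
print(toy_w)
```

For taz2 the loop would keep three lots (both lots at 0.2 miles), while this version keeps two; which behavior is preferable depends on whether every TAZ should average exactly k lots.
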

In [18]:
tazids = alltazs[(alltazs['town'].isin(["BOSTON","CAMBRIDGE","SOMERVILLE",
                                        "BROOKLINE","NEWTON"])) & (alltazs['id'] < 200000)]["id"].tolist()

In [19]:
# get ids of lots that are NOT HL or LH outliers for each time period (outliers are excluded from the average)
hr_in = estmonth.index[~estmonth['COType_HR'].isin(["LH", "HL"])]
mr_in = estmonth.index[~estmonth['COType_MR'].isin(["LH", "HL"])]
dr_in = estmonth.index[~estmonth['COType_DR'].isin(["LH", "HL"])]

In [20]:
# calculate weighted average

# 1. multiply weights (1) by rates
# filter the rates by whether the lot is an outlier - so will match weights below
hr = estmonth["Hourly_Rate_wEst2"].filter(items = hr_in, axis=0)
dr = estmonth["Daily_Rate_wEst2"].filter(items = dr_in, axis=0)
mr = estmonth["Monthly_Rate_wEst2"].filter(items = mr_in, axis=0)

# filter the weights by whether the lot is an outlier, then multiply by rates
xWhr = eucdistmi.filter(items = hr_in, axis=0).multiply(hr, axis="index")
xWdr = eucdistmi.filter(items = dr_in, axis=0).multiply(dr, axis="index")
xWmr = eucdistmi.filter(items = mr_in, axis=0).multiply(mr, axis="index")

# sum weighted rates by TAZ
xW_hr_taz = xWhr.sum()
xW_dr_taz = xWdr.sum()
xW_mr_taz = xWmr.sum()
xW_hr_taz.name = "HRSum16"
xW_dr_taz.name = "DRSum16"
xW_mr_taz.name = "MRSum16"

#sum weights by TAZ
W_taz = eucdistmi.sum()
W_taz.name = "TotalNN"

# join weighted rates sums by taz and sum weights by taz together
wAvg = pd.merge(W_taz,xW_hr_taz, left_index=True, right_index=True)
wAvg = pd.merge(wAvg,xW_dr_taz, left_index=True, right_index=True)
wAvg = pd.merge(wAvg,xW_mr_taz, left_index=True, right_index=True)

# set weighted rate sums to 0 where the TAZ is outside the prediction area
wAvg["HRSum16"] = np.where(~wAvg.index.isin(tazids), 0, wAvg["HRSum16"])
wAvg["DRSum16"] = np.where(~wAvg.index.isin(tazids), 0, wAvg["DRSum16"])
wAvg["MRSum16"] = np.where(~wAvg.index.isin(tazids), 0, wAvg["MRSum16"])

wAvg

| id | TotalNN | HRSum16 | DRSum16 | MRSum16 |
|---|---|---|---|---|
| 4398 | 16.0 | 0.0 | 0.0 | 0.0 |
| 2571 | 16.0 | 0.0 | 0.0 | 0.0 |
| 2669 | 16.0 | 0.0 | 0.0 | 0.0 |
| 4392 | 16.0 | 0.0 | 0.0 | 0.0 |
| 2641 | 16.0 | 0.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... |
| 4793 | 16.0 | 0.0 | 0.0 | 0.0 |
| 4795 | 16.0 | 0.0 | 0.0 | 0.0 |
| 4794 | 16.0 | 0.0 | 0.0 | 0.0 |
| 209094 | 16.0 | 0.0 | 0.0 | 0.0 |


In [21]:
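# divide by the weight of the lots that actually contributed (outliers and missing rates excluded), not always by 16
for rt, ids, vals in [("HR", hr_in, hr), ("DR", dr_in, dr), ("MR", mr_in, mr)]:
    n_used = eucdistmi.filter(items=ids, axis=0).mul(vals.notna(), axis="index").sum()
    wAvg[f"NN_Average_{rt}"] = wAvg[f"{rt}Sum16"] / n_used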
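
When lots are excluded from a weighted average, they should drop out of the denominator as well as the numerator; otherwise each excluded neighbor pulls the TAZ average toward zero. A minimal pandas illustration with made-up values for one TAZ whose four nearest lots have equal weights:

```python
import numpy as np
import pandas as pd

# One TAZ with 4 nearest lots (equal weights = 1); lot "c" is flagged as an outlier (e.g. HL)
toy_w = pd.Series(1.0, index=["a", "b", "c", "d"])
rate = pd.Series([10.0, 12.0, 40.0, 14.0], index=toy_w.index)
outlier = pd.Series([False, False, True, False], index=toy_w.index)

keep = ~outlier & rate.notna()
weighted_sum = (toy_w[keep] * rate[keep]).sum()

avg_included = weighted_sum / toy_w[keep].sum()   # outlier dropped from numerator and denominator
avg_all_weights = weighted_sum / toy_w.sum()      # divides by every neighbour, including the excluded one
print(avg_included, avg_all_weights)              # 12.0 9.0
```
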

In [22]:
tazs_avg_rates = pd.merge(rdg83,wAvg, left_index=True, right_index=True)

In [23]:
tazs_avg_rates["NN_Average_MR_2010"] = tazs_avg_rates["NN_Average_MR"] * 0.69 # convert to 2010 dollars
tazs_avg_rates["NN_Average_DR_2010"] = tazs_avg_rates["NN_Average_DR"] * 0.69 # convert to 2010 dollars
tazs_avg_rates["NN_Average_HR_2010"] = tazs_avg_rates["NN_Average_HR"] * 0.69 # convert to 2010 dollars

# Exports
(For presentation)

In [24]:
tazs_avg_rates.drop("geometry",axis=1).to_csv("J:\\Shared drives\\TMD_TSA\\Data\\Parking\\WebScraped_ParkingCost\\tazs_avg_rates333.csv")

In [25]:
tazs_avg_rates.to_file(r"J:\Shared drives\TMD_TSA\Data\Parking\WebScraped_ParkingCost\tazs_avg_rates333.geojson")