# Public Transit Effects on Housing Values in Los Angeles County - Ordinary Least Squares Model
This Jupyter Notebook uses the data collected and analyzed in the previous notebooks to build Ordinary Least Squares (OLS) models that estimate how the different phases of a mass transit station's lifetime are associated with the median Zestimate value of the ZIP Code it serves.

To start, we import the necessary tools and the data collected previously.

In [1]:
#for data analysis and visualization
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

#to ignore warnings
import warnings
warnings.filterwarnings('ignore')

#to reset how many max columns and rows I can see
pd.set_option('display.max_columns', None)

#to export dataframes as images
import dataframe_image as dfi

#for modeling
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
import statsmodels.api as sm
from sklearn.metrics import mean_squared_error, r2_score 

## Importing data and preparing data for OLS models
In the following cells, 2 more dataframes are created.

In total, 5 OLS models are to be made using the following data:
- 1) Zestimates for all homes (without distinction on whether a Zestimate is for a single family home or a condominium, or on how many bedrooms a home has)
- 2) Zestimates for all homes depending on housetype (single family homes versus condominiums/co-op units)
- 3) Zestimates for all homes depending on how many rooms they have (1, 2, 3, 4, or 5+ bedrooms)
- 4) Zestimates for all homes depending on housetype (single family homes versus condominiums/co-op units) WITH interaction between station status and housetype
- 5) Zestimates for all homes depending on how many rooms they have (1, 2, 3, 4, or 5+ bedrooms) WITH interaction between station status and bedroom number

In [2]:
#the dataframes are collected in the 1st part of this project
all_homes_df = pd.read_csv('Data/Zestimates for All Homes.csv').drop(['LineType','TotalPopulation'], axis=1)
housetype_df = pd.read_csv('Data/Zestimates by House Type.csv').drop(['LineType','TotalPopulation'], axis=1)
bedroom_df = pd.read_csv('Data/Zestimates by Bedroom Num.csv').drop(['LineType','TotalPopulation'], axis=1)

In [3]:
#get dummy/categorical variables
#dtype=int keeps the dummies numeric (newer pandas versions default to booleans)
all_homes_df = pd.get_dummies(data=all_homes_df, columns=['StationStatus'], dtype=int)
housetype_df = pd.get_dummies(data=housetype_df, columns=['StationStatus'], dtype=int)
bedroom_df = pd.get_dummies(data=bedroom_df, columns=['StationStatus'], dtype=int)
bedroom_df = pd.get_dummies(data=bedroom_df, columns=['BedroomNum'], dtype=int)

#to avoid multicollinearity, drop one of the station status variables
all_homes_df = all_homes_df.drop('StationStatus_0', axis=1)
housetype_df = housetype_df.drop('StationStatus_0', axis=1)
bedroom_df = bedroom_df.drop('StationStatus_0', axis=1)
bedroom_df = bedroom_df.drop('BedroomNum_1', axis=1)

#make copies of the latter two datasets
housetype_intvar_df = housetype_df.copy()
bedroom_intvar_df = bedroom_df.copy()

#create interaction variables: house type x station status
for status in (1, 2, 3):
    housetype_intvar_df[f'SFH*StationStatus_{status}'] = housetype_intvar_df['HouseType'] * housetype_intvar_df[f'StationStatus_{status}']

#create interaction variables: bedroom number x station status (same column order as before)
for status in (1, 2, 3):
    for rooms in (2, 3, 4, 5):
        bedroom_intvar_df[f'BedroomNum_{rooms}*StationStatus_{status}'] = bedroom_intvar_df[f'BedroomNum_{rooms}'] * bedroom_intvar_df[f'StationStatus_{status}']

#add a constant (intercept) column to each dataframe
all_homes_df = sm.add_constant(all_homes_df, prepend=False)
housetype_df = sm.add_constant(housetype_df, prepend=False)
bedroom_df = sm.add_constant(bedroom_df, prepend=False)
housetype_intvar_df = sm.add_constant(housetype_intvar_df, prepend=False)
bedroom_intvar_df = sm.add_constant(bedroom_intvar_df, prepend=False)

housetype_df

Unnamed: 0,ZIP,City,Date,Zestimate,MetroLine,LAT,LNG,PollutionBurdenScore,Income,Homeownership,Commute,BachelorsEd,Retail,ParkAccess,TreeCanopy,Walkability,TotalCrime,ViolentCrimeRate,PropertyCrimeRate,MortgageRate,HouseType,StationStatus_1,StationStatus_2,StationStatus_3,const
0,90001,Florence-Graham,1/31/00,190743,A Line,33.974027,-118.249509,7.101267,48011,0.358768,0.155705,0.056269,7.536387,0.720039,0.036932,14.560535,121.385,14.470,28.56,8.21,1,0,0,1,1.0
1,90002,Los Angeles,1/31/00,172345,A Line,33.949099,-118.246737,6.629903,42245,0.349694,0.154442,0.059574,3.175994,0.965845,0.046310,14.505535,107.673,14.760,26.29,8.21,1,0,0,1,1.0
2,90003,Los Angeles,1/31/00,176070,-,33.964131,-118.272783,7.197647,42220,0.283002,0.149385,0.058099,3.890044,0.514657,0.039675,14.610410,110.422,14.100,27.18,8.21,1,0,0,0,1.0
3,90004,Los Angeles,1/31/00,481804,B Line (to Hollywood/Vine),34.076198,-118.310722,6.363980,52775,0.165924,0.222401,0.350418,7.503799,0.580628,0.046591,15.078754,97.800,10.690,39.78,8.21,1,0,0,1,1.0
4,90005,Los Angeles,1/31/00,693093,D Line (to Wilshire/Western),34.059163,-118.306892,5.874539,42398,0.077409,0.346879,0.313073,30.472961,0.762089,0.033707,16.131568,129.820,12.330,57.63,8.21,1,0,0,1,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
123961,93534,Lancaster,4/30/22,272564,-,34.713292,-118.152920,4.326598,43247,0.343620,0.030925,0.164462,6.207458,0.503773,0.031308,10.155014,60.792,8.862,19.80,4.98,0,0,0,0,1.0
123962,93535,Lancaster,4/30/22,210439,-,34.713656,-117.864660,3.815726,49304,0.566326,0.032056,0.111295,0.975277,0.278922,0.021886,7.008131,37.968,5.112,15.60,4.98,0,0,0,0,1.0
123963,93536,Lancaster,4/30/22,312069,-,34.747390,-118.369249,3.352665,84674,0.658702,0.021643,0.238092,1.593354,0.226433,0.026834,6.898064,65.691,3.915,18.53,4.98,0,0,0,0,1.0
123964,93550,Palmdale,4/30/22,303761,-,34.408548,-118.123592,3.387621,48002,0.520744,0.038784,0.086532,1.361141,0.670852,0.032939,7.777514,42.691,4.482,15.95,4.98,0,0,0,0,1.0
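
The dummy-variable and interaction steps in the cell above can be sketched on a toy dataframe. The values below are hypothetical and are not taken from the project's CSV files; `drop_first=True` is used here as a shortcut for dropping `StationStatus_0` by hand, and the sketch assumes HouseType = 1 marks a single-family home.

```python
import pandas as pd

# Toy data (hypothetical values, for illustration only)
toy = pd.DataFrame({
    "Zestimate": [300000, 450000, 500000, 620000, 380000],
    "HouseType": [1, 0, 1, 1, 0],        # assumed: 1 = single-family home
    "StationStatus": [0, 1, 2, 3, 3],    # 0 = no plans ... 3 = operating
})

# drop_first=True omits StationStatus_0, making it the reference category;
# dtype=int keeps the dummies numeric so they can be multiplied and modeled
encoded = pd.get_dummies(toy, columns=["StationStatus"], drop_first=True, dtype=int)

# Interaction term: 1 only for single-family homes with an operating station
encoded["SFH*StationStatus_3"] = encoded["HouseType"] * encoded["StationStatus_3"]

print(encoded)
```

Every remaining station status coefficient is then read relative to the omitted category, status 0.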


# Splitting Data and Dropping Variables
The following splits the 5 dataframes into datasets for dependent and independent variables. 

Several variables are dropped because including them in the models produces the following statsmodels warning: "*The condition number is large. This might indicate that there are strong multicollinearity or other numerical problems.*" 

The variables dropped are the following:
- Income
- Tree Canopy
- Commute
- Walkability 
- Total Crime (obviously a culprit of multicollinearity as it is a variable dependent on other variables such as property crime rates and violent crime rates)

Furthermore, other variables are not used in the models (Latitude and Longitude, Metro Line, Date, and City).
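
To illustrate why a column such as Total Crime inflates the condition number, the sketch below builds synthetic crime columns (hypothetical values, not the project's data) in which the total is almost exactly the sum of the violent and property components. It uses `np.linalg.cond` as a stand-in for the condition number reported in the statsmodels summary; the two should behave similarly, but that equivalence is an assumption here.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Synthetic stand-ins for the crime columns (hypothetical values)
violent = rng.normal(10, 3, n)
property_ = rng.normal(25, 8, n)
# Total crime is almost exactly the sum of its parts, plus small noise
total = violent + property_ + rng.normal(0, 0.01, n)
const = np.ones(n)

X_with_total = np.column_stack([violent, property_, total, const])
X_without_total = np.column_stack([violent, property_, const])

# Ratio of largest to smallest singular value of each design matrix
cond_with = np.linalg.cond(X_with_total)
cond_without = np.linalg.cond(X_without_total)
print(f"with total: {cond_with:,.0f}   without total: {cond_without:,.0f}")
```

Dropping the near-redundant column brings the condition number down sharply, which is the same effect the variable selection above aims for.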

In [4]:
y_ah = all_homes_df['Zestimate']
y_ht = housetype_df['Zestimate']
y_bd = bedroom_df['Zestimate']
y_ht_intrctn = housetype_intvar_df['Zestimate']
y_bd_intrctn = bedroom_intvar_df['Zestimate']

#independent variables: every column from PollutionBurdenScore onward, minus the dropped variables
dropped_vars = ['TotalCrime','Income','TreeCanopy','Commute','Walkability']
X_AH = all_homes_df.loc[:,'PollutionBurdenScore':].drop(dropped_vars,axis=1)
X_HT = housetype_df.loc[:,'PollutionBurdenScore':].drop(dropped_vars,axis=1)
X_BD = bedroom_df.loc[:,'PollutionBurdenScore':].drop(dropped_vars,axis=1)
X_HT_INTRCTN = housetype_intvar_df.loc[:,'PollutionBurdenScore':].drop(dropped_vars,axis=1)
X_BD_INTRCTN = bedroom_intvar_df.loc[:,'PollutionBurdenScore':].drop(dropped_vars,axis=1)

# Modeling and Model Results
The results below show the coefficients for each model with heteroskedasticity-robust standard errors (cov_type='HC3'). The models exhibit heteroskedasticity (the variance of the error term is not constant and changes with the independent variables), which would make the conventional standard errors, and any inference based on them, unreliable. Robust standard errors do not change the coefficient estimates; they only correct the standard errors.
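
As a rough illustration of what HC3 does, the sketch below computes classical and HC3 standard errors by hand with NumPy on synthetic heteroskedastic data. The data and variable names are made up for this example; the notebook itself relies on statsmodels' `cov_type='HC3'` rather than this code.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000

x = rng.uniform(0, 10, n)
# Heteroskedastic noise: spread grows with x (synthetic, for illustration only)
y = 2.0 * x + 1.0 + rng.normal(0, 1, n) * x**2

X = np.column_stack([x, np.ones(n)])   # constant last, like prepend=False
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta

# Classical standard errors assume one common error variance
sigma2 = resid @ resid / (n - X.shape[1])
classical_se = np.sqrt(np.diag(sigma2 * XtX_inv))

# HC3: weight each squared residual by 1 / (1 - leverage)^2
leverage = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
weights = resid**2 / (1.0 - leverage) ** 2
hc3_cov = XtX_inv @ (X.T * weights) @ X @ XtX_inv
hc3_se = np.sqrt(np.diag(hc3_cov))

print("classical SE:", classical_se)
print("HC3 SE:      ", hc3_se)
```

The coefficient vector `beta` is the same in both cases; only the estimated uncertainty around it changes.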

Notes on the following variables:
- StationStatus_1 indicates a station in the planning phase (after a Locally Preferred Alternative is chosen by Metro)
- StationStatus_2 indicates a station in the construction phase
- StationStatus_3 indicates a station in the operation phase
- HouseType = 1 indicates a single-family home (0 for condominium/co-op units), which is why the interaction columns are prefixed with "SFH"
- Interaction variables are denoted in the following manner: *indicator* * *StationStatus_n* (ex: "SFH * StationStatus_1" or "BedroomNum_2 * StationStatus_3")

## Model 1: OLS model for all homes

In [5]:
lr = sm.OLS(y_ah,X_AH).fit(cov_type='HC3')
lr.summary()

0,1,2,3
Dep. Variable:,Zestimate,R-squared:,0.482
Model:,OLS,Adj. R-squared:,0.482
Method:,Least Squares,F-statistic:,3785.0
Date:,"Sat, 16 Jul 2022",Prob (F-statistic):,0.0
Time:,12:44:54,Log-Likelihood:,-1019900.0
No. Observations:,72103,AIC:,2040000.0
Df Residuals:,72091,BIC:,2040000.0
Df Model:,11,,
Covariance Type:,HC3,,

0,1,2,3,4,5,6
,coef,std err,z,P>|z|,[0.025,0.975]
PollutionBurdenScore,-4921.6073,997.455,-4.934,0.000,-6876.584,-2966.631
Homeownership,1.842e+05,9622.585,19.145,0.000,1.65e+05,2.03e+05
BachelorsEd,1.476e+06,1.15e+04,128.721,0.000,1.45e+06,1.5e+06
Retail,-27.1019,142.826,-0.190,0.850,-307.035,252.832
ParkAccess,3.575e+04,7223.419,4.950,0.000,2.16e+04,4.99e+04
ViolentCrimeRate,857.3400,464.466,1.846,0.065,-52.998,1767.677
PropertyCrimeRate,2519.1354,167.880,15.006,0.000,2190.097,2848.174
MortgageRate,-8.74e+04,880.903,-99.215,0.000,-8.91e+04,-8.57e+04
StationStatus_1,1.24e+05,9931.149,12.487,0.000,1.05e+05,1.43e+05

0,1,2,3
Omnibus:,64945.429,Durbin-Watson:,1.371
Prob(Omnibus):,0.0,Jarque-Bera (JB):,3671396.787
Skew:,4.178,Prob(JB):,0.0
Kurtosis:,36.944,Cond. No.,334.0


## Model 2: OLS model for all homes with distinction of house type

In [6]:
lr = sm.OLS(y_ht,X_HT).fit(cov_type='HC3')
lr.summary()

0,1,2,3
Dep. Variable:,Zestimate,R-squared:,0.504
Model:,OLS,Adj. R-squared:,0.504
Method:,Least Squares,F-statistic:,5002.0
Date:,"Sat, 16 Jul 2022",Prob (F-statistic):,0.0
Time:,12:44:54,Log-Likelihood:,-1817800.0
No. Observations:,123966,AIC:,3636000.0
Df Residuals:,123953,BIC:,3636000.0
Df Model:,12,,
Covariance Type:,HC3,,

0,1,2,3,4,5,6
,coef,std err,z,P>|z|,[0.025,0.975]
PollutionBurdenScore,-5962.9152,1469.436,-4.058,0.000,-8842.958,-3082.873
Homeownership,-7.574e+04,1.21e+04,-6.276,0.000,-9.94e+04,-5.21e+04
BachelorsEd,2.202e+06,1.49e+04,148.098,0.000,2.17e+06,2.23e+06
Retail,3074.0746,200.613,15.323,0.000,2680.881,3467.268
ParkAccess,-9.349e+04,1e+04,-9.344,0.000,-1.13e+05,-7.39e+04
ViolentCrimeRate,1424.9565,678.041,2.102,0.036,96.021,2753.892
PropertyCrimeRate,4059.3495,246.487,16.469,0.000,3576.244,4542.454
MortgageRate,-1.114e+05,1158.211,-96.180,0.000,-1.14e+05,-1.09e+05
HouseType,7.23e+05,3391.702,213.164,0.000,7.16e+05,7.3e+05

0,1,2,3
Omnibus:,110156.056,Durbin-Watson:,1.218
Prob(Omnibus):,0.0,Jarque-Bera (JB):,5730699.702
Skew:,4.12,Prob(JB):,0.0
Kurtosis:,35.273,Cond. No.,328.0


## Model 3: OLS model with distinction for number of bedrooms

In [7]:
lr = sm.OLS(y_bd,X_BD).fit(cov_type='HC3')
lr.summary()

0,1,2,3
Dep. Variable:,Zestimate,R-squared:,0.506
Model:,OLS,Adj. R-squared:,0.506
Method:,Least Squares,F-statistic:,9651.0
Date:,"Sat, 16 Jul 2022",Prob (F-statistic):,0.0
Time:,12:44:54,Log-Likelihood:,-4513600.0
No. Observations:,312656,AIC:,9027000.0
Df Residuals:,312640,BIC:,9027000.0
Df Model:,15,,
Covariance Type:,HC3,,

0,1,2,3,4,5,6
,coef,std err,z,P>|z|,[0.025,0.975]
PollutionBurdenScore,-6095.1860,709.797,-8.587,0.000,-7486.363,-4704.009
Homeownership,-7.255e+04,6162.113,-11.774,0.000,-8.46e+04,-6.05e+04
BachelorsEd,1.712e+06,7520.633,227.657,0.000,1.7e+06,1.73e+06
Retail,3095.1783,112.863,27.424,0.000,2873.971,3316.386
ParkAccess,-5.428e+04,5027.007,-10.797,0.000,-6.41e+04,-4.44e+04
ViolentCrimeRate,8942.5132,355.270,25.171,0.000,8246.197,9638.829
PropertyCrimeRate,3430.7159,129.946,26.401,0.000,3176.026,3685.406
MortgageRate,-1.014e+05,585.163,-173.287,0.000,-1.03e+05,-1e+05
StationStatus_1,7.025e+04,5608.141,12.527,0.000,5.93e+04,8.12e+04

0,1,2,3
Omnibus:,296304.808,Durbin-Watson:,1.217
Prob(Omnibus):,0.0,Jarque-Bera (JB):,23006842.001
Skew:,4.423,Prob(JB):,0.0
Kurtosis:,44.083,Cond. No.,328.0


## Model 4: OLS model for all homes with distinction of housetype AND interaction between station status and housetype

In [8]:
#change housetype to equal 1 if single family home instead of condominium
lr = sm.OLS(y_ht_intrctn,X_HT_INTRCTN).fit(cov_type='HC3')
lr.summary()

0,1,2,3
Dep. Variable:,Zestimate,R-squared:,0.513
Model:,OLS,Adj. R-squared:,0.513
Method:,Least Squares,F-statistic:,4179.0
Date:,"Sat, 16 Jul 2022",Prob (F-statistic):,0.0
Time:,12:44:55,Log-Likelihood:,-1816700.0
No. Observations:,123966,AIC:,3633000.0
Df Residuals:,123950,BIC:,3634000.0
Df Model:,15,,
Covariance Type:,HC3,,

0,1,2,3,4,5,6
,coef,std err,z,P>|z|,[0.025,0.975]
PollutionBurdenScore,-7364.2076,1435.928,-5.129,0.000,-1.02e+04,-4549.840
Homeownership,-6.308e+04,1.21e+04,-5.210,0.000,-8.68e+04,-3.94e+04
BachelorsEd,2.207e+06,1.48e+04,149.122,0.000,2.18e+06,2.24e+06
Retail,3533.3345,198.312,17.817,0.000,3144.649,3922.020
ParkAccess,-9.677e+04,9873.408,-9.801,0.000,-1.16e+05,-7.74e+04
ViolentCrimeRate,686.3542,670.868,1.023,0.306,-628.523,2001.231
PropertyCrimeRate,4322.7777,243.075,17.784,0.000,3846.360,4799.195
MortgageRate,-1.127e+05,1161.109,-97.068,0.000,-1.15e+05,-1.1e+05
HouseType,6.324e+05,3565.038,177.391,0.000,6.25e+05,6.39e+05

0,1,2,3
Omnibus:,109507.455,Durbin-Watson:,1.253
Prob(Omnibus):,0.0,Jarque-Bera (JB):,5580277.569
Skew:,4.089,Prob(JB):,0.0
Kurtosis:,34.835,Cond. No.,334.0


## Model 5: OLS model for all homes with distinction between bedroom num AND interaction between station status and bedroom num

In [9]:
lr = sm.OLS(y_bd_intrctn,X_BD_INTRCTN).fit(cov_type='HC3')
lr.summary()

0,1,2,3
Dep. Variable:,Zestimate,R-squared:,0.511
Model:,OLS,Adj. R-squared:,0.511
Method:,Least Squares,F-statistic:,5551.0
Date:,"Sat, 16 Jul 2022",Prob (F-statistic):,0.0
Time:,12:44:55,Log-Likelihood:,-4512100.0
No. Observations:,312656,AIC:,9024000.0
Df Residuals:,312628,BIC:,9025000.0
Df Model:,27,,
Covariance Type:,HC3,,

0,1,2,3,4,5,6
,coef,std err,z,P>|z|,[0.025,0.975]
PollutionBurdenScore,-6366.5372,699.241,-9.105,0.000,-7737.024,-4996.051
Homeownership,-6.501e+04,6128.569,-10.607,0.000,-7.7e+04,-5.3e+04
BachelorsEd,1.706e+06,7496.758,227.625,0.000,1.69e+06,1.72e+06
Retail,3289.3998,110.689,29.717,0.000,3072.452,3506.347
ParkAccess,-5.702e+04,4980.845,-11.448,0.000,-6.68e+04,-4.73e+04
ViolentCrimeRate,8810.6880,352.293,25.010,0.000,8120.206,9501.170
PropertyCrimeRate,3484.4282,127.666,27.293,0.000,3234.208,3734.649
MortgageRate,-1.014e+05,586.088,-173.018,0.000,-1.03e+05,-1e+05
StationStatus_1,-1.431e+05,9300.831,-15.389,0.000,-1.61e+05,-1.25e+05

0,1,2,3
Omnibus:,293397.017,Durbin-Watson:,1.231
Prob(Omnibus):,0.0,Jarque-Bera (JB):,21899826.247
Skew:,4.365,Prob(JB):,0.0
Kurtosis:,43.061,Cond. No.,822.0


# Discussion on Key Variables
## Station Status:
- In four of the five models (Models 1-4), **all phases of a transit station after status 0 (planning, construction, and operation) were associated with higher Zestimate values than in ZIP Codes where no concrete plans for a Metro station had been made**. The results for station status are statistically significant. They are also economically significant, with increases ranging from the tens of thousands to the hundreds of thousands of dollars (depending on the model). 
- In Models 1-4, of all the stages after Status 0, Status 1 (the planning stage) was associated with the highest increase in Zestimates when compared to the pre-planning stage. This is against the expectation that the operational phase of a station would be associated with the highest increase in Zestimate values compared to when there was no station at all. 
- Although Model 5 shows a negative association between Zestimates and every station status past 0, this is likely due to the interaction terms between station status and number of bedrooms: once interactions are included, the station status coefficients describe only the reference category (1-bedroom homes) rather than all homes. The same caveat applies to the station status coefficients in Model 4. However, the three models without interaction terms (Models 1-3) still support the theory that having a station serving a ZIP Code is associated with higher median home values. 
- As a result, focusing on the models without interaction variables, this research finds that **having an operational station serving a ZIP Code is associated with an increase in median housing values**: specifically a 24,060 dollar increase in Model 1; a 14,180 dollar increase in Model 2; and a 20,390 dollar increase in Model 3.

## Interaction Variables:
- In Model 4, **single-family homes benefit from mass transit stations more than condominiums do**. If we were to switch the coding for house type so that condominiums were 1 instead of 0, we would see a surprising decrease in Zestimates for condominiums with Metro stations when compared to single-family homes with no station. In this model, single-family homes with a station are worth 241,800 dollars more than condos with no Metro station. 
- Model 5 has numerous results with the interaction between the number of bedrooms and station status. One of the key findings is that **the larger the home, the more it benefits from the addition of public transportation**. These results are all statistically and economically significant. 
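
When reading a model with interaction terms, the station status coefficient on its own applies only to the reference group; for the other group, the matching interaction coefficient is added to it. The sketch below shows that arithmetic with hypothetical coefficients chosen purely for illustration (they are not values from the model summaries), assuming HouseType = 1 marks a single-family home. The function name `station_effect` is likewise made up for this example.

```python
def station_effect(main_effect, interaction_effect, is_single_family):
    """Estimated Zestimate difference for a station phase versus no station."""
    return main_effect + interaction_effect * is_single_family

# Hypothetical coefficients, chosen only to illustrate the arithmetic
main = -50_000.0          # StationStatus_n coefficient (reference group: condos)
interaction = 120_000.0   # SFH*StationStatus_n coefficient

condo_effect = station_effect(main, interaction, is_single_family=0)
sfh_effect = station_effect(main, interaction, is_single_family=1)
print(condo_effect, sfh_effect)
```

A negative main effect can therefore coexist with a positive overall effect for the non-reference group, which is why station status coefficients from models with and without interactions are not directly comparable.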

## Other variables:
- Across the board, **an increase in pollution burden is associated with a decrease in Zestimate values**. In the summaries above, each one-point increase is associated with a decrease in Zestimates ranging from about 4,922 dollars (Model 1) to about 7,364 dollars (Model 4). Because the score runs from 0 to 10, one point covers a tenth of the scale; moving across the full scale would correspond to a decrease of roughly 49,000 to 74,000 dollars. This is an economically significant drop! All results are statistically significant as well. 
- The **results for Homeownership** are mixed. In every model except Model 1, an increase in the homeownership rate of a ZIP Code is associated with a decrease in Zestimate values. This is the **opposite of what was expected**. 
- Across the board, **a larger share of the population with at least a bachelor's degree is associated with a large increase in Zestimate values**. BachelorsEd is stored as a proportion between 0 and 1, so the coefficients (roughly 1.5 to 2.2 million dollars) describe a change from 0% to 100%; a one-percentage-point increase corresponds to roughly 14,800 to 22,100 dollars. The results for this variable are both statistically and economically significant.
- **Most models show that an increase in employment density for retail, entertainment, supermarket, and educational uses in a ZIP Code is associated with higher housing values in that ZIP Code**. The only exception is Model 1, which shows a decrease, although that result is not statistically significant at the 5% level (p = 0.850). The association is economically significant: in the summaries above, one additional job per acre is associated with an increase in Zestimates of about 3,074 dollars (Model 2) to 3,533 dollars (Model 4), so ten additional jobs per acre correspond to roughly ten times those amounts.
- Park Access results are mixed and, for the most part, contrary to expectations. Surprisingly, most models show that as the percentage of a ZIP Code's population living within half a mile of parks, beaches, or open space larger than 1 acre increases, housing values in that ZIP Code decrease. One possible explanation is that open spaces attract many people, which may increase crime. Furthermore, not all open spaces are the same: some are much nicer than others. 
- Violent crime rate coefficients are positive in all five models, but they are not statistically significant at the 5% level in Models 1 and 4.
- Property crime rates produced results contrary to expectations. As property crimes per 1,000 people increase, Zestimates increase with them according to the models.
- **Across all models, as mortgage rates rise, the values of homes go down**. This is consistent with the economic theory that higher interest rates lower the demand for buying homes (hence, causing housing values to go down). 
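
Because BachelorsEd appears in the data as a proportion (for example 0.056 in the sample rows), its coefficient describes a move from 0 to 1. The short calculation below rescales the BachelorsEd coefficients reported in the five summaries above to a one-percentage-point change; the dictionary and variable names are only for this illustration.

```python
# BachelorsEd coefficients as reported in the five model summaries
bachelors_coef = {
    "Model 1": 1.476e6,
    "Model 2": 2.202e6,
    "Model 3": 1.712e6,
    "Model 4": 2.207e6,
    "Model 5": 1.706e6,
}

# One percentage point is 0.01 on a 0-1 proportion scale
per_point = {model: coef * 0.01 for model, coef in bachelors_coef.items()}

for model, dollars in per_point.items():
    print(f"{model}: about {dollars:,.0f} dollars per percentage point")
```

The same rescaling applies to any other regressor stored as a proportion, such as Homeownership.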

# Overall model
Because these models have numerous independent variables, the adjusted R-squared is the measure to analyze. In all the models, however, the R-squared and adjusted R-squared are identical to three decimal places. This is expected when the number of observations (72,103 to 312,656) is very large relative to the number of predictors (11 to 27), because the adjustment penalty is then negligible; it is also consistent with no extra, unnecessary variables having been included. For more information about the difference between R-squared and adjusted R-squared, please view the following: https://towardsdatascience.com/demystifying-r-squared-and-adjusted-r-squared-52903c006a60.
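
The relationship between the two measures can be checked with the standard adjusted R-squared formula, using the R-squared, observation counts, and Df Model values reported in the summaries above. The helper name `adjusted_r2` is introduced only for this sketch.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Standard adjustment: penalize R-squared for the number of predictors."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# (R-squared, No. Observations, Df Model) from each summary table
models = {
    "Model 1": (0.482, 72103, 11),
    "Model 2": (0.504, 123966, 12),
    "Model 3": (0.506, 312656, 15),
    "Model 4": (0.513, 123966, 15),
    "Model 5": (0.511, 312656, 27),
}

adjusted = {name: adjusted_r2(*values) for name, values in models.items()}

for name, value in adjusted.items():
    print(f"{name}: R2 = {models[name][0]:.3f}, adjusted R2 = {value:.5f}")
```

With tens or hundreds of thousands of observations, the penalty term is far smaller than the rounding shown in the summaries.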

As expected, Models 2 and 3 have a slightly higher adjusted R-squared (0.504 and 0.506) than Model 1 (0.482), because they add house type and the number of bedrooms as controls. The adjusted R-squared is slightly higher again in Models 4 and 5 (0.513 and 0.511), which add interactions between station status and house type or number of bedrooms. Overall, the adjusted R-squared values of Models 4 and 5 are not far from that of Model 1. Around 50% of the variation in Zestimates is explained by the models.

Lastly, the F-statistics for all models are very high and their p-values are very low (effectively 0.00). The F-test evaluates all the variables in a model jointly, as opposed to t-tests, which test one variable at a time. It compares the model against the alternative with no independent variables (only the intercept). For more information about F-tests, visit the following: https://statisticsbyjim.com/regression/interpret-f-test-overall-significance-regression/. In this research, the models as a whole are significant.

# Conclusion
Limitations of this study include the absence of Geographically Weighted Regression models (which take geographic variation into account but are highly computationally expensive) and the lack of data about individual homes (which could provide additional controls such as the square footage of a home, date of construction, etc.).

Despite these limitations, **this study concludes that the addition of public transportation IS NOT associated with a decrease in housing values**. Many of the talking points for NIMBYs, particularly upper-class NIMBYs, are mainly grounded in racism and classism, as there is no factual or empirical evidence to support their claims. Furthermore, this research supports the conclusions made in numerous other studies on the topic: **mass transportation has a positive effect on housing values**.