<p align="center"><strong>SPATIAL AUTOREGRESSIVE MODEL</strong></p>
<p><b>By: Jefferson C.</b></p>

---

### __Data__

In [61]:
import pandas as pd 
import geopandas as gpd

# Data
data = pd.read_csv('NY-House-Dataset/NY-House-Dataset.csv')
ols_data = gpd.read_file('NY-House-Dataset/NY-House-SHP/VORONOI.shp')

ols_data.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 4200 entries, 0 to 4199
Data columns (total 15 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   PRICE       4200 non-null   int64   
 1   BEDS        4200 non-null   int64   
 2   BATH        4200 non-null   float64 
 3   PROPERTYSQ  4200 non-null   float64 
 4   LATITUDE    4200 non-null   float64 
 5   LONGITUDE   4200 non-null   float64 
 6   LOG-PRICE   4200 non-null   float64 
 7   LOG-PROPER  4200 non-null   float64 
 8   SUBLOCALIT  4200 non-null   object  
 9   TYPE        4200 non-null   object  
 10  BOROUGH     4200 non-null   object  
 11  TYPE-GROUP  4200 non-null   object  
 12  RESIDUALS   4200 non-null   float64 
 13  Y-HAT       4200 non-null   float64 
 14  geometry    4200 non-null   geometry
dtypes: float64(8), geometry(1), int64(2), object(4)
memory usage: 492.3+ KB


### __Spatial Autoregressive Model__

An extension of the linear regression model that incorporates the spatial structure of the data in order to capture relationships between neighboring observations.

- __Population Model__

$$ y_i = \ln(Price_i) = \mathbf{\rho} (Wy)_i + \mathbf{\beta_0} + \mathbf{\beta_1}(Bath_i) + \mathbf{\beta_2} \ln(Propertysqft_i) + \sum_{d \neq d_0}{\gamma_d \cdot 1(Borough_i = d)} + \sum_{p \neq p_0}\delta_p \cdot 1(Housing\space Type_i=p) + u_i $$

where: 
- $\rho \to$ spatial autocorrelation coefficient  
- $W \to$ spatial weights matrix (contiguity matrix of the __VORONOI__ polygons) 
- $y \to$ vector of the dependent variable ($\ln(Price)$), so $Wy$ is its spatial lag
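
To make the role of $W$ concrete, here is a minimal sketch of how a row-standardized contiguity matrix turns $y$ into its spatial lag $Wy$. The four-polygon layout and the log-prices are hypothetical values chosen for illustration; they are not taken from the VORONOI shapefile.

```python
import numpy as np

# Hypothetical binary contiguity matrix for four polygons in a row
# (1-2, 2-3 and 3-4 share a border); not taken from the VORONOI file.
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

# Row-standardize so that every row sums to 1.
W_toy = A / A.sum(axis=1, keepdims=True)

# Hypothetical log-prices for the four polygons.
y_toy = np.array([13.0, 14.0, 15.0, 16.0])

# Spatial lag: for each polygon, the mean log-price of its neighbors.
Wy_toy = W_toy @ y_toy
print(Wy_toy)
```

With row standardization, each element of $Wy$ is simply the average of the dependent variable over the neighbors of that observation, which is what $\rho$ multiplies in the model.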

<br>

- __Sampling Model__
$$ y_i = \ln(Price_i) = \hat{\mathbf{\rho}} (Wy)_i + \mathbf{\hat{\beta_0}} + \mathbf{\hat{\beta_1}}(Bath_i) + \mathbf{\hat{\beta_2}} \ln(Propertysqft_i) + \sum_{d \neq d_0}{\hat{\gamma_d} \cdot 1(Borough_i=d)} + \sum_{p \neq p_0}\hat{\delta_p}\cdot 1(Housing \space Type_i=p) + \hat{u_i} $$
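
Because $y$ appears on both sides of the model, it can help to solve for it. The sketch below collects all regressors (constant, $Bath$, $\ln(Propertysqft)$ and the dummies) in a matrix $X$ with coefficient vector $\beta$; this matrix notation is an assumption introduced here for compactness. It also assumes that $I - \rho W$ is invertible, which holds for a row-standardized $W$ when $|\rho| < 1$.

$$ y = \rho W y + X\beta + u \;\Longrightarrow\; (I - \rho W)\,y = X\beta + u \;\Longrightarrow\; y = (I - \rho W)^{-1}(X\beta + u) $$

In this reduced form, a change in a regressor for one observation reaches its neighbors through $(I - \rho W)^{-1}$, so the coefficients $\beta$ alone do not give the full marginal effects.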

In [62]:
# WEIGHT MATRIX
import numpy as np 
import pysal as ps 
from pysal.lib import weights

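# Weight matrix (Queen contiguity) built from the Voronoi polygons
W = weights.Queen.from_shapefile('NY-House-Dataset/NY-House-SHP/VORONOI.shp')
# Row-standardize the weights so that each row sums to 1
W.transform = 'r'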

In [63]:
# SPATIAL AUTOREGRESSIVE MODEL (Dummy variables)
from pysal.model import spreg

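# Dummy variables
# Note: every category is kept here. Together with the constant that spreg adds
# automatically, the full sets of dummies are perfectly collinear; passing
# drop_first=True to pd.get_dummies would omit the baseline categories
# (d_0 and p_0 in the model above).
borough_dummies = pd.get_dummies(ols_data['BOROUGH']).astype(int)
group_dummies = pd.get_dummies(ols_data['TYPE-GROUP']).astype(int)

# Independent variables
# Add the dummy variables to the X data frame
X = pd.concat([ols_data[['LOG-PROPER','BATH']], borough_dummies, group_dummies], axis=1)

# Dependent variable
Y = ols_data['LOG-PRICE']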

# Spatial Autoregressive Model
sar_model = spreg.ML_Lag(Y,X,w=W,spat_diag=True)
print(sar_model.summary)

REGRESSION RESULTS
------------------

SUMMARY OF OUTPUT: MAXIMUM LIKELIHOOD SPATIAL LAG (METHOD = FULL)
-----------------------------------------------------------------
Data set            :     unknown
Weights matrix      :     unknown
Dependent Variable  :   LOG-PRICE                Number of Observations:        4200
Mean dependent var  :     13.7774                Number of Variables   :          14
S.D. dependent var  :      1.0068                Degrees of Freedom    :        4186
Pseudo R-squared    :      0.0366
Spatial Pseudo R-squared:  0.0569
Log likelihood      : -17896.8078
Sigma-square ML     :    209.6465                Akaike info criterion :   35821.616
S.E of regression   :     14.4792                Schwarz criterion     :   35910.415

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
---------------------------------------------------------------

In [None]:
# SPATIAL AUTOREGRESSIVE MODEL (NO Dummy variables)

from pysal.model import spreg

# Independent variables (no dummy variables in this specification)
X = ols_data[['LOG-PROPER','BATH']]

# Dependent variable
Y = ols_data['LOG-PRICE']

# Spatial Autoregressive Model (spatial lag), estimated by maximum likelihood
sar_model = spreg.ML_Lag(Y,X,w=W,spat_diag=True)
print(sar_model.summary)

# Residuals 
ols_data['RESIDUALS_SAR'] = sar_model.u

REGRESSION RESULTS
------------------

SUMMARY OF OUTPUT: MAXIMUM LIKELIHOOD SPATIAL LAG (METHOD = FULL)
-----------------------------------------------------------------
Data set            :     unknown
Weights matrix      :     unknown
Dependent Variable  :   LOG-PRICE                Number of Observations:        4200
Mean dependent var  :     13.7774                Number of Variables   :           4
S.D. dependent var  :      1.0068                Degrees of Freedom    :        4196
Pseudo R-squared    :      0.5963
Spatial Pseudo R-squared:  0.4293
Log likelihood      :  -4221.2836
Sigma-square ML     :      0.4107                Akaike info criterion :    8450.567
S.E of regression   :      0.6408                Schwarz criterion     :    8475.939

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
---------------------------------------------------------------
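
As a quick consistency check on the two summaries, the information criteria can be recomputed from the reported log likelihood, using $AIC = -2\ln L + 2k$ and the Schwarz criterion $SC = -2\ln L + k\ln(n)$, with $k$ the "Number of Variables" and $n$ the "Number of Observations". The helper name `information_criteria` is hypothetical; the inputs are copied from the outputs above.

```python
import math

def information_criteria(log_lik, k, n):
    """Return (AIC, Schwarz criterion) for a model with k parameters and n observations."""
    aic = -2.0 * log_lik + 2.0 * k
    sc = -2.0 * log_lik + k * math.log(n)
    return aic, sc

# Model with dummy variables: log likelihood, k and n from the first summary.
aic_dummies, sc_dummies = information_criteria(log_lik=-17896.8078, k=14, n=4200)

# Model without dummy variables: values from the second summary.
aic_plain, sc_plain = information_criteria(log_lik=-4221.2836, k=4, n=4200)

print(round(aic_dummies, 3), round(sc_dummies, 3))
print(round(aic_plain, 3), round(sc_plain, 3))
```

Both pairs agree with the reported values up to rounding, and the lower criteria of the specification without dummies match its higher log likelihood.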

In [None]:
import statsmodels.api as sm
from statsmodels.stats import diagnostic as diag

# Residuals of the SAR model
residuals = sar_model.u

# Independent variables
X = ols_data[['LOG-PROPER','BATH']]

# Add a constant, which the heteroskedasticity tests require
X_const = sm.add_constant(X)

# Breusch-Pagan test (index 1 is the p-value of the LM statistic)
bp_test = diag.het_breuschpagan(residuals, X_const)
print(f'Breusch-Pagan Test p-value: {bp_test[1]}')

Breusch-Pagan Test p-value: 4.492968288194693e-110
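
For intuition about what the test measures, the following is a minimal sketch of the studentized (Koenker) form of the Breusch-Pagan statistic: regress the squared residuals on the regressors and use $LM = n R^2$. The data are synthetic and heteroskedastic by construction, so the numbers are unrelated to the housing results above. The closed-form p-value $e^{-LM/2}$ is valid only because this example has exactly two regressors besides the constant, giving a chi-square distribution with 2 degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic data (hypothetical, not the NY housing data): the error spread
# grows with x1, so these stand-in residuals are heteroskedastic by construction.
x1 = rng.uniform(1.0, 5.0, n)
x2 = rng.uniform(1.0, 5.0, n)
resid = rng.normal(0.0, 1.0, n) * x1

# Auxiliary regression: squared residuals on a constant and the regressors.
Z = np.column_stack([np.ones(n), x1, x2])
e2 = resid ** 2
coef, *_ = np.linalg.lstsq(Z, e2, rcond=None)
fitted = Z @ coef
r2 = 1.0 - np.sum((e2 - fitted) ** 2) / np.sum((e2 - e2.mean()) ** 2)

# Koenker's studentized LM statistic and its chi-square(2) p-value.
# With exactly 2 degrees of freedom the survival function is exp(-x / 2).
lm = n * r2
p_value = np.exp(-lm / 2.0)
print(f"LM = {lm:.2f}, p-value = {p_value:.3g}")
```

A small p-value rejects the null hypothesis of constant error variance, which is the conclusion the test reaches for the SAR residuals.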


In [None]:
import statsmodels.api as sm
from statsmodels.stats import diagnostic as diag

# White Test
white_test = diag.het_white(residuals, X_const)
print(f'White Test p-value: {white_test[1]}')

White Test p-value: 4.422416286430625e-246


### __Export Data__

In [None]:
# EXPORT DATA TO CSV (same path that is read back in the next cell)
ols_data.to_csv('NY-House-Dataset/NY-House-SAR.csv',index=False)

In [86]:
# VERIFY data frame 
pd.read_csv('NY-House-Dataset/NY-House-SAR.csv').head(9)

| | PRICE | BEDS | BATH | PROPERTYSQ | LATITUDE | LONGITUDE | LOG-PRICE | LOG-PROPER | SUBLOCALIT | TYPE | BOROUGH | TYPE-GROUP | RESIDUALS | Y-HAT | geometry | RESIDUALS_SAR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1153000 | 4 | 2.373861 | 3980.0 | 40.507891 | -74.253033 | 13.957878 | 8.289037 | Richmond County | Multi-family home for sale | STATEN_ISLAND | MULTI_FAMILY | 0.041755 | 13.916123 | POLYGON ((-74.2530332 40.511553841142955, -74.... | -0.359771 |
| 1 | 838000 | 4 | 4.0 | 2780.0 | 40.512461 | -74.248894 | 13.638773 | 7.930206 | Richmond County | Pending | STATEN_ISLAND | OTHER | -0.119788 | 13.758561 | POLYGON ((-74.25205167289866 40.51116182074138... | -0.681132 |
| 2 | 1280000 | 5 | 4.0 | 3600.0 | 40.499546 | -74.244308 | 14.062371 | 8.188689 | Richmond County | Pending | STATEN_ISLAND | OTHER | 0.155889 | 13.906482 | POLYGON ((-74.2498335673531 40.4995462, -74.24... | -0.356748 |
| 3 | 1250000 | 5 | 4.0 | 2700.0 | 40.504193 | -74.252823 | 14.038654 | 7.901007 | Richmond County | Multi-family home for sale | STATEN_ISLAND | MULTI_FAMILY | 0.107009 | 13.931645 | POLYGON ((-74.2530332 40.4995462, -74.2530332 ... | -0.274945 |
| 4 | 1100000 | 3 | 2.0 | 1928.0 | 40.514209 | -74.25051 | 13.910821 | 7.564238 | Richmond County | Multi-family home for sale | STATEN_ISLAND | MULTI_FAMILY | 0.464094 | 13.446727 | POLYGON ((-74.2530332 40.529655552687544, -74.... | 0.171703 |
| 5 | 989000 | 6 | 4.0 | 2720.0 | 40.511945 | -74.247072 | 13.80445 | 7.908387 | Richmond County | House for sale | STATEN_ISLAND | HOUSE | -0.328154 | 14.132604 | POLYGON ((-74.24476921486205 40.51918757989499... | -0.46234 |
| 6 | 1229000 | 6 | 5.0 | 2816.0 | 40.508631 | -74.247023 | 14.021711 | 7.943073 | Richmond County | Multi-family home for sale | STATEN_ISLAND | MULTI_FAMILY | -0.080105 | 14.101816 | POLYGON ((-74.2501761356023 40.50946297698628,... | -0.394291 |
| 7 | 1100000 | 4 | 2.0 | 1800.0 | 40.510337 | -74.242487 | 13.910821 | 7.495542 | Richmond County | House for sale | STATEN_ISLAND | HOUSE | 0.306671 | 13.60415 | POLYGON ((-74.2410292143991 40.51203477481742,... | 0.107383 |
| 8 | 1399999 | 3 | 2.0 | 2700.0 | 40.511687 | -74.238819 | 14.151982 | 7.901007 | Richmond County | House for sale | STATEN_ISLAND | HOUSE | 0.315799 | 13.836183 | POLYGON ((-74.2422233995243 40.515278218756144... | 0.114809 |


---