# 7. Spatial Lag, Error, and Econometrics
## 7.1 Spatial Regression
#### 7.1.1 Spatial Heterogeneity
We've already defined what spatial autocorrelation is and how to detect it in spatial data. Let's take a closer look at the data generating processes that might be creating spatial autocorrelation or adding spatial structure to a dataset. Spatial heterogeneity, also known as a 1st order spatial process, is a spatial process that:
* Is regional or large-scale, operating over the entire study area
* Has a variance/covariance structure that drifts over the area
* Violates the assumption of spatial stationarity

Spatial econometric approaches attempt to model the spatial heterogeneity in data. An approach based purely on a spatially heterogeneous process assumes there is no spatial interaction in the data. We also call these reactive data generating processes because the target variable is reacting to an underlying trend surface or regional generator.

#### 7.1.2 Spatial Dependence

Spatial dependence, also known as a 2nd order spatial process, is a more local process that implies a functional relationship between neighboring observations. Spatial dependence:
* Is small-scale, localized, and short-distance
* Reflects spatial interaction or contagion in the data, which generates clusters
* Follows a property known as ergodicity, which allows interaction over a certain area
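
As a rough illustration of the difference between the two processes, the sketch below simulates both on a hypothetical line of 50 locations using NumPy. The layout, the trend slope of 10, and `rho = 0.7` are arbitrary assumptions chosen for illustration. The 2nd order process is generated with a simple spatial autoregressive form, which is only one of several ways to produce spatial dependence.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50  # hypothetical: observations arranged along a line

# Binary contiguity: each unit neighbors the units immediately left and right.
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
W = W / W.sum(axis=1, keepdims=True)  # row-standardize so each row sums to 1

coords = np.linspace(0.0, 1.0, n)
eps = rng.normal(size=n)  # independent noise shared by both simulations

# 1st order (heterogeneity): the mean drifts with location,
# but units do not interact with each other.
y_first = 10.0 * coords + eps

# 2nd order (dependence): constant mean, but each value depends on
# its neighbors' values: y = rho * W y + eps, solved for y.
rho = 0.7
y_second = np.linalg.solve(np.eye(n) - rho * W, eps)
```

In the first series the spatial pattern disappears once the trend is removed. In the second it persists, because it comes from interaction between neighbors rather than from location itself.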

#### 7.1.3 Spatial Modeling
So why does this matter? Ignoring 1st and 2nd order spatial effects can leave a model badly misspecified and inflate its accuracy metrics. How should we treat the spatial structure in our data?

* Assumption of a 1st Order generative process
    * Spatial autocorrelation is a nuisance
    * Target variable reacting to some other variables
    * Regression structure focused on spatial heterogeneity
* Assumption of a 2nd Order generative process
    * Spatial autocorrelation is substantive
    * Focus on spatial interaction
    * Must consider spatially-influenced covariance structure
* Assumption of both processes
    * Spatial Error Model
        * Spatial regression
        * Spatial heterogeneity in design matrix
        * Spatial interaction in the residuals
    * Spatial Lag Model
        * Spatial regression
        * Spatial heterogeneity in design matrix
        * Explicit control of interaction in the target variable
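
For reference, the two models are commonly written as follows in standard spatial econometrics notation. The symbols are assumptions introduced here, since this section does not define them: $y$ is the target variable, $X$ the design matrix, $W$ the spatial weights matrix, $\varepsilon$ an independent error term, and $\rho$ and $\lambda$ the spatial autoregressive coefficients.

Spatial lag model, with interaction in the target variable:

$$y = \rho W y + X\beta + \varepsilon$$

Spatial error model, with interaction in the residuals:

$$y = X\beta + u, \qquad u = \lambda W u + \varepsilon$$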

## 7.2 Spatial Lag Model

#### 7.2.1 Model
The spatial lag model accounts for 1st order spatial effects (spatial heterogeneity) through the covariates in the design matrix. It deals with spatial dependence (2nd order effects) by controlling for spatial effects in the target using a spatial lag variable, which is built from each observation's neighbors via the spatial weights matrix.
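
To make the spatial lag variable concrete, here is a minimal sketch using plain NumPy on a hypothetical four-unit example. The neighbor structure and the values of `y` are made up for illustration. With row-standardized weights, the lag for each unit is simply the average of its neighbors' values.

```python
import numpy as np

# Hypothetical 4-unit example: binary contiguity (1 = neighbors).
W = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

# Row-standardize so each row sums to 1.
W_std = W / W.sum(axis=1, keepdims=True)

# Made-up target values, stored as an (n, 1) column vector.
y = np.array([[10.0], [20.0], [30.0], [40.0]])

# Spatial lag: for each unit, the average of its neighbors' y values.
wy = W_std @ y
print(wy.ravel())
```

For example, the first unit neighbors the second and third units, so its lag is (20 + 30) / 2 = 25.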

In [46]:
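# Create dependent (y), independent (x), and spatial weights (w)
# Note: pysal.open, pysal.queen_from_shapefile, and pysal.spreg are the PySAL 1.x API
import pysal
import numpy as np

stp_cenacs = pysal.open('./data/stpete_cenacs_2014_norms.dbf')

# Queen contiguity weights, row-standardized so each row sums to 1
w = pysal.queen_from_shapefile("./data/stpete_cenacs_2014.shp")
w.transform = 'r'

# Target: median household income as an (n, 1) column vector
y = np.array(stp_cenacs.by_col('MEDHHINC')).reshape(-1, 1)

# Design matrix: one column per explanatory variable, shape (n, k)
x_names = ["ACRES","AVE_HH_SZ","AVE_FAM_SZ","MED_AGE","PCT_BACHLR","PCT_POV"]
x = np.array([stp_cenacs.by_col(var) for var in x_names]).T
y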

array([[ 26223],
       [ 25163],
       [ 50074],
       [ 40795],
       [ 27399],
       [ 73750],
       [ 90972],
       [ 29460],
       [ 59828],
       [ 31174],
       [ 38750],
       [ 25714],
       [ 38529],
       [ 28607],
       [ 38021],
       [ 56964],
       [ 58207],
       [ 11446],
       [ 38872],
       [ 49500],
       [ 20717],
       [ 47222],
       [ 63149],
       [ 27396],
       [ 46944],
       [ 51493],
       [ 42813],
       [ 57895],
       [ 22328],
       [ 34886],
       [ 57887],
       [ 44737],
       [ 51063],
       [ 57303],
       [ 40938],
       [ 31857],
       [ 48929],
       [ 87610],
       [ 97199],
       [ 14643],
       [ 50750],
       [ 42440],
       [ 50189],
       [ 73125],
       [ 58413],
       [ 55536],
       [ 22043],
       [ 26346],
       [ 12917],
       [ 21052],
       [ 72826],
       [ 64950],
       [ 25179],
       [101094],
       [ 78462],
       [ 40292],
       [ 35705],
       [ 48750],
       [ 60938],
       ...])

In [47]:
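# Fit the spatial lag model by maximum likelihood
mllag = pysal.spreg.ML_Lag(y, x, w)
# Coefficients: constant, the six regressors, then the spatial lag coefficient (rho) last
np.around(mllag.betas, decimals=4)
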
array([[ -9.87485920e+03],
       [  3.42000000e-01],
       [  2.48976290e+04],
       [ -8.72115310e+03],
       [  1.80307200e+02],
       [  1.23084530e+03],
       [ -4.54458600e+02],
       [  2.27000000e-01]])
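
The coefficient vector is easier to read with labels attached. The sketch below pairs each value with a name, assuming the order is the constant, then the regressors in `x_names` order, then the spatial lag coefficient. That ordering is an assumption, although the last entry (0.227) does match the value of `mllag.rho` printed in 7.2.2. The values are copied by hand from the rounded output above, and the label strings are illustrative.

```python
import numpy as np

x_names = ["ACRES", "AVE_HH_SZ", "AVE_FAM_SZ", "MED_AGE", "PCT_BACHLR", "PCT_POV"]

# Values copied from the rounded output above.
betas = np.array([-9874.8592, 0.342, 24897.629, -8721.1531,
                  180.3072, 1230.8453, -454.4586, 0.227])

# Assumed ordering: constant, regressors in x_names order, rho last.
labels = ["CONSTANT"] + x_names + ["rho (W_MEDHHINC)"]

# Print one labeled coefficient per line.
for name, b in zip(labels, betas):
    print("{0:<18s}{1:>14.4f}".format(name, b))
```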

#### 7.2.2 Parameters and Metrics

In [54]:
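# Spatial autoregressive coefficient on the lagged target
print("{0:.6f}".format(mllag.rho))
# Mean and standard deviation of the target variable
print("{0:.6f}".format(mllag.mean_y))
print("{0:.6f}".format(mllag.std_y))
# Residuals based on the reduced-form predicted values
print(mllag.e_pred)
# Pseudo R-squared, and pseudo R-squared based on the reduced form
print(mllag.pr2)
print(mllag.pr2_e)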

0.226998
47938.861472
22207.844075
[[  2554.04226456]
 [ 17210.7251358 ]
 [  2823.65205868]
 [-13162.22091742]
 [ -3071.60317381]
 [  8054.517894  ]
 [  3863.24145895]
 [-10870.84324489]
 [  4105.60842728]
 [  5057.67040607]
 [  6014.11130943]
 [-27098.48953732]
 [-11617.32700295]
 [ -6690.42282968]
 [  5352.54234913]
 [ -5968.83989508]
 [ 13635.06836494]
 [  6600.63049781]
 [ -5690.43203088]
 [ -8114.8076851 ]
 [  9550.16883297]
 [-21574.09659863]
 [  1339.53260235]
 [  6111.27362771]
 [ -5916.35151692]
 [ -3501.53186057]
 [  8625.6841863 ]
 [  4215.72839875]
 [  8032.28114619]
 [ -3023.26433815]
 [ -4395.82481387]
 [  8248.18320682]
 [  2519.4153856 ]
 [ -8217.92282788]
 [  5917.95128871]
 [  4331.68234078]
 [  3586.0600557 ]
 [ 19953.22344813]
 [ 16553.21981801]
 [-10698.80064523]
 [  -867.54254638]
 [-11123.93705161]
 [  1681.01449826]
 [  7278.98317457]
 [ -1942.00360074]
 [  -119.8306929 ]
 [ -2839.52106603]
 [ -6058.28438545]
 [  8565.78648723]
 [  6968.1278062 ]
 [  3609.253909...
 ...]

#### 7.2.3 Interpretation

In [55]:
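# Residual variance (sigma squared)
print("{0:.6f}".format(mllag.sig2))
# Maximized log-likelihood
print("{0:.6f}".format(mllag.logll))
# Akaike and Schwarz (Bayesian) information criteria
print("{0:.6f}".format(mllag.aic))
print("{0:.6f}".format(mllag.schwarz))
# Pseudo R-squared, and pseudo R-squared based on the reduced form
print("{0:.6f}".format(mllag.pr2))
print("{0:.4f}".format(mllag.pr2_e))
# Sum of squared residuals
print("{0:.4f}".format(mllag.utu))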

140124929.608094
-2495.363343
5006.726685
5034.266027
0.714654
0.7101
32368858739.4698
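
As a sanity check on how these metrics relate to each other, the sketch below recomputes the two information criteria from the printed log-likelihood. It rests on two assumptions that the text does not state: that `sig2` equals `utu` divided by the number of observations, and that the parameter count `k` is the eight entries of `mllag.betas`. Under those assumptions the recomputed values agree with the printed ones to within rounding.

```python
import math

# Values copied from the printed output above.
sig2 = 140124929.608094
logll = -2495.363343
utu = 32368858739.4698

# Assumption: sig2 = utu / n, so n can be recovered from the two values.
n = round(utu / sig2)

# Assumption: k counts the eight estimated coefficients
# (constant, six regressors, and rho).
k = 8

# Standard definitions of the two information criteria.
aic = -2.0 * logll + 2.0 * k
schwarz = -2.0 * logll + k * math.log(n)

print(n)
print("{0:.6f}".format(aic))
print("{0:.6f}".format(schwarz))
```

Both criteria penalize the log-likelihood for the number of parameters, and lower values indicate a better trade-off between fit and complexity when comparing models fit to the same data.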


## 7.3 Spatial Error Model

#### 7.3.1 Model

#### 7.3.2 Parameters and Metrics

#### 7.3.3 Interpretation