## Spatial Lag - Fixed Effects Panel Model

This notebook introduces the Spatial Lag model for Fixed Effects Panel data. It is based on the estimation procedure outlined in:
- Anselin, Le Gallo and Jayet (2008). Spatial Panel Econometrics.
- Elhorst (2014). Spatial Econometrics, From Cross-Sectional Data to Spatial Panels.

## Spatial Lag and Error Model for Panel data

### Imports

In [1]:
import libpysal
import spreg
import numpy as np
import numpy.linalg as la
from scipy import sparse as sp
from scipy.sparse.linalg import splu as SuperLU
from spreg.utils import inverse_prod
from spreg.sputils import spdot, spfill_diagonal, spinv
try:
    from scipy.optimize import minimize_scalar
    minimize_scalar_available = True
except ImportError:
    minimize_scalar_available = False
    
from spreg.panel_utils import check_panel, demean_panel

## Spatial Lag model

### Read data

In [2]:
# Open data on NCOVR US County Homicides (3085 areas).
nat = libpysal.examples.load_example("NCOVR")
db = libpysal.io.open(nat.get_path("NAT.dbf"), "r")
# Create spatial weight matrix
nat_shp = libpysal.examples.get_path("NAT.shp")
w = libpysal.weights.Queen.from_shapefile(nat_shp)
w.transform = 'r'
# Define dependent variable
name_y = ["HR70", "HR80", "HR90"]
y = np.array([db.by_col(name) for name in name_y]).T
# Define independent variables
name_x = ["RD70", "RD80", "RD90", "PS70", "PS80", "PS90"]
x = np.array([db.by_col(name) for name in name_x]).T

epsilon = 0.0000001

### Transform variables

In [3]:
# Check the data structure and convert from wide to long if needed.
bigy, bigx, name_y, name_x = check_panel(y, x, w, name_y, name_x)

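The wide layout is assumed to have one column per period: `y[:, 0:T]` holds the T periods of the dependent variable. Similarly, `x[:, 0:T]` holds the T periods of the first regressor (k1), `x[:, T:2T]` those of the second (k2), and so on.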
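
The wide-to-long conversion can be illustrated with a minimal sketch. The helper `wide_to_long` below is hypothetical (it is not part of `spreg`) and assumes the long format stacks all N units of period 1 first, then period 2, and so on, which is the ordering that the block-diagonal matrix `kron(I_T, W)` used later requires.

```python
import numpy as np

def wide_to_long(y_wide, x_wide):
    """Hypothetical helper: stack wide panel arrays period by period."""
    n, t = y_wide.shape
    k = x_wide.shape[1] // t
    # Column-major reshape puts all n units of period 1 first, then period 2, ...
    y_long = y_wide.reshape((n * t, 1), order="F")
    # Each regressor occupies t consecutive columns in the wide array
    x_long = np.hstack([
        x_wide[:, j * t:(j + 1) * t].reshape((n * t, 1), order="F")
        for j in range(k)
    ])
    return y_long, x_long

# Two units, two periods, two regressors (values are made up)
y_wide = np.array([[1.0, 2.0],
                   [3.0, 4.0]])
x_wide = np.array([[10.0, 11.0, 20.0, 21.0],
                   [12.0, 13.0, 22.0, 23.0]])
y_long, x_long = wide_to_long(y_wide, x_wide)
```

With this ordering, rows `0:n` of the long arrays belong to the first period and rows `n:2n` to the second.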


The variables are demeaned using
$$
y^\ast = Q_0 y
$$

where $Q_0 = J_T \otimes I_N$ and $J_T = I_T - \iota_T \iota_T' / T$, with $\iota_T$ a $T \times 1$ vector of ones.
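
As a small check of the demeaning operator, the sketch below builds $Q_0$ explicitly for toy sizes and compares it with simply subtracting each unit's time mean. The sizes and random data are made up, and the long vector is assumed to be stacked period by period.

```python
import numpy as np

n, t = 3, 4  # toy sizes, chosen only for illustration
rng = np.random.default_rng(0)
y_long = rng.normal(size=(n * t, 1))  # assumed stacked period by period

# J_T = I_T - iota iota' / T and Q_0 = J_T kron I_N
iota = np.ones((t, 1))
J_T = np.identity(t) - iota @ iota.T / t
Q_0 = np.kron(J_T, np.identity(n))
y_star = Q_0 @ y_long

# Equivalent shortcut: subtract each unit's mean over time
y_tn = y_long.reshape(t, n)  # rows are periods, columns are units
y_shortcut = (y_tn - y_tn.mean(axis=0)).reshape(n * t, 1)
```

Building $Q_0$ explicitly costs $(NT)^2$ memory, so for real data the shortcut form is the practical one.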

In [4]:
n = w.n
t = bigy.shape[0] // n
k = bigx.shape[1]
# Demeaned variables
y = demean_panel(bigy, n, t)
x = demean_panel(bigx, n, t)
# Big W matrix
W = w.full()[0]
W_nt = np.kron(np.identity(t), W)
Wsp = w.sparse
Wsp_nt = sp.kron(sp.identity(t), Wsp)
# Lag variables
ylag = spdot(W_nt, y)
ylag2 = spdot(W_nt, ylag)
xlag = spdot(W_nt, x)
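
To see what the Kronecker product does here, the following sketch checks on a toy example that multiplying by `kron(I_T, W)` is the same as applying `W` to each period separately. The ring-shaped weights matrix and the random data are assumptions made only for this illustration.

```python
import numpy as np
from scipy import sparse as sp

n, t = 4, 3
# Row-standardized ring: each unit has two neighbours (toy weights)
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5
W_sp = sp.csr_matrix(W)

rng = np.random.default_rng(1)
y_long = rng.normal(size=(n * t, 1))  # assumed stacked period by period

# Block-diagonal weights for the whole panel, kept sparse
W_nt = sp.kron(sp.identity(t), W_sp, format="csr")
ylag_kron = W_nt @ y_long

# Same result, one period at a time
ylag_by_period = np.vstack(
    [W @ y_long[s * n:(s + 1) * n] for s in range(t)]
)
```

Keeping the Kronecker product sparse avoids materializing an $NT \times NT$ dense array.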

### Estimation

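For the spatial lag model, first compute the residuals of these two regressions on the demeaned variables:
$$
y = X\beta_0 + e_0
$$
and
$$
Wy = X\beta_1 + e_1
$$

Then estimate $\rho$ by minimizing the negative of the concentrated log-likelihood function (constants dropped):
$$
L = \frac{NT}{2} \ln (e'_r e_r) - T \ln | I_N - \rho W |
$$

where $e_r = e_0 - \rho e_1$. Minimizing $L$ is equivalent to maximizing the concentrated log-likelihood.

The function `c_loglik_sp` used in this notebook extends this objective to a model with both a spatial lag parameter $\rho$ and a spatial error parameter $\lambda$: the variables are filtered with $I - \rho W$ and $I - \lambda W$, and the Jacobian term contains the log-determinants of both matrices.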
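
The two-regression recipe for the lag-only case can be sketched on simulated data. Everything below (the ring weights, the sample sizes, the true parameter values) is made up for illustration, and the simulated variables are simply treated as if they were already demeaned. This is not the `spreg` implementation.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(42)
n, t, rho_true = 25, 4, 0.4  # toy sizes and true rho (assumptions)

# Row-standardized ring weights
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5
W_nt = np.kron(np.identity(t), W)

# Simulate y = (I - rho W)^{-1} (X beta + eps), treated as already demeaned
X = rng.normal(size=(n * t, 2))
beta = np.array([[1.0], [-0.5]])
eps = rng.normal(size=(n * t, 1))
y = np.linalg.solve(np.identity(n * t) - rho_true * W_nt, X @ beta + eps)
Wy = W_nt @ y

# Residuals of the two auxiliary OLS regressions
b0, *_ = np.linalg.lstsq(X, y, rcond=None)
b1, *_ = np.linalg.lstsq(X, Wy, rcond=None)
e0 = y - X @ b0
e1 = Wy - X @ b1

def neg_conc_loglik(rho):
    """Negative concentrated log-likelihood, constants dropped."""
    er = e0 - rho * e1
    _, logdet = np.linalg.slogdet(np.identity(n) - rho * W)
    return (n * t / 2.0) * np.log((er.T @ er).item()) - t * logdet

res = minimize_scalar(neg_conc_loglik, bounds=(-0.99, 0.99), method="bounded")
rho_hat = res.x
```

Once `rho_hat` is found, the slope estimates follow as `b0 - rho_hat * b1`, because the residual $e_0 - \rho e_1$ is the OLS residual of $y - \rho Wy$ on $X$.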

In [5]:
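def c_loglik_sp(params, n, t, y, ylag, ylag2, x, xlag, I, Wsp):
    # Negative concentrated log-likelihood (constants dropped) of the
    # model with a spatial lag (rho) and a spatial error (lam) term.
    rho = params[0]
    lam = params[1]

    # Spatial filters: S = I - rho*W (lag) and R = I - lam*W (error)
    S = I - rho*Wsp
    R = I - lam*Wsp
    Rx = x - lam*xlag
    Sy = y - rho*ylag
    # R*S*y = y - rho*Wy - lam*Wy + lam*rho*W^2 y
    RSy = Sy - lam*ylag + lam*rho*ylag2
    # OLS of the filtered y on the filtered x
    xRRx = spdot(Rx.T, Rx)
    xRRxi = spinv(xRRx)
    xRRSy = spdot(Rx.T, RSy)
    b = spdot(xRRxi, xRRSy)
    er = RSy - spdot(Rx, b)
    sig2 = spdot(er.T, er) / (n*t)
    nlsig2 = (n*t / 2.0) * np.log(sig2)

    # Log-Jacobians ln|S| and ln|R| from the diagonal of U in the sparse LU
    LU_s = SuperLU(S.tocsc())
    LU_r = SuperLU(R.tocsc())
    jacob_s = np.sum(np.log(np.abs(LU_s.U.diagonal())))
    jacob_r = np.sum(np.log(np.abs(LU_r.U.diagonal())))
    jacob = t * (jacob_s + jacob_r)
    clike = nlsig2 - jacob
    return clike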
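
The Jacobian terms above rely on the fact that the log of the absolute determinant can be read off the diagonal of $U$ in a sparse LU factorization, since $L$ has a unit diagonal and the permutations only change the sign. A minimal check on a toy matrix (the ring weights and the value of `rho` are assumptions for illustration) compares this with a dense computation:

```python
import numpy as np
from scipy import sparse as sp
from scipy.sparse.linalg import splu

n, rho = 6, 0.3  # toy size and parameter value
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5

# S = I - rho * W in CSC format, as expected by splu
S = sp.identity(n, format="csc") - rho * sp.csc_matrix(W)
lu = splu(S)
logdet_lu = np.sum(np.log(np.abs(lu.U.diagonal())))

# Dense reference value
_, logdet_dense = np.linalg.slogdet(S.toarray())
```

The sparse route matters for the real data, where $W$ is $3085 \times 3085$ and the objective is evaluated many times during optimization.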

In [6]:
# The joint optimization over (rho, lam) needs the multivariate minimizer
from scipy.optimize import minimize

In [7]:
I = sp.identity(n)
args = (n, t, y, ylag, ylag2, x, xlag, I, Wsp)
res = minimize(c_loglik_sp, (0.0, 0.0), bounds=((-1.0, 1.0), (-1.0, 1.0)),
               args=args, method='L-BFGS-B', tol=epsilon)

In [8]:
res.x

array([-0.414457  ,  0.51130283])

In [14]:
res.hess_inv.todense()

array([[ 0.00294774, -0.00181184],
       [-0.00181184,  0.00127387]])
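
One rough way to read this matrix is as an approximate covariance of the two spatial parameters, so that the square roots of its diagonal give approximate standard errors. The sketch below hard-codes the values printed above. Note that `hess_inv` from L-BFGS-B is a quasi-Newton approximation of the inverse Hessian, so these numbers are only indicative and need not match the standard errors reported by other software.

```python
import numpy as np

# Inverse-Hessian approximation printed above (order: rho, lam)
hess_inv = np.array([[ 0.00294774, -0.00181184],
                     [-0.00181184,  0.00127387]])

# Approximate standard errors and implied correlation of the estimates
se = np.sqrt(np.diag(hess_inv))
corr = hess_inv[0, 1] / (se[0] * se[1])
```

The strongly negative off-diagonal term suggests that the lag and error parameters are hard to separate in this sample.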

## R section

In [1]:
### load library
library("splm")

### set options
options(prompt = "R> ",  continue = "+ ", width = 70, useFancyQuotes = FALSE, warn=-1)

In [3]:
## read data
nat <- read.csv("data/NAT.csv", header = TRUE)
## set formula
fm <- HR ~ RD + PS
wnat <- as.matrix(read.csv("data/NAT_w.csv"))
## standardization
wnat <- wnat/apply(wnat, 1, sum)
## make it a listw
lwnat <- mat2listw(wnat)

col_order <- c("FIPSNO", "YEAR", "HR", "RD", "PS")
nat <- nat[, col_order]

In [4]:
fixed_lag = spml(HR ~ RD + PS, data=nat, listw=lwnat, effect="individual",
                 model="within", spatial.error = "b", lag=TRUE)

In [5]:
summary(fixed_lag)

Spatial panel fixed effects sarar model
 

Call:
spml(formula = HR ~ RD + PS, data = nat, listw = lwnat, model = "within", 
    effect = "individual", lag = TRUE, spatial.error = "b")

Residuals:
     Min.   1st Qu.    Median   3rd Qu.      Max. 
-28.95636  -1.72199  -0.14332   1.50908  47.79093 

Spatial error parameter:
    Estimate Std. Error t-value  Pr(>|t|)    
rho 0.511304   0.032497  15.734 < 2.2e-16 ***

Spatial autoregressive coefficient:
        Estimate Std. Error t-value  Pr(>|t|)    
lambda -0.414459   0.047846 -8.6624 < 2.2e-16 ***

Coefficients:
   Estimate Std. Error t-value Pr(>|t|)    
RD  0.87618    0.17836  4.9126 8.99e-07 ***
PS -3.32652    0.60458 -5.5022 3.75e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Note that `splm` labels the spatial error parameter `rho` and the spatial autoregressive (lag) coefficient `lambda`, which is the reverse of the naming in `c_loglik_sp`. With that in mind, the estimates agree with the Python results: `res.x[0]` (about -0.4145) is the lag coefficient and `res.x[1]` (about 0.5113) is the error parameter.
