# Panel - Spatial Model

In [2]:
%load_ext autoreload
%autoreload 2

import numpy as np
import pandas as pd
import libpysal
import spreg

np.set_printoptions(suppress=True)

## NCOVR data

I use a subsample of the NCOVR US county homicides data. The dependent variable is the **homicide rate**, and the independent variables are **resource deprivation** (a principal component composed of percent black, log of median family income, Gini index of family income inequality, and more) and **population structure** (a principal component composed of the log of population and the log of population density). The panel covers three decades: 1970, 1980, and 1990.

In [3]:
from libpysal.weights import w_subset
# Open data on NCOVR US County Homicides (3085 areas).
nat = libpysal.examples.load_example("NCOVR")
db = libpysal.io.open(nat.get_path("NAT.dbf"), "r")
# Create spatial weight matrix
nat_shp = nat.get_path("NAT.shp")
w_full = libpysal.weights.Queen.from_shapefile(nat_shp)

# Define dependent variable
name_y = ["HR70", "HR80", "HR90"]
y_full = np.array([db.by_col(name) for name in name_y]).T
# Define independent variables
name_x = ["RD70", "RD80", "RD90", "PS70", "PS80", "PS90"]
x_full = np.array([db.by_col(name) for name in name_x]).T

epsilon = 0.0000001
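
The column order of `name_x` matters because the estimators assume a wide layout in which the periods of each regressor sit in adjacent columns. The sketch below illustrates that mapping in plain Python; `wide_column`, `variables`, and `periods` are hypothetical names introduced only for this illustration, and only `name_x` mirrors the notebook.

```python
# Column names as defined in the notebook (wide layout: variable blocks of T periods).
name_x = ["RD70", "RD80", "RD90", "PS70", "PS80", "PS90"]

# Hypothetical helpers for illustration only.
variables = ["RD", "PS"]
periods = [1970, 1980, 1990]
T = len(periods)


def wide_column(variable, year):
    """Return the column index of `variable` in `year` under a wide layout."""
    j = variables.index(variable)  # position of the regressor block
    t = periods.index(year)        # position of the period inside the block
    return j * T + t


for variable in variables:
    for year in periods:
        col = wide_column(variable, year)
        # The two-digit year suffix must match the notebook's column name.
        assert name_x[col] == f"{variable}{year % 100}"
        print(variable, year, "-> column", col, "(", name_x[col], ")")
```

Listing the columns as `["RD70", "PS70", "RD80", ...]` instead would mix regressors within a block, so the order used here should be preserved when adding variables.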

The subsample includes the counties of four states: Kansas, Missouri, Oklahoma, and Arkansas. The weights matrix is row-standardized after the subsample is filtered.

In [4]:
name_c = ["STATE_NAME", "FIPSNO"]
df_counties = pd.DataFrame([db.by_col(name) for name in name_c], index=name_c).T

filter_states = ["Kansas", "Missouri", "Oklahoma", "Arkansas"]
filter_counties = df_counties[df_counties["STATE_NAME"].isin(filter_states)]["FIPSNO"].values

counties = np.array(db.by_col("FIPSNO"))
subid = np.where(np.isin(counties, filter_counties))[0]

w = w_subset(w_full, subid)
w.transform = 'r'

y = y_full[subid, ]
x = x_full[subid, ]
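
To see why the weights are row-standardized after filtering rather than before, here is a minimal sketch on a toy neighbor structure. The data and the helper names `subset` and `row_standardize` are hypothetical and do not use the libpysal API; they only mimic the order of operations.

```python
# Toy binary contiguity structure: unit -> list of neighbors (hypothetical data).
neighbors = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["B", "C"],
}


def subset(neighbors, keep):
    """Keep only the listed units and the links among them."""
    keep = set(keep)
    return {i: [j for j in nbrs if j in keep]
            for i, nbrs in neighbors.items() if i in keep}


def row_standardize(neighbors):
    """Give each neighbor of unit i the weight 1 / (number of neighbors of i)."""
    return {i: {j: 1.0 / len(nbrs) for j in nbrs}
            for i, nbrs in neighbors.items()}


keep = ["A", "B", "C"]

# Order used in the notebook: filter first, then standardize.
after = row_standardize(subset(neighbors, keep))

# Reverse order: standardize on the full set, then drop the excluded units.
full = row_standardize(neighbors)
before = {i: {j: w for j, w in full[i].items() if j in keep} for i in keep}

for i in keep:
    print(i, "filter->standardize:", sum(after[i].values()),
          "| standardize->filter:", sum(before[i].values()))
```

In the reverse order, units that lose a neighbor to the filter end up with rows that no longer sum to one, which is what standardizing after the filter avoids.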

## Diagnostics

The classic Lagrange Multiplier (LM) tests check the null hypothesis of no spatially lagged dependent variable and the null hypothesis of no spatially autocorrelated error term. A p-value below 5% rejects the null and supports including the corresponding spatial interaction in the model.

In [54]:
spreg.panel_LMlag(y, x, w)

Similarly, assuming x[:, 0:T] refers to T periods of k1, x[:, T+1:2T] refers to k2, etc.


(1.472807526666869, 0.22490325114767176)

In [55]:
spreg.panel_rLMlag(y, x, w)


(2.5125780962741793, 0.11294102977710921)

In [56]:
spreg.panel_LMerror(y, x, w)


(81.69630396101608, 1.5868998506678388e-19)

In [57]:
spreg.panel_rLMerror(y, x, w)


(32.14155241279442, 1.4333858484607395e-08)

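We reject the null hypothesis of no spatial error autocorrelation in both the LM error test and its robust version (p-values far below 5%). However, we cannot reject the null hypothesis of no spatial lag in either the LM lag test or its robust version (p-values of 0.22 and 0.11).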
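
The decision rule can be sketched with the standard library alone. The statistics below are copied from the outputs above; the sketch assumes each statistic follows a chi-square distribution with one degree of freedom under the null, an assumption that reproduces the reported p-values. The helper `chi2_sf_df1` is a hypothetical name, not part of spreg.

```python
import math


def chi2_sf_df1(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# Test statistics reported by the four diagnostics in this notebook.
lm_stats = {
    "LM lag": 1.472807526666869,
    "Robust LM lag": 2.5125780962741793,
    "LM error": 81.69630396101608,
    "Robust LM error": 32.14155241279442,
}

alpha = 0.05  # significance level used in the text
decisions = {}
for name, stat in lm_stats.items():
    p_value = chi2_sf_df1(stat)
    decisions[name] = p_value < alpha  # True means the null is rejected
    verdict = "reject" if decisions[name] else "fail to reject"
    print(f"{name}: stat={stat:.3f}, p={p_value:.3g} -> {verdict} the null")
```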

## Estimation

The four basic panel models with spatial interactions are estimated below: fixed effects and random effects, each with either a spatial lag or a spatial error term.

In [5]:
fe_lag = spreg.Panel_FE_Lag(y, x, w, name_y=name_y, name_x=name_x, name_ds="NAT")
fe_error = spreg.Panel_FE_Error(y, x, w, name_y=name_y, name_x=name_x, name_ds="NAT")
re_lag = spreg.Panel_RE_Lag(y, x, w, name_y=name_y, name_x=name_x, name_ds="NAT")
re_error = spreg.Panel_RE_Error(y, x, w, name_y=name_y, name_x=name_x, name_ds="NAT")


In [59]:
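# Fixed effects models report no constant and no random-effects parameter,
# so pad their coefficient vectors with zeros to align rows across models.
models_betas = np.hstack((np.vstack(([0], fe_lag.betas, [0])),
                          np.vstack(([0], fe_error.betas, [0])),
                          re_lag.betas,
                          re_error.betas))
pd.DataFrame(models_betas,
             columns=["FE_Lag", "FE_Error", "RE_Lag", "RE_Error"],
             index=["Constant", "RD", "PS", "Rho/Lambda", "Random Effects"])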

| | FE_Lag | FE_Error | RE_Lag | RE_Error |
|---|---|---|---|---|
| Constant | 0.0 | 0.0 | 4.44422 | 5.878938 |
| RD | -0.615257 | -0.512243 | 2.528217 | 3.23269 |
| PS | -3.768267 | -4.431288 | 2.247688 | 2.629968 |
| Rho/Lambda | 0.183525 | 0.190501 | 0.258468 | 0.340427 |
| Random Effects | 0.0 | 0.0 | 0.684266 | 4.978245 |


The estimate of $\rho$ for the spatially lagged dependent variable is 0.18 in the fixed effects model and 0.26 in the random effects model. The estimate of $\lambda$ for the spatial error term is 0.19 in the fixed effects model and 0.34 in the random effects model.

## Hausman test

In [28]:
spreg.panel_Hausman(fe_lag, re_lag)

(-67.26822586935438, 1.0)

In [29]:
spreg.panel_Hausman(fe_error, re_error)

(-84.38351088621853, 1.0)
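
Both statistics are negative, which a chi-square variable cannot be. In the textbook form of the Hausman test, the statistic is the squared coefficient difference weighted by the inverse of the difference between the fixed effects and random effects variances, and it turns negative when that variance difference is not positive definite. The sketch below shows the one-coefficient case with made-up numbers; `hausman_scalar` is a hypothetical helper, and it is an assumption that `spreg.panel_Hausman` follows this same form.

```python
import math


def hausman_scalar(b_fe, b_re, var_fe, var_re):
    """Textbook Hausman statistic for a single coefficient (1 degree of freedom).

    Assumes var_fe != var_re; a zero variance difference leaves the
    statistic undefined.
    """
    diff = b_fe - b_re
    stat = diff * diff / (var_fe - var_re)
    # A chi-square variable is non-negative, so its upper-tail probability
    # at a negative value is 1.
    p_value = math.erfc(math.sqrt(stat / 2.0)) if stat >= 0 else 1.0
    return stat, p_value


# Hypothetical numbers, not taken from the models above.
well_behaved = hausman_scalar(b_fe=0.5, b_re=0.3, var_fe=0.02, var_re=0.01)
ill_behaved = hausman_scalar(b_fe=0.5, b_re=0.3, var_fe=0.01, var_re=0.02)

print("var_fe > var_re:", well_behaved)
print("var_fe < var_re:", ill_behaved)
```

A negative statistic paired with a p-value of 1.0, as in the outputs above, is consistent with this situation, so the result is better read as inconclusive than as strong evidence for the random effects specification.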