This notebook contains the PySAL/spreg code for Chapter 7 - Spatial 2SLS 

in
Modern Spatial Econometrics in Practice: A Guide to GeoDa, GeoDaSpace and PySAL.

by Luc Anselin and Sergio J. Rey

(c) 2014 Luc Anselin and Sergio J. Rey, All Rights Reserved

In [1]:
__author__ = "Luc Anselin luc.anselin@asu.edu"

##Basic Regression Setup##

##Spatial Lag without Endogenous Variables##

**Creating arrays for y and x using the Baltimore example - see also Chapter 5 Notebook**

Preliminaries, import **numpy** and **pysal**

In [2]:
%pylab inline

Populating the interactive namespace from numpy and matplotlib


In [3]:
import numpy as np
import pysal

the **baltimore** sample data set

In [4]:
db = pysal.open('data/baltim.dbf','r')
y_name = "PRICE"
y = np.array([db.by_col(y_name)]).T
x_names = ['NROOM','NBATH','PATIO','FIREPL','AC','GAR','AGE',
           'LOTSZ','SQFT']
x = np.array([db.by_col(var) for var in x_names]).T

model weights - k nearest neighbors with k=4

In [5]:
w = pysal.knnW_from_shapefile('data/baltim.shp',
                                k=4,idVariable='STATION')
w.transform = 'r'
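
As a rough illustration of what k-nearest-neighbor weights with `w.transform = 'r'` amount to, the following NumPy-only sketch builds a dense k=4 weights matrix from made-up coordinates and row-standardizes it. The helper name `knn_weights` and the random points are assumptions for illustration only; PySAL stores weights as sparse neighbor lists rather than as a dense array.

```python
import numpy as np

def knn_weights(coords, k=4):
    """Dense row-standardized k-nearest-neighbor weights (hypothetical helper)."""
    n = coords.shape[0]
    # pairwise Euclidean distances between all points
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)      # a point is not its own neighbor
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(dist[i])[:k]  # indices of the k closest points
        W[i, nbrs] = 1.0
    # row-standardize so that each row sums to one
    return W / W.sum(axis=1, keepdims=True)

rs = np.random.RandomState(0)
coords = rs.uniform(size=(20, 2))       # made-up point locations
W = knn_weights(coords, k=4)
```

Each row of `W` then holds four entries equal to 0.25, so the spatial lag `W.dot(y)` is the average of `y` over the four nearest neighbors.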

kernel weights - adaptive bandwidth triangular kernel with k=12 nearest neighbors (used below for the HAC standard errors)

In [6]:
kw12 = pysal.adaptive_kernelW_from_shapefile('data/baltim.shp',
                                             k=12,diagonal=True,idVariable='STATION')
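
The idea behind adaptive triangular kernel weights can be sketched in plain NumPy. This is an approximation under stated assumptions: the helper `adaptive_triangular_kernel` is hypothetical, the coordinates are made up, and details such as how PySAL counts the k neighbors or pads the bandwidth may differ.

```python
import numpy as np

def adaptive_triangular_kernel(coords, k=12):
    """Triangular kernel with a bandwidth per observation (hypothetical helper)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # bandwidth of row i: distance to its k-th nearest other point
    # (column 0 of the sorted distances is the point itself, at distance 0)
    bw = np.sort(dist, axis=1)[:, k]
    z = dist / bw[:, None]
    # triangular kernel: 1 - z inside the bandwidth, 0 outside
    return np.where(z <= 1.0, 1.0 - z, 0.0)

rs = np.random.RandomState(0)
coords = rs.uniform(size=(30, 2))   # made-up point locations
K = adaptive_triangular_kernel(coords, k=12)
```

In this sketch the diagonal equals one by construction, and because the bandwidth differs by row the matrix is in general not symmetric.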

##Basic Spatial 2SLS##

default settings (first order spatial lags of the explanatory variables as instruments)

In [7]:
reg1 = pysal.spreg.GM_Lag(y,x,w=w,name_y=y_name,name_x=x_names,
                          name_w='baltim_k4',name_ds='baltim')
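
To make the estimator less of a black box, here is a minimal NumPy sketch of spatial two stage least squares on simulated data. It is not spreg's code: the helpers `ring_weights` and `spatial_2sls`, the circular neighbor structure and all parameter values are assumptions chosen for illustration.

```python
import numpy as np

def ring_weights(n, k=2):
    """Row-standardized weights on a circle, k neighbors on each side."""
    W = np.zeros((n, n))
    for i in range(n):
        for d in range(1, k + 1):
            W[i, (i - d) % n] = 1.0
            W[i, (i + d) % n] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def spatial_2sls(y, X, W, w_lags=1):
    """2SLS for y = rho*Wy + X*beta + u; X holds a constant in column 0."""
    Z = np.hstack([X, W.dot(y)])         # regressors, spatial lag of y last
    parts, lag = [X], X[:, 1:]           # the constant is not lagged
    for _ in range(w_lags):
        lag = W.dot(lag)
        parts.append(lag)
    H = np.hstack(parts)                 # instruments: X, WX, ...
    # first stage: fitted values of Z from a regression on H
    Z_hat = H.dot(np.linalg.solve(H.T.dot(H), H.T.dot(Z)))
    # second stage: coefficients, with rho in the last position
    return np.linalg.solve(Z_hat.T.dot(Z), Z_hat.T.dot(y))

# simulated data, all values made up
rs = np.random.RandomState(12345)
n = 400
W = ring_weights(n)
X = np.hstack([np.ones((n, 1)), rs.normal(size=(n, 2))])
beta_true = np.array([[1.0], [2.0], [-1.0]])
rho_true = 0.5
A_inv = np.linalg.inv(np.eye(n) - rho_true * W)
y = A_inv.dot(X.dot(beta_true) + rs.normal(size=(n, 1)))

betas = spatial_2sls(y, X, W)
print(betas.ravel())                     # last element estimates rho
```

The ordering of the result mirrors `reg1.betas`: constant, slope coefficients, then the spatial autoregressive coefficient.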

list the attributes of the regression object; **betas** holds the constant first, then the coefficients in the order of the variables in **x_names**, and the spatial autoregressive coefficient last

In [8]:
dir(reg1)

['__doc__',
 '__init__',
 '__module__',
 '__summary',
 '_cache',
 'betas',
 'e_pred',
 'h',
 'hth',
 'hthi',
 'htz',
 'k',
 'kstar',
 'mean_y',
 'n',
 'name_ds',
 'name_gwk',
 'name_h',
 'name_q',
 'name_w',
 'name_x',
 'name_y',
 'name_yend',
 'name_z',
 'pfora1a2',
 'pr2',
 'pr2_e',
 'predy',
 'predy_e',
 'q',
 'rho',
 'robust',
 'sig2',
 'sig2n',
 'sig2n_k',
 'std_err',
 'std_y',
 'summary',
 'title',
 'u',
 'utu',
 'varb',
 'vm',
 'x',
 'y',
 'yend',
 'z',
 'z_stat',
 'zthhthi']

In [9]:
reg1.betas

array([[-0.062598  ],
       [ 1.01935457],
       [ 5.5644252 ],
       [ 7.07238883],
       [ 7.30516192],
       [ 6.12992825],
       [ 3.41029693],
       [-0.09081007],
       [ 0.06599566],
       [ 0.06279926],
       [ 0.50431292]])

In [10]:
print(reg1.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: SPATIAL TWO STAGE LEAST SQUARES
--------------------------------------------------
Data set            :      baltim
Weights matrix      :   baltim_k4
Dependent Variable  :       PRICE                Number of Observations:         211
Mean dependent var  :     44.3072                Number of Variables   :          11
S.D. dependent var  :     23.6061                Degrees of Freedom    :         200
Pseudo R-squared    :      0.7083
Spatial Pseudo R-squared:  0.6820

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT      -0.0625980       5.8254085      -0.0107457       0.9914263
                  AC       6.1299283       2.4221367       2.5307937       0.0113805
                 AGE      -0.0908101       0.0543237      -

using second order spatial lags for the instruments, set **w_lags = 2**

In [11]:
reg2 = pysal.spreg.GM_Lag(y,x,w=w,w_lags=2,name_y=y_name,
                          name_x=x_names,name_w='baltim_k4',
                          name_ds='baltim')
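
How the instrument set grows with `w_lags` can be sketched as follows. The helper `lag_instruments` and the random weights are hypothetical, and the sketch assumes that the constant is left out of the lagged terms (a row-standardized W maps a column of ones onto itself, so lagging it would add nothing).

```python
import numpy as np

def lag_instruments(X, W, w_lags=1):
    """Stack X with its spatial lags up to order w_lags (hypothetical helper)."""
    parts = [X]
    lag = X[:, 1:]            # skip the constant in column 0
    for _ in range(w_lags):
        lag = W.dot(lag)      # W X, then W (W X) = W^2 X, ...
        parts.append(lag)
    return np.hstack(parts)

rs = np.random.RandomState(1)
n, k = 50, 9                  # nine explanatory variables, as in x_names
X = np.hstack([np.ones((n, 1)), rs.normal(size=(n, k))])
W = rs.uniform(size=(n, n))   # made-up weights
np.fill_diagonal(W, 0.0)
W = W / W.sum(axis=1, keepdims=True)

for order in (1, 2, 3):
    H = lag_instruments(X, W, w_lags=order)
    print(order, H.shape[1])  # 19, 28 and 37 instrument columns
```

Each additional order adds nine columns, which is why higher-order lags change the estimates only through the first stage projection.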

In [12]:
print(reg2.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: SPATIAL TWO STAGE LEAST SQUARES
--------------------------------------------------
Data set            :      baltim
Weights matrix      :   baltim_k4
Dependent Variable  :       PRICE                Number of Observations:         211
Mean dependent var  :     44.3072                Number of Variables   :          11
S.D. dependent var  :     23.6061                Degrees of Freedom    :         200
Pseudo R-squared    :      0.7080
Spatial Pseudo R-squared:  0.6808

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT      -0.4328107       5.7762571      -0.0749293       0.9402710
                  AC       6.1025702       2.4229066       2.5186981       0.0117790
                 AGE      -0.0888641       0.0542009      -

up to third order spatial lags, set **w_lags=3**

In [13]:
reg2a = pysal.spreg.GM_Lag(y,x,w=w,w_lags=3,name_y=y_name,
                          name_x=x_names,name_w='baltim_k4',
                          name_ds='baltim')

In [14]:
print(reg2a.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: SPATIAL TWO STAGE LEAST SQUARES
--------------------------------------------------
Data set            :      baltim
Weights matrix      :   baltim_k4
Dependent Variable  :       PRICE                Number of Observations:         211
Mean dependent var  :     44.3072                Number of Variables   :          11
S.D. dependent var  :     23.6061                Degrees of Freedom    :         200
Pseudo R-squared    :      0.7084
Spatial Pseudo R-squared:  0.6820

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT      -0.0440376       5.7436761      -0.0076671       0.9938826
                  AC       6.1312998       2.4210039       2.5325444       0.0113238
                 AGE      -0.0909076       0.0540817      -

###Direct, Indirect and Total Effects###

extract the regression coefficients

In [15]:
b = reg1.betas[:-1]

In [16]:
b

array([[-0.062598  ],
       [ 1.01935457],
       [ 5.5644252 ],
       [ 7.07238883],
       [ 7.30516192],
       [ 6.12992825],
       [ 3.41029693],
       [-0.09081007],
       [ 0.06599566],
       [ 0.06279926]])

extract the spatial autoregressive coefficient

In [17]:
rho = reg1.betas[-1]

In [18]:
rho

array([ 0.50431292])

total effect using the spatial multiplier 1/(1 - rho), which applies because the weights are row-standardized

In [19]:
btot = b / (1.0 - rho)

In [20]:
btot

array([[ -0.12628532],
       [  2.05644774],
       [ 11.22568142],
       [ 14.26784998],
       [ 14.73744683],
       [ 12.3665283 ],
       [  6.87993918],
       [ -0.1832004 ],
       [  0.13313977],
       [  0.12669135]])

indirect effect, the difference between the total effect and the direct effect

In [21]:
bind = btot - b

summary of the results

In [22]:
varnames = ["CONSTANT"] + x_names
print("Variable       Direct       Indirect      Total")
for i in range(len(varnames)):
    print("%10s %12.7f %12.7f %12.7f" % (varnames[i],b[i][0],bind[i][0],btot[i][0]))

Variable       Direct       Indirect      Total
  CONSTANT   -0.0625980   -0.0636873   -0.1262853
     NROOM    1.0193546    1.0370932    2.0564477
     NBATH    5.5644252    5.6612562   11.2256814
     PATIO    7.0723888    7.1954611   14.2678500
    FIREPL    7.3051619    7.4322849   14.7374468
        AC    6.1299283    6.2366001   12.3665283
       GAR    3.4102969    3.4696422    6.8799392
       AGE   -0.0908101   -0.0923903   -0.1832004
     LOTSZ    0.0659957    0.0671441    0.1331398
      SQFT    0.0627993    0.0638921    0.1266913
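
The multiplier 1/(1 - rho) used for the total effect can be checked numerically: with row-standardized weights, every row of the inverse of (I - rho W) sums to 1/(1 - rho). The sketch below uses a small made-up circular weights matrix together with the NROOM coefficient and rho reported above; the weights are an assumption, not the Baltimore k=4 weights.

```python
import numpy as np

n = 8
W = np.zeros((n, n))
for i in range(n):                      # two neighbors on a circle
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5

rho = 0.50431292                        # reg1.betas[-1] from the output above
b_nroom = 1.01935457                    # NROOM coefficient from the output above

M = np.linalg.inv(np.eye(n) - rho * W)  # spatial multiplier matrix
total = b_nroom * M.sum(axis=1)         # row sums scaled by the coefficient
direct = b_nroom                        # direct effect as defined in the notebook
indirect = total - direct
print(total[0], indirect[0])
```

The row sums are identical for every observation, which is why a single scalar multiplier suffices in the calculation above.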


##Spatial 2SLS with Spatial Diagnostics##

specify the weights as **w=w**, set **spat_diag = True** and optionally specify a name for the weights

In [23]:
reg3 = pysal.spreg.GM_Lag(y,x,w=w,spat_diag=True,
                          name_y=y_name,name_x=x_names,
                          name_w='baltim_k4',name_ds='baltim')

In [24]:
print(reg3.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: SPATIAL TWO STAGE LEAST SQUARES
--------------------------------------------------
Data set            :      baltim
Weights matrix      :   baltim_k4
Dependent Variable  :       PRICE                Number of Observations:         211
Mean dependent var  :     44.3072                Number of Variables   :          11
S.D. dependent var  :     23.6061                Degrees of Freedom    :         200
Pseudo R-squared    :      0.7083
Spatial Pseudo R-squared:  0.6820

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT      -0.0625980       5.8254085      -0.0107457       0.9914263
                  AC       6.1299283       2.4221367       2.5307937       0.0113805
                 AGE      -0.0908101       0.0543237      -

##Spatial 2SLS with White Standard Errors##

set **robust = 'white'**

In [25]:
reg4 = pysal.spreg.GM_Lag(y,x,w=w,robust='white',
                          spat_diag=True,name_y=y_name,name_x=x_names,
                          name_w='baltim_k4',name_ds='baltim')
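
The White correction leaves the coefficients unchanged and replaces the variance matrix by a sandwich form. A minimal sketch, assuming the first stage fitted regressors are available as `Z_hat` and the residuals as `u`; the helper `white_vm` and the simulated inputs are hypothetical, and spreg may apply scaling that is not shown here.

```python
import numpy as np

def white_vm(Z_hat, u):
    """Heteroskedasticity-robust sandwich variance (illustrative, unscaled)."""
    bread = np.linalg.inv(Z_hat.T.dot(Z_hat))
    meat = Z_hat.T.dot(Z_hat * u ** 2)   # sum over i of u_i^2 z_i z_i'
    return bread.dot(meat).dot(bread)

rs = np.random.RandomState(7)
n = 100
Z_hat = np.hstack([np.ones((n, 1)), rs.normal(size=(n, 2))])
# made-up heteroskedastic residuals: spread grows with the second column
u = rs.normal(size=(n, 1)) * (1.0 + np.abs(Z_hat[:, [1]]))

vm = white_vm(Z_hat, u)
std_err = np.sqrt(np.diag(vm))
```

When all squared residuals are equal, the sandwich collapses to the usual variance formula, which is a convenient sanity check.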

In [26]:
print(reg4.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: SPATIAL TWO STAGE LEAST SQUARES
--------------------------------------------------
Data set            :      baltim
Weights matrix      :   baltim_k4
Dependent Variable  :       PRICE                Number of Observations:         211
Mean dependent var  :     44.3072                Number of Variables   :          11
S.D. dependent var  :     23.6061                Degrees of Freedom    :         200
Pseudo R-squared    :      0.7083
Spatial Pseudo R-squared:  0.6820

White Standard Errors
------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT      -0.0625980       7.0759267      -0.0088466       0.9929415
                  AC       6.1299283       2.6896173       2.2791080       0.0226606
                 AGE      -0.0908101 

##Spatial 2SLS with HAC Standard Errors##

set **robust = 'hac'** and specify the kernel weights **gwk** and optionally their name **name_gwk**

In [27]:
reg5 = pysal.spreg.GM_Lag(y,x,w=w,robust='hac',gwk=kw12,
                          spat_diag=True,name_y=y_name,name_x=x_names,
                          name_w='baltim_k4',name_gwk='baltim_tri_k12',
                          name_ds='baltim')
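
The HAC variance extends the White sandwich by letting residual cross-products of nearby observations enter, weighted by the kernel. The sketch below is a simplified illustration: `hac_vm` is a hypothetical helper, the inputs are simulated, and a fixed-bandwidth triangular kernel is used so that the kernel matrix is symmetric, unlike the adaptive kernel **kw12** above.

```python
import numpy as np

def hac_vm(Z_hat, u, K):
    """Kernel-weighted sandwich variance (illustrative, unscaled)."""
    bread = np.linalg.inv(Z_hat.T.dot(Z_hat))
    # sum over i, j of K_ij u_i u_j z_i z_j'
    meat = Z_hat.T.dot(K * u.dot(u.T)).dot(Z_hat)
    return bread.dot(meat).dot(bread)

rs = np.random.RandomState(3)
n = 60
coords = rs.uniform(size=(n, 2))            # made-up locations
diff = coords[:, None, :] - coords[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=2))
K = np.clip(1.0 - dist / 0.3, 0.0, None)    # triangular kernel, bandwidth 0.3

Z_hat = np.hstack([np.ones((n, 1)), rs.normal(size=(n, 2))])
u = rs.normal(size=(n, 1))
vm = hac_vm(Z_hat, u, K)
```

With an identity kernel matrix only the squared residuals remain, and the result equals the White sandwich.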

In [28]:
print(reg5.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: SPATIAL TWO STAGE LEAST SQUARES
--------------------------------------------------
Data set            :      baltim
Weights matrix      :   baltim_k4
Dependent Variable  :       PRICE                Number of Observations:         211
Mean dependent var  :     44.3072                Number of Variables   :          11
S.D. dependent var  :     23.6061                Degrees of Freedom    :         200
Pseudo R-squared    :      0.7083
Spatial Pseudo R-squared:  0.6820

HAC Standard Errors; Kernel Weights: baltim_tri_k12
------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT      -0.0625980       7.5653787      -0.0082743       0.9933982
                  AC       6.1299283       2.9543705       2.0748678       0.0379988
       

##Spatial Lag Model with other Endogenous Variables##

create the variable arrays using the **natregimes** sample data set

In [29]:
db = pysal.open('data/natregimes.dbf','r')
y_name = "HR90"
y = np.array([db.by_col(y_name)]).T
x_names = ['RD90','MA90','PS90']
x = np.array([db.by_col(var) for var in x_names]).T
yend_names = ['UE90']
yend = np.array([db.by_col(var) for var in yend_names]).T
q_names = ['FH90','FP89','GI89']
q = np.array([db.by_col(var) for var in q_names]).T

model weights - queen contiguity, row-standardized

In [30]:
w = pysal.queen_from_shapefile('data/natregimes.shp',idVariable="FIPSNO")
w.transform = 'r'

###Spatial Lag with Endogenous Variables###

base case with spatial diagnostics

In [31]:
reg6 = pysal.spreg.GM_Lag(y,x,yend,q,w=w,spat_diag=True,
                          name_y=y_name,name_x=x_names,name_yend=yend_names,
                          name_q=q_names,name_w='natqueen',name_ds='nat')

In [32]:
print(reg6.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: SPATIAL TWO STAGE LEAST SQUARES
--------------------------------------------------
Data set            :         nat
Weights matrix      :    natqueen
Dependent Variable  :        HR90                Number of Observations:        3085
Mean dependent var  :      6.1829                Number of Variables   :           6
S.D. dependent var  :      6.6414                Degrees of Freedom    :        3079
Pseudo R-squared    :      0.4186
Spatial Pseudo R-squared:  0.3914

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT      10.0338240       1.3616383       7.3689349       0.0000000
                MA90      -0.0500990       0.0286025      -1.7515613       0.0798493
                PS90       1.5813070       0.1084249      1

without spatial lags of the additional instruments **q** (the spatial lags of **x** are still used), set **lag_q = False**

In [33]:
reg7 = pysal.spreg.GM_Lag(y,x,yend,q,w=w,lag_q=False,spat_diag=True,
                          name_y=y_name,name_x=x_names,name_yend=yend_names,
                          name_q=q_names,name_w='nat_queen',name_ds='nat')
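
The effect of `lag_q` on the instrument set can be sketched with a small helper. `build_instruments` is hypothetical and the data are random; the column ordering is an assumption and may not match spreg's internal layout.

```python
import numpy as np

def build_instruments(X, q, W, w_lags=1, lag_q=True):
    """Instruments for a spatial lag model with extra endogenous variables.

    X holds the constant in column 0; q holds the external instruments.
    """
    # variables whose spatial lags are added: x only, or x together with q
    lag = np.hstack([X[:, 1:], q]) if lag_q else X[:, 1:]
    parts = [X, q]
    for _ in range(w_lags):
        lag = W.dot(lag)
        parts.append(lag)
    return np.hstack(parts)

rs = np.random.RandomState(5)
n = 40
X = np.hstack([np.ones((n, 1)), rs.normal(size=(n, 3))])  # like RD90, MA90, PS90
q = rs.normal(size=(n, 3))                                # like FH90, FP89, GI89
W = rs.uniform(size=(n, n))                               # made-up weights
np.fill_diagonal(W, 0.0)
W = W / W.sum(axis=1, keepdims=True)

H_lagged = build_instruments(X, q, W, lag_q=True)
H_plain = build_instruments(X, q, W, lag_q=False)
print(H_lagged.shape[1], H_plain.shape[1])                # 13 and 10 columns
```

Dropping the lags of **q** removes three instruments in this specification, which explains why the two sets of estimates differ.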

In [35]:
print(reg7.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: SPATIAL TWO STAGE LEAST SQUARES
--------------------------------------------------
Data set            :         nat
Weights matrix      :   nat_queen
Dependent Variable  :        HR90                Number of Observations:        3085
Mean dependent var  :      6.1829                Number of Variables   :           6
S.D. dependent var  :      6.6414                Degrees of Freedom    :        3079
Pseudo R-squared    :      0.4076
Spatial Pseudo R-squared:  0.3802

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT      11.2850228       1.4177538       7.9597903       0.0000000
                MA90      -0.0601927       0.0290474      -2.0722259       0.0382444
                PS90       1.6149324       0.1105060      1

##Practice##

Replicate the analysis above using a subset of the U.S. counties, i.e., the south data set. Use both k=6 nearest neighbors and queen contiguity as weights and compare the results. Use adaptive bandwidth quadratic kernel weights (k=12) to assess the effect of HAC standard errors.