This notebook contains the PySAL/spreg code for Chapter 9 - GM/GMM Error 

in
Modern Spatial Econometrics in Practice: A Guide to GeoDa, GeoDaSpace and PySAL.

by Luc Anselin and Sergio J. Rey

(c) 2014 Luc Anselin and Sergio J. Rey, All Rights Reserved

In [1]:
__author__ = "Luc Anselin luc.anselin@asu.edu"

##Basic Regression Setup##

###Exogenous Explanatory Variables Only###

**Creating arrays for y and x for the south.dbf example data set** (see previous notebooks)

Preliminaries: import **numpy** and **pysal**

In [1]:
%pylab inline

Populating the interactive namespace from numpy and matplotlib


In [2]:
import numpy as np
import pysal

In [3]:
db = pysal.open('data/south.dbf','r')
y_name = "HR90"
y = np.array([db.by_col(y_name)]).T
x_names = ["RD90","PS90","UE90","DV90"]
x = np.array([db.by_col(var) for var in x_names]).T
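The list-then-transpose idiom above turns columns read from the table into a layout with one row per observation. Below is a minimal numpy-free sketch of the same reshaping. The values and the names `hr`, `rd`, `ps` and `transpose` are hypothetical and not taken from south.dbf.

```python
# Hypothetical columns, shaped like the lists db.by_col(...) returns:
# one list per variable, one entry per observation.
hr = [1.5, 0.0, 3.2]
rd = [-0.4, 0.1, 0.9]
ps = [0.3, -1.2, 0.5]

def transpose(columns):
    """Turn a list of k columns (each of length n) into n rows of k values."""
    return [list(row) for row in zip(*columns)]

y_rows = transpose([hr])       # n x 1, analogous to np.array([col]).T
x_rows = transpose([rd, ps])   # n x k, analogous to np.array([c1, c2]).T

print(len(y_rows), len(y_rows[0]))
print(len(x_rows), len(x_rows[0]))
```

Wrapping the single dependent variable in a list before transposing is what yields a column of shape n by 1 rather than a flat sequence.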

###Exogenous and Endogenous Explanatory Variables###

**Creating arrays for yend, q and xe (exogenous only)**

In [4]:
yend_names = ["UE90"]
yend = np.array([db.by_col(var) for var in yend_names]).T
q_names = ["FH90","FP89","GI89"]
q = np.array([db.by_col(var) for var in q_names]).T
xe_names = ["RD90","PS90","DV90"]
xe = np.array([db.by_col(var) for var in xe_names]).T
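Separating `yend` from the instruments `q` reflects the instrumental-variables logic behind the two stage least squares estimators used later. As a rough illustration only, the sketch below uses one endogenous regressor and one instrument on made-up data. The estimators in this notebook use several instruments together with the exogenous regressors, so this is a simplified special case, and all names and numbers are hypothetical.

```python
def centered(values):
    """Subtract the mean from every element."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def cross(a, b):
    """Sum of products of the mean-centered series a and b."""
    return sum(p * r for p, r in zip(centered(a), centered(b)))

# Toy data: the instrument q is uncorrelated with the error e,
# while the regressor x_endog = q + e is correlated with e by construction.
q = [1.0, 2.0, 3.0, 4.0]
e = [1.0, -1.0, -1.0, 1.0]
x_endog = [qi + ei for qi, ei in zip(q, e)]
y = [2.0 * xi + ei for xi, ei in zip(x_endog, e)]   # true slope is 2

ols = cross(x_endog, y) / cross(x_endog, x_endog)   # ignores endogeneity
iv = cross(q, y) / cross(q, x_endog)                # simple IV estimator

print(ols, iv)
```

In this constructed example the least squares slope is pulled away from the true value of 2, while the instrument-based slope recovers it.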

###Spatial Weights###

Queen contiguity, with FIPSNO as the ID variable

In [5]:
w = pysal.queen_from_shapefile('data/south.shp',idVariable="FIPSNO")
w.transform = 'r'
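Setting `w.transform = 'r'` row-standardizes the weights, so that each row sums to one and the spatial lag becomes an average of neighboring values. A minimal sketch of that operation follows, using a hypothetical four-area neighbor structure instead of the south.shp contiguity.

```python
# Hypothetical neighbor lists for four areas (symmetric contiguity).
neighbors = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b"],
    "d": ["b"],
}

def row_standardize(neighbors):
    """Give each neighbor of i the weight 1 / (number of neighbors of i)."""
    weights = {}
    for i, nbrs in neighbors.items():
        if not nbrs:
            weights[i] = {}          # an isolate keeps an all-zero row
            continue
        share = 1.0 / len(nbrs)
        weights[i] = {j: share for j in nbrs}
    return weights

def spatial_lag(weights, values):
    """Weighted average of neighboring values for every area."""
    return {i: sum(wij * values[j] for j, wij in row.items())
            for i, row in weights.items()}

w_std = row_standardize(neighbors)
values = {"a": 2.0, "b": 4.0, "c": 6.0, "d": 8.0}
lag = spatial_lag(w_std, values)
print(lag)
```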

##GM##
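For reference, the spatial error specification estimated throughout this notebook can be written as follows. Here $\lambda$ is the spatial autoregressive coefficient reported as the last element of `betas`, $W$ is the row-standardized weights matrix, and $\varepsilon$ is an innovation term. The notation is a common convention and is assumed here, not taken from the notebook.

```latex
y = X\beta + u, \qquad u = \lambda W u + \varepsilon
```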

###Exogenous Variables Only###

In [6]:
gm1 = pysal.spreg.GM_Error(y,x,w,name_y=y_name,name_x=x_names,
                  name_w="south_q",name_ds="south.dbf") 

Attributes of the regression object

In [7]:
dir(gm1)

['__doc__',
 '__init__',
 '__module__',
 '__summary',
 '_cache',
 'betas',
 'e_filtered',
 'k',
 'mean_y',
 'n',
 'name_ds',
 'name_w',
 'name_x',
 'name_y',
 'pr2',
 'predy',
 'sig2',
 'std_err',
 'std_y',
 'summary',
 'title',
 'u',
 'vm',
 'x',
 'y',
 'z_stat']
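Among these attributes, `pr2` holds the pseudo R-squared. Assuming it is computed as the squared correlation between observed and predicted values (as the spreg documentation describes it), the calculation can be sketched on hypothetical toy data as follows.

```python
def pseudo_r2(y, predy):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(y)
    mean_y = sum(y) / n
    mean_p = sum(predy) / n
    cov = sum((a - mean_y) * (b - mean_p) for a, b in zip(y, predy))
    var_y = sum((a - mean_y) ** 2 for a in y)
    var_p = sum((b - mean_p) ** 2 for b in predy)
    return cov * cov / (var_y * var_p)

# Hypothetical observed and predicted values, not taken from the notebook.
y_obs = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.5, 1.5, 3.5, 3.5]
r2 = pseudo_r2(y_obs, y_hat)
print(r2)
```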

The estimated coefficients, including lambda as the last element

In [8]:
gm1.betas

array([[ 6.33865368],
       [ 4.43265183],
       [ 1.81335314],
       [-0.3985616 ],
       [ 0.47772164],
       [ 0.26040896]])

The spatial autoregressive coefficient

In [9]:
gm1.betas[-1][0]

0.2604089565665465
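Because `betas` is an unlabeled column vector, it can help to pair each value with its name. The sketch below copies the values printed above into a plain nested list and assumes the ordering is constant first, then the variables in `x_names`, then lambda. The `labels` list is a helper introduced here for illustration.

```python
# Values copied from the gm1.betas output above (nested list mimics the n x 1 array).
betas = [[6.33865368], [4.43265183], [1.81335314],
         [-0.3985616], [0.47772164], [0.26040896]]
x_names = ["RD90", "PS90", "UE90", "DV90"]

# Assumed ordering: constant, regressors in x_names order, then lambda.
labels = ["CONSTANT"] + x_names + ["lambda"]
coefs = {name: row[0] for name, row in zip(labels, betas)}

lam = betas[-1][0]       # same indexing as gm1.betas[-1][0]
slopes = betas[:-1]      # everything except lambda

for name in labels:
    print("%-10s %12.7f" % (name, coefs[name]))
```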

The full listing

In [10]:
print(gm1.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: SPATIALLY WEIGHTED LEAST SQUARES
---------------------------------------------------
Data set            :   south.dbf
Weights matrix      :     south_q
Dependent Variable  :        HR90                Number of Observations:        1412
Mean dependent var  :      9.5493                Number of Variables   :           5
S.D. dependent var  :      7.0389                Degrees of Freedom    :        1407
Pseudo R-squared    :      0.3066

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT       6.3386537       1.0155422       6.2416445       0.0000000
                DV90       0.4777216       0.1203677       3.9688512       0.0000722
                PS90       1.8133531       0.2105237       8.6135328       0.0000000
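Each row of the listing reports a z-statistic and a probability. Assuming the usual construction (coefficient divided by standard error, with a two-sided p-value from the standard normal distribution), the DV90 row can be checked by hand. The helper name `z_and_p` is hypothetical.

```python
import math

def z_and_p(coef, std_err):
    """Return the z-statistic and its two-sided standard normal p-value."""
    z = coef / std_err
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Coefficient and standard error copied from the DV90 row above.
z, p = z_and_p(0.4777216, 0.1203677)
print(round(z, 4), round(p, 7))
```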
      

###Exogenous and Endogenous Variables###

In [11]:
gm2 = pysal.spreg.GM_Endog_Error(y,xe,yend,q,w,name_y=y_name,
                  name_x=xe_names,name_yend=yend_names,name_q=q_names,
                  name_w="south_q",name_ds="south.dbf")

Attributes of the regression object

In [12]:
dir(gm2)

['__doc__',
 '__init__',
 '__module__',
 '__summary',
 '_cache',
 'betas',
 'e_filtered',
 'k',
 'mean_y',
 'n',
 'name_ds',
 'name_h',
 'name_q',
 'name_w',
 'name_x',
 'name_y',
 'name_yend',
 'name_z',
 'pr2',
 'predy',
 'sig2',
 'std_err',
 'std_y',
 'summary',
 'title',
 'u',
 'vm',
 'x',
 'y',
 'yend',
 'z',
 'z_stat']
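The `e_filtered` attribute holds spatially filtered residuals. Assuming the filter takes the usual form `u - lambda * W u`, a small sketch with a hypothetical three-observation weights matrix looks like this. The function name and the data are illustrative only.

```python
def filter_residuals(u, w_rows, lam):
    """Return u - lam * W u, with W stored as one {column: weight} dict per row."""
    return [u_i - lam * sum(wij * u[j] for j, wij in row.items())
            for u_i, row in zip(u, w_rows)]

# Hypothetical row-standardized weights for three observations on a line.
w_rows = [{1: 1.0}, {0: 0.5, 2: 0.5}, {1: 1.0}]
u = [1.0, -2.0, 3.0]     # made-up residuals
lam = 0.25               # made-up spatial coefficient

e_filtered = filter_residuals(u, w_rows, lam)
print(e_filtered)
```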

The estimated coefficients, including lambda as the last element

In [13]:
gm2.betas

array([[ 10.7717841 ],
       [  5.90371303],
       [  2.04553883],
       [  0.49190638],
       [ -1.14071221],
       [  0.23609742]])

The spatial autoregressive coefficient

In [14]:
gm2.betas[-1][0]

0.23609741823856531

The full listing

In [15]:
print(gm2.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: SPATIALLY WEIGHTED TWO STAGE LEAST SQUARES
-------------------------------------------------------------
Data set            :   south.dbf
Weights matrix      :     south_q
Dependent Variable  :        HR90                Number of Observations:        1412
Mean dependent var  :      9.5493                Number of Variables   :           5
S.D. dependent var  :      7.0389                Degrees of Freedom    :        1407
Pseudo R-squared    :      0.2818

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT      10.7717841       1.2771988       8.4339137       0.0000000
                DV90       0.4919064       0.1246483       3.9463541       0.0000794
                PS90       2.0455388       0.2190619       9.3377222   

##GMM Heteroskedastic Case##

###Exogenous Variables Only###

In [16]:
gm3 = pysal.spreg.GM_Error_Het(y,x,w,name_y=y_name,name_x=x_names,
                  name_w="south_q",name_ds="south.dbf")

Attributes of the regression object

In [17]:
dir(gm3)

['__doc__',
 '__init__',
 '__module__',
 '__summary',
 '_cache',
 'betas',
 'e_filtered',
 'iter_stop',
 'iteration',
 'k',
 'mean_y',
 'n',
 'name_ds',
 'name_w',
 'name_x',
 'name_y',
 'pr2',
 'predy',
 'std_err',
 'std_y',
 'step1c',
 'summary',
 'title',
 'u',
 'vm',
 'x',
 'xtx',
 'y',
 'z_stat']

The estimated coefficients, including lambda as the last element

In [18]:
gm3.betas

array([[ 6.25760366],
       [ 4.41953589],
       [ 1.79832764],
       [-0.38976971],
       [ 0.48116579],
       [ 0.31474155]])

The spatial autoregressive coefficient

In [19]:
gm3.betas[-1][0]

0.31474155432811185

The full listing

In [20]:
print(gm3.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: SPATIALLY WEIGHTED LEAST SQUARES (HET)
---------------------------------------------------------
Data set            :   south.dbf
Weights matrix      :     south_q
Dependent Variable  :        HR90                Number of Observations:        1412
Mean dependent var  :      9.5493                Number of Variables   :           5
S.D. dependent var  :      7.0389                Degrees of Freedom    :        1407
Pseudo R-squared    :      0.3062
N. of iterations    :           1                Step1c computed       :          No

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT       6.2576037       1.0821873       5.7823668       0.0000000
                DV90       0.4811658       0.1198516       4.0146802       0.00

**Setting the step1c option**

In [21]:
gm4 = pysal.spreg.GM_Error_Het(y,x,w,step1c=True,name_y=y_name,
                name_x=x_names,name_w="south_q",name_ds="south.dbf")

The full listing

In [22]:
print(gm4.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: SPATIALLY WEIGHTED LEAST SQUARES (HET)
---------------------------------------------------------
Data set            :   south.dbf
Weights matrix      :     south_q
Dependent Variable  :        HR90                Number of Observations:        1412
Mean dependent var  :      9.5493                Number of Variables   :           5
S.D. dependent var  :      7.0389                Degrees of Freedom    :        1407
Pseudo R-squared    :      0.3059
N. of iterations    :           1                Step1c computed       :         Yes

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT       6.1903085       1.0826509       5.7177328       0.0000000
                DV90       0.4840327       0.1199034       4.0368561       0.00

**Setting the maximum number of iterations**

In [23]:
gm5 = pysal.spreg.GM_Error_Het(y,x,w,max_iter=10,name_y=y_name,
       name_x=x_names,name_w="south_q",name_ds="south.dbf")

The full listing

In [24]:
print(gm5.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: SPATIALLY WEIGHTED LEAST SQUARES (HET)
---------------------------------------------------------
Data set            :   south.dbf
Weights matrix      :     south_q
Dependent Variable  :        HR90                Number of Observations:        1412
Mean dependent var  :      9.5493                Number of Variables   :           5
S.D. dependent var  :      7.0389                Degrees of Freedom    :        1407
Pseudo R-squared    :      0.3053
N. of iterations    :           5                Step1c computed       :          No

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT       6.0484442       1.0837030       5.5812749       0.0000000
                DV90       0.4900931       0.1200190       4.0834631       0.00
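With `max_iter=10` the listing reports 5 iterations, which is consistent with the procedure stopping on a convergence criterion before reaching the cap. The sketch below shows a generic loop of that kind. The update rule `toy_update`, the tolerance name `epsilon` and the exact stopping test are assumptions for illustration and may differ from what spreg does internally.

```python
def iterate(update, start, max_iter=10, epsilon=1e-5):
    """Repeat update() until the change is below epsilon or max_iter is hit."""
    value = start
    for iteration in range(1, max_iter + 1):
        new_value = update(value)
        if abs(new_value - value) < epsilon:
            return new_value, iteration, "epsilon"
        value = new_value
    return value, max_iter, "max_iter"

def toy_update(lam):
    """Hypothetical stand-in for one re-estimation round; converges to 0.3."""
    return 0.3 + 0.1 * (lam - 0.3)

lam, n_iter, stop = iterate(toy_update, start=0.0)
print(lam, n_iter, stop)

# With a cap of one round the loop ends on the iteration limit instead.
lam_1, n_iter_1, stop_1 = iterate(toy_update, start=0.0, max_iter=1)
print(lam_1, n_iter_1, stop_1)
```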

###Exogenous and Endogenous Variables###

In [25]:
gm6 = pysal.spreg.GM_Endog_Error_Het(y,xe,yend,q,w,name_y=y_name,
                  name_x=xe_names,name_yend=yend_names,name_q=q_names,
                  name_w="south_q",name_ds="south.dbf")

Attributes of the regression object

In [26]:
dir(gm6)

['__doc__',
 '__init__',
 '__module__',
 '__summary',
 '_cache',
 'betas',
 'e_filtered',
 'h',
 'hth',
 'iter_stop',
 'iteration',
 'k',
 'mean_y',
 'n',
 'name_ds',
 'name_h',
 'name_q',
 'name_w',
 'name_x',
 'name_y',
 'name_yend',
 'name_z',
 'pr2',
 'predy',
 'q',
 'std_err',
 'std_y',
 'step1c',
 'summary',
 'title',
 'u',
 'vm',
 'x',
 'y',
 'yend',
 'z',
 'z_stat']

The estimated coefficients, including lambda as the last element

In [27]:
gm6.betas

array([[ 10.74563401],
       [  5.89766591],
       [  2.03579202],
       [  0.49278877],
       [ -1.13750113],
       [  0.26162482]])

The spatial autoregressive coefficient

In [28]:
gm6.betas[-1][0]

0.26162482091893224

The full listing

In [29]:
print(gm6.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: SPATIALLY WEIGHTED TWO STAGE LEAST SQUARES (HET)
-------------------------------------------------------------------
Data set            :   south.dbf
Weights matrix      :     south_q
Dependent Variable  :        HR90                Number of Observations:        1412
Mean dependent var  :      9.5493                Number of Variables   :           5
S.D. dependent var  :      7.0389                Degrees of Freedom    :        1407
Pseudo R-squared    :      0.2820
N. of iterations    :           1                Step1c computed       :          No

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT      10.7456340       1.5222725       7.0589425       0.0000000
                DV90       0.4927888       0.1266845       

##GMM Homoskedastic Case##

###Exogenous Variables Only###

In [30]:
gm7 = pysal.spreg.GM_Error_Hom(y,x,w,name_y=y_name,name_x=x_names,
           name_w="south_q",name_ds="south.dbf")

Attributes of the regression object

In [31]:
dir(gm7)

['__doc__',
 '__init__',
 '__module__',
 '__summary',
 '_cache',
 'betas',
 'e_filtered',
 'iter_stop',
 'iteration',
 'k',
 'mean_y',
 'n',
 'name_ds',
 'name_w',
 'name_x',
 'name_y',
 'pr2',
 'predy',
 'sig2',
 'std_err',
 'std_y',
 'summary',
 'title',
 'u',
 'vm',
 'x',
 'xtx',
 'y',
 'z_stat']

The estimated coefficients, including lambda as the last element

In [32]:
gm7.betas

array([[ 6.33803479],
       [ 4.43255065],
       [ 1.81323806],
       [-0.39849432],
       [ 0.4777479 ],
       [ 0.27985722]])

The spatial autoregressive coefficient

In [33]:
gm7.betas[-1][0]

0.2798572154943586
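To compare the spatial coefficient across the estimators run so far, the printed values can be collected in one place. The numbers below are copied from the outputs above. The dictionary layout is just one convenient way to organize them.

```python
# Lambda estimates copied from the outputs of gm1, gm2, gm3, gm6 and gm7.
lambdas = {
    "GM_Error": 0.2604089565665465,
    "GM_Endog_Error": 0.23609741823856531,
    "GM_Error_Het": 0.31474155432811185,
    "GM_Endog_Error_Het": 0.26162482091893224,
    "GM_Error_Hom": 0.2798572154943586,
}

# Sort from largest to smallest estimate.
ordered = sorted(lambdas.items(), key=lambda item: item[1], reverse=True)
for name, value in ordered:
    print("%-20s %.4f" % (name, value))
```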

The full listing

In [34]:
print(gm7.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: SPATIALLY WEIGHTED LEAST SQUARES (HOM)
---------------------------------------------------------
Data set            :   south.dbf
Weights matrix      :     south_q
Dependent Variable  :        HR90                Number of Observations:        1412
Mean dependent var  :      9.5493                Number of Variables   :           5
S.D. dependent var  :      7.0389                Degrees of Freedom    :        1407
Pseudo R-squared    :      0.3066
N. of iterations    :           1

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT       6.3380348       1.0237066       6.1912612       0.0000000
                DV90       0.4777479       0.1210440       3.9468939       0.0000792
                PS90       1.8132381       0.

**The A1 option**

**A1 = 'hom'**

In [35]:
gm8a = pysal.spreg.GM_Error_Hom(y,x,w,A1='hom',name_y=y_name,
                    name_x=x_names,name_w="south_q",
                    name_ds="south.dbf")

Full listing

In [36]:
print(gm8a.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: SPATIALLY WEIGHTED LEAST SQUARES (HOM)
---------------------------------------------------------
Data set            :   south.dbf
Weights matrix      :     south_q
Dependent Variable  :        HR90                Number of Observations:        1412
Mean dependent var  :      9.5493                Number of Variables   :           5
S.D. dependent var  :      7.0389                Degrees of Freedom    :        1407
Pseudo R-squared    :      0.3066
N. of iterations    :           1

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT       6.3392818       1.0236978       6.1925324       0.0000000
                DV90       0.4776950       0.1210433       3.9464801       0.0000793
                PS90       1.8134699       0.

**A1 = 'het'**

In [37]:
gm8b = pysal.spreg.GM_Error_Hom(y,x,w,A1='het',name_y=y_name,
                    name_x=x_names,name_w="south_q",
                    name_ds="south.dbf")

Full listing

In [38]:
print(gm8b.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: SPATIALLY WEIGHTED LEAST SQUARES (HOM)
---------------------------------------------------------
Data set            :   south.dbf
Weights matrix      :     south_q
Dependent Variable  :        HR90                Number of Observations:        1412
Mean dependent var  :      9.5493                Number of Variables   :           5
S.D. dependent var  :      7.0389                Degrees of Freedom    :        1407
Pseudo R-squared    :      0.3062
N. of iterations    :           1

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT       6.2576037       1.0361188       6.0394656       0.0000000
                DV90       0.4811658       0.1220549       3.9422070       0.0000807
                PS90       1.7983276       0.

###Exogenous and Endogenous Variables###

In [39]:
gm9 = pysal.spreg.GM_Endog_Error_Hom(y,xe,yend,q,w,name_y=y_name,
                  name_x=xe_names,name_yend=yend_names,name_q=q_names,
                  name_w="south_q",name_ds="south.dbf")

Full listing

In [40]:
print(gm9.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: SPATIALLY WEIGHTED TWO STAGE LEAST SQUARES (HOM)
-------------------------------------------------------------------
Data set            :   south.dbf
Weights matrix      :     south_q
Dependent Variable  :        HR90                Number of Observations:        1412
Mean dependent var  :      9.5493                Number of Variables   :           5
S.D. dependent var  :      7.0389                Degrees of Freedom    :        1407
Pseudo R-squared    :      0.2818
N. of iterations    :           1

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT      10.7713463       1.2834619       8.3924158       0.0000000
                DV90       0.4919212       0.1249985       3.9354165       0.0000831
                PS90     

**The A1 option**

**A1 = 'hom'**

In [41]:
gm10a = pysal.spreg.GM_Endog_Error_Hom(y,xe,yend,q,w,A1='hom',
                      name_y=y_name,name_x=xe_names,name_yend=yend_names,
                      name_q=q_names,name_w="south_q",
                      name_ds="south.dbf")

Full listing

In [42]:
print(gm10a.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: SPATIALLY WEIGHTED TWO STAGE LEAST SQUARES (HOM)
-------------------------------------------------------------------
Data set            :   south.dbf
Weights matrix      :     south_q
Dependent Variable  :        HR90                Number of Observations:        1412
Mean dependent var  :      9.5493                Number of Variables   :           5
S.D. dependent var  :      7.0389                Degrees of Freedom    :        1407
Pseudo R-squared    :      0.2818
N. of iterations    :           1

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT      10.7722363       1.2834693       8.3930611       0.0000000
                DV90       0.4918910       0.1249994       3.9351462       0.0000831
                PS90     

**A1 = 'het'**

In [43]:
gm10b = pysal.spreg.GM_Endog_Error_Hom(y,xe,yend,q,w,A1='het',
                      name_y=y_name,name_x=xe_names,name_yend=yend_names,
                      name_q=q_names,name_w="south_q",
                      name_ds="south.dbf")

Full listing

In [44]:
print(gm10b.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: SPATIALLY WEIGHTED TWO STAGE LEAST SQUARES (HOM)
-------------------------------------------------------------------
Data set            :   south.dbf
Weights matrix      :     south_q
Dependent Variable  :        HR90                Number of Observations:        1412
Mean dependent var  :      9.5493                Number of Variables   :           5
S.D. dependent var  :      7.0389                Degrees of Freedom    :        1407
Pseudo R-squared    :      0.2820
N. of iterations    :           1

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT      10.7456340       1.2965714       8.2877305       0.0000000
                DV90       0.4927888       0.1257156       3.9198710       0.0000886
                PS90     

##Practice##

Since the spatial diagnostics for the Boston house price example (Chapter 5 practice) pointed to a spatial error alternative,
estimate this specification by means of GM, GMM-het and GMM-hom. Compare the results and the inference. Feel free to experiment
with the various options (number of iterations, etc.). To assess the effect of endogenous variables, use the south or natregimes
data sets for one of the HR specifications (see Chapter 7 practice).