This notebook contains the PySAL/spreg code for Chapter 12 - Regimes, Non-Spatial (OLS only), in Modern Spatial Econometrics in Practice: A Guide to GeoDa, GeoDaSpace and PySAL.

by Luc Anselin and Sergio J. Rey

(c) 2014 Luc Anselin and Sergio J. Rey, All Rights Reserved

In [1]:
__author__ = "Luc Anselin luc.anselin@asu.edu"

##Regimes - Non-spatial - OLS##

###Baltimore Example###

Basic Setup: 

- import necessary modules (numpy and pysal)

- create a data object

- create variables as numpy arrays

- create regime variable (as list)

- create weights object(s) for diagnostics

In [1]:
%pylab inline

Populating the interactive namespace from numpy and matplotlib


In [2]:
import numpy as np
import pysal

create data object

In [3]:
db = pysal.open('data/baltim.dbf','r')

read in dependent variable and turn into numpy array y

In [4]:
y_name = "PRICE"
y = np.array([db.by_col(y_name)]).T

read in explanatory variables and turn into numpy array x

In [5]:
x_names = ['NROOM','NBATH','PATIO','FIREPL','AC','GAR','AGE','LOTSZ','SQFT']
x = np.array([db.by_col(var) for var in x_names]).T
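The `np.array([...]).T` idiom turns the list returned by `by_col` into an n x 1 column vector, and the list of k column lists into an n x k matrix, which is the layout spreg expects. A minimal sketch with made-up toy lists standing in for the `by_col` results (not the Baltimore data):

```python
import numpy as np

# Toy stand-ins for db.by_col(...) results: plain Python lists (made-up values)
price = [10.0, 20.0, 30.0, 40.0]
columns = {"NROOM": [4, 5, 6, 7], "NBATH": [1.0, 1.5, 2.0, 2.5]}
names = ["NROOM", "NBATH"]

# One list wrapped in [] gives a 1 x n array; .T turns it into an n x 1 column
y_toy = np.array([price]).T

# A list of k column lists gives a k x n array; .T turns it into n x k
x_toy = np.array([columns[name] for name in names]).T

print(y_toy.shape)  # (4, 1)
print(x_toy.shape)  # (4, 2)
```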

create k = 4 nearest neighbor weights and row-standardize

In [6]:
w = pysal.knnW_from_shapefile("data/baltim.shp",k=4,idVariable='STATION')
w.transform = 'r'
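`knnW_from_shapefile` builds the neighbor structure from the points in `baltim.shp`. The sketch below only illustrates the idea on toy coordinates: each observation gets its k nearest other points as neighbors, and row standardization divides each weight by its row sum, so with k neighbors every nonzero weight is 1/k. The helper `knn_row_standardized` is hypothetical, and its tie-breaking (by `argsort` order) may differ from PySAL's.

```python
import numpy as np

def knn_row_standardized(coords, k):
    """Dense n x n k-nearest-neighbor weights, row-standardized (illustrative only)."""
    coords = np.asarray(coords, dtype=float)
    n = coords.shape[0]
    # Pairwise Euclidean distances between all points
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # an observation is never its own neighbor
    w = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(dist[i])[:k]  # indices of the k closest points
        w[i, nbrs] = 1.0
    # Row-standardize: divide each row by its sum so each row sums to 1
    return w / w.sum(axis=1, keepdims=True)

coords = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5)]
W = knn_row_standardized(coords, k=2)
print(W.sum(axis=1))  # every row sums to 1
```

Note that k-nearest-neighbor weights are generally not symmetric: here point 1 is among the two nearest neighbors of point 4, but not the other way around.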

use CITCOU as the regimes variable

In [7]:
rvar = "CITCOU"

In [8]:
regimes = db.by_col(rvar)    # note: regimes is a list

In [9]:
regimes[:4]

[0.0, 1.0, 1.0, 1.0]

##Regimes - Default Setting##

**With spatial diagnostics**

In [10]:
reg1 = pysal.spreg.OLS_Regimes(y,x,regimes,w=w,spat_diag=True,moran=True,
name_y=y_name,name_x=x_names,name_regimes=rvar,name_w="baltim_k4",name_ds="baltim.dbf")

Default values of the regime settings

Separate regressions by regime

In [11]:
reg1.regime_err_sep

True
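With `regime_err_sep=True`, each regime is estimated as its own regression, so each regime also gets its own error variance. A minimal numpy sketch of that idea on simulated data; the data-generating process and the helper `ols` are hypothetical, and this is not spreg's internal code.

```python
import numpy as np

def ols(y, X):
    """OLS coefficients and unbiased error variance e'e / (n - k)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    n, k = X.shape
    return beta, float(resid @ resid) / (n - k)

rng = np.random.default_rng(0)
n = 200
regimes = rng.integers(0, 2, size=n).astype(float)  # 0/1 labels, like CITCOU
x1 = rng.normal(size=n)
# Simulated data: intercept, slope and error spread all differ by regime
y = (np.where(regimes == 0, 1.0 + 2.0 * x1, 3.0 - 1.0 * x1)
     + rng.normal(scale=np.where(regimes == 0, 1.0, 3.0)))

X = np.column_stack([np.ones(n), x1])
results = {}
for r in np.unique(regimes):
    sel = regimes == r  # observations that belong to regime r
    results[r] = ols(y[sel], X[sel])
    print(r, results[r][0].round(2), round(results[r][1], 2))
```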

Different constant term in each regime

In [12]:
reg1.constant_regi

'many'

All coefficients are varying

In [13]:
reg1.cols2regi

'all'
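With a separate constant in each regime and all coefficients varying, the regimes model amounts to one regression on a design matrix in which every column (including the constant) is interacted with each regime dummy. A sketch on simulated data showing that the pooled coefficients from this block design coincide with separate per-regime OLS; the helper `regime_design` is hypothetical, and spreg's internal column ordering may differ.

```python
import numpy as np

def regime_design(X, regimes):
    """Interact every column of X with each regime dummy: n x (k * n_regimes)."""
    labels = np.unique(regimes)
    blocks = [X * (regimes == r)[:, None] for r in labels]
    return np.hstack(blocks), labels

rng = np.random.default_rng(1)
n = 120
regimes = rng.integers(0, 2, size=n)
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # constant + 2 regressors
y = X @ np.array([1.0, 0.5, -0.5]) + regimes * 2.0 + rng.normal(size=n)

Xr, labels = regime_design(X, regimes)
beta_pooled = np.linalg.lstsq(Xr, y, rcond=None)[0]

# Compare with separate regressions, regime by regime
k = X.shape[1]
matches = []
for j, r in enumerate(labels):
    sel = regimes == r
    beta_r = np.linalg.lstsq(X[sel], y[sel], rcond=None)[0]
    matches.append(np.allclose(beta_pooled[j * k:(j + 1) * k], beta_r))
print(matches)  # [True, True]
```

The coefficients match because the dummies are mutually exclusive, so the normal equations split by regime. The standard errors need not match: pooling with a single error variance (as with `regime_err_sep=False`) imposes the same variance on both regimes.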

Full output

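Note the warning about islands in each of the regimes: when the k-nearest-neighbor weights are split by regime, some observations are left without any neighbors in their own regime.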

In [14]:
print(reg1.summary)

REGRESSION
----------

SUMMARY OF OUTPUT: ORDINARY LEAST SQUARES ESTIMATION - REGIME 0
---------------------------------------------------------------
Data set            :  baltim.dbf
Weights matrix      :   baltim_k4
Dependent Variable  :     0_PRICE                Number of Observations:          83
Mean dependent var  :     31.5127                Number of Variables   :          10
S.D. dependent var  :     17.1598                Degrees of Freedom    :          73
R-squared           :      0.6129
Adjusted R-squared  :      0.5652
Sum squared residual:    9347.239                F-statistic           :     12.8414
Sigma-square        :     128.044                Prob(F-statistic)     :   5.381e-12
S.E. of regression  :      11.316                Log likelihood        :    -313.818
Sigma-square ML     :     112.617                Akaike info criterion :     647.635
S.E of regression ML:     10.6121                Schwarz criterion     :     671.824

--------------------------------
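Subsetting weights by regime keeps only the links between observations in the same regime, so an observation whose nearest neighbors all lie in the other regime ends up with no neighbors at all, i.e., an island. A sketch with a plain neighbor dictionary (PySAL `W` objects store neighbors in a similar dictionary, `w.neighbors`); the neighbor lists and the helper `subset_by_regime` are hypothetical.

```python
def subset_by_regime(neighbors, regimes):
    """Keep only neighbor links within the same regime; return new dict and islands."""
    sub = {
        i: [j for j in nbrs if regimes[j] == regimes[i]]
        for i, nbrs in neighbors.items()
    }
    islands = [i for i, nbrs in sub.items() if not nbrs]
    return sub, islands

# Hypothetical k = 2 neighbor lists and 0/1 regime labels
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [2, 4], 4: [3, 2]}
regimes = [0, 0, 1, 1, 1]
sub, islands = subset_by_regime(neighbors, regimes)
print(sub)      # {0: [1], 1: [0], 2: [], 3: [2, 4], 4: [3, 2]}
print(islands)  # [2]
```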

###Regime Options###

**regime_err_sep = False - forced homoskedasticity (a single error variance for all regimes)**

using k nearest neighbor weights

In [15]:
reg2 = pysal.spreg.OLS_Regimes(y,x,regimes,w=w,spat_diag=True,moran=True,
regime_err_sep=False,
name_y=y_name,name_x=x_names,name_regimes=rvar,name_w="baltim_k4",name_ds="baltim.dbf")

In [16]:
print(reg2.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: ORDINARY LEAST SQUARES - REGIMES
---------------------------------------------------
Data set            :  baltim.dbf
Weights matrix      :   baltim_k4
Dependent Variable  :       PRICE                Number of Observations:         211
Mean dependent var  :     44.3072                Number of Variables   :          20
S.D. dependent var  :     23.6061                Degrees of Freedom    :         191
R-squared           :      0.7391
Adjusted R-squared  :      0.7132
Sum squared residual:   30529.788                F-statistic           :     28.4795
Sigma-square        :     159.842                Prob(F-statistic)     :   1.261e-45
S.E. of regression  :      12.643                Log likelihood        :    -824.216
Sigma-square ML     :     144.691                Akaike info criterion :    1688.433
S.E of regression ML:     12.0288                Schwarz criterion     :    1755.470

---------------------------------------------------------

**constant_regi='one' -- one global constant**

with regime_err_sep=True (default), i.e. groupwise heteroskedasticity

In [17]:
reg3 = pysal.spreg.OLS_Regimes(y,x,regimes,w=w,spat_diag=True,moran=True,
constant_regi='one',name_y=y_name,name_x=x_names,
name_regimes=rvar,name_w="baltim_k4",name_ds="baltim.dbf")

In [18]:
print(reg3.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: ORDINARY LEAST SQUARES - REGIMES (Group-wise heteroskedasticity)
-----------------------------------------------------------------------------------
Data set            :  baltim.dbf
Weights matrix      :   baltim_k4
Dependent Variable  :       PRICE                Number of Observations:         211
Mean dependent var  :     44.3072                Number of Variables   :          19
S.D. dependent var  :     23.6061                Degrees of Freedom    :         192
R-squared           :      0.7389
Adjusted R-squared  :      0.7144

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     t-Statistic     Probability
------------------------------------------------------------------------------------
                0_AC      12.3629617       4.4713687       2.7649167       0.0062490
               0_AGE       0.0415673       0.0630325       0.6594583       0.5

with regime_err_sep=False, i.e. homoskedasticity

In [19]:
reg4 = pysal.spreg.OLS_Regimes(y,x,regimes,w=w,spat_diag=True,moran=True,
constant_regi='one',regime_err_sep=False,name_y=y_name,name_x=x_names,
name_regimes=rvar,name_w="baltim_k4",name_ds="baltim.dbf")

In [20]:
print(reg4.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: ORDINARY LEAST SQUARES - REGIMES
---------------------------------------------------
Data set            :  baltim.dbf
Weights matrix      :   baltim_k4
Dependent Variable  :       PRICE                Number of Observations:         211
Mean dependent var  :     44.3072                Number of Variables   :          19
S.D. dependent var  :     23.6061                Degrees of Freedom    :         192
R-squared           :      0.7389
Adjusted R-squared  :      0.7144
Sum squared residual:   30559.735                F-statistic           :     30.1790
Sigma-square        :     159.165                Prob(F-statistic)     :   2.471e-46
S.E. of regression  :      12.616                Log likelihood        :    -824.320
Sigma-square ML     :     144.833                Akaike info criterion :    1686.640
S.E of regression ML:     12.0347                Schwarz criterion     :    1750.325

---------------------------------------------------------

**cols2regi -- specifying variable specific regimes**

set up the cols2regi list with True for coefficients that vary across regimes and False for coefficients that are constant across regimes

follow the order of the columns in the x array

NROOM, NBATH, PATIO, FIREPL, AC, GAR, AGE, LOTSZ, SQFT

only NBATH, GAR and LOTSZ vary

In [21]:
colsvari = [False,True,False,False,False,True,False,True,False]

**the constant is not part of cols2regi, so constant_regi='one' must be set to keep it from varying across regimes**
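The Number of Variables entries in the summaries follow from how constant_regi and cols2regi combine: a coefficient that varies contributes one column per regime, a fixed coefficient contributes a single column, and the constant follows constant_regi. A hypothetical helper (not part of spreg) that counts coefficients for the two CITCOU regimes:

```python
def n_regime_coefficients(cols2regi, constant_regi, n_regimes):
    """Count coefficients in a regimes model (hypothetical helper, not part of spreg)."""
    varying = sum(1 for c in cols2regi if c)
    fixed = len(cols2regi) - varying
    const = n_regimes if constant_regi == "many" else 1
    return const + varying * n_regimes + fixed

all_vary = [True] * 9  # cols2regi='all' for the nine x variables
colsvari = [False, True, False, False, False, True, False, True, False]
print(n_regime_coefficients(all_vary, "many", 2))  # 20, as in reg2
print(n_regime_coefficients(all_vary, "one", 2))   # 19, as in reg3 and reg4
print(n_regime_coefficients(colsvari, "one", 2))   # 13, as in reg5 and reg6
print(n_regime_coefficients(colsvari, "many", 2))  # 14, as in reg7
```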

with default regime_err_sep = True, k nearest neighbor weights

In [22]:
reg5 = pysal.spreg.OLS_Regimes(y,x,regimes,w=w,spat_diag=True,moran=True,
constant_regi='one',cols2regi=colsvari,
name_y=y_name,name_x=x_names,
name_regimes=rvar,name_w="baltim_k4",name_ds="baltim.dbf")

In [23]:
print(reg5.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: ORDINARY LEAST SQUARES - REGIMES (Group-wise heteroskedasticity)
-----------------------------------------------------------------------------------
Data set            :  baltim.dbf
Weights matrix      :   baltim_k4
Dependent Variable  :       PRICE                Number of Observations:         211
Mean dependent var  :     44.3072                Number of Variables   :          13
S.D. dependent var  :     23.6061                Degrees of Freedom    :         198
R-squared           :      0.7288
Adjusted R-squared  :      0.7123

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     t-Statistic     Probability
------------------------------------------------------------------------------------
               0_GAR      -0.5338910       2.7346384      -0.1952328       0.8454109
             0_LOTSZ       0.1442208       0.0388247       3.7146643       0.0

with regime_err_sep = False (homoskedasticity), k nearest neighbors

In [24]:
reg6 = pysal.spreg.OLS_Regimes(y,x,regimes,w=w,spat_diag=True,moran=True,
constant_regi='one',cols2regi=colsvari,regime_err_sep=False,
name_y=y_name,name_x=x_names,
name_regimes=rvar,name_w="baltim_k4",name_ds="baltim.dbf")

In [25]:
print(reg6.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: ORDINARY LEAST SQUARES - REGIMES
---------------------------------------------------
Data set            :  baltim.dbf
Weights matrix      :   baltim_k4
Dependent Variable  :       PRICE                Number of Observations:         211
Mean dependent var  :     44.3072                Number of Variables   :          13
S.D. dependent var  :     23.6061                Degrees of Freedom    :         198
R-squared           :      0.7288
Adjusted R-squared  :      0.7123
Sum squared residual:   31739.937                F-statistic           :     44.3338
Sigma-square        :     160.303                Prob(F-statistic)     :   1.532e-49
S.E. of regression  :      12.661                Log likelihood        :    -828.317
Sigma-square ML     :     150.426                Akaike info criterion :    1682.635
S.E of regression ML:     12.2648                Schwarz criterion     :    1726.209

---------------------------------------------------------
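reg2, reg4 and reg6 are all estimated with a single error variance on the same 211 observations, and reg4 and reg6 are restricted versions of reg2 (fewer coefficients allowed to vary). Under those conditions, the printed sums of squared residuals can be plugged into the standard F test for linear restrictions, which is a Chow-type test of coefficient stability. This is a hand calculation from the summaries above, not a spreg output; the helper `restriction_f_test` is hypothetical.

```python
def restriction_f_test(ssr_restricted, ssr_unrestricted, n, k_unrestricted, q):
    """F statistic for q linear restrictions: ((SSR_r - SSR_u) / q) / (SSR_u / (n - k_u))."""
    num = (ssr_restricted - ssr_unrestricted) / q
    den = ssr_unrestricted / (n - k_unrestricted)
    return num / den

# reg2: all coefficients vary (k = 20), SSR = 30529.788
# reg4: one common constant (k = 19, so q = 1), SSR = 30559.735
# reg6: one constant, only NBATH, GAR, LOTSZ vary (k = 13, so q = 7), SSR = 31739.937
f_const = restriction_f_test(30559.735, 30529.788, n=211, k_unrestricted=20, q=1)
f_subset = restriction_f_test(31739.937, 30529.788, n=211, k_unrestricted=20, q=7)
print(round(f_const, 3), round(f_subset, 3))  # 0.187 1.082
```

Both statistics are well below the usual 5% critical values (about 3.9 for F(1, 191) and about 2.1 for F(7, 191)), which is in line with reg6 having the lowest Akaike and Schwarz criteria of the three homoskedastic models.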

**by default, the constant varies across regimes (constant_regi='many')**

In [26]:
reg7 = pysal.spreg.OLS_Regimes(y,x,regimes,w=w,spat_diag=True,moran=True,
cols2regi=colsvari,
name_y=y_name,name_x=x_names,
name_regimes=rvar,name_w="baltim_k4",name_ds="baltim.dbf")

In [27]:
print(reg7.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: ORDINARY LEAST SQUARES - REGIMES (Group-wise heteroskedasticity)
-----------------------------------------------------------------------------------
Data set            :  baltim.dbf
Weights matrix      :   baltim_k4
Dependent Variable  :       PRICE                Number of Observations:         211
Mean dependent var  :     44.3072                Number of Variables   :          14
S.D. dependent var  :     23.6061                Degrees of Freedom    :         197
R-squared           :      0.7289
Adjusted R-squared  :      0.7110

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     t-Statistic     Probability
------------------------------------------------------------------------------------
          0_CONSTANT      11.4001349       5.3053644       2.1487939       0.0328709
               0_GAR      -0.5037883       2.7523302      -0.1830406       0.8

##Practice##

Use the Boston example (see the Chapter 5 notebook) with CHAS as the regime variable. Experiment with the different options.
For example, based on the individual Chow tests from the default OLS regimes setup, let only those coefficients vary
whose individual Chow tests are significant.
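One way to set up the suggested exercise is to turn the individual Chow test p-values from the default regimes run into the constant_regi and cols2regi arguments. The sketch assumes the p-values have already been collected in a list, constant first and then the columns of x in order, matching the Chow test listing in the summary (recent spreg versions also keep these results in a `chow` attribute, but treat that detail as an assumption to check against your version). The helper `choose_varying`, the names and the p-values below are all hypothetical.

```python
def choose_varying(names, pvalues, alpha=0.05):
    """Return (constant_regi, cols2regi, varying names), letting only significant terms vary.

    names and pvalues list the constant first, then the columns of x in order.
    """
    const_p, col_ps = pvalues[0], pvalues[1:]
    constant_regi = "many" if const_p < alpha else "one"
    cols2regi = [p < alpha for p in col_ps]
    varying = [name for name, p in zip(names[1:], col_ps) if p < alpha]
    return constant_regi, cols2regi, varying

# Hypothetical names and p-values, for illustration only
names = ["CONSTANT", "X1", "X2", "X3"]
pvalues = [0.30, 0.01, 0.45, 0.04]
print(choose_varying(names, pvalues))
# ('one', [True, False, True], ['X1', 'X3'])
```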