# Spatial cross-section models: Application to crime data

[![DOI](https://zenodo.org/badge/387109687.svg)](https://zenodo.org/badge/latestdoi/387109687)

> Mendez C. (2021). Spatial econometrics for cross-sectional data: Columbus crime example. DOI: [10.5281/zenodo.5151076](https://doi.org/10.5281/zenodo.5151076). Notebook available at https://deepnote.com/@carlos-mendez/STATA-Spatial-panel-data-NkfLwLfHR3SY15RKWXbeIQ.


## Roadmap

![](https://github.com/quarcs-lab/data-open/raw/master/Columbus/columbus/SpatialEcometricsRoad.jpg)
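
The models estimated below can be read as restrictions of one general specification. As a sketch in common textbook notation (the symbols $\rho$, $\theta$, $\lambda$ are assumptions of this note and are not taken from the roadmap figure), let $y$ be CRIME, $X$ contain INC and HOVAL, and $W$ be the row-normalized queen contiguity matrix:

```latex
\begin{aligned}
\text{GNS:}\quad  & y = \rho W y + X\beta + W X\theta + u, \qquad u = \lambda W u + \varepsilon \\
\text{SDM:}\quad  & \lambda = 0 \\
\text{SDEM:}\quad & \rho = 0 \\
\text{SAC:}\quad  & \theta = 0 \\
\text{SAR:}\quad  & \theta = 0,\ \lambda = 0 \\
\text{SEM:}\quad  & \rho = 0,\ \theta = 0 \\
\text{SLX:}\quad  & \rho = 0,\ \lambda = 0 \\
\text{OLS:}\quad  & \rho = 0,\ \theta = 0,\ \lambda = 0
\end{aligned}
```

In `spregress`, these three terms correspond to the options `dvarlag()` ($\rho W y$), `ivarlag()` ($W X\theta$), and `errorlag()` ($\lambda W u$).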

In [1]:
* Clean your environment
clear all
macro drop _all
set more off
*cls
*version 17

In [2]:
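* Install packages: estout suite (esttab, estadd, eststo, estout, estpost): http://repec.sowi.unibe.ch/stata/estout/index.html
* net install st0085_2, from(http://www.stata-journal.com/software/sj14-2)
* ssc install estout, replace
* spatwmat and spatdiag (used for the LM tests) are user-written; assumed source is the sg162 package:
* net install sg162, from(http://www.stata.com/stb/stb60)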

## Import W and data

In [3]:
* Import the .dta weights matrix with spmatrix (official command, available since Stata 15)
use "https://github.com/quarcs-lab/data-open/raw/master/Columbus/columbus/Wqueen_fromStata_spmat.dta", clear
gen id = _n
order id, first
spset id
spmatrix fromdata WqueenS_fromStata15 = v*, normalize(row) replace
spmatrix summarize WqueenS_fromStata15






      Sp dataset: Wqueen_fromStata_spmat.dta
Linked shapefile: <none>
            Data: Cross sectional
 Spatial-unit ID: _ID (equal to id)
     Coordinates: <none>



Weighting matrix WqueenS_fromStata15
---------------------------------------
           Type |           contiguity
  Normalization |                  row
      Dimension |              49 x 49
Elements        |
   minimum      |                    0
   minimum > 0  |                   .1
   mean         |             .0204082
   max          |                   .5
Neighbors       |
   minimum      |                    2
   mean         |             4.816327
   maximum      |                   10
---------------------------------------


In [4]:
* Import the Columbus dataset (source: https://geodacenter.github.io/data-and-lab/columbus/) and declare the spatial ID
use "https://github.com/quarcs-lab/data-open/raw/master/Columbus/columbus/columbusDbase.dta", clear
spset id




      Sp dataset: columbusDbase.dta
Linked shapefile: <none>
            Data: Cross sectional
 Spatial-unit ID: _ID (equal to id)
     Coordinates: <none>


In [5]:
label var CRIME "Crime"
label var INC   "Income"
label var HOVAL "House value"

## OLS

In [6]:
regress CRIME INC HOVAL
eststo OLS

* Store the AIC with the estimates: r(S) from estat ic holds N, ll(null), ll(model), df, AIC, BIC
estat ic
mat s=r(S)
quietly estadd scalar AIC = s[1,5]



      Source |       SS           df       MS      Number of obs   =        49
-------------+----------------------------------   F(2, 46)        =     28.39
       Model |  7423.32674         2  3711.66337   Prob > F        =    0.0000
    Residual |  6014.89274        46  130.758538   R-squared       =    0.5524
-------------+----------------------------------   Adj R-squared   =    0.5329
       Total |  13438.2195        48  279.962906   Root MSE        =    11.435

------------------------------------------------------------------------------
       CRIME | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         INC |  -1.597311   .3341308    -4.78   0.000    -2.269881   -.9247405
       HOVAL |  -.2739315   .1031987    -2.65   0.011    -.4816597   -.0662033
       _cons |   68.61896   4.735486    14.49   0.000     59.08692      78.151
------------------------------------------------------------------------------

### Moran's I test

In [7]:
regress CRIME INC HOVAL
estat moran, errorlag(WqueenS_fromStata15)



      Source |       SS           df       MS      Number of obs   =        49
-------------+----------------------------------   F(2, 46)        =     28.39
       Model |  7423.32674         2  3711.66337   Prob > F        =    0.0000
    Residual |  6014.89274        46  130.758538   R-squared       =    0.5524
-------------+----------------------------------   Adj R-squared   =    0.5329
       Total |  13438.2195        48  279.962906   Root MSE        =    11.435

------------------------------------------------------------------------------
       CRIME | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         INC |  -1.597311   .3341308    -4.78   0.000    -2.269881   -.9247405
       HOVAL |  -.2739315   .1031987    -2.65   0.011    -.4816597   -.0662033
       _cons |   68.61896   4.735486    14.49   0.000     59.08692      78.151
------------------------------------------------------------------------------

### LM tests

In [8]:
spatwmat using "https://github.com/quarcs-lab/data-open/raw/master/Columbus/columbus/Wqueen_fromStata_spmat.dta", name(WqueenS_fromStata_spatwmat) eigenval(eWqueenS_fromStata_spatwmat) standardize



The following matrices have been created:

1. Imported binary weights matrix WqueenS_fromStata_spatwmat (row-standardized)
   Dimension: 49x49

2. Eigenvalues matrix eWqueenS_fromStata_spatwmat
   Dimension: 49x1




In [9]:
quietly reg CRIME INC HOVAL
spatdiag, weights(WqueenS_fromStata_spatwmat)





Diagnostic tests for spatial dependence in OLS regression
---------------------------------------------------------


Fitted model
------------------------------------------------------------
CRIME = INC + HOVAL
------------------------------------------------------------

Weights matrix
------------------------------------------------------------
Name: WqueenS_fromStata_spatwmat
Type: Imported (binary)
Row-standardized: Yes
------------------------------------------------------------

Diagnostics
------------------------------------------------------------
Test                           |  Statistic    df   p-value
-------------------------------+----------------------------
Spatial error:                 |
  Moran's I                    |     2.840      1    0.005
  Lagrange multiplier          |     5.206      1    0.023
  Robust Lagrange multiplier   |     0.044      1    0.834
                               |
Spatial lag:                   |
  Lagrange multiplier          |   

## SAR

In [10]:
spregress CRIME INC HOVAL, ml dvarlag(WqueenS_fromStata15)
eststo SAR

estat ic
mat s=r(S)
quietly estadd scalar AIC = s[1,5]

estat impact


  (49 observations)
  (49 observations (places) used)
  (weighting matrix defines 49 places)

Performing grid search ... finished 

Optimizing concentrated log likelihood:

Iteration 0:   log likelihood = -182.69106  
Iteration 1:   log likelihood = -182.67397  
Iteration 2:   log likelihood = -182.67397  

Optimizing unconcentrated log likelihood:

Iteration 0:   log likelihood = -182.67397  
Iteration 1:   log likelihood = -182.67397  (backed up)

Spatial autoregressive model                            Number of obs =     49
Maximum likelihood estimates                            Wald chi2(3)  =  88.00
                                                        Prob > chi2   = 0.0000
Log likelihood = -182.67397                             Pseudo R2     = 0.5806

-------------------------------------------------------------------------------
        CRIME | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
--------------+-----------------------------------------------------

## SEM

In [11]:
spregress CRIME INC HOVAL, ml errorlag(WqueenS_fromStata15)
eststo SEM

estat ic
mat s=r(S)
quietly estadd scalar AIC = s[1,5]

estat impact


  (49 observations)
  (49 observations (places) used)
  (weighting matrix defines 49 places)

Performing grid search ... finished 

Optimizing concentrated log likelihood:

Iteration 0:   log likelihood = -183.79112  
Iteration 1:   log likelihood =  -183.7495  
Iteration 2:   log likelihood = -183.74943  
Iteration 3:   log likelihood = -183.74943  

Optimizing unconcentrated log likelihood:

Iteration 0:   log likelihood = -183.74943  
Iteration 1:   log likelihood = -183.74943  (backed up)

Spatial autoregressive model                            Number of obs =     49
Maximum likelihood estimates                            Wald chi2(2)  =  30.15
                                                        Prob > chi2   = 0.0000
Log likelihood = -183.74943                             Pseudo R2     = 0.5362

-------------------------------------------------------------------------------
        CRIME | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
--------------+--------

## SLX

In [12]:
spregress CRIME INC HOVAL, ml ivarlag(WqueenS_fromStata15: INC HOVAL)
eststo SLX

estat ic
mat s=r(S)
quietly estadd scalar AIC = s[1,5]

estat impact


  (49 observations)
  (49 observations (places) used)
  (weighting matrix defines 49 places)

Optimizing unconcentrated log likelihood:

Iteration 0:   log likelihood =  -183.9706  
Iteration 1:   log likelihood =  -183.9706  

Spatial autoregressive model                            Number of obs =     49
Maximum likelihood estimates                            Wald chi2(4)  =  76.80
                                                        Prob > chi2   = 0.0000
Log likelihood = -183.9706                              Pseudo R2     = 0.6105

-------------------------------------------------------------------------------
        CRIME | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
--------------+----------------------------------------------------------------
CRIME         |
          INC |   -1.09739   .3542451    -3.10   0.002    -1.791698   -.4030821
        HOVAL |  -.2943898   .0963324    -3.06   0.002    -.4831978   -.1055817
        _cons |   74.55343   6.363788 

## SDM

In [13]:
spregress CRIME INC HOVAL, ml dvarlag(WqueenS_fromStata15) ivarlag(WqueenS_fromStata15: INC HOVAL)
eststo SDM

estat ic
mat s=r(S)
quietly estadd scalar AIC = s[1,5]

estat impact


  (49 observations)
  (49 observations (places) used)
  (weighting matrix defines 49 places)

Performing grid search ... finished 

Optimizing concentrated log likelihood:

Iteration 0:   log likelihood = -181.63946  
Iteration 1:   log likelihood = -181.63926  
Iteration 2:   log likelihood = -181.63925  

Optimizing unconcentrated log likelihood:

Iteration 0:   log likelihood = -181.63925  
Iteration 1:   log likelihood = -181.63925  (backed up)

Spatial autoregressive model                            Number of obs =     49
Maximum likelihood estimates                            Wald chi2(5)  =  93.47
                                                        Prob > chi2   = 0.0000
Log likelihood = -181.63925                             Pseudo R2     = 0.6120

-------------------------------------------------------------------------------
        CRIME | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
--------------+-----------------------------------------------------

### Wald tests

### Reduce to OLS?

In [14]:
spregress CRIME INC HOVAL, ml dvarlag(WqueenS_fromStata15) ivarlag(WqueenS_fromStata15: INC HOVAL)
* Wald test: Reduce to OLS? (NO if p < 0.05 in the "Wald test of spatial terms" reported at the bottom of the spregress output)

  (49 observations)
  (49 observations (places) used)
  (weighting matrix defines 49 places)

Performing grid search ... finished 

Optimizing concentrated log likelihood:

Iteration 0:   log likelihood = -181.63946  
Iteration 1:   log likelihood = -181.63926  
Iteration 2:   log likelihood = -181.63925  

Optimizing unconcentrated log likelihood:

Iteration 0:   log likelihood = -181.63925  
Iteration 1:   log likelihood = -181.63925  (backed up)

Spatial autoregressive model                            Number of obs =     49
Maximum likelihood estimates                            Wald chi2(5)  =  93.47
                                                        Prob > chi2   = 0.0000
Log likelihood = -181.63925                             Pseudo R2     = 0.6120

-------------------------------------------------------------------------------
        CRIME | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
--------------+------------------------------------------------------
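
The reduce-to-OLS question asks whether all spatial terms of the SDM are jointly zero. As a minimal sketch, assuming the SDM estimates from the `spregress` call above are still the active results, the same hypothesis can be stated explicitly with `test`, using the equation names that appear in the other Wald tests of this section:

```stata
* Joint Wald test of all spatial terms in the SDM (sketch).
* H0: the lag of CRIME and the lags of INC and HOVAL are all zero,
*     i.e. the SDM reduces to OLS. Reject H0 (do NOT reduce) if p < 0.05.
test ([WqueenS_fromStata15]CRIME = 0) ///
     ([WqueenS_fromStata15]INC = 0)   ///
     ([WqueenS_fromStata15]HOVAL = 0)
```

This is a three-restriction test, so the statistic should be reported as chi2(3).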

### Reduce to SLX?

In [15]:
* Wald test: Reduce to SLX? (NO if p < 0.05)
test ([WqueenS_fromStata15]CRIME = 0)


 ( 1)  [WqueenS_fromStata15]CRIME = 0

           chi2(  1) =    5.51
         Prob > chi2 =    0.0189


### Reduce to SAR?

In [16]:
* Wald test: Reduce to SAR? (NO if p < 0.05)
test ([WqueenS_fromStata15]INC = 0) ([WqueenS_fromStata15]HOVAL = 0)


 ( 1)  [WqueenS_fromStata15]INC = 0
 ( 2)  [WqueenS_fromStata15]HOVAL = 0

           chi2(  2) =    2.10
         Prob > chi2 =    0.3494


### Reduce to SEM?

In [17]:
* Wald test: Reduce to SEM? (NO if p < 0.05)
testnl ([WqueenS_fromStata15]INC = -[WqueenS_fromStata15]CRIME*[CRIME]INC) ([WqueenS_fromStata15]HOVAL = -[WqueenS_fromStata15]CRIME*[CRIME]HOVAL)


  (1)  [WqueenS_fromStata15]INC = -[WqueenS_fromStata15]CRIME*[CRIME]INC
  (2)  [WqueenS_fromStata15]HOVAL = -[WqueenS_fromStata15]CRIME*[CRIME]HOVAL

               chi2(2) =        4.08
           Prob > chi2 =        0.1300
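
The nonlinear restriction tested above is the so-called common factor restriction. A short derivation, in assumed notation ($\rho$ for the coefficient on the lag of CRIME, $\theta$ for the coefficients on the lagged regressors, $\lambda$ for the spatial error coefficient), shows where it comes from:

```latex
\begin{aligned}
\text{SEM:}\quad & y = X\beta + u, \qquad u = \lambda W u + \varepsilon \\
\Rightarrow\quad & (I - \lambda W)\, y = (I - \lambda W)\, X\beta + \varepsilon \\
\Rightarrow\quad & y = \lambda W y + X\beta - \lambda W X \beta + \varepsilon \\[4pt]
\text{SDM:}\quad & y = \rho W y + X\beta + W X\theta + \varepsilon \\[4pt]
\text{Hence the SDM reduces to the SEM when}\quad & \theta = -\rho\,\beta .
\end{aligned}
```

Each line of the `testnl` command imposes this equality for one regressor: the coefficient on the lagged variable must equal minus the product of the spatial lag coefficient and the variable's own coefficient.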


## SDEM

In [18]:
spregress CRIME INC HOVAL, ml ivarlag(WqueenS_fromStata15: INC HOVAL) errorlag(WqueenS_fromStata15)
eststo SDEM

estat ic
mat s=r(S)
quietly estadd scalar AIC = s[1,5]

estat impact


  (49 observations)
  (49 observations (places) used)
  (weighting matrix defines 49 places)

Performing grid search ... finished 

Optimizing concentrated log likelihood:

Iteration 0:   log likelihood =  -181.7792  
Iteration 1:   log likelihood =   -181.779  
Iteration 2:   log likelihood =   -181.779  

Optimizing unconcentrated log likelihood:

Iteration 0:   log likelihood =   -181.779  
Iteration 1:   log likelihood =   -181.779  (backed up)

Spatial autoregressive model                            Number of obs =     49
Maximum likelihood estimates                            Wald chi2(4)  =  46.68
                                                        Prob > chi2   = 0.0000
Log likelihood = -181.779                               Pseudo R2     = 0.6092

-------------------------------------------------------------------------------
        CRIME | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
--------------+-----------------------------------------------------

## SAC

In [19]:
spregress CRIME INC HOVAL, ml dvarlag(WqueenS_fromStata15) errorlag(WqueenS_fromStata15)
eststo SAC

estat ic
mat s=r(S)
quietly estadd scalar AIC = s[1,5]

estat impact


  (49 observations)
  (49 observations (places) used)
  (weighting matrix defines 49 places)

Performing grid search ... finished 

Optimizing concentrated log likelihood:

Iteration 0:   log likelihood = -182.57166  
Iteration 1:   log likelihood = -182.55505  
Iteration 2:   log likelihood = -182.55502  
Iteration 3:   log likelihood = -182.55502  

Optimizing unconcentrated log likelihood:

Iteration 0:   log likelihood = -182.55502  
Iteration 1:   log likelihood = -182.55502  (backed up)

Spatial autoregressive model                            Number of obs =     49
Maximum likelihood estimates                            Wald chi2(3)  =  57.82
                                                        Prob > chi2   = 0.0000
Log likelihood = -182.55502                             Pseudo R2     = 0.5793

-------------------------------------------------------------------------------
        CRIME | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
--------------+--------

## GNS

In [20]:
spregress CRIME INC HOVAL, ml dvarlag(WqueenS_fromStata15) ivarlag(WqueenS_fromStata15: INC HOVAL) errorlag(WqueenS_fromStata15)
eststo GNS

estat ic
mat s=r(S)
quietly estadd scalar AIC = s[1,5]

estat impact


  (49 observations)
  (49 observations (places) used)
  (weighting matrix defines 49 places)

Performing grid search ... finished 

Optimizing concentrated log likelihood:

Iteration 0:   log likelihood = -181.60541  
Iteration 1:   log likelihood = -181.58046  
Iteration 2:   log likelihood = -181.58014  
Iteration 3:   log likelihood = -181.58014  

Optimizing unconcentrated log likelihood:

Iteration 0:   log likelihood = -181.58014  
Iteration 1:   log likelihood = -181.58014  (backed up)

Spatial autoregressive model                            Number of obs =     49
Maximum likelihood estimates                            Wald chi2(5)  =  62.57
                                                        Prob > chi2   = 0.0000
Log likelihood = -181.58014                             Pseudo R2     = 0.6115

-------------------------------------------------------------------------------
        CRIME | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
--------------+--------

## Comparison

In [21]:
%html
esttab OLS SAR SEM SLX SDM SDEM SAC GNS, label stats(AIC) mtitle("OLS" "SAR" "SEM" "SLX" "SDM" "SDEM" "SAC" "GNS") html

----------------------------------------------------------------------------------------------------
                    (1)        (2)        (3)        (4)        (5)        (6)        (7)        (8)
                    OLS        SAR        SEM        SLX        SDM       SDEM        SAC        GNS
----------------------------------------------------------------------------------------------------
main
Income        -1.597***   -1.049**    -0.957*   -1.097**   -0.920**   -1.052**   -1.043**   -0.959**
                (-4.78)    (-3.17)    (-2.54)    (-3.10)    (-2.71)    (-3.25)    (-3.11)    (-2.69)

House value     -0.274*   -0.266**  -0.305***   -0.294**  -0.297***   -0.278**   -0.280**   -0.289**
                (-2.65)    (-3.00)    (-3.30)    (-3.06)    (-3.30)    (-3.05)    (-2.98)    (-3.13)


In [22]:
eststo clear

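The following comparison requires Stata 17, because it relies on the `collect` commands. Interpret the resulting table with caution: it reports only the point estimates of the direct and indirect effects, without p-values.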

In [23]:
collect clear

quietly spregress CRIME INC HOVAL, ml dvarlag(WqueenS_fromStata15)
collect: quietly estat impact

quietly spregress CRIME INC HOVAL, ml ivarlag(WqueenS_fromStata15: INC HOVAL)
collect: quietly estat impact

quietly spregress CRIME INC HOVAL, ml dvarlag(WqueenS_fromStata15) ivarlag(WqueenS_fromStata15: INC HOVAL)
collect: quietly estat impact

quietly spregress CRIME INC HOVAL, ml ivarlag(WqueenS_fromStata15: INC HOVAL) errorlag(WqueenS_fromStata15)
collect: quietly estat impact

quietly spregress CRIME INC HOVAL, ml dvarlag(WqueenS_fromStata15) errorlag(WqueenS_fromStata15)
collect: quietly estat impact

quietly spregress CRIME INC HOVAL, ml dvarlag(WqueenS_fromStata15) ivarlag(WqueenS_fromStata15: INC HOVAL) errorlag(WqueenS_fromStata15)
collect: quietly estat impact


collect label list cmdset, all
collect style autolevels result b_direct b_indirect  
collect label levels cmdset 1 "SAR" 2 "SLX" 3 "SDM" 4 "SDEM" 5 "SAC" 6 "GNS"
collect style cell, nformat(%7.2f)
collect layout (colname#result) (cmdset) 

  Collection: default
   Dimension: cmdset
       Label: Command results index
Level labels:
           1  
           2  
           3  
           4  
           5  
           6  





Collection: default
      Rows: colname#result
   Columns: cmdset
   Table 1: 6 x 6

--------------------------------------------------
             |   SAR   SLX   SDM  SDEM   SAC   GNS
-------------+------------------------------------
Income       |                                    
  b_direct   | -1.10 -1.10 -1.02 -1.05 -1.08 -1.03
  b_indirect | -0.72 -1.40 -1.50 -1.20 -0.57 -1.39
House value  |                                    
  b_direct   | -0.28 -0.29 -0.28 -0.28 -0.29 -0.28
  b_indirect | -0.18  0.21  0.22  0.13 -0.15  0.18
--------------------------------------------------
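
For reference, the direct and indirect effects in this table are usually defined as averages over the matrix of partial derivatives of $y$ with respect to regressor $k$. The following is a sketch in assumed notation ($\rho$, $\beta_k$, $\theta_k$, and $\iota$ for a vector of ones); it follows the standard textbook definition and is not taken from the Stata output itself:

```latex
\begin{aligned}
S_k(W) &= (I_n - \rho W)^{-1}\,(\beta_k I_n + \theta_k W) \\
\text{direct}_k &= \tfrac{1}{n}\,\operatorname{tr}\!\big(S_k(W)\big) \\
\text{total}_k &= \tfrac{1}{n}\,\iota^{\top} S_k(W)\,\iota \\
\text{indirect}_k &= \text{total}_k - \text{direct}_k
\end{aligned}
```

When $\rho = 0$ (SLX and SDEM) and $W$ is row-normalized with a zero diagonal, these expressions collapse to $\text{direct}_k = \beta_k$ and $\text{indirect}_k = \theta_k$. This is consistent with the table: the SLX direct effect of income, -1.10, matches the INC coefficient of -1.09739 in the SLX output.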
