# 1. Introduction to Spatial Models

## 1.1 Spatial Interaction

Spatial models usually rest on one basic assumption: the farther apart two observational units are, the weaker the link between them.

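Examples:
- Determinants of per-capita police expenditure across countries or cities (Ajilore and Smith, 2011; Di Tella and Schargrodsky, 2004; among others)
- Determinants of per-capita education expenditure across regions or cities (Hondroyiannis et al., 2009)
- Pollution levels across regions or cities (Ma Dalai et al. (马大来等), 2014)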

## 1.2 Neighbors and the Spatial Weights Matrix

Since Moran (1948), spatial interaction has commonly been expressed through a weights matrix ($W$).

The element $w_{ij}$ in position $(i,j)$ of the weights matrix measures the "closeness" of spatial units $i$ and $j$.

If $w_{ij} \neq 0$, spatial unit $i$ is regarded as a neighbor of $j$; otherwise it is not.

If $\sum_{j=1}^{N} w_{ij} = 1$ for every row $i$, the weights matrix is said to be row-normalized.
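To make row normalization concrete, here is a minimal sketch in plain Python. The three-unit contiguity matrix and the helper name `row_normalize` are illustrative assumptions, not part of the original notes.

```python
def row_normalize(w):
    """Return a copy of w in which every row with a nonzero sum adds up to 1."""
    normalized = []
    for row in w:
        total = sum(row)
        # Rows that sum to zero (units without neighbors) are left unchanged.
        normalized.append([v / total for v in row] if total != 0 else list(row))
    return normalized


# Hypothetical binary contiguity matrix for three units on a line: 1-2-3.
W = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]

W_row = row_normalize(W)
for row in W_row:
    print(row)
```

After normalization, each weight can be read as the share of a unit's total neighbor influence that comes from one particular neighbor.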

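**Example: a spatial model (SLX)**

$$y_{i} = \beta_{0} + \beta_{1}x_{i} + \beta_{2}W_{i}x + \varepsilon_{i}$$
where $W_{i}$ is the $i$-th row of $W$ and $x$ stacks the $x_{j}$; written out,
$$y_{i} = \beta_{0} + \beta_{1}x_{i} + \beta_{2}\sum_{j=1}^{N}w_{ij}x_{j} + \varepsilon_{i}$$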
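The spatially lagged regressor $\sum_{j} w_{ij}x_{j}$ in the SLX equation is just a weighted average of the neighbors' values when $W$ is row-normalized. The sketch below computes it for a hypothetical three-unit example; the weights, data, and coefficients are made-up illustrative values, not estimates.

```python
def spatial_lag(w, x):
    """Return the vector Wx, whose i-th entry is sum_j w[i][j] * x[j]."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


# Hypothetical row-normalized weights for three units on a line: 1-2-3.
W = [
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 1.0, 0.0],
]
x = [2.0, 4.0, 6.0]

# Illustrative coefficients (assumed values, not taken from any data set).
beta0, beta1, beta2 = 1.0, 0.5, 0.25

wx = spatial_lag(W, x)
# Systematic part of the SLX model: beta0 + beta1 * x_i + beta2 * (Wx)_i.
y_hat = [beta0 + beta1 * x_i + beta2 * wx_i for x_i, wx_i in zip(x, wx)]
print(wx, y_hat)
```

Once `wx` is built, the SLX model can be estimated like any linear regression, with `wx` entering as one more column.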

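### Examples of spatial weights matrices

**Contiguity matrix**

$w_{ij} = 1$ if units $i$ and $j$ share a border, and $0$ otherwise.

**Geographic distance matrix**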

$$w_{ij} = 1/d_{ij}$$

**Economic distance matrix**

$$w_{ij} = |INC_{i}-INC_{j}|^{-1}$$

An improved version:

$$w_{ij} = 1 - \frac{|INC_{i}-INC_{j}|}{INC_{i}+INC_{j}}$$

See http://47.96.41.8:3838/spatialweightmatrix/
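The distance-based formulas above can be sketched in a few lines of Python. The coordinates and incomes are hypothetical, Euclidean distance is assumed for $d_{ij}$, and the diagonal is set to zero by assumption (a unit is not its own neighbor). Note that the plain inverse income gap $|INC_{i}-INC_{j}|^{-1}$ is undefined when two units have equal income, whereas the improved version stays between 0 and 1 for positive incomes.

```python
import math


def inverse_distance_weights(coords):
    """w_ij = 1 / d_ij for i != j, with Euclidean distance d_ij (an assumption)."""
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                w[i][j] = 1.0 / math.dist(coords[i], coords[j])
    return w


def economic_weights(income):
    """w_ij = 1 - |INC_i - INC_j| / (INC_i + INC_j) for i != j."""
    n = len(income)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                gap = abs(income[i] - income[j])
                w[i][j] = 1.0 - gap / (income[i] + income[j])
    return w


# Hypothetical centroids and income levels for three units.
coords = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
income = [100.0, 300.0, 100.0]

W_geo = inverse_distance_weights(coords)
W_econ = economic_weights(income)
print(W_geo[0], W_econ[0])
```

Units 1 and 3 have identical incomes here, so their economic weight is 1, the largest possible value under the improved formula.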

# 2. Specification and Estimation of Spatial Models

## 2.1 Model Specification

**Tests for spatial dependence**

- Moran's I test
- LM test
- Wald test
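As an illustration of the first test in the list, the sketch below computes the Moran's I statistic, $I = \frac{n}{S_{0}} \cdot \frac{z'Wz}{z'z}$, where $z$ holds deviations from the mean and $S_{0}$ is the sum of all weights. It covers only the statistic itself, not its sampling distribution or p-value, and the four-unit matrix and data are hypothetical.

```python
def morans_i(values, w):
    """Moran's I = (n / S0) * (z' W z) / (z' z), with z = values - mean."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in w)  # sum of all weights
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(z_i * z_i for z_i in z)


# Hypothetical binary contiguity matrix for four units on a line: 1-2-3-4.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = morans_i([1.0, 2.0, 3.0, 4.0], W)      # similar values sit together
alternating = morans_i([1.0, -1.0, 1.0, -1.0], W)  # neighbors always differ
print(clustered, alternating)
```

A positive value indicates that neighboring units tend to have similar values, and a negative value that they tend to differ. Applied to regression residuals, a value far from zero suggests the non-spatial model has left spatial dependence unexplained.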

For example, suppose B is the true model but A is estimated by mistake:

$$A: y = X \beta + \varepsilon$$

$$B: y = X \beta + \delta Wy + \varepsilon$$

![Spatial econometric models](https://cl.ly/2m240i3L3u2R/spatialmodels.png)

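## 2.2 Estimation Methods

- Maximum likelihood (ML)
- Instrumental variables and generalized method of moments (IV/GMM). For example, Kelejian et al. (2004) suggest using $[X, WX, \ldots, W^{g}X]$ as instruments.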
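The instrument set $[X, WX, \ldots, W^{g}X]$ can be assembled by repeatedly premultiplying by $W$. The sketch below is an illustration under assumptions: the helper names, the three-unit $W$, and the single-column $X$ are hypothetical, and $X$ is taken to exclude the constant, since with a row-normalized $W$ the spatial lag of a column of ones is again a column of ones.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def instrument_matrix(x, w, g=2):
    """Stack X, WX, ..., W^g X side by side (one row per spatial unit)."""
    blocks = [x]
    for _ in range(g):
        blocks.append(matmul(w, blocks[-1]))
    return [sum((block[i] for block in blocks), []) for i in range(len(x))]


# Hypothetical row-normalized weights for three units on a line: 1-2-3.
W = [
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 1.0, 0.0],
]
# One regressor (the constant is left out), one row per unit.
X = [[1.0], [0.0], [0.0]]

H = instrument_matrix(X, W, g=2)
for row in H:
    print(row)
```

The columns of `H` are $X$, $WX$, and $W^{2}X$: the regressor itself, its average among neighbors, and its average among neighbors of neighbors.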

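The spatial weights matrix $W$ cannot be estimated and must be specified in advance. Checking whether the results are robust to the specification of $W$ has therefore become standard practice (Elhorst, 2014).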

## 2.3 Direct and Indirect Effects in Spatial Models

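Take the SAR model as an example:

$$y = X\beta + \delta Wy + \mu$$

Solving for $y$ gives the reduced form

$$y = (I - \delta W)^{-1} [X\beta + \mu]$$

Define

$$G = (I - \delta W)^{-1}$$

Then the effect of a change in $x_{1}$, the regressor of unit 1, on $E(y_{j})$ is

$$\frac{\partial E(y_{j})}{\partial x_{1}} = G_{j1} \beta$$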

<center>
<img src="https://cl.ly/2V3e1x3j1I0w/spatial01.png" width="40%" height="40%" />
</center>

**Own spillover effect**

$$\frac{\partial E(y_{1})}{\partial x_{1}} = G_{11} \beta$$

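Take the same SAR model and suppose the data are cross-country, with $y$ denoting exports and $x$ denoting GDP. The effect on the first country's exports of GDP changes in all countries other than the first can be written as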

$$V_{1}=\sum_{j=2}^{N} \frac{\partial E(y_{1})}{\partial x_{j}} = \sum_{j=2}^{N} G_{1j} \beta$$
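The two effects above can be computed numerically once $G$ is available. The sketch below is an illustration under assumptions: $W$, $\delta = 0.5$, and $\beta = 2$ are hypothetical values, and $G$ is approximated by the power series $I + \delta W + \delta^{2}W^{2} + \cdots$, which equals $(I - \delta W)^{-1}$ whenever the series converges (for instance, with a row-normalized $W$ and $|\delta| < 1$). In applied work a linear-algebra library would normally compute the inverse directly.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def sar_multiplier(w, delta, terms=200):
    """Approximate G = (I - delta*W)^{-1} by a truncated power series."""
    n = len(w)
    dw = [[delta * v for v in row] for row in w]
    g = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    power = [row[:] for row in g]  # current term (delta*W)^k, starting at k = 0
    for _ in range(terms):
        power = matmul(power, dw)
        g = [[a + b for a, b in zip(g_row, p_row)] for g_row, p_row in zip(g, power)]
    return g


# Hypothetical row-normalized weights for three units on a line: 1-2-3.
W = [
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 1.0, 0.0],
]
delta, beta = 0.5, 2.0  # assumed values, not estimates

G = sar_multiplier(W, delta)
own_effect_1 = G[0][0] * beta                                     # G_11 * beta
spillover_into_1 = sum(G[0][j] for j in range(1, len(W))) * beta  # V_1
print(own_effect_1, spillover_into_1)
```

The own effect exceeds $\beta$ because a change in unit 1's regressor feeds back to unit 1 through its neighbors. With a row-normalized $W$, the own effect and $V_{1}$ add up to $\beta / (1 - \delta)$.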

**Summary**

![Direct and indirect effects](https://cl.ly/3x1r323a3F2p/spatialeffects.png)

# 3. Spatial Econometric Analysis with Stata

## 3.1 Preprocessing Spatial Data

- Data cleaning
- Constructing the spatial structure of the data

In [2]:
# Import the package that lets Python call Stata
import ipystata

**Set the working directory**

In [2]:
%%stata

cd "E:\cyberspace\notebook\Project\spatial econometrics summer"


E:\cyberspace\notebook\Project\spatial econometrics summer



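**Process spatial data in shp format**

Use the spshape2dta command to convert the shp file to dta format. Here province_2004 is the map file of China's provincial-level regions.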

In [3]:
%%stata

spshape2dta province_2004, replace

  (importing .shp file)
  (importing .dbf file)
  (creating _ID spatial-unit id)
  (creating _CX coordinate)
  (creating _CY coordinate)

  file province_2004_shp.dta created
  file province_2004.dta     created



**Encoding conversion for Stata 14 and later** (optional)

In [4]:
%%stata

clear
unicode analyze province_2004.dta
unicode encoding set gb18030
unicode translate province_2004.dta


  File summary (before starting):
        1  file(s) specified
        1  file(s) already translated        in previous runs
        0  file(s) to be examined ...
  (nothing to do)

  (default encoding now gb18030)

  (using gb18030 encoding)

  File summary (before starting):
        1  file(s) specified
        1  file(s) already translated        in previous runs
        0  file(s) to be examined ...
  (nothing to do)



**Load the newly created province_2004.dta file**

In [5]:
%%stata -o province_2004

use province_2004, clear
describe


Contains data from province_2004.dta
  obs:            37                          
 vars:             9                          10 Jul 2018 23:26
 size:         1,739                          
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
_ID             byte    %12.0g                Spatial-unit ID
_CX             double  %10.0g                x-coordinate of area centroid
_CY             double  %10.0g                y-coordinate of area centroid
AREA            float   %13.11f               AREA
PERIMETER       float   %13.11f               PERIMETER
LEVEL1_         byte   

**Create a new variable acode holding the province code, and sort by it**

In [7]:
%%stata

gen acode = AD2004
sort acode
spset acode, modify replace

save province_2004, replace


  (_shp.dta file saved)
  (data in memory saved)
  Sp dataset data_output.dta
                data:  cross sectional
     spatial-unit id:  _ID (equal to acode)
         coordinates:  _CX, _CY (planar)
    linked shapefile:  province_2004_shp.dta

file province_2004.dta saved



**Import the statistical data file**

In [8]:
%%stata

import excel "prov_data.xls", sheet("Sheet1") firstrow clear
describe


Contains data
  obs:            31                          
 vars:            12                          
 size:         2,852                          
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
acode           long    %10.0g                acode
GDP             double  %10.0g                GDP
capital         double  %10.0g                capital
labour          double  %10.0g                labour
humancapital    double  %10.0g                humancapital
perGDP          double  %10.0g                perGDP
perCapital      double  %10.0g                perCapital
perHum

**Merge the geographic information with the statistical data; province codes must match one-to-one**

In [9]:
%%stata

merge 1:1 acode using province_2004

(note: variable acode was long, now double to accommodate using data's values)

    Result                           # of obs.
    -----------------------------------------
    not matched                             6
        from master                         0  (_merge==1)
        from using                          6  (_merge==2)

    matched                                31  (_merge==3)
    -----------------------------------------



**Create the spatial weights matrix and run Moran's I test on the regression residuals**

In [10]:
%%stata

keep if _merge==3
drop _merge

gen linitperGDP = ln(initperGDP)

regress perGDPgrowth linitperGDP perCapitalgrwoth perHumancapitalgrowth

spmatrix clear
spmatrix create contiguity W, normalize(none)

estat moran, errorlag(W)

(6 observations deleted)

      Source |       SS           df       MS      Number of obs   =        31
-------------+----------------------------------   F(3, 27)        =     11.19
       Model |  101.263053         3   33.754351   Prob > F        =    0.0001
    Residual |  81.4103685        27  3.01519883   R-squared       =    0.5543
-------------+----------------------------------   Adj R-squared   =    0.5048
       Total |  182.673422        30  6.08911405   Root MSE        =    1.7364

---------------------------------------------------------------------------------------
         perGDPgrowth |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
----------------------+----------------------------------------------------------------
          linitperGDP |  -.6955203   .7393406    -0.94   0.355    -2.212522    .8214813
     perCapitalgrwoth |   .2374025   .0458477     5.18   0.000     .1433308    .3314741
perHumancapitalgrowth |   6.502551   3.240351     2.01   0.

**Export the spatial weights matrix to a file**

In [31]:
%%stata

spmatrix export W using province2004W.txt

  (matrix W saved in file province2004W.txt)



**Import the spatial weights matrix file with the pysal package**

In [35]:
import pysal

stata_txt = pysal.open(r'E:\cyberspace\notebook\Project\spatial econometrics summer\province2004W.txt','r','stata_text')
w = stata_txt.read()
stata_txt.close()
w.n



31

**Read the lookup table of province codes and names**

In [43]:
import sys

sys.path.append("D:\\github\\pluto")
from application.dataworld.admindivision.class_admindivision import AdminDivision

adivision = AdminDivision(year='2011')
adivision.province

region_dict = dict(zip(adivision.province['acode'],adivision.province['region']))
region_dict

{'110000': '北京市',
 '120000': '天津市',
 '130000': '河北省',
 '140000': '山西省',
 '150000': '内蒙古自治区',
 '210000': '辽宁省',
 '220000': '吉林省',
 '230000': '黑龙江省',
 '310000': '上海市',
 '320000': '江苏省',
 '330000': '浙江省',
 '340000': '安徽省',
 '350000': '福建省',
 '360000': '江西省',
 '370000': '山东省',
 '410000': '河南省',
 '420000': '湖北省',
 '430000': '湖南省',
 '440000': '广东省',
 '450000': '广西壮族自治区',
 '460000': '海南省',
 '500000': '重庆市',
 '510000': '四川省',
 '520000': '贵州省',
 '530000': '云南省',
 '540000': '西藏自治区',
 '610000': '陕西省',
 '620000': '甘肃省',
 '630000': '青海省',
 '640000': '宁夏回族自治区',
 '650000': '新疆维吾尔自治区'}

**Inspect the spatial weights matrix**

In [47]:
neighbors = dict()
for key in w.neighbors:
    neighbors[region_dict[str(key)]] = [region_dict[str(item)] for item in w.neighbors[key]]
neighbors

{'北京市': ['天津市', '河北省'],
 '天津市': ['北京市', '河北省'],
 '河北省': ['北京市', '天津市', '山西省', '内蒙古自治区', '辽宁省', '山东省', '河南省'],
 '山西省': ['河北省', '内蒙古自治区', '河南省', '陕西省'],
 '内蒙古自治区': ['河北省', '山西省', '辽宁省', '吉林省', '黑龙江省', '陕西省', '甘肃省', '宁夏回族自治区'],
 '辽宁省': ['河北省', '内蒙古自治区', '吉林省'],
 '吉林省': ['内蒙古自治区', '辽宁省', '黑龙江省'],
 '黑龙江省': ['内蒙古自治区', '吉林省'],
 '上海市': ['江苏省', '浙江省'],
 '江苏省': ['上海市', '浙江省', '安徽省', '山东省'],
 '浙江省': ['上海市', '江苏省', '安徽省', '福建省', '江西省'],
 '安徽省': ['江苏省', '浙江省', '江西省', '山东省', '河南省', '湖北省'],
 '福建省': ['浙江省', '江西省', '广东省'],
 '江西省': ['浙江省', '安徽省', '福建省', '湖北省', '湖南省', '广东省'],
 '山东省': ['河北省', '江苏省', '安徽省', '河南省'],
 '河南省': ['河北省', '山西省', '安徽省', '山东省', '湖北省', '陕西省'],
 '湖北省': ['安徽省', '江西省', '河南省', '湖南省', '重庆市', '陕西省'],
 '湖南省': ['江西省', '湖北省', '广东省', '广西壮族自治区', '重庆市', '贵州省'],
 '广东省': ['福建省', '江西省', '湖南省', '广西壮族自治区'],
 '广西壮族自治区': ['湖南省', '广东省', '贵州省', '云南省'],
 '海南省': [],
 '重庆市': ['湖北省', '湖南省', '四川省', '贵州省', '陕西省'],
 '四川省': ['重庆市', '贵州省', '云南省', '西藏自治区', '陕西省', '甘肃省', '青海省'],
 '贵州省': ['湖南省', '广西壮族自治区', '重庆市', '四
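In the listing above, 海南省 has an empty neighbor list: under a contiguity rule it is an "island". Its row of $W$ is all zeros, so its spatial lag is zero and row normalization cannot rescale it. A minimal sketch for spotting such units in a neighbor dictionary follows; the helper `find_islands` is hypothetical, and the dictionary is a small excerpt of the output above.

```python
def find_islands(neighbors):
    """Return the units whose neighbor list is empty."""
    return [unit for unit, linked in neighbors.items() if not linked]


# Excerpt of the neighbor dictionary printed above.
neighbors_excerpt = {
    "北京市": ["天津市", "河北省"],
    "天津市": ["北京市", "河北省"],
    "上海市": ["江苏省", "浙江省"],
    "海南省": [],
}

islands = find_islands(neighbors_excerpt)
print(islands)
```

Whether to keep islands, drop them, or link them to their nearest unit by hand is a modeling decision that should be reported together with the results.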

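## Worked example: a cross-sectional spatial regression model of project-based governance and regional economic development in China
- The data come from the paper by Zheng Shilin and Ying Shanshan (郑世林和应珊珊, 2017), "项目制治理模式与中国地区经济发展" (Project-based governance and regional economic development in China), published in 中国工业经济 (China Industrial Economics).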

**Data preprocessing: convert the shp file to dta and create the new variable acode**

In [3]:
%%stata

cd "E:\cyberspace\notebook\Project\spatial econometrics summer"

spshape2dta city, replace

use city, clear
gen acode = AD2004
sort acode

spset acode, modify replace

save, replace


E:\cyberspace\notebook\Project\spatial econometrics summer

  (importing .shp file)
  (importing .dbf file)
  (creating _ID spatial-unit id)
  (creating _CX coordinate)
  (creating _CY coordinate)

  file city_shp.dta created
  file city.dta     created

  (_shp.dta file saved)
  (data in memory saved)
  Sp dataset city.dta
                data:  cross sectional
     spatial-unit id:  _ID (equal to acode)
         coordinates:  _CX, _CY (planar)
    linked shapefile:  city_shp.dta

file city.dta saved



**Link the regions in the geographic data to those in the statistical data**

In [4]:
%%stata

use project_open, clear
keep if year == 2009
drop if y == .
rename code acode

merge 1:1 acode using city
keep if _merge==3
drop _merge

save spatial_project, replace


(2,802 observations deleted)

(1 observation deleted)

(note: variable acode was long, now double to accommodate using data's values)

    Result                           # of obs.
    -----------------------------------------
    not matched                             7
        from master                         0  (_merge==1)
        from using                          7  (_merge==2)

    matched                               280  (_merge==3)
    -----------------------------------------

(7 observations deleted)

file spatial_project.dta saved



**Regression model without spatial effects, followed by a test for spatial dependence**

In [5]:
%%stata

use spatial_project, clear

reg y project

spmatrix clear
spmatrix create contiguity W, normalize(row)

estat moran, errorlag(W)


      Source |       SS           df       MS      Number of obs   =       280
-------------+----------------------------------   F(1, 278)       =     31.54
       Model |  340.541661         1  340.541661   Prob > F        =    0.0000
    Residual |  3001.99407       278  10.7985398   R-squared       =    0.1019
-------------+----------------------------------   Adj R-squared   =    0.0987
       Total |  3342.53573       279  11.9804148   Root MSE        =    3.2861

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     project |    .123953   .0220726     5.62   0.000     .0805022    .1674038
       _cons |   6.938091    .995225     6.97   0.000     4.978957    8.897225
------------------------------------------------------------------------------

  weighting matrix in W contains 6 islands

Moran

**SAR model estimation**

In [10]:
%%stata

spregress y project, gs2sls dvarlag(W)
estat impact

  (280 observations)
  (280 observations (places) used)
  (weighting matrix defines 280 places)

Spatial autoregressive model                    Number of obs     =        280
GS2SLS estimates                                Wald chi2(2)      =      67.29
                                                Prob > chi2       =     0.0000
                                                Pseudo R2         =     0.1550

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
y            |
     project |   .0952824   .0198693     4.80   0.000     .0563392    .1342256
       _cons |   3.038352   1.161154     2.62   0.009     .7625322    5.314172
-------------+----------------------------------------------------------------
W            |
           y |   .4221614   .0847593     4.98   0.000     .2560362    .5882866
---

**SDM regression model**

In [12]:
%%stata

spregress y project, gs2sls dvarlag(W) ivarlag(W:project)
estat impact

  (280 observations)
  (280 observations (places) used)
  (weighting matrix defines 280 places)

Spatial autoregressive model                    Number of obs     =        280
GS2SLS estimates                                Wald chi2(3)      =      79.72
                                                Prob > chi2       =     0.0000
                                                Pseudo R2         =     0.1002

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
y            |
     project |   .1025992    .019755     5.19   0.000     .0638801    .1413183
       _cons |   2.286735   1.152446     1.98   0.047     .0279817    4.545487
-------------+----------------------------------------------------------------
W            |
     project |  -.0974891   .0570565    -1.71   0.088    -.2093178    .0143397
   

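**Adding more control variables**

- rgdpc_lag1: per-capita GDP lagged one period
- popgrow: natural population growth rate
- edu: share of the population enrolled in senior high school or above

**Regression model without spatial effects**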

In [13]:
%%stata

reg y project rgdpc_lag1 popgrow edu

drop if edu == . | popgrow == . | y == .
spmatrix create contiguity M, normalize(row)

estat moran, errorlag(M)


      Source |       SS           df       MS      Number of obs   =       272
-------------+----------------------------------   F(4, 267)       =     13.61
       Model |  550.837999         4    137.7095   Prob > F        =    0.0000
    Residual |  2701.04296       267  10.1162658   R-squared       =    0.1694
-------------+----------------------------------   Adj R-squared   =    0.1569
       Total |  3251.88096       271  11.9995607   Root MSE        =    3.1806

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     project |   .0934196   .0235719     3.96   0.000     .0470091      .13983
  rgdpc_lag1 |  -.1894917   .3622769    -0.52   0.601    -.9027746    .5237912
     popgrow |  -.1696845   .0586289    -2.89   0.004    -.2851183   -.0542507
         edu |  -25.27467   8.629811    -2.93   0.

**SAR regression model**

In [14]:
%%stata

spregress y project rgdpc_lag1 popgrow edu, gs2sls dvarlag(M)
estat impact

  (272 observations)
  (272 observations (places) used)
  (weighting matrix defines 272 places)

Spatial autoregressive model                    Number of obs     =        272
GS2SLS estimates                                Wald chi2(5)      =      96.83
                                                Prob > chi2       =     0.0000
                                                Pseudo R2         =     0.2221

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
y            |
     project |    .074747    .020882     3.58   0.000      .033819     .115675
  rgdpc_lag1 |  -.1543023   .3158765    -0.49   0.625    -.7734089    .4648043
     popgrow |  -.1123624   .0523684    -2.15   0.032    -.2150027   -.0097222
         edu |  -16.00962   7.745965    -2.07   0.039    -31.19143   -.8278031
       _cons |    

**SDM regression model**

In [17]:
%%stata

spregress y project rgdpc_lag1 popgrow edu, gs2sls dvarlag(M) ivarlag(M:project rgdpc_lag1 popgrow edu)
estat impact

  (272 observations)
  (272 observations (places) used)
  (weighting matrix defines 272 places)

Spatial autoregressive model                    Number of obs     =        272
GS2SLS estimates                                Wald chi2(9)      =     113.01
                                                Prob > chi2       =     0.0000
                                                Pseudo R2         =     0.1918

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
y            |
     project |   .0748499   .0222839     3.36   0.001     .0311742    .1185256
  rgdpc_lag1 |  -.1079625   .3567822    -0.30   0.762    -.8072428    .5913177
     popgrow |  -.0431387   .0664679    -0.65   0.516    -.1734134    .0871361
         edu |  -9.710028   8.242513    -1.18   0.239    -25.86506    6.445001
       _cons |   6