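(Note) The Stata features described in this manual cannot yet be used in the BIDAS environment because of licensing issues, so all Stata-related code in the manual is commented out. When working in a local environment (internal network or internet-connected network), uncomment the code to use it.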

# BOK Stata Package
- Author: Professor Chirok Han, Department of Economics, Korea University

## Estimation

The `cstata` module implemented in the `bok_da` library makes it easy to use PyStata. For example, system GMM estimation with Stata's `xtdpdsys` is carried out as follows.

In [1]:
from bok_da.stata import Stata

In [1]:
# stata = Stata('/Applications/Stata', 'mp')
# stata.get_ready() # we need this
# stata.run('use abdata, clear')
# stata.run('xtdpdsys n l(0/1).w l(0/2).(k ys) yr1980-yr1984, lags(2) twostep vce(robust)')

. use abdata, clear
. xtdpdsys n l(0/1).w l(0/2).(k ys) yr1980-yr1984, lags(2) twostep vce(robust)

System dynamic panel-data estimation            Number of obs     =        751
Group variable: id                              Number of groups  =        140
Time variable: year
                                                Obs per group:
                                                              min =          5
                                                              avg =   5.364286
                                                              max =          7

Number of instruments =     48                  Wald chi2(15)     =    1449.65
                                                Prob > chi2       =     0.0000
Two-step results
------------------------------------------------------------------------------
             |              WC-robust
           n | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+------------------------------------

## Returned Results

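The `ereturn` property of the `Stata` class returns the estimation results (Stata's `ereturn`) as a dictionary. Note that, in addition to the contents of Stata's `ereturn`, the `Stata.ereturn` property also provides a "labels" key that holds the variable names.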

In [2]:
# stata.ereturn.keys()

dict_keys(['rank', 'sig2', 'rss', 'N', 'N_g', 'g_min', 'g_max', 'g_avg', 't_min', 't_max', 'chi2', 'df_m', 'zrank', 'artests', 'arm1', 'arm2', 'cmdline', 'cmd', 'engine', 'estat_cmd', 'marginsok', 'predict', 'depvar', 'transform', 'hascons', 'tvar', 'ivar', 'lgmmiv_lag', 'lgmmiv_vars', 'dgmmiv_llag', 'dgmmiv_flag', 'dgmmiv_vars', 'liv_olvars', 'div_odvars', 'datasignaturevars', 'datasignature', 'system', 'vce', 'vcetype', 'twostep', 'properties', 'b', 'V', 'labels'])
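
Because the result is an ordinary dictionary, its scalar entries can be used directly in Python. The following is a minimal sketch, not output from the library: the `ereturn` dictionary here is a hand-built stand-in whose values are copied from the estimation header above, and it is an assumption that the real property stores these scalars as floats under the same keys.

```python
import math

# Hand-built stand-in for a few scalar entries of stata.ereturn.
# Values are copied from the xtdpdsys output header; the real dictionary
# is assumed (not verified here) to hold them as floats under these keys.
ereturn = {
    "N": 751.0,         # Number of obs
    "N_g": 140.0,       # Number of groups
    "g_min": 5.0,       # Obs per group: min
    "g_avg": 5.364286,  # Obs per group: avg
    "g_max": 7.0,       # Obs per group: max
    "df_m": 15.0,       # Degrees of freedom of the Wald chi2 test
    "chi2": 1449.65,    # Wald chi2 statistic
}

# Consistency check: average observations per group equals N / N_g.
avg_obs = ereturn["N"] / ereturn["N_g"]
is_consistent = math.isclose(avg_obs, ereturn["g_avg"], rel_tol=1e-6)

print(f"N / N_g = {avg_obs:.6f}, matches g_avg: {is_consistent}")
```

With a live session, the same expressions would presumably work on `stata.ereturn` itself.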

Stata's `return` results are available through the `bok.cstata.Stata.rreturn` property, and Stata's `sreturn` results through the `bok.cstata.Stata.sreturn` property.

All of Stata's return values can be obtained through the `bok.cstata.Stata.returns` property, which returns Stata's ereturn, return, and sreturn results together as a `types.SimpleNamespace`. The ereturn, return, and sreturn results can then be accessed as follows.

In [3]:
# myret = stata.returns  # myret.e, myret.r, myret.s # types.SimpleNamespace
# myret.e['df_m'], myret.e['N_g']

(15.0, 140.0)
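
For readers unfamiliar with `types.SimpleNamespace`, the sketch below illustrates the access pattern with a hand-built stand-in. The attribute names `e`, `r`, and `s` follow the comment in the cell above; the contents are illustrative only (two values copied from the output above and empty dictionaries for `r` and `s`), not what the library actually returns.

```python
from types import SimpleNamespace

# Stand-in for stata.returns: one attribute per Stata result class.
# Only two e() scalars are filled in; r() and s() are left empty
# purely for illustration.
myret = SimpleNamespace(
    e={"df_m": 15.0, "N_g": 140.0},  # stands in for ereturn results
    r={},                            # stands in for return results
    s={},                            # stands in for sreturn results
)

# Attribute access selects the result class; key access selects the item.
df_m, n_groups = myret.e["df_m"], myret.e["N_g"]
print(df_m, n_groups)

# vars() lists the attributes held by the namespace.
result_classes = sorted(vars(myret))
print(result_classes)
```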

## Returning the Coefficient and Variance-Covariance Matrix Estimates

Stata's `e(b)` and `e(V)` are especially useful and can be retrieved with the `get_b` and `get_V` methods of the `bok.cstata.Stata` class. The results are returned as a `pandas` `Series` or `DataFrame`.

In [4]:
# stata.get_b()

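|  | L.n | L2.n | w | L.w | k | L.k | L2.k | ys | L.ys | L2.ys | yr1980 | yr1981 | yr1982 | yr1983 | yr1984 | _cons |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0.976745 | -0.083665 | -0.563122 | 0.567323 | 0.284928 | -0.087607 | -0.096145 | 0.613859 | -0.765499 | 0.114054 | 0.009473 | -0.024805 | -0.030371 | -0.009715 | -0.021445 | 0.324696 |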


In [5]:
# stata.get_V()

|  | L.n | L2.n | w | L.w | k | L.k | L2.k | ys | L.ys | L2.ys | yr1980 | yr1981 | yr1982 | yr1983 | yr1984 | _cons |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| L.n | 0.02011 | -0.002214 | 0.000248 | 0.008722 | -0.000708 | -0.006291 | -0.002137 | 0.003685 | -0.007614 | 0.001542 | 0.001065 | 0.001066 | 0.001442 | 0.002267 | 0.002091 | -0.041131 |
| L2.n | -0.002214 | 0.001758 | -0.001165 | -0.000475 | 0.000214 | 0.000571 | -0.000216 | 0.000143 | 0.001097 | -0.002085 | -3.3e-05 | 5.8e-05 | 1.8e-05 | -0.000195 | -8.4e-05 | 0.009884 |
| w | 0.000248 | -0.001165 | 0.022837 | -0.026508 | -0.000192 | 0.001281 | -0.000123 | -0.012787 | 0.02755 | -0.005954 | 0.000316 | -0.000446 | 0.001482 | 0.001643 | 0.001259 | -0.02949 |
| L.w | 0.008722 | -0.000475 | -0.026508 | 0.045094 | 0.002033 | -0.005736 | -0.001014 | 0.017392 | -0.043623 | 0.011116 | -1.6e-05 | 0.000708 | -0.002114 | -0.002622 | -0.002148 | 0.003265 |
| k | -0.000708 | 0.000214 | -0.000192 | 0.002033 | 0.004469 | -0.003973 | 2.8e-05 | 0.000254 | -0.001346 | 0.002153 | 0.000335 | 0.000485 | 0.000634 | 0.000396 | 0.000273 | -0.009918 |
| L.k | -0.006291 | 0.000571 | 0.001281 | -0.005736 | -0.003973 | 0.007591 | -0.000554 | -0.003666 | 0.005951 | -0.001209 | -0.000796 | -0.001094 | -0.000919 | -0.000922 | -0.000877 | 0.01667 |
| L2.k | -0.002137 | -0.000216 | -0.000123 | -0.001014 | 2.8e-05 | -0.000554 | 0.001879 | 0.000869 | -0.000411 | -0.001163 | -6.4e-05 | -6e-06 | -0.000412 | -0.000524 | -0.000453 | 0.010156 |
| ys | 0.003685 | 0.000143 | -0.012787 | 0.017392 | 0.000254 | -0.003666 | 0.000869 | 0.031723 | -0.026056 | -0.007984 | 0.001626 | 0.003599 | 0.00114 | -0.000233 | -0.000572 | -0.008382 |
| L.ys | -0.007614 | 0.001097 | 0.02755 | -0.043623 | -0.001346 | 0.005951 | -0.000411 | -0.026056 | 0.061013 | -0.017598 | -0.000349 | -0.00023 | 0.003373 | 0.003693 | 0.002769 | -0.023891 |
| L2.ys | 0.001542 | -0.002085 | -0.005954 | 0.011116 | 0.002153 | -0.001209 | -0.001163 | -0.007984 | -0.017598 | 0.029777 | -0.000947 | -0.002712 | -0.001763 | -7.6e-05 | 0.000517 | -0.034779 |
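
A common next step is to turn these two objects into standard errors and z statistics. The sketch below does this for the first three regressors only, with `b` and `V` built by hand from the values printed above instead of calling `get_b` and `get_V`. It assumes, based on the printed output, that `get_b` yields a one-row `DataFrame` with index `0`; if it returns a `Series` instead, `b.iloc[0]` would simply be replaced by `b`.

```python
import numpy as np
import pandas as pd

names = ["L.n", "L2.n", "w"]

# Hand-built stand-ins for stata.get_b() and stata.get_V(), restricted to
# the first three regressors; values are copied from the outputs above.
b = pd.DataFrame([[0.976745, -0.083665, -0.563122]], columns=names)
V = pd.DataFrame(
    [
        [0.02011, -0.002214, 0.000248],
        [-0.002214, 0.001758, -0.001165],
        [0.000248, -0.001165, 0.022837],
    ],
    index=names,
    columns=names,
)

# Standard errors are the square roots of the diagonal of V.
se = pd.Series(np.sqrt(np.diag(V.to_numpy())), index=V.index)

# z statistic for each coefficient: estimate divided by its standard error.
coef = b.iloc[0]
z = coef / se

summary = pd.DataFrame({"coef": coef, "se": se, "z": z})
print(summary.round(4))
```

The division aligns on the variable names, so the coefficient vector and the standard errors do not need to be in the same order.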


In [6]:
# stata = Stata('/Applications/Stata', 'mp')
# stata.get_ready() # we need this
# stata.run('use abdata, clear')
# stata.run('xtdpdsys n l(0/1).w l(0/2).(k ys) yr1980-yr1984, lags(2) twostep vce(robust)')
# eret = stata.ereturn
# myret = stata.returns  # myret.e, myret.r, myret.s
# myret.e['df_m'], myret.e['N_g']

. use abdata, clear
. xtdpdsys n l(0/1).w l(0/2).(k ys) yr1980-yr1984, lags(2) twostep vce(robust)

System dynamic panel-data estimation            Number of obs     =        751
Group variable: id                              Number of groups  =        140
Time variable: year
                                                Obs per group:
                                                              min =          5
                                                              avg =   5.364286
                                                              max =          7

Number of instruments =     48                  Wald chi2(15)     =    1449.65
                                                Prob > chi2       =     0.0000
Two-step results
------------------------------------------------------------------------------
             |              WC-robust
           n | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+------------------------------------

(15.0, 140.0)