# Panel Data Summary Statistics
- Author: Prof. Chirok Han (한치록), Department of Economics, Korea University

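In econometrics, [Stata] has become the de facto standard tool for panel data analysis. In Python, [Statsmodels][sm] and [linearmodels][lm] also provide some panel data tools. However, their terminology and handling differ from Stata's, which can cause considerable confusion. In addition, [linearmodels][lm] processes formulas differently from [Statsmodels][sm] (for example, in how the intercept is handled), which is another source of confusion. The `paneldata.py` module in the `bok_da` library therefore performs panel data analysis in a way that follows Stata as closely as possible.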

[Stata]: https://stata.com/
[sm]: https://www.statsmodels.org/
[lm]: https://bashtage.github.io/linearmodels/

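#### **(Note) Because of licensing issues, the Stata features in this manual cannot yet be used in the BIDAS environment, so all Stata-related code in this manual has been commented out. In a local environment (internal network or internet-connected network), you can uncomment that code and run it.**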

In [1]:
```python
import bok_da as bd
from bok_da.panel.linear_model import PanelData
```

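In [2]:
```python
xt = PanelData()             # create a PanelData object
xt.use('data/nlswork.dta')   # load the Stata-format nlswork dataset
xt.data                      # display the loaded data
```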

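|  | idcode | year | birth_yr | age | race | msp | nev_mar | grade | collgrad | not_smsa | ... | south | ind_code | occ_code | union | wks_ue | ttl_exp | tenure | hours | wks_work | ln_wage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 70 | 51 | 18.0 | Black | 0.0 | 1.0 | 12.0 | 0 | 0.0 | ... | 0.0 | 6.0 | 3.0 | NaN | 2.0 | 1.083333 | 0.083333 | 20.0 | 27.0 | 1.451214 |
| 1 | 1 | 71 | 51 | 19.0 | Black | 1.0 | 0.0 | 12.0 | 0 | 0.0 | ... | 0.0 | 4.0 | 6.0 | NaN | 22.0 | 1.275641 | 0.083333 | 44.0 | 10.0 | 1.028620 |
| 2 | 1 | 72 | 51 | 20.0 | Black | 1.0 | 0.0 | 12.0 | 0 | 0.0 | ... | 0.0 | 4.0 | 6.0 | 1.0 | 0.0 | 2.256410 | 0.916667 | 40.0 | 51.0 | 1.589977 |
| 3 | 1 | 73 | 51 | 21.0 | Black | 1.0 | 0.0 | 12.0 | 0 | 0.0 | ... | 0.0 | 4.0 | 6.0 | NaN | 0.0 | 2.314102 | 0.083333 | 40.0 | 3.0 | 1.780273 |
| 4 | 1 | 75 | 51 | 23.0 | Black | 1.0 | 0.0 | 12.0 | 0 | 0.0 | ... | 0.0 | 5.0 | 6.0 | NaN | 0.0 | 2.775641 | 0.166667 | 10.0 | 24.0 | 1.777012 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 28529 | 5159 | 80 | 44 | 35.0 | Black | 0.0 | 0.0 | 12.0 | 0 | 0.0 | ... | 1.0 | 11.0 | 3.0 | 1.0 | NaN | 5.000000 | 5.000000 | 39.0 | 98.0 | 1.784807 |
| 28530 | 5159 | 82 | 44 | 37.0 | Black | 0.0 | 0.0 | 12.0 | 0 | 0.0 | ... | 1.0 | 11.0 | 3.0 | 0.0 | NaN | 7.000000 | 7.000000 | 38.0 | 98.0 | 1.871802 |
| 28531 | 5159 | 83 | 44 | 38.0 | Black | 0.0 | 0.0 | 12.0 | 0 | 0.0 | ... | 1.0 | 11.0 | 3.0 | 1.0 | 0.0 | 8.076923 | 8.000000 | 38.0 | 56.0 | 1.843853 |
| 28532 | 5159 | 85 | 44 | 40.0 | Black | 0.0 | 0.0 | 12.0 | 0 | 0.0 | ... | 1.0 | 11.0 | 3.0 | 1.0 | 0.0 | 9.076923 | 0.000000 | 40.0 | 52.0 | 1.799792 |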


In [3]:
```python
xt.xtset('idcode', 'year')   # declare the panel (unit) and time variables, as Stata's xtset does
```
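`xtset` declares which column identifies units and which identifies time. For nlswork, Stata's `xtset` output in the Stata interface section describes the panel as "unbalanced" with time "gaps". The sketch below is a rough cross-check of those two properties using plain pandas. `panel_structure` is a hypothetical helper, not part of `bok_da`, and the toy DataFrame is made up. The sketch assumes an integer-valued time variable with step `delta` (Stata's `Delta`, 1 unit for nlswork).

```python
import pandas as pd


def panel_structure(df: pd.DataFrame, id_col: str, time_col: str, delta: int = 1) -> dict:
    """Hypothetical helper (not part of bok_da): report balance and time gaps.

    Assumes an integer-valued time variable with step `delta`.
    """
    d = df[[id_col, time_col]].dropna()
    t_min, t_max = int(d[time_col].min()), int(d[time_col].max())
    n_periods = (t_max - t_min) // delta + 1
    # Strongly balanced: every unit is observed in every period of the overall range
    periods_per_unit = d.groupby(id_col)[time_col].nunique()
    balanced = bool((periods_per_unit == n_periods).all())
    # Gaps: some unit skips a period between two of its consecutive observations
    steps = d.sort_values([id_col, time_col]).groupby(id_col)[time_col].diff()
    has_gaps = bool((steps > delta).any())
    return {"t_min": t_min, "t_max": t_max, "balanced": balanced, "has_gaps": has_gaps}


# Toy panel: unit 1 skips year 72, unit 2 stops after year 71
toy = pd.DataFrame({"idcode": [1, 1, 1, 2, 2], "year": [70, 71, 73, 70, 71]})
print(panel_structure(toy, "idcode", "year"))
# {'t_min': 70, 't_max': 73, 'balanced': False, 'has_gaps': True}
```

Applied to `xt.data` with `"idcode"` and `"year"`, the same check should agree with Stata's "unbalanced ... but with gaps" description. This is an expectation, not a result reproduced here.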

In [4]:
```python
xt.xtsum(['hours', 'union'])   # overall / between / within summary statistics
```

```
{'hours': {'N': 28467,
  'n': 4710,
  'Tbar': 6.043949044585987,
  'mean': 36.559560192503604,
  'sd': 9.869449877206327,
  'min': 1.0,
  'max': 168.0,
  'sd_b': 7.845752246665913,
  'min_b': 1.0,
  'max_b': 83.5,
  'sd_w': 7.520580046715213,
  'min_w': -2.1547255217821117,
  'max_w': 130.0595601925036},
 'union': {'N': 19238,
  'n': 4150,
  'Tbar': 4.635662650602409,
  'mean': 0.23443185362303773,
  'sd': 0.4236431984936196,
  'min': 0.0,
  'max': 1.0,
  'sd_b': 0.334140036333863,
  'min_b': 0.0,
  'max_b': 1.0,
  'sd_w': 0.26685523448044257,
  'min_w': -0.6822348130436289,
  'max_w': 1.1510985202897044}}
```
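Each variable's statistics follow Stata's `xtsum` decomposition:

- **Overall** uses all N non-missing observations.
- **Between** uses the n unit means x̄_i.
- **Within** uses x_it − x̄_i + x̄, i.e. the grand mean is added back. That is why `min_w` for `hours` can be negative (−2.15).
- **T-bar** is N/n.

The sketch below re-implements this for one variable with pandas. It is an illustration of those formulas, not the `bok_da` source. `xtsum_sketch` and the toy data are hypothetical, and `ddof=1` is chosen because Stata uses the N−1 and n−1 denominators.

```python
import numpy as np
import pandas as pd


def xtsum_sketch(df: pd.DataFrame, var: str, id_col: str, ddof: int = 1) -> dict:
    """Hypothetical re-implementation of Stata-style xtsum for a single variable.

    ddof=1 gives the N-1 / n-1 denominators that Stata uses.
    """
    d = df[[id_col, var]].dropna()               # keep non-missing observations of var only
    x = d[var]
    grand_mean = x.mean()
    unit_means = d.groupby(id_col)[var].mean()   # between: one mean per unit
    # within: x_it - xbar_i + xbar (the grand mean is added back)
    within = x - d.groupby(id_col)[var].transform("mean") + grand_mean
    N, n = len(x), len(unit_means)
    return {
        "N": N, "n": n, "Tbar": N / n,
        "mean": grand_mean, "sd": x.std(ddof=ddof), "min": x.min(), "max": x.max(),
        "sd_b": unit_means.std(ddof=ddof), "min_b": unit_means.min(), "max_b": unit_means.max(),
        "sd_w": within.std(ddof=ddof), "min_w": within.min(), "max_w": within.max(),
    }


# Toy panel: the missing value is dropped, so unit 1 contributes two observations
toy = pd.DataFrame({"idcode": [1, 1, 1, 2, 2], "hours": [20.0, 40.0, np.nan, 30.0, 50.0]})
stats = xtsum_sketch(toy, "hours", "idcode")
print(stats["N"], stats["n"], stats["Tbar"])   # 4 2 2.0
print(stats["min_w"], stats["max_w"])          # 25.0 45.0
```

Because missing values are dropped per variable, N and n can differ across variables. This is consistent with `union` showing N = 19238 and n = 4150 while `hours` shows 28467 and 4710.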

# Stata Interface

The Stata interface is convenient to use. To use it, install the following packages:

```
pip install pystata stata_setup
```

After installation, refer to manuals `8-01`, `8-02` or `9-01`, `9-02` for details. A brief example is given below.

In [4]:
```python
# from bok.stata import Stata

# st = Stata("/Applications/Stata", "mp").get_ready()
# st.run('use nlswork, clear')
# st.run('xtset')
# st.run('xtsum hours union')
```

```
. use nlswork, clear
(National Longitudinal Survey of Young Women, 14-24 years old in 1968)
. xtset

Panel variable: idcode (unbalanced)
 Time variable: year, 68 to 88, but with gaps
         Delta: 1 unit
. xtsum hours union

Variable         |      Mean   Std. dev.       Min        Max |    Observations
-----------------+--------------------------------------------+----------------
hours    overall |  36.55956   9.869623          1        168 |     N =   28467
         between |             7.846585          1       83.5 |     n =    4710
         within  |             7.520712  -2.154726   130.0596 | T-bar = 6.04395
                 |                                            |
union    overall |  .2344319   .4236542          0          1 |     N =   19238
         between |             .3341803          0          1 |     n =    4150
         within  |             .2668622  -.6822348   1.151099 | T-bar = 4.63566
```
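Compare the Stata output with the `xt.xtsum` output from the first part of this manual. N, n, T-bar, the means, and the min/max values agree. The standard deviations differ only in the later digits. That pattern fits a difference in denominators: the `bok_da` figures look like population (ddof=0) standard deviations, while Stata reports sample (ddof=1) ones. This is an inference from the arithmetic below, not from the `bok_da` source. Two rescalings reproduce Stata's numbers to the displayed precision:

- multiply the overall and within values by √(N/(N−1));
- multiply the between value by √(n/(n−1)).

`to_ddof1` is a hypothetical helper.

```python
import math

# Standard deviations (and N, n) copied from the bok_da xt.xtsum(['hours', 'union']) output
py = {
    "hours": {"N": 28467, "n": 4710, "sd": 9.869449877206327,
              "sd_b": 7.845752246665913, "sd_w": 7.520580046715213},
    "union": {"N": 19238, "n": 4150, "sd": 0.4236431984936196,
              "sd_b": 0.334140036333863, "sd_w": 0.26685523448044257},
}
# The same statistics as printed by Stata's xtsum above
stata = {
    "hours": {"sd": 9.869623, "sd_b": 7.846585, "sd_w": 7.520712},
    "union": {"sd": 0.4236542, "sd_b": 0.3341803, "sd_w": 0.2668622},
}


def to_ddof1(s: dict) -> dict:
    """Hypothetical helper: rescale population (ddof=0) SDs to the sample (ddof=1) convention."""
    N, n = s["N"], s["n"]
    return {
        "sd": s["sd"] * math.sqrt(N / (N - 1)),      # overall: N observations
        "sd_b": s["sd_b"] * math.sqrt(n / (n - 1)),  # between: n unit means
        "sd_w": s["sd_w"] * math.sqrt(N / (N - 1)),  # within: N transformed observations
    }


for var in py:
    rescaled = to_ddof1(py[var])
    for key in ("sd", "sd_b", "sd_w"):
        print(f"{var:6s} {key:5s} bok_da={py[var][key]:.7f} "
              f"rescaled={rescaled[key]:.7f} Stata={stata[var][key]}")
```

The differences are small for samples of this size. However, with few units the between standard deviation is the statistic most affected, because its denominator is n rather than N.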
