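# Fixed Effects Models

## Panel Data

Data that observe the same individuals at multiple points in time are called **panel data**.

When analyzing panel data, the effects of differences between individuals and of period-specific shocks need to be removed. A representative method for doing so is the **fixed effects model**.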

In [6]:
import statsmodels.api as sm
import statsmodels.formula.api as smf

data = sm.datasets.get_rdataset("Grunfeld", package="plm").data
data

# Grunfeld is a balanced panel of 10 US firms observed from 1935 to 1954
# firm: firm ID
# inv: gross investment
# value: value of the firm
# capital: capital stock

Unnamed: 0,firm,year,inv,value,capital
0,1,1935,317.60,3078.50,2.80
1,1,1936,391.80,4661.70,52.60
2,1,1937,410.60,5387.10,156.90
3,1,1938,257.70,2792.20,209.20
4,1,1939,330.80,4313.20,203.40
...,...,...,...,...,...
195,10,1950,3.42,69.05,8.74
196,10,1951,4.67,83.04,9.07
197,10,1952,6.00,74.42,9.93
198,10,1953,6.53,63.51,11.68


## Pooled OLS

In panel data analysis, fitting an ordinary multiple regression to the stacked observations without accounting for any fixed effects is called **Pooled OLS**.
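As a rough illustration of what pooling means, the sketch below stacks a synthetic panel and runs a single regression on it. The data-generating process (true slope 2, individual effects `theta` correlated with the regressor), the seed and all variable names are assumptions made for this example, not part of the Grunfeld analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 50, 10                                      # hypothetical panel: 50 individuals, 10 periods
theta = rng.normal(0.0, 2.0, size=(n, 1))          # individual effects (one per row)
X = theta + rng.normal(0.0, 1.0, size=(n, T))      # regressor correlated with theta
Y = 1.0 + 2.0 * X + theta + rng.normal(0.0, 1.0, size=(n, T))  # true slope is 2

# Pooled OLS: stack all n*T observations and ignore which individual they belong to
design = np.column_stack([np.ones(n * T), X.ravel()])
beta_pooled, *_ = np.linalg.lstsq(design, Y.ravel(), rcond=None)
print(beta_pooled)  # [intercept, slope]
```

Because `theta` is left in the error term and is correlated with `X` here, the pooled slope tends to land above the true value of 2, which is the problem the fixed effects estimators in the following sections address.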


## one-way fixed effect model

A model that handles a single fixed effect, such as time or the individual. It is called a one-way fixed effect model.

### Entity fixed effects model

Consider the following entity fixed effects model.

$$
Y_{it} = \beta_0 + \beta_1 X_{it} +\theta_i + \varepsilon_{it}
$$

When panel data are available, the entity fixed effects $\theta_i$ can be removed by any of the following three methods.

#### (1) OLS estimation of the "first difference model"

$$
(Y_{i,t+1} - Y_{it}) = (\beta_0 - \beta_0) + \beta_1 (X_{i,t+1}-X_{it}) +  (\varepsilon_{i,t+1} - \varepsilon_{it})
$$

Writing $\Delta Y_{it} = Y_{i,t+1} - Y_{it}$ (and likewise for $X$ and $\varepsilon$), this becomes

$$
\Delta Y_{it} = \beta_1 \Delta X_{it}+ \Delta \varepsilon_{it}
$$

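- Estimation procedure:
  1. For the dependent and explanatory variables, subtract the period-$t$ value from the period-$t+1$ value
  2. Estimate the equation above by OLS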
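The two steps above can be sketched compactly with `groupby(...).diff()`, which takes differences within each individual. The synthetic panel below (true slope 2, column names `x` and `y`, seed) is an assumption for illustration only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n, T = 30, 8                                   # hypothetical balanced panel
firm = np.repeat(np.arange(n), T)
year = np.tile(np.arange(T), n)
theta = rng.normal(0.0, 3.0, size=n)[firm]     # individual effect, constant over time
x = theta + rng.normal(size=n * T)
y = 1.0 + 2.0 * x + theta + rng.normal(scale=0.5, size=n * T)
panel = pd.DataFrame({"firm": firm, "year": year, "x": x, "y": y})

# Step 1: difference consecutive periods within each firm (first row per firm becomes NaN)
panel = panel.sort_values(["firm", "year"])
diffs = panel.groupby("firm")[["x", "y"]].diff().dropna()

# Step 2: OLS without an intercept, since beta_0 and theta_i both cancel in the difference
dx, dy = diffs["x"].to_numpy(), diffs["y"].to_numpy()
beta_fd = (dx @ dy) / (dx @ dx)
print(beta_fd)
```

Each firm loses one observation to differencing, so `diffs` has `n * (T - 1)` rows.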

In [97]:
import pandas as pd
data = sm.datasets.get_rdataset("Grunfeld", package="plm").data

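deltas = []
data = data.sort_values(["firm", "year"])
for firm in data["firm"].unique():
    d = data.query(f"firm == {firm}").copy()
    delta = d - d.shift(1)       # difference from the previous year within the firm
    delta["year"] = d["year"]    # restore the identifiers, which were differenced too
    delta["firm"] = firm
    deltas.append(delta)
# dropna() removes each firm's first year, which has no previous observation
delta = pd.concat(deltas).dropna().sort_values("firm").reset_index(drop=True)
delta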

Unnamed: 0,firm,year,inv,value,capital
0,1,1936,74.20,1583.20,49.80
1,1,1954,182.30,-648.10,449.00
2,1,1953,413.20,1316.80,346.80
3,1,1952,135.30,91.90,222.80
4,1,1951,113.00,1077.40,108.70
...,...,...,...,...,...
185,10,1937,0.19,-5.74,-0.14
186,10,1936,-0.54,17.03,0.21
187,10,1953,0.53,-10.91,1.75
188,10,1944,0.25,-0.42,-0.17


In [103]:
first_diff = sm.OLS.from_formula(formula="value ~ -1 + inv + capital", data=delta).fit()
first_diff.summary()

0,1,2,3
Dep. Variable:,value,R-squared (uncentered):,0.389
Model:,OLS,Adj. R-squared (uncentered):,0.383
Method:,Least Squares,F-statistic:,59.85
Date:,"Sun, 28 May 2023",Prob (F-statistic):,7.69e-21
Time:,03:03:13,Log-Likelihood:,-1351.2
No. Observations:,190,AIC:,2706.0
Df Residuals:,188,BIC:,2713.0
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
inv,4.3070,0.398,10.816,0.000,3.521,5.092
capital,-1.5319,0.339,-4.517,0.000,-2.201,-0.863

0,1,2,3
Omnibus:,60.467,Durbin-Watson:,2.603
Prob(Omnibus):,0.0,Jarque-Bera (JB):,1184.992
Skew:,-0.564,Prob(JB):,4.81e-258
Kurtosis:,15.182,Cond. No.,1.38


#### (2) OLS estimation with "$n-1$ dummy explanatory variables"

Also known as least squares dummy variables (LSDV) estimation.

$$
Y_{it} = \beta_0 + \beta_1 X_{it} + \gamma_2 D2_i + \cdots + \gamma_n Dn_i + \varepsilon_{it}  \\
  \text{where } D2_i = 
  \begin{cases}
  1 & \text{for } i = 2\\
  0 & \text{otherwise}
  \end{cases} \text{, etc.}
$$

- Estimation procedure:
  1. Create the entity dummy variables $D2_i, \cdots, Dn_i$ (each equals 1 for its own entity $i$ and 0 otherwise)
  2. Estimate the equation above by OLS
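To make the dummy construction explicit, the sketch below builds the $n-1$ columns by hand instead of relying on a formula interface. The synthetic data (true slope 2), the seed and the variable names are assumptions for this example.

```python
import numpy as np

rng = np.random.default_rng(2)
n, T = 20, 6                                   # hypothetical balanced panel
firm = np.repeat(np.arange(1, n + 1), T)       # firm IDs 1..n, each repeated T times
theta = rng.normal(0.0, 3.0, size=n)[firm - 1]
x = theta + rng.normal(size=n * T)
y = 1.0 + 2.0 * x + theta + rng.normal(scale=0.5, size=n * T)

# Step 1: dummies D2..Dn (firm 1 is the reference category absorbed by the intercept)
dummies = (firm[:, None] == np.arange(2, n + 1)).astype(float)

# Step 2: OLS on [intercept, x, D2, ..., Dn]
design = np.column_stack([np.ones(n * T), x, dummies])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
beta1_lsdv = coef[1]
print(beta1_lsdv)
```

Only $n-1$ dummies are included because a full set of $n$ dummies plus an intercept would be perfectly collinear.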

In [100]:
data = sm.datasets.get_rdataset("Grunfeld", package="plm").data

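# Note: this cell uses year dummies, so it absorbs time fixed effects rather than entity fixed effects.
# The entity (firm) version is in the next cell.
lsdv = sm.OLS.from_formula(
    formula="value ~ inv + capital + year",
    data=data.assign(year = data["year"].astype("category")) # a category column is expanded into dummy variables automatically
).fit()
lsdv.summary()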

0,1,2,3
Dep. Variable:,value,R-squared:,0.76
Model:,OLS,Adj. R-squared:,0.732
Method:,Least Squares,F-statistic:,26.87
Date:,"Sun, 28 May 2023",Prob (F-statistic):,2.44e-44
Time:,03:00:34,Log-Likelihood:,-1576.7
No. Observations:,200,AIC:,3197.0
Df Residuals:,178,BIC:,3270.0
Df Model:,21,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,317.1238,215.870,1.469,0.144,-108.870,743.117
year[T.1936],213.4092,304.447,0.701,0.484,-387.380,814.198
year[T.1937],377.3396,304.577,1.239,0.217,-223.707,978.386
year[T.1938],135.6454,304.916,0.445,0.657,-466.070,737.361
year[T.1939],356.8702,305.146,1.170,0.244,-245.300,959.040
year[T.1940],227.5702,304.929,0.746,0.456,-374.171,829.312
year[T.1941],38.1648,305.137,0.125,0.901,-563.987,640.316
year[T.1942],-65.0128,305.643,-0.213,0.832,-668.163,538.138
year[T.1943],79.6332,305.908,0.260,0.795,-524.039,683.306

0,1,2,3
Omnibus:,36.735,Durbin-Watson:,0.326
Prob(Omnibus):,0.0,Jarque-Bera (JB):,63.268
Skew:,0.951,Prob(JB):,1.83e-14
Kurtosis:,4.993,Cond. No.,9650.0


In [101]:
data = sm.datasets.get_rdataset("Grunfeld", package="plm").data

lsdv = sm.OLS.from_formula(
    formula="value ~ inv + capital + firm",
    data=data.assign(firm = data["firm"].astype("category")) # a category column is expanded into dummy variables automatically
).fit()
lsdv.summary()

0,1,2,3
Dep. Variable:,value,R-squared:,0.961
Model:,OLS,Adj. R-squared:,0.958
Method:,Least Squares,F-statistic:,415.7
Date:,"Sun, 28 May 2023",Prob (F-statistic):,1.60e-125
Time:,03:00:51,Log-Likelihood:,-1396.3
No. Observations:,200,AIC:,2817.0
Df Residuals:,188,BIC:,2856.0
Df Model:,11,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,2926.5580,138.597,21.116,0.000,2653.152,3199.964
firm[T.2],-1977.3704,92.074,-21.476,0.000,-2159.000,-1795.740
firm[T.3],-1074.1620,154.332,-6.960,0.000,-1378.606,-769.718
firm[T.4],-2417.7546,138.428,-17.466,0.000,-2690.826,-2144.683
firm[T.5],-2624.3942,172.507,-15.213,0.000,-2964.691,-2284.097
firm[T.6],-2611.9931,144.390,-18.090,0.000,-2896.826,-2327.160
firm[T.7],-2752.7584,161.271,-17.069,0.000,-3070.893,-2434.624
firm[T.8],-2334.6595,146.157,-15.974,0.000,-2622.978,-2046.341
firm[T.9],-2561.2563,161.357,-15.873,0.000,-2879.559,-2242.954

0,1,2,3
Omnibus:,38.057,Durbin-Watson:,1.562
Prob(Omnibus):,0.0,Jarque-Bera (JB):,212.86
Skew:,0.515,Prob(JB):,6e-47
Kurtosis:,7.948,Cond. No.,10400.0


#### (3) OLS estimation with "entity-demeaned" data

$$
\begin{align}
\tilde{Y}_{it} &= \beta_1 \tilde{X}_{it} + \tilde{\varepsilon}_{it}, \\
\text{where }
\tilde{Y}_{it} &= Y_{it} - \bar{Y}_i, \hspace{1em}   \bar{Y}_i = \frac{1}{T} \sum^T_{t=1} Y_{it}\\
\tilde{X}_{it} &= X_{it} - \bar{X}_i, \hspace{1em} \bar{X}_i  = \frac{1}{T} \sum^T_{t=1} X_{it}\\
\tilde{\varepsilon}_{it} &= \varepsilon_{it}- \bar{\varepsilon}_i, \hspace{1em} \bar{\varepsilon}_i = \frac{1}{T} \sum_{t=1}^T \varepsilon_{it}
\end{align}
$$
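The transformed equation follows from the entity fixed effects model in one step. Averaging that model over $t$ for a given individual $i$ gives

$$
\bar{Y}_i = \beta_0 + \beta_1 \bar{X}_i + \theta_i + \bar{\varepsilon}_i
$$

and subtracting this from the original equation cancels both $\beta_0$ and $\theta_i$, since neither varies over time:

$$
Y_{it} - \bar{Y}_i = \beta_1 (X_{it} - \bar{X}_i) + (\varepsilon_{it} - \bar{\varepsilon}_i)
$$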

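- Estimation procedure:
  1. For the dependent and explanatory variables, subtract each entity's mean over time from the variable
  2. Estimate the equation above by OLS
- This yields the same coefficient estimates as the estimation with $n-1$ entity dummy variables
- Statistical software usually estimates the model by demeaning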
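The equivalence with the dummy-variable approach can be checked numerically. The sketch below demeans by entity with `groupby(...).transform("mean")` and compares the slope with an LSDV fit on the same data. The synthetic panel, the seed and the names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n, T = 20, 6                                   # hypothetical balanced panel
firm = np.repeat(np.arange(n), T)
theta = rng.normal(0.0, 3.0, size=n)[firm]
x = theta + rng.normal(size=n * T)
y = 1.0 + 2.0 * x + theta + rng.normal(scale=0.5, size=n * T)
panel = pd.DataFrame({"firm": firm, "x": x, "y": y})

# Entity demeaning: subtract each firm's own time average from x and y
means = panel.groupby("firm")[["x", "y"]].transform("mean")
demeaned = panel[["x", "y"]] - means
xt, yt = demeaned["x"].to_numpy(), demeaned["y"].to_numpy()
beta_within = (xt @ yt) / (xt @ xt)            # OLS without intercept

# LSDV on the same data for comparison (firm 0 is the reference category)
dummies = (firm[:, None] == np.arange(1, n)).astype(float)
design = np.column_stack([np.ones(n * T), x, dummies])
beta_lsdv = np.linalg.lstsq(design, y, rcond=None)[0][1]

print(beta_within, beta_lsdv)                  # the two slopes agree up to rounding
```

Only the coefficients coincide. A plain OLS on demeaned data counts its degrees of freedom without the $n$ estimated means, so its standard errors differ from the LSDV ones unless they are corrected.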

In [133]:
data = sm.datasets.get_rdataset("Grunfeld", package="plm").data
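group = "year"  # demeans within each year (removes time fixed effects); set to "firm" for entity demeaning

rows = []
for _, d in data.groupby(group):
    d = d.copy()  # work on a copy so the assignment below does not touch the original frame
    for col in ["value", "inv", "capital"]:
        d[col] = (d[col] - d[col].mean())
    rows.append(d)
df = pd.concat(rows)

entity_demeaned = sm.OLS.from_formula(formula="value ~ -1 + inv + capital", data=df).fit()
entity_demeaned.summary()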

0,1,2,3
Dep. Variable:,value,R-squared (uncentered):,0.755
Model:,OLS,Adj. R-squared (uncentered):,0.752
Method:,Least Squares,F-statistic:,304.7
Date:,"Sun, 28 May 2023",Prob (F-statistic):,3.71e-61
Time:,07:35:02,Log-Likelihood:,-1576.7
No. Observations:,200,AIC:,3157.0
Df Residuals:,198,BIC:,3164.0
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
inv,5.6215,0.289,19.456,0.000,5.052,6.191
capital,-0.2984,0.238,-1.256,0.210,-0.767,0.170

0,1,2,3
Omnibus:,36.735,Durbin-Watson:,2.517
Prob(Omnibus):,0.0,Jarque-Bera (JB):,63.268
Skew:,0.951,Prob(JB):,1.83e-14
Kurtosis:,4.993,Cond. No.,2.25


In [124]:
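# Note: mixedlm fits a random-intercept (random effects) model for the groups, not a fixed effects model
md = smf.mixedlm("value ~ inv + capital", data, groups=data["year"])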
mdf = md.fit()
print(mdf.summary())


          Mixed Linear Model Regression Results
Model:            MixedLM Dependent Variable: value      
No. Observations: 200     Method:             REML       
No. Groups:       20      Scale:              444235.8454
Min. group size:  10      Log-Likelihood:     -1579.1760 
Max. group size:  10      Converged:          Yes        
Mean group size:  10.0                                   
---------------------------------------------------------
             Coef.  Std.Err.   z    P>|z|  [0.025  0.975]
---------------------------------------------------------
Intercept   410.816   49.405  8.315 0.000 313.983 507.648
inv           5.760    0.279 20.671 0.000   5.214   6.306
capital      -0.615    0.083 -7.406 0.000  -0.778  -0.452
Group Var     0.000                                      





## two-way fixed effect model

A model that controls for two effects at once, such as time effects plus entity effects.


$$
Y_{it} =\beta_1 X_{it} +\theta_i + \pi_t + \varepsilon_{it}
$$

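To remove both the entity fixed effects $\theta_i$ and the time fixed effects $\pi_t$, the estimation methods above are combined.

1. OLS estimation with $n-1$ entity dummy variables and $T-1$ time dummy variables
2. OLS estimation with entity demeaning and $T-1$ time dummy variables
3. OLS estimation with time demeaning and $n-1$ entity dummy variables
4. OLS estimation with entity & time demeaning
     - For the explanatory and dependent variables, subtract both the entity mean and the time mean (adding the overall mean back once in a balanced panel), then estimate by OLS

Note that in econometric analyses using panel data it is rarely possible to assume that there are no time fixed effects, so a two-way fixed effects model is normally used rather than a one-way one.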


In [136]:
data = sm.datasets.get_rdataset("Grunfeld", package="plm").data

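rows = []
for group in ["year", "firm"]:
    for _, d in data.groupby(group):
        for col in ["value", "inv", "capital"]:
            d[col] = (d[col] - d[col].mean())
    # Caution: this append sits outside the inner loop, so only the last group of each pass
    # is kept (year 1954: 10 rows, firm 10: 20 rows). That is why the summary reports 30 observations,
    # and the result is not a two-way within estimate on the full panel.
    rows.append(d)
df = pd.concat(rows)

entity_demeaned = sm.OLS.from_formula(formula="value ~ -1 + inv + capital", data=df).fit()
entity_demeaned.summary()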

0,1,2,3
Dep. Variable:,value,R-squared (uncentered):,0.868
Model:,OLS,Adj. R-squared (uncentered):,0.858
Method:,Least Squares,F-statistic:,91.94
Date:,"Sun, 28 May 2023",Prob (F-statistic):,4.95e-13
Time:,09:04:43,Log-Likelihood:,-217.22
No. Observations:,30,AIC:,438.4
Df Residuals:,28,BIC:,441.2
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
inv,2.5459,0.667,3.818,0.001,1.180,3.912
capital,0.7689,0.480,1.603,0.120,-0.214,1.751

0,1,2,3
Omnibus:,27.147,Durbin-Watson:,2.396
Prob(Omnibus):,0.0,Jarque-Bera (JB):,76.607
Skew:,1.709,Prob(JB):,2.32e-17
Kurtosis:,10.043,Cond. No.,5.18
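The summary above is based on 30 observations rather than the full 200, so it is worth checking the two-way transformation on its own. A minimal sketch on synthetic data follows, assuming a balanced panel stored as `(n, T)` arrays. The helper name `two_way_demean`, the seed and the data-generating process (true slope 2) are assumptions for illustration, not part of the notebook. For a balanced panel the transformation is $Y_{it} - \bar{Y}_i - \bar{Y}_t + \bar{Y}$, and its slope should match the one from a regression with both sets of dummies.

```python
import numpy as np

rng = np.random.default_rng(4)
n, T = 15, 8                                   # hypothetical balanced panel
theta = rng.normal(0.0, 3.0, size=(n, 1))      # entity effects (rows)
pi = rng.normal(0.0, 2.0, size=(1, T))         # time effects (columns)
X = theta + pi + rng.normal(size=(n, T))
Y = 2.0 * X + theta + pi + rng.normal(scale=0.5, size=(n, T))

def two_way_demean(Z):
    """Subtract entity means and time means, then add back the overall mean."""
    return Z - Z.mean(axis=1, keepdims=True) - Z.mean(axis=0, keepdims=True) + Z.mean()

# Option 4: entity & time demeaning, then OLS without intercept on all n*T observations
Xt, Yt = two_way_demean(X), two_way_demean(Y)
beta_twfe = (Xt * Yt).sum() / (Xt ** 2).sum()

# Option 1 for comparison: n-1 entity dummies and T-1 time dummies
firm = np.repeat(np.arange(n), T)              # matches the row-major order of X.ravel()
year = np.tile(np.arange(T), n)
D_firm = (firm[:, None] == np.arange(1, n)).astype(float)
D_year = (year[:, None] == np.arange(1, T)).astype(float)
design = np.column_stack([np.ones(n * T), X.ravel(), D_firm, D_year])
beta_dummies = np.linalg.lstsq(design, Y.ravel(), rcond=None)[0][1]

print(beta_twfe, beta_dummies)                 # the two slopes agree up to rounding
```

The simple "subtract both means, add back the overall mean" formula relies on the panel being balanced. With missing observations the two sets of means no longer separate this cleanly, and the dummy-variable form is the safer reference.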
