In this module, we go over the basic fixed effect (FE) and random effect (RE) models. Python's 'linearmodels' package provides these estimators, along with other econometric models dedicated to panel data analysis.

In [23]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os
import sklearn
import linearmodels

from statsmodels.tools.tools import add_constant
from statsmodels.datasets import grunfeld
from linearmodels.datasets import jobtraining
from linearmodels.datasets import wage_panel
from linearmodels import PooledOLS
from linearmodels import PanelOLS
from linearmodels import RandomEffects

In [2]:
print(linearmodels.__version__) # we expect version 4.25 for this module

4.25


In [3]:
path="C:\\Users\\gao\\GAO_Jupyter_Notebook\\Datasets"
os.chdir(path)

#path="C:\\Users\\pgao\\Documents\\PGZ Documents\\Programming Workshop\\PYTHON\\Open Courses on Python\\Udemy Course on Python\Introduction to Data Science Using Python\\datasets"
#os.chdir(path)

### I. The Theory of Panel Data

The basic model we consider here is $y_{it}=x_{it}'\beta+c_{i}+\epsilon_{it}$, where $c_{i}=d_{i}\alpha$. Our notation follows Wooldridge (2001), Frees (2004) and Greene (2003), with Wooldridge as the main source. The covariate $x_{it}$ is $K$-dimensional and $t=1,2,...,T_{i}$. The main types of models break down as follows:

   1. Pooled OLS: if $d_{i}$ contains only a constant term, so that $y_{it}=x_{it}'\beta+c+\epsilon_{it}$, the model can be estimated by OLS, and the estimator is consistent and efficient.
   2. Fixed Effect (FE): if $d_{i}$ is unobserved but correlated with the covariates $x_{it}$, then the OLS estimator of $\beta$ is biased and inconsistent as a consequence of an omitted variable. In this case, however, the model $y_{it}=x_{it}'\beta+c_{i}+\epsilon_{it}$ with $c_{i}=d_{i}\alpha$ embodies all the observable effects and specifies an estimable conditional mean. The fixed effect approach takes $c_{i}$ to be a group-specific constant (fixed) term in the regression model. Note that the term 'fixed' as used here signifies that the unobserved heterogeneity is correlated with the covariates, not that $c_{i}$ is non-stochastic. In this setup, the covariate $x_{it}$ does not contain a constant term. If you include a constant, you have to set one of the group coefficients (fixed effects) to zero for identification, which is done by leaving one of the group dummy variables out. The same holds when each group consists of the observations for a particular individual.
   3. Random Effect (RE): if the unobserved individual heterogeneity can be assumed to be uncorrelated with the included covariates, then the model can be written as $y_{it}=x_{it}'\beta+E(d_{i}\alpha)+(d_{i}\alpha-E(d_{i}\alpha))+\epsilon_{it}=\alpha+x_{it}'\beta+u_{i}+\epsilon_{it}$, that is, as a linear regression model with a compound disturbance that can be estimated consistently, albeit inefficiently, by OLS. The random effects approach specifies that $u_{i}$ is a group-specific random element, similar to $\epsilon_{it}$ except that for each group there is but a single draw that enters the regression identically in each period. For notational simplicity, we can let the covariates absorb the constant term $\alpha$.
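
The omitted-variable problem in the FE case can be illustrated with a small simulation. The sketch below is only an illustration under made-up assumptions (the panel size, the true slope of 1.5 and the 0.8 loading of $x_{it}$ on $c_{i}$ are all hypothetical): pooled OLS ignores $c_{i}$ and overstates the slope, while demeaning by individual recovers it.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, beta = 500, 5, 1.5          # hypothetical panel size and true slope

c = rng.normal(size=n)                            # unobserved heterogeneity c_i
x = 0.8 * c[:, None] + rng.normal(size=(n, T))    # x_it is correlated with c_i
y = beta * x + c[:, None] + rng.normal(size=(n, T))

# Pooled OLS slope (with an intercept): c_i is left in the error term
xc, yc = x - x.mean(), y - y.mean()
b_pooled = (xc * yc).sum() / (xc ** 2).sum()

# Within (FE) slope: subtract each individual's time average first
xw = x - x.mean(axis=1, keepdims=True)
yw = y - y.mean(axis=1, keepdims=True)
b_within = (xw * yw).sum() / (xw ** 2).sum()

print(f"true beta = {beta}, pooled OLS = {b_pooled:.3f}, within = {b_within:.3f}")
```

If the 0.8 loading is set to 0, the RE assumption holds and both estimators are consistent for the slope.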

#### 1. Balanced Panel - FE

With balanced panel data, the one-way fixed effect model assumes that $y_{it}=x_{it}'\beta+c_{i}+\epsilon_{it}$ with $E(c_{i}|X_{i})=g(X_{i})$ for some function $g(\cdot)$. Because the conditional mean is the same in every period, we can write the model as $y_{it}=x_{it}'\beta+g(X_{i})+\epsilon_{it} +(c_{i}-g(X_{i}))=x_{it}'\beta+\alpha_{i}+(\epsilon_{it} +(c_{i}-g(X_{i})))$ with $\alpha_{i}=g(X_{i})$. By construction, the last term $c_{i}-g(X_{i})$ is uncorrelated with the covariates, so it can be absorbed into $\epsilon_{it}$. This means we can rewrite (reparametrize) the model as $y_{it}=x_{it}'\beta+\alpha_{i}+\epsilon_{it}$, where $\alpha_{i}$ is a non-stochastic term. Here, each $\alpha_{i}$ is treated as an unknown parameter to be estimated. In this formulation, we assume that the covariates do not contain an intercept, because adding one would be perfectly collinear with the individual effects. To obtain consistency, we impose the strict exogeneity assumption: $E(\epsilon_{it}|X_{i},c_{i})=0$. This means the explanatory variables in each time period are uncorrelated with the idiosyncratic error in every time period: $E(x_{is}\epsilon_{it})=0$ for all $s,t=1,2,...,T$. This assumption still allows arbitrary correlation between the latent variable and the covariates in all time periods.

Estimation of FE can be done with the within estimator. Subtracting each individual's time average from its observations removes the individual-specific effect $\alpha_{i}$, and OLS on the demeaned data is consistent under the usual regularity conditions. Another approach is least squares dummy variables (LSDV), which adds one dummy per individual. With many individuals this is computationally expensive, so it is usually not recommended in applied work.
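
The within and LSDV approaches give the same slope estimates. The following is a minimal numpy sketch on simulated data (the dimensions and coefficients are made up for illustration); it is not how `linearmodels` implements the estimator.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T, K = 30, 4, 2                       # hypothetical dimensions
alpha = rng.normal(size=n)               # individual effects
X = rng.normal(size=(n, T, K)) + alpha[:, None, None]   # covariates correlated with alpha_i
beta = np.array([1.0, -0.5])
y = X @ beta + alpha[:, None] + 0.1 * rng.normal(size=(n, T))

# Within estimator: demean by individual, then run OLS without an intercept
Xw = (X - X.mean(axis=1, keepdims=True)).reshape(n * T, K)
yw = (y - y.mean(axis=1, keepdims=True)).reshape(n * T)
b_within, *_ = np.linalg.lstsq(Xw, yw, rcond=None)

# LSDV: OLS on the raw covariates plus one dummy per individual (no intercept)
D = np.kron(np.eye(n), np.ones((T, 1)))  # (n*T, n) block of individual dummies
Z = np.hstack([X.reshape(n * T, K), D])
coef, *_ = np.linalg.lstsq(Z, y.reshape(n * T), rcond=None)
b_lsdv = coef[:K]

print(b_within, b_lsdv)
```

The LSDV design matrix has $K+n$ columns, which is why the dummy approach becomes expensive as the number of individuals grows.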

Asymptotically, the FE estimator is not necessarily the most efficient estimator under strict exogeneity alone. To ensure efficiency, we need another assumption: $E(\epsilon_{i}\epsilon'_{i}|X_{i}, \alpha_{i})=\sigma^{2}I_{T}$. This assumption says that the idiosyncratic errors have constant variance across time and are serially uncorrelated. Estimating $\sigma^{2}$ requires a degrees-of-freedom adjustment. In applied work, serial correlation is often a problem and is hard to test for, so it is conventional to use robust standard errors.
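
As a sketch of that adjustment in the notation above (with $n$ individuals, $T$ periods, $K$ covariates, and $\hat{\epsilon}_{it}$ denoting the residuals from the within regression), demeaning uses up one degree of freedom per individual, so the usual estimator is

$$\hat{\sigma}^{2}=\frac{\sum_{i=1}^{n}\sum_{t=1}^{T}\hat{\epsilon}_{it}^{2}}{n(T-1)-K}.$$

For an unbalanced panel, $n(T-1)$ is replaced by the total number of observations minus $n$. The one-way FE output later in this module has denominator degrees of freedom $4358-545-12=3801$, which matches this count.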

Another way to estimate the FE model is generalized least squares (GLS). Wooldridge (2001) has a nice treatment, and the motivation really comes from the estimation of standard errors. Under GLS, we assume that $E(\epsilon_{i}\epsilon'_{i}|X_{i}, \alpha_{i})=\Gamma_{T \times T}$, and feasible GLS (FGLS) can then be applied. Generally, FGLS is more efficient than pure FE, but this conclusion relies on a large sample size with a fixed number of time periods. Unfortunately, because FGLS still uses the fixed effects transformation to remove the unobserved heterogeneity, it can still have large asymptotic standard errors.

Two-way FE models are also common in applied work. The matrix algebra and the theoretical development are complex (cf. Baltagi (2005)). However, because modern software routinely handles dozens or even hundreds of regressors, almost any application involving a second fixed effect can be handled by simply including the second effect as a set of dummies, given that the time span $T$ is usually much shorter than the number of cross-sectional units.
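
For a balanced panel, the dummy-variable route and the two-way within transformation $y_{it}-\bar{y}_{i\cdot}-\bar{y}_{\cdot t}+\bar{y}$ give the same slope. The sketch below checks this on simulated data; the sizes and the true slope of 2 are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, T = 40, 5                         # hypothetical balanced panel
a = rng.normal(size=n)               # individual effects
g = rng.normal(size=T)               # time effects
x = rng.normal(size=(n, T)) + a[:, None] + g[None, :]
y = 2.0 * x + a[:, None] + g[None, :] + 0.1 * rng.normal(size=(n, T))

def two_way_demean(m):
    """Subtract individual and time means and add back the grand mean."""
    return m - m.mean(axis=1, keepdims=True) - m.mean(axis=0, keepdims=True) + m.mean()

xd, yd = two_way_demean(x), two_way_demean(y)
b_twoway = (xd * yd).sum() / (xd ** 2).sum()

# Same model with explicit dummies: all individual dummies, time dummies minus one
D_entity = np.kron(np.eye(n), np.ones((T, 1)))
D_time = np.kron(np.ones((n, 1)), np.eye(T))[:, 1:]
Z = np.column_stack([x.ravel(), D_entity, D_time])
coef, *_ = np.linalg.lstsq(Z, y.ravel(), rcond=None)
b_dummies = coef[0]

print(b_twoway, b_dummies)
```

One time dummy is dropped because the full set of time dummies is collinear with the full set of individual dummies.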

For the discussion in this subsection, see Wooldridge (2001), Chapter 10.

#### 2. Balanced Panel - RE

For the one-way random effect model with balanced panel data, assume $y_{it}=x_{it}'\beta+u_{i}+\epsilon_{it}$, where $u_{i}$ is a one-dimensional random variable. Under strict exogeneity, the random effect model assumes that $E(\epsilon_{it}|X_{i}, u_{i})=0$. In addition, RE assumes that $E(u_{i}|X_{i})=0$. This is a rather strong assumption in social science: it says that the unobserved latent individual characteristics are uncorrelated with all the covariates in all periods. This is why, in econometrics and in any observational study without good experimental control, FE is usually the safer choice. When its assumptions hold, however, RE gives more efficient estimates.

Under the RE assumptions, it's easy to see that $E(u_{i}+\epsilon_{it}|X_{i})=0$. Thus, under strict exogeneity, the random effect model can be estimated by GLS, treating $v_{it}=u_{i}+\epsilon_{it}$ as a composite error. Essentially, the RE approach exploits the serial correlation in the composite error within the GLS framework. Stacking the $T$ observations for individual $i$, we can rewrite the model as $y_{i}=X_{i}\beta+v_{i}$ and define $E(v_{i}v'_{i})=\Omega$, a $T \times T$ matrix. In small samples, assuming unconditional homoscedasticity and no serial correlation in $\epsilon_{it}$ simplifies the estimation of standard errors. In large samples, $\Omega$ can be estimated as $\hat{\Omega}=\frac{1}{n}\sum_{i=1}^{n}e_{i}e'_{i}$, where $e_{i}$ is the $T \times 1$ vector of pooled OLS residuals for individual $i$.
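
When $\epsilon_{it}$ is homoscedastic and serially uncorrelated, $\Omega=\sigma^{2}_{\epsilon}I_{T}+\sigma^{2}_{u}\mathbf{1}\mathbf{1}'$, and GLS is equivalent to OLS on quasi-demeaned data with $\theta=1-\sqrt{\sigma^{2}_{\epsilon}/(\sigma^{2}_{\epsilon}+T\sigma^{2}_{u})}$. The sketch below checks this equivalence on simulated data. It assumes the variance components are known, which is a simplification: in practice they are estimated (FGLS), and all the numbers here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n, T = 200, 4
sigma_u, sigma_e = 1.0, 0.5              # assumed known for illustration only
u = sigma_u * rng.normal(size=n)         # random effect, independent of x
x = rng.normal(size=(n, T))
y = 1.0 + 2.0 * x + u[:, None] + sigma_e * rng.normal(size=(n, T))

theta = 1.0 - np.sqrt(sigma_e**2 / (sigma_e**2 + T * sigma_u**2))

def quasi_demean(m):
    """Subtract theta times the individual time average."""
    return m - theta * m.mean(axis=1, keepdims=True)

# OLS on quasi-demeaned constant, x and y
const = np.ones((n, T))
Zq = np.column_stack([quasi_demean(const).ravel(), quasi_demean(x).ravel()])
b_qd, *_ = np.linalg.lstsq(Zq, quasi_demean(y).ravel(), rcond=None)

# Direct GLS with Omega = sigma_e^2 I + sigma_u^2 11'
Omega_inv = np.linalg.inv(sigma_e**2 * np.eye(T) + sigma_u**2 * np.ones((T, T)))
A, b = np.zeros((2, 2)), np.zeros(2)
for i in range(n):
    Zi = np.column_stack([np.ones(T), x[i]])
    A += Zi.T @ Omega_inv @ Zi
    b += Zi.T @ Omega_inv @ y[i]
b_gls = np.linalg.solve(A, b)

print(b_qd, b_gls)   # [intercept, slope] from each route
```

With $\theta=0$ the transformation reduces to pooled OLS, and as $\theta$ approaches 1 it approaches the within transformation.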

Two-way RE models are rarely used and are extremely complicated, because they need even more stringent assumptions to ensure consistency, and these assumptions are imposed on both the individual and the time dimension.

For the discussion in this subsection, see Wooldridge (2001), Chapter 10.

#### 3. Unbalanced Panel

For unbalanced panels, Wooldridge (2001) has a nice treatment in Chapter 17, while Cameron and Trivedi (2005) have a short treatment in Chapter 21. Essentially, as long as the unobserved heterogeneity remains independent of the covariates, the random effect estimator remains consistent. For fixed effect, the unbalanced panel presents very little problem as long as we can rewrite the problem by adding a selector variable (1 if not missing, 0 otherwise) to the data and adjust the strict exogeneity assumption accordingly. Problems arise when there is sample attrition or selection bias, which we will not discuss here.

In general, fixed effect analysis is more robust than random effects analysis because it allows for correlation between the unobserved heterogeneity and the covariates. However, there is a price to pay: without further assumptions, we cannot include time-constant factors in $x_{it}$. This means that when analyzing individuals, factors such as gender or race cannot be included in $x_{it}$. When analyzing firms, industry cannot be included in the covariates unless the industry designation changes over time for at least some firms. The bottom line is that the covariates must vary across time (some people call the covariates in a fixed effect model the "time-varying explanatory variables"). We don't need a unique covariate value in each period, but we must have some $x$ changing across time for some individuals in the sample.
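
The reason time-constant factors drop out is mechanical: the within transformation turns them into a column of zeros. A tiny made-up example (the column names and values are hypothetical):

```python
import pandas as pd

toy = pd.DataFrame({
    "id":     [1, 1, 1, 2, 2, 2],
    "year":   [1, 2, 3, 1, 2, 3],
    "female": [1, 1, 1, 0, 0, 0],        # time-constant within each individual
    "hours":  [40, 38, 42, 35, 35, 37],  # varies over time
}).set_index(["id", "year"])

# Within transformation: subtract each individual's time average
demeaned = toy - toy.groupby(level="id").transform("mean")
print(demeaned)
```

After demeaning, `female` is identically zero and carries no information, while `hours` still varies.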

### II. Data Format

The Python package linearmodels (version 4.25) can perform FE and RE analysis for panel data. For this module, we express the data as "MultiIndex DataFrame" objects, which is the format the package encourages. Other formats are supported, but the package does not recommend them.

Below we load the job training data. To turn the existing ordinary DataFrame into a MultiIndex DataFrame, we use the set_index() method to declare the entity ($i$) and time ($t$) levels:

In [4]:
data1=jobtraining.load()
print(type(data1))
data1.head()

<class 'pandas.core.frame.DataFrame'>


Unnamed: 0,year,fcode,employ,sales,avgsal,scrap,rework,tothrs,union,grant,...,grant_1,clscrap,cgrant,clemploy,clsales,lavgsal,clavgsal,cgrant_1,chrsemp,clhrsemp
0,1987,410032,100.0,47000000.0,35000.0,,,12.0,0,0,...,0,,0,,,10.4631,,,,
1,1988,410032,131.0,43000000.0,37000.0,,,8.0,0,0,...,0,,0,0.270027,-0.088949,10.51867,0.05557,0.0,-8.946565,-1.165385
2,1989,410032,123.0,49000000.0,39000.0,,,8.0,0,0,...,0,,0,-0.063013,0.130621,10.57132,0.052644,0.0,0.198597,0.047832
3,1987,410440,12.0,1560000.0,10500.0,,,12.0,0,0,...,0,,0,,,9.25913,,,,
4,1988,410440,13.0,1970000.0,11000.0,,,12.0,0,0,...,0,,0,0.080043,0.233347,9.305651,0.04652,0.0,0.0,0.0


In [5]:
data1=data1.set_index(["fcode", "year"])
print(type(data1))
data1.head()

<class 'pandas.core.frame.DataFrame'>


fcode,year,employ,sales,avgsal,scrap,rework,tothrs,union,grant,d89,d88,...,grant_1,clscrap,cgrant,clemploy,clsales,lavgsal,clavgsal,cgrant_1,chrsemp,clhrsemp
410032,1987,100.0,47000000.0,35000.0,,,12.0,0,0,0,0,...,0,,0,,,10.4631,,,,
410032,1988,131.0,43000000.0,37000.0,,,8.0,0,0,0,1,...,0,,0,0.270027,-0.088949,10.51867,0.05557,0.0,-8.946565,-1.165385
410032,1989,123.0,49000000.0,39000.0,,,8.0,0,0,1,0,...,0,,0,-0.063013,0.130621,10.57132,0.052644,0.0,0.198597,0.047832
410440,1987,12.0,1560000.0,10500.0,,,12.0,0,0,0,0,...,0,,0,,,9.25913,,,,
410440,1988,13.0,1970000.0,11000.0,,,12.0,0,0,0,1,...,0,,0,0.080043,0.233347,9.305651,0.04652,0.0,0.0,0.0


We can also create year dummies by storing the year as a string or categorical column:

In [6]:
# Take the year level of the MultiIndex and store it twice:
# once as a string column and once as a categorical column
years = data1.index.get_level_values("year").astype(str)
data1["year_str"] = years
data1["year_cat"] = pd.Categorical(years)

print(list(data1.columns), "\n")
print(data1.info())
data1.head()

['employ', 'sales', 'avgsal', 'scrap', 'rework', 'tothrs', 'union', 'grant', 'd89', 'd88', 'totrain', 'hrsemp', 'lscrap', 'lemploy', 'lsales', 'lrework', 'lhrsemp', 'lscrap_1', 'grant_1', 'clscrap', 'cgrant', 'clemploy', 'clsales', 'lavgsal', 'clavgsal', 'cgrant_1', 'chrsemp', 'clhrsemp', 'year_str', 'year_cat'] 

<class 'pandas.core.frame.DataFrame'>
MultiIndex: 471 entries, (410032, 1987) to (419486, 1989)
Data columns (total 30 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   employ    440 non-null    float64 
 1   sales     373 non-null    float64 
 2   avgsal    406 non-null    float64 
 3   scrap     162 non-null    float64 
 4   rework    123 non-null    float64 
 5   tothrs    415 non-null    float64 
 6   union     471 non-null    int64   
 7   grant     471 non-null    int64   
 8   d89       471 non-null    int64   
 9   d88       471 non-null    int64   
 10  totrain   465 non-null    float64 
 11  hrsemp    390 non-null    floa

fcode,year,employ,sales,avgsal,scrap,rework,tothrs,union,grant,d89,d88,...,cgrant,clemploy,clsales,lavgsal,clavgsal,cgrant_1,chrsemp,clhrsemp,year_str,year_cat
410032,1987,100.0,47000000.0,35000.0,,,12.0,0,0,0,0,...,0,,,10.4631,,,,,1987,1987
410032,1988,131.0,43000000.0,37000.0,,,8.0,0,0,0,1,...,0,0.270027,-0.088949,10.51867,0.05557,0.0,-8.946565,-1.165385,1988,1988
410032,1989,123.0,49000000.0,39000.0,,,8.0,0,0,1,0,...,0,-0.063013,0.130621,10.57132,0.052644,0.0,0.198597,0.047832,1989,1989
410440,1987,12.0,1560000.0,10500.0,,,12.0,0,0,0,0,...,0,,,9.25913,,,,,1987,1987
410440,1988,13.0,1970000.0,11000.0,,,12.0,0,0,0,1,...,0,0.080043,0.233347,9.305651,0.04652,0.0,0.0,0.0,1988,1988


In [7]:
data1["year_str"].unique()

array(['1987', '1988', '1989'], dtype=object)

In [8]:
example1 = PanelOLS(data1[["lscrap"]], data1[['hrsemp', 'year_str']], entity_effects=True).fit(cov_type='robust') # FE with no intercept term
print(example1) # we see that one dummy year is dropped (1987)

example2 = PanelOLS(data1[["lscrap"]], data1[['hrsemp', 'year_cat']], entity_effects=True).fit(cov_type='robust') # FE with no intercept term
print(example2) # result is the same

                          PanelOLS Estimation Summary                           
Dep. Variable:                 lscrap   R-squared:                        0.1985
Estimator:                   PanelOLS   R-squared (Between):             -0.1240
No. Observations:                 140   R-squared (Within):               0.1985
Date:                Thu, Jun 16 2022   R-squared (Overall):             -0.0934
Time:                        15:12:02   Log-likelihood                   -78.765
Cov. Estimator:                Robust                                           
                                        F-statistic:                      7.3496
Entities:                          48   P-value                           0.0002
Avg Obs:                       2.9167   Distribution:                    F(3,89)
Min Obs:                       1.0000                                           
Max Obs:                       3.0000   F-statistic (robust):             7.9091
                            

Inputs contain missing values. Dropping rows with missing observations.
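
The code comment above notes that one year dummy (1987) is dropped. A comparable expansion can be produced by hand with pandas. This is only a sketch of the idea on a made-up column; it is an assumption that it mirrors what `linearmodels` does internally.

```python
import pandas as pd

# Hypothetical year column stored as strings, as with year_str above
toy = pd.DataFrame({"year": ["1987", "1988", "1989", "1987", "1988", "1989"]})

# One dummy per level, dropping the first level (1987) as the reference year
dummies = pd.get_dummies(toy[["year"]], drop_first=True)
print(list(dummies.columns))
```

The coefficients on the remaining dummies are then interpreted relative to the dropped year.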


### III. Case Study - Union Status and Wage

Let's use this module for a small project based on the paper by Vella and Verbeek (1998), which explores whether union status affects the wages of young men.

The dataset consists of wages and characteristics for men during the 1980s. The entity identifier is "nr" and the time identifier is "year". Before setting the index, we create a year Categorical, which makes it easy to generate year dummies.

In [9]:
data = wage_panel.load()
year = pd.Categorical(data.year)
data.head()

Unnamed: 0,nr,year,black,exper,hisp,hours,married,educ,union,lwage,expersq,occupation
0,13,1980,0,1,0,2672,0,14,0,1.19754,1,9
1,13,1981,0,2,0,2320,0,14,1,1.85306,4,9
2,13,1982,0,3,0,2940,0,14,0,1.344462,9,9
3,13,1983,0,4,0,2960,0,14,0,1.433213,16,9
4,13,1984,0,5,0,3071,0,14,0,1.568125,25,5


In general, we recommend cleaning the data before turning everything into a MultiIndex DataFrame object. Let's make the problem a bit harder by introducing some missing data: we set lwage to missing wherever exper is 0.

In [10]:
data.loc[(data['exper']==0), 'lwage']=np.nan

In [11]:
data.isnull().sum()

nr            0
year          0
black         0
exper         0
hisp          0
hours         0
married       0
educ          0
union         0
lwage         2
expersq       0
occupation    0
dtype: int64

In [12]:
print(data.groupby(['year']).size())

year
1980    545
1981    545
1982    545
1983    545
1984    545
1985    545
1986    545
1987    545
dtype: int64
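
Counting rows per year does not reveal the missing wages, because the rows themselves are still there. A quick way to check whether the estimation sample is balanced is to count non-missing outcomes per entity. The sketch below uses a small made-up frame with the same column names rather than the actual dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy panel: entity 1 has one missing lwage
toy = pd.DataFrame({
    "nr":    [1, 1, 1, 2, 2, 2],
    "year":  [1980, 1981, 1982, 1980, 1981, 1982],
    "lwage": [1.2, np.nan, 1.4, 1.1, 1.3, 1.5],
})

# Number of usable observations per entity after dropping missing outcomes
obs_per_entity = toy.dropna(subset=["lwage"]).groupby("nr").size()
is_balanced = obs_per_entity.nunique() == 1

print(obs_per_entity)
print("balanced:", is_balanced)
```

This is the same information that the "Min Obs" and "Max Obs" lines report in the estimation summaries below.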


In [13]:
mdf = data.set_index(["nr", "year"])
mdf["year"]= year
print(wage_panel.DESCR)
mdf.head()


F. Vella and M. Verbeek (1998), "Whose Wages Do Unions Raise? A Dynamic Model
of Unionism and Wage Rate Determination for Young Men," Journal of Applied
Econometrics 13, 163-183.

nr                       person identifier
year                     1980 to 1987
black                    =1 if black
exper                    labor market experience
hisp                     =1 if Hispanic
hours                    annual hours worked
married                  =1 if married
educ                     years of schooling
union                    =1 if in union
lwage                    log(wage)
expersq                  exper^2
occupation               Occupation code



nr,year,black,exper,hisp,hours,married,educ,union,lwage,expersq,occupation,year
13,1980,0,1,0,2672,0,14,0,1.19754,1,9,1980
13,1981,0,2,0,2320,0,14,1,1.85306,4,9,1981
13,1982,0,3,0,2940,0,14,0,1.344462,9,9,1982
13,1983,0,4,0,2960,0,14,0,1.433213,16,9,1983
13,1984,0,5,0,3071,0,14,0,1.568125,25,5,1984


Let's fit four models: pooled OLS, RE, one-way FE and two-way FE:

In [26]:
exog_vars=['black','exper','hisp','hours','married','educ','union','expersq','occupation','year']
exog = add_constant(mdf[exog_vars])
pooledOLS = PooledOLS(mdf.lwage, exog)
M1 = pooledOLS.fit(cov_type='robust')
print(M1)

                          PooledOLS Estimation Summary                          
Dep. Variable:                  lwage   R-squared:                        0.2142
Estimator:                  PooledOLS   R-squared (Between):              0.2504
No. Observations:                4358   R-squared (Within):               0.1723
Date:                Thu, Jun 16 2022   R-squared (Overall):              0.2142
Time:                        15:21:04   Log-likelihood                   -2913.6
Cov. Estimator:                Robust                                           
                                        F-statistic:                      73.935
Entities:                         545   P-value                           0.0000
Avg Obs:                       7.9963   Distribution:                 F(16,4341)
Min Obs:                       7.0000                                           
Max Obs:                       8.0000   F-statistic (robust):             76.295
                            

In [37]:
RE = RandomEffects(mdf.lwage, exog)
M2 = RE.fit(cov_type='robust')
print(M2)

                        RandomEffects Estimation Summary                        
Dep. Variable:                  lwage   R-squared:                        0.2006
Estimator:              RandomEffects   R-squared (Between):              0.1942
No. Observations:                4358   R-squared (Within):               0.2014
Date:                Thu, Jun 16 2022   R-squared (Overall):              0.1974
Time:                        16:42:29   Log-likelihood                   -1582.9
Cov. Estimator:                Robust                                           
                                        F-statistic:                      68.073
Entities:                         545   P-value                           0.0000
Avg Obs:                       7.9963   Distribution:                 F(16,4341)
Min Obs:                       7.0000                                           
Max Obs:                       8.0000   F-statistic (robust):             64.026
                            

In [38]:
FE1 = PanelOLS(mdf.lwage, exog, entity_effects=True, drop_absorbed=True)
M3 = FE1.fit(cov_type='robust')
print(M3)

                          PanelOLS Estimation Summary                           
Dep. Variable:                  lwage   R-squared:                        0.2031
Estimator:                   PanelOLS   R-squared (Between):             -0.0574
No. Observations:                4358   R-squared (Within):               0.2031
Date:                Thu, Jun 16 2022   R-squared (Overall):              0.0633
Time:                        16:42:29   Log-likelihood                   -1263.7
Cov. Estimator:                Robust                                           
                                        F-statistic:                      80.750
Entities:                         545   P-value                           0.0000
Avg Obs:                       7.9963   Distribution:                 F(12,3801)
Min Obs:                       7.0000                                           
Max Obs:                       8.0000   F-statistic (robust):             70.703
                            

In [33]:
# `exog` from In [26] still contains the year dummies. With time_effects=True they are
# collinear with the time effects, so drop_absorbed=True removes them (see the message below).
FE2 = PanelOLS(mdf.lwage, exog, entity_effects=True, time_effects=True, drop_absorbed=True)
M4 = FE2.fit(cov_type='robust')
print(M4)

                          PanelOLS Estimation Summary                           
Dep. Variable:                  lwage   R-squared:                        0.0488
Estimator:                   PanelOLS   R-squared (Between):             -0.0547
No. Observations:                4358   R-squared (Within):              -0.6753
Date:                Thu, Jun 16 2022   R-squared (Overall):             -0.3418
Time:                        15:24:30   Log-likelihood                   -1263.7
Cov. Estimator:                Robust                                           
                                        F-statistic:                      38.994
Entities:                         545   P-value                           0.0000
Avg Obs:                       7.9963   Distribution:                  F(5,3801)
Min Obs:                       7.0000                                           
Max Obs:                       8.0000   F-statistic (robust):             32.469
                            

Variables have been fully absorbed and have removed from the regression:

black, exper, hisp, educ, year.1981, year.1982, year.1983, year.1984, year.1985, year.1986, year.1987



In [34]:
from linearmodels.panel import compare

print(compare({"PooledOLS": M1, "RE": M2, "FE One Way": M3, "FE Two Way": M4}))

                                  Model Comparison                                  
                             PooledOLS                RE    FE One Way    FE Two Way
------------------------------------------------------------------------------------
Dep. Variable                    lwage             lwage         lwage         lwage
Estimator                    PooledOLS     RandomEffects      PanelOLS      PanelOLS
No. Observations                  4358              4358          4358          4358
Cov. Est.                       Robust            Robust        Robust        Robust
R-squared                       0.2142            0.2006        0.2031        0.0488
R-Squared (Within)              0.1723            0.2014        0.2031       -0.6753
R-Squared (Between)             0.2504            0.1942       -0.0574       -0.0547
R-Squared (Overall)             0.2142            0.1974        0.0633       -0.3418
F-statistic                     73.935            68.073        8

### References:

   - https://stats.stackexchange.com/questions/465951/the-definition-of-a-constant-term-in-a-fixed-effects-model#:~:text=When%20you%20include%20a%20constant,observations%20for%20a%20particular%20individual.
   - https://stackoverflow.com/questions/53439133/python-pandas-balance-an-unbalanced-dataset-for-panel-analysis
   