
In [5]:
# uncomment and execute the following if necessary

# !pip install linearmodels
# !pip install pystout

These examples are taken from Kevin Sheppard's user guide for the linearmodels package (https://bashtage.github.io/linearmodels/index.html).  

In [30]:
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

import numpy as np  # np.log is used inside the IV formulas below
import pandas as pd
from pystout import pystout  # the table function lives inside the pystout module
from linearmodels.panel import PooledOLS
from linearmodels.panel import PanelOLS
from linearmodels.iv import IV2SLS
from linearmodels.system import SUR


### Example data

In [None]:
from linearmodels.datasets import wage_panel

data = wage_panel.load()
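year = pd.Categorical(data.year)  # categorical copy: formulas expand it into year dummies
data = data.set_index(["nr", "year"])  # entity (nr) is the outer level, time (year) the inner level
data["year"] = year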

# print(wage_panel.DESCR)
# data.head()

### Pooled OLS with White standard errors

In [10]:
model1 = PooledOLS.from_formula(
    """
    lwage ~ black
    + hisp
    + exper
    + expersq
    + married
    + educ
    + union
    + year
    """,
    data=data
)
result1 = model1.fit(cov_type="robust")

# print(result1)
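White standard errors replace the classical covariance s^2 (X'X)^{-1} with a "sandwich" that lets the error variance differ across observations. Below is a minimal numpy sketch of the basic HC0 form on simulated data. The data and variable names are hypothetical and are not taken from `wage_panel`. Depending on its `debiased` option, linearmodels may also apply a degrees-of-freedom adjustment, so the sketch illustrates the idea and does not reproduce its exact numbers.

```python
import numpy as np

# Simulated data (hypothetical): the error spread grows with |x|,
# so the errors are heteroskedastic.
rng = np.random.default_rng(0)
n = 2_000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])            # intercept + one regressor
eps = rng.normal(size=n) * (0.5 + np.abs(x))    # heteroskedastic errors
y = 1.0 + 2.0 * x + eps

# OLS coefficients: b = (X'X)^{-1} X'y
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
u = y - X @ b                                   # residuals

# Classical covariance, valid only under homoskedasticity: s^2 (X'X)^{-1}
s2 = u @ u / (n - X.shape[1])
cov_classic = s2 * XtX_inv

# White (HC0) sandwich: (X'X)^{-1} [sum_i u_i^2 x_i x_i'] (X'X)^{-1}
meat = (X * u[:, None] ** 2).T @ X
cov_white = XtX_inv @ meat @ XtX_inv

se_classic = np.sqrt(np.diag(cov_classic))
se_white = np.sqrt(np.diag(cov_white))
print("coefficients:", b.round(3))
print("classical SE:", se_classic.round(4))
print("White SE:    ", se_white.round(4))
```

The error variance here rises with |x|. As a result, the White standard error on the slope comes out larger than the classical one.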

### Fixed effects and clustered standard errors

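The linearmodels panel estimators (`PooledOLS` above and `PanelOLS` here) need the data in a DataFrame with a two-level MultiIndex. The entity (firm, person, etc.) must be the outer level and time the inner level. The `wage_panel` data above was set up this way with `set_index(["nr", "year"])`.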
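Here is a small pandas sketch of the required layout. It uses a made-up four-row panel, so the ids, years, and wages are all hypothetical. If the levels arrive in the wrong order, `swaplevel` followed by `sort_index` moves the entity back to the outside.

```python
import pandas as pd

# Hypothetical long-format panel: two people (nr) observed in two years.
df = pd.DataFrame(
    {
        "nr": [13, 13, 17, 17],
        "year": [1980, 1981, 1980, 1981],
        "lwage": [1.20, 1.85, 1.68, 1.52],
    }
)

# Entity first (outer level), time second (inner level).
panel = df.set_index(["nr", "year"])
print(list(panel.index.names))

# If time ended up as the outer level, swap the levels and re-sort.
wrong = df.set_index(["year", "nr"])
fixed = wrong.swaplevel("year", "nr").sort_index()
print(list(fixed.index.names))
```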

In [16]:
from linearmodels.panel import PanelOLS

# entity fixed effects
model2 = PanelOLS.from_formula(
    """
    lwage ~ expersq
    + married
    + union
    + EntityEffects
    """,
    data=data,
)
result2 = model2.fit(cov_type="clustered", cluster_entity=True)

# time fixed effects
model3 = PanelOLS.from_formula(
    """
    lwage ~ black
    + hisp
    + exper
    + expersq
    + married
    + educ
    + union
    + TimeEffects
    """,
    data=data,
)
result3 = model3.fit(cov_type="clustered", cluster_time=True)

# time and entity fixed effects
model4 = PanelOLS.from_formula(
    """
    lwage ~ expersq
    + married
    + union
    + EntityEffects
    + TimeEffects
    """,
    data=data,
)
result4 = model4.fit(cov_type="clustered", cluster_entity=True, cluster_time=True)
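For a single regressor, the work done by `EntityEffects` and `cluster_entity=True` can be sketched in three steps:

1. Demean every variable within each entity (the "within" transformation).
2. Run OLS on the demeaned data.
3. Build the covariance from scores summed within each entity.

The numpy sketch below follows these steps on simulated data. The helper `demean_by` is hypothetical. The sketch also leaves out any small-sample or degrees-of-freedom corrections a library may apply, so it shows the mechanics only.

```python
import numpy as np

# Simulated panel (hypothetical): N entities observed for T periods.
rng = np.random.default_rng(1)
N, T = 500, 8
entity = np.repeat(np.arange(N), T)            # entity id for each row
alpha = rng.normal(size=N)                      # unobserved entity effects
x = rng.normal(size=N * T) + alpha[entity]      # regressor correlated with alpha
y = 0.5 * x + alpha[entity] + rng.normal(size=N * T)


def demean_by(v, groups):
    """Subtract each group's mean from v (the 'within' transformation)."""
    means = np.bincount(groups, weights=v) / np.bincount(groups)
    return v - means[groups]


# Entity fixed effects = OLS on entity-demeaned data (no intercept needed).
xd = demean_by(x, entity)
yd = demean_by(y, entity)
beta_fe = (xd @ yd) / (xd @ xd)
u = yd - beta_fe * xd

# Pooled OLS for comparison: biased because x is correlated with alpha.
X = np.column_stack([np.ones(N * T), x])
beta_pooled = np.linalg.lstsq(X, y, rcond=None)[0][1]

# Cluster-robust variance by entity: sum the scores x_it * u_it within each
# entity first, so residuals may be correlated inside an entity.
scores = np.bincount(entity, weights=xd * u)    # one summed score per entity
var_cl = (scores @ scores) / (xd @ xd) ** 2
se_cl = np.sqrt(var_cl)

print(f"pooled OLS slope:    {beta_pooled:.3f}")
print(f"fixed-effects slope: {beta_fe:.3f} (clustered SE {se_cl:.4f})")
```

The regressor is correlated with the entity effect. Because of this, the pooled OLS slope lands well above the true 0.5, while the within estimate recovers it.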

In [17]:
pystout(
    models=[result1, result2, result3, result4],
    file="table.tex",
    exogvars=[
        "union",
        "married",
        "expersq",
        "exper",
        "black",
        "hisp",
        "educ",
    ],
    stars={0.1: "*", 0.05: "**", 0.01: "***"},
    addnotes=[
        "(1): time dummy variables, White standard errors",
        "(2): entity fixed effects, standard errors clustered by entity",
        "(3): time fixed effects, standard errors clustered by year",
        "(4): entity and time fixed effects, two-way clustered standard errors",
        "$^*p<0.1$, $^{**}p<0.05$, $^{***}p<0.01$",
    ],
    modstat={"nobs": "Obs"},
    title="Log Wages",
    label="tab:wage",
)
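The `stars` argument maps p-value cutoffs to markers, following the convention in the last note line. The helper below is hypothetical and is not part of pystout, whose internal logic may differ. It applies the same convention by attaching the marker for the smallest cutoff that a p-value falls below.

```python
# Same cutoffs as the stars= argument above.
STARS = {0.1: "*", 0.05: "**", 0.01: "***"}


def significance_stars(p, stars=STARS):
    """Return the marker for the smallest cutoff that p falls strictly below."""
    marker = ""
    for cutoff in sorted(stars, reverse=True):   # 0.1, then 0.05, then 0.01
        if p < cutoff:
            marker = stars[cutoff]
    return marker


for p in (0.20, 0.07, 0.03, 0.004):
    print(p, repr(significance_stars(p)))
```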

### New example data

In [20]:
from linearmodels.datasets import wage

data = wage.load()
data = data[["educ", "wage", "sibs", "exper"]].dropna()

# print(wage.DESCR)
# data.head()

### OLS

- `IV2SLS` runs plain OLS when the formula specifies no endogenous variables and no instruments.
- Formulas can apply functions to variables, such as `np.log(wage)`. This works for all the models here and for statsmodels formulas as well.

In [24]:
model1 = IV2SLS.from_formula("np.log(wage) ~ educ + exper", data=data)
result1 = model1.fit(cov_type="robust")
print(result1)


                            OLS Estimation Summary                            
Dep. Variable:           np.log(wage)   R-squared:                      0.1308
Estimator:                        OLS   Adj. R-squared:                 0.1289
No. Observations:                 934   F-statistic:                    135.65
Date:                Fri, Aug 26 2022   P-value (F-stat)                0.0000
Time:                        19:59:33   Distribution:                  chi2(2)
Cov. Estimator:                robust                                         
                                                                              
                             Parameter Estimates                              
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
Intercept      5.5028     0.1142     48.188     0.0000      5.2790      5.7266
educ           0.0778     0.0067     11.628     0.00
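The formula `np.log(wage) ~ educ + exper` transforms the outcome before fitting. As the Intercept row above shows, it also adds a constant. The numpy sketch below writes the same regression out by hand. It uses simulated stand-in data with hypothetical values, not the `wage` sample.

```python
import numpy as np

# Hypothetical stand-in data, not the wage sample.
rng = np.random.default_rng(2)
n = 1_000
educ = rng.integers(8, 19, size=n).astype(float)    # 8 to 18 years
exper = rng.integers(0, 25, size=n).astype(float)   # 0 to 24 years
wage = np.exp(5.5 + 0.08 * educ + 0.02 * exper + rng.normal(scale=0.4, size=n))

# "np.log(wage) ~ educ + exper" amounts to: transform the outcome, then
# regress it on an intercept column plus the two regressors.
y = np.log(wage)
X = np.column_stack([np.ones(n), educ, exper])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

for name, value in zip(["Intercept", "educ", "exper"], coef):
    print(f"{name:>9}: {value:.4f}")
```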

### 2SLS

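Use the number of siblings (`sibs`) as an instrument for education (`educ`), and treat experience (`exper`) as exogenous. In the formula, the bracketed term `[educ ~ sibs]` puts the endogenous regressor to the left of the `~` and its instrument to the right.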

In [26]:
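# The estimates below have no Intercept row. To include a constant, write
# "np.log(wage) ~ 1 + exper + [educ ~ sibs]".
model2 = IV2SLS.from_formula("np.log(wage) ~ exper + [educ ~ sibs]", data=data)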
result2 = model2.fit(cov_type="robust")
print(result2)

                          IV-2SLS Estimation Summary                          
Dep. Variable:           np.log(wage)   R-squared:                      0.9787
Estimator:                    IV-2SLS   Adj. R-squared:                 0.9787
No. Observations:                 934   F-statistic:                 4.786e+04
Date:                Fri, Aug 26 2022   P-value (F-stat)                0.0000
Time:                        20:02:39   Distribution:                  chi2(2)
Cov. Estimator:                robust                                         
                                                                              
                             Parameter Estimates                              
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
exper          0.0257     0.0182     1.4143     0.1573     -0.0099      0.0614
educ           0.4921     0.0182     27.030     0.00
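Two-stage least squares can be written as two OLS passes:

1. Regress the endogenous regressor on all exogenous variables, including the instrument.
2. Regress the outcome on the first-stage fitted values and the exogenous regressors.

The numpy sketch below runs these passes on simulated data with an intercept column. In the simulation, a hypothetical unobserved `ability` raises both schooling and wages. Only the point estimates come out right this way. The second-stage OLS standard errors are not valid 2SLS standard errors, which is one reason to use `IV2SLS` instead.

```python
import numpy as np

# Simulated data (hypothetical): unobserved ability raises both education and
# wages, so educ is endogenous; sibs shifts educ but not wages directly.
rng = np.random.default_rng(3)
n = 5_000
ability = rng.normal(size=n)
sibs = rng.poisson(3, size=n).astype(float)
exper = rng.uniform(0, 20, size=n)
educ = 14 - 0.5 * sibs + ability + rng.normal(size=n)
logwage = (1.0 + 0.08 * educ + 0.02 * exper
           + 0.3 * ability + rng.normal(scale=0.3, size=n))

const = np.ones(n)
X = np.column_stack([const, exper, educ])   # regressors (educ is endogenous)
Z = np.column_stack([const, exper, sibs])   # exogenous regressors + instrument


def ols(A, b):
    """Least-squares coefficients of b on the columns of A."""
    return np.linalg.lstsq(A, b, rcond=None)[0]


# Stage 1: regress educ on all exogenous variables and keep the fitted values.
educ_hat = Z @ ols(Z, educ)
# Stage 2: replace educ with its fitted values and run OLS again.
beta_2sls = ols(np.column_stack([const, exper, educ_hat]), logwage)
beta_ols = ols(X, logwage)

print(f"OLS  educ coefficient: {beta_ols[2]:.3f}")   # pushed up by ability
print(f"2SLS educ coefficient: {beta_2sls[2]:.3f}")
```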

### New example data

In [28]:
from linearmodels.datasets import fringe

data = fringe.load()

# print(fringe.DESCR)
# data.head()

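### SUR

Seemingly unrelated regressions estimate the equations for hourly benefits (`hrbens`) and hourly earnings (`hrearn`) jointly by feasible GLS, which allows their errors to be correlated. Each equation goes inside its own pair of braces.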

In [32]:
formula = """ 
    {hrbens ~ educ + exper + expersq + union + south + nrtheast + nrthcen + male}
    {hrearn ~ educ + exper + expersq + nrtheast + married + male}
    """
model = SUR.from_formula(formula, data=data)
result = model.fit(cov_type="robust")
print(result)

                           System GLS Estimation Summary                           
Estimator:                        GLS   Overall R-squared:                   0.6951
No. Equations.:                     2   McElroy's R-squared:                 0.2197
No. Observations:                 616   Judge's (OLS) R-squared:             0.1873
Date:                Fri, Aug 26 2022   Berndt's R-squared:                  0.3775
Time:                        20:37:32   Dhrymes's R-squared:                 0.6950
                                        Cov. Estimator:                      robust
                                        Num. Constraints:                      None
                 Equation: hrbens, Dependent Variable: hrbens                 
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
educ           0.0346     0.0046     7.4679     0.0000      0.0255      0.0437
exper       
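SUR can be sketched as two-step feasible GLS:

1. Fit each equation by OLS.
2. Estimate the cross-equation error covariance from the residuals.
3. Solve the GLS normal equations for the stacked system.

The numpy sketch below follows these steps. The data are simulated and all names are hypothetical. linearmodels' estimator may differ in details such as how it scales the error covariance.

```python
import numpy as np

# Simulated two-equation system (hypothetical) with correlated errors.
rng = np.random.default_rng(4)
n = 2_000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
errors = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=n)
y1 = 1.0 + 2.0 * x1 + errors[:, 0]
y2 = -1.0 + 0.5 * x2 + errors[:, 1]

Xs = [np.column_stack([np.ones(n), x1]), np.column_stack([np.ones(n), x2])]
ys = [y1, y2]

# Step 1: equation-by-equation OLS and its residuals.
b_ols = [np.linalg.lstsq(X, y, rcond=None)[0] for X, y in zip(Xs, ys)]
resid = np.column_stack([y - X @ b for X, y, b in zip(Xs, ys, b_ols)])

# Step 2: 2x2 cross-equation error covariance.
sigma = resid.T @ resid / n
sigma_inv = np.linalg.inv(sigma)

# Step 3: GLS with Omega = sigma kron I_n, built block by block so the
# n-by-n identity is never formed: block (i, j) is s^{ij} X_i' X_j.
A = np.block([[sigma_inv[i, j] * (Xs[i].T @ Xs[j]) for j in range(2)]
              for i in range(2)])
c = np.concatenate([sum(sigma_inv[i, j] * (Xs[i].T @ ys[j]) for j in range(2))
                    for i in range(2)])
beta_sur = np.linalg.solve(A, c)

print("OLS estimates:", np.concatenate(b_ols).round(3))
print("SUR estimates:", beta_sur.round(3))
```

When every equation has exactly the same regressors, SUR reproduces equation-by-equation OLS. The efficiency gain comes from equations with different regressors, as in the `hrbens` and `hrearn` equations above.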