# Regression Demonstrations

In this notebook, I demonstrate the syntax for ordinary least squares (OLS), for a simple instrumental variables (IV) regression, and for fixed-effects/random-effects panel regressions.


## OLS


In [1]:
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.api as sm
import linearmodels



In [2]:
# # url = 'https://www.uam.es/personal_pdi/economicas/rsmanga/docs/mroz.dta'
# # df = pd.read_stata(url)
# url = 'https://www.uam.es/personal_pdi/economicas/rsmanga/docs/mroz.raw'
# df = pd.read_fwf(url)

url = 'http://fmwww.bc.edu/ec-p/data/wooldridge/wage2.dta'
df = pd.read_stata(url)

In [3]:
df.head()

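| | wage | hours | IQ | KWW | educ | exper | tenure | age | married | black | south | urban | sibs | brthord | meduc | feduc | lwage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 769.0 | 40.0 | 93.0 | 35.0 | 12.0 | 11.0 | 2.0 | 31.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 2.0 | 8.0 | 8.0 | 6.645091 |
| 1 | 808.0 | 50.0 | 119.0 | 41.0 | 18.0 | 11.0 | 16.0 | 37.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | NaN | 14.0 | 14.0 | 6.694562 |
| 2 | 825.0 | 40.0 | 108.0 | 46.0 | 14.0 | 11.0 | 9.0 | 33.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 2.0 | 14.0 | 14.0 | 6.715384 |
| 3 | 650.0 | 40.0 | 96.0 | 32.0 | 12.0 | 13.0 | 7.0 | 32.0 | 1.0 | 0.0 | 0.0 | 1.0 | 4.0 | 3.0 | 12.0 | 12.0 | 6.476973 |
| 4 | 562.0 | 40.0 | 74.0 | 27.0 | 11.0 | 14.0 | 5.0 | 34.0 | 1.0 | 0.0 | 0.0 | 1.0 | 10.0 | 6.0 | 6.0 | 11.0 | 6.331502 |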


**OLS with `statsmodels`**

In [4]:
reg = smf.ols('lwage ~ educ', df).fit()
reg.summary()

| | | | |
|---|---|---|---|
| Dep. Variable: | lwage | R-squared: | 0.097 |
| Model: | OLS | Adj. R-squared: | 0.096 |
| Method: | Least Squares | F-statistic: | 100.7 |
| Date: | Thu, 17 May 2018 | Prob (F-statistic): | 1.42e-22 |
| Time: | 13:17:13 | Log-Likelihood: | -469.72 |
| No. Observations: | 935 | AIC: | 943.4 |
| Df Residuals: | 933 | BIC: | 953.1 |
| Df Model: | 1 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 5.9731 | 0.081 | 73.403 | 0.000 | 5.813 | 6.133 |
| educ | 0.0598 | 0.006 | 10.035 | 0.000 | 0.048 | 0.072 |

| | | | |
|---|---|---|---|
| Omnibus: | 31.006 | Durbin-Watson: | 1.779 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 37.262 |
| Skew: | -0.375 | Prob(JB): | 8.1e-09 |
| Kurtosis: | 3.627 | Cond. No. | 85.3 |


In [5]:
endog = df.lwage
exog = sm.add_constant(df[['educ']])
reg = sm.OLS(endog=endog, exog=exog).fit()
reg.summary()

| | | | |
|---|---|---|---|
| Dep. Variable: | lwage | R-squared: | 0.097 |
| Model: | OLS | Adj. R-squared: | 0.096 |
| Method: | Least Squares | F-statistic: | 100.7 |
| Date: | Thu, 17 May 2018 | Prob (F-statistic): | 1.42e-22 |
| Time: | 13:17:13 | Log-Likelihood: | -469.72 |
| No. Observations: | 935 | AIC: | 943.4 |
| Df Residuals: | 933 | BIC: | 953.1 |
| Df Model: | 1 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 5.9731 | 0.081 | 73.403 | 0.000 | 5.813 | 6.133 |
| educ | 0.0598 | 0.006 | 10.035 | 0.000 | 0.048 | 0.072 |

| | | | |
|---|---|---|---|
| Omnibus: | 31.006 | Durbin-Watson: | 1.779 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 37.262 |
| Skew: | -0.375 | Prob(JB): | 8.1e-09 |
| Kurtosis: | 3.627 | Cond. No. | 85.3 |
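The two cells above give identical estimates because the formula interface (`smf.ols`) adds an intercept automatically, whereas the array interface (`sm.OLS`) only includes one if `sm.add_constant` is called. As a rough illustration of what happens when the constant is forgotten, here is a minimal pure-Python sketch. The helper `fit_line` and the six data points are made up for this example and are not taken from `wage2.dta`.

```python
def fit_line(x, y, intercept=True):
    """Least-squares fit of y on a single regressor x.

    Returns (intercept, slope). With intercept=False the line is forced
    through the origin, which is what sm.OLS does without add_constant.
    """
    if intercept:
        mx = sum(x) / len(x)
        my = sum(y) / len(y)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        sxx = sum((xi - mx) ** 2 for xi in x)
        slope = sxy / sxx
        return my - slope * mx, slope
    slope = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)
    return 0.0, slope


# Hypothetical data lying exactly on the line lwage = 6 + 0.06 * educ
educ = [8, 10, 12, 14, 16, 18]
lwage = [6.0 + 0.06 * e for e in educ]

with_const = fit_line(educ, lwage)
without_const = fit_line(educ, lwage, intercept=False)

print("with constant:    intercept=%.3f slope=%.3f" % with_const)
print("without constant: intercept=%.3f slope=%.3f" % without_const)
```

With the constant, the fit recovers the slope used to build the data. Without it, the slope has to absorb the missing intercept and comes out several times larger.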


## Instrumental Variables

Recall that [instrumental variables](https://en.wikipedia.org/wiki/Instrumental_variables_estimation) is a method that allows us to obtain consistent estimates of the coefficients in models with endogenous regressors. In this lecture, we will learn how to run IV regressions in Python (and in R).
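To make the consistency claim concrete, here is a small simulation sketch in pure Python. Everything in it is an assumption chosen for illustration: the data-generating process, the "true" coefficient of 0.10, and the helper names `mean` and `cov`. With a single regressor and a single instrument, the IV estimator reduces to the ratio cov(z, y) / cov(z, x), which is what the sketch computes.

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


random.seed(0)
n = 20_000
true_beta = 0.10  # hypothetical causal effect of x on y

z, x, y = [], [], []
for _ in range(n):
    ability = random.gauss(0, 1)  # unobserved, so it ends up in the error term
    zi = random.gauss(0, 1)       # instrument: shifts x, unrelated to ability
    xi = 12 + zi + ability + random.gauss(0, 1)
    yi = 5 + true_beta * xi + 0.5 * ability + random.gauss(0, 1)
    z.append(zi)
    x.append(xi)
    y.append(yi)

# OLS slope: biased upward because x is correlated with the omitted ability
beta_ols = cov(x, y) / cov(x, x)
# IV slope: uses only the variation in x that is driven by z
beta_iv = cov(z, y) / cov(z, x)

print("true beta:", true_beta)
print("OLS estimate:", round(beta_ols, 3))
print("IV estimate: ", round(beta_iv, 3))
```

In this setup the OLS estimate settles well above the true value, while the IV estimate stays close to it, at the cost of a larger sampling variance.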

The following example comes from "Introductory Econometrics" (4th Edition), by Wooldridge.

![15_2_Wooldridge_Intro](./15_2_Wooldridge_Intro.png)

**IV2SLS with `statsmodels`**

In [6]:
from statsmodels.sandbox.regression.gmm import IV2SLS
# IV2SLS?

In [7]:
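# The sandbox IV2SLS takes arrays rather than a formula:
#   IV2SLS(endog, exog, instrument)
# `instrument` must hold every exogenous column of `exog` (here only the
# constant) plus the excluded instruments (here `sibs`).

In [8]:
endog = df.lwage
exog = sm.add_constant(df[['educ']])
# Constant + excluded instrument; listing educ here as well would reproduce OLS
instrument = sm.add_constant(df[['sibs']])
reg = IV2SLS(endog=endog, exog=exog, instrument=instrument).fit()
reg.summary()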

In [9]:
# reg = IV2SLS.from_formula('lwage ~ 1 + educ', data=df).fit()
# reg

**IV2SLS with `linearmodels`**

In [10]:
# See https://bashtage.github.io/linearmodels/devel/iv/methods.html
# See here for more information about formulas: 
#  https://bashtage.github.io/linearmodels/devel/iv/examples/using-formulas.html
reg = linearmodels.IV2SLS.from_formula('lwage ~ 1 + [educ ~ sibs]', data=df).fit()
reg

| | | | |
|---|---|---|---|
| Dep. Variable: | lwage | R-squared: | -0.0092 |
| Estimator: | IV-2SLS | Adj. R-squared: | -0.0103 |
| No. Observations: | 935 | F-statistic: | 24.850 |
| Date: | Thu, May 17 2018 | P-value (F-stat) | 0.0000 |
| Time: | 13:17:14 | Distribution: | chi2(1) |
| Cov. Estimator: | robust | | |

| | Parameter | Std. Err. | T-stat | P-value | Lower CI | Upper CI |
|---|---|---|---|---|---|---|
| Intercept | 5.1300 | 0.3304 | 15.528 | 0.0000 | 4.4825 | 5.7776 |
| educ | 0.1224 | 0.0246 | 4.9850 | 0.0000 | 0.0743 | 0.1706 |
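The name "two-stage least squares" describes how the point estimate can be obtained: regress the endogenous regressor on the instrument, then regress the outcome on the fitted values from that first stage. The sketch below carries out both stages by hand in pure Python. The eight observations and the helper names are invented for illustration and are unrelated to `wage2.dta`.

```python
def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def ols(x, y):
    """Simple regression of y on x with an intercept; returns (intercept, slope)."""
    slope = cov(x, y) / cov(x, x)
    return mean(y) - slope * mean(x), slope


# Made-up data: z is the instrument, x the endogenous regressor, y the outcome
z = [0, 1, 2, 3, 4, 5, 6, 7]
x = [16, 14, 15, 12, 13, 12, 10, 11]
y = [6.9, 6.8, 6.7, 6.6, 6.9, 6.4, 6.3, 6.5]

# Stage 1: project x on the instrument
a0, a1 = ols(z, x)
x_hat = [a0 + a1 * zi for zi in z]

# Stage 2: regress y on the fitted values from stage 1
b0, b1 = ols(x_hat, y)

# With one instrument, the same slope comes from a ratio of covariances
ratio = cov(z, y) / cov(z, x)

print("two-stage slope:", b1)
print("cov ratio slope:", ratio)
```

The two slopes agree up to floating-point error. The standard errors from a hand-run second stage are not valid, however, because they treat `x_hat` as observed data. Dedicated IV routines such as the one used above compute them from the residuals based on the original regressor.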


In R, IV regressions can be run with `ivreg` from the `AER` package: https://www.rdocumentation.org/packages/AER/versions/1.2-5/topics/ivreg