# **Econometrics**
---

## Description

Personal project for practicing econometrics alongside the theoretical reading of textbooks.

**References**

- WOOLDRIDGE, J. M. **Introdução à econometria: uma abordagem moderna**. 3a ed. São Paulo: Pioneira Thomson Learning, 2006.

- ANGRIST, J. D. & PISCHKE, J-S. **Mastering Metrics: the path from cause to effect**. Princeton University Press, 2015.

> *Disclaimer*: this is only a basic bibliography; other specialized books with deeper coverage of *Time Series*, Statistics and Econometrics may also be used.

## Answer Keys

> http://upfie.net/index.html

# Importing libraries
---

## Documentation

- Wooldridge API: https://pypi.org/project/wooldridge/

- Pandas: https://pandas.pydata.org/pandas-docs/stable/

- Seaborn: http://seaborn.pydata.org/introduction.html

- Matplotlib: https://matplotlib.org/

In [1]:
# !pip install wooldridge

In [2]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt 
import wooldridge as wdg
import statsmodels.api as sm
import statsmodels.formula.api as smf
import linearmodels as plm

In [3]:
# palette -> Accent, Accent_r, Blues, Blues_r, BrBG, BrBG_r, BuGn, BuGn_r, BuPu, BuPu_r, CMRmap, CMRmap_r, Dark2, Dark2_r, GnBu, GnBu_r, Greens, Greens_r, Greys, Greys_r, OrRd, OrRd_r, Oranges, Oranges_r, PRGn, PRGn_r, Paired, Paired_r, Pastel1, Pastel1_r, Pastel2, Pastel2_r, PiYG, PiYG_r, PuBu, PuBuGn, PuBuGn_r, PuBu_r, PuOr, PuOr_r, PuRd, PuRd_r, Purples, Purples_r, RdBu, RdBu_r, RdGy, RdGy_r, RdPu, RdPu_r, RdYlBu, RdYlBu_r, RdYlGn, RdYlGn_r, Reds, Reds_r, Set1, Set1_r, Set2, Set2_r, Set3, Set3_r, Spectral, Spectral_r, Wistia, Wistia_r, YlGn, YlGnBu, YlGnBu_r, YlGn_r, YlOrBr, YlOrBr_r, YlOrRd, YlOrRd_r, afmhot, afmhot_r, autumn, autumn_r, binary, binary_r, bone, bone_r, brg, brg_r, bwr, bwr_r, cividis, cividis_r, cool, cool_r, coolwarm, coolwarm_r, copper, copper_r, cubehelix, cubehelix_r, flag, flag_r, gist_earth, gist_earth_r, gist_gray, gist_gray_r, gist_heat, gist_heat_r, gist_ncar, gist_ncar_r, gist_rainbow, gist_rainbow_r, gist_stern, gist_stern_r, gist_yarg, gist_yarg_r, gnuplot, gnuplot2, gnuplot2_r, gnuplot_r, gray, gray_r, hot, hot_r, hsv, hsv_r, icefire, icefire_r, inferno, inferno_r, jet, jet_r, magma, magma_r, mako, mako_r, nipy_spectral, nipy_spectral_r, ocean, ocean_r, pink, pink_r, plasma, plasma_r, prism, prism_r, rainbow, rainbow_r, rocket, rocket_r, seismic, seismic_r, spring, spring_r, summer, summer_r, tab10, tab10_r, tab20, tab20_r, tab20b, tab20b_r, tab20c, tab20c_r, terrain, terrain_r, viridis, viridis_r, vlag, vlag_r, winter, winter_r
sns.set_palette('bone')

# style -> white, dark, whitegrid, darkgrid, ticks
sns.set_style('dark')

In [4]:
wdg.data()

  J.M. Wooldridge (2016) Introductory Econometrics: A Modern Approach,
  Cengage Learning, 6th edition.

  401k       401ksubs    admnrev       affairs     airfare
  alcohol    apple       approval      athlet1     athlet2
  attend     audit       barium        beauty      benefits
  beveridge  big9salary  bwght         bwght2      campus
  card       catholic    cement        census2000  ceosal1
  ceosal2    charity     consump       corn        countymurders
  cps78_85   cps91       crime1        crime2      crime3
  crime4     discrim     driving       earns       econmath
  elem94_95  engin       expendshares  ezanders    ezunem
  fair       fertil1     fertil2       fertil3     fish
  fringe     gpa1        gpa2          gpa3        happiness
  hprice1    hprice2     hprice3       hseinv      htv
  infmrt     injury      intdef        intqrt      inven
  jtrain     jtrain2     jtrain3       kielmc      lawsch85
  loanapp    lowbrth     mathpnl       meap00_01   meap01
  meap93    

# Multiple Regression Analysis
---

## Estimation

### Example 3.1: College GPA

In [5]:
wdg.data('gpa1').dtypes

age           int64
soph          int64
junior        int64
senior        int64
senior5       int64
male          int64
campus        int64
business      int64
engineer      int64
colGPA      float64
hsGPA       float64
ACT           int64
job19         int64
job20         int64
drive         int64
bike          int64
walk          int64
voluntr       int64
PC            int64
greek         int64
car           int64
siblings      int64
bgfriend      int64
clubs         int64
skipped     float64
alcohol     float64
gradMI        int64
fathcoll      int64
mothcoll      int64
dtype: object

In [6]:
df_ex3_1 = wdg.data('gpa1')[['colGPA', 'hsGPA', 'ACT']]
df_ex3_1.head()

Unnamed: 0,colGPA,hsGPA,ACT
0,3.0,3.0,21
1,3.4,3.2,24
2,3.0,3.6,26
3,3.5,3.5,27
4,3.6,3.9,28


In [7]:
col_gpa = df_ex3_1.colGPA
col_gpa

0      3.0
1      3.4
2      3.0
3      3.5
4      3.6
      ... 
136    3.0
137    2.3
138    2.8
139    3.4
140    2.8
Name: colGPA, Length: 141, dtype: float64

In [8]:
hsGPA_ACT = df_ex3_1[['hsGPA', 'ACT']]
hsGPA_ACT = sm.add_constant(hsGPA_ACT)
hsGPA_ACT.head()

Unnamed: 0,const,hsGPA,ACT
0,1.0,3.0,21
1,1.0,3.2,24
2,1.0,3.6,26
3,1.0,3.5,27
4,1.0,3.9,28


In [9]:
reg_col_gpa1 = sm.OLS(col_gpa, hsGPA_ACT, hasconst = True).fit()
print(reg_col_gpa1.summary())

                            OLS Regression Results                            
Dep. Variable:                 colGPA   R-squared:                       0.176
Model:                            OLS   Adj. R-squared:                  0.164
Method:                 Least Squares   F-statistic:                     14.78
Date:                Fri, 10 Jun 2022   Prob (F-statistic):           1.53e-06
Time:                        13:44:25   Log-Likelihood:                -46.573
No. Observations:                 141   AIC:                             99.15
Df Residuals:                     138   BIC:                             108.0
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.2863      0.341      3.774      0.0

### Example 3.2: Hourly Wage Equation

In [10]:
wdg.data('wage1')

Unnamed: 0,wage,educ,exper,tenure,nonwhite,female,married,numdep,smsa,northcen,...,trcommpu,trade,services,profserv,profocc,clerocc,servocc,lwage,expersq,tenursq
0,3.10,11,2,0,0,1,0,2,1,0,...,0,0,0,0,0,0,0,1.131402,4,0
1,3.24,12,22,2,0,1,1,3,1,0,...,0,0,1,0,0,0,1,1.175573,484,4
2,3.00,11,2,0,0,0,0,2,0,0,...,0,1,0,0,0,0,0,1.098612,4,0
3,6.00,8,44,28,0,0,1,0,1,0,...,0,0,0,0,0,1,0,1.791759,1936,784
4,5.30,12,7,2,0,0,1,1,0,0,...,0,0,0,0,0,0,0,1.667707,49,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
521,15.00,16,14,2,0,1,1,2,0,0,...,0,0,0,1,1,0,0,2.708050,196,4
522,2.27,10,2,0,0,1,0,3,0,0,...,0,1,0,0,1,0,0,0.819780,4,0
523,4.67,15,13,18,0,0,1,3,0,0,...,0,0,0,0,1,0,0,1.541159,169,324
524,11.56,16,5,1,0,0,1,0,0,0,...,0,0,0,0,0,0,0,2.447551,25,1


log_hwage: Log of the Hourly Wage

In [11]:
log_hwage = wdg.data('wage1')[['wage']]
log_hwage = np.log(log_hwage)
log_hwage.head()

Unnamed: 0,wage
0,1.131402
1,1.175573
2,1.098612
3,1.791759
4,1.667707


In [12]:
educ_exper_tenure = wdg.data('wage1')[['educ', 'exper', 'tenure']]
educ_exper_tenure = sm.add_constant(educ_exper_tenure)
educ_exper_tenure.head()

Unnamed: 0,const,educ,exper,tenure
0,1.0,11,2,0
1,1.0,12,22,2
2,1.0,11,2,0
3,1.0,8,44,28
4,1.0,12,7,2


In [13]:
reg_log_hwage = sm.OLS(log_hwage, educ_exper_tenure, hasconst = True).fit()
print(reg_log_hwage.summary())

                            OLS Regression Results                            
Dep. Variable:                   wage   R-squared:                       0.316
Model:                            OLS   Adj. R-squared:                  0.312
Method:                 Least Squares   F-statistic:                     80.39
Date:                Fri, 10 Jun 2022   Prob (F-statistic):           9.13e-43
Time:                        13:44:26   Log-Likelihood:                -313.55
No. Observations:                 526   AIC:                             635.1
Df Residuals:                     522   BIC:                             652.2
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.2844      0.104      2.729      0.0

### Example 3.3: Participation in 401(k) Pension Plans

In [14]:
wdg.data('401k')

Unnamed: 0,prate,mrate,totpart,totelg,age,totemp,sole,ltotemp
0,26.100000,0.21,1653.0,6322.0,8,8709.0,0,9.072112
1,100.000000,1.42,262.0,262.0,6,315.0,1,5.752573
2,97.599998,0.91,166.0,170.0,10,275.0,1,5.616771
3,100.000000,0.42,257.0,257.0,7,500.0,0,6.214608
4,82.500000,0.53,591.0,716.0,28,933.0,1,6.838405
...,...,...,...,...,...,...,...,...
1529,85.099998,0.33,553.0,650.0,24,907.0,0,6.810143
1530,100.000000,2.52,142.0,142.0,17,197.0,1,5.283204
1531,100.000000,2.27,1928.0,1928.0,35,2171.0,0,7.682943
1532,100.000000,0.58,166.0,166.0,8,931.0,1,6.836259


In [15]:
wdg.data('401k').corr()

Unnamed: 0,prate,mrate,totpart,totelg,age,totemp,sole,ltotemp
prate,1.0,0.273319,0.004173,-0.076369,0.16398,-0.068814,0.158313,-0.223467
mrate,0.273319,1.0,0.018635,-0.000738,0.118784,-0.02297,0.140393,-0.082238
totpart,0.004173,0.018635,1.0,0.976105,0.199629,0.811043,-0.181434,0.558229
totelg,-0.076369,-0.000738,0.976105,1.0,0.187861,0.83217,-0.192556,0.580859
age,0.16398,0.118784,0.199629,0.187861,1.0,0.159965,-0.067364,0.155261
totemp,-0.068814,-0.02297,0.811043,0.83217,0.159965,1.0,-0.180221,0.678113
sole,0.158313,0.140393,-0.181434,-0.192556,-0.067364,-0.180221,1.0,-0.350264
ltotemp,-0.223467,-0.082238,0.558229,0.580859,0.155261,0.678113,-0.350264,1.0


In [16]:
reg_prate = smf.ols('prate ~ mrate + age', data = wdg.data('401k')).fit()

In [17]:
print(reg_prate.summary())

                            OLS Regression Results                            
Dep. Variable:                  prate   R-squared:                       0.092
Model:                            OLS   Adj. R-squared:                  0.091
Method:                 Least Squares   F-statistic:                     77.79
Date:                Fri, 10 Jun 2022   Prob (F-statistic):           6.67e-33
Time:                        13:44:26   Log-Likelihood:                -6422.3
No. Observations:                1534   AIC:                         1.285e+04
Df Residuals:                    1531   BIC:                         1.287e+04
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     80.1190      0.779    102.846      0.0

### Example 3.4: Determinants of College GPA

In [18]:
print(reg_col_gpa1.summary())

                            OLS Regression Results                            
Dep. Variable:                 colGPA   R-squared:                       0.176
Model:                            OLS   Adj. R-squared:                  0.164
Method:                 Least Squares   F-statistic:                     14.78
Date:                Fri, 10 Jun 2022   Prob (F-statistic):           1.53e-06
Time:                        13:44:26   Log-Likelihood:                -46.573
No. Observations:                 141   AIC:                             99.15
Df Residuals:                     138   BIC:                             108.0
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.2863      0.341      3.774      0.0

### Example 3.5: Explaining Arrest Records

In [19]:
wdg.data('crime1')

Unnamed: 0,narr86,nfarr86,nparr86,pcnv,avgsen,tottime,ptime86,qemp86,inc86,durat,black,hispan,born60,pcnvsq,pt86sq,inc86sq
0,0,0,0,0.38,17.600000,35.200001,12,0.0,0.000000,0.0,0,0,1,0.1444,144,0.000000
1,2,2,0,0.44,0.000000,0.000000,0,1.0,0.800000,0.0,0,1,0,0.1936,0,0.640000
2,1,1,0,0.33,22.799999,22.799999,0,0.0,0.000000,11.0,1,0,1,0.1089,0,0.000000
3,2,2,1,0.25,0.000000,0.000000,5,2.0,8.800000,0.0,0,1,1,0.0625,25,77.440002
4,1,1,0,0.00,0.000000,0.000000,0,2.0,8.100000,1.0,0,0,0,0.0000,0,65.610008
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2720,1,1,0,0.00,0.000000,0.000000,0,0.0,0.000000,3.0,0,0,0,0.0000,0,0.000000
2721,0,0,0,0.00,0.000000,0.000000,0,3.0,11.500000,1.0,0,1,1,0.0000,0,132.250000
2722,0,0,0,0.00,0.000000,0.000000,0,1.0,1.900000,1.0,0,0,0,0.0000,0,3.610000
2723,1,1,0,0.00,0.000000,0.000000,0,0.0,0.000000,19.0,1,0,0,0.0000,0,0.000000


In [20]:
reg1_narr86 = smf.ols('narr86 ~ pcnv + ptime86 + qemp86', data = wdg.data('crime1')).fit()

In [21]:
print(reg1_narr86.summary())

                            OLS Regression Results                            
Dep. Variable:                 narr86   R-squared:                       0.041
Model:                            OLS   Adj. R-squared:                  0.040
Method:                 Least Squares   F-statistic:                     39.10
Date:                Fri, 10 Jun 2022   Prob (F-statistic):           9.91e-25
Time:                        13:44:27   Log-Likelihood:                -3394.7
No. Observations:                2725   AIC:                             6797.
Df Residuals:                    2721   BIC:                             6821.
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.7118      0.033     21.565      0.0

In [22]:
reg2_narr86 = smf.ols('narr86 ~ pcnv + avgsen + ptime86 + qemp86', data = wdg.data('crime1')).fit()

In [23]:
print(reg2_narr86.summary())

                            OLS Regression Results                            
Dep. Variable:                 narr86   R-squared:                       0.042
Model:                            OLS   Adj. R-squared:                  0.041
Method:                 Least Squares   F-statistic:                     29.96
Date:                Fri, 10 Jun 2022   Prob (F-statistic):           2.01e-24
Time:                        13:44:27   Log-Likelihood:                -3393.5
No. Observations:                2725   AIC:                             6797.
Df Residuals:                    2720   BIC:                             6826.
Df Model:                           4                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.7068      0.033     21.319      0.0

### Example 3.6: Hourly Wage Equation

In [24]:
wage1_df = wdg.data('wage1')
wage1_df['log_wage'] = np.log(wage1_df.wage)
wage1_df.head()

Unnamed: 0,wage,educ,exper,tenure,nonwhite,female,married,numdep,smsa,northcen,...,trade,services,profserv,profocc,clerocc,servocc,lwage,expersq,tenursq,log_wage
0,3.1,11,2,0,0,1,0,2,1,0,...,0,0,0,0,0,0,1.131402,4,0,1.131402
1,3.24,12,22,2,0,1,1,3,1,0,...,0,1,0,0,0,1,1.175573,484,4,1.175573
2,3.0,11,2,0,0,0,0,2,0,0,...,1,0,0,0,0,0,1.098612,4,0,1.098612
3,6.0,8,44,28,0,0,1,0,1,0,...,0,0,0,0,1,0,1.791759,1936,784,1.791759
4,5.3,12,7,2,0,0,1,1,0,0,...,0,0,0,0,0,0,1.667707,49,4,1.667707


In [25]:
reg_hour_wage_3_6 = smf.ols('log_wage ~ educ', data = wage1_df).fit()
print(reg_hour_wage_3_6.summary())

                            OLS Regression Results                            
Dep. Variable:               log_wage   R-squared:                       0.186
Model:                            OLS   Adj. R-squared:                  0.184
Method:                 Least Squares   F-statistic:                     119.6
Date:                Fri, 10 Jun 2022   Prob (F-statistic):           3.27e-25
Time:                        13:44:27   Log-Likelihood:                -359.38
No. Observations:                 526   AIC:                             722.8
Df Residuals:                     524   BIC:                             731.3
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.5838      0.097      5.998      0.0

In this case, the coefficient on educ describes only the association in this sample and cannot be generalized as the effect of education, given the **omitted variable bias** that arises when other variables relevant to the model are left out.
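The omitted variable bias mentioned above can be checked numerically. The sketch below is only an illustration on simulated data (it does not use `wage1`; the coefficients, the helper `ols` and all variable names are assumptions made for the example). It verifies the in-sample identity that the slope from the short regression equals the slope from the long regression plus the coefficient on the omitted variable times the slope from regressing the omitted variable on the included one.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Simulated regressors: x2 is correlated with x1 (hypothetical data, not wage1).
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1.0 + 0.09 * x1 + 0.20 * x2 + rng.normal(scale=0.5, size=n)


def ols(y, *regressors):
    """Return OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


b_long = ols(y, x1, x2)   # both regressors included
b_short = ols(y, x1)      # x2 omitted
delta = ols(x2, x1)       # auxiliary regression of x2 on x1

# Identity: short slope = long slope on x1 + (long slope on x2) * (slope of x2 on x1)
implied_short_slope = b_long[1] + b_long[2] * delta[1]
print(b_short[1], implied_short_slope)
```

Because the omitted variable has a positive coefficient and is positively correlated with the included one in this simulation, the short regression overstates the slope on `x1`.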

# Pooling Cross Sections across Time: Simple Panel Data Methods
---

## Example 13.1: Women's fertility over time

In [26]:
wdg.data('fertil1')

Unnamed: 0,year,educ,meduc,feduc,age,kids,black,east,northcen,west,...,y80,y82,y84,agesq,y74educ,y76educ,y78educ,y80educ,y82educ,y84educ
0,72,12,8,8,48,4,0,0,1,0,...,0,0,0,2304,0,0,0,0,0,0
1,72,17,8,18,46,3,0,0,0,0,...,0,0,0,2116,0,0,0,0,0,0
2,72,12,7,8,53,2,0,0,1,0,...,0,0,0,2809,0,0,0,0,0,0
3,72,12,12,10,42,2,0,0,1,0,...,0,0,0,1764,0,0,0,0,0,0
4,72,12,3,8,51,2,0,0,0,0,...,0,0,0,2601,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1124,84,12,12,12,41,1,0,0,0,1,...,0,0,1,1681,0,0,0,0,0,12
1125,84,19,10,15,42,0,0,0,0,1,...,0,0,1,1764,0,0,0,0,0,19
1126,84,15,12,10,35,3,0,0,0,1,...,0,0,1,1225,0,0,0,0,0,15
1127,84,12,12,12,46,2,0,0,1,0,...,0,0,1,2116,0,0,0,0,0,12


In [27]:
fertil_13_1 = wdg.data('fertil1')
fertil_13_1['age_2'] = fertil_13_1['age']**2
fertil_13_1.head()

Unnamed: 0,year,educ,meduc,feduc,age,kids,black,east,northcen,west,...,y82,y84,agesq,y74educ,y76educ,y78educ,y80educ,y82educ,y84educ,age_2
0,72,12,8,8,48,4,0,0,1,0,...,0,0,2304,0,0,0,0,0,0,2304
1,72,17,8,18,46,3,0,0,0,0,...,0,0,2116,0,0,0,0,0,0,2116
2,72,12,7,8,53,2,0,0,1,0,...,0,0,2809,0,0,0,0,0,0,2809
3,72,12,12,10,42,2,0,0,1,0,...,0,0,1764,0,0,0,0,0,0,1764
4,72,12,3,8,51,2,0,0,0,0,...,0,0,2601,0,0,0,0,0,0,2601


In [28]:
fertil_13_1.dtypes

year        int64
educ        int64
meduc       int64
feduc       int64
age         int64
kids        int64
black       int64
east        int64
northcen    int64
west        int64
farm        int64
othrural    int64
town        int64
smcity      int64
y74         int64
y76         int64
y78         int64
y80         int64
y82         int64
y84         int64
agesq       int64
y74educ     int64
y76educ     int64
y78educ     int64
y80educ     int64
y82educ     int64
y84educ     int64
age_2       int64
dtype: object

In [29]:
reg_wfertil_13_1 = smf.ols('kids ~ educ + age + age_2 + black + east + northcen + west + farm + othrural + town + smcity + y74 + y76 + y78 + y80 + y82 + y84',
                      data = fertil_13_1).fit()

reg_wfertil_13_1.summary()

0,1,2,3
Dep. Variable:,kids,R-squared:,0.13
Model:,OLS,Adj. R-squared:,0.116
Method:,Least Squares,F-statistic:,9.723
Date:,"Fri, 10 Jun 2022",Prob (F-statistic):,2.42e-24
Time:,13:44:28,Log-Likelihood:,-2091.2
No. Observations:,1129,AIC:,4218.0
Df Residuals:,1111,BIC:,4309.0
Df Model:,17,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,-7.7425,3.052,-2.537,0.011,-13.730,-1.755
educ,-0.1284,0.018,-6.999,0.000,-0.164,-0.092
age,0.5321,0.138,3.845,0.000,0.261,0.804
age_2,-0.0058,0.002,-3.710,0.000,-0.009,-0.003
black,1.0757,0.174,6.198,0.000,0.735,1.416
east,0.2173,0.133,1.637,0.102,-0.043,0.478
northcen,0.3631,0.121,3.004,0.003,0.126,0.600
west,0.1976,0.167,1.184,0.237,-0.130,0.525
farm,-0.0526,0.147,-0.357,0.721,-0.341,0.236

0,1,2,3
Omnibus:,9.775,Durbin-Watson:,2.011
Prob(Omnibus):,0.008,Jarque-Bera (JB):,9.966
Skew:,0.227,Prob(JB):,0.00685
Kurtosis:,2.92,Cond. No.,132000.0


## Example 13.2: Changes in the Return to Education and the Gender Wage Gap

> Notes:

- *union*: dummy identifying union members;

- Because the dependent variable is the log of wages and the model includes the year dummy *y85*, we do not need to deflate the data: the effect of inflation is absorbed by the intercept of each year's regression line;

- The coefficient on *y85* is the change in the intercept between 1978 and 1985, which is where the effect of inflation on nominal wages shows up;

- The coefficient on the interaction *y85educ* represents the change in the return to education relative to 1978, that is, how much the return to education changed over the seven years;

- The coefficient on *y85fem* represents the change in the gender wage gap (discrimination) relative to 1978;

- The coefficient on *female* represents the gap estimated for the initial year, 1978.

An alternative for learning how all the coefficients changed over these years would be to estimate a regression with interactions between *y85* and every variable, or to estimate two separate regressions, one for 1978 and another for 1985.
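The two alternatives just described give the same coefficient estimates. As a minimal sketch on simulated data (not `cps78_85`; the dummy `d`, the regressor `x` and the helper `ols` are hypothetical names chosen for the illustration), a pooled regression with the period dummy interacted with every term reproduces the coefficients of two separate per-period regressions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2_000

# Simulated pooled cross section (hypothetical data, not cps78_85).
d = rng.integers(0, 2, size=n)    # period dummy, plays the role of y85
x = rng.normal(12, 2, size=n)     # regressor, plays the role of educ
y = 0.5 + 0.07 * x + d * (0.1 + 0.02 * x) + rng.normal(scale=0.4, size=n)


def ols(y, X):
    """Return OLS coefficients for design matrix X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


ones = np.ones(n)

# (a) One pooled regression: columns are [1, x, d, d*x].
b_pooled = ols(y, np.column_stack([ones, x, d, d * x]))

# (b) Two separate regressions, one per period: columns are [1, x].
first, second = d == 0, d == 1
b_first = ols(y[first], np.column_stack([ones[first], x[first]]))
b_second = ols(y[second], np.column_stack([ones[second], x[second]]))

# Base coefficients match the first period; base + interactions match the second.
print(b_pooled[:2], b_first)
print(b_pooled[:2] + b_pooled[2:], b_second)
```

Only the point estimates coincide. The standard errors generally differ, because the pooled regression assumes a common error variance across the two periods.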

In [30]:
wdg.data('cps78_85').head()

Unnamed: 0,educ,south,nonwhite,female,married,exper,expersq,union,lwage,age,year,y85,y85fem,y85educ,y85union
0,12,0,0,0,0,8,64,0,1.215,25,78,0,0,0,0
1,12,0,0,1,1,30,900,1,1.6094,47,78,0,0,0,0
2,6,0,0,0,1,38,1444,1,2.1401,49,78,0,0,0,0
3,12,0,0,0,1,19,361,1,2.0732,36,78,0,0,0,0
4,12,0,0,0,1,11,121,0,1.649,28,78,0,0,0,0


In [31]:
reg_educ_gender_gap = smf.ols('lwage ~ y85 + educ + y85educ + exper + expersq + union + female + y85fem',
                              data = wdg.data('cps78_85')).fit()

reg_educ_gender_gap.summary()

0,1,2,3
Dep. Variable:,lwage,R-squared:,0.426
Model:,OLS,Adj. R-squared:,0.422
Method:,Least Squares,F-statistic:,99.8
Date:,"Fri, 10 Jun 2022",Prob (F-statistic):,4.46e-124
Time:,13:44:28,Log-Likelihood:,-574.24
No. Observations:,1084,AIC:,1166.0
Df Residuals:,1075,BIC:,1211.0
Df Model:,8,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,0.4589,0.093,4.911,0.000,0.276,0.642
y85,0.1178,0.124,0.952,0.341,-0.125,0.361
educ,0.0747,0.007,11.192,0.000,0.062,0.088
y85educ,0.0185,0.009,1.974,0.049,0.000,0.037
exper,0.0296,0.004,8.293,0.000,0.023,0.037
expersq,-0.0004,7.75e-05,-5.151,0.000,-0.001,-0.000
union,0.2021,0.030,6.672,0.000,0.143,0.262
female,-0.3167,0.037,-8.648,0.000,-0.389,-0.245
y85fem,0.0851,0.051,1.658,0.098,-0.016,0.186

0,1,2,3
Omnibus:,83.747,Durbin-Watson:,1.918
Prob(Omnibus):,0.0,Jarque-Bera (JB):,317.985
Skew:,-0.271,Prob(JB):,8.92e-70
Kurtosis:,5.597,Cond. No.,8770.0
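As a short worked example, the 1985 values implied by the regression table above are obtained by adding each interaction coefficient to its base coefficient. The numbers below are copied from that table; the helper `exact_pct` is a hypothetical name for the usual conversion from log points to an exact percentage.

```python
import math

# Coefficients copied from the regression table above.
b_educ, b_y85educ = 0.0747, 0.0185
b_female, b_y85fem = -0.3167, 0.0851

# Return to one more year of education, in log points.
return_educ_1978 = b_educ
return_educ_1985 = b_educ + b_y85educ

# Female/male log wage differential.
gap_1978 = b_female
gap_1985 = b_female + b_y85fem


def exact_pct(log_diff):
    """Convert a log-point differential into an exact percentage change."""
    return 100 * (math.exp(log_diff) - 1)


print(f"Return to education: 1978 = {return_educ_1978:.4f}, 1985 = {return_educ_1985:.4f}")
print(f"Gender gap (log points): 1978 = {gap_1978:.4f}, 1985 = {gap_1985:.4f}")
print(f"Gender gap (exact %): 1978 = {exact_pct(gap_1978):.1f}, 1985 = {exact_pct(gap_1985):.1f}")
```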


## Example 13.3: Effect of a Garbage Incinerator’s Location on Housing Prices

### Differences in Differences


Effect of the garbage incinerator on the prices of the nearest homes, defined as those within 3 miles of the incinerator.

If we estimate a simple regression, the coefficient on the dummy for proximity to the incinerator will be overestimated; that is, it will say that homes near the incinerator are cheaper than more distant homes by a larger amount than the incinerator itself explains.

To control for these effects, we need to estimate another regression with data from before the construction of the incinerator was announced, to find out whether home prices in the incinerator's area already tended to be lower because of other factors such as construction quality, location, and distance to places like beaches and pharmacies, among others. Alternatively, we can create dummies for the years in which we have data and estimate a single regression.

We therefore estimate the following model:

$$
rprice = \beta_0 + \delta_0 \, y81 + \beta_1 \, nearinc + \delta_1 \, y81 \cdot nearinc + u
$$

where:

1. $rprice$ = real home price;

2. $y81$ = dummy for 1981, after the incinerator's construction had been announced, compared with 1978, before the incinerator;

3. $nearinc$ = dummy for proximity to the incinerator ($\leq$ 3 miles counts as near).

> Notes:

- $\beta_0$ represents the average price of homes far from the incinerator's area in 1978;

- $\delta_0$ captures the change in the value of homes far from the incinerator between 1978 and 1981;

- $\beta_1$ represents the effect of the particular location of the homes in the incinerator's area, still without the incinerator, and so tells us about the effects specific to that area on home prices;

- $\delta_1$ is our causal effect: the difference in prices attributable to the incinerator.

Another relevant point is that in this simplest regression $\delta_1$ is not individually statistically significant, as the results below show.

In their original work, however, **Kiel and McClain (1995) include more variables in the model to control for price differences between homes in the different areas. Including them reduces the standard errors of the estimated coefficients and also changes their values**, indicating that, besides being statistically significant, the effect of the incinerator on home prices may be even more relevant once we control for other factors such as lot size, house size, and number of bedrooms and bathrooms, among others.
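The interpretation of the four coefficients in the notes above can be checked with group means. The sketch below uses simulated data (not `kielmc`; the names `post`, `near`, `price` and `group_mean` are assumptions for the illustration) and shows that, in a model with only the two dummies and their interaction, the OLS estimate of $\delta_1$ equals the difference in differences of the four group averages.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000

# Simulated data (hypothetical, not kielmc).
post = rng.integers(0, 2, size=n)   # plays the role of y81
near = rng.integers(0, 2, size=n)   # plays the role of nearinc
price = (80_000 + 18_000 * post - 19_000 * near
         - 12_000 * post * near + rng.normal(scale=10_000, size=n))

# OLS on [1, post, near, post*near].
X = np.column_stack([np.ones(n), post, near, post * near])
beta0, delta0, beta1, delta1 = np.linalg.lstsq(X, price, rcond=None)[0]


def group_mean(p, g):
    """Average price for period p (0/1) and location group g (0/1)."""
    return price[(post == p) & (near == g)].mean()


# Change over time for the near group minus change over time for the far group.
did = ((group_mean(1, 1) - group_mean(0, 1))
       - (group_mean(1, 0) - group_mean(0, 0)))

print(delta1, did)
```

The same logic gives the other coefficients: `beta0` is the mean of the far group in the first period, `delta0` is the change over time for the far group, and `beta1` is the gap between the near and far groups in the first period.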

In [32]:
wdg.data('kielmc').head()

Unnamed: 0,year,age,agesq,nbh,cbd,intst,lintst,price,rooms,area,...,lprice,y81,larea,lland,y81ldist,lintstsq,nearinc,y81nrinc,rprice,lrprice
0,1978,48,2304.0,4,3000.0,1000.0,6.9078,60000.0,7,1660,...,11.0021,0,7.414573,8.429017,0.0,47.717705,1,0,60000.0,11.0021
1,1978,83,6889.0,4,4000.0,1000.0,6.9078,40000.0,6,2612,...,10.596635,0,7.867871,9.032409,0.0,47.717705,1,0,40000.0,10.596635
2,1978,58,3364.0,4,4000.0,1000.0,6.9078,34000.0,6,1144,...,10.434115,0,7.042286,8.517193,0.0,47.717705,1,0,34000.0,10.434115
3,1978,11,121.0,4,4000.0,1000.0,6.9078,63900.0,5,1136,...,11.065075,0,7.035269,9.21034,0.0,47.717705,1,0,63900.0,11.065075
4,1978,48,2304.0,4,4000.0,2000.0,7.6009,44000.0,5,1868,...,10.691945,0,7.532624,9.21034,0.0,57.773682,1,0,44000.0,10.691945


In [33]:
reg_did_incinerator = smf.ols('rprice ~ y81 + nearinc + y81nrinc', data = wdg.data('kielmc')).fit()
print(reg_did_incinerator.summary())

                            OLS Regression Results                            
Dep. Variable:                 rprice   R-squared:                       0.174
Model:                            OLS   Adj. R-squared:                  0.166
Method:                 Least Squares   F-statistic:                     22.25
Date:                Fri, 10 Jun 2022   Prob (F-statistic):           4.22e-13
Time:                        13:44:29   Log-Likelihood:                -3765.2
No. Observations:                 321   AIC:                             7538.
Df Residuals:                     317   BIC:                             7554.
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept   8.252e+04   2726.910     30.260      0.0

We can also estimate the model with the log of prices, which gives the effect of the incinerator on prices more directly in percentage terms.

We estimate the simplest model again, which yields a $\delta_1$ that is underestimated and not statistically significant.

Keep in mind, however, that once we control for all the variables in the *dataset*, the estimate becomes larger in magnitude and statistically significant.

$$
\log(price) = \beta_0 + \delta_0 \, y81 + \beta_1 \, nearinc + \delta_1 \, y81 \cdot nearinc + u
$$

Finally, when prices are in logs we do not need to correct for inflation and obtain real prices, since that transformation does not affect the slope coefficients, only the intercepts (the constant and the coefficient on the year dummy $y81$):

$\log\left(\frac{price}{deflator}\right) = \log(price) - \log(deflator)$, so only the intercepts change, as highlighted in Example **13.2**.
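The identity $\log(price/deflator) = \log(price) - \log(deflator)$ can be illustrated with a small simulation. This is only a sketch on made-up data (not `kielmc`): the price index of 1.4 for the later year and the names `post`, `near`, `log_real` and `log_nominal` are assumptions. With a year dummy in the model, switching between nominal and real log prices moves only the year-dummy coefficient.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000

# Simulated data (hypothetical, not kielmc).
post = rng.integers(0, 2, size=n)   # plays the role of y81
near = rng.integers(0, 2, size=n)   # plays the role of nearinc
log_real = (11.3 + 0.2 * post - 0.3 * near - 0.06 * post * near
            + rng.normal(scale=0.3, size=n))

# Assumed price index: 1.0 in the base year, 1.4 in the later year.
deflator = np.where(post == 1, 1.4, 1.0)
log_nominal = log_real + np.log(deflator)

# Same design matrix for both dependent variables: [1, post, near, post*near].
X = np.column_stack([np.ones(n), post, near, post * near])
b_real = np.linalg.lstsq(X, log_real, rcond=None)[0]
b_nominal = np.linalg.lstsq(X, log_nominal, rcond=None)[0]

# Difference between the two coefficient vectors.
print(b_nominal - b_real)
```

Since the assumed deflator equals 1 in the base year, the constant is unchanged and the coefficient on `post` shifts by `log(1.4)`; the coefficients on `near` and on the interaction are identical in both regressions.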

In [34]:
reg_did_incinerator_log = smf.ols('lprice ~ y81 + nearinc + y81nrinc', data = wdg.data('kielmc')).fit()
print(reg_did_incinerator_log.summary())

                            OLS Regression Results                            
Dep. Variable:                 lprice   R-squared:                       0.409
Model:                            OLS   Adj. R-squared:                  0.403
Method:                 Least Squares   F-statistic:                     73.15
Date:                Fri, 10 Jun 2022   Prob (F-statistic):           5.61e-36
Time:                        13:44:29   Log-Likelihood:                -105.68
No. Observations:                 321   AIC:                             219.4
Df Residuals:                     317   BIC:                             234.4
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     11.2854      0.031    369.839      0.0

## Example 13.4: Effect of Worker Compensation Laws on Weeks out of Work

### DiD II (Differences in Differences)

In [35]:
wdg.data('injury').head()

Unnamed: 0,durat,afchnge,highearn,male,married,hosp,indust,injtype,age,prewage,...,head,neck,upextr,trunk,lowback,lowextr,occdis,manuf,construc,highlpre
0,1.0,1,1,1.0,0.0,1,3.0,1,26.0,404.950012,...,1,0,0,0,0,0,0,0.0,0.0,6.003764
1,1.0,1,1,1.0,1.0,0,3.0,1,31.0,643.825012,...,1,0,0,0,0,0,0,0.0,0.0,6.467427
2,84.0,1,1,1.0,1.0,1,3.0,1,37.0,398.125,...,1,0,0,0,0,0,0,0.0,0.0,5.986766
3,4.0,1,1,1.0,1.0,1,3.0,1,31.0,527.799988,...,1,0,0,0,0,0,0,0.0,0.0,6.268717
4,1.0,1,1,1.0,1.0,0,3.0,1,23.0,528.9375,...,1,0,0,0,0,0,0,0.0,0.0,6.27087


In [36]:
wdg.data('injury').columns

Index(['durat', 'afchnge', 'highearn', 'male', 'married', 'hosp', 'indust',
       'injtype', 'age', 'prewage', 'totmed', 'injdes', 'benefit', 'ky', 'mi',
       'ldurat', 'afhigh', 'lprewage', 'lage', 'ltotmed', 'head', 'neck',
       'upextr', 'trunk', 'lowback', 'lowextr', 'occdis', 'manuf', 'construc',
       'highlpre'],
      dtype='object')

In [37]:
wdg.data('injury').afhigh

0       1
1       1
2       1
3       1
4       1
       ..
7145    0
7146    0
7147    0
7148    0
7149    0
Name: afhigh, Length: 7150, dtype: int64

In [38]:
reg_duration_comp = smf.ols('ldurat ~ afchnge + highearn + afhigh', data = wdg.data('injury')).fit()
print(reg_duration_comp.summary())

                            OLS Regression Results                            
Dep. Variable:                 ldurat   R-squared:                       0.016
Model:                            OLS   Adj. R-squared:                  0.015
Method:                 Least Squares   F-statistic:                     38.34
Date:                Fri, 10 Jun 2022   Prob (F-statistic):           1.42e-24
Time:                        13:44:30   Log-Likelihood:                -12011.
No. Observations:                7150   AIC:                         2.403e+04
Df Residuals:                    7146   BIC:                         2.406e+04
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      1.1993      0.027     44.241      0.0

## Example 13.5: Sleeping vs. Working

### OLS with FIXED EFFECTS

In [39]:
wdg.data('slp75_81')

Unnamed: 0,age75,educ75,educ81,gdhlth75,gdhlth81,male,marr75,marr81,slpnap75,slpnap81,totwrk75,totwrk81,yngkid75,yngkid81,ceduc,cgdhlth,cmarr,cslpnap,ctotwrk,cyngkid
0,46,16,16,1,1,1,0,1,3991,4695,2050,2430,0,0,0,0,1,704,380,0
1,39,16,16,1,1,1,1,0,2243,2195,2713,2610,0,0,0,0,-1,-48,-103,0
2,55,15,15,1,0,1,1,1,3285,3115,2493,0,0,0,0,-1,0,-170,-2493,0
3,39,16,16,1,1,1,1,1,3158,3387,2778,1787,0,0,0,0,0,229,-991,0
4,54,17,17,1,1,1,1,1,3743,3800,3118,552,0,0,0,0,0,57,-2566,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
234,44,8,9,1,1,1,1,1,3533,3182,3071,2312,0,0,1,0,0,-351,-759,0
235,29,12,12,1,1,1,1,1,3255,3300,2100,2390,0,1,0,0,0,45,290,1
236,37,6,6,0,0,0,0,0,4367,3692,1751,957,0,0,0,0,0,-675,-794,0
237,24,16,16,1,1,1,1,1,3798,3157,2438,2137,0,1,0,0,0,-641,-301,1


In [40]:
wdg.data('slp75_81').columns

Index(['age75', 'educ75', 'educ81', 'gdhlth75', 'gdhlth81', 'male', 'marr75',
       'marr81', 'slpnap75', 'slpnap81', 'totwrk75', 'totwrk81', 'yngkid75',
       'yngkid81', 'ceduc', 'cgdhlth', 'cmarr', 'cslpnap', 'ctotwrk',
       'cyngkid'],
      dtype='object')

In [41]:
reg_sleep_work = smf.ols('cslpnap ~ ctotwrk + ceduc + cmarr + cyngkid + cgdhlth', data = wdg.data('slp75_81')).fit()
print(reg_sleep_work.summary())

                            OLS Regression Results                            
Dep. Variable:                cslpnap   R-squared:                       0.150
Model:                            OLS   Adj. R-squared:                  0.131
Method:                 Least Squares   F-statistic:                     8.191
Date:                Fri, 10 Jun 2022   Prob (F-statistic):           3.83e-07
Time:                        13:44:30   Log-Likelihood:                -1864.4
No. Observations:                 239   AIC:                             3741.
Df Residuals:                     233   BIC:                             3762.
Df Model:                           5                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept    -92.6340     45.866     -2.020      0.0
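
The `c`-prefixed columns used above are changes between 1975 and 1981 (for example, `cslpnap` is `slpnap81 - slpnap75`). Regressing changes on changes removes anything about a person that is constant over time. The sketch below uses a small hypothetical two-period panel (all numbers are invented and noise-free for clarity) to show that first differencing recovers the slope even when the unobserved individual effect distorts a regression in levels:

```python
def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical model: y_it = a_i + 2 * x_it + 0.5 * (second period)
beta = 2.0
a = [10.0, 0.0, -5.0, 3.0]        # unobserved, time-constant effects
x1 = [1.0, 2.0, 3.0, 4.0]         # regressor in period 1
x2 = [2.0, 5.0, 4.0, 8.0]         # regressor in period 2
y1 = [ai + beta * xi for ai, xi in zip(a, x1)]
y2 = [ai + beta * xi + 0.5 for ai, xi in zip(a, x2)]

# First differences: a_i cancels, leaving dy = 0.5 + 2 * dx.
dx = [after - before for after, before in zip(x2, x1)]
dy = [after - before for after, before in zip(y2, y1)]

levels_slope = ols_slope(x1, y1)  # contaminated by a_i
fd_slope = ols_slope(dx, dy)      # recovers beta
print(levels_slope, fd_slope)
```

The intercept of the differenced regression plays the role of the period dummy, which is why the intercept in the output above can be read as the average change not explained by the regressors.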

## Example 13.6: Distributed Lag of Crime Rate on Clear-Up Rate

FIXED EFFECTS with lags

In [42]:
wdg.data('crime3')

Unnamed: 0,district,year,crime,clrprc1,clrprc2,d78,avgclr,lcrime,clcrime,cavgclr,cclrprc1,cclrprc2
0,1,72,49.540001,22,23,0,22.5,3.902781,,,,
1,1,78,71.320000,15,17,1,16.0,4.267177,0.364396,-6.5,-7.0,-6.0
2,2,72,14.770000,51,62,0,56.5,2.692598,,,,
3,2,78,17.850000,39,40,1,39.5,2.882004,0.189405,-17.0,-12.0,-22.0
4,3,72,17.350000,33,34,0,33.5,2.853593,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...
101,51,78,10.540000,47,41,1,44.0,2.355178,-0.393374,-4.0,-6.0,-2.0
102,52,72,6.770000,53,56,0,54.5,1.912501,,,,
103,52,78,6.250000,40,69,1,54.5,1.832582,-0.079920,0.0,-13.0,13.0
104,53,72,14.290000,43,59,0,51.0,2.659560,,,,


In [43]:
reg_crime_clean_rate = smf.ols('clcrime ~ cclrprc1 + cclrprc2', data = wdg.data('crime3')).fit()
print(reg_crime_clean_rate.summary())

                            OLS Regression Results                            
Dep. Variable:                clcrime   R-squared:                       0.193
Model:                            OLS   Adj. R-squared:                  0.161
Method:                 Least Squares   F-statistic:                     5.992
Date:                Fri, 10 Jun 2022   Prob (F-statistic):            0.00465
Time:                        13:44:31   Log-Likelihood:                -17.194
No. Observations:                  53   AIC:                             40.39
Df Residuals:                      50   BIC:                             46.30
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.0857      0.064      1.343      0.1

## Example 13.7: Effect of Drunk Driving Laws on Traffic Fatalities

FIXED EFFECTS for public policies over time

In [44]:
wdg.data('traffic1').head()

Unnamed: 0,state,admn90,admn85,open90,open85,dthrte90,dthrte85,speed90,speed85,cdthrte,cadmn,copen,cspeed
0,AL,0,0,0,0,2.6,2.9,1,0,-0.3,0,0,1
1,AK,1,1,1,0,2.1,3.2,0,0,-1.1,0,1,0
2,AZ,1,0,0,0,2.5,4.4,1,0,-1.9,1,0,1
3,AR,0,0,0,0,2.9,3.4,1,0,-0.5,0,0,1
4,CA,1,0,1,1,2.0,2.6,1,0,-0.6,1,0,1


In [45]:
reg_drink_death = smf.ols('cdthrte ~ cadmn + copen', data = wdg.data('traffic1')).fit()
print(reg_drink_death.summary())

                            OLS Regression Results                            
Dep. Variable:                cdthrte   R-squared:                       0.119
Model:                            OLS   Adj. R-squared:                  0.082
Method:                 Least Squares   F-statistic:                     3.231
Date:                Fri, 10 Jun 2022   Prob (F-statistic):             0.0482
Time:                        13:44:31   Log-Likelihood:                -16.323
No. Observations:                  51   AIC:                             38.65
Df Residuals:                      48   BIC:                             44.44
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.4968      0.052     -9.476      0.0

## Example 13.8: Effect of Enterprise Zones on Unemployment Claims

In [46]:
wdg.data('ezunem')

Unnamed: 0,year,uclms,ez,d81,d82,d83,d84,d85,d86,d87,...,c17,c18,c19,c20,c21,c22,luclms,guclms,cez,city
0,1980,166746.0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,12.024227,,,1
1,1981,83561.0,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,11.333332,-0.690895,0.0,1
2,1982,158146.0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,11.971274,0.637942,0.0,1
3,1983,83572.0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,11.333464,-0.637811,0.0,1
4,1984,45949.0,1,0,0,0,1,0,0,0,...,0,0,0,0,0,0,10.735288,-0.598176,1.0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
193,1984,80605.0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,1,11.297316,-0.547271,0.0,22
194,1985,82758.0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,1,11.323676,0.026361,0.0,22
195,1986,67815.0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,1,11.124538,-0.199138,0.0,22
196,1987,67762.0,0,0,0,0,0,0,0,1,...,0,0,0,0,0,1,11.123756,-0.000782,0.0,22


In [47]:
wdg.data('ezunem').columns

Index(['year', 'uclms', 'ez', 'd81', 'd82', 'd83', 'd84', 'd85', 'd86', 'd87',
       'd88', 'c1', 'c2', 'c3', 'c4', 'c5', 'c6', 'c7', 'c8', 'c9', 'c10',
       'c11', 'c12', 'c13', 'c14', 'c15', 'c16', 'c17', 'c18', 'c19', 'c20',
       'c21', 'c22', 'luclms', 'guclms', 'cez', 'city'],
      dtype='object')

In [48]:
reg_indiana_ez = smf.ols('guclms ~ d82 + d83 + d84 + d85 + d86 + d87 + d88 + cez', data = wdg.data('ezunem')).fit()
print(reg_indiana_ez.summary())

                            OLS Regression Results                            
Dep. Variable:                 guclms   R-squared:                       0.623
Model:                            OLS   Adj. R-squared:                  0.605
Method:                 Least Squares   F-statistic:                     34.50
Date:                Fri, 10 Jun 2022   Prob (F-statistic):           1.08e-31
Time:                        13:44:32   Log-Likelihood:                 24.553
No. Observations:                 176   AIC:                            -31.11
Df Residuals:                     167   BIC:                            -2.573
Df Model:                           8                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.3216      0.046     -6.982      0.0

Breusch-Pagan test for heteroskedasticity:

$H_0$: the errors are homoskedastic;

$H_1$: the errors are not homoskedastic, i.e., there is heteroskedasticity.

In [49]:
indiana_ez_bptest = sm.stats.het_breuschpagan(reg_indiana_ez.resid, reg_indiana_ez.model.exog)
bptest_labels = ['Lagrange multiplier statistic', 'p-value','F-value', 'F p-value']

print(dict(zip(bptest_labels, indiana_ez_bptest)))

{'Lagrange multiplier statistic': 6.913966430356922, 'p-value': 0.5459429362157441, 'F-value': 0.8535835053121316, 'F p-value': 0.5570473394318125}


> The p-value is about 0.55, so rejecting $H_0$ would carry a high risk of error; in this case we cannot conclude that heteroskedasticity is present.
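
To make the mechanics of the test concrete, here is a minimal sketch of the $n \cdot R^2$ form of the LM statistic for a single regressor. It is an illustration under stated assumptions, not a reimplementation of the library routine: the data are hypothetical, `simple_ols` is a helper defined only for this example, and the p-value shortcut is valid only for one degree of freedom.

```python
import math

def simple_ols(x, y):
    """Fit y = b0 + b1 * x by OLS; return (b0, b1, residuals, r_squared)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sum(e ** 2 for e in resid)
    return b0, b1, resid, 1.0 - ssr / sst

# Hypothetical data: the error magnitude grows with x,
# so the errors are heteroskedastic by construction.
n = 40
x = [float(i) for i in range(1, n + 1)]
u = [0.5 * xi * (1 if i % 2 == 0 else -1) for i, xi in enumerate(x)]
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]

# Step 1: residuals from the original regression.
_, _, resid, _ = simple_ols(x, y)

# Step 2: auxiliary regression of the squared residuals on the regressor.
_, _, _, r2_aux = simple_ols(x, [e ** 2 for e in resid])

# Step 3: LM = n * R^2, compared with a chi-square with 1 degree of freedom.
# For 1 df the upper-tail probability is erfc(sqrt(LM / 2)).
lm = n * r2_aux
p_value = math.erfc(math.sqrt(lm / 2.0))
print(lm, p_value)
```

A small p-value here leads to rejecting $H_0$; with several regressors the degrees of freedom equal the number of slope coefficients in the auxiliary regression, and a general chi-square tail function would be needed.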

## Example 13.9: County Crime Rates in North Carolina

In [50]:
wdg.data('crime4').head()

Unnamed: 0,county,year,crmrte,prbarr,prbconv,prbpris,avgsen,polpc,density,taxpc,...,lpctymle,lpctmin,clcrmrte,clprbarr,clprbcon,clprbpri,clavgsen,clpolpc,cltaxpc,clmix
0,1,81,0.039885,0.289696,0.402062,0.472222,5.61,0.001787,2.307159,25.69763,...,-2.43387,3.006608,,,,,,,,
1,1,82,0.038345,0.338111,0.433005,0.506993,5.59,0.001767,2.330254,24.874252,...,-2.449038,3.006608,-0.039376,0.154542,0.074143,0.071048,-0.003571,-0.011364,-0.032565,0.030857
2,1,83,0.030305,0.330449,0.525703,0.479705,5.8,0.001836,2.341801,26.451443,...,-2.464036,3.006608,-0.235316,-0.022922,0.193987,-0.055326,0.036879,0.038413,0.061477,-0.244732
3,1,84,0.034726,0.362525,0.604706,0.520104,6.89,0.001886,2.34642,26.842348,...,-2.478925,3.006608,0.13618,0.092641,0.140006,0.080857,0.172213,0.02693,0.01467,-0.027331
4,1,85,0.036573,0.325395,0.578723,0.497059,6.55,0.001924,2.364896,28.140337,...,-2.497306,3.006608,0.051825,-0.108054,-0.043918,-0.04532,-0.050606,0.020199,0.047223,0.172125


In [51]:
wdg.data('crime4').columns

Index(['county', 'year', 'crmrte', 'prbarr', 'prbconv', 'prbpris', 'avgsen',
       'polpc', 'density', 'taxpc', 'west', 'central', 'urban', 'pctmin80',
       'wcon', 'wtuc', 'wtrd', 'wfir', 'wser', 'wmfg', 'wfed', 'wsta', 'wloc',
       'mix', 'pctymle', 'd82', 'd83', 'd84', 'd85', 'd86', 'd87', 'lcrmrte',
       'lprbarr', 'lprbconv', 'lprbpris', 'lavgsen', 'lpolpc', 'ldensity',
       'ltaxpc', 'lwcon', 'lwtuc', 'lwtrd', 'lwfir', 'lwser', 'lwmfg', 'lwfed',
       'lwsta', 'lwloc', 'lmix', 'lpctymle', 'lpctmin', 'clcrmrte', 'clprbarr',
       'clprbcon', 'clprbpri', 'clavgsen', 'clpolpc', 'cltaxpc', 'clmix'],
      dtype='object')

In [52]:
reg_carolina_crime = smf.ols('clcrmrte ~ d83 + d84 + d85 + d86 + d87 + clprbarr + clprbcon + clprbpri + clavgsen + clpolpc',
                             data = wdg.data('crime4')).fit()
reg_carolina_crime.summary()

0,1,2,3
Dep. Variable:,clcrmrte,R-squared:,0.433
Model:,OLS,Adj. R-squared:,0.422
Method:,Least Squares,F-statistic:,40.32
Date:,"Fri, 10 Jun 2022",Prob (F-statistic):,6.3e-59
Time:,13:44:33,Log-Likelihood:,248.48
No. Observations:,540,AIC:,-475.0
Df Residuals:,529,BIC:,-427.7
Df Model:,10,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,0.0077,0.017,0.452,0.651,-0.026,0.041
d83,-0.0999,0.024,-4.179,0.000,-0.147,-0.053
d84,-0.0479,0.024,-2.040,0.042,-0.094,-0.002
d85,-0.0046,0.023,-0.196,0.845,-0.051,0.042
d86,0.0275,0.024,1.139,0.255,-0.020,0.075
d87,0.0408,0.024,1.672,0.095,-0.007,0.089
clprbarr,-0.3275,0.030,-10.924,0.000,-0.386,-0.269
clprbcon,-0.2381,0.018,-13.058,0.000,-0.274,-0.202
clprbpri,-0.1650,0.026,-6.356,0.000,-0.216,-0.114

0,1,2,3
Omnibus:,60.932,Durbin-Watson:,2.366
Prob(Omnibus):,0.0,Jarque-Bera (JB):,419.621
Skew:,-0.12,Prob(JB):,7.59e-92
Kurtosis:,7.312,Cond. No.,7.44


White test for heteroskedasticity:

$H_0$: the errors are homoskedastic;

$H_1$: the errors are not homoskedastic, i.e., there is heteroskedasticity.

In [53]:
from statsmodels.stats.diagnostic import het_white

carolina_crime_whitetest = het_white(reg_carolina_crime.resid, reg_carolina_crime.model.exog)
whitetest_labels = ['Test Statistic', 'Test Statistic p-value', 'F-Statistic', 'F-Test p-value']

print(dict(zip(whitetest_labels, carolina_crime_whitetest)))

{'Test Statistic': 257.57287535810474, 'Test Statistic p-value': 1.0036365053570545e-29, 'F-Statistic': 8.919337065079429, 'F-Test p-value': 3.5592736360154102e-43}


> Both p-values are essentially zero, so we can reject $H_0$: there is evidence of heteroskedasticity.
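
When homoskedasticity is rejected, the `nonrobust` standard errors reported in the summary may be unreliable, and a common response is to report heteroskedasticity-robust standard errors instead. The sketch below is an assumption about a reasonable next step, not something the notebook does: it computes the classical and the White (HC0) standard error of the slope in a simple regression on hypothetical data.

```python
import math

# Hypothetical data, chosen so the arithmetic is easy to follow.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-3.0, -1.0, 0.0, 1.0, 3.0]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

# Classical standard error: one common error variance for every observation.
sigma2_hat = sum(e ** 2 for e in resid) / (n - 2)
classical_se = math.sqrt(sigma2_hat / sxx)

# HC0 standard error: each squared residual is weighted by (x_i - x_bar)^2.
meat = sum((xi - x_bar) ** 2 * e ** 2 for xi, e in zip(x, resid))
robust_se = math.sqrt(meat) / sxx

print(b1, classical_se, robust_se)
```

In statsmodels the same idea is available by passing a `cov_type` such as `'HC0'` or `'HC1'` to `fit`, which leaves the coefficients unchanged and only replaces the standard errors.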

# Advanced Panel Data Methods
---

## Example 14.1: Effect of Job Training on Firm Scrap Rates

In [54]:
jtrain = wdg.data('jtrain')
jtrain.columns

Index(['year', 'fcode', 'employ', 'sales', 'avgsal', 'scrap', 'rework',
       'tothrs', 'union', 'grant', 'd89', 'd88', 'totrain', 'hrsemp', 'lscrap',
       'lemploy', 'lsales', 'lrework', 'lhrsemp', 'lscrap_1', 'grant_1',
       'clscrap', 'cgrant', 'clemploy', 'clsales', 'lavgsal', 'clavgsal',
       'cgrant_1', 'chrsemp', 'clhrsemp'],
      dtype='object')

We take the firm code and the year as the references for the fixed effects. To run the panel regression, we therefore set them as the index and create a column for the firm fixed effect (the firm is the entity studied in this example).

In [55]:
jtrain['entity'] = jtrain['fcode']

In [56]:
jtrain = jtrain.set_index(['fcode', 'year'])

In [57]:
FE_jtrain_reg = plm.PanelOLS.from_formula(formula='lscrap ~ d88 + d89 + grant + grant_1 + EntityEffects', data=jtrain)
results_FE_jtrain = FE_jtrain_reg.fit()
results_FE_jtrain.summary

Inputs contain missing values. Dropping rows with missing observations.
  super().__init__(dependent, exog, weights=weights, check_rank=check_rank)


0,1,2,3
Dep. Variable:,lscrap,R-squared:,0.2010
Estimator:,PanelOLS,R-squared (Between):,-0.1103
No. Observations:,162,R-squared (Within):,0.2010
Date:,"Fri, Jun 10 2022",R-squared (Overall):,-0.0839
Time:,13:44:34,Log-likelihood,-80.946
Cov. Estimator:,Unadjusted,,
,,F-statistic:,6.5426
Entities:,54,P-value,0.0001
Avg Obs:,3.0000,Distribution:,"F(4,104)"
Min Obs:,3.0000,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
d88,-0.0802,0.1095,-0.7327,0.4654,-0.2973,0.1369
d89,-0.2472,0.1332,-1.8556,0.0663,-0.5114,0.0170
grant,-0.2523,0.1506,-1.6751,0.0969,-0.5510,0.0464
grant_1,-0.4216,0.2102,-2.0057,0.0475,-0.8384,-0.0048
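
As a rough illustration of what `EntityEffects` does, the sketch below applies the within transformation by hand: each variable is demeaned by entity and OLS is run on the demeaned data. The panel, the firm effects and the helper `demean_by_entity` are all hypothetical and exist only for this example.

```python
from collections import defaultdict

def demean_by_entity(entity, values):
    """Subtract each entity's own mean from its observations."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for e, v in zip(entity, values):
        totals[e] += v
        counts[e] += 1
    return [v - totals[e] / counts[e] for e, v in zip(entity, values)]

# Hypothetical panel: three firms observed for three years each.
firm = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
x = [1.0, 2.0, 4.0, 3.0, 3.0, 6.0, 0.0, 5.0, 7.0]
firm_effect = {"A": 10.0, "B": -4.0, "C": 2.0}   # unobserved in practice
y = [firm_effect[f] + 1.5 * xi for f, xi in zip(firm, x)]

# Within transformation: the firm effect drops out of the demeaned data.
x_within = demean_by_entity(firm, x)
y_within = demean_by_entity(firm, y)

# OLS without an intercept on the demeaned data gives the fixed-effects slope.
fe_slope = (sum(a * b for a, b in zip(x_within, y_within))
            / sum(a * a for a in x_within))
print(fe_slope)
```

Because the transformation uses only variation within each firm, any regressor that never changes within a firm is wiped out along with the fixed effect.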


## Example 14.2: Has the Return to Education Changed over Time?

In [58]:
wdg.data()

  J.M. Wooldridge (2016) Introductory Econometrics: A Modern Approach,
  Cengage Learning, 6th edition.

  401k       401ksubs    admnrev       affairs     airfare
  alcohol    apple       approval      athlet1     athlet2
  attend     audit       barium        beauty      benefits
  beveridge  big9salary  bwght         bwght2      campus
  card       catholic    cement        census2000  ceosal1
  ceosal2    charity     consump       corn        countymurders
  cps78_85   cps91       crime1        crime2      crime3
  crime4     discrim     driving       earns       econmath
  elem94_95  engin       expendshares  ezanders    ezunem
  fair       fertil1     fertil2       fertil3     fish
  fringe     gpa1        gpa2          gpa3        happiness
  hprice1    hprice2     hprice3       hseinv      htv
  infmrt     injury      intdef        intqrt      inven
  jtrain     jtrain2     jtrain3       kielmc      lawsch85
  loanapp    lowbrth     mathpnl       meap00_01   meap01
  meap93    

In [59]:
wagepan = wdg.data('wagepan')

In [60]:
wagepan = wagepan.set_index(['nr', 'year'], drop=False )

In [61]:
wagepan.lwage.dtypes

dtype('float64')

In [62]:
FE_ret_educ = plm.PanelOLS.from_formula(formula='lwage ~ married + union + C(year)*educ + EntityEffects',
                                        data=wagepan, drop_absorbed=True)
results_FE_ret_educ = FE_ret_educ.fit()
results_FE_ret_educ.summary

Variables have been fully absorbed and have removed from the regression:

educ

  results_FE_ret_educ = FE_ret_educ.fit()


0,1,2,3
Dep. Variable:,lwage,R-squared:,0.1708
Estimator:,PanelOLS,R-squared (Between):,0.0905
No. Observations:,4360,R-squared (Within):,0.1708
Date:,"Fri, Jun 10 2022",R-squared (Overall):,0.1277
Time:,13:44:35,Log-likelihood,-1350.7
Cov. Estimator:,Unadjusted,,
,,F-statistic:,48.907
Entities:,545,P-value,0.0000
Avg Obs:,8.0000,Distribution:,"F(16,3799)"
Min Obs:,8.0000,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
C(year)[T.1980],1.3625,0.0162,83.903,0.0000,1.3306,1.3943
C(year)[T.1981],1.3400,0.1452,9.2307,0.0000,1.0554,1.6247
C(year)[T.1982],1.3567,0.1451,9.3481,0.0000,1.0722,1.6412
C(year)[T.1983],1.3729,0.1452,9.4561,0.0000,1.0882,1.6575
C(year)[T.1984],1.4468,0.1452,9.9617,0.0000,1.1621,1.7316
C(year)[T.1985],1.4122,0.1451,9.7315,0.0000,1.1277,1.6967
C(year)[T.1986],1.4281,0.1451,9.8404,0.0000,1.1435,1.7126
C(year)[T.1987],1.4529,0.1452,10.006,0.0000,1.1682,1.7376
married,0.0548,0.0184,2.9773,0.0029,0.0187,0.0909


## Example 14.3: Effect of Job Training on Firm Scrap Rates

In [63]:
jtrain.columns

Index(['employ', 'sales', 'avgsal', 'scrap', 'rework', 'tothrs', 'union',
       'grant', 'd89', 'd88', 'totrain', 'hrsemp', 'lscrap', 'lemploy',
       'lsales', 'lrework', 'lhrsemp', 'lscrap_1', 'grant_1', 'clscrap',
       'cgrant', 'clemploy', 'clsales', 'lavgsal', 'clavgsal', 'cgrant_1',
       'chrsemp', 'clhrsemp', 'entity'],
      dtype='object')

In [64]:
FE_jtrain_reg_2 = plm.PanelOLS.from_formula(formula='lscrap ~ d88 + d89 + grant + grant_1 + lsales + lemploy + EntityEffects', 
                                            data=jtrain)
results_FE_jtrain_2 = FE_jtrain_reg_2.fit()
results_FE_jtrain_2.summary

0,1,2,3
Dep. Variable:,lscrap,R-squared:,0.2131
Estimator:,PanelOLS,R-squared (Between):,-2.2478
No. Observations:,148,R-squared (Within):,0.2131
Date:,"Fri, Jun 10 2022",R-squared (Overall):,-2.0639
Time:,13:44:36,Log-likelihood,-68.887
Cov. Estimator:,Unadjusted,,
,,F-statistic:,4.1063
Entities:,51,P-value,0.0011
Avg Obs:,2.9020,Distribution:,"F(6,91)"
Min Obs:,1.0000,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
d88,-0.0040,0.1195,-0.0331,0.9736,-0.2414,0.2335
d89,-0.1322,0.1537,-0.8601,0.3920,-0.4375,0.1731
grant,-0.2968,0.1571,-1.8891,0.0621,-0.6088,0.0153
grant_1,-0.5356,0.2242,-2.3888,0.0190,-0.9809,-0.0902
lemploy,-0.0764,0.3503,-0.2180,0.8279,-0.7722,0.6194
lsales,-0.0869,0.2597,-0.3345,0.7388,-0.6027,0.4290


## Example 14.4: A Wage Equation Using Panel Data

In [65]:
wagepan.columns

Index(['nr', 'year', 'agric', 'black', 'bus', 'construc', 'ent', 'exper',
       'fin', 'hisp', 'poorhlth', 'hours', 'manuf', 'married', 'min',
       'nrthcen', 'nrtheast', 'occ1', 'occ2', 'occ3', 'occ4', 'occ5', 'occ6',
       'occ7', 'occ8', 'occ9', 'per', 'pro', 'pub', 'rur', 'south', 'educ',
       'tra', 'trad', 'union', 'lwage', 'd81', 'd82', 'd83', 'd84', 'd85',
       'd86', 'd87', 'expersq'],
      dtype='object')

1. Pooled OLS

In [66]:
'''
We can also use I(exper**2) in the formula to square exper directly, which is equivalent to expersq
'''

# pool_ols_wage = plm.PooledOLS.from_formula('lwage ~ educ + black + hisp + exper + I(exper**2) + married + union + C(year)',
#                                            data = wagepan)
# results_ols_wage = pool_ols_wage.fit()
# results_ols_wage.summary

'\nWe can also use I(exper**2) in the formula to square exper directly, which is equivalent to expersq\n'

In [67]:
pool_ols_wage = plm.PooledOLS.from_formula('lwage ~ educ + black + hisp + exper + expersq + married + union + C(year)',
                                            data = wagepan)
results_ols_wage = pool_ols_wage.fit()
results_ols_wage.summary

0,1,2,3
Dep. Variable:,lwage,R-squared:,0.1893
Estimator:,PooledOLS,R-squared (Between):,0.2066
No. Observations:,4360,R-squared (Within):,0.1692
Date:,"Fri, Jun 10 2022",R-squared (Overall):,0.1893
Time:,13:44:37,Log-likelihood,-2982.0
Cov. Estimator:,Unadjusted,,
,,F-statistic:,72.459
Entities:,545,P-value,0.0000
Avg Obs:,8.0000,Distribution:,"F(14,4345)"
Min Obs:,8.0000,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
C(year)[T.1980],0.0921,0.0783,1.1761,0.2396,-0.0614,0.2455
C(year)[T.1981],0.1504,0.0838,1.7935,0.0730,-0.0140,0.3148
C(year)[T.1982],0.1548,0.0893,1.7335,0.0831,-0.0203,0.3299
C(year)[T.1983],0.1541,0.0944,1.6323,0.1027,-0.0310,0.3391
C(year)[T.1984],0.1825,0.0990,1.8437,0.0653,-0.0116,0.3766
C(year)[T.1985],0.2013,0.1031,1.9523,0.0510,-0.0008,0.4035
C(year)[T.1986],0.2340,0.1068,2.1920,0.0284,0.0247,0.4433
C(year)[T.1987],0.2659,0.1100,2.4166,0.0157,0.0502,0.4816
black,-0.1392,0.0236,-5.9049,0.0000,-0.1855,-0.0930


2. Random Effects 

In [68]:
RE_wage = plm.RandomEffects.from_formula('lwage ~ educ + black + hisp + exper + expersq + married + union + C(year)',
                                            data = wagepan)
results_RE_wage = RE_wage.fit()
results_RE_wage.summary

0,1,2,3
Dep. Variable:,lwage,R-squared:,0.1806
Estimator:,RandomEffects,R-squared (Between):,0.1853
No. Observations:,4360,R-squared (Within):,0.1799
Date:,"Fri, Jun 10 2022",R-squared (Overall):,0.1828
Time:,13:44:37,Log-likelihood,-1622.5
Cov. Estimator:,Unadjusted,,
,,F-statistic:,68.409
Entities:,545,P-value,0.0000
Avg Obs:,8.0000,Distribution:,"F(14,4345)"
Min Obs:,8.0000,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
C(year)[T.1980],0.0234,0.1514,0.1546,0.8771,-0.2735,0.3203
C(year)[T.1981],0.0638,0.1601,0.3988,0.6901,-0.2500,0.3777
C(year)[T.1982],0.0543,0.1690,0.3211,0.7481,-0.2770,0.3856
C(year)[T.1983],0.0436,0.1780,0.2450,0.8065,-0.3054,0.3926
C(year)[T.1984],0.0664,0.1871,0.3551,0.7225,-0.3003,0.4332
C(year)[T.1985],0.0811,0.1961,0.4136,0.6792,-0.3034,0.4656
C(year)[T.1986],0.1152,0.2052,0.5617,0.5744,-0.2870,0.5175
C(year)[T.1987],0.1583,0.2143,0.7386,0.4602,-0.2618,0.5783
black,-0.1394,0.0480,-2.9054,0.0037,-0.2334,-0.0453


In [69]:
results_RE_wage.theta.iloc[0,0] # iloc (index location) at 0,0 selects
                                # the value in row 0, column 0

0.6450593029243452
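
The value above is the quasi-demeaning weight $\theta$ used by random effects. In the textbook formulation for a balanced panel with $T$ periods, $\theta = 1 - \sqrt{\sigma_u^2 / (\sigma_u^2 + T\sigma_a^2)}$, where $\sigma_a^2$ is the variance of the individual effect and $\sigma_u^2$ the variance of the idiosyncratic error. The sketch below uses hypothetical variance components (not those estimated from `wagepan`) and two helper functions defined only for illustration:

```python
import math

def re_theta(sigma2_u, sigma2_a, n_periods):
    """Quasi-demeaning weight: 1 - sqrt(sigma_u^2 / (sigma_u^2 + T * sigma_a^2))."""
    return 1.0 - math.sqrt(sigma2_u / (sigma2_u + n_periods * sigma2_a))

def quasi_demean(values, theta):
    """Transform one entity's series: y_it - theta * (entity mean of y)."""
    entity_mean = sum(values) / len(values)
    return [v - theta * entity_mean for v in values]

# Hypothetical variance components with T = 8 periods.
theta_example = re_theta(sigma2_u=0.1, sigma2_a=0.1, n_periods=8)

# With no individual effect, theta is 0 and the data are left untouched.
theta_no_effect = re_theta(sigma2_u=0.1, sigma2_a=0.0, n_periods=8)

print(theta_example, theta_no_effect)
print(quasi_demean([1.0, 3.0], theta=1.0))  # full demeaning, as in fixed effects
print(quasi_demean([1.0, 3.0], theta=0.0))  # no demeaning, as in pooled OLS
```

A $\theta$ near 0 makes random effects behave like pooled OLS and a $\theta$ near 1 makes it behave like fixed effects, so a value around 0.65 sits between the two.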

3. Fixed Effects

In [70]:
FE_wage = plm.PanelOLS.from_formula('lwage ~ expersq + married + union + C(year) + EntityEffects',
                                            data = wagepan)
results_FE_wage = FE_wage.fit()
results_FE_wage.summary

0,1,2,3
Dep. Variable:,lwage,R-squared:,0.1806
Estimator:,PanelOLS,R-squared (Between):,-0.0052
No. Observations:,4360,R-squared (Within):,0.1806
Date:,"Fri, Jun 10 2022",R-squared (Overall):,0.0807
Time:,13:44:37,Log-likelihood,-1324.8
Cov. Estimator:,Unadjusted,,
,,F-statistic:,83.851
Entities:,545,P-value,0.0000
Avg Obs:,8.0000,Distribution:,"F(10,3805)"
Min Obs:,8.0000,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
C(year)[T.1980],1.4260,0.0183,77.748,0.0000,1.3901,1.4620
C(year)[T.1981],1.5772,0.0216,72.966,0.0000,1.5348,1.6196
C(year)[T.1982],1.6790,0.0265,63.258,0.0000,1.6270,1.7310
C(year)[T.1983],1.7805,0.0333,53.439,0.0000,1.7151,1.8458
C(year)[T.1984],1.9161,0.0417,45.982,0.0000,1.8344,1.9978
C(year)[T.1985],2.0435,0.0515,39.646,0.0000,1.9424,2.1446
C(year)[T.1986],2.1915,0.0630,34.771,0.0000,2.0679,2.3151
C(year)[T.1987],2.3510,0.0762,30.867,0.0000,2.2017,2.5004
expersq,-0.0052,0.0007,-7.3612,0.0000,-0.0066,-0.0038


# Instrumental variables estimation and Two Stage Least Squares
---

## Example 15.1: Estimating the Return to Education for Married Women

In [71]:
wdg.data('mroz')

Unnamed: 0,inlf,hours,kidslt6,kidsge6,age,educ,wage,repwage,hushrs,husage,...,faminc,mtr,motheduc,fatheduc,unem,city,exper,nwifeinc,lwage,expersq
0,1,1610,1,0,32,12,3.3540,2.65,2708,34,...,16310.0,0.7215,12,7,5.0,0,14,10.910060,1.210154,196
1,1,1656,0,2,30,12,1.3889,2.65,2310,30,...,21800.0,0.6615,7,7,11.0,1,5,19.499981,0.328512,25
2,1,1980,1,3,35,12,4.5455,4.04,3072,40,...,21040.0,0.6915,12,7,5.0,0,15,12.039910,1.514138,225
3,1,456,0,3,34,12,1.0965,3.25,1920,53,...,7300.0,0.7815,7,7,5.0,0,6,6.799996,0.092123,36
4,1,1568,1,2,31,14,4.5918,3.60,2000,32,...,27300.0,0.6215,12,14,9.5,1,7,20.100058,1.524272,49
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
748,0,0,0,2,40,13,,0.00,3020,43,...,28200.0,0.6215,10,10,9.5,1,5,28.200001,,25
749,0,0,2,3,31,12,,0.00,2056,33,...,10000.0,0.7715,12,12,7.5,0,14,10.000000,,196
750,0,0,0,0,43,12,,0.00,2383,43,...,9952.0,0.7515,10,3,7.5,0,4,9.952000,,16
751,0,0,0,0,60,12,,0.00,1705,55,...,24984.0,0.6215,12,12,14.0,1,15,24.983999,,225


In [72]:
mroz_nona = wdg.data('mroz').dropna(subset=['lwage'])

In [73]:
ols_151 = smf.ols('lwage ~ educ', data = mroz_nona).fit()
print(ols_151.summary())

                            OLS Regression Results                            
Dep. Variable:                  lwage   R-squared:                       0.118
Model:                            OLS   Adj. R-squared:                  0.116
Method:                 Least Squares   F-statistic:                     56.93
Date:                Fri, 10 Jun 2022   Prob (F-statistic):           2.76e-13
Time:                        13:44:38   Log-Likelihood:                -441.26
No. Observations:                 428   AIC:                             886.5
Df Residuals:                     426   BIC:                             894.6
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.1852      0.185     -1.000      0.3

In [74]:
import linearmodels.iv as iv
iv_151 = iv.IV2SLS.from_formula('lwage ~ 1 + [educ ~ fatheduc]', data = mroz_nona)
results_iv151 = iv_151.fit(cov_type='unadjusted', debiased=True) # cov_type='unadjusted' means we assume homoskedasticity
print(results_iv151.summary)

                          IV-2SLS Estimation Summary                          
Dep. Variable:                  lwage   R-squared:                      0.0934
Estimator:                    IV-2SLS   Adj. R-squared:                 0.0913
No. Observations:                 428   F-statistic:                    2.8354
Date:                Fri, Jun 10 2022   P-value (F-stat)                0.0929
Time:                        13:44:38   Distribution:                 F(1,426)
Cov. Estimator:            unadjusted                                         
                                                                              
                             Parameter Estimates                              
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
Intercept      0.4411     0.4461     0.9888     0.3233     -0.4357      1.3179
educ           0.0592     0.0351     1.6839     0.09
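
With one endogenous regressor and one instrument, the IV slope reduces to a ratio of covariances, $\hat\beta_{IV} = \widehat{Cov}(z, y) / \widehat{Cov}(z, x)$, and two-stage least squares gives the same number. The sketch below checks this on a tiny hypothetical data set built so that the error is correlated with `x` but not with the instrument `z`; the data and the helper `cov_sum` are assumptions made for illustration only.

```python
def cov_sum(a, b):
    """Sum of cross-products of deviations from the means (unscaled covariance)."""
    a_bar = sum(a) / len(a)
    b_bar = sum(b) / len(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))

# Hypothetical data: true slope is 0.5.
z = [-1.0, -1.0, 1.0, 1.0]            # instrument
v = [-1.0, 1.0, -1.0, 1.0]            # unobserved factor, orthogonal to z
x = [zi + vi for zi, vi in zip(z, v)]  # endogenous regressor
u = v                                  # error correlated with x, not with z
y = [1.0 + 0.5 * xi + ui for xi, ui in zip(x, u)]

# OLS is biased upward because x and u move together.
ols_slope = cov_sum(x, y) / cov_sum(x, x)

# IV estimator as a ratio of covariances.
iv_slope = cov_sum(z, y) / cov_sum(z, x)

# Two-stage version: fit x on z, then regress y on the fitted values.
pi_hat = cov_sum(z, x) / cov_sum(z, z)
x_bar = sum(x) / len(x)
z_bar = sum(z) / len(z)
x_hat = [x_bar + pi_hat * (zi - z_bar) for zi in z]
tsls_slope = cov_sum(x_hat, y) / cov_sum(x_hat, x_hat)

print(ols_slope, iv_slope, tsls_slope)
```

The denominator $\widehat{Cov}(z, x)$ is why a weak instrument is a problem: when it is close to zero, the ratio becomes unstable and the standard errors grow, as the wider confidence interval in the IV output suggests compared with OLS.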

## Example 15.2: Estimating the Return to Education for Men

In [75]:
wdg.data('wage2')

Unnamed: 0,wage,hours,IQ,KWW,educ,exper,tenure,age,married,black,south,urban,sibs,brthord,meduc,feduc,lwage
0,769,40,93,35,12,11,2,31,1,0,0,1,1,2.0,8.0,8.0,6.645091
1,808,50,119,41,18,11,16,37,1,0,0,1,1,,14.0,14.0,6.694562
2,825,40,108,46,14,11,9,33,1,0,0,1,1,2.0,14.0,14.0,6.715384
3,650,40,96,32,12,13,7,32,1,0,0,1,4,3.0,12.0,12.0,6.476973
4,562,40,74,27,11,14,5,34,1,0,0,1,10,6.0,6.0,11.0,6.331502
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
930,520,40,79,28,16,6,1,30,1,1,1,0,0,1.0,11.0,,6.253829
931,1202,40,102,32,13,10,3,31,1,0,1,1,7,7.0,8.0,6.0,7.091742
932,538,45,77,22,12,12,10,28,1,1,1,0,9,,7.0,,6.287858
933,873,44,109,25,12,12,12,28,1,0,1,0,1,1.0,,11.0,6.771935


In [76]:
wage2_nona = wdg.data('wage2').dropna(subset=['lwage'])

In [77]:
iv_152 = iv.IV2SLS.from_formula('lwage ~ 1 + [educ ~ sibs]', data = wage2_nona)
results_iv152 = iv_152.fit(cov_type='unadjusted', debiased=True) # cov_type='unadjusted' means we assume homoskedasticity
print(results_iv152.summary)

                          IV-2SLS Estimation Summary                          
Dep. Variable:                  lwage   R-squared:                     -0.0092
Estimator:                    IV-2SLS   Adj. R-squared:                -0.0103
No. Observations:                 935   F-statistic:                    21.588
Date:                Fri, Jun 10 2022   P-value (F-stat)                0.0000
Time:                        13:44:38   Distribution:                 F(1,933)
Cov. Estimator:            unadjusted                                         
                                                                              
                             Parameter Estimates                              
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
Intercept      5.1300     0.3552     14.444     0.0000      4.4330      5.8271
educ           0.1224     0.0264     4.6463     0.00

## Example 15.4: Using College Proximity as an IV for Education

In [78]:
card = wdg.data('card')
card.head()

Unnamed: 0,id,nearc2,nearc4,educ,age,fatheduc,motheduc,weight,momdad14,sinmom14,...,smsa66,wage,enroll,KWW,IQ,married,libcrd14,exper,lwage,expersq
0,2,0,0,7,29,,,158413.0,1,0,...,1,548,0,15.0,,1.0,0.0,16,6.306275,256
1,3,0,0,12,27,8.0,8.0,380166.0,1,0,...,1,481,0,35.0,93.0,1.0,1.0,9,6.175867,81
2,4,0,0,12,34,14.0,12.0,367470.0,1,0,...,1,721,0,42.0,103.0,1.0,1.0,16,6.580639,256
3,5,1,1,11,27,11.0,12.0,380166.0,1,0,...,1,250,0,25.0,88.0,1.0,1.0,10,5.521461,100
4,6,1,1,12,34,8.0,7.0,367470.0,1,0,...,1,729,0,34.0,108.0,1.0,0.0,16,6.591674,256


In [79]:
reg_reduc154 = smf.ols(
    formula='educ ~ nearc4 + exper + I(exper**2) + black + smsa +'
    'south + smsa66 + reg662 + reg663 + reg664 + reg665 + reg666 +'
    'reg667 + reg668 + reg669', data=card)
results_redf = reg_reduc154.fit()
print(results_redf.summary())

                            OLS Regression Results                            
Dep. Variable:                   educ   R-squared:                       0.477
Model:                            OLS   Adj. R-squared:                  0.474
Method:                 Least Squares   F-statistic:                     182.1
Date:                Fri, 10 Jun 2022   Prob (F-statistic):               0.00
Time:                        13:44:39   Log-Likelihood:                -6258.5
No. Observations:                3010   AIC:                         1.255e+04
Df Residuals:                    2994   BIC:                         1.265e+04
Df Model:                          15                                         
Covariance Type:            nonrobust                                         
                    coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------
Intercept        16.6383      0.241     69.145

In [84]:
iv_154 = iv.IV2SLS.from_formula(
    formula='np.log(wage)~ 1 + exper + I(exper**2) + black + smsa + '
            'south + smsa66 + reg662 + reg663 + reg664 + reg665 +'
            'reg666 + reg667 + reg668 + reg669 + [educ ~ nearc4]',
    data=card)
results_iv154 = iv_154.fit(cov_type='unadjusted', debiased=True)
results_iv154.summary

Dep. Variable:           np.log(wage)   R-squared:                      0.2382
Estimator:                    IV-2SLS   Adj. R-squared:                 0.2343
No. Observations:                3010   F-statistic:                    51.008
Date:                Fri, Jun 10 2022   P-value (F-stat)                0.0000
Time:                        13:54:31   Distribution:               F(15,2994)
Cov. Estimator:            unadjusted

             Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
-------------------------------------------------------------------------------
Intercept       3.6662     0.9248     3.9641     0.0001      1.8528      5.4795
I(exper**2)    -0.0023     0.0003    -7.0014     0.0000     -0.0030     -0.0017
black          -0.1468     0.0539    -2.7231     0.0065     -0.2525     -0.0411
exper           0.1083     0.0237     4.5764     0.0000      0.0619      0.1547
reg662          0.1008     0.0377     2.6739     0.0075      0.0269      0.1747
reg663          0.1483     0.0368     4.0272     0.0001      0.0761      0.2204
reg664          0.0499     0.0437     1.1408     0.2541     -0.0359      0.1357
reg665          0.1463     0.0471     3.1079     0.0019      0.0540      0.2386
reg666          0.1629     0.0519     3.1382     0.0017      0.0611      0.2647


## Example 15.5: Return to Education for Working Women

In [85]:
mroz_nona

Unnamed: 0,inlf,hours,kidslt6,kidsge6,age,educ,wage,repwage,hushrs,husage,...,faminc,mtr,motheduc,fatheduc,unem,city,exper,nwifeinc,lwage,expersq
0,1,1610,1,0,32,12,3.3540,2.65,2708,34,...,16310.0,0.7215,12,7,5.0,0,14,10.910060,1.210154,196
1,1,1656,0,2,30,12,1.3889,2.65,2310,30,...,21800.0,0.6615,7,7,11.0,1,5,19.499981,0.328512,25
2,1,1980,1,3,35,12,4.5455,4.04,3072,40,...,21040.0,0.6915,12,7,5.0,0,15,12.039910,1.514138,225
3,1,456,0,3,34,12,1.0965,3.25,1920,53,...,7300.0,0.7815,7,7,5.0,0,6,6.799996,0.092123,36
4,1,1568,1,2,31,14,4.5918,3.60,2000,32,...,27300.0,0.6215,12,14,9.5,1,7,20.100058,1.524272,49
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
423,1,680,0,5,36,10,2.3118,0.00,3430,43,...,19772.0,0.7215,7,7,7.5,0,2,18.199976,0.838026,4
424,1,2450,0,1,40,12,5.3061,6.50,2008,40,...,35641.0,0.6215,7,7,5.0,1,21,22.641056,1.668857,441
425,1,2144,0,2,43,13,5.8675,0.00,2140,43,...,34220.0,0.5815,7,7,7.5,1,22,21.640079,1.769429,484
426,1,1760,0,1,33,12,3.4091,3.21,3380,34,...,30000.0,0.5815,12,16,11.0,1,14,23.999985,1.226448,196


In [122]:
reteduc_155 = iv.IV2SLS.from_formula(
    formula='lwage ~ 1 + exper + I(exper**2) + [educ ~ motheduc + fatheduc]',
    data = mroz_nona
).fit(cov_type='unadjusted', debiased=True)

print(reteduc_155.summary)

                          IV-2SLS Estimation Summary                          
Dep. Variable:                  lwage   R-squared:                      0.1357
Estimator:                    IV-2SLS   Adj. R-squared:                 0.1296
No. Observations:                 428   F-statistic:                    8.1407
Date:                Fri, Jun 10 2022   P-value (F-stat)                0.0000
Time:                        15:56:39   Distribution:                 F(3,424)
Cov. Estimator:            unadjusted                                         
                                                                              
                              Parameter Estimates                              
             Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
-------------------------------------------------------------------------------
Intercept       0.0481     0.4003     0.1202     0.9044     -0.7388      0.8350
I(exper**2)    -0.0009     0.0004    -2.2380    

In [108]:
print(reteduc_155.first_stage)

    First Stage Estimation Results    
                                  educ
--------------------------------------
R-squared                       0.2115
Partial R-squared               0.2076
Shea's R-squared                0.2076
Partial F-statistic             55.400
P-value (Partial F-stat)      1.11e-16
Partial F-stat Distn          F(2,423)
Intercept                       9.1026
                              (21.340)
I(exper**2)                    -0.0010
                             (-0.8386)
exper                           0.0452
                              (1.1236)
fatheduc                        0.1895
                              (5.6152)
motheduc                        0.1576
                              (4.3906)
--------------------------------------

T-stats reported in parentheses
T-stats use same covariance type as original model
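
The first-stage output above is one half of what the name "two-stage least squares" refers to. As a minimal sketch on synthetic data (not the MROZ sample; the variable names and the data-generating process are assumptions made for illustration), the block below runs both stages by hand with a single instrument and checks that the second-stage slope matches the direct IV formula `cov(z, y) / cov(z, x)`.

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Sample covariance (divisor n; the divisor cancels in every ratio below)."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)


def simple_ols(y, x):
    """Return (intercept, slope) from regressing y on a constant and x."""
    slope = cov(x, y) / cov(x, x)
    return mean(y) - slope * mean(x), slope


# Synthetic data (hypothetical): u enters both x and y, so x is endogenous;
# z shifts x but is generated independently of u.
rng = random.Random(0)
n = 500
z = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]
x = [0.8 * zi + 0.5 * ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [1.0 + 0.1 * xi + ui for xi, ui in zip(x, u)]

# Stage 1: regress the endogenous regressor on the instrument.
a_hat, b_hat = simple_ols(x, z)
x_hat = [a_hat + b_hat * zi for zi in z]

# Stage 2: regress y on the first-stage fitted values.
_, beta_2sls = simple_ols(y, x_hat)

# Direct IV formula for the just-identified case, and naive OLS for comparison.
beta_iv = cov(z, y) / cov(z, x)
_, beta_ols = simple_ols(y, x)

print(f"OLS: {beta_ols:.4f}  2SLS by hand: {beta_2sls:.4f}  IV formula: {beta_iv:.4f}")
```

Only the point estimate carries over from the manual route. The standard errors of a hand-run second stage are not valid, which is one reason to rely on `IV2SLS` for inference.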


## Example 15.7: Return to Education for Working Women

In [119]:
# first stage: reduced form for educ on all exogenous variables and both instruments
fstg_reg = smf.ols('educ ~ exper + I(exper**2) + fatheduc + motheduc', data = mroz_nona).fit()
print(fstg_reg.summary())

                            OLS Regression Results                            
Dep. Variable:                   educ   R-squared:                       0.211
Model:                            OLS   Adj. R-squared:                  0.204
Method:                 Least Squares   F-statistic:                     28.36
Date:                Fri, 10 Jun 2022   Prob (F-statistic):           6.87e-21
Time:                        15:52:38   Log-Likelihood:                -909.72
No. Observations:                 428   AIC:                             1829.
Df Residuals:                     423   BIC:                             1850.
Df Model:                           4                                         
Covariance Type:            nonrobust                                         
                    coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------
Intercept         9.1026      0.427     21.340

In [120]:
mroz_nona['resid'] = fstg_reg.resid

In [121]:
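# regression-based endogeneity test: add the first-stage residuals to the structural equation;
# a statistically significant coefficient on resid is evidence that educ is endogenous
secstg_reg = smf.ols('lwage ~ resid + educ + exper + I(exper**2)', data = mroz_nona).fit()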
print(secstg_reg.summary())

                            OLS Regression Results                            
Dep. Variable:                  lwage   R-squared:                       0.162
Model:                            OLS   Adj. R-squared:                  0.154
Method:                 Least Squares   F-statistic:                     20.50
Date:                Fri, 10 Jun 2022   Prob (F-statistic):           1.89e-15
Time:                        15:52:41   Log-Likelihood:                -430.19
No. Observations:                 428   AIC:                             870.4
Df Residuals:                     423   BIC:                             890.7
Df Model:                           4                                         
Covariance Type:            nonrobust                                         
                    coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------
Intercept         0.0481      0.395      0.122
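
The intercept of this augmented regression (0.0481) coincides with the 2SLS intercept from Example 15.5, which is no accident: adding the first-stage residuals to the structural equation reproduces the 2SLS point estimates. The sketch below illustrates this on synthetic data with one instrument. The small `ols` helper, the variable names, and the data-generating process are all assumptions made for the illustration, not part of the notebook.

```python
import random


def ols(y, columns):
    """OLS coefficients (constant first) via the normal equations,
    solved with Gauss-Jordan elimination and partial pivoting."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)]
        + [sum(X[i][r] * y[i] for i in range(n))]
        for r in range(k)
    ]
    for p in range(k):
        pivot_row = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[pivot_row] = A[pivot_row], A[p]
        for r in range(k):
            if r != p:
                factor = A[r][p] / A[p][p]
                A[r] = [a - factor * b for a, b in zip(A[r], A[p])]
    return [A[r][k] / A[r][r] for r in range(k)]


def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)


# Synthetic data (hypothetical): x is endogenous because u enters both x and y.
rng = random.Random(1)
n = 400
z = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]
x = [0.8 * zi + 0.5 * ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [1.0 + 0.1 * xi + ui for xi, ui in zip(x, u)]

# First stage and its residuals.
a_hat, b_hat = ols(x, [z])
v_hat = [xi - a_hat - b_hat * zi for xi, zi in zip(x, z)]

# Augmented regression: y on x and the first-stage residuals.
const, beta_x, delta_v = ols(y, [x, v_hat])

# IV estimate for comparison.
beta_iv = cov(z, y) / cov(z, x)

print(f"coef on x: {beta_x:.4f}  IV: {beta_iv:.4f}  coef on residual: {delta_v:.4f}")
```

The coefficient on the residual (`delta_v` here, `resid` in the notebook) is the one whose t statistic serves as the endogeneity test.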

## Example 15.8: Return to Education for Working Women

In [123]:
reteduc_155.summary # main regression

Dep. Variable:                  lwage   R-squared:                      0.1357
Estimator:                    IV-2SLS   Adj. R-squared:                 0.1296
No. Observations:                 428   F-statistic:                    8.1407
Date:                Fri, Jun 10 2022   P-value (F-stat)                0.0000
Time:                        15:56:39   Distribution:                 F(3,424)
Cov. Estimator:            unadjusted

             Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
-------------------------------------------------------------------------------
Intercept       0.0481     0.4003     0.1202     0.9044     -0.7388      0.8350
I(exper**2)    -0.0009     0.0004    -2.2380     0.0257     -0.0017     -0.0001
exper           0.0442     0.0134     3.2883     0.0011      0.0178      0.0706
educ            0.0614     0.0314     1.9530     0.0515     -0.0004      0.1232


In [125]:
mroz_nona['resid_iv'] = reteduc_155.resids

# auxiliary regression of the 2SLS residuals on all exogenous variables, used to check instrument exogeneity
reteduc_aux = smf.ols('resid_iv ~ exper + I(exper**2) + fatheduc + motheduc', data = mroz_nona).fit()
print(reteduc_aux.summary())

                            OLS Regression Results                            
Dep. Variable:               resid_iv   R-squared:                       0.001
Model:                            OLS   Adj. R-squared:                 -0.009
Method:                 Least Squares   F-statistic:                   0.09350
Date:                Fri, 10 Jun 2022   Prob (F-statistic):              0.984
Time:                        16:01:14   Log-Likelihood:                -436.70
No. Observations:                 428   AIC:                             883.4
Df Residuals:                     423   BIC:                             903.7
Df Model:                           4                                         
Covariance Type:            nonrobust                                         
                    coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------
Intercept         0.0110      0.141      0.078

In [128]:
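# overidentification test: under H0, n * R^2 from the auxiliary regression is chi2(q),
# where q = (number of instruments) - (number of endogenous regressors) = 2 - 1 = 1
import scipy.stats as stats

r2 = reteduc_aux.rsquared
n = reteduc_aux.nobs
q = 1
teststat = n * r2
pval = 1 - stats.chi2.cdf(teststat, q)
pval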

0.5386372330714363

In [131]:
# Alternatively, the Sargan test computes the same statistic automatically

print(reteduc_155.sargan) # p-value = 0.54: we fail to reject H0 that the overidentifying restrictions are valid,
                          # i.e. that the instruments are exogenous (uncorrelated with the structural error)

                          ### Note: the Sargan test is only available with more instruments than endogenous variables

Sargan's test of overidentification
H0: The model is not overidentified.
Statistic: 0.3781
P-value: 0.5386
Distributed: chi2(1)
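
As a cross-check that needs no SciPy, the p-value can be recovered from the reported statistic alone. With one degree of freedom a chi-squared variable is the square of a standard normal, so its upper-tail probability is `erfc(sqrt(x / 2))`. The helper names below are hypothetical, and the shortcut applies only to the one-degree-of-freedom case of this example.

```python
import math


def chi2_sf_df1(statistic):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


def overid_test_df1(n_obs, r_squared):
    """n * R^2 statistic from the auxiliary regression and its chi2(1) p-value."""
    statistic = n_obs * r_squared
    return statistic, chi2_sf_df1(statistic)


# Statistic taken from the Sargan output above (rounded to four decimals).
p_value = chi2_sf_df1(0.3781)
print(f"p-value: {p_value:.4f}")
```

Up to the rounding of the statistic, this should agree with the p-value of 0.5386 reported by both the manual calculation and `reteduc_155.sargan`.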


## Example 15.10: Job Training and Worker Productivity

In [136]:
jtrain = wdg.data('jtrain')
jtrain.head()

Unnamed: 0,year,fcode,employ,sales,avgsal,scrap,rework,tothrs,union,grant,...,grant_1,clscrap,cgrant,clemploy,clsales,lavgsal,clavgsal,cgrant_1,chrsemp,clhrsemp
0,1987,410032.0,100.0,47000000.0,35000.0,,,12.0,0,0,...,0,,0,,,10.463103,,,,
1,1988,410032.0,131.0,43000000.0,37000.0,,,8.0,0,0,...,0,,0,0.270027,-0.088949,10.518673,0.05557,0.0,-8.946565,-1.165385
2,1989,410032.0,123.0,49000000.0,39000.0,,,8.0,0,0,...,0,,0,-0.063013,0.130621,10.571317,0.052644,0.0,0.198597,0.047832
3,1987,410440.0,12.0,1560000.0,10500.0,,,12.0,0,0,...,0,,0,,,9.25913,,,,
4,1988,410440.0,13.0,1970000.0,11000.0,,,12.0,0,0,...,0,,0,0.080043,0.233347,9.305651,0.04652,0.0,0.0,0.0


In [147]:
# define panel data (for 1987 and 1988 only):
jtrain_87_88 = jtrain.loc[(jtrain['year'] == 1987) | (jtrain['year'] == 1988), :]
jtrain_87_88 = jtrain_87_88.set_index(['fcode', 'year'])

# manual computation of first differences within each firm:
jtrain_87_88['lscrap_diff1'] =  jtrain_87_88.sort_values(['fcode', 'year']).groupby('fcode')['lscrap'].diff()
jtrain_87_88['hrsemp_diff1'] = jtrain_87_88.sort_values(['fcode', 'year']).groupby('fcode')['hrsemp'].diff()
jtrain_87_88['grant_diff1'] = jtrain_87_88.sort_values(['fcode', 'year']).groupby('fcode')['grant'].diff()

# drop rows with missing differences (the first year of each firm and firms with missing data), which IV2SLS cannot handle
jtrain_87_88.dropna(subset=['lscrap_diff1', 'hrsemp_diff1', 'grant_diff1'], inplace=True)

In [145]:
iv_jtrain = iv.IV2SLS.from_formula('lscrap_diff1 ~ 1 + [hrsemp_diff1 ~ grant_diff1]', data = jtrain_87_88).fit(cov_type='unadjusted', debiased=True)
print(iv_jtrain.summary)

                          IV-2SLS Estimation Summary                          
Dep. Variable:           lscrap_diff1   R-squared:                      0.0159
Estimator:                    IV-2SLS   Adj. R-squared:                -0.0070
No. Observations:                  45   F-statistic:                    3.1977
Date:                Fri, Jun 10 2022   P-value (F-stat)                0.0808
Time:                        16:46:31   Distribution:                  F(1,43)
Cov. Estimator:            unadjusted                                         
                                                                              
                              Parameter Estimates                               
              Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
--------------------------------------------------------------------------------
Intercept       -0.0327     0.1270    -0.2573     0.7982     -0.2887      0.2234
hrsemp_diff1    -0.0142     0.0079    -1.788

In [146]:
print(iv_jtrain.first_stage)

     First Stage Estimation Results    
                           hrsemp_diff1
---------------------------------------
R-squared                        0.3408
Partial R-squared                0.3408
Shea's R-squared                 0.3408
Partial F-statistic              22.232
P-value (Partial F-stat)      2.559e-05
Partial F-stat Distn            F(1,43)
Intercept                        1.5806
                               (0.4962)
grant_diff1                      24.437
                               (4.7151)
---------------------------------------

T-stats reported in parentheses
T-stats use same covariance type as original model
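
When the single instrument is a 0/1 indicator, the IV slope reduces to a ratio of two differences in group means (the Wald estimator). Assuming `grant_diff1` only takes the values 0 and 1 in this two-year sample, that is what the 2SLS estimate above amounts to. The sketch below checks the identity on synthetic data; the variable names and numbers are hypothetical and unrelated to JTRAIN.

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)


# Synthetic data (hypothetical): z is a binary instrument, x the endogenous
# regressor, y the outcome; u enters both x and y.
rng = random.Random(2)
n = 200
z = [1 if rng.random() < 0.4 else 0 for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]
x = [20.0 * zi + 5.0 * ui + rng.gauss(0, 10) for zi, ui in zip(z, u)]
y = [-0.01 * xi + 0.3 * ui + rng.gauss(0, 0.5) for xi, ui in zip(x, u)]

# IV slope from the covariance formula.
beta_iv = cov(z, y) / cov(z, x)

# Wald estimator: difference in mean outcomes over difference in mean treatment.
y1 = [yi for yi, zi in zip(y, z) if zi == 1]
y0 = [yi for yi, zi in zip(y, z) if zi == 0]
x1 = [xi for xi, zi in zip(x, z) if zi == 1]
x0 = [xi for xi, zi in zip(x, z) if zi == 0]
beta_wald = (mean(y1) - mean(y0)) / (mean(x1) - mean(x0))

print(f"IV: {beta_iv:.6f}  Wald: {beta_wald:.6f}")
```

The denominator of the Wald ratio is the first-stage slope, which is why a strong first stage (a large gap in `x` between the two groups) matters for the precision of the estimate.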
