# OLS - Wooldridge Computer Exercise
## Chapter 6, Exercise 3

## To add a heading:
- Insert a new cell
- Type or paste in the content
- Place a single pound sign (#) in front of the heading text
- Set the cell type to "Markdown"
- Press "Shift" + "Enter" together to render the cell

## To add a sub-heading:
- Insert a new cell
- Type or paste in the content
- Place two pound signs (##) in front of the sub-heading text
- Set the cell type to "Markdown"
- Press "Shift" + "Enter" together to render the cell

## To add new bulleted documentation:

- Insert a new cell
- Type or paste in the content
- Place a dash (-) in front of each bulleted item
- Set the cell type to "Markdown"
- Press "Shift" + "Enter" together to render the cell

# References
- Wooldridge, J.M. (2016). Introductory econometrics: A modern approach (6th ed.). Mason, OH: South-Western, Cengage Learning.
- Residual Plots: https://medium.com/@emredjan/emulating-r-regression-plots-in-python-43741952c034
- Understanding residual plots: https://data.library.virginia.edu/diagnostic-plots/
- VIF: https://etav.github.io/python/vif_factor_python.html
- VIF: https://en.wikipedia.org/wiki/Variance_inflation_factor

# Import libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

import statsmodels
import statsmodels.api as sm
import statsmodels.stats.api as sms

from statsmodels.formula.api import ols
from statsmodels.compat import lzip
from patsy import dmatrices
from statsmodels.stats.outliers_influence import variance_inflation_factor

from statsmodels.graphics.gofplots import ProbPlot

#import pandas.tseries.api as sm
#from tseries.formula.apt import ols

from scipy.stats import ttest_ind, ttest_ind_from_stats
from scipy.special import stdtr


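# The 'seaborn' style was renamed 'seaborn-v0_8' in matplotlib 3.6;
# fall back to the old name on older versions.
style = 'seaborn-v0_8' if 'seaborn-v0_8' in plt.style.available else 'seaborn'
plt.style.use(style) # pretty matplotlib plots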

plt.rc('font', size=14)
plt.rc('figure', titlesize=18)
plt.rc('axes', labelsize=15)
plt.rc('axes', titlesize=18)


# Read data from Stata file

In [2]:
%%time
#df = pd.read_csv(BytesIO(csv_as_bytes),sep='|',nrows=100000)
df1 = pd.read_stata('C://Users//Family//Documents//DataSetEconomics//Wooldridge//WAGE2.dta')
print(df1.head())

   wage  hours   IQ  KWW  educ  exper  tenure  age  married  black  south  \
0   769     40   93   35    12     11       2   31        1      0      0   
1   808     50  119   41    18     11      16   37        1      0      0   
2   825     40  108   46    14     11       9   33        1      0      0   
3   650     40   96   32    12     13       7   32        1      0      0   
4   562     40   74   27    11     14       5   34        1      0      0   

   urban  sibs  brthord  meduc  feduc     lwage  
0      1     1      2.0    8.0    8.0  6.645091  
1      1     1      NaN   14.0   14.0  6.694562  
2      1     1      2.0   14.0   14.0  6.715384  
3      1     4      3.0   12.0   12.0  6.476973  
4      1    10      6.0    6.0   11.0  6.331502  
Wall time: 551 ms


In [3]:
df1['constant'] = 1

# Data Checks
- Columns

In [4]:
%%time
df1.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 935 entries, 0 to 934
Data columns (total 18 columns):
wage        935 non-null int16
hours       935 non-null int8
IQ          935 non-null int16
KWW         935 non-null int8
educ        935 non-null int8
exper       935 non-null int8
tenure      935 non-null int8
age         935 non-null int8
married     935 non-null int8
black       935 non-null int8
south       935 non-null int8
urban       935 non-null int8
sibs        935 non-null int8
brthord     852 non-null float64
meduc       857 non-null float64
feduc       741 non-null float64
lwage       935 non-null float32
constant    935 non-null int64
dtypes: float32(1), float64(3), int16(2), int64(1), int8(11)
memory usage: 53.9 KB
Wall time: 11 ms
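
The `info()` output shows that `brthord`, `meduc`, and `feduc` have fewer than 935 non-null values. One way to tabulate this directly is sketched below. The small `demo` frame is made-up illustrative data, not WAGE2; in this notebook the same calls would be applied to `df1`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df1 with a few missing values (not real WAGE2 rows).
demo = pd.DataFrame({
    "educ": [12, 18, 14, 12, 11],
    "meduc": [8.0, 14.0, np.nan, 12.0, 6.0],
    "feduc": [8.0, np.nan, np.nan, 12.0, 11.0],
})

missing_count = demo.isna().sum()    # number of missing values per column
missing_share = demo.isna().mean()   # fraction of rows missing per column
complete_rows = len(demo.dropna())   # rows with no missing value in any column

print(pd.DataFrame({"n_missing": missing_count, "share": missing_share}))
print("complete rows:", complete_rows)
```

The formula interface drops rows with missing values only in the variables a model actually uses. Since `lwage`, `educ`, and `exper` are complete here, the regression below uses all 935 observations.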


# OLS: Regress lwage on educ, exper, and the interaction educ*exper

In [4]:

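# educ*exper already expands to educ + exper + educ:exper,
# so the interaction term is written explicitly as educ:exper.
formula = 'lwage ~ educ + exper + educ:exper'
# For heteroskedasticity-robust standard errors, fit with:
# results = ols(formula, df1).fit(cov_type='HC0')
model = ols(formula, df1)
results = model.fit()
aov_table = sm.stats.anova_lm(results, typ=2)  # Type II ANOVA table
print(aov_table)
print(results.summary())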

                sum_sq     df           F        PR(>F)
educ         21.607355    1.0  140.376716  2.908162e-30
exper         5.539957    1.0   35.991493  2.834572e-09
educ:exper    0.675313    1.0    4.387312  3.647715e-02
Residual    143.303309  931.0         NaN           NaN
                            OLS Regression Results                            
Dep. Variable:                  lwage   R-squared:                       0.135
Model:                            OLS   Adj. R-squared:                  0.132
Method:                 Least Squares   F-statistic:                     48.41
Date:                Thu, 06 Dec 2018   Prob (F-statistic):           4.47e-29
Time:                        19:36:58   Log-Likelihood:                -449.87
No. Observations:                 935   AIC:                             907.7
Df Residuals:                     931   BIC:                             927.1
Df Model:                           3                                         
Covarianc
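
The parameter names in the ANOVA table (`educ`, `exper`, `educ:exper`) follow from how the formula language expands `*`. The sketch below checks that expansion with `patsy.dmatrix` on a tiny made-up frame (`demo` is hypothetical data, not WAGE2).

```python
import numpy as np
import pandas as pd
from patsy import dmatrix

# Hypothetical toy data, only used to inspect the design-matrix columns.
demo = pd.DataFrame({"educ": [12, 16, 14, 11], "exper": [11, 5, 9, 14]})

# Three spellings of the same model: main effects plus interaction.
design_star = dmatrix("educ + exper + (educ*exper)", demo)
cols_star = design_star.design_info.column_names
cols_colon = dmatrix("educ + exper + educ:exper", demo).design_info.column_names
cols_short = dmatrix("educ*exper", demo).design_info.column_names

print(cols_star)

# The interaction column is the elementwise product of the two regressors.
interaction = np.asarray(design_star)[:, cols_star.index("educ:exper")]
print(interaction)
```

All three formulas produce the same four columns, so duplicate main-effect terms are harmless, though the `educ:exper` spelling makes the intended model easier to read.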

In [5]:
mean_educ = np.mean(df1.educ)
mean_exper = np.mean(df1.exper)
print(results.params)
coef_educ = results.params["educ"]
coef_exper = results.params["exper"]
coef_educ_exper = results.params["educ:exper"]
# The partial effect of educ depends on exper, so evaluate it at mean exper
effect_educ = coef_educ+(coef_educ_exper*mean_exper)
# The partial effect of exper depends on educ, so evaluate it at mean educ
effect_exper = coef_exper+(coef_educ_exper*mean_educ)
print(mean_educ)
print(mean_exper)
print(coef_educ)
print(coef_exper)
print(coef_educ_exper)
print(round(effect_educ, 6))
print(round(effect_exper, 6))

Intercept     5.949455
educ          0.044050
exper        -0.021496
educ:exper    0.003203
dtype: float64
13.468449197860963
11.563636363636364
0.044049796767684085
-0.021495925819032535
0.003202973747224915
0.081088
0.021643
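
In a model with an `educ:exper` interaction, the partial effect of `educ` on `lwage` is the `educ` coefficient plus the interaction coefficient times `exper`, so it has to be evaluated at a chosen value of `exper`. A point estimate alone carries no standard error; the sketch below shows two equivalent ways to get one. It runs on simulated data (the `demo` frame, its coefficients, and the seed are all assumptions for illustration, not WAGE2), and evaluating at the sample mean of `exper` is just one possible choice.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Simulated stand-in for WAGE2; coefficients and noise level are made up.
rng = np.random.default_rng(0)
n = 500
demo = pd.DataFrame({
    "educ": rng.integers(9, 19, size=n),
    "exper": rng.integers(1, 24, size=n),
})
demo["lwage"] = (5.9 + 0.04 * demo["educ"] - 0.02 * demo["exper"]
                 + 0.003 * demo["educ"] * demo["exper"]
                 + rng.normal(0.0, 0.4, size=n))

fit = ols("lwage ~ educ + exper + educ:exper", data=demo).fit()
exper_bar = demo["exper"].mean()

# Point estimate: b_educ + b_interaction * mean(exper)
effect_educ = fit.params["educ"] + fit.params["educ:exper"] * exper_bar

# Route 1: the same quantity as a linear contrast, which yields a standard error.
weights = pd.Series(0.0, index=fit.params.index)
weights["educ"] = 1.0
weights["educ:exper"] = exper_bar
contrast = fit.t_test(weights.to_numpy())
effect_from_contrast = float(np.squeeze(contrast.effect))
effect_se = float(np.squeeze(contrast.sd))

# Route 2: centre exper inside the interaction, so that the educ coefficient
# itself becomes the partial effect at mean exper.
demo["exper_c"] = demo["exper"] - exper_bar
fit_c = ols("lwage ~ educ + exper + educ:exper_c", data=demo).fit()

print(f"effect of educ at mean exper: {effect_educ:.6f}")
print(f"standard error (contrast):    {effect_se:.6f}")
print(f"standard error (centred fit): {fit_c.bse['educ']:.6f}")
```

Both routes give the same estimate and standard error because the centred model spans the same regressors. The centred version is convenient because the quantity of interest then appears directly in the regression summary.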
