# OLS - Wooldridge Computer Exercise
## Chapter 2, Exercise 9

## To add a heading:
- Insert a new cell
- Type or paste in the content
- Place a single pound sign (`#`) in front of the heading text
- Set the cell type to "Markdown"
- Press Shift+Enter to render the cell

## To add a sub-heading:
- Insert a new cell
- Type or paste in the content
- Place two pound signs (`##`) in front of the sub-heading text
- Set the cell type to "Markdown"
- Press Shift+Enter to render the cell

## To add new bulleted documentation:

- Insert a new cell
- Type or paste in the content
- Place a dash (`-`) in front of each bulleted item
- Set the cell type to "Markdown"
- Press Shift+Enter to render the cell

# References
- Wooldridge, J. M. (2016). Introductory econometrics: A modern approach (6th ed.). Mason, OH: South-Western, Cengage Learning.
- Residual Plots: https://medium.com/@emredjan/emulating-r-regression-plots-in-python-43741952c034
- Understanding residual plots: https://data.library.virginia.edu/diagnostic-plots/

# Import libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

import statsmodels
import statsmodels.api as sm
import statsmodels.stats.api as sms

from statsmodels.formula.api import ols
from statsmodels.compat import lzip

from statsmodels.graphics.gofplots import ProbPlot

from scipy.stats import ttest_ind, ttest_ind_from_stats
from scipy.special import stdtr


plt.style.use('seaborn')  # pretty matplotlib plots; on matplotlib >= 3.6 the style is named 'seaborn-v0_8'

plt.rc('font', size=14)
plt.rc('figure', titlesize=18)
plt.rc('axes', labelsize=15)
plt.rc('axes', titlesize=18)

# LaTeX display support for notebook output
from IPython.display import Latex


# Read data from a Stata file

In [2]:
%%time
# Load the catholic.dta data set from a local Stata file (adjust the path for your machine)
df1 = pd.read_stata('C:/Users/Family/Documents/DataSetEconomics/Wooldridge/catholic.dta')
print(df1.head())

         id     read12     math12  female  asian  hispan  black  motheduc  \
0  124902.0  61.410000  49.770000       0      0       0      0      14.0   
1  124915.0  58.340000  59.840000       0      0       0      0      14.0   
2  124916.0  59.330002  50.380001       1      0       0      0      14.0   
3  124932.0  49.590000  45.029999       1      0       0      0      12.0   
4  124944.0  57.619999  54.259998       1      0       0      0      12.0   

   fatheduc    lfaminc  hsgrad  cathhs  parcath  
0      12.0  10.308952     1.0       0        1  
1      14.0  10.308952     1.0       0        1  
2      11.0  10.308952     1.0       0        1  
3      14.0  10.308952     1.0       0        1  
4      12.0  10.657259     1.0       0        1  
Wall time: 110 ms
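
The hard-coded path above ties the cell to one machine. As a minimal sketch of a more portable pattern, the block below builds the file path with `pathlib` and round-trips a tiny data set through `to_stata` and `read_stata`. The directory, the file name `toy.dta`, and the two-row frame are assumptions made only so the example runs without the real `catholic.dta`.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in data: two rows shaped like the head() output above.
toy = pd.DataFrame({"read12": [61.41, 58.34], "math12": [49.77, 59.84]})

with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)            # assumed data folder; replace with your own
    dta_path = data_dir / "toy.dta" # hypothetical file name
    toy.to_stata(dta_path, write_index=False)  # write a small Stata file
    loaded = pd.read_stata(dta_path)           # read it back into a DataFrame

print(loaded.head())
```

With the real data, only `data_dir` and the file name would need to change.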


In [3]:
df1['constant'] = 1  # column of ones; summing it later gives the number of rows

# Data Checks
- Columns

In [4]:
%%time
df1.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 7430 entries, 0 to 7429
Data columns (total 14 columns):
id          7430 non-null float32
read12      7430 non-null float32
math12      7430 non-null float32
female      7430 non-null int8
asian       7430 non-null int8
hispan      7430 non-null int8
black       7430 non-null int8
motheduc    7430 non-null float32
fatheduc    7430 non-null float32
lfaminc     7430 non-null float32
hsgrad      5970 non-null float64
cathhs      7430 non-null int8
parcath     7430 non-null int8
constant    7430 non-null int64
dtypes: float32(6), float64(1), int64(1), int8(6)
memory usage: 391.8 KB
Wall time: 14 ms
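
The `info()` output reports 5970 non-null values for `hsgrad` against 7430 rows, so 1460 entries are missing in that column. A minimal sketch of how the gaps could be counted directly is shown below. It uses a small synthetic frame (an assumption, not the real data) with the same two column names.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for df1: 'hsgrad' has gaps, 'cathhs' does not.
toy = pd.DataFrame({
    "hsgrad": [1.0, np.nan, 1.0, 0.0, np.nan],
    "cathhs": [0, 0, 1, 0, 1],
})

# Number of missing values in each column
missing_per_column = toy.isna().sum()
print(missing_per_column)

# Keep only the rows where 'hsgrad' is observed
complete = toy.dropna(subset=["hsgrad"])
print(len(complete))
```

Applied to `df1`, the same `isna().sum()` call would be expected to show missing values only for `hsgrad`, given the counts printed above.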


### i. How many students are in the sample? Find the means and standard deviations of math12 and read12.

In [12]:
print('Number of students in the sample:')
count_students = np.sum(df1['constant'])
print(count_students)
print()

print('Mean of math12')
mean_math12 = np.mean(df1['math12'])
print(mean_math12)
print()

print('Std Deviation of math12')
std_math12 = np.std(df1['math12'])  # np.std defaults to ddof=0 (population formula)
print(std_math12)
print()

print('Mean of read12')
mean_read12 = np.mean(df1['read12'])
print(mean_read12)
print()

print('Std Deviation of read12')
std_read12 = np.std(df1['read12'])  # ddof=0 (population formula)
print(std_read12)


Number of students in the sample:
7430

Mean of math12
52.133609771728516

Std Deviation of math12
9.458475112915039

Mean of read12
51.77239227294922

Std Deviation of read12
9.40711784362793
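
The standard deviations above come from `np.std`, which divides by n, while textbook sample standard deviations divide by n - 1. The sketch below illustrates the difference on the five `math12` values from the `head()` output, rounded to two decimals. It is an illustration on a tiny subset, not a recomputation of the results above.

```python
import numpy as np
import pandas as pd

# First five math12 values from the head() output, rounded to two decimals.
scores = pd.Series([49.77, 59.84, 50.38, 45.03, 54.26], name="math12")

n = len(scores)                  # row count without a helper column
mean = scores.mean()
std_pop = scores.std(ddof=0)     # divides by n, same as np.std(scores)
std_sample = scores.std(ddof=1)  # divides by n - 1, the pandas default

print(n, mean)
print(std_pop, std_sample)
```

The two versions differ by a factor of sqrt(n / (n - 1)), which is very close to 1 for a sample of 7430 students.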


### ii. Estimate the equation: $math12 = \beta_{0} + \beta_{1} read12 + u$

In [14]:
formula = 'math12 ~ read12'
# For heteroskedasticity-robust (HC0) standard errors, use: ols(formula, df1).fit(cov_type='HC0')
model = ols(formula, df1)
results = model.fit()
aov_table = statsmodels.stats.anova.anova_lm(results, typ=2)
print(aov_table)
print(results.summary())

                 sum_sq      df            F  PR(>F)
read12    335470.111083     1.0  7568.582389     0.0
Residual  329238.932350  7428.0          NaN     NaN
                            OLS Regression Results                            
Dep. Variable:                 math12   R-squared:                       0.505
Model:                            OLS   Adj. R-squared:                  0.505
Method:                 Least Squares   F-statistic:                     7569.
Date:                Tue, 01 Jan 2019   Prob (F-statistic):               0.00
Time:                        18:59:01   Log-Likelihood:                -24627.
No. Observations:                7430   AIC:                         4.926e+04
Df Residuals:                    7428   BIC:                         4.927e+04
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>
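
The ANOVA table and the summary header describe the same fit, so the reported R-squared and F-statistic can be recovered from the sums of squares. The short check below uses only the numbers printed above. The variable names are illustrative.

```python
# Values copied from the ANOVA table printed above
ss_model = 335470.111083   # sum of squares explained by read12
ss_resid = 329238.932350   # residual sum of squares
df_model = 1
df_resid = 7428

# R-squared: explained share of the total sum of squares
r_squared = ss_model / (ss_model + ss_resid)

# F-statistic: mean square of the model over mean square of the residuals
f_stat = (ss_model / df_model) / (ss_resid / df_resid)

print(round(r_squared, 3))
print(round(f_stat, 1))
```

Both values agree with the summary header (R-squared 0.505, F-statistic 7569 after rounding).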

### Reverse regression. Estimate the equation: $read12 = \beta_{0} + \beta_{1} math12 + u$

In [16]:
formula = 'read12 ~ math12'
# For heteroskedasticity-robust (HC0) standard errors, use: ols(formula, df1).fit(cov_type='HC0')
model = ols(formula, df1)
results = model.fit()
aov_table = statsmodels.stats.anova.anova_lm(results, typ=2)
print(aov_table)
print(results.summary())

                 sum_sq      df            F  PR(>F)
math12    331837.260642     1.0  7568.582389     0.0
Residual  325673.560173  7428.0          NaN     NaN
                            OLS Regression Results                            
Dep. Variable:                 read12   R-squared:                       0.505
Model:                            OLS   Adj. R-squared:                  0.505
Method:                 Least Squares   F-statistic:                     7569.
Date:                Tue, 01 Jan 2019   Prob (F-statistic):               0.00
Time:                        19:00:26   Log-Likelihood:                -24587.
No. Observations:                7430   AIC:                         4.918e+04
Df Residuals:                    7428   BIC:                         4.919e+04
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>
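
Both regressions report the same R-squared (0.505) and the same F-statistic (7568.582389). In a simple regression R-squared is the squared correlation between the two variables, which does not depend on which one is treated as the outcome. The sketch below illustrates this with NumPy on synthetic scores. The sample size, coefficients, and noise levels are assumptions chosen for the illustration, not estimates from the data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic scores (hypothetical parameters, not the Wooldridge data)
read_score = rng.normal(50, 10, size=500)
math_score = 15 + 0.7 * read_score + rng.normal(0, 7, size=500)


def ols_slope(y, x):
    """Slope of the simple regression of y on x: cov(x, y) / var(x)."""
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)


slope_math_on_read = ols_slope(math_score, read_score)
slope_read_on_math = ols_slope(read_score, math_score)
r = np.corrcoef(read_score, math_score)[0, 1]

# The product of the two slopes equals the squared correlation (R-squared)
print(slope_math_on_read * slope_read_on_math, r ** 2)

# The reverse slope is not simply the reciprocal of the forward slope
print(slope_read_on_math, 1 / slope_math_on_read)
```

The reverse slope equals the reciprocal of the forward slope only when the fit is perfect. Neither regression on its own says which variable drives the other.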