# OLS - Wooldridge Computer Exercise
## Chapter 3, Exercise 2

## To add a heading:
- Insert a new cell
- Type or paste-in content
- Place a single pound sign (`#`) in front of the heading content
- Select "Markdown"
- Press "Shift" + "Enter" together to render the cell as formatted text

## To add a sub-heading:
- Insert a new cell
- Type or paste-in content
- Place two pound signs (`##`) in front of the sub-heading content
- Select "Markdown"
- Press "Shift" + "Enter" together to render the cell as formatted text

## To add new bulleted documentation:

- Insert a new cell
- Type or paste-in content
- Place a dash (`-`) in front of each bulleted item
- Select "Markdown"
- Press "Shift" + "Enter" together to render the cell as formatted text

# References
- Wooldridge, J.M. (2016). Introductory econometrics: A modern approach (6th ed.). Mason, OH: South-Western, Cengage Learning.
- Residual Plots: https://medium.com/@emredjan/emulating-r-regression-plots-in-python-43741952c034
- Understanding residual plots: https://data.library.virginia.edu/diagnostic-plots/
- Objects available in our regression results object: https://www.statsmodels.org/dev/examples/notebooks/generated/ols.html

# Import libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

import statsmodels
import statsmodels.api as sm
import statsmodels.stats.api as sms

from statsmodels.formula.api import ols
from statsmodels.compat import lzip

from statsmodels.graphics.gofplots import ProbPlot

#import pandas.tseries.api as sm
#from tseries.formula.apt import ols

from scipy.stats import ttest_ind, ttest_ind_from_stats
from scipy.special import stdtr


# Note: matplotlib 3.6 and later renamed this style to 'seaborn-v0_8'
plt.style.use('seaborn') # pretty matplotlib plots

plt.rc('font', size=14)
plt.rc('figure', titlesize=18)
plt.rc('axes', labelsize=15)
plt.rc('axes', titlesize=18)

# LaTeX display support
from IPython.display import Latex

# Read data from CSV

In [2]:
%%time
#df = pd.read_csv(BytesIO(csv_as_bytes),sep='|',nrows=100000)
df1 = pd.read_csv('C://Users//a1000391//Desktop//Machine Learning Lab//Pandas//firepit-master//HPRICE1.csv',sep=',')
print(df1.head())

   price  assess  bdrms  lotsize  sqrft  colonial    lprice   lassess  \
0  300.0   349.1      4     6126   2438         1  5.703783  5.855359   
1  370.0   351.5      3     9903   2076         1  5.913503  5.862210   
2  191.0   217.7      3     5200   1374         0  5.252274  5.383118   
3  195.0   231.8      3     4600   1448         1  5.273000  5.445875   
4  373.0   319.1      4     6095   2514         1  5.921578  5.765504   

   llotsize    lsqrft  
0  8.720297  7.798934  
1  9.200593  7.638198  
2  8.556414  7.225482  
3  8.433811  7.277938  
4  8.715224  7.829630  
Wall time: 19 ms


In [3]:
df1['constant'] = 1

# Data Checks
- Columns

In [4]:
%%time
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 88 entries, 0 to 87
Data columns (total 11 columns):
price       88 non-null float64
assess      88 non-null float64
bdrms       88 non-null int64
lotsize     88 non-null int64
sqrft       88 non-null int64
colonial    88 non-null int64
lprice      88 non-null float64
lassess     88 non-null float64
llotsize    88 non-null float64
lsqrft      88 non-null float64
constant    88 non-null int64
dtypes: float64(6), int64(5)
memory usage: 7.6 KB
Wall time: 5.98 ms


### Estimate: $price = \alpha + \beta_{1}\,sqrft + \beta_{2}\,bdrms + u$

In [5]:
formula = '''price ~ sqrft + bdrms
'''
#model = ols(formula, df).fit(cov_type='HC0')
model = ols(formula, df1)
results = model.fit()
aov_table = statsmodels.stats.anova.anova_lm(results, typ=2)
print(aov_table)
print(results.summary())

                 sum_sq    df          F        PR(>F)
sqrft     343066.057841   1.0  86.313500  1.393748e-14
bdrms      10208.077786   1.0   2.568295  1.127350e-01
Residual  337845.355171  85.0        NaN           NaN
                            OLS Regression Results                            
Dep. Variable:                  price   R-squared:                       0.632
Model:                            OLS   Adj. R-squared:                  0.623
Method:                 Least Squares   F-statistic:                     72.96
Date:                Thu, 30 Jan 2020   Prob (F-statistic):           3.57e-19
Time:                        15:47:29   Log-Likelihood:                -488.00
No. Observations:                  88   AIC:                             982.0
Df Residuals:                      85   BIC:                             989.4
Df Model:                           2                                         
Covariance Type:            nonrobust                                
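
The summary figures above are linked by standard formulas, which makes them easy to sanity-check. The sketch below recomputes the residual degrees of freedom, the adjusted R-squared, and the overall F-statistic from the reported `R-squared`, `No. Observations`, and `Df Model`. The variable names are illustrative and not part of the notebook. Because the summary rounds R-squared to three decimals, the recomputed F-statistic should only match the reported 72.96 approximately.

```python
# Figures copied from the regression summary above
n = 88             # No. Observations
k = 2              # Df Model: two regressors (sqrft, bdrms)
r_squared = 0.632  # R-squared, rounded to three decimals in the summary

# Residual degrees of freedom: observations minus regressors minus intercept
df_resid = n - k - 1

# Adjusted R-squared penalizes R-squared for the number of regressors
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / df_resid

# Overall F-statistic for the joint significance of all regressors
f_stat = (r_squared / k) / ((1 - r_squared) / df_resid)

print(f"Df Residuals: {df_resid}")
print(f"Adjusted R-squared: {adj_r_squared:.3f}")
print(f"F-statistic (approximate, from rounded R-squared): {f_stat:.2f}")
```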

### Question iii: What is the estimated increase in price for a house with one additional bedroom that is 140 square feet in size?

In [13]:
coefs = results.params
print(coefs)

Intercept   -19.314996
sqrft         0.128436
bdrms        15.198191
dtype: float64


In [18]:
sqrft = coefs.sqrft
bdrms = coefs.bdrms
sqrft_140 = sqrft*140
price_add = bdrms+sqrft_140
print("The coefficient for square feet is: " + str(sqrft))
print("The coefficient for bedrooms is: " + str(bdrms))
print("The addition to home price due to square feet is: $" + str(sqrft_140*1000))
print("The additional value is: $" + str(price_add*1000))

The coefficient for square feet is: 0.1284362104196576
The coefficient for bedrooms is: 15.198191084137807
The addition to home price due to square feet is: $17981.069458752063
The additional value is: $33179.26054288987
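
The calculation above works because the intercept cancels when two predictions are differenced, so the change in predicted price depends only on the changes in the regressors. A minimal sketch of that idea follows, using the coefficients printed above. The helper `predicted_price_change` is a hypothetical name introduced here for illustration, and the conversion to dollars assumes, as the cell above does, that `price` is measured in thousands of dollars.

```python
def predicted_price_change(beta_sqrft, beta_bdrms, d_sqrft, d_bdrms):
    """Change in predicted price (thousands of dollars) for given changes in the regressors."""
    return beta_sqrft * d_sqrft + beta_bdrms * d_bdrms

# Coefficients copied from the printed output above
beta_sqrft = 0.1284362104196576
beta_bdrms = 15.198191084137807

# One more bedroom, square footage held fixed
bedroom_only = predicted_price_change(beta_sqrft, beta_bdrms, d_sqrft=0, d_bdrms=1)

# One more bedroom that also adds 140 square feet
bedroom_plus_sqrft = predicted_price_change(beta_sqrft, beta_bdrms, d_sqrft=140, d_bdrms=1)

print(f"Bedroom only: ${bedroom_only * 1000:,.0f}")
print(f"Bedroom plus 140 sq ft: ${bedroom_plus_sqrft * 1000:,.0f}")
```

Comparing the two figures separates the effect of the bedroom itself from the effect of the extra floor space that comes with it.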


### Questions: v & vi

In [22]:
pred_obs = results.fittedvalues
resids = results.resid
price = df1.price
actual_sqrft = df1.sqrft
actual_bdrms = df1.bdrms

print('Actual selling price:')
print(price[0:1])
print(' ')
print('Predicted selling price:')
print(pred_obs[0:1])
print(' ')
print('Residual:')
print(resids[0:1])
print(' ')
print('Another way to compute estimated sell price using our Xs and our regression coefficients:')
pred_price = coefs.Intercept + coefs.bdrms*actual_bdrms[0:1] + coefs.sqrft*actual_sqrft[0:1]
print(pred_price)


Actual selling price:
0    300.0
Name: price, dtype: float64
 
Predicted selling price:
0    354.605249
dtype: float64
 
Residual:
0   -54.605249
dtype: float64
 
Another way to compute estimated sell price using our Xs and our regression coefficients:
0    354.605249
dtype: float64
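
The same check can be reproduced without the fitted model, using only the printed coefficients and the first row of the data (price 300.0, 2438 square feet, 4 bedrooms). The sketch below assumes the sign convention residual = actual minus predicted, which agrees with the `results.resid` value shown above. The variable names are illustrative, and the intercept is taken at the six decimals printed earlier, so the result matches only to about that precision.

```python
# Coefficients copied from the printed output (intercept rounded to six decimals)
intercept = -19.314996
beta_sqrft = 0.1284362104196576
beta_bdrms = 15.198191084137807

# First observation, as shown by df1.head()
first_house = {"price": 300.0, "sqrft": 2438, "bdrms": 4}

# Fitted value from the estimated regression line
predicted = (
    intercept
    + beta_sqrft * first_house["sqrft"]
    + beta_bdrms * first_house["bdrms"]
)

# Residual: actual price minus fitted price (both in thousands of dollars)
residual = first_house["price"] - predicted

print(f"Predicted price: {predicted:.3f}")
print(f"Residual: {residual:.3f}")
```

A negative residual means the house sold for less than the model predicts given its size and number of bedrooms.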


### What other objects are contained in the "results" object?

In [19]:
dir(results)

['HC0_se',
 'HC1_se',
 'HC2_se',
 'HC3_se',
 '_HCCM',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_cache',
 '_data_attr',
 '_get_robustcov_results',
 '_is_nested',
 '_wexog_singular_values',
 'aic',
 'bic',
 'bse',
 'centered_tss',
 'compare_f_test',
 'compare_lm_test',
 'compare_lr_test',
 'condition_number',
 'conf_int',
 'conf_int_el',
 'cov_HC0',
 'cov_HC1',
 'cov_HC2',
 'cov_HC3',
 'cov_kwds',
 'cov_params',
 'cov_type',
 'df_model',
 'df_resid',
 'diagn',
 'eigenvals',
 'el_test',
 'ess',
 'f_pvalue',
 'f_test',
 'fittedvalues',
 'fvalue',
 'get_influence',
 'get_prediction',
 'get_robustcov_results',
 'initialize',
 'k_constant',
 'llf',
 'load',
 'model',
