### Interpreting Coefficients

It is important not only to be able to fit complex linear models, but also to know which of the resulting coefficients you can interpret.

In this notebook, you will fit a few different models and use the quizzes below to match the appropriate interpretations to your coefficients when possible.

In some cases, the coefficients of your linear regression models wouldn't be kept due to a lack of significance. But that is not the aim of this notebook - **this notebook is strictly to ensure you are comfortable with how to interpret coefficients when they are interpretable at all**.

In [1]:
import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv('./house_prices.csv')
df.head()



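| | house_id | neighborhood | area | bedrooms | bathrooms | style | price |
|---|---|---|---|---|---|---|---|
| 0 | 1112 | B | 1188 | 3 | 2 | ranch | 598291 |
| 1 | 491 | B | 3512 | 5 | 3 | victorian | 1744259 |
| 2 | 5952 | B | 1134 | 3 | 2 | ranch | 571669 |
| 3 | 3525 | A | 1940 | 4 | 2 | ranch | 493675 |
| 4 | 5108 | B | 2208 | 6 | 4 | victorian | 1101539 |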


We will be fitting a number of different models to this dataset throughout this notebook.  For each model, there is a quiz question that will allow you to match the interpretations of the model coefficients to the corresponding values.  If there is no 'nice' interpretation, this is also an option!

### Model 1

`1.` For the first model, fit a model to predict `price` using `neighborhood`, `style`, and the `area` of the home.  Use the output to match the correct values to the corresponding interpretation in quiz 1 below.  Don't forget an intercept!  You will also need to build your dummy variables, and don't forget to drop one of the columns when you are fitting your linear model. It may be easiest to connect your interpretations to the values in the first quiz by creating the baselines as neighborhood C and home style **lodge**.

In [17]:
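# get_dummies returns one column per category in sorted order, so the names on
# the left must be listed alphabetically. dtype=int keeps the columns numeric
# (recent pandas versions return booleans by default).
df[['A', 'B', 'C']] = pd.get_dummies(df['neighborhood'], dtype=int)
df[['lodge', 'ranch', 'victorian']] = pd.get_dummies(df['style'], dtype=int)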
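
The column assignment above depends on `pd.get_dummies` returning its columns in sorted category order. As a minimal sketch of an alternative that does not rely on position, the dummy columns can be joined back by name. The `toy` frame below is hypothetical data used only for illustration, not part of `house_prices.csv`.

```python
import pandas as pd

# Hypothetical toy data standing in for the 'neighborhood' column.
toy = pd.DataFrame({'neighborhood': ['B', 'A', 'C', 'B']})

# One 0/1 column per category; dtype=int avoids boolean columns.
dummies = pd.get_dummies(toy['neighborhood'], dtype=int)
print(list(dummies.columns))

# Join by column name rather than assigning by position.
toy = toy.join(dummies)
print(toy)
```

Each row has exactly one dummy equal to 1, which is why one category must be left out of the model as the baseline whenever an intercept is included.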

In [18]:
df['intercept'] = 1


linear_model_1 = sm.OLS(df['price'], df[['intercept', 'A', 'B',  'ranch', 'victorian', 'area']])
results_1 = linear_model_1.fit()
results_1.summary()

| | | | |
|---|---|---|---|
| Dep. Variable: | price | R-squared: | 0.919 |
| Model: | OLS | Adj. R-squared: | 0.919 |
| Method: | Least Squares | F-statistic: | 13720.0 |
| Date: | Fri, 19 Feb 2021 | Prob (F-statistic): | 0.0 |
| Time: | 05:11:00 | Log-Likelihood: | -80348.0 |
| No. Observations: | 6028 | AIC: | 160700.0 |
| Df Residuals: | 6022 | BIC: | 160700.0 |
| Df Model: | 5 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| intercept | -1.983e+05 | 5540.744 | -35.791 | 0.000 | -2.09e+05 | -1.87e+05 |
| A | -194.2464 | 4965.459 | -0.039 | 0.969 | -9928.324 | 9539.832 |
| B | 5.243e+05 | 4687.484 | 111.844 | 0.000 | 5.15e+05 | 5.33e+05 |
| ranch | -1974.7032 | 5757.527 | -0.343 | 0.732 | -1.33e+04 | 9312.111 |
| victorian | -6262.7365 | 6893.293 | -0.909 | 0.364 | -1.98e+04 | 7250.586 |
| area | 348.7375 | 2.205 | 158.177 | 0.000 | 344.415 | 353.060 |

| | | | |
|---|---|---|---|
| Omnibus: | 114.369 | Durbin-Watson: | 2.002 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 139.082 |
| Skew: | 0.271 | Prob(JB): | 6.29e-31 |
| Kurtosis: | 3.509 | Cond. No. | 11200.0 |
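
To connect these values to their interpretations, here is a minimal sketch that plugs the reported Model 1 coefficients into the fitted equation. The function name `predict_price_model_1` and the example house are hypothetical, and the coefficients are copied at the rounded precision shown in the summary. Neighborhood C and style lodge are the baselines, so they contribute nothing beyond the intercept.

```python
# Coefficients copied from the Model 1 summary (rounded as reported).
INTERCEPT = -1.983e+05
AREA = 348.7375
NEIGHBORHOOD = {'A': -194.2464, 'B': 5.243e+05, 'C': 0.0}            # C is the baseline
STYLE = {'ranch': -1974.7032, 'victorian': -6262.7365, 'lodge': 0.0}  # lodge is the baseline


def predict_price_model_1(area, neighborhood='C', style='lodge'):
    """Predicted price from the reported Model 1 coefficients (hypothetical helper)."""
    return INTERCEPT + AREA * area + NEIGHBORHOOD[neighborhood] + STYLE[style]


# Same house (2000 square units of area, lodge) in neighborhood B versus C.
diff_b_vs_c = predict_price_model_1(2000, 'B') - predict_price_model_1(2000, 'C')
print(diff_b_vs_c)

# One additional unit of area, holding neighborhood and style fixed.
diff_area = predict_price_model_1(2001) - predict_price_model_1(2000)
print(diff_area)
```

Under this model, the coefficient on `B` is the predicted price difference between a home in neighborhood B and an otherwise identical home in neighborhood C, and the coefficient on `area` is the predicted change in price for each one-unit increase in area with all other variables held constant. The coefficients on `A`, `ranch`, and `victorian` read the same way relative to their baselines, although their p-values (0.969, 0.732, 0.364) indicate they are not statistically significant here.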


### Model 2

`2.` Now let's try a second model for predicting price.  This time, use `area` and `area squared` to predict price.  Also use the `style` of the home, but not `neighborhood` this time. You will again need to use your dummy variables, and add an intercept to the model. Use the results of your model to answer quiz questions 2 and 3.
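
Adding a squared term changes how the `area` coefficient can be read. As a sketch, write the fitted equation with coefficients $b_0, b_1, b_2$ (notation introduced here for illustration, not taken from the notebook):

$$
\widehat{\text{price}} = b_0 + b_1\,\text{area} + b_2\,\text{area}^2 + (\text{style terms})
$$

$$
\frac{\partial\,\widehat{\text{price}}}{\partial\,\text{area}} = b_1 + 2\,b_2\,\text{area}
$$

The predicted change in price for a small increase in area depends on the current area, and `area` cannot change while `area squared` is held constant. Neither $b_1$ nor $b_2$ on its own therefore has the simple "per one-unit increase, holding all else constant" interpretation available in a model without higher-order terms.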

In [14]:
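# dtype=int keeps the dummy columns numeric (recent pandas versions return booleans by default).
df[['lodge', 'ranch', 'victorian']] = pd.get_dummies(df['style'], dtype=int)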

In [16]:
df['intercept'] = 1

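# Quadratic term for area.
df['area_squared'] = df['area'] * df['area']

# Note: this design matrix contains the intercept together with all three style
# dummies (ranch, victorian, lodge). The dummies sum to the intercept column,
# so the intercept and style coefficients are not uniquely determined.
linear_model_2 = sm.OLS(df['price'], df[['intercept', 'area', 'area_squared', 'ranch', 'victorian', 'lodge']])
result2 = linear_model_2.fit()
result2.summary()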

| | | | |
|---|---|---|---|
| Dep. Variable: | price | R-squared: | 0.678 |
| Model: | OLS | Adj. R-squared: | 0.678 |
| Method: | Least Squares | F-statistic: | 3173.0 |
| Date: | Fri, 19 Feb 2021 | Prob (F-statistic): | 0.0 |
| Time: | 05:10:19 | Log-Likelihood: | -84516.0 |
| No. Observations: | 6028 | AIC: | 169000.0 |
| Df Residuals: | 6023 | BIC: | 169100.0 |
| Df Model: | 4 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| intercept | 1.702e+04 | 1.16e+04 | 1.465 | 0.143 | -5748.089 | 3.98e+04 |
| area | 334.0146 | 13.525 | 24.696 | 0.000 | 307.501 | 360.528 |
| area_squared | 0.0029 | 0.002 | 1.283 | 0.199 | -0.002 | 0.007 |
| ranch | 1.145e+04 | 7467.967 | 1.533 | 0.125 | -3192.682 | 2.61e+04 |
| victorian | 4039.3476 | 1.04e+04 | 0.388 | 0.698 | -1.64e+04 | 2.45e+04 |
| lodge | 1529.9519 | 7151.325 | 0.214 | 0.831 | -1.25e+04 | 1.55e+04 |

| | | | |
|---|---|---|---|
| Omnibus: | 375.22 | Durbin-Watson: | 2.009 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 340.688 |
| Skew: | 0.519 | Prob(JB): | 1.05e-74 |
| Kurtosis: | 2.471 | Cond. No. | 6.97e+21 |
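
The summary above reports `Df Model: 4` even though five predictors were passed alongside the intercept, and a condition number of 6.97e+21. Both are consistent with perfect collinearity: every home has exactly one style, so `ranch + victorian + lodge` equals the `intercept` column. The following minimal sketch illustrates this with NumPy on hypothetical style labels (not the notebook's data) by comparing the rank of the design matrix with and without the baseline dummy.

```python
import numpy as np

# Hypothetical style labels, used only to illustrate the collinearity.
styles = np.array(['ranch', 'victorian', 'lodge', 'ranch', 'lodge', 'victorian'])

intercept = np.ones(len(styles))
ranch = (styles == 'ranch').astype(float)
victorian = (styles == 'victorian').astype(float)
lodge = (styles == 'lodge').astype(float)

# Intercept plus all three dummies: four columns, but one is redundant.
X_all = np.column_stack([intercept, ranch, victorian, lodge])
# Intercept plus two dummies, with lodge as the baseline.
X_drop = np.column_stack([intercept, ranch, victorian])

rank_all = np.linalg.matrix_rank(X_all)
rank_drop = np.linalg.matrix_rank(X_drop)
print(X_all.shape[1], rank_all)    # number of columns vs. rank
print(X_drop.shape[1], rank_drop)  # number of columns vs. rank
```

When the rank is smaller than the number of columns, the intercept and style coefficients are not uniquely determined, so the individual values in the table do not have a 'nice' interpretation. Dropping one dummy, as in Model 1, restores a full-rank design matrix in which each remaining style coefficient is a comparison against the baseline.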
