### 1. Weather model

For this assignment, you'll revisit the historical temperature dataset. To complete this assignment, submit a link to a Jupyter notebook containing your solutions to the following tasks:

* First, load the dataset from the **weatherinszeged** table from Thinkful's database.
* Like in the previous checkpoint, build a linear regression model where your target variable is the difference between the *apparenttemperature* and the *temperature*. As explanatory variables, use *humidity* and *windspeed*. Now, estimate your model using OLS. What are the R-squared and adjusted R-squared values? Do you think they are satisfactory? Why? 
* Next, add the interaction of *humidity* and *windspeed* to the model above and estimate the model using OLS. Now, what is the R-squared of this model? Does this model improve upon the previous one?
* Add *visibility* as an additional explanatory variable to the first model and estimate it. Did R-squared increase? What about adjusted R-squared? Compare the contributions of the interaction term and *visibility* in terms of the improvement in adjusted R-squared. Which one is more useful?
* Choose the best one from the three models above with respect to their AIC and BIC scores. Validate your choice by discussing your justification with your mentor.

In [5]:
import math
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import scipy.stats as stats
from sqlalchemy import create_engine
from sklearn import linear_model
import statsmodels.api as sm
import warnings 

warnings.filterwarnings('ignore')




In [6]:
# Database credentials
postgres_user = 'dsbc_student'
postgres_pw = '7*.8G9QH21'
postgres_host = '142.93.121.174'
postgres_port = '5432'
postgres_db = 'weatherinszeged'

# Create an engine to access the database
engine = create_engine('postgresql://{}:{}@{}:{}/{}'.format(
    postgres_user, postgres_pw, postgres_host, postgres_port, postgres_db
))

# Create a DataFrame from the query result
weather_df = pd.read_sql_query('select * from weatherinszeged', con=engine)

# Dispose of the engine. As in earlier assignments, be sure to close the
# database connection after initially pulling in your data.
engine.dispose()



In [7]:
weather_df['diff_in_appar_temp'] = weather_df['apparenttemperature']-weather_df['temperature']

In [8]:
Y = weather_df['diff_in_appar_temp']

X = weather_df[['windspeed','humidity']]

In [9]:
X = sm.add_constant(X)

results = sm.OLS(Y,X).fit()

results.summary()

0,1,2,3
Dep. Variable:,diff_in_appar_temp,R-squared:,0.288
Model:,OLS,Adj. R-squared:,0.288
Method:,Least Squares,F-statistic:,19490.0
Date:,"Mon, 27 Jan 2020",Prob (F-statistic):,0.0
Time:,15:23:04,Log-Likelihood:,-170460.0
No. Observations:,96453,AIC:,340900.0
Df Residuals:,96450,BIC:,340900.0
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,2.4381,0.021,115.948,0.000,2.397,2.479
windspeed,-0.1193,0.001,-176.164,0.000,-0.121,-0.118
humidity,-3.0292,0.024,-126.479,0.000,-3.076,-2.982

0,1,2,3
Omnibus:,3935.747,Durbin-Watson:,0.267
Prob(Omnibus):,0.0,Jarque-Bera (JB):,4613.311
Skew:,-0.478,Prob(JB):,0.0
Kurtosis:,3.484,Cond. No.,88.1


R-squared is 0.288 and adjusted R-squared is the same. This is quite low: the model accounts for only about 29% of the variance in the target.
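The two values match because adjusted R-squared only penalizes R-squared by a factor of (n - 1) / (n - p - 1), which is almost exactly 1 with 96,453 observations and 2 regressors. A minimal sketch of the standard formula follows; the function name `adjusted_r_squared` is introduced here for illustration and is not part of statsmodels.

```python
def adjusted_r_squared(r_squared, n_obs, n_regressors):
    """Standard adjusted R-squared; n_regressors excludes the constant."""
    penalty = (n_obs - 1) / (n_obs - n_regressors - 1)
    return 1 - (1 - r_squared) * penalty


# Values taken from the summary above: R-squared 0.288, 96453 rows, 2 regressors.
large_sample = adjusted_r_squared(0.288, 96453, 2)

# Hypothetical small sample with the same R-squared, for contrast.
small_sample = adjusted_r_squared(0.288, 20, 2)

print(round(large_sample, 3))
print(round(small_sample, 3))
```

With a sample this large, the adjustment only becomes visible when many regressors are added, so R-squared and adjusted R-squared will stay close for every model in this notebook.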

In [10]:
V = weather_df['diff_in_appar_temp']

weather_df['wetnessofwind'] = weather_df['windspeed']*weather_df['humidity']

W = weather_df['wetnessofwind']
W = sm.add_constant(W)

results = sm.OLS(V,W).fit()

results.summary()

0,1,2,3
Dep. Variable:,diff_in_appar_temp,R-squared:,0.315
Model:,OLS,Adj. R-squared:,0.315
Method:,Least Squares,F-statistic:,44260.0
Date:,"Mon, 27 Jan 2020",Prob (F-statistic):,0.0
Time:,15:24:37,Log-Likelihood:,-168610.0
No. Observations:,96453,AIC:,337200.0
Df Residuals:,96451,BIC:,337200.0
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,0.3511,0.008,43.171,0.000,0.335,0.367
wetnessofwind,-0.1870,0.001,-210.380,0.000,-0.189,-0.185

0,1,2,3
Omnibus:,2785.669,Durbin-Watson:,0.258
Prob(Omnibus):,0.0,Jarque-Bera (JB):,4041.543
Skew:,-0.307,Prob(JB):,0.0
Kurtosis:,3.793,Cond. No.,16.8


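R-squared increases slightly, from 0.288 to 0.315, but the model could still be improved. Note that this model uses only the interaction term (*wetnessofwind*) as a regressor; *windspeed* and *humidity* themselves are not included.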

In [11]:
weather_df.columns

Index(['date', 'summary', 'preciptype', 'temperature', 'apparenttemperature',
       'humidity', 'windspeed', 'windbearing', 'visibility', 'loudcover',
       'pressure', 'dailysummary', 'diff_in_appar_temp', 'wetnessofwind'],
      dtype='object')

In [12]:
V = weather_df['diff_in_appar_temp']

weather_df['wetnessofwind'] = weather_df['windspeed']*weather_df['humidity']

W = weather_df[['wetnessofwind','visibility']]
W = sm.add_constant(W)

results = sm.OLS(V,W).fit()

results.summary()

0,1,2,3
Dep. Variable:,diff_in_appar_temp,R-squared:,0.346
Model:,OLS,Adj. R-squared:,0.346
Method:,Least Squares,F-statistic:,25570.0
Date:,"Mon, 27 Jan 2020",Prob (F-statistic):,0.0
Time:,15:26:20,Log-Likelihood:,-166310.0
No. Observations:,96453,AIC:,332600.0
Df Residuals:,96450,BIC:,332700.0
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,-0.4049,0.014,-29.829,0.000,-0.432,-0.378
wetnessofwind,-0.1850,0.001,-213.084,0.000,-0.187,-0.183
visibility,0.0716,0.001,68.672,0.000,0.070,0.074

0,1,2,3
Omnibus:,3365.218,Durbin-Watson:,0.287
Prob(Omnibus):,0.0,Jarque-Bera (JB):,5580.918
Skew:,-0.313,Prob(JB):,0.0
Kurtosis:,3.998,Cond. No.,42.5


In [14]:
V = weather_df['diff_in_appar_temp']

W = weather_df[['windspeed','humidity','visibility']]
W = sm.add_constant(W)

results = sm.OLS(V,W).fit()

results.summary()

0,1,2,3
Dep. Variable:,diff_in_appar_temp,R-squared:,0.304
Model:,OLS,Adj. R-squared:,0.303
Method:,Least Squares,F-statistic:,14010.0
Date:,"Mon, 27 Jan 2020",Prob (F-statistic):,0.0
Time:,15:27:53,Log-Likelihood:,-169380.0
No. Observations:,96453,AIC:,338800.0
Df Residuals:,96449,BIC:,338800.0
Df Model:,3,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,1.5756,0.028,56.605,0.000,1.521,1.630
windspeed,-0.1199,0.001,-179.014,0.000,-0.121,-0.119
humidity,-2.6066,0.025,-102.784,0.000,-2.656,-2.557
visibility,0.0540,0.001,46.614,0.000,0.052,0.056

0,1,2,3
Omnibus:,3833.895,Durbin-Watson:,0.282
Prob(Omnibus):,0.0,Jarque-Bera (JB):,4584.022
Skew:,-0.459,Prob(JB):,0.0
Kurtosis:,3.545,Cond. No.,131.0


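R-squared increased with the addition of visibility. Added to the first model, visibility raises R-squared from 0.288 to 0.304 (adjusted R-squared 0.303). Added to the interaction model, it raises R-squared from 0.315 to 0.346, the highest of the four models, though still below 50%.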

The AIC and BIC scores are lowest for the interaction model with visibility (AIC 332,600 and BIC 332,700), so it is the preferred model among those estimated here.
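As a cross-check, AIC and BIC can be recomputed from the log-likelihoods printed in the four summaries above, using AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where k counts the regressors plus the constant. This is a rough sketch: the printed log-likelihoods are rounded, so the results are approximate, and the model labels are ad hoc names chosen here.

```python
import math

# (log-likelihood, k) copied from the summaries above; k = regressors + constant.
n_obs = 96453
models = {
    'windspeed + humidity': (-170460.0, 3),
    'interaction only': (-168610.0, 2),
    'interaction + visibility': (-166310.0, 3),
    'windspeed + humidity + visibility': (-169380.0, 4),
}


def aic(log_likelihood, k):
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    return k * math.log(n) - 2 * log_likelihood


scores = {
    name: (aic(llf, k), bic(llf, k, n_obs))
    for name, (llf, k) in models.items()
}

# Print the models from lowest (best) to highest AIC.
for name, (a, b) in sorted(scores.items(), key=lambda item: item[1][0]):
    print(f'{name:35s} AIC={a:10.0f}  BIC={b:10.0f}')

best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])
print(best_by_aic, best_by_bic)
```

Both criteria point to the same model here, which is expected: with similar parameter counts across the candidates, the ranking is driven almost entirely by the log-likelihood.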