In this exercise, you'll work with the historical temperature data from the previous checkpoint. To complete this assignment, submit a link to a Jupyter notebook containing your solutions to the following tasks:

First, load the dataset from the weatherinszeged table from Thinkful's database.
Build a linear regression model where your target variable is the difference between the apparenttemperature and the temperature. As explanatory variables, use humidity and windspeed. Now, estimate your model using OLS. Are the estimated coefficients statistically significant? Are the signs of the estimated coefficients in line with your previous expectations? Interpret the estimated coefficients. What are the relations between the target and the explanatory variables?
Next, add the interaction of humidity and windspeed to the model above and estimate the new model using OLS. Are the coefficients statistically significant? Did the signs of the estimated coefficients for humidity and windspeed change? Interpret the estimated coefficients.

In [4]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
from sklearn import linear_model
from sqlalchemy import create_engine
import warnings

warnings.filterwarnings('ignore')

postgres_user = 'dsbc_student'
postgres_pw = '7*.8G9QH21'
postgres_host = '142.93.121.174'
postgres_port = '5432'
postgres_db = 'weatherinszeged'
table_name = 'weatherinszeged'

engine = create_engine('postgresql://{}:{}@{}:{}/{}'.format(
    postgres_user, postgres_pw, postgres_host, postgres_port, postgres_db))

# Query the table through the table_name variable defined above
weather_df = pd.read_sql_query('select * from {}'.format(table_name), con=engine)

engine.dispose()

In [6]:
weather_df["temp_diff"] = weather_df.apparenttemperature - weather_df.temperature

Y = weather_df['temp_diff']
X = weather_df[['humidity','windspeed']]

X = sm.add_constant(X)

# We fit an OLS model using statsmodels
results = sm.OLS(Y, X).fit()

# We print the summary results.
print(results.summary())

                            OLS Regression Results                            
Dep. Variable:              temp_diff   R-squared:                       0.288
Model:                            OLS   Adj. R-squared:                  0.288
Method:                 Least Squares   F-statistic:                 1.949e+04
Date:                Fri, 24 Jan 2020   Prob (F-statistic):               0.00
Time:                        14:33:15   Log-Likelihood:            -1.7046e+05
No. Observations:               96453   AIC:                         3.409e+05
Df Residuals:                   96450   BIC:                         3.409e+05
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          2.4381      0.021    115.948      0.0

This model estimates that the difference between apparent and actual temperature is about 2.4 degrees when humidity and windspeed are both zero (the constant), and that this difference falls by about 3 degrees for a one-unit increase in humidity and by about 0.1 degrees for each additional unit of windspeed, holding the other variable constant. All of the reported p-values are effectively 0, so each coefficient is statistically significant at conventional levels. I would have expected additional humidity to increase the apparent temperature rather than decrease it, so the sign on humidity runs against my expectations. The negative sign on windspeed is in line with my expectations.
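
To make the interpretation concrete, here is a minimal sketch that plugs values into the fitted equation. The constant comes from the summary output. The two slopes are the rounded values quoted in the text, since the exact estimates are cut off in the output. The inputs (humidity of 0.5, windspeed of 10) are arbitrary illustrative values, and the function name is hypothetical.

```python
# Constant from the summary output; slopes are the rounded values quoted in
# the commentary (the exact estimates are truncated in the printed table).
CONST = 2.4381
B_HUMIDITY = -3.0    # approximate
B_WINDSPEED = -0.1   # approximate


def predicted_temp_diff(humidity, windspeed):
    """Predicted (apparent - actual) temperature under the additive model."""
    return CONST + B_HUMIDITY * humidity + B_WINDSPEED * windspeed


# With both predictors at zero, the prediction is just the constant.
print(predicted_temp_diff(0.0, 0.0))

# Arbitrary illustrative inputs: each term shifts the prediction separately.
print(predicted_temp_diff(0.5, 10.0))
```

Because the model is additive, the effect of windspeed is the same at every humidity level. That restriction is what the interaction term in the next model relaxes.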

In [7]:
weather_df["hum_wind"] = weather_df.humidity * weather_df.windspeed

Y = weather_df['temp_diff']
X = weather_df[['humidity','windspeed', 'hum_wind']]

X = sm.add_constant(X)

# We fit an OLS model using statsmodels
results = sm.OLS(Y, X).fit()

# We print the summary results.
print(results.summary())

                            OLS Regression Results                            
Dep. Variable:              temp_diff   R-squared:                       0.341
Model:                            OLS   Adj. R-squared:                  0.341
Method:                 Least Squares   F-statistic:                 1.666e+04
Date:                Fri, 24 Jan 2020   Prob (F-statistic):               0.00
Time:                        14:43:08   Log-Likelihood:            -1.6669e+05
No. Observations:               96453   AIC:                         3.334e+05
Df Residuals:                   96449   BIC:                         3.334e+05
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0839      0.033      2.511      0.0
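
The P>|t| column in the row above is cut off, but the t-statistic is not. As a rough check, a two-sided p-value can be recovered from t. This sketch assumes the standard normal approximation is adequate, which seems reasonable with roughly 96,000 residual degrees of freedom. The function name is hypothetical.

```python
import math


def two_sided_p_from_t(t_stat):
    """Two-sided p-value using the normal approximation to the t distribution."""
    # erfc(|t| / sqrt(2)) equals 2 * (1 - Phi(|t|)) for a standard normal.
    return math.erfc(abs(t_stat) / math.sqrt(2.0))


# t-statistic reported for the constant in the interaction model
p_const = two_sided_p_from_t(2.511)
print(p_const)
```

A value between 0.01 and 0.05 means the constant is significant at the 5% level but not at the 1% level.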

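In this model, the constant is close to zero (0.0839), which makes sense. Its t-statistic of 2.511 still makes it significant at the 5% level, though far less strongly than in the first model (t of 115.948). Humidity and windspeed have both changed sign (both are still statistically significant), so on their own they now contribute to increasing the apparent temperature. The interaction term (also significant) is much larger and negative, so it decreases the apparent temperature when humidity and windspeed are both high. Because of the interaction, each main coefficient is now a conditional effect: the humidity coefficient is the effect of humidity when windspeed is zero, and the windspeed coefficient is the effect of windspeed when humidity is zero. I read this as follows: on days with low humidity and high wind, or with high humidity and low wind, it feels warmer than the actual temperature, but on days with both high humidity and high wind it feels a bit cooler than the actual temperature.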
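
With an interaction term, the effect of windspeed is no longer a single number: it equals the windspeed coefficient plus the interaction coefficient times humidity (and symmetrically for humidity). The sketch below illustrates this. The coefficient values are hypothetical, chosen only to match the signs described above (positive main effects, negative interaction), because the fitted estimates are cut off in the output.

```python
# HYPOTHETICAL coefficients: signs match the commentary, magnitudes are
# made up for illustration and are NOT the fitted estimates.
B_HUMIDITY = 1.0
B_WINDSPEED = 0.1
B_HUM_WIND = -0.3


def windspeed_effect(humidity):
    """Change in temp_diff per extra unit of windspeed at a given humidity."""
    return B_WINDSPEED + B_HUM_WIND * humidity


def humidity_effect(windspeed):
    """Change in temp_diff per extra unit of humidity at a given windspeed."""
    return B_HUMIDITY + B_HUM_WIND * windspeed


# Humidity level at which extra wind switches from raising to lowering temp_diff
humidity_threshold = -B_WINDSPEED / B_HUM_WIND

for h in (0.0, 0.5, 1.0):
    print(h, windspeed_effect(h))
```

With the real estimates substituted in, the same two functions would show at which humidity and windspeed levels each effect changes sign.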