# In-Class Prediction Demo
## Instructions:
#### Run the code below completely through. This file assumes all other files are in the same directory. All of the models are pre-trained and are obtained from "get_pickle." The data is centered to predict the weather on May 19, 2022.

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import GradientBoostingRegressor
import pickle

In [2]:
pickle_model = None
filename = './mean temperature_4_gbtCV.pkl'
with open(filename, 'rb') as file:
    pickle_model = pickle.load(file)
est = pickle_model.best_estimator_

In [3]:
df = pd.read_csv('./wunderground_scraped_data_daily_presentation_clean.csv')
df.head(3)

Unnamed: 0,Date,Time,Temperature,Dew Point,Humidity,Wind Direction,Wind Speed,Wind Gust,Pressure,Precipitation,Condition
0,2022-5-10-12:53 AM,12:53 AM,71,61,70,S,14,0,29.36,0.0,Fair
1,2022-5-10-1:53 AM,1:53 AM,70,61,73,S,15,21,29.37,0.0,Fair
2,2022-5-10-2:53 AM,2:53 AM,69,61,75,S,9,0,29.39,0.0,Fair


In [4]:
df_1 = df.loc[df.Date.str.contains('2022-5-18')]

In [5]:
df_4 = df.loc[df.Date.str.contains('2022-5-15')]

In [7]:
df_7 = df.loc[df.Date.str.contains('2022-5-12')]

In [16]:
def populate_test(df, est, day=4):
    date = 138 - day
    th_altitude = 499
    th_latitude = 39.4697
    out = pd.DataFrame(columns=est.feature_names_in_)
    out['day_of_year'] = [134]
    out['altitude'] = [th_altitude]
    out['latitude'] = [th_latitude]
    
    df.Time = pd.to_datetime(df.Time)
    df['Wind Direction'] = df['Wind Direction'].replace(\
        ['CALM','N','NNW','NW','WNW','W','WSW','SW','SSW','S','SSE','SE','ESE','E','ENE','NE','NNE','VAR'],
        [0, 0, 337.5, 315, 292.5, 270, 247.5, 225, 202.5, 180, 157.5, 135, 112.5, 90, 67.5, 45, 22.5, 0])
    for k in range(24):
        col = f"temperature_0_{k}"
        out[col] = ((df.loc[df.Time.dt.hour == k].Temperature - 32) * (5/9) + 273.15).mean() # Convert to K
        col = f"pressure_0_{k}"
        out[col] = df.loc[df.Time.dt.hour == k].Pressure.mean()
        col = f"wind_speed_0_{k}"
        out[col] = df.loc[df.Time.dt.hour == k]['Wind Speed'].mean()
        col = f"humidity_0_{k}"
        out[col] = df.loc[df.Time.dt.hour == k].Humidity.mean()
        col = f"wind_direction_0_{k}"
        out[col] = df.loc[df.Time.dt.hour == k]['Wind Direction'].mean()
    return out

def k_to_f(pred):
    return (pred - 273.15) * 9/5 + 32

def get_pickle(filename):
    with open(filename, 'rb') as file:
        pickle_model = pickle.load(file)
    return pickle_model.best_estimator_

In [23]:
est = get_pickle('./mean temperature_4_gbtCV.pkl')
print("4-day temperature prediction", k_to_f(est.predict(populate_test(df_4, est, 4))))

4-day temperature prediction [68.67416597]


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self[name] = value
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  del sys.path[0]


In [24]:
est = get_pickle('./mean temperature_7_gbtCV.pkl')
print("7-day temperature prediction", k_to_f(est.predict(populate_test(df_7, est, 7))))

7-day temperature prediction [58.80086088]


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self[name] = value
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  del sys.path[0]


In [26]:
est = get_pickle('./mean humidity_4_gbtCV.pkl')
print("4-day humidity prediction", est.predict(populate_test(df_4, est, 4)))

4-day humidity prediction [72.18337495]


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self[name] = value
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  del sys.path[0]


In [27]:
est = get_pickle('./mean humidity_7_gbtCV.pkl')
print("7-day humidity prediction", est.predict(populate_test(df_7, est, 7)))

7-day humidity prediction [72.10873759]


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self[name] = value
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  del sys.path[0]


In [25]:
est = get_pickle('./mean temperature_1_gbtCV.pkl')
print("1-day temperature prediction", k_to_f(est.predict(populate_test(df_1, est, 1))))

1-day temperature prediction [61.49510693]


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self[name] = value
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  del sys.path[0]


In [28]:
est = get_pickle('./mean humidity_1_gbtCV.pkl')
print("1-day humidity prediction", est.predict(populate_test(df_1, est, 1)))

1-day humidity prediction [81.28814374]


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self[name] = value
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  del sys.path[0]
