# H2O AutoML 

This notebook applies H2O AutoML to an energy usage prediction problem. H2O’s AutoML automates the machine learning workflow, including the training and tuning of many models within a user-specified time limit.

* For more information, see the [H2O AutoML documentation](https://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html).

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns
import h2o
print(h2o.__version__)
from h2o.automl import H2OAutoML

h2o.init(max_mem_size='16G')

In [None]:
#load the data
df = pd.read_csv('../input/hourly-energy-consumption/PJME_hourly.csv')
df.head()

The dataset has two columns: the timestamp (`Datetime`) and the consumed energy (`PJME_MW`).

In [None]:
#check for missing values
df.isnull().sum()

There are no missing values in the data.

In [None]:
color_pal = ["#F8766D", "#D39200", "#93AA00", "#00BA38", "#00C19F", "#00B9E3", "#619CFF", "#DB72FB"]
_ = df.plot(style='.', figsize=(15,5), color=color_pal[0], title='PJM East')

## Feature engineering

In [None]:
#add time based new features
df['date'] = pd.to_datetime(df['Datetime'])
df['month'] = df['date'].dt.month
df['day'] = df['date'].dt.day
df['hour'] = df['date'].dt.hour
df['weekday'] = df['date'].dt.weekday
df['weekend'] = df['weekday'].isin([5,6]).astype(int)
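
The `weekend` flag depends on the convention that Monday is 0 and Sunday is 6, which pandas' `dt.weekday` shares with Python's `datetime.weekday()`. The sketch below checks that convention on two hand-picked dates using only the standard library. The helper name `time_features` is illustrative and not part of the notebook.

```python
from datetime import datetime

def time_features(ts: datetime) -> dict:
    """Return the calendar features derived in the notebook, for one timestamp."""
    weekday = ts.weekday()  # Monday=0 ... Sunday=6
    return {
        "month": ts.month,
        "day": ts.day,
        "hour": ts.hour,
        "weekday": weekday,
        "weekend": int(weekday in (5, 6)),  # Saturday or Sunday
    }

# 2017-10-28 was a Saturday and 2017-10-30 a Monday.
saturday = time_features(datetime(2017, 10, 28, 13))
monday = time_features(datetime(2017, 10, 30, 0))
print(saturday)
print(monday)
```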

In [None]:
## cyclic transformation on hour (hours run 0..23, so the period is 24)
df['hour_sin'] = np.sin(2 * np.pi * df['hour']/24.0)
df['hour_cos'] = np.cos(2 * np.pi * df['hour']/24.0)
## cyclic transformation on date 
df['date_sin'] = -np.sin(2 * np.pi * (df['month']+df['day']/31)/12)
df['date_cos'] = -np.cos(2 * np.pi * (df['month']+df['day']/31)/12)
## cyclic transformation on month
df['month_sin'] = -np.sin(2 * np.pi * df['month']/12.0)
df['month_cos'] = -np.cos(2 * np.pi * df['month']/12.0)
## cyclic transformation on weekday
df['weekday_sin'] = -np.sin(2 * np.pi * (df['weekday']+1)/7.0)
df['weekday_cos'] = -np.cos(2 * np.pi * (df['weekday']+1)/7.0)
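
A cyclic encoding places a periodic value on the unit circle so that the end of one cycle sits next to the start of the next. The sketch below, which uses only the standard library and hypothetical helper names (`cyclic_encode`, `distance`), illustrates why the period should equal the number of distinct values. With period 24, hours 23 and 0 are neighbours but still distinct. With period 23, hour 23 completes a full turn and becomes indistinguishable from hour 0.

```python
import math

def cyclic_encode(value: float, period: float) -> tuple:
    """Map a periodic value onto the unit circle as (sin, cos)."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)

def distance(a: tuple, b: tuple) -> float:
    """Euclidean distance between two encoded points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

# Period 24: hours 23 and 0 are as close to each other as hours 0 and 1.
h0, h1, h23 = (cyclic_encode(h, 24) for h in (0, 1, 23))
gap_23_0 = distance(h23, h0)
gap_0_1 = distance(h0, h1)

# Period 23: hour 23 lands on the same point as hour 0 (up to rounding error).
collision = distance(cyclic_encode(23, 23), cyclic_encode(0, 23))

print(gap_23_0, gap_0_1, collision)
```

Negating both components, as done for the date, month and weekday features, rotates the circle by half a turn and leaves the distances between points unchanged.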

In [None]:
df

In [None]:
#drop unnecessary columns 
df.drop(['Datetime', 'hour'], axis=1, inplace=True)

In [None]:
#Split the data chronologically into train and test. Rows after 2017-10-30 00:00:00 are held out for testing.
df_train = df[df['date'] <= '2017-10-30 00:00:00'].reset_index(drop=True)
df_test = df[df['date'] > '2017-10-30 00:00:00'].reset_index(drop=True)
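
The split is chronological: every training row is at or before the cutoff and every test row is strictly after it, so the two sets cannot overlap. A minimal sketch of the same boundary logic on a made-up hourly series follows. The function `chronological_split` and the values are hypothetical and serve only to show where the cutoff row ends up.

```python
from datetime import datetime, timedelta

def chronological_split(rows, cutoff):
    """Split (timestamp, value) rows: train is <= cutoff, test is > cutoff."""
    train = [r for r in rows if r[0] <= cutoff]
    test = [r for r in rows if r[0] > cutoff]
    return train, test

# Hypothetical hourly series around the cutoff; the values are made up.
start = datetime(2017, 10, 29, 22)
rows = [(start + timedelta(hours=i), 30000.0 + i) for i in range(5)]
cutoff = datetime(2017, 10, 30, 0)

train_rows, test_rows = chronological_split(rows, cutoff)
print(len(train_rows), len(test_rows))
```

The row stamped exactly at the cutoff belongs to the training set, matching the `<=` and `>` comparisons used in the cell above.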

In [None]:
#convert the data into h2o frame
train = h2o.H2OFrame(df_train)
test = h2o.H2OFrame(df_test)

In [None]:
#set x and y
x = train.columns
y = 'PJME_MW'
x.remove(y)

Build the H2OAutoML model and start training. The run is capped at 3500 seconds via `max_runtime_secs`.

In [None]:
aml = H2OAutoML(max_runtime_secs = 3500, seed = 1, project_name = "PJME_MW")
aml.train(x = x, y = y, training_frame = train)

In [None]:
#extract the leaderboard 
lb = aml.leaderboard
lb.head()

In [None]:
# The leader model is stored here
aml.leader

## Prediction with the leader model

In [None]:
#prepare the test data
test_x = test.drop('PJME_MW', axis=1)

In [None]:
#make predictions on the test data
pred = aml.predict(test_x)
pred.head()

In [None]:
#make comparison df
result_comparison = test[['date', 'PJME_MW']]
result_comparison['predictions'] = pred
#convert h2o df into pandas df
result_comparison = h2o.as_list(result_comparison)
result_comparison["date"] = pd.to_datetime(result_comparison["date"],unit='ms')

In [None]:
result_comparison.head(20)
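
Inspecting the first rows gives only a rough impression of the fit. One possible way to summarise the comparison is to compute standard regression errors over the whole test period. The sketch below is a plain-Python illustration under that assumption. `regression_metrics` is a hypothetical helper, and the numbers are invented for the example, not results from this notebook.

```python
import math

def regression_metrics(actual, predicted):
    """Return RMSE, MAE and MAPE (in percent) for two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - a for a, p in zip(actual, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    # MAPE is undefined when an actual value is zero; load in MW is assumed positive.
    mape = 100 * sum(abs(e) / abs(a) for a, e in zip(actual, errors)) / len(errors)
    return {"rmse": rmse, "mae": mae, "mape": mape}

# Made-up numbers for illustration only.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

With the comparison frame from this notebook, the call would presumably be `regression_metrics(result_comparison['PJME_MW'].tolist(), result_comparison['predictions'].tolist())`.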

In [None]:
#save the leader model 
h2o.save_model(aml.leader, path = "./my_h2o_leader_model")

## Thank you 