# Neurel Net Models

Notebook for attempting various neural networks. We will attempt a CNN, RNN and LSTM neural net, all approaches that are fairly popular for time series data. Each neural net will be fit with a single hidden layer of size 64 (approximately the mean of our 103 X variables and 24 y variables) and we will use a dropout layer of 0.5 when to avoid overfitting.

## Problem Statement

Predict electricity prices in Spain for each hour of the upcoming day more accurately than estimates provided by the Spanish transmission agent and operator. 

Use information available during the 2pm-3pm window the previous day during which generators in Spain submit their bids. 

## Contents

### Imports

In [2]:
# General Imports
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

# General modeling imports
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.preprocessing import StandardScaler

# Neural Net imports
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, GRU, Dropout
from tensorflow.keras.preprocessing.sequence import TimeseriesGenerator

In [4]:
df = pd.read_csv('../Data/Analysis/model_data.csv')
visuals = pd.read_csv('../Data/intermediary/energy.csv')

# visuals contains the prices attached to their original hour
# and will be used to visualize and compare our predictions
visuals.set_index(pd.DatetimeIndex(visuals['time']), inplace=True)
visuals = visuals[['price_actual','price_day_ahead']]

### Functions Used

In [5]:
# Function for evaluating the regressions
# and outputting a dataframe of different metrics
# for each hour predicted
def reg_metrics(y_train, y_train_p, y_test, y_test_p, mod):
    test_rmse = np.sqrt(((y_test-y_test_p)**(2)).mean())
    train_r2 = metrics.r2_score(y_train, y_train_p, multioutput='raw_values')
    test_r2 =  metrics.r2_score(y_test, y_test_p, multioutput='raw_values')
    metrics_df = pd.DataFrame(data = zip(test_rmse, train_r2, test_r2),
                              columns=[mod+'test_rmse',mod+'train_r2',mod+'test_r2'])
    return metrics_df

# function to convert predictions into dataframe for plotting
def append_preds(preds, previous_preds, name):
    new_preds = pd.DataFrame(np.ravel(preds),columns=[name], index=previous_preds.index)
    return previous_preds.join(new_preds)

### Prepare Data

In [6]:
# Set up data frame for modeling
# Drop time column
df.drop(columns=['time'], inplace=True)
# set index as date
df.set_index(pd.DatetimeIndex(df['date']), inplace=True)
# sort index
df.sort_index(inplace=True)
# drop hour of day and date column
df.drop(columns=['hour_of_day','date'], inplace=True)

# Get columns for y
y_cols = [col for col in df.columns if col.startswith('t_price')]

# Set X and y
X = df.drop(columns=y_cols)
y = df[y_cols]

# Train test split
X_train, X_test, y_train, y_test = train_test_split(X,y,
                                                    shuffle=False)
# Need validation set as well for neural networks
X_train, X_val, y_train, y_val = train_test_split(X_train,y_train,
                                                  shuffle=False,
                                                  test_size=0.2)

# Scaled data for KNeighbors
ss = StandardScaler()
Z_train = ss.fit_transform(X_train)
Z_val = ss.transform(X_val)
Z_test = ss.transform(X_test)

In [36]:
# Create training sequences
train_sequences = TimeseriesGenerator(Z_train, 
                                      y_train, 
                                      length=7, 
                                      batch_size=16)

# Create val sequences
val_sequences = TimeseriesGenerator(Z_val, 
                                      y_val, 
                                      length=7, 
                                      batch_size=16)

# Create test sequences
test_sequences = TimeseriesGenerator(Z_test, 
                                      y_test, 
                                      length=7, 
                                      batch_size=16)

In [37]:
X_train.shape

(875, 103)

### RNN

In [38]:
# Design RNN
model = Sequential()
model.add(GRU(64, input_shape=(7, 103), 
              return_sequences=True)) 
model.add(GRU(64, return_sequences=False))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(24, activation='linear'))
model.compile(loss='mse', optimizer='adam')

In [39]:
rnn_hist = model.fit(train_sequences,
                     epochs=10,
                     validation_data=val_sequences,
                     verbose=0)

model.save('./rnn.h5')

KeyError: 7