# Recurrent Neural Network

Let's try throwing a neural network at this problem. (This was my ultimate goal all along.)  We'll give input method the day of the week, time of day, day of the year, and temperature.  This first version uses a single recurrent cell, with a linear layer at the end.  This could be enhanced by making deeper networks at both the beginning and end, using a fancier cell (LSTM, GRU).

The networks will be trained on one year's worth of data, and then tested on the remainder.

This desperately needs some regularization (dropout?), as it is overfitting the training data.

## TODO

- start using RNN module.
- fix the graph to be as desired. 
- add and fix dropout.  (multiple graphs - train/infer)
- extend to allow multiple input/outputs for demand/temp.
- visualization for input weights. 

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from util.get_weather_data import convert_isd_to_df, convert_state_isd
from util.EBA_util import remove_na, avg_extremes

%matplotlib inline
%load_ext autoreload
%autoreload 2

import tensorflow as tf
from tensorflow.contrib.layers import fully_connected
from tensorflow.contrib.rnn import BasicRNNCell,LSTMCell

In [2]:
#Extend to multiple temperature series
try:
    gfg
    df_joint=pd.read_csv('data/pdx_joint.txt',
        index_col=0, parse_dates=True)
    print('Read in PDX Frame from file')
except:
    print('Creating PDX DataFrame from scratch')
    air_df = pd.read_csv('data/air_code_df.gz')
    #Just get the weather station data for cities in Oregon.
    df_weather=convert_state_isd(air_df,'OR')
    #Select temperature for Portland, OR
    #msk1=np.array(df_weather['city']=='Portland')
    msk2=np.array(df_weather['state']=='OR')
    df_pdx_weath=df_weather.loc[msk2]
    #find number of unique station city/state combinations
    Nstation = len(df_pdx_weath['city, state'].unique())

    #reshape the single temperature column into Nstation copies.  
    unique_station=df_pdx_weath['city, state'].unique()
    temp_df=pd.DataFrame()
    for station in unique_station:
        colname=str('Temp-'+station)
        temp_df[colname]=df_pdx_weath.loc[df_pdx_weath['city, state']==station,'Temp']

    #get electricity data for Portland General Electric
    df_eba=pd.read_csv('data/EBA_time.gz',index_col=0,parse_dates=True)
    msk=df_eba.columns.str.contains('Portland')
    df_pdx=df_eba.loc[:,msk]
    #select out demand data
    msk1 = df_pdx.columns.str.contains('[Dd]emand') 
    dem=df_pdx.loc[:,msk1]
    #Make a combined Portland Dataframe for demand vs weather.
    df_joint=pd.DataFrame(dem)
    df_joint=df_joint.join(temp_df)
    df_joint = df_joint.rename(columns={df_joint.columns[0]:'Demand',
             df_joint.columns[1]:'Forecast'})
    df_joint.to_csv('data/pdx_joint.txt')
    
dem=df_joint['Demand'].copy()
temp=df_joint.loc[:,df_joint.columns.str.contains('Temp')].copy()
fore=df_joint['Forecast'].copy()


done with Mahlon Sweet Field


done with Salem Municipal Airport/McNary Field


done with Portland International Airport


Creating PDX DataFrame from scratch


In [3]:
#clean up data, remove NA
#remove NA values, and average extreme values down
for y in [temp,dem]:
    if len(y.shape)>1:
        for i in range(y.shape[1]):
            x= y.iloc[:,i]
            x = remove_na(x)
            y.iloc[:,i] = avg_extremes(x)
    else:
        x= y
        x = remove_na(x)
        y = avg_extremes(x)


Number of extreme values 1. Number of zero values 3


Number of NA values 156


Number of extreme values 0. Number of zero values 143


Number of NA values 126


Number of extreme values 0. Number of zero values 138


Number of NA values 181


Number of extreme values 0. Number of zero values 148


Number of NA values 56


In [66]:
def make_temptime_data(temp_mat):
    """make_input_data
    Takes input temperature data matrix (for multiple locations),
    and extends with extra indices for time of day, day of year, day of week, and holiday. 

    Input: temp_mat - pandas series of temperatures a location.  
    Output: in_mat - scaled matrix of temperatures, and scaled times of day and year.
            temp_max - maximum temperature for series (needed to invert transformations?)
            temp_min - minimum temperature
    """
    Tind = temp_mat.index
    Nt=len(Tind)
    hr = Tind.hour.values/(24-1)
    #scale length of year
    dyear = Tind.dayofyear.values/(365-1+Tind.is_leap_year.astype(int))
    dweek = Tind.dayofweek.values/(7-1)
    #scale temperature data to so that max/min correspond to [0,1]  
    temp_max = temp_mat.max(axis=0)
    temp_min = temp_mat.min(axis=0)
    temp_mat = (temp_mat-temp_min)/(temp_max-temp_min)
    in_mat=np.stack([hr,dweek,dyear]).T
    in_mat= np.hstack([temp_mat.values,in_mat])
    return in_mat, temp_max,temp_min

def scale_demand(dem):
    """scale_demand
    Scale demand to be on 0,1 scale.
    Input: demand - series at single location
    Output: dem_scale - scaled array of values.
            dem_max, dem_min - the maximum and minimum values.
    """
    dem_scale = dem.values
    dem_max = np.max(dem_scale)
    dem_min = np.min(dem_scale)
    dem_scale = (dem_scale-dem_min)/(dem_max-dem_min)
    return dem_scale, dem_max,dem_min

#drop data prior to 
temp_mat,tmax,tmin=make_temptime_data(temp[:Ntest)
dem_mat,dmax,dmin=scale_demand(dem)

Nt=len(dem)
Ntest = Nt//2

temp_train = temp_mat[0:Ntest,:]
temp_test = temp_mat[Ntest:,:]
dem_train = dem_mat[0:Ntest]
dem_test = dem_mat[Ntest:]

So strictly speaking, the scaling should be chosen solely from the training data, and then applied to the testing data. 

In [5]:
temp_mat.shape

(20216, 6)

In [6]:
def get_random_batch(X,y,n_batch,seq_len):
    """get_random_batch(Xsig,t,n_batch)   
    Gets multiple random samples for the data.
    Samples generated by 'get_selection' function.
    Makes list of returned entries.
    Then combines together with 'stack' function at the end.

    X - matrix of inputs, (Nt, Ninputs)
    y - vector of desired outputs (Nt)
    n_batch - number of batches
    seq_len - length of sequence to extract in each batch

    Outputs:
    X_batch - random subset of inputs shape (Nbatch,seq_len,Ninputs) 
    y_batch - corresponding subset of outputs (Nbatch,seq_len)
    """
    Nt,Nin = X.shape
    x_list=[]
    y_list=[]
    for i in range(n_batch):
        n0=int(np.random.random()*(Nt-seq_len-1))
        x_sub = X[n0:n0+seq_len]
        y_sub = y[n0:n0+seq_len]
        x_list.append(x_sub)
        y_list.append(y_sub)
    x_batch=np.stack(x_list,axis=0)
    y_batch=np.stack(y_list,axis=0)
    y_batch=y_batch.reshape( [n_batch,seq_len,-1])                    
    return x_batch,y_batch

Xb,yb=get_random_batch(temp_mat,dem_mat,1000,24)


In [7]:
n_steps=24
n_inputs=len(temp.iloc[0])+3
n_neurons=120
n_layers=3
n_outputs=1  #number of stations to predict at that time.
lr=1E-2
np.random.seed(seed=3453)

In [7]:
def make_RNN_cell(n_neurons,fn=tf.nn.relu):
    cell=BasicRNNCell(num_units=n_neurons,activation=fn)
    return cell

In [19]:
#Initial test with code liberally borrowed from ch14 of Geron's 
#"Practical Machine Learning with scikit-learn and Tensorflow"

#Makes a single RNN cell, with a fully connected output layer (with no activation on the output).

print('setting up graphs:Multi-layer RNN')
tf.reset_default_graph()
#inputs:  Nobs, with n_steps, and n_inputs per step
X = tf.placeholder(tf.float32,[None,n_steps,n_inputs],name='X')
#Outputs: n_outputs we want to predict in the future.
y = tf.placeholder(tf.float32,[None,n_steps,n_outputs],name='y')

#define neural network shape
#works:make a list of them.  
# cell=BasicRNNCell(num_units=n_neurons,activation=tf.nn.relu)

#Make a list of cells to pass along.  
cell_list=[]
for i in range(n_layers):
    cell_list.append(make_RNN_cell(n_neurons,tf.nn.relu))

multi_cell=tf.contrib.rnn.MultiRNNCell(cell_list,state_is_tuple=True)
#Note that using [cell]*n_layers did not work since that copies the memory location, rather than making
#a number of independent copies.
rnn_outputs,states=tf.nn.dynamic_rnn(multi_cell,X,dtype=tf.float32)
#this maps the number of hidden units to fewer outputs.
stacked_rnn_outputs = tf.reshape(rnn_outputs,[-1,n_neurons])
stacked_outputs = fully_connected(stacked_rnn_outputs,n_outputs,activation_fn=None)
outputs=tf.reshape(stacked_outputs,[-1,n_steps,n_outputs])

#define loss (mean-square-error)
loss = tf.reduce_mean(tf.square(outputs-y))
#define optimization function.
optimizer=tf.train.AdamOptimizer(learning_rate=lr)
training_op=optimizer.minimize(loss)
init=tf.global_variables_initializer()

saver = tf.train.Saver()
#Try adding everything by name to a collection to save and restore later
tf.add_to_collection('X',X)
tf.add_to_collection('y',y)
tf.add_to_collection('loss',loss)
tf.add_to_collection('pred',outputs)
tf.add_to_collection('train',training_op)

#compute number correct.
print('Loading data')
n_iter=1000
n_batch=100
run_network=True

if (run_network==True):
    print('Running this thang')
    with tf.Session() as sess:
        init.run()
        for iteration in range(n_iter):
            #select random starting point. 
            X_batch,y_batch=get_random_batch(
                            temp_train, dem_train, n_batch, n_steps)

            sess.run(training_op, feed_dict={X: X_batch, y:y_batch})
            if iteration%50 ==0:
                mse =loss.eval(feed_dict={X:X_batch,y:y_batch})
                print("MSE on batch ",iteration,':\t',mse)
                #save model
                saver.save(sess, "./models/pdx_RNN_model",
                           write_meta_graph=True)

MSE on batch  950 :	 0.00134905


MSE on batch  900 :	 0.00153049


MSE on batch  850 :	 0.00146475


MSE on batch  800 :	 0.00187774


MSE on batch  750 :	 0.00162251


MSE on batch  700 :	 0.00157671


MSE on batch  650 :	 0.00153585


MSE on batch  600 :	 0.00159368


MSE on batch  550 :	 0.00179491


MSE on batch  500 :	 0.00205737


MSE on batch  450 :	 0.00225957


MSE on batch  400 :	 0.00255632


MSE on batch  350 :	 0.0033832


MSE on batch  300 :	 0.00287184


MSE on batch  250 :	 0.00393118


MSE on batch  200 :	 0.00471667


MSE on batch  150 :	 0.00831685


MSE on batch  100 :	 0.0122283


MSE on batch  50 :	 0.0184005


MSE on batch  0 :	 37.4075


Loading data
Running this thang


setting up graphs:Multi-layer RNN


So multiple tanhs are bad.  A couple ReLU layers seem to work well, but do lead to negative predictions.  Note that in comparisons that the early 2015 data is pretty flaky (like the forecasts are zero, and I had to fix multiple issues in the demand data).

In [8]:
def model_predict_whole(Xin,path_str="pdx_RNN_model"):
    """model_predict_whole(tstart)
    Retrieve the outputs of the network for all values of the inputs 
    """
    Nt,Nin=Xin.shape
    nmax = int(Nt/n_steps)
    ytot = np.zeros((Nt,1))
    #Note that loading/saving graph is not properly implemented yet.    
    #reset graph, and reload saved graph
    tf.reset_default_graph()
    model_path = "./models/"+path_str    
    saver = tf.train.import_meta_graph(model_path+".meta")
    #saver=tf.train.import_meta_graph(full_model_name+'.meta')
    #restore graph structure
    X=tf.get_collection('X')[0]
    y=tf.get_collection('y')[0]
    outputs=tf.get_collection('pred')[0]
    train_op=tf.get_collection('train_op')[0]
    loss=tf.get_collection('loss')[0]
    #restores weights etc.
    #saver.restore(sess,full_model_name)
    
    with tf.Session() as sess:

        #restore variables
        saver.restore(sess,model_path)
        for i in range(nmax-1):
            n0=n_steps*i
            x_sub = Xin[n0:n0+n_steps,:]
            x_sub = x_sub.reshape(-1,n_steps,Nin)
            y_pred=sess.run(outputs,feed_dict={X:x_sub})
            #nn_pred=predict_on_batch(sess,X_batch)            
            ytot[n0:n0+n_steps]=y_pred
    return ytot

In [9]:
def plot_whole_sample_fit(X,y,ntest,n_steps,path_str="pdx_RNN_model"):
    """plot_whole_sample_fit

    Plot ALL of the predictions of the trained model
    on a 'test' set with different noise, and longer
    times.  Concatenates the predicted results together.  
    """
    #pull in the inputs, and predictions
    Nt, Nin = X.shape
    ytot=model_predict_whole(X,path_str)
    plt.figure()
    #now plot against the test sets defined earlier
    plt.plot(np.arange(0,ntest),X[:ntest,0],'b',label='Training')
    plt.plot(np.arange(ntest,Nt), X[ntest:,0],'g',label='Test')
    plt.plot(np.arange(Nt),ytot,'r',label='Predicted')
    plt.plot(np.arange(Nt),dem_mat,label='Real')
    plt.legend(loc='right')
    plt.show()
    return ytot

In [10]:
#n0,x_sub,y_pred=toy_predict(2.5)
ytot=plot_whole_sample_fit(temp_mat,dem_mat,Ntest,n_steps,'pdx_RNN_model')

<matplotlib.figure.Figure at 0x7efbb0530ef0>

INFO:tensorflow:Restoring parameters from ./models/pdx_RNN_model


In [12]:
#convert the RNN output to a pandas time-series
pred=pd.Series(((dmax-dmin)*ytot+dmin).reshape(-1),index=dem.index)

In [13]:
def rmse(x,y):
    z = np.sqrt(np.sum((x-y)*(x-y))/len(x))
    return z

def mape(x,y):
    z = np.mean(np.abs((1-x/y)))
    return z

plt.plot(dem['2015-11':],pred['2015-11':],'.')
plt.xlabel('Actual Demand')
plt.ylabel('RNN Prediction')
plt.show()

<matplotlib.figure.Figure at 0x7efbae2fe0f0>

So let's compute some actual figures here: What was the mean error over the training and test periods?

In [14]:
nt = len(ytot)//2
fore_train_rmse=rmse(fore[:nt],dem[:nt])
fore_test_rmse=rmse(fore[nt:],dem[nt:])
pred_train_rmse=rmse(pred[:nt],dem[:nt])
pred_test_rmse=rmse(pred[nt:],dem[nt:])

print("Forecast RMSE in training/test      : {}, {}".format(fore_train_rmse,fore_test_rmse))
print("RNN Prediction RMSE in training/test: {}, {}".format(pred_train_rmse,pred_test_rmse))

Forecast RMSE in training/test      : 174.69476178243696, 84.06011120151884
RNN Prediction RMSE in training/test: 111.7127825772051, 134.4575370601223


Let's also check the mean absolute percentage error, and also compare against a persistence forecast.

In [15]:
fore_train_mape=mape(fore[:nt],dem[:nt])
fore_test_mape=mape(fore[nt:],dem[nt:])

pers_train_mape=mape(dem[:nt-24].values,dem[24:nt].values)
pers_test_mape=mape(dem[nt:-24].values,dem[nt+24:].values)

pred_train_mape=mape(pred[:nt],dem[:nt])
pred_test_mape=mape(pred[nt:],dem[nt:])

print("Forecast MAPE in training/test      : {}, {}".format(fore_train_mape,fore_test_mape))
print("Persistence MAPE in training/test   : {}, {}".format(pers_train_mape,pers_test_mape))
print("RNN Prediction MAPE in training/test: {}, {}".format(pred_train_mape,pred_test_mape))

Forecast MAPE in training/test      : 0.03083559781902836, 0.025407175204897305
Persistence MAPE in training/test   : 0.05668131298764103, 0.05417891849965209
RNN Prediction MAPE in training/test: 0.034385709158925734, 0.03882750694951655


So this simple RNN does worse than the actual forecast, but does out perform persistence.  Well, that's at least something.
Obviously, this can be greatly improved.  The above is a simple toy model, one input station, one output series for the same set of time.
We can play with other architectures, activations, and using more data.


In [14]:
plt.figure(figsize=(10,6))
date_slice=slice('2016-12-20','2017-01-02')
plt.plot(pred[date_slice],label='pred')
plt.plot(dem[date_slice],label='demand')
plt.plot(fore[date_slice],label='fore')
plt.legend(loc='right')
plt.show()

<matplotlib.figure.Figure at 0x7efbac42c080>

In [31]:
pred.tail()

2017-10-20 03:00:00    1492.0
2017-10-20 04:00:00    1492.0
2017-10-20 05:00:00    1492.0
2017-10-20 06:00:00    1492.0
2017-10-20 07:00:00    1492.0
dtype: float64

In [15]:
plt.figure(figsize=(10,6))
date_slice=slice('2017-06-01','2017-08-01')
plt.plot(pred[date_slice]/dem[date_slice]-1,label='pred err')
plt.plot(fore[date_slice]/dem[date_slice]-1,label='fore err')
plt.ylabel('Percentage Error')
plt.legend(loc='right')
plt.show()

<matplotlib.figure.Figure at 0x7efbac4a9b38>

So looking at the percentage errors, this model (which currently lacks knowledge of holidays) is messing up on Thanksgiving.  Also the model seems to make opposite errors to the forecast model.  It's probably worth checking that the distribution of errors.  Eyeballing the curves shows that the errors are lowest early in the morning, and highest at midday.  The error signal probably has a significant daily frequency component.

Right now this is a 3-layer RNN.  We can extend it to include different cell types, fiddle with the network size, and maybe a different layout.
I'm going to retry this in a more modular approach (and for a more general set of code), with multiple inputs, differing sizes, dropout, more efficient loading.

In [17]:
#from EBA_RNN import EBA_RNN
from tf_rnn import recurrent_NN, RNNConfig

# Retrying with module

This is going to use similar code in a more OOP framework, and ideally a more powerful, flexible model that can handle time-series of varying lengths, and varying number of input/output variables.

Why do that here?  Well, the toy model only treats PGE, and uses the weather from Portland.  Including weather and energy usage from other nearby places may be a better predictor for each ISO's demand. 

Also, this could then be used to model likely demand from elsewhere.

Also: why bother with building a separate object? This is all pretty small.  Well, I found that particularly with a Jupyter Notebook it was
easy to lose track of what the status of the variables and graphs was.
This neatly encapsulates all of that.  (And is easier to modify later, and better programming practice.)


In [94]:
Nstation=3
NISO=1
Nextra=2
Ninputs=Nstation+NISO+Nextra
%pdb off
rnn_conf=RNNConfig(Ninputs=Ninputs,Nepoch=5000,Nhidden=20,Nprint=100)
RNN=recurrent_NN(rnn_conf)

Automatic pdb calling has been turned OFF


In [95]:
dem_train2=dem_train.reshape((len(dem_train),1))
RNN.train_graph(temp_train,dem_train2,'models/pdx_test')

<matplotlib.figure.Figure at 0x7efb5d895390>

iter #5000. Current MSE:1.8405640125274658
Total Time taken:219.41894936561584




In [96]:
%pdb off
ytot=RNN.predict_all(temp_mat,'models/pdx_test')

0 2400
2400 2400
4800 2400
7200 2400
9600 2400
12000 2400
14400 2400
16800 2400
19200 1008
20208 0
No entries left.  Breaking loop


INFO:tensorflow:Restoring parameters from models/pdx_test-5000


Automatic pdb calling has been turned OFF


In [97]:
pred=pd.Series(((dmax-dmin)*ytot+dmin).reshape(-1),index=dem.index)
#pred=pd.Series((ytot).reshape(-1),index=dem.index)

In [98]:
plt.figure(figsize=(10,6))
plt.plot(pred)
plt.plot(dem)
plt.show()

<matplotlib.figure.Figure at 0x7efbac42ca58>