# Modeling Tesla Throughput Ratio with LSTMs and TensorFlow

In this tutorial, we will build a Long Short-Term Memory (LSTM) network to predict the Tesla Throughput Ratio from wind farm output and grid frequency measurements.

## Setup

We will use the following libraries for our analysis:

* numpy - numerical computing library used to work with our data
* pandas - data analysis library used to read in our data from csv
* tensorflow - deep learning framework used for modeling

In [27]:
import numpy as np
import pandas as pd
import tensorflow as tf
import utils as utl

## Processing Data

#### Read and View Data

First we read in our data using pandas and pull out the wind farm, frequency and throughput ratio columns. The input variables are shifted and scaled, and the throughput ratio is converted to a percentage.

In [28]:
# read data from csv file
data = pd.read_csv("df_lower_alldata.csv").fillna(0)
df = data.iloc[:]
# get variables and results
variables = (df.iloc[:,2:-1]+1.5)*10
results = df.iloc[:,-1]*100

In [29]:
variables.describe()

Unnamed: 0,BLUFF1,CLEMGPWF,HALLWF1,HALLWF2,HDWF1,HDWF2,HDWF3,LKBONNY2,LKBONNY3,NBHWF1,SNOWNTH1,SNOWSTH1,SNOWTWN1,WATERLWF,Unnamed: 15,Average Frequency (Hz),Median Frequency (Hz),Cummulative Frequency (Hz),Difference Frequency (Hz)
count,20778.0,20778.0,20778.0,20778.0,20778.0,20778.0,20778.0,20778.0,20778.0,20778.0,20778.0,20778.0,20778.0,20778.0,20778.0,20778.0,20778.0,20778.0,20778.0
mean,207.036509,246.867874,419.69118,322.101596,495.142052,461.794423,493.784933,479.416315,130.7746,603.763897,574.253916,562.652758,449.434224,501.327606,22.972512,515.107004,515.09851,38.828768,14.98523
std,170.720842,182.954921,290.909831,222.130729,320.420734,318.936854,333.595021,442.046316,110.660329,413.383635,440.177101,392.120537,322.323606,390.690506,29.590132,0.391277,0.426374,2.313478,0.824185
min,13.0,8.5168,6.6,9.0,4.0,3.0,4.0,0.6634,11.6332,3.0,15.0,15.0,5.7118,3.0,15.0,513.422333,513.100014,30.599823,12.099991
25%,42.0,67.5856,145.8375,114.1,188.125,161.0,178.0,111.52405,40.01,209.0,166.0,188.55,120.963625,147.01,15.0,514.870333,514.900017,37.2997,14.400024
50%,161.905,227.9,407.25,311.58,502.0,437.0,480.595,328.55795,92.5926,612.985,493.0,533.0,456.3842,408.0,15.0,515.118,515.099983,38.70018,15.0
75%,377.751075,419.8368,697.8,528.9,779.0,748.7489,792.34265,786.867225,205.3531,983.8189,1003.0,918.54345,746.3488,875.0,15.0,515.355334,515.400009,40.200386,15.499992
max,544.62,579.15,921.7,718.6,1023.0,1038.0,1114.0,1592.45,401.8466,1325.59,1448.0,1265.17,981.7012,1312.06,194.0,516.678665,516.699982,59.40033,18.400002


In [30]:
results.describe()

count    20778.000000
mean        27.832102
std         23.087233
min          0.000100
25%          8.906000
50%         22.387350
75%         41.494775
max        100.000000
Name: Lower Throughput Ratio, dtype: float64

#### Train, Test, Validation Split

The last thing we do is split our data into training, validation and test sets and observe the size of each set.

In [31]:
train_x, val_x, test_x, train_y, val_y, test_y = utl.train_val_test_split(variables, results, split_frac=0.70)

print("Data Set Size")
print("Train set: \t\t{}".format(train_x.shape), 
      "\nValidation set: \t{}".format(val_x.shape),
      "\nTest set: \t\t{}".format(test_x.shape))

Data Set Size
Train set: 		(14544, 19) 
Validation set: 	(3117, 19) 
Test set: 		(3117, 19)
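
The utils module is not shown in this tutorial, so the exact behaviour of utl.train_val_test_split is unknown. As a minimal sketch, assuming a contiguous, unshuffled split in which split_frac of the rows go to training and the remainder is halved between validation and test, the hypothetical function below reproduces the set sizes printed above. The name train_val_test_split_sketch and the zero-filled arrays are illustrative only.

```python
import numpy as np

def train_val_test_split_sketch(x, y, split_frac=0.70):
    """Hypothetical stand-in for utl.train_val_test_split (assumed contiguous, no shuffling)."""
    n_train = int(len(x) * split_frac)           # rows used for training
    train_x, rest_x = x[:n_train], x[n_train:]
    train_y, rest_y = y[:n_train], y[n_train:]
    n_val = len(rest_x) // 2                     # remainder is halved
    val_x, test_x = rest_x[:n_val], rest_x[n_val:]
    val_y, test_y = rest_y[:n_val], rest_y[n_val:]
    return train_x, val_x, test_x, train_y, val_y, test_y

# Placeholder arrays with the same shape as the tutorial's data
x = np.zeros((20778, 19))
y = np.zeros(20778)
train_x, val_x, test_x, train_y, val_y, test_y = train_val_test_split_sketch(x, y)
print(train_x.shape, val_x.shape, test_x.shape)
```

If the real helper shuffles the rows first, the sizes would be the same but the contents of each set would differ.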


## Building and Training our LSTM Network

In this section we will define a number of functions that will construct the items in our network. We will then use these functions to build and train our network.

#### Model Inputs

Here we simply define a function to build TensorFlow Placeholders for our data. 

In [32]:
def model_inputs():
    """
    Create the model inputs
    """
    inputs_ = tf.placeholder(tf.int32, [None, None], name='inputs')
    labels_ = tf.placeholder(tf.int32, [None, None], name='labels')
    keep_prob_ = tf.placeholder(tf.float32, name='keep_prob')
    
    return inputs_, labels_, keep_prob_

#### Embedding Layer

In TensorFlow, the embedding is represented as a matrix of size vocabulary size x embedding size, and its weights are learned during training. Each integer input value selects one row of this matrix.

In [33]:
def build_embedding_layer(inputs_, vocab_size, embed_size):
    """
    Create the embedding layer
    """
    embedding = tf.Variable(tf.random_uniform((vocab_size, embed_size), -1, 1))
    embed = tf.nn.embedding_lookup(embedding, inputs_)
    
    return embed
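
To make the lookup concrete, here is a small NumPy sketch of what the embedding layer computes. It assumes the scaled float features are truncated to integers when they are fed to the int32 placeholder; the two feature rows are made-up values, not rows from the data set.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_size = 3200, 256   # values used later in this tutorial

# Same initialisation range as tf.random_uniform((vocab_size, embed_size), -1, 1)
embedding = rng.uniform(-1.0, 1.0, size=(vocab_size, embed_size))

# Two hypothetical rows of scaled features, shape (batch, n_features)
scaled = np.array([[207.04, 246.87, 515.11],
                   [13.0, 8.52, 513.42]])
ids = scaled.astype(np.int32)        # assumed cast: the fractional part is dropped
embed = embedding[ids]               # row lookup, as tf.nn.embedding_lookup does

print(ids)
print(embed.shape)                   # (batch, n_features, embed_size)
```

Every index must be smaller than vocab_size, which is why the vocabulary size has to exceed the largest scaled input value.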

#### LSTM Layers

TensorFlow makes it extremely easy to build LSTM Layers and stack them on top of each other. We represent each LSTM layer as a BasicLSTMCell and keep these in a list to stack them together later. Here we will define a list with our LSTM layer sizes and the number of layers. 

We then take each of these LSTM layers and wrap them in a Dropout Layer. Dropout is a regularization technique used in Neural Networks in which any individual node has a probability of “dropping out” of the network during a given iteration of learning. This makes the model more generalizable by ensuring that it is not too dependent on any given node.

Finally, we stack these layers using a MultiRNNCell, generate a zero initial state and connect our stacked LSTM layer to our word embedding inputs using dynamic_rnn. Here we track the output and the final state of the LSTM cell, which we will need to pass between mini-batches during training.

In [34]:
def build_lstm_layers(lstm_sizes, embed, keep_prob_, batch_size):
    """
    Create the LSTM layers
    """
    lstms = [tf.contrib.rnn.BasicLSTMCell(size) for size in lstm_sizes]
    # Add dropout to the cell
    drops = [tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob_) for lstm in lstms]
    # Stack up multiple LSTM layers, for deep learning
    cell = tf.contrib.rnn.MultiRNNCell(drops)
    # Getting an initial state of all zeros
    initial_state = cell.zero_state(batch_size, tf.float32)
    
    lstm_outputs, final_state = tf.nn.dynamic_rnn(cell, embed, initial_state=initial_state)
    
    return initial_state, lstm_outputs, cell, final_state
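
The dropout step described above can be sketched in NumPy as follows. This is an illustration of inverted dropout (surviving units are rescaled by 1 / keep_prob), not the DropoutWrapper implementation itself; the function name and the all-ones activations are assumptions made for the example.

```python
import numpy as np

def dropout(activations, keep_prob, rng):
    """Inverted dropout: keep each unit with probability keep_prob and rescale the survivors."""
    if keep_prob >= 1.0:
        return activations                      # evaluation: nothing is dropped
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
h = np.ones((4, 8))                             # hypothetical LSTM outputs
train_out = dropout(h, keep_prob=0.5, rng=rng)  # entries are either 0.0 or 2.0
eval_out = dropout(h, keep_prob=1.0, rng=rng)   # unchanged
print(train_out)
```

This mirrors how the training loop feeds keep_prob during training and a keep probability of 1 during validation and testing.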

#### Loss Function and Optimizer

First, we get our predictions by passing the final output of the LSTM layers through a TensorFlow fully connected layer with a linear activation function. We only need the output at the last step to make predictions, so we pull it out using the [:, -1] indexing on our LSTM outputs. We then pass these predictions to our mean squared error loss function and use the Adadelta Optimizer to minimize the loss.

In [35]:
def build_cost_fn_and_opt(lstm_outputs, labels_, learning_rate):
    """
    Create the Loss function and Optimizer
    """
    predictions = tf.contrib.layers.fully_connected(lstm_outputs[:, -1], 1, activation_fn=tf.keras.activations.linear)
    loss = tf.losses.mean_squared_error(labels_, predictions)
    optimizer = tf.train.AdadeltaOptimizer(learning_rate).minimize(loss)
    return predictions, loss, optimizer
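
As a rough NumPy illustration of the shapes involved, the sketch below slices the last step from a batch of LSTM outputs, applies a single linear layer and computes the mean squared error. The random arrays and the weight initialisation are assumptions for the example and do not match how fully_connected initialises its weights.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, steps, hidden = 4, 19, 128               # 19 input columns, last LSTM layer of size 128
lstm_outputs = rng.normal(size=(batch, steps, hidden))
labels = rng.uniform(0, 100, size=(batch, 1))   # hypothetical targets in percent

last_step = lstm_outputs[:, -1]                 # shape (batch, hidden)
w = rng.normal(scale=0.01, size=(hidden, 1))    # weights of the linear output layer
b = np.zeros(1)
predictions = last_step @ w + b                 # linear activation: no non-linearity applied
loss = np.mean((labels - predictions) ** 2)     # mean squared error

print(last_step.shape, predictions.shape, loss)
```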

#### Accuracy

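Finally, we define a metric for assessing model performance during training. Despite its name, the accuracy value computed here is the mean squared error divided by 10000, so it is an error measure. Because the target lies between 0 and 100, it falls between 0 and 1 for predictions in that range, and the model is more accurate as the value approaches 0.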

In [36]:
def build_accuracy(predictions, labels_):
    labels_=tf.to_float(labels_, name='ToFloat')
    diff=tf.losses.mean_squared_error(labels_, predictions)
    accuracy=diff/10000
    return accuracy
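
A short NumPy example shows how this metric behaves at its extremes and how to read a value such as 0.054 as a root-mean-squared error. The function name scaled_mse and the toy labels are illustrative only.

```python
import numpy as np

def scaled_mse(labels, predictions):
    """NumPy equivalent of build_accuracy: mean squared error divided by 10000."""
    labels = np.asarray(labels, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    return np.mean((labels - predictions) ** 2) / 10000.0

labels = np.array([[0.0], [100.0]])
perfect = scaled_mse(labels, labels)                     # every prediction exact
worst = scaled_mse(labels, np.array([[100.0], [0.0]]))   # every prediction off by 100

# Convert a metric value back to a root-mean-squared error in percentage points
rmse = np.sqrt(0.054 * 10000.0)
print(perfect, worst, round(rmse, 1))
```

Multiplying the metric by 10000 and taking the square root gives the typical prediction error in the same units as the target.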

#### Training

We are finally ready to build and train our LSTM Network! First, we call each of the functions we have defined to construct the network. Then we define a Saver so that we can write our model to disk and load it for future use. Finally, we open a TensorFlow Session to train the model over a predefined number of epochs using mini-batches. At the end of each epoch we print the loss, training accuracy and validation accuracy to monitor the results as we train.

In [37]:
def build_and_train_network(lstm_sizes, vocab_size, embed_size, epochs, batch_size,
                            learning_rate, keep_prob, train_x, val_x, train_y, val_y):
    
    inputs_, labels_, keep_prob_ = model_inputs()
    embed = build_embedding_layer(inputs_, vocab_size, embed_size)
    initial_state, lstm_outputs, lstm_cell, final_state = build_lstm_layers(lstm_sizes, embed, keep_prob_, batch_size)
    predictions, loss, optimizer = build_cost_fn_and_opt(lstm_outputs, labels_, learning_rate)
    accuracy = build_accuracy(predictions, labels_)
    
    saver = tf.train.Saver()
    
    with tf.Session() as sess:
        
        sess.run(tf.global_variables_initializer())
        n_batches = len(train_x)//batch_size
        for e in range(epochs):
            state = sess.run(initial_state)
            
            train_acc = []
            for ii, (x, y) in enumerate(utl.get_batches(train_x, train_y, batch_size), 1):
                feed = {inputs_: x,
                        labels_: y[:, None],
                        keep_prob_: keep_prob,
                        initial_state: state}
                loss_, state, _,  batch_acc = sess.run([loss, final_state, optimizer, accuracy], feed_dict=feed)
                train_acc.append(batch_acc)
                
                # ii starts at 1, so validation runs once batch n_batches has been processed
                if ii % n_batches == 0:
                    
                    val_acc = []
                    val_state = sess.run(lstm_cell.zero_state(batch_size, tf.float32))
                    for xx, yy in utl.get_batches(val_x, val_y, batch_size):
                        feed = {inputs_: xx,
                                labels_: yy[:, None],
                                keep_prob_: 1,
                                initial_state: val_state}
                        val_batch_acc, val_state = sess.run([accuracy, final_state], feed_dict=feed)
                        val_acc.append(val_batch_acc)
                    
                    print("Epoch: {}/{}...".format(e+1, epochs),
                          "Batch: {}/{}...".format(ii, n_batches),
                          "Train Loss: {:.3f}...".format(loss_),
                          "Train Accruacy: {:.3f}...".format(np.mean(train_acc)*(np.mean(train_acc)>0)),
                          "Val Accuracy: {:.3f}".format(np.mean(val_acc)))
    
        saver.save(sess, "checkpoints/sentiment.ckpt")
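
The training loop relies on utl.get_batches, which is not shown here. Because the initial LSTM state is created for a fixed batch_size, every batch fed to the network must contain exactly that many rows. A minimal sketch of such a generator, assuming it simply discards the incomplete final batch, might look like this (get_batches_sketch is a hypothetical name):

```python
import numpy as np

def get_batches_sketch(x, y, batch_size):
    """Hypothetical stand-in for utl.get_batches: yield full batches only."""
    n_batches = len(x) // batch_size
    x, y = x[:n_batches * batch_size], y[:n_batches * batch_size]   # drop the remainder
    for start in range(0, len(x), batch_size):
        yield x[start:start + batch_size], y[start:start + batch_size]

# Placeholder arrays with the same shape as the training set
train_x = np.zeros((14544, 19))
train_y = np.zeros(14544)
batches = list(get_batches_sketch(train_x, train_y, batch_size=256))

print(len(batches))                     # number of full batches per epoch
print(batches[0][0].shape)              # inputs for one batch
print(batches[0][1][:, None].shape)     # labels reshaped to a column, as in the feed dict
```

Under this assumption, 14544 // 256 = 56 batches are used per epoch, which agrees with the "Batch: 56/56" lines in the training log, and the last 208 training rows are left out of each epoch.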

Next we define our model hyperparameters. We will build a 2-layer LSTM network with hidden layer sizes of 64 and 128 respectively. We will use a vocabulary size of 3200 and an embedding size of 256, and train over 20 epochs with mini-batches of size 256. We will use an initial learning rate of 0.1, though our Adadelta Optimizer will adapt the step size over time, and a keep probability of 0.5.

In [38]:
# Define Inputs and Hyperparameters
lstm_sizes = [64, 128]
vocab_size = 3200
embed_size = 256
epochs = 20
batch_size = 256
learning_rate = 0.1
keep_prob = 0.5
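
To illustrate how Adadelta adapts its step size, here is a NumPy sketch of the update rule applied to a toy one-dimensional problem. The values rho=0.95 and eps=1e-8 are assumed to match the defaults of tf.train.AdadeltaOptimizer, and the quadratic objective is made up for the example; this is not the TensorFlow implementation.

```python
import numpy as np

def adadelta_step(x, grad, acc_grad, acc_delta, lr=0.1, rho=0.95, eps=1e-8):
    """One Adadelta update; acc_grad and acc_delta are running averages of squared values."""
    acc_grad = rho * acc_grad + (1 - rho) * grad ** 2
    delta = np.sqrt(acc_delta + eps) / np.sqrt(acc_grad + eps) * grad
    acc_delta = rho * acc_delta + (1 - rho) * delta ** 2
    return x - lr * delta, acc_grad, acc_delta

# Toy objective f(x) = (x - 3)^2, starting from x = 0
x, acc_grad, acc_delta = 0.0, 0.0, 0.0
for _ in range(200):
    grad = 2.0 * (x - 3.0)
    x, acc_grad, acc_delta = adadelta_step(x, grad, acc_grad, acc_delta)

print(x)   # has moved from 0 towards the minimum at 3
```

The effective step depends on the ratio of the two running averages rather than on the learning rate alone, so learning_rate acts as an additional multiplier on the adaptive step.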

And now we train!

In [39]:
with tf.Graph().as_default():
    build_and_train_network(lstm_sizes, vocab_size, embed_size, epochs, batch_size,
                            learning_rate, keep_prob, train_x, val_x, train_y, val_y)

Epoch: 1/20... Batch: 56/56... Train Loss: 763.156... Train Accruacy: 0.119... Val Accuracy: 0.086
Epoch: 2/20... Batch: 56/56... Train Loss: 601.740... Train Accruacy: 0.075... Val Accuracy: 0.068
Epoch: 3/20... Batch: 56/56... Train Loss: 565.046... Train Accruacy: 0.065... Val Accuracy: 0.063
Epoch: 4/20... Batch: 56/56... Train Loss: 523.104... Train Accruacy: 0.061... Val Accuracy: 0.059
Epoch: 5/20... Batch: 56/56... Train Loss: 508.472... Train Accruacy: 0.059... Val Accuracy: 0.057
Epoch: 6/20... Batch: 56/56... Train Loss: 502.852... Train Accruacy: 0.057... Val Accuracy: 0.056
Epoch: 7/20... Batch: 56/56... Train Loss: 502.510... Train Accruacy: 0.056... Val Accuracy: 0.055
Epoch: 8/20... Batch: 56/56... Train Loss: 501.193... Train Accruacy: 0.056... Val Accuracy: 0.055
Epoch: 9/20... Batch: 56/56... Train Loss: 498.781... Train Accruacy: 0.055... Val Accuracy: 0.054
Epoch: 10/20... Batch: 56/56... Train Loss: 493.232... Train Accruacy: 0.055... Val Accuracy: 0.054
Epoch: 11

## Testing our Network

The last thing we want to do is check the model accuracy on our testing data to make sure it is in line with expectations. We build the Computational Graph just like we did before; however, instead of training, we now restore our saved model from our checkpoint directory and then run our test data through the model.

In [40]:
def test_network(model_dir, batch_size, test_x, test_y):
    
    inputs_, labels_, keep_prob_ = model_inputs()
    embed = build_embedding_layer(inputs_, vocab_size, embed_size)
    initial_state, lstm_outputs, lstm_cell, final_state = build_lstm_layers(lstm_sizes, embed, keep_prob_, batch_size)
    predictions, loss, optimizer = build_cost_fn_and_opt(lstm_outputs, labels_, learning_rate)
    accuracy = build_accuracy(predictions, labels_)
    
    saver = tf.train.Saver()
    
    test_acc = []
    with tf.Session() as sess:
        saver.restore(sess, tf.train.latest_checkpoint(model_dir))
        test_state = sess.run(lstm_cell.zero_state(batch_size, tf.float32))
        for ii, (x, y) in enumerate(utl.get_batches(test_x, test_y, batch_size), 1):
            feed = {inputs_: x,
                    labels_: y[:, None],
                    keep_prob_: 1,
                    initial_state: test_state}
            batch_acc, test_state = sess.run([accuracy, final_state], feed_dict=feed)
            test_acc.append(batch_acc)
        print("Test Accuracy: {:.3f}".format(np.mean(test_acc)))

In [41]:
with tf.Graph().as_default():
    test_network('checkpoints', batch_size, test_x, test_y)

INFO:tensorflow:Restoring parameters from checkpoints\sentiment.ckpt
Test Accuracy: 0.054
