# Predicting the Dyssynchrony Index
We will be tackling a sequence classification problem with recurrent neural networks. We believe that the vectorcardiogram (VCG) is a good predictor of the dyssynchrony index, so we will treat the VCG as a sequence of coordinates and feed it into an LSTM network.

## Data Dimensions

### *Input*
Our input will be (simulated) vectorcardiograms generated by Chris Villongco using CMRG's Continuity.
Our dataset consists of 608 simulated VCGs. These simulations have varying parameters, such as differing stimulus sites and conduction velocities, but are based on the same patient (patient 2, BiV2). We limit our dataset to only 608 examples to reduce computation time, as we are more interested in a proof of concept than in high accuracy (of course, we will aim for higher accuracy later).

Because we view the VCG as a sequence, each timestep can be viewed as one element in the sequence. For every time ```t```, we are given three inputs: the ```(x, y, z)``` coordinates of the vector head.

### *Output*
We will be classifying each VCG based on the corresponding dyssynchrony index from that same simulation. The dyssynchrony index is a scalar value that theoretically ranges from 0 to 1, but we will only be concerned with the range of 0.5 to 1. This range will be further divided into 5 regular intervals, which will serve as our "classes". Specifically, these intervals will be ```[0.5, 0.6), [0.6, 0.7), [0.7, 0.8), [0.8, 0.9)```, and ```[0.9, 1.0]```. Simulations with dyssynchrony indices under 0.5 will be considered faulty and will be placed in the first interval, ```[0.5, 0.6)```. Also, note that simulations with dyssynchrony indices of 1.0 will be placed in the last interval, ```[0.9, 1.0]```. These intervals will be labeled as follows (intervals are 0-indexed):
* ```0: [0.5, 0.6)```
* ```1: [0.6, 0.7)```
* ```2: [0.7, 0.8)```
* ```3: [0.8, 0.9)```
* ```4: [0.9, 1.0]```
  
For example, a VCG sequence with a corresponding dyssynchrony index of 0.78 will be placed in class ```"2"``` since it falls in the range of 0.7 and 0.8, the third interval.

Thus, the output of the neural network will be a probability distribution that gives, for each of the five intervals, how likely it is that a simulation with the given VCG has a dyssynchrony index in that interval.
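The binning rule described above can be sketched in a few lines. This is a minimal illustration, not the code that was used to build the target file; the function name `dyssynchrony_class` and the use of `bisect` are assumptions made for this example.

```python
from bisect import bisect_right

# Upper edges of classes 0-3; class 4 covers everything from 0.9 upward.
CLASS_EDGES = [0.6, 0.7, 0.8, 0.9]


def dyssynchrony_class(index):
    """Map a dyssynchrony index to its 0-indexed class label.

    Values below 0.5 land in class 0 and a value of exactly 1.0 lands in
    class 4, matching the rules described above.
    """
    return bisect_right(CLASS_EDGES, index)


print(dyssynchrony_class(0.78))  # 2
```

Comparing against explicit edges avoids the rounding problems that can appear when the class is computed arithmetically from the index.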

## Dataset Wrapper
We've created a class that provides a basic interface for handling the dataset. Specifically, the wrapper will do the following: 
* Read in the dataset from three specified ```.npy``` files (VCG, VCG lengths, target class)
* Split the dataset into training, validation, and testing sets (the set sizes will be fixed for convenience)
* Provide a ```next_batch``` function that will return a batch of specified size for a given set.
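
To give a feel for the batching behaviour, here is a hypothetical stand-in for one split of the dataset. It is not the wrapper's actual implementation: the class name `SplitSketch`, the wrap-around behaviour, and the `batch_size` argument with its default of 23 are all assumptions made for illustration.

```python
class SplitSketch(object):
    """Hypothetical stand-in for one split (train, validation, or test)."""

    def __init__(self, vcg, vcg_length, target):
        self.vcg = vcg
        self.vcg_length = vcg_length
        self.target = target
        self._cursor = 0  # index of the next example to serve

    def next_batch(self, batch_size=23):
        """Return the next batch_size examples, wrapping around at the end."""
        n = len(self.vcg)
        idx = [(self._cursor + i) % n for i in range(batch_size)]
        self._cursor = (self._cursor + batch_size) % n
        return ([self.vcg[i] for i in idx],
                [self.vcg_length[i] for i in idx],
                [self.target[i] for i in idx])


# Toy data: five "sequences" identified by number, all of length 1.
split = SplitSketch(list(range(5)), [1] * 5, list("abcde"))
first_x, first_len, first_y = split.next_batch(3)
second_x, second_len, second_y = split.next_batch(3)
```

Here the second batch wraps around to the start of the split once the data is exhausted, which is one simple way to keep serving batches across epochs.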

We import the wrapper here. To instantiate, we specify the names of the NumPy files for the following:
* ```vcg.npy```: VCG sequences (input)
* ```vcg_length.npy```: VCG sequence lengths (later passed as the ```sequence_length``` argument of the RNN)
* ```target.npy```: dyssynchrony indices (target output)

In [1]:
from dataset import Patient

# Initialize dataset iterator
patient_dataset = Patient("dataset/vcg.npy", "dataset/vcg_length.npy", "dataset/target.npy")

## Initialization
### Network Dimensions
Here we define the dimensions of our data, as well as the initial hyperparameters of our neural network. Note: these parameters have not been optimized; they are simply for proof of concept.

In [2]:
# Hyperparameters
#learning_rate = 0.1
training_steps = 64 # batches per epoch: 1472 training examples / batch size of 23 = 64
num_hidden = 130
display_step = 1

# Network Parameters
num_steps = 130
num_inputs = 3
num_classes = 5
epochs = 9 # index of the last epoch; the training loop runs epochs + 1 = 10 epochs

# Where TensorFlow saves metadata for TensorBoard
logs_path='Data/'

### Input Placeholders
We define three placeholders. They are for the following:
* VCG sequence input: ```[num_steps, batch_size, num_inputs]```
* VCG sequence length: ```[batch_size]```
* targets (class indices, not one-hot vectors): ```[batch_size]```

In [3]:
import tensorflow as tf 

# VCG input [130, None, 3]
x = tf.placeholder(tf.float32, [num_steps, None, num_inputs])

# VCG sequence lengths
sequence_length = tf.placeholder(tf.int32, [None])

# Index of class the VCG should be categorized as
y = tf.placeholder(tf.int64, [None])

### Weights and Biases
The recurrent neural network creates an output at every timestep. Since this is a problem of sequence classification, we are only interested in the output produced at the last timestep, ```t=t_end```. We then apply a linear activation on it. The weights and biases are initialized with random values from a normal distribution, with a mean of 0.0 and a standard deviation of 1.0.

In [4]:
import math
# Define weights and biases

# Shape: [num_hidden, num_classes] = [130, 5]
# weights = tf.Variable(tf.random_uniform(
#         shape = [num_hidden, num_classes],
#         minval = (-1 / math.sqrt(num_hidden)),
#         maxval = (1 / math.sqrt(num_hidden))
# ))
weights = tf.Variable(tf.random_normal(
        shape = [num_hidden, num_classes],
        ))
# [5]
#biases = tf.Variable(tf.zeros([num_classes]))
biases = tf.Variable(tf.random_normal([num_classes]))

## Recurrent Neural Network Cell
Here we define what kind of recurrent neural network we will be using. We will be using a basic LSTM network with a default forget bias of 1.0, and ```tanh``` as the activation function. The ```BasicLSTMCell``` initializer function takes as parameters:
* ```num_units```: The number of units in an LSTM cell.
* ```forget_bias```: float, the bias added to the forget gates.
* ```activation```: activation function of the inner states. Default is ```tanh```.
* ```state_is_tuple```: If True, accepted and returned states are 2-tuples of the ```c_state``` (the cell state) and the ```m_state``` (the hidden state, i.e. the cell's output). The default has differed between TensorFlow releases, so we set it explicitly.

In [5]:
from tensorflow.python.ops import rnn_cell

# Define a lstm cell with tensorflow
cell = rnn_cell.BasicLSTMCell(
    num_units=num_hidden,
    forget_bias=1.0, 
    state_is_tuple=True
)

## Prediction Operation
We will be using the ```tf.nn.dynamic_rnn``` function, instead of the ```tf.nn.rnn``` function, to get the output of the recurrent neural network. ```tf.nn.rnn``` unrolls the network for a fixed number of timesteps when the graph is built, whereas ```tf.nn.dynamic_rnn``` uses a ```tf.while_loop``` to step through the sequence at run time and takes the true length of each sequence through ```sequence_length```. It is also reported to be faster, even though ```tf.nn.rnn``` prebuilds the graph. The parameters are as follows:
* ```cell```: an instance of RNN cell.
* ```dtype```: (optional) The data type for the initial state and the expected output. 
* ```sequence_length```: (optional) An int32/64 vector of size [batch_size] specifying the length of each sequence.
* ```inputs```: the RNN input, a single Tensor. By default the shape is ```[batch_size, max_time, num_inputs]```; with ```time_major=True``` it is ```[max_time, batch_size, num_inputs]```.
* ```time_major```: If True, the maximum number of timesteps is the first dimension, so the input placeholder must have shape ```[max_time, batch_size, num_inputs]```. This requires us to permute the input array.


In [6]:
outputs, states = tf.nn.dynamic_rnn(
    cell=cell,
    dtype=tf.float32,
    sequence_length=sequence_length,
    inputs=x,
    time_major=True
)

We bring to attention the parameter ```time_major```, which we set to ```True```. This specifies that for our input placeholder, the ```max_time``` will be the *first* dimension, so it must have shape ```[max_time, batch_size, num_inputs]```. As a result, the output tensor would have shape ```[max_time, batch_size, num_hidden]```. 


In [7]:
# Shape of output tensor
print "Output tensor shape: " + str(outputs.get_shape())

Output tensor shape: (130, ?, 130)


This is advantageous for two reasons:
* We can easily access the *last* timestep by calling ```outputs[-1]``` (we are only concerned with the last timestep as this is a sequence classification problem).
* It avoids transpositions at the beginning and end of the RNN calculation, which makes it more efficient (https://www.tensorflow.org/versions/r0.10/api_docs/python/nn/recurrent_neural_networks#dynamic_rnn)

However, we must alter our input, because it comes in shape ```[batch_size, max_time, num_inputs]```, which was the more *intuitive* way of bundling our data. Specifically, we must permute the ```0th``` and ```1st``` dimensions so that the shape will be ```[max_time, batch_size, num_inputs]```. We can do so by calling NumPy's ```np.swapaxes``` function on ```batch_x``` before passing it into the ```feed_dict```.

Below is an example of permuting the ```0th``` and ```1st``` axes of the batch of VCGs:

In [8]:
import numpy as np

# Initialize a dummy iterator for the purpose of this example
dummy_dataset = Patient("dataset/vcg.npy", "dataset/vcg_length.npy", "dataset/target.npy")

# Grab the first batch
dummy_x, dummy_length, dummy_target = dummy_dataset.train.next_batch()

# Get the shape before permutation
print "Input batch shape before swap: " + str(dummy_x.shape)

# Permute the 0th and 1st axes
dummy_x = np.swapaxes(dummy_x, 0, 1)
print "Input batch shape after swap: " + str(dummy_x.shape)

Input batch shape before swap: (23, 130, 3)
Input batch shape after swap: (130, 23, 3)


## Linear Activation
After we get the outputs for every timestep, we extract the output for the *last* timestep and apply a linear activation. 

In [9]:
prediction = tf.matmul(outputs[-1], weights) + biases

# Shape should be [batch_size, num_classes]
print "Shape of output after linear activation: " + str(prediction.get_shape())

Shape of output after linear activation: (?, 5)
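
One caveat applies when the VCGs in a batch have different lengths: `tf.nn.dynamic_rnn` zeroes the outputs at timesteps past each example's `sequence_length`, so `outputs[-1]` holds real activations only for sequences that span all `num_steps` timesteps. A common alternative is to pick, for each example, the output at its own final timestep. The NumPy sketch below illustrates that indexing on a toy array; the helper name `last_valid_output` is hypothetical and it is not part of the graph built in this notebook.

```python
import numpy as np


def last_valid_output(outputs, lengths):
    """Select each example's output at its final valid timestep.

    outputs: array of shape [max_time, batch_size, num_hidden] (time-major)
    lengths: integer sequence of shape [batch_size] with the true lengths
    """
    lengths = np.asarray(lengths)
    batch_idx = np.arange(outputs.shape[1])
    return outputs[lengths - 1, batch_idx]


# Toy example: 4 timesteps, 2 sequences, 3 hidden units.
toy = np.arange(24, dtype=np.float32).reshape(4, 2, 3)
toy[2:, 1] = 0.0  # the second sequence has length 2, so later steps are zero
picked = last_valid_output(toy, [4, 2])
print(picked.shape)  # (2, 3)
```

In the graph itself, the hidden part of the `states` tuple returned by `tf.nn.dynamic_rnn` should carry the same information, since the state is copied through once a sequence has ended.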


## Training and Evaluation
Up to this point, we are able to feed forward our input and have the neural network output its "prediction". Now we need to set up some key functions to allow for this network to be trained.

### Cross Entropy Loss
We will be using TensorFlow's ```tf.nn.sparse_softmax_cross_entropy_with_logits``` function, which measures the probability error in discrete classification tasks in which the classes are *mutually exclusive*. Note that this operation expects unscaled logits and performs the softmax internally for efficiency.

Its parameters are as follows:
* logits: float32/64 with shape ```[batch_size, num_classes]```
* labels: int32/64 with shape ```[batch_size]``` where each entry is a value between ```[0, num_classes)```.

In [10]:
# Calculate costs for each example
costs = tf.nn.sparse_softmax_cross_entropy_with_logits(prediction, y)
print "Shape of costs (before averaging): " + str(costs.get_shape())

Shape of costs (before averaging): (?,)


This calculates the cost for each example in the batch. To find the batch cost, we average over all costs in the batch using ```tf.reduce_mean```.

Its parameters are as follows:
* input_tensor: The tensor to reduce. Should have numeric type.
* axis: The dimensions to reduce. If None (the default), reduces all dimensions.
* keep_dims: If true, retains reduced dimensions with length 1.
* name: A name for the operation (optional).
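
To make the two steps concrete, here is a plain-Python sketch of the per-example cross entropy followed by the batch mean. It mirrors the math rather than TensorFlow's implementation; the function name `sparse_softmax_cross_entropy` and the toy logits are made up for illustration.

```python
import math


def sparse_softmax_cross_entropy(logits, label):
    """Cross entropy for one example, given raw logits and an integer label."""
    shift = max(logits)  # subtract the max for numerical stability
    log_sum_exp = shift + math.log(sum(math.exp(v - shift) for v in logits))
    return log_sum_exp - logits[label]


# Two toy examples with num_classes = 5.
batch_logits = [[0.0, 0.0, 0.0, 0.0, 0.0],
                [2.0, 0.0, 0.0, 0.0, 0.0]]
batch_labels = [2, 0]

# One cost per example, then a single scalar for the batch.
costs = [sparse_softmax_cross_entropy(row, label)
         for row, label in zip(batch_logits, batch_labels)]
cost = sum(costs) / len(costs)
```

The first toy example shows a useful reference point: a model that spreads probability evenly over the five classes has a per-example cost of ln 5, roughly 1.61.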

In [11]:
# Cost function
cost = tf.reduce_mean(costs)

# Output is a scalar value
print "Shape of cost (scalar value): " + str(cost.get_shape())

Shape of cost (scalar value): ()


## Optimizer
Once we define our cost function, we know what we are trying to minimize. We will use a TensorFlow-defined operation that implements the Adam algorithm, ```tf.train.AdamOptimizer```, with the following parameters:
* learning_rate: A Tensor or a floating point value. The learning rate (default is 0.001)
* beta1: A float value or a constant float tensor. The exponential decay rate for the 1st moment estimates (default is 0.9).
* beta2: A float value or a constant float tensor. The exponential decay rate for the 2nd moment estimates (default is 0.999).
* epsilon: A small constant for numerical stability (default is 1e-08).
* use_locking: If True use locks for update operations (default is False).
* name: Optional name for the operations created when applying gradients. Defaults to "Adam".
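
For intuition about what these parameters control, below is a sketch of a single Adam update for one scalar parameter, written in the textbook form of the algorithm. It is an illustration only: TensorFlow's implementation arranges the arithmetic somewhat differently, and the function name `adam_step` is an assumption of this example.

```python
import math


def adam_step(theta, grad, m, v, t,
              learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08):
    """One Adam update for a single scalar parameter (step count t >= 1)."""
    m = beta1 * m + (1.0 - beta1) * grad          # 1st moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad   # 2nd moment estimate
    m_hat = m / (1.0 - beta1 ** t)                # bias-corrected 1st moment
    v_hat = v / (1.0 - beta2 ** t)                # bias-corrected 2nd moment
    theta = theta - learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return theta, m, v


# First update from theta = 1.0 with a gradient of 2.0.
theta, m, v = adam_step(theta=1.0, grad=2.0, m=0.0, v=0.0, t=1)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly `learning_rate` regardless of the gradient's magnitude.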

In [12]:
# Initialize TF optimizer
optimizer = tf.train.AdamOptimizer(
    #learning_rate=learning_rate,
    name="AdamOptimizer"
)

Once we initialize an optimizer, we can simply call the ```optimizer.minimize()``` function to both calculate the gradients and apply them.

In [13]:
# Training op
train = optimizer.minimize(cost)

## Determining Accuracy
Instead of looking at the raw cost to determine the model's performance, we can simply calculate how often the model predicts the class that the VCG actually falls in. To extract the *class* that the model predicted, we call ```tf.argmax```, which returns the index of the largest value across a specified axis of a tensor.
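The same computation can be sketched in plain Python on a toy batch before expressing it as TensorFlow ops. The function name `accuracy_percent` and the toy values are made up for this example.

```python
def accuracy_percent(logits, targets):
    """Percentage of rows whose largest logit sits at the target index."""
    predicted = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = [int(p == t) for p, t in zip(predicted, targets)]
    return 100.0 * sum(correct) / len(correct)


toy_logits = [[0.1, 2.0, 0.3, 0.0, 0.0],   # predicted class 1
              [1.5, 0.2, 0.1, 0.0, 0.0],   # predicted class 0
              [0.0, 0.0, 0.0, 0.9, 0.1],   # predicted class 3
              [0.0, 0.0, 0.2, 0.1, 0.7]]   # predicted class 4
toy_targets = [1, 0, 2, 4]
print(accuracy_percent(toy_logits, toy_targets))  # 75.0
```

Three of the four toy predictions match their targets, so the mean of the 0/1 matches is 0.75, or 75 percent.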

In [14]:
# Compare predictions with targets
compare = tf.equal(tf.argmax(prediction, 1), y)

# Cast booleans to floats and average them to get the fraction correct
count_correct = tf.reduce_mean(tf.cast(compare, tf.float32))
accuracy = tf.mul(count_correct, 100)

# Shape should be a single scalar value
print "Accuracy shape (scalar value): " + str(accuracy.get_shape())

Accuracy shape (scalar value): ()


We are now ready to begin training!

In [15]:
# Initialize all the variables
init = tf.initialize_all_variables()

with tf.Session() as sess:
    sess.run(init)
    
    # Grab the testing set
    test_x = patient_dataset.test.vcg
    test_length = patient_dataset.test.vcg_length
    test_y = patient_dataset.test.target
    
    # Output accuracy for the testing set before training
    print "Testing accuracy (before): %2f." % sess.run(accuracy, feed_dict={
            x: np.swapaxes(test_x, 0, 1), 
            sequence_length: test_length, 
            y: test_y 
        })

    # Training loop
    for epoch in range(epochs+1):
        epoch_loss = 0
        for step in range(training_steps):

            # Grab the next batch from the training set
            batch_x, batch_length, batch_y = patient_dataset.train.next_batch()
            
            # Accumulate cost for each epoch
            _, step_cost = sess.run([train, cost], feed_dict={
                    x: np.swapaxes(batch_x, 0, 1), 
                    sequence_length: batch_length, 
                    y: batch_y
            })
            
            epoch_loss += step_cost
        print 'Epoch', epoch, 'of', epochs, 'loss:', epoch_loss

    # Output accuracy for testing set
    print "Testing accuracy (After): %2f." % sess.run(accuracy, feed_dict={
            x: np.swapaxes(test_x, 0, 1), 
            sequence_length: test_length, 
            y: test_y 
        })

Testing accuracy (before): 39.420288.
Epoch 0 of 9 loss: 75.5132257342
Epoch 1 of 9 loss: 54.1599846482
Epoch 2 of 9 loss: 48.3951685131
Epoch 3 of 9 loss: 46.6368784606
Epoch 4 of 9 loss: 43.1235064566
Epoch 5 of 9 loss: 42.3143689036
Epoch 6 of 9 loss: 36.8318191916
Epoch 7 of 9 loss: 33.2801501304
Epoch 8 of 9 loss: 32.6419562548
Epoch 9 of 9 loss: 30.8667163998
Testing accuracy (After): 72.173912.


# Logs

Below are testing accuracies with different hyperparameter configurations

I have been restarting the kernel and running every cell each time I train, because I believe that rerunning the training block trains the same weights, causing it to overfit.

I will restructure the code so that it will recreate the RNN fresh each time I call the training block

## 130 Hidden Units, Random Parameter Initialization, Learning Rate = 0.001, Forget Bias = 1.0, 10 Epochs (3 trials)

Testing accuracy (before): 32.173912.
Epoch 0 of 9 loss: 75.557598412
Epoch 1 of 9 loss: 52.1823754311
Epoch 2 of 9 loss: 45.775316149
Epoch 3 of 9 loss: 43.0103892386
Epoch 4 of 9 loss: 39.3500338197
Epoch 5 of 9 loss: 37.4920585454
Epoch 6 of 9 loss: 36.1522545815
Epoch 7 of 9 loss: 33.536513567
Epoch 8 of 9 loss: 30.9474855363
Epoch 9 of 9 loss: 32.5721822828

**Testing accuracy (After): 69.565216.**

Testing accuracy (before): 8.985508.
Epoch 0 of 9 loss: 70.3054902554
Epoch 1 of 9 loss: 51.7699039578
Epoch 2 of 9 loss: 45.7114805579
Epoch 3 of 9 loss: 42.6487196684
Epoch 4 of 9 loss: 40.4050393403
Epoch 5 of 9 loss: 38.4582608938
Epoch 6 of 9 loss: 36.4260773659
Epoch 7 of 9 loss: 33.7049204707
Epoch 8 of 9 loss: 30.2670814097
Epoch 9 of 9 loss: 30.732228741

**Testing accuracy (After): 72.753624.**

Testing accuracy (before): 39.420288.
Epoch 0 of 9 loss: 75.5132257342
Epoch 1 of 9 loss: 54.1599846482
Epoch 2 of 9 loss: 48.3951685131
Epoch 3 of 9 loss: 46.6368784606
Epoch 4 of 9 loss: 43.1235064566
Epoch 5 of 9 loss: 42.3143689036
Epoch 6 of 9 loss: 36.8318191916
Epoch 7 of 9 loss: 33.2801501304
Epoch 8 of 9 loss: 32.6419562548
Epoch 9 of 9 loss: 30.8667163998

**Testing accuracy (After): 72.173912.**

## 140 Hidden Units, Learning Rate = 0.001, Forget Bias = 1.0, 10 Epochs (3 trials)
* 130 and 180 hidden units seemed viable, maybe parameter initialization can improve results?

Testing accuracy (before): 21.739130.
Epoch 0 of 9 loss: 85.331428647
Epoch 1 of 9 loss: 72.6410861611
Epoch 2 of 9 loss: 62.3723490834
Epoch 3 of 9 loss: 57.5069853067
Epoch 4 of 9 loss: 53.6119365692
Epoch 5 of 9 loss: 48.419193387
Epoch 6 of 9 loss: 45.6134702861
Epoch 7 of 9 loss: 44.2847471833
Epoch 8 of 9 loss: 41.9139312208
Epoch 9 of 9 loss: 40.0995288789

**Testing accuracy (After): 72.463768.**

Testing accuracy (before): 22.898550.
Epoch 0 of 9 loss: 87.2807415724
Epoch 1 of 9 loss: 76.3191206455
Epoch 2 of 9 loss: 72.4193108678
Epoch 3 of 9 loss: 63.6285290718
Epoch 4 of 9 loss: 53.1143798232
Epoch 5 of 9 loss: 48.6552578211
Epoch 6 of 9 loss: 49.3438004851
Epoch 7 of 9 loss: 57.9835896492
Epoch 8 of 9 loss: 47.5023836792
Epoch 9 of 9 loss: 43.2307367921

**Testing accuracy (After): 64.347824.**

Testing accuracy (before): 24.347826.
Epoch 0 of 9 loss: 85.8769261837
Epoch 1 of 9 loss: 66.9577563405
Epoch 2 of 9 loss: 58.7582893968
Epoch 3 of 9 loss: 55.6729531288
Epoch 4 of 9 loss: 49.5340246558
Epoch 5 of 9 loss: 45.4523643851
Epoch 6 of 9 loss: 43.2482728064
Epoch 7 of 9 loss: 46.2375109196
Epoch 8 of 9 loss: 61.4057783484
Epoch 9 of 9 loss: 50.8604812324

**Testing accuracy (After): 50.724636.** Bad...

## 180 Hidden Units, Learning Rate = 0.001, Forget Bias = 1.0 (3 trials)

### 10 Epochs
Testing accuracy (before): 18.840580.
Epoch 0 of 9 loss: 76.2906258702
Epoch 1 of 9 loss: 65.765389204
Epoch 2 of 9 loss: 55.0877380371
Epoch 3 of 9 loss: 51.0981811285
Epoch 4 of 9 loss: 49.8257372975
Epoch 5 of 9 loss: 44.2450459003
Epoch 6 of 9 loss: 38.0618970692
Epoch 7 of 9 loss: 39.3580549359
Epoch 8 of 9 loss: 34.6939296126
Epoch 9 of 9 loss: 30.6818228215

**Testing accuracy (After): 61.449276.**

Testing accuracy (before): 19.130434.
Epoch 0 of 9 loss: 85.1332124472
Epoch 1 of 9 loss: 70.4493879676
Epoch 2 of 9 loss: 66.93555516
Epoch 3 of 9 loss: 58.8476182222
Epoch 4 of 9 loss: 56.9981991053
Epoch 5 of 9 loss: 62.1156691313
Epoch 6 of 9 loss: 70.990103662
Epoch 7 of 9 loss: 57.2450824976
Epoch 8 of 9 loss: 53.5178214312
Epoch 9 of 9 loss: 46.9989397228

**Testing accuracy (After): 57.971012.**

Testing accuracy (before): 30.144926.
Epoch 0 of 9 loss: 81.994656086
Epoch 1 of 9 loss: 67.7231312394
Epoch 2 of 9 loss: 59.8202496767
Epoch 3 of 9 loss: 50.6220470667
Epoch 4 of 9 loss: 48.6617728472
Epoch 5 of 9 loss: 45.4321470261
Epoch 6 of 9 loss: 45.5562474728
Epoch 7 of 9 loss: 47.4617035389
Epoch 8 of 9 loss: 43.3508636951
Epoch 9 of 9 loss: 46.354890734

**Testing accuracy (After): 73.043480.** That's weird...

### 5 epochs

Testing accuracy (before): 20.000000.
Epoch 0 of 4 loss: 88.4674886465
Epoch 1 of 4 loss: 79.127618432
Epoch 2 of 4 loss: 71.9268788695
Epoch 3 of 4 loss: 60.9751239419
Epoch 4 of 4 loss: 57.7176173329

**Testing accuracy (After): 53.043480.**

Testing accuracy (before): 17.681160.
Epoch 0 of 4 loss: 88.3053853512
Epoch 1 of 4 loss: 77.1043817401
Epoch 2 of 4 loss: 66.8408247232
Epoch 3 of 4 loss: 53.1759700179
Epoch 4 of 4 loss: 46.6574418247

**Testing accuracy (After): 62.318836.**

Testing accuracy (before): 21.449276.
Epoch 0 of 4 loss: 82.4324746728
Epoch 1 of 4 loss: 64.7673819065
Epoch 2 of 4 loss: 57.2799475789
Epoch 3 of 4 loss: 54.0620707572
Epoch 4 of 4 loss: 52.5710415244

**Testing accuracy (After): 60.289852.**

## 100 Hidden Units, Learning Rate = 0.001, Forget Bias = 1.0, 10 Epochs (3 trials)
* I think >100 hidden units is needed

Testing accuracy (before): 19.420290.
Epoch 0 of 9 loss: 87.4952960014
Epoch 1 of 9 loss: 78.2850236893
Epoch 2 of 9 loss: 74.4745118022
Epoch 3 of 9 loss: 68.4910777211
Epoch 4 of 9 loss: 61.5092281103
Epoch 5 of 9 loss: 55.3548156023
Epoch 6 of 9 loss: 49.5705607831
Epoch 7 of 9 loss: 55.7964946032
Epoch 8 of 9 loss: 53.1307385564
Epoch 9 of 9 loss: 44.5970182121

**Testing accuracy (After): 68.985512.**

Testing accuracy (before): 31.014494.
Epoch 0 of 9 loss: 84.857060194
Epoch 1 of 9 loss: 69.5864434838
Epoch 2 of 9 loss: 57.7344907522
Epoch 3 of 9 loss: 56.691118896
Epoch 4 of 9 loss: 49.8172249198
Epoch 5 of 9 loss: 51.9684148133
Epoch 6 of 9 loss: 49.7311271131
Epoch 7 of 9 loss: 42.486456126
Epoch 8 of 9 loss: 40.7341637611
Epoch 9 of 9 loss: 37.7886915803

**Testing accuracy (After): 62.028988.**

Testing accuracy (before): 12.173913.
Epoch 0 of 9 loss: 85.4375839233
Epoch 1 of 9 loss: 67.3987196088
Epoch 2 of 9 loss: 56.5238130093
Epoch 3 of 9 loss: 51.6812948287
Epoch 4 of 9 loss: 48.3742640615
Epoch 5 of 9 loss: 45.2510369718
Epoch 6 of 9 loss: 41.1551375091
Epoch 7 of 9 loss: 42.221198231
Epoch 8 of 9 loss: 40.1072087884
Epoch 9 of 9 loss: 35.6482861936

**Testing accuracy (After): 67.246376.**

## 65 Hidden Units, Learning Rate = 0.001, Forget Bias = 1.0, 10 Epochs (3 trials)
* The learning rate may have been the key change

Testing accuracy (before): 19.130434.
Epoch 0 of 9 loss: 90.1926560402
Epoch 1 of 9 loss: 76.4868066907
Epoch 2 of 9 loss: 67.9105214477
Epoch 3 of 9 loss: 61.2231799364
Epoch 4 of 9 loss: 60.8890686631
Epoch 5 of 9 loss: 53.9355236292
Epoch 6 of 9 loss: 49.0524513125
Epoch 7 of 9 loss: 48.0616714358
Epoch 8 of 9 loss: 44.6592648327
Epoch 9 of 9 loss: 46.9891751409

**Testing accuracy (After): 61.449276.**

Testing accuracy (before): 15.072463.
Epoch 0 of 9 loss: 86.0159275532
Epoch 1 of 9 loss: 69.2517617941
Epoch 2 of 9 loss: 58.9999563694
Epoch 3 of 9 loss: 50.8192391992
Epoch 4 of 9 loss: 46.4358637929
Epoch 5 of 9 loss: 43.5678417981
Epoch 6 of 9 loss: 41.0317820311
Epoch 7 of 9 loss: 41.218421787
Epoch 8 of 9 loss: 40.7821920216
**Epoch 9 of 9 loss: 39.0734291673**

**Testing accuracy (After): 70.434784.**

Testing accuracy (before): 28.985506.
Epoch 0 of 9 loss: 87.9961923361
Epoch 1 of 9 loss: 79.2490195036
Epoch 2 of 9 loss: 68.9531587958
Epoch 3 of 9 loss: 58.8960590959
Epoch 4 of 9 loss: 54.0959773064
Epoch 5 of 9 loss: 49.3094486892
Epoch 6 of 9 loss: 45.3574355543
Epoch 7 of 9 loss: 45.1514770389
Epoch 8 of 9 loss: 45.0267317891
**Epoch 9 of 9 loss: 39.3672601879**

**Testing accuracy (After): 64.637680.**

## 130 Hidden Units, Learning Rate = 0.001, Forget Bias = 1.0, 10 Epochs (2 trial)
* Top results thus far

Testing accuracy (before): 18.840580.
Epoch 0 of 9 loss: 86.9466418028
Epoch 1 of 9 loss: 71.7236349583
Epoch 2 of 9 loss: 67.3820146918
Epoch 3 of 9 loss: 55.0486302376
Epoch 4 of 9 loss: 48.5060537159
Epoch 5 of 9 loss: 49.4411396086
Epoch 6 of 9 loss: 46.7226067781
Epoch 7 of 9 loss: 46.3157350123
Epoch 8 of 9 loss: 38.8151058853
**Epoch 9 of 9 loss: 35.6387427151**

**Testing accuracy (After): 71.884064.**

Testing accuracy (before): 10.724638.
Epoch 0 of 9 loss: 85.7497484684
Epoch 1 of 9 loss: 74.6639196873
Epoch 2 of 9 loss: 65.0653736591
Epoch 3 of 9 loss: 56.1773189306
Epoch 4 of 9 loss: 50.0931537151
Epoch 5 of 9 loss: 47.9462119043
Epoch 6 of 9 loss: 45.2166830003
Epoch 7 of 9 loss: 39.0820546746
Epoch 8 of 9 loss: 36.6976678967
**Epoch 9 of 9 loss: 36.8438378274**

**Testing accuracy (After): 72.463768.**

Testing accuracy (before): 17.681160.
Epoch 0 of 9 loss: 84.9358751774
Epoch 1 of 9 loss: 65.0453099608
Epoch 2 of 9 loss: 58.7528081536
Epoch 3 of 9 loss: 51.4128648043
Epoch 4 of 9 loss: 46.5837843418
Epoch 5 of 9 loss: 45.0555068851
Epoch 6 of 9 loss: 44.3801488578
Epoch 7 of 9 loss: 42.1028029919
Epoch 8 of 9 loss: 37.5983171761
**Epoch 9 of 9 loss: 37.1865636706**

**Testing accuracy (After): 70.144928.**