# Predicting the Dyssynchrony Index
We will be classifying vectorcardiogram (VCG) data using a recurrent neural network. The VCG inputs are simulations generated by Chris Villongco with CMRG's Continuity software. Each VCG is placed in a class that corresponds to a range of its dyssynchrony index.

## Data Dimensions
### *Input*
Our current dataset consists of 608 VCG simulations based on 8 clinical patients. We will truncate each VCG simulation to 120 timesteps. Each timestep contains the x, y, and z coordinates of the head of the vector, so there are 3 inputs per timestep.
### *Output*
We will classify each VCG simulation by its corresponding dyssynchrony index. The dyssynchrony index ranges from 0 to 1, but we are only concerned with the range 0.5 to 1, which we divide into 5 equal intervals of width 0.1. For example, a VCG sequence with a dyssynchrony index of 0.78 is placed in class "3", since it falls between 0.7 and 0.8, the third interval.
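The interval rule can be written as a short NumPy function. This is a sketch under our own assumptions, not the dataset wrapper's actual code: the name ```di_to_class``` is hypothetical, every interval is taken as half-open [lower, upper) except the last, which also includes 1.0, and labels are 1-based to match the example above.

```python
import numpy as np

# Assumed inner boundaries of the five equal-width intervals over [0.5, 1.0]
INNER_EDGES = np.array([0.6, 0.7, 0.8, 0.9])

def di_to_class(di):
    """Map dyssynchrony index values in [0.5, 1.0] to 1-based classes 1..5 (hypothetical helper)."""
    di = np.asarray(di, dtype=float)
    if np.any((di < 0.5) | (di > 1.0)):
        raise ValueError("dyssynchrony index outside [0.5, 1.0]")
    # np.digitize returns 0 for [0.5, 0.6), 1 for [0.6, 0.7), ..., 4 for [0.9, 1.0]
    return np.digitize(di, INNER_EDGES) + 1

print(di_to_class([0.5, 0.6, 0.78, 0.95, 1.0]))  # [1 2 3 5 5]
```

Using explicit bin edges with ```np.digitize``` avoids the floating-point surprises of computing ```floor((di - 0.5) / 0.1)```, which can put a boundary value such as 0.6 into the wrong interval.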

## Dataset Wrapper
We've created a class that provides a basic interface for handling the dataset. Specifically, the wrapper does the following:
* Reads in the dataset from .npy files
* Splits the dataset into the training, validation, and testing sets
* Provides a ```next_batch``` function that returns a batch of a specified size from a given set
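To illustrate what such a wrapper might look like, here is a minimal NumPy sketch. It is not the code in ```vcg.py```: the class name ```SimpleVCGDataset```, the split names, the ```next_batch(name, batch_size)``` signature, and the wrap-around behavior are all hypothetical choices.

```python
import numpy as np

class SimpleVCGDataset:
    """Hypothetical stand-in for the vcg.VCG wrapper (not its actual implementation)."""

    def __init__(self, sequences, targets, train=0.6, val=0.2, test=0.2, shuffle=True, seed=0):
        if abs(train + val + test - 1.0) > 1e-9:
            raise ValueError("split fractions must sum to 1")
        sequences = np.asarray(sequences)
        targets = np.asarray(targets)
        n = len(sequences)
        # Optionally shuffle once, then cut the index order into three contiguous splits
        order = np.random.RandomState(seed).permutation(n) if shuffle else np.arange(n)
        n_train = int(round(train * n))
        n_val = int(round(val * n))
        splits = {
            "train": order[:n_train],
            "val": order[n_train:n_train + n_val],
            "test": order[n_train + n_val:],
        }
        self._data = {name: (sequences[idx], targets[idx]) for name, idx in splits.items()}
        self._cursor = {name: 0 for name in splits}

    @classmethod
    def from_files(cls, seq_path, target_path, *args, **kwargs):
        """Load the sequence and target arrays from .npy files."""
        return cls(np.load(seq_path), np.load(target_path), *args, **kwargs)

    def size(self, name):
        return len(self._data[name][0])

    def next_batch(self, name, batch_size):
        """Return the next batch from one split, wrapping around at the end."""
        seqs, tgts = self._data[name]
        n = len(seqs)
        idx = (self._cursor[name] + np.arange(batch_size)) % n
        self._cursor[name] = (self._cursor[name] + batch_size) % n
        return seqs[idx], tgts[idx]

# Usage with synthetic stand-in data: 608 sequences of 120 timesteps x 3 coordinates
rng = np.random.RandomState(0)
data = SimpleVCGDataset(rng.standard_normal((608, 120, 3)), rng.uniform(0.5, 1.0, 608))
batch_x, batch_y = data.next_batch("train", 20)
print(data.size("train"), data.size("val"), data.size("test"))  # 365 122 121
```

Giving the test split whatever remains after rounding guarantees that the three splits always add up to the full dataset.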

We import the wrapper here. To instantiate it, we specify the names of the NumPy files that hold the VCGs and the corresponding dyssynchrony indices, along with the fraction of the entire dataset that each set should contain.

```python
from vcg import VCG

# Initialize the dataset iterator
data_sets = VCG("sequence.npy", "target.npy", 0.6, 0.2, 0.2, True)
```

TODO: The "sequence.npy" VCG data is collapsed, and already randomly permuted.

## Network Dimensions
We will define the dimensions of our neural network, as well as its initial hyperparameters. Note: these parameters have not been optimized; they are simply a proof of concept.

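```python
# Hyperparameters
learning_rate = 0.05
training_iters = 125
batch_size = 20
display_step = 10
num_hidden = 100   # number of units in the LSTM cell

# Network parameters
num_steps = 120    # timesteps per VCG sequence
num_inputs = 3     # x, y, z coordinates per timestep
num_classes = 5    # dyssynchrony index intervals

# Where TensorFlow saves metadata for TensorBoard
logs_path = 'rnndata/'
```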

## Input Placeholders
We define one placeholder for the VCG input and another for the class label of each VCG sequence, encoded as a one-hot vector of length ```num_classes```. The labels are used during training.

```python
import tensorflow as tf

# VCG input: [batch_size, num_steps, num_inputs], the layout tf.nn.dynamic_rnn expects
x = tf.placeholder(tf.float32, [None, num_steps, num_inputs])

# One-hot encoded class label of each VCG sequence
y = tf.placeholder(tf.float32, [None, num_classes])
```
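The shape of ```y``` implies that each label is fed as a one-hot row rather than as a bare class number. A minimal NumPy sketch of preparing a feed batch follows. ```one_hot``` and ```to_batch_major``` are hypothetical helpers, and the transpose is only needed if the stored VCGs happen to be laid out as [batch, 3, 120], which we have not confirmed.

```python
import numpy as np

def one_hot(class_labels, num_classes=5):
    """Turn 1-based class labels (1..num_classes) into one-hot float32 rows."""
    class_labels = np.asarray(class_labels, dtype=int)
    rows = np.zeros((len(class_labels), num_classes), dtype=np.float32)
    rows[np.arange(len(class_labels)), class_labels - 1] = 1.0
    return rows

def to_batch_major(vcgs):
    """Reorder a [batch, 3, 120] array to [batch, 120, 3] (batch, time, coordinate)."""
    return np.transpose(np.asarray(vcgs, dtype=np.float32), (0, 2, 1))

batch_y = one_hot([3, 1, 5])                      # rows for classes 3, 1 and 5
batch_x = to_batch_major(np.zeros((3, 3, 120)))   # shape (3, 120, 3)
```

Note that class "3" lands in column index 2, since NumPy indexing is 0-based.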

## Linear Activation
The recurrent neural network produces an output at every timestep. Since this is a sequence classification problem, we are only interested in the output at the last timestep, t=120. We pass that output through a linear layer, multiplying it by a weight matrix and adding a bias, to obtain one score per class. The weights and biases are initialized with random values drawn from a normal distribution with a mean of 0.0 and a standard deviation of 1.0 (the defaults of ```tf.random_normal```).

```python
# Weights and biases of the output layer, drawn from N(0, 1)
weights = tf.Variable(tf.random_normal([num_hidden, num_classes]))
biases = tf.Variable(tf.random_normal([num_classes]))
```
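Selecting the last timestep and applying the linear layer amounts to an indexing step followed by a matrix product. The NumPy sketch below uses random stand-ins for the RNN output; the shapes follow the network parameters, but the arrays are synthetic and the variable names are our own.

```python
import numpy as np

batch_size, num_steps, num_hidden, num_classes = 20, 120, 100, 5
rng = np.random.RandomState(0)

# Stand-in for the RNN output tensor: one hidden vector per timestep
output = rng.standard_normal((batch_size, num_steps, num_hidden))

# Weights and biases drawn from N(0, 1), mirroring tf.random_normal's defaults
weights = rng.standard_normal((num_hidden, num_classes))
biases = rng.standard_normal(num_classes)

# Keep only the last timestep (t = 120, index -1) and apply the linear layer
last_output = output[:, -1, :]                 # shape (20, 100)
logits = last_output @ weights + biases        # shape (20, 5), one score per class
predicted_class = logits.argmax(axis=1) + 1    # 1-based class labels
```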

## Recurrent Neural Network Cell
Here we define the type of recurrent neural network we will use: a basic LSTM cell with a forget bias of 1.0 (the default) and ```tanh``` as the activation function. The ```BasicLSTMCell``` initializer takes the following parameters:
* ```num_units```: the number of units in the LSTM cell.
* ```forget_bias```: float, the bias added to the forget gates.
* ```activation```: the activation function of the inner states. Default is ```tanh```.
* ```state_is_tuple```: if True, accepted and returned states are 2-tuples of ```c_state``` (the memory cell state) and ```m_state``` (the hidden state, which is also the cell's output). Default is True.

```python
from tensorflow.python.ops import rnn_cell

# Define an LSTM cell with num_hidden units
cell = rnn_cell.BasicLSTMCell(num_units=num_hidden, forget_bias=1.0)
```
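To make ```forget_bias``` and the ```(c_state, m_state)``` tuple concrete, here is a NumPy sketch of one step of a basic LSTM without peepholes. It is our own illustration of the standard LSTM equations, not TensorFlow's source; the gate ordering inside the fused weight matrix and the small random initialization are arbitrary choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, state, W, b, forget_bias=1.0):
    """One basic LSTM step; state is the (c, m) tuple, returns (output, new_state)."""
    c, m = state
    # Fused affine transform of [input, previous hidden state] for all four gates
    z = np.concatenate([x_t, m], axis=1) @ W + b   # (batch, 4 * num_units)
    i, j, f, o = np.split(z, 4, axis=1)            # input gate, candidate, forget gate, output gate
    # forget_bias is added before the sigmoid, so the cell starts out mostly remembering
    new_c = c * sigmoid(f + forget_bias) + sigmoid(i) * np.tanh(j)
    new_m = np.tanh(new_c) * sigmoid(o)            # tanh is the inner activation
    return new_m, (new_c, new_m)

num_inputs, num_units, batch = 3, 100, 20
rng = np.random.RandomState(0)
W = rng.standard_normal((num_inputs + num_units, 4 * num_units)) * 0.1
b = np.zeros(4 * num_units)
state = (np.zeros((batch, num_units)), np.zeros((batch, num_units)))
out, state = lstm_step(rng.standard_normal((batch, num_inputs)), state, W, b)
```

The output of the step is the same array as ```m_state```, which is why the RNN output at the last timestep doubles as the summary of the whole sequence.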

We will use the ```tf.nn.dynamic_rnn``` function, rather than ```tf.nn.rnn```, to get the output of the recurrent neural network. ```tf.nn.rnn``` statically unrolls the network for a fixed number of timesteps when the graph is built and expects its input as a Python list with one tensor per timestep. ```tf.nn.dynamic_rnn``` instead uses a ```tf.while_loop``` to iterate over the time dimension when the graph is executed, so the graph is faster to build and batches with different maximum sequence lengths can be fed without rebuilding it. The parameters are as follows:
* ```cell```: an instance of an RNN cell.
* ```inputs```: the RNN input, a single Tensor of shape [batch_size, max_time, num_inputs] (with the default ```time_major=False```).
* ```sequence_length```: (optional) an int32/int64 vector of size [batch_size] specifying the length of each sequence.
* ```dtype```: (optional) the data type for the initial state and the expected output. Required if no ```initial_state``` is provided.
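The effect of ```sequence_length``` can be mimicked in NumPy: a loop over the time axis applies the cell, and once a sequence passes its length its state is carried forward unchanged and its outputs are zeros. This is a conceptual sketch, not TensorFlow's implementation; ```dynamic_unroll``` and the small ```tanh``` cell are hypothetical stand-ins, and the state here is a single array rather than an LSTM tuple.

```python
import numpy as np

def dynamic_unroll(step_fn, inputs, initial_state, sequence_length):
    """Loop step_fn over time; past each sequence's length, freeze its state and zero its output."""
    batch, max_time, _ = inputs.shape
    sequence_length = np.asarray(sequence_length)
    state = initial_state
    outputs = []
    t = 0
    while t < max_time:  # plays the role of the graph-level while loop
        out_t, new_state = step_fn(inputs[:, t, :], state)
        alive = (t < sequence_length)[:, None]     # (batch, 1) mask of unfinished sequences
        outputs.append(np.where(alive, out_t, 0.0))
        state = np.where(alive, new_state, state)
        t += 1
    return np.stack(outputs, axis=1), state

rng = np.random.RandomState(0)
num_inputs, num_hidden = 3, 8
Wx = rng.standard_normal((num_inputs, num_hidden)) * 0.1
Wh = rng.standard_normal((num_hidden, num_hidden)) * 0.1

def tanh_rnn_step(x_t, h):
    """Minimal vanilla RNN cell: output and new state are the same vector."""
    h_new = np.tanh(x_t @ Wx + h @ Wh)
    return h_new, h_new

inputs = rng.standard_normal((2, 120, num_inputs))
outputs, final_state = dynamic_unroll(tanh_rnn_step, inputs, np.zeros((2, num_hidden)), [120, 60])
```

For the second sequence (length 60), ```final_state``` holds the state after its 60th step, not after step 120.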

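```python
# TODO: per-sequence lengths (unused for now; every VCG is truncated to num_steps)
sequence_lengths = []

# output: [batch_size, num_steps, num_hidden]; states: final (c_state, m_state) tuple
output, states = tf.nn.dynamic_rnn(
    cell=cell,
    dtype=tf.float32,  # must match the dtype of the input placeholder x
    #sequence_length=sequence_lengths,
    inputs=x
)
```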

TODO: specify sequence lengths.
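Because every VCG is truncated to ```num_steps``` = 120 timesteps, one simple option for this TODO is a length vector filled with 120. If shorter simulations were instead zero-padded, lengths could be recovered from the last nonzero timestep. The helper below is a hypothetical sketch that assumes such zero padding, which the dataset may not use.

```python
import numpy as np

num_steps, batch_size = 120, 20

# Option 1: every sequence has the full truncated length
full_lengths = np.full(batch_size, num_steps, dtype=np.int32)

def lengths_from_zero_padding(batch):
    """Length = 1 + index of the last timestep with any nonzero coordinate (0 if none)."""
    nonzero = np.any(batch != 0, axis=2)                # (batch, time)
    from_end = np.argmax(nonzero[:, ::-1], axis=1)      # steps after the last nonzero one
    lengths = nonzero.shape[1] - from_end
    return np.where(nonzero.any(axis=1), lengths, 0).astype(np.int32)

# Option 2: recover lengths from a zero-padded batch
padded = np.zeros((3, num_steps, 3))
padded[0, :] = 1.0      # full length
padded[1, :60] = 1.0    # 60 valid steps, then padding
lengths = lengths_from_zero_padding(padded)  # lengths 120, 60 and 0
```

Either vector could then be fed to a ```[None]``` int32 placeholder passed as ```sequence_length```.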