# Spooky author identification



In [1]:
import numpy as np
import tensorflow as tf
import csv
from sklearn import preprocessing
from string import punctuation
from sklearn import metrics

In [2]:
reviews = []
labels = []
test = []
test_ids = []
with open('./text.csv', 'r',encoding="latin-1") as f:
    text_reader = csv.reader(f,delimiter=",")
    next(text_reader)
    for row in text_reader:
        reviews.append(row[0])
with open('./author.csv', 'r') as labels_csv:
    author_reader = csv.reader(labels_csv,delimiter =",")
    next(author_reader) #ignore header
    for row in author_reader:
        labels.append(row[0])
with open("./test.csv",'r') as test_csv:
    text_reader = csv.reader(test_csv, delimiter= ",")
    next(text_reader) #ignore header
    
    for row in text_reader:
        test_ids.append(row[0])
        test.append(row[1])

In [3]:
len(test)

8392

In [4]:
reviews[:20]

['This process, however, afforded me no means of ascertaining the dimensions of my dungeon; as I might make its circuit, and return to the point whence I set out, without being aware of the fact; so perfectly uniform seemed the wall.',
 'It never once occurred to me that the fumbling might be a mere mistake.',
 'In his left hand was a gold snuff box, from which, as he capered down the hill, cutting all manner of fantastic steps, he took snuff incessantly with an air of the greatest possible self satisfaction.',
 'How lovely is spring As we looked from Windsor Terrace on the sixteen fertile counties spread beneath, speckled by happy cottages and wealthier towns, all looked as in former years, heart cheering and fair.',
 'Finding nothing else, not even gold, the Superintendent abandoned his attempts; but a perplexed look occasionally steals over his countenance as he sits thinking at his desk.',
 'A youth passed in solitude, my best years spent under your gentle and feminine fosterage, h

In [5]:
labels[:20]

['EAP',
 'HPL',
 'EAP',
 'MWS',
 'HPL',
 'MWS',
 'EAP',
 'EAP',
 'EAP',
 'MWS',
 'MWS',
 'EAP',
 'HPL',
 'HPL',
 'EAP',
 'MWS',
 'EAP',
 'MWS',
 'EAP',
 'HPL']

In [6]:
test[:20]

['Still, as I urged our leaving Ireland with such inquietude and impatience, my father thought it best to yield.',
 'If a fire wanted fanning, it could readily be fanned with a newspaper, and as the government grew weaker, I have no doubt that leather and iron acquired durability in proportion, for, in a very short time, there was not a pair of bellows in all Rotterdam that ever stood in need of a stitch or required the assistance of a hammer.',
 'And when they had broken down the frail door they found only this: two cleanly picked human skeletons on the earthen floor, and a number of singular beetles crawling in the shadowy corners.',
 'While I was thinking how I should possibly manage without them, one actually tumbled out of my head, and, rolling down the steep side of the steeple, lodged in the rain gutter which ran along the eaves of the main building.',
 'I am not sure to what limit his knowledge may extend.',
 '"The thick and peculiar mist, or smoke, which distinguishes the Indi

## Data preprocessing

The first step when building a neural network model is getting your data into the proper form to feed into the network. Since we're using embedding layers, we'll need to encode each word with an integer. We'll also want to clean it up a bit.

You can see an example of the reviews data above. We'll want to get rid of those periods. Also, you might notice that the reviews are delimited with newlines `\n`. To deal with those, I'm going to split the text into each review using `\n` as the delimiter. Then I can combined all the reviews back together into one big string.

First, let's remove all punctuation. Then get all the text without the newlines and split it into individual words.

In [7]:
from string import punctuation
all_text = ' '.join([c for c in reviews + test if c not in punctuation])

words = all_text.split()

In [8]:
all_text[:305]

'This process, however, afforded me no means of ascertaining the dimensions of my dungeon; as I might make its circuit, and return to the point whence I set out, without being aware of the fact; so perfectly uniform seemed the wall. It never once occurred to me that the fumbling might be a mere mistake. I'

In [9]:
words[:100]

['This',
 'process,',
 'however,',
 'afforded',
 'me',
 'no',
 'means',
 'of',
 'ascertaining',
 'the',
 'dimensions',
 'of',
 'my',
 'dungeon;',
 'as',
 'I',
 'might',
 'make',
 'its',
 'circuit,',
 'and',
 'return',
 'to',
 'the',
 'point',
 'whence',
 'I',
 'set',
 'out,',
 'without',
 'being',
 'aware',
 'of',
 'the',
 'fact;',
 'so',
 'perfectly',
 'uniform',
 'seemed',
 'the',
 'wall.',
 'It',
 'never',
 'once',
 'occurred',
 'to',
 'me',
 'that',
 'the',
 'fumbling',
 'might',
 'be',
 'a',
 'mere',
 'mistake.',
 'In',
 'his',
 'left',
 'hand',
 'was',
 'a',
 'gold',
 'snuff',
 'box,',
 'from',
 'which,',
 'as',
 'he',
 'capered',
 'down',
 'the',
 'hill,',
 'cutting',
 'all',
 'manner',
 'of',
 'fantastic',
 'steps,',
 'he',
 'took',
 'snuff',
 'incessantly',
 'with',
 'an',
 'air',
 'of',
 'the',
 'greatest',
 'possible',
 'self',
 'satisfaction.',
 'How',
 'lovely',
 'is',
 'spring',
 'As',
 'we',
 'looked',
 'from',
 'Windsor']

### Encoding the words

The embedding lookup requires that we pass in integers to our network. The easiest way to do this is to create dictionaries that map the words in the vocabulary to integers. Then we can convert each of our reviews into integers so they can be passed into the network.

> **Exercise:** Now you're going to encode the words with integers. Build a dictionary that maps words to integers. Later we're going to pad our input vectors with zeros, so make sure the integers **start at 1, not 0**.
> Also, convert the reviews to integers and store the reviews in a new list called `reviews_ints`. 

In [10]:
reviews[2]

'In his left hand was a gold snuff box, from which, as he capered down the hill, cutting all manner of fantastic steps, he took snuff incessantly with an air of the greatest possible self satisfaction.'

In [11]:
# Create your dictionary that maps vocab words to integers here
vocab_to_int = {word:index for index,word in enumerate(set(words),1)}
vocab_to_int["<PAD>"] = 0

# Convert the reviews to integers, same shape as reviews list, but with integers
reviews_ints = []
for review in reviews:
    reviews_ints.append([vocab_to_int[word] for word in review.split()])
    
test_ints = []
for test_line in test:
    test_ints.append([vocab_to_int[word] for word in test_line.split()])

In [12]:
reviews_ints[1]

[22314,
 14422,
 33114,
 12459,
 15234,
 962,
 23712,
 37288,
 28133,
 54385,
 10575,
 10288,
 39539,
 44237]

In [13]:
test_ints[1]

[4706,
 10288,
 45065,
 469,
 12926,
 15702,
 39980,
 41619,
 10575,
 13586,
 2782,
 10288,
 32173,
 46082,
 55030,
 37288,
 7771,
 49109,
 44164,
 46256,
 45102,
 29069,
 390,
 23712,
 16183,
 46082,
 36934,
 31305,
 5063,
 10413,
 679,
 54562,
 10413,
 10288,
 12752,
 50255,
 8039,
 4195,
 947,
 52278,
 10288,
 9420,
 1218,
 43184,
 10413,
 31905,
 52279,
 23712,
 36578,
 20337,
 10413,
 1158,
 1218,
 10288,
 23449,
 54692,
 49239,
 37288,
 50594,
 1218,
 10288,
 3310]

### Encoding the labels

Our labels are "positive" or "negative". To use these labels in our network, we need to convert them to 0 and 1.

> **Exercise:** Convert labels from `positive` and `negative` to 1 and 0, respectively.

In [14]:
labels_to_int = {}
int_to_labels = {}
unique_labels = list(set(labels))
for i,label in enumerate(unique_labels):
    labels_to_int[label] = i
    int_to_labels[i] = label
    
int_labels = []

for label in labels:
    int_labels.append(labels_to_int[label])
    
print(labels_to_int)
print(int_to_labels)
print(int_labels[:10])

{'HPL': 0, 'MWS': 1, 'EAP': 2}
{0: 'HPL', 1: 'MWS', 2: 'EAP'}
[2, 0, 2, 1, 0, 1, 2, 2, 2, 1]


In [15]:
encoder = preprocessing.LabelBinarizer()
encoder.fit(list(set(int_labels)))
one_hot_labels = encoder.transform(int_labels)
                                   
one_hot_labels

array([[0, 0, 1],
       [1, 0, 0],
       [0, 0, 1],
       ..., 
       [0, 0, 1],
       [0, 0, 1],
       [1, 0, 0]])

In [16]:
from collections import Counter
review_lens = Counter([len(x) for x in reviews_ints])
print("Zero-length reviews: {}".format(review_lens[0]))
print("Maximum review length: {}".format(max(review_lens)))
print("Minimum length: {}".format(min(review_lens)))
print("Average length: {}".format(sum(review_lens)/len(review_lens)))

Zero-length reviews: 0
Maximum review length: 861
Minimum length: 2
Average length: 92.09655172413792


In [17]:
test_lens = Counter([len(x) for x in test_ints])
print("Zero-length reviews: {}".format(test_lens[0]))
print("Maximum review length: {}".format(max(test_lens)))
print("Minimum length: {}".format(min(test_lens)))
print("Average length: {}".format(sum(test_lens)/len(test_lens)))

Zero-length reviews: 0
Maximum review length: 818
Minimum length: 3
Average length: 77.57377049180327


The maximum review length is way too many steps for our RNN. Let's truncate to 100 steps. For reviews shorter than 100, we'll pad with 0s. For reviews longer than 100, we can truncate them to the first 200 characters.

Now, create an array `features` that contains the data we'll pass to the network. The data should come from text ints, since we want to feed integers to the network. Each row should be 100 elements long. For lubes shorter than 200 words, left pad with 0s. 


In [18]:
seq_len = 100
features = []

for review in reviews_ints:
    review_size = len(review)
    if review_size < seq_len:
        padded_review = [0] * seq_len
        padded_review[seq_len-len(review):seq_len] = review
    elif review_size > seq_len:
        padded_review = review[:seq_len]
    
    features.append(padded_review)
features  = np.array(features)

In [19]:
features[:10,:100]

array([[    0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,  8189,  6231, 55894, 20869,
          962, 29069,  7986,  1218, 42194, 37288, 45747,  1218,  9844,
        13491, 55030, 46256, 54385, 50960, 30648, 10581, 46082, 24424,
        15234, 37288, 32288,  4502, 46256, 38599,  9141, 27525, 40101,
        51384,  1218, 37288, 48094, 36199, 41218, 36998,  6114, 37288,
        47294],
       [    0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     

In [20]:
test_features = []

for test_line in test_ints:
    line_size = len(test_line)
    if line_size < seq_len:
        padded_line = [0] * seq_len
        padded_line[seq_len-len(test_line):seq_len] = test_line
    elif line_size > seq_len:
        padded_line = test_line[:seq_len]
        
    test_features.append(padded_line)

test_features = np.array(test_features)

In [21]:
test_features.shape

(8392, 100)

In [22]:
test_features[:10,:100]

array([[    0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
        30254, 55030, 46256,  7670,  2639, 28922, 26572,  2782, 25987,
        31282, 46082, 35983,  9844, 23513, 50335, 15702,  7965, 15234,
        15058],
       [    0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     

## Training, Validation, Test



With our data in nice shape, we'll split it into training, validation, and test sets.

In [23]:
split_frac = 0.8

split_index  = int(len(features)*split_frac)

train_x, val_x = features[:split_index],features[split_index:]
train_y, val_y = one_hot_labels[:split_index],one_hot_labels[split_index:]

split_index = int(len(val_x)/2)

val_x, test_x = val_x[:split_index],val_x[split_index:]
val_y, test_y = val_y[:split_index],val_y[split_index:]

print("\t\t\tFeature Shapes:")
print("Train set: \t\t{}".format(train_x.shape), 
      "\nValidation set: \t{}".format(val_x.shape),
      "\nTest set: \t\t{}".format(test_x.shape))


			Feature Shapes:
Train set: 		(15663, 100) 
Validation set: 	(1958, 100) 
Test set: 		(1958, 100)


## Build the graph

Here, we'll build the graph. First up, defining the hyperparameters.

* `lstm_size`: Number of units in the hidden layers in the LSTM cells. Usually larger is better performance wise. Common values are 128, 256, 512, etc.
* `lstm_layers`: Number of LSTM layers in the network. I'd start with 1, then add more if I'm underfitting.
* `batch_size`: The number of reviews to feed the network in one training pass. Typically this should be set as high as you can go without running out of memory.
* `learning_rate`: Learning rate

In [24]:
lstm_size = 500
lstm_layers = 1
batch_size = 500
learning_rate = 0.005
dropout_prob = 0.5

For the network itself, we'll be passing in our 100 element long review vectors. Each batch will be `batch_size` vectors. We'll also be using dropout on the LSTM layer, so we'll make a placeholder for the keep probability.

In [25]:
n_words = len(vocab_to_int)

# Create the graph object
graph = tf.Graph()
# Add nodes to the graph
with graph.as_default():
    inputs_ = tf.placeholder(tf.int32,shape=[batch_size,None],name="inputs")
    labels_ = tf.placeholder(tf.int32,shape=[batch_size,len(unique_labels)],name = "labels")
    keep_prob = tf.placeholder(tf.float32,name ="keep_prob")
    learning_rate_ = tf.placeholder(tf.float32,name = "learning_rate") 

### Embedding

Now we'll add an embedding layer. We need to do this because there are many words in our vocabulary. It is massively inefficient to one-hot encode our classes here. Instead of one-hot encoding, we can have an embedding layer and use that layer as a lookup table. You could train an embedding layer using word2vec, then load it here. But, it's fine to just make a new layer and let the network learn the weights.

 Create the embedding lookup matrix as a `tf.Variable`. Use that embedding matrix to get the embedded vectors to pass to the LSTM cell with [`tf.nn.embedding_lookup`](https://www.tensorflow.org/api_docs/python/tf/nn/embedding_lookup). This function takes the embedding matrix and an input tensor, such as the review vectors. Then, it'll return another tensor with the embedded vectors. So, if the embedding layer has 200 units, the function will return a tensor with size [batch_size, 200].



In [26]:
# Size of the embedding vectors (number of units in the embedding layer)
embed_size = 300 

with graph.as_default():
    embedding = tf.Variable(tf.random_uniform((n_words,embed_size),-0.5,0.5))
    embed = tf.nn.embedding_lookup(embedding,inputs_)

### LSTM cell

<img src="assets/network_diagram.png" width=400px>

Next, we'll create our LSTM cells to use in the recurrent network ([TensorFlow documentation](https://www.tensorflow.org/api_docs/python/tf/contrib/rnn)). Here we are just defining what the cells look like. This isn't actually building the graph, just defining the type of cells we want in our graph.

To create a basic LSTM cell for the graph, you'll want to use `tf.contrib.rnn.BasicLSTMCell`. Looking at the function documentation:

```
tf.contrib.rnn.BasicLSTMCell(num_units, forget_bias=1.0, input_size=None, state_is_tuple=True, activation=<function tanh at 0x109f1ef28>)
```

you can see it takes a parameter called `num_units`, the number of units in the cell, called `lstm_size` in this code. So then, you can write something like 

```
lstm = tf.contrib.rnn.BasicLSTMCell(num_units)
```

to create an LSTM cell with `num_units`. Next, you can add dropout to the cell with `tf.contrib.rnn.DropoutWrapper`. This just wraps the cell in another cell, but with dropout added to the inputs and/or outputs. It's a really convenient way to make your network better with almost no effort! So you'd do something like

```
drop = tf.contrib.rnn.DropoutWrapper(cell, output_keep_prob=keep_prob)
```

Most of the time, your network will have better performance with more layers. That's sort of the magic of deep learning, adding more layers allows the network to learn really complex relationships. Again, there is a simple way to create multiple layers of LSTM cells with `tf.contrib.rnn.MultiRNNCell`:

```
cell = tf.contrib.rnn.MultiRNNCell([drop] * lstm_layers)
```

Here, `[drop] * lstm_layers` creates a list of cells (`drop`) that is `lstm_layers` long. The `MultiRNNCell` wrapper builds this into multiple layers of RNN cells, one for each cell in the list.

So the final cell you're using in the network is actually multiple (or just one) LSTM cells with dropout. But it all works the same from an achitectural viewpoint, just a more complicated graph in the cell.

> **Exercise:** Below, use `tf.contrib.rnn.BasicLSTMCell` to create an LSTM cell. Then, add drop out to it with `tf.contrib.rnn.DropoutWrapper`. Finally, create multiple LSTM layers with `tf.contrib.rnn.MultiRNNCell`.

Here is [a tutorial on building RNNs](https://www.tensorflow.org/tutorials/recurrent) that will help you out.


In [27]:
with graph.as_default():
    # Your basic LSTM cell
    #lstm = tf.contrib.rnn.BasicLSTMCell(num_units=lstm_size)
    
    # Add dropout to the cell
    #drop = tf.contrib.rnn.DropoutWrapper(lstm,output_keep_prob=keep_prob)
    
    # Stack up multiple LSTM layers, for deep learning
    #cell = tf.contrib.rnn.MultiRNNCell([drop]*lstm_layers)
    cell_list = [tf.contrib.rnn.DropoutWrapper(tf.contrib.rnn.BasicLSTMCell(num_units=lstm_size) ,output_keep_prob=keep_prob)  ]
    cell = tf.contrib.rnn.MultiRNNCell(cell_list)
    # Getting an initial state of all zeros
    initial_state = cell.zero_state(batch_size, tf.float32)

### RNN forward pass

<img src="assets/network_diagram.png" width=400px>

Now we need to actually run the data through the RNN nodes. You can use [`tf.nn.dynamic_rnn`](https://www.tensorflow.org/api_docs/python/tf/nn/dynamic_rnn) to do this. You'd pass in the RNN cell you created (our multiple layered LSTM `cell` for instance), and the inputs to the network.

```
outputs, final_state = tf.nn.dynamic_rnn(cell, inputs, initial_state=initial_state)
```

Above I created an initial state, `initial_state`, to pass to the RNN. This is the cell state that is passed between the hidden layers in successive time steps. `tf.nn.dynamic_rnn` takes care of most of the work for us. We pass in our cell and the input to the cell, then it does the unrolling and everything else for us. It returns outputs for each time step and the final_state of the hidden layer.

> **Exercise:** Use `tf.nn.dynamic_rnn` to add the forward pass through the RNN. Remember that we're actually passing in vectors from the embedding layer, `embed`.



In [28]:
with graph.as_default():
    outputs, final_state = tf.nn.dynamic_rnn(cell,embed,initial_state=initial_state)

### Output

We only care about the final output, we'll be using that as our sentiment prediction. So we need to grab the last output with `outputs[:, -1]`, the calculate the cost from that and `labels_`.

In [29]:
with graph.as_default():
    fully_connected = tf.contrib.layers.fully_connected(outputs[:, -1], 10, activation_fn=tf.nn.relu)
    fully_connected = tf.contrib.layers.fully_connected(fully_connected, 3, activation_fn=tf.nn.relu)
    logits = tf.identity(fully_connected)
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = logits, labels=labels_))
    
    optimizer = tf.train.AdamOptimizer(learning_rate_).minimize(cost)

### Validation accuracy

Here we can add a few nodes to calculate the accuracy which we'll use in the validation pass.

In [30]:
with graph.as_default():
    predictions = tf.nn.softmax(logits)
    predictions_hardmax = tf.argmax(predictions,1)
#    correct_pred = tf.equal(tf.cast(tf.round(predictions), tf.int32), labels_)
#    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

### Batching

This is a simple function for returning batches from our data. First it removes data such that we only have full batches. Then it iterates through the `x` and `y` arrays and returns slices out of those arrays with size `[batch_size]`.

In [31]:
def get_batches(x, y, batch_size=100):
    
    n_batches = len(x)//batch_size
    x, y = x[:n_batches*batch_size], y[:n_batches*batch_size]
    for ii in range(0, len(x), batch_size):
        yield x[ii:ii+batch_size], y[ii:ii+batch_size]

## Training

Below is the typical training code. If you want to do this yourself, feel free to delete all this code and implement it yourself. Before you run this, make sure the `checkpoints` directory exists.

In [None]:
def calc_classification_metrics(predictions,real_values):
    accuracy =  sum(predictions == real_values)/predictions.shape[0] # metrics.accuracy_score(predictions,real_values)
    error = 1 - accuracy
    precision = 0# metrics.precision_score(predictions,real_values)
    recall = 0#metrics.recall_score(predictions,real_values)
    
    return accuracy,error,precision,recall


In [None]:
epochs = 5

with graph.as_default():
    saver = tf.train.Saver()

with tf.Session(graph=graph) as sess:
    sess.run(tf.global_variables_initializer())
    iteration = 1
    for e in range(epochs):
        state = sess.run(initial_state)
        
        for ii, (x, y) in enumerate(get_batches(train_x, train_y, batch_size), 1):
            feed_dict = {inputs_: x,
                    labels_: y,
                    keep_prob: dropout_prob,
                    initial_state: state,
                        learning_rate_ : learning_rate}
            loss, state, _ = sess.run([cost, final_state, optimizer], feed_dict=feed_dict)
            
            if iteration%5==0:
                val_acc = []
                val_costs = []
                
                train_prediction_hardmax = sess.run(predictions_hardmax,feed_dict=feed_dict)
                train_real_hardmax = np.argmax(y,1)
                train_accuracy,train_error,train_precision,train_recall  = calc_classification_metrics(train_prediction_hardmax,train_real_hardmax)
                
                val_state = sess.run(cell.zero_state(batch_size, tf.float32))
                
                for x, y in get_batches(val_x, val_y, batch_size):
                    feed_dict = {inputs_: x,
                            labels_: y,
                            keep_prob: 1,
                            initial_state: val_state,
                        learning_rate_ : learning_rate}
                    
                    val_prediction = sess.run(predictions,feed_dict=feed_dict)
                    val_cost = sess.run(cost,feed_dict=feed_dict)
                    val_prediction_hardmax = sess.run(predictions_hardmax,feed_dict=feed_dict)
                    val_real_hardmax = np.argmax(y,1)
                    val_accuracy,val_error,val_precision,val_recall  = calc_classification_metrics(val_prediction_hardmax,val_real_hardmax)
                    val_acc.append(val_accuracy)
                    val_costs.append(val_cost)
                
                val_cost = np.mean(val_costs)  
                val_acc  = np.mean(val_acc)
                print("Epoch: {}/{}".format(e+1, epochs),
                      "Iteration: {}".format(iteration),
                      "Train loss: {:.3f}".format(loss),
                      "Train accuracy: {:.3f}".format(train_accuracy),
                     "Train error: {:.3f}".format(train_error),
                      "Cal cost: {:.3f}".format(val_cost),
                      "Val acc: {:.3f}".format(val_acc)
                     )

                    
            if iteration%25==0:
                val_acc = []
                val_costs = []
                val_state = sess.run(cell.zero_state(batch_size, tf.float32))
                for x, y in get_batches(val_x, val_y, batch_size):
                    feed_dict = {inputs_: x,
                            labels_: y,
                            keep_prob: 1,
                            initial_state: val_state,
                        learning_rate_ : learning_rate}
                    
                    val_prediction = sess.run(predictions,feed_dict=feed_dict)
                    val_cost = sess.run(cost,feed_dict=feed_dict)
                    val_prediction_hardmax = sess.run(predictions_hardmax,feed_dict=feed_dict)
                    val_real_hardmax = np.argmax(y,1)
                    val_accuracy,val_error,val_precision,val_recall  = calc_classification_metrics(val_prediction_hardmax,val_real_hardmax)
                    val_acc.append(val_accuracy)
                    val_costs.append(val_cost)
                    
                print("val cost",np.mean(val_cost) , np.mean(val_acc) )
                    
                    #batch_acc, val_state = sess.run([accuracy, final_state], feed_dict=feed)
                    #val_acc.append(batch_acc)
                #print("Val acc: {:.3f}".format(np.mean(val_acc)))
            iteration +=1
            learning_rate = 0.99 * learning_rate
    saver.save(sess, "checkpoints/checkpoint.ckpt")

Epoch: 1/5 Iteration: 5 Train loss: 1.162 Train accuracy: 0.430 Train error: 0.570 Cal cost: 1.090 Val acc: 0.391
Epoch: 1/5 Iteration: 10 Train loss: 1.099 Train accuracy: 0.286 Train error: 0.714 Cal cost: 1.099 Val acc: 0.303
Epoch: 1/5 Iteration: 15 Train loss: 1.099 Train accuracy: 0.310 Train error: 0.690 Cal cost: 1.099 Val acc: 0.303
Epoch: 1/5 Iteration: 20 Train loss: 1.099 Train accuracy: 0.270 Train error: 0.730 Cal cost: 1.099 Val acc: 0.303
Epoch: 1/5 Iteration: 25 Train loss: 1.099 Train accuracy: 0.278 Train error: 0.722 Cal cost: 1.099 Val acc: 0.303
val cost 1.09861 0.302666666667
Epoch: 1/5 Iteration: 30 Train loss: 1.099 Train accuracy: 0.300 Train error: 0.700 Cal cost: 1.099 Val acc: 0.303
Epoch: 2/5 Iteration: 35 Train loss: 1.099 Train accuracy: 0.280 Train error: 0.720 Cal cost: 1.099 Val acc: 0.303
Epoch: 2/5 Iteration: 40 Train loss: 1.099 Train accuracy: 0.298 Train error: 0.702 Cal cost: 1.099 Val acc: 0.303
Epoch: 2/5 Iteration: 45 Train loss: 1.099 Train 

KeyboardInterrupt: 

## Testing

In [None]:
test_acc = []
with tf.Session(graph=graph) as sess:
    saver.restore(sess, tf.train.latest_checkpoint('checkpoints'))
    test_state = sess.run(cell.zero_state(batch_size, tf.float32))
    for ii, (x, y) in enumerate(get_batches(test_x, test_y, batch_size), 1):
        print(x.shape,y.shape)
        feed_dict = {inputs_: x,
                labels_: y,
                keep_prob: 1,
                initial_state: test_state,
                        learning_rate_ : learning_rate}
        
        test_prediction = sess.run(predictions,feed_dict=feed_dict)
        test_cost = sess.run(cost,feed_dict=feed_dict)
        test_prediction_hardmax = sess.run(predictions_hardmax,feed_dict=feed_dict)
        test_real_hardmax = np.argmax(y,1)
        test_accuracy,test_error,test_precision,test_recall  = calc_classification_metrics(test_prediction_hardmax,test_real_hardmax)
        print("test cost",test_cost , test_accuracy )
        #batch_acc, test_state = sess.run([accuracy, final_state], feed_dict=feed)
        #test_acc.append(batch_acc)
        #print(x.shape,y.shape)
    #print("Test accuracy: {:.3f}".format(np.mean(test_acc)))

## Generate submission file

In [None]:
with tf.Session(graph=graph) as sess:
    saver.restore(sess, tf.train.latest_checkpoint('checkpoints'))
    test_state = sess.run(cell.zero_state(batch_size, tf.float32))
    
    full_batches =  int(test_features.shape[0]/batch_size)
    submit_file = open("submission.csv","w")
    csv_writer = csv.writer(submit_file,delimiter = ",")
    header = ["id","EAP","HPL","MWS"]
    csv_writer.writerow(header)
    
    # TODO: find a way to send 1 line at a time without having to take special care of last incomplete batch
    for i in range(full_batches):
        batch = test_features[ (i*batch_size):((i+1)*batch_size)]

        feed_dict = {inputs_: batch,
                keep_prob: 1,
                initial_state: test_state,
                        learning_rate_ : learning_rate}
    
        test_prediction = sess.run(predictions,feed_dict=feed_dict)
        test_prediction_hardmax = np.argmax(test_prediction,1)
    
        for j,prediction in enumerate(test_prediction_hardmax):
            EAP_prob = test_prediction[j,labels_to_int["EAP"]]
            HPL_prob = test_prediction[j,labels_to_int["HPL"]]
            MWS_prob = test_prediction[j,labels_to_int["MWS"]]
        
            line = [test_ids[(i*batch_size + j)] , EAP_prob, HPL_prob, MWS_prob]
            csv_writer.writerow(line)
        
        print("finished batch {}".format(i))
            #print("Finished writing {}".format(i))
            #print(test_ids[i] ,test[i][:50] , " ", int_to_labels[test_prediction_hardmax] , EAP_prob, HPL_prob, MWS_prob)
       
    if test_features.shape[0]%batch_size != 0:
        print("Last minibatch" ,full_batches*batch_size,",", test_features.shape[0])
        
        i+=1
        batch = np.zeros((batch_size,test_features.shape[1]))
        batch[0:(test_features.shape[0]-full_batches*batch_size),:] = test_features[full_batches*batch_size:,:]
        
        feed_dict = {inputs_: batch,
                keep_prob: 1,
                initial_state: test_state,
                        learning_rate_ : learning_rate}
    
        test_prediction = sess.run(predictions,feed_dict=feed_dict)
        test_prediction_hardmax = np.argmax(test_prediction,1)

        for j in range(test_features.shape[0]-full_batches*batch_size):
            EAP_prob = test_prediction[j,labels_to_int["EAP"]]
            HPL_prob = test_prediction[j,labels_to_int["HPL"]]
            MWS_prob = test_prediction[j,labels_to_int["MWS"]]
        
            line = [test_ids[(i*batch_size + j)] , EAP_prob, HPL_prob, MWS_prob]
            csv_writer.writerow(line)
        
        print("finished batch {}".format(i))

    submit_file.close()