## RNN for Human Activity Recognition - 2D Pose Input

This experiment classifies human activities using a 2D pose time series dataset and an LSTM RNN.
The idea is to prove the concept that a series of 2D poses, rather than 3D poses or raw 2D images, can produce an accurate estimation of the behaviour of a person or animal.
This is a step towards creating a method of classifying an animal's current behaviour state and predicting its likely next state, allowing for better interaction with an autonomous mobile robot.

## Objectives

The aims of this experiment are:

- To determine if 2D pose has comparable accuracy to 3D pose for use in activity recognition. This would allow the use of RGB-only cameras for human and animal pose estimation, as opposed to RGBD or a large motion capture dataset.


- To determine if 2D pose has comparable accuracy to using raw RGB images for use in activity recognition. This is based on the idea that limiting the input feature vector can help to deal with a limited dataset, as is likely to occur in animal activity recognition, by allowing for a smaller model to be used (citation required).


- To verify the concept for use in future works involving behaviour prediction from motion in 2D images.

The network used in this experiment is based on that of Guillaume Chevalier, 'LSTMs for Human Activity Recognition', 2016, https://github.com/guillaume-chevalier/LSTM-Human-Activity-Recognition, available under the MIT License.
Notable changes that have been made (other than accounting for dataset sizes) are:
 - Adapting for use with a large dataset ordered by class, using random sampling without replacement for mini-batches.  
 This allows for use of smaller batch sizes when using a dataset ordered by class. "It has been observed in practice that when using a larger batch there is a significant degradation in the quality of the model, as measured by its ability to generalize"  
      _N. S. Keskar, D. Mudigere, et al., 'On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima', ICLR 2017_ https://arxiv.org/abs/1609.04836
      
 - Exponentially decaying learning rate implemented
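
The sampling scheme in the first point can be sketched independently of TensorFlow. The following is a minimal illustration rather than the notebook's implementation: `iterate_minibatches` is a hypothetical helper that shuffles all indices once per pass with NumPy and yields disjoint batches, dropping any remainder smaller than a batch. The notebook itself draws indices one at a time from a list, which should give equivalent batches.

```python
import numpy as np

def iterate_minibatches(n_samples, batch_size, rng):
    """Yield index arrays that cover a shuffled dataset without replacement.

    Hypothetical helper for illustration only. Leftover samples that do not
    fill a whole batch are dropped, similar to resetting the pool of indices
    when fewer than batch_size remain.
    """
    order = rng.permutation(n_samples)
    for start in range(0, n_samples - batch_size + 1, batch_size):
        yield order[start:start + batch_size]

rng = np.random.default_rng(0)
# Toy labels ordered by class, as in the dataset described here.
labels = np.repeat(np.arange(6), 100)
batches = list(iterate_minibatches(len(labels), 64, rng))
seen = np.concatenate(batches)

print("batches per pass:", len(batches))
print("unique samples drawn:", len(set(seen.tolist())))
print("class counts in first batch:", np.bincount(labels[batches[0]], minlength=6))
```

Because every batch is drawn from a shuffled index list, a batch of any size mixes classes even though the underlying files are ordered by class.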



## Dataset overview

The dataset consists of pose estimations made using OpenPose (https://github.com/CMU-Perceptual-Computing-Lab/openpose) on a subset of the Berkeley Multimodal Human Action Database (MHAD), http://tele-immersion.citris-uc.org/berkeley_mhad.

This subset comprises 12 subjects performing the following 6 actions, each repeated 5 times and filmed from 4 angles.  

- JUMPING,
- JUMPING_JACKS,
- BOXING,
- WAVING_2HANDS,
- WAVING_1HAND,
- CLAPPING_HANDS.

In total, there are 1438 videos (2 were missing) made up of 211200 individual frames.

The image below shows the 4 camera views during the 'boxing' action for subject 1.

![alt text](images/boxing_all_views.gif.png "Title")

The input for the LSTM is the 2D position of 18 joints across a time series of n_steps frames (the window width), with an associated class label for each frame series.  
A single frame's input (where j refers to a joint) is stored as:

[  j0_x,  j0_y, j1_x, j1_y , j2_x, j2_y, j3_x, j3_y, j4_x, j4_y, j5_x, j5_y, j6_x, j6_y, j7_x, j7_y, j8_x, j8_y, j9_x, j9_y, j10_x, j10_y, j11_x, j11_y, j12_x, j12_y, j13_x, j13_y, j14_x, j14_y, j15_x, j15_y, j16_x, j16_y, j17_x, j17_y ]
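
To make the layout concrete, a flat 36-value frame can be viewed as an 18 x 2 array of (x, y) pairs. This is a small sketch under the layout stated above; `frame_to_joints` is a hypothetical helper and the values are placeholders, not real pose data.

```python
import numpy as np

N_JOINTS = 18  # joints per frame, as described above

def frame_to_joints(frame):
    """Reshape a flat [j0_x, j0_y, ..., j17_x, j17_y] vector into shape (18, 2)."""
    frame = np.asarray(frame, dtype=np.float32)
    if frame.shape != (2 * N_JOINTS,):
        raise ValueError("expected 36 values, got shape {}".format(frame.shape))
    return frame.reshape(N_JOINTS, 2)

flat = np.arange(36, dtype=np.float32)  # placeholder values 0..35
joints = frame_to_joints(flat)

print(joints.shape)  # (18, 2)
print(joints[1])     # x and y of joint 1, i.e. elements 2 and 3 of the flat vector
```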

For the following experiment, very little preprocessing has been done to the dataset.  
The following steps were taken:
1. OpenPose was run on individual frames, for each subject, action and view, outputting a JSON file of 18 joint x and y position keypoints and accuracies per frame.
2. The JSON files were converted into txt format, keeping only the x and y positions of each frame, the action being performed during the frame, and the order of frames. This is used to create a database of associated activity class number and corresponding series of joint 2D positions.
3. No further preprocessing was performed.  

In some frames multiple people were detected, in which case only the first detection was used.

The data has not been normalised with regards to subject position in the frame, motion across frame (if any), size of the subject, speed of action etc. It is essentially the raw 2D position of each joint viewed from a stationary camera.  
In many cases, individual joints were not located and a position of [0.0,0.0] was given for that joint.
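
Since undetected joints are stored as [0.0, 0.0], they can be found after the fact. The sketch below is not part of the experiment (no such filtering was applied). It only shows one way to flag them, assuming a window shaped (n_steps, 36) in the layout described above; `missing_joint_mask` is a hypothetical helper and the data is synthetic.

```python
import numpy as np

def missing_joint_mask(window):
    """Return a boolean (n_steps, 18) mask, True where a joint is exactly (0, 0)."""
    window = np.asarray(window, dtype=np.float32)
    joints = window.reshape(window.shape[0], -1, 2)  # (n_steps, 18, 2)
    return np.all(joints == 0.0, axis=-1)

# Synthetic window: 32 frames of 18 joints, with two joints zeroed out by hand.
rng = np.random.default_rng(0)
window = rng.uniform(1.0, 640.0, size=(32, 36)).astype(np.float32)
window[0, 0:2] = 0.0    # joint 0 missing in frame 0
window[5, 10:12] = 0.0  # joint 5 missing in frame 5

mask = missing_joint_mask(window)
print("missing joints:", int(mask.sum()))
print("fraction missing:", float(mask.mean()))
```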

A summary of the dataset used for input is:

 - 211200 individual images 
 - n_steps = 32 frames (~=1.5s at 22Hz)
 - Images with noisy pose detection (detection of >=2 people) = 5132  
 - Training_split = 0.8
 - Overlap = 0.8125 (26 / 32), i.e. a 26-frame overlap between consecutive series
   - Length X_train = 22625 * 32 frames
   - Length X_test = 5751 * 32 frames
   
Note that there is no overlap between the test and train sets, which were separated entirely by activity repetition before creating the 26 of 32 frame overlap.
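
The windowing summarised above can be sketched as follows. This assumes windows are cut per video with a fixed stride of n_steps - overlap = 6 frames and that trailing frames which do not fill a window are discarded. The exact script used to build the database is not shown here, so `make_windows` is a hypothetical stand-in.

```python
import numpy as np

def make_windows(frames, n_steps=32, overlap=26):
    """Cut a (n_frames, n_features) sequence into overlapping windows."""
    stride = n_steps - overlap  # 6 frames between consecutive window starts
    frames = np.asarray(frames)
    starts = range(0, len(frames) - n_steps + 1, stride)
    windows = [frames[s:s + n_steps] for s in starts]
    if not windows:  # sequence shorter than one window
        return np.empty((0, n_steps) + frames.shape[1:], dtype=frames.dtype)
    return np.stack(windows)

# Placeholder "video" of 100 frames with 36 features per frame.
video = np.arange(100 * 36, dtype=np.float32).reshape(100, 36)
windows = make_windows(video)

print(windows.shape)
# Consecutive windows share 26 frames.
print(np.array_equal(windows[0][6:], windows[1][:26]))
```

Windowing each repetition separately, after the train/test split, is what keeps overlapping frames from leaking between the two sets.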




## Training and Results below: 
Training took approximately 4 minutes on a single GTX1080Ti, and was run for approximately 22,000,000 iterations with a batch size of 5000 (600 epochs).


In [1]:

import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import tensorflow as tf  # Version 1.0.0 (some previous versions are used in past commits)
from sklearn import metrics
import random
from random import randint
import time
import os

## Preparing dataset:

In [2]:
# Useful Constants

# Output classes to learn how to classify
LABELS = [    
    "JUMPING",
    "JUMPING_JACKS",
    "BOXING",
    "WAVING_2HANDS",
    "WAVING_1HAND",
    "CLAPPING_HANDS"

] 
DATASET_PATH = "data/HAR_pose_activities/database/"

X_train_path = DATASET_PATH + "X_train.txt"
X_test_path = DATASET_PATH + "X_test.txt"

y_train_path = DATASET_PATH + "Y_train.txt"
y_test_path = DATASET_PATH + "Y_test.txt"

n_steps = 32 # 32 timesteps per series

In [3]:

# Load the network's inputs

def load_X(X_path):
    # Each row holds one frame: 36 comma-separated joint coordinates
    with open(X_path, 'r') as file:
        X_ = np.array(
            [row.split(',') for row in file],
            dtype=np.float32
        )
    # Group consecutive frames into series of n_steps frames each
    blocks = len(X_) // n_steps

    X_ = np.array(np.split(X_, blocks))

    return X_ 

# Load the network's outputs

def load_y(y_path):
    # Each row holds the 1-based class label of one series
    with open(y_path, 'r') as file:
        y_ = np.array(
            [row.replace('  ', ' ').strip().split(' ') for row in file],
            dtype=np.int32
        )

    # Shift labels to 0-based indexing
    return y_ - 1

X_train = load_X(X_train_path)
X_test = load_X(X_test_path)
#print X_test

y_train = load_y(y_train_path)
y_test = load_y(y_test_path)
# proof that it actually works for the skeptical: replace labelled classes with random classes to train on
#for i in range(len(y_train)):
#    y_train[i] = randint(0, 5)


## Set Parameters:


In [4]:
# Input Data 

training_data_count = len(X_train)  # 22625 training series (26 of 32 frames overlap between consecutive series)
test_data_count = len(X_test)  # 5751 test series
n_input = len(X_train[0][0])  # num input parameters per timestep

n_hidden = 34 # Hidden layer num of features
n_classes = 6 

#updated for learning-rate decay
# calculated as: decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)
decaying_learning_rate = True
learning_rate = 0.0025 #used if decaying_learning_rate set to False
init_learning_rate = 0.005
decay_rate = 0.96 #the base of the exponential in the decay
decay_steps = 100000 # learning rate is multiplied by decay_rate every 100000 samples (global_step * batch_size)

global_step = tf.Variable(0, trainable=False)
lambda_loss_amount = 0.0015

training_iters = training_data_count *300  # Loop 300 times on the dataset, ie 300 epochs
batch_size = 512
display_iter = batch_size*8  # To show test set accuracy during training

print("(X shape, y shape, every X's mean, every X's standard deviation)")
print((X_train.shape, y_test.shape, np.mean(X_test), np.std(X_test)))
print("\nThe dataset has not been preprocessed, is not normalised etc")




(X shape, y shape, every X's mean, every X's standard deviation)
((22625, 32, 36), (5751, 1), 251.01117, 126.12204)

The dataset has not been preprocessed, is not normalised etc
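
The decay parameters above can be checked by hand. With `staircase=True`, the schedule is init_learning_rate * decay_rate ^ floor(samples_seen / decay_steps), where samples_seen is global_step * batch_size. The plain-Python sketch below mirrors that formula. It is an illustration of the schedule, not a replacement for `tf.train.exponential_decay`, and `staircase_decay` is a hypothetical name.

```python
def staircase_decay(samples_seen, init_lr=0.005, decay_rate=0.96, decay_steps=100000):
    """Staircase exponential decay: init_lr * decay_rate ** floor(samples_seen / decay_steps)."""
    return init_lr * decay_rate ** (samples_seen // decay_steps)

# Sample counts taken from the "Iter #" values printed during training.
for samples in (512, 180224, 360448, 540672):
    print(samples, "{:.6f}".format(staircase_decay(samples)))
```

These values agree with the learning rates printed in the training log for the same iteration counts (0.005000, 0.004800, 0.004424 and 0.004077).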


## Utility functions for training:

In [5]:
def LSTM_RNN(_X, _weights, _biases):
    # model architecture based on "guillaume-chevalier" and "aymericdamien" under the MIT license.

    _X = tf.transpose(_X, [1, 0, 2])  # permute n_steps and batch_size
    _X = tf.reshape(_X, [-1, n_input])   
    # Rectified Linear Unit (ReLU) activation applied to the input projection
    _X = tf.nn.relu(tf.matmul(_X, _weights['hidden']) + _biases['hidden'])
    # Split data because rnn cell needs a list of inputs for the RNN inner loop
    _X = tf.split(_X, n_steps, 0) 

    # Define two stacked LSTM cells (two recurrent layers deep) with tensorflow
    lstm_cell_1 = tf.contrib.rnn.BasicLSTMCell(n_hidden, forget_bias=1.0, state_is_tuple=True)
    lstm_cell_2 = tf.contrib.rnn.BasicLSTMCell(n_hidden, forget_bias=1.0, state_is_tuple=True)
    lstm_cells = tf.contrib.rnn.MultiRNNCell([lstm_cell_1, lstm_cell_2], state_is_tuple=True)
    outputs, states = tf.contrib.rnn.static_rnn(lstm_cells, _X, dtype=tf.float32)

    # A single output is produced, in style of "many to one" classifier, refer to http://karpathy.github.io/2015/05/21/rnn-effectiveness/ for details
    lstm_last_output = outputs[-1]
    
    # Linear activation
    return tf.matmul(lstm_last_output, _weights['out']) + _biases['out']


def extract_batch_size(_train, _labels, _unsampled, batch_size):
    # Fetch a "batch_size" amount of data and labels from "(X|y)_train" data. 
    # Elements of each batch are chosen randomly, without replacement, from X_train with corresponding label from Y_train
    # unsampled_indices keeps track of sampled data ensuring non-replacement. Resets when remaining datapoints < batch_size    
    
    shape = list(_train.shape)
    shape[0] = batch_size
    batch_s = np.empty(shape)
    batch_labels = np.empty((batch_size,1)) 

    for i in range(batch_size):
        # Loop index
        # index = random sample from _unsampled (indices)
        index = random.choice(_unsampled)
        batch_s[i] = _train[index] 
        batch_labels[i] = _labels[index]
        _unsampled.remove(index)


    return batch_s, batch_labels, _unsampled


def one_hot(y_):
    # One hot encoding of the network outputs
    # e.g.: [[5], [0], [3]] --> [[0, 0, 0, 0, 0, 1], [1, 0, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0]]
    
    y_ = y_.reshape(len(y_))
    n_values = int(np.max(y_)) + 1
    return np.eye(n_values)[np.array(y_, dtype=np.int32)]  # Returns FLOATS
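
Note that `one_hot` infers the number of classes from the largest label it sees, so a batch that happens to lack the highest class yields a narrower matrix. One possible alternative, shown here only as a sketch, is to pass the class count explicitly. `one_hot_fixed` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

def one_hot_fixed(labels, n_classes):
    """One-hot encode integer labels of shape (n,) or (n, 1) into (n, n_classes)."""
    labels = np.asarray(labels, dtype=np.int64).reshape(-1)
    if labels.size and (labels.min() < 0 or labels.max() >= n_classes):
        raise ValueError("labels must lie in [0, n_classes)")
    return np.eye(n_classes, dtype=np.float32)[labels]

# The largest label here is 3, yet the output still has 6 columns.
encoded = one_hot_fixed([[3], [0], [1]], n_classes=6)
print(encoded.shape)
print(encoded[0])
```

With a fixed width, the output always has n_classes columns regardless of which labels appear in a batch.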



## Build the network:

In [6]:

# Graph input/output
x = tf.placeholder(tf.float32, [None, n_steps, n_input])
y = tf.placeholder(tf.float32, [None, n_classes])

# Graph weights
weights = {
    'hidden': tf.Variable(tf.random_normal([n_input, n_hidden])), # Hidden layer weights
    'out': tf.Variable(tf.random_normal([n_hidden, n_classes], mean=1.0))
}
biases = {
    'hidden': tf.Variable(tf.random_normal([n_hidden])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

pred = LSTM_RNN(x, weights, biases)

# Loss, optimizer and evaluation
l2 = lambda_loss_amount * sum(
    tf.nn.l2_loss(tf_var) for tf_var in tf.trainable_variables()
) # L2 regularisation discourages this over-parameterised network from overfitting the data
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=pred)) + l2 # Softmax loss
if decaying_learning_rate:
    learning_rate = tf.train.exponential_decay(init_learning_rate, global_step*batch_size, decay_steps, decay_rate, staircase=True)


#decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps) #exponentially decayed learning rate
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost,global_step=global_step) # Adam Optimizer

correct_pred = tf.equal(tf.argmax(pred,1), tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
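
For reference, the cost defined above is the mean softmax cross-entropy plus an L2 penalty over all trainable variables, where `tf.nn.l2_loss(t)` computes sum(t ** 2) / 2. The NumPy sketch below reproduces that arithmetic on toy inputs. The helper names and the toy variables are assumptions for illustration and are unrelated to the trained network.

```python
import numpy as np

def l2_loss(v):
    """Same convention as tf.nn.l2_loss: sum(v ** 2) / 2."""
    return 0.5 * np.sum(np.square(v))

def softmax_cross_entropy(logits, one_hot_labels):
    """Per-example cross-entropy using a numerically stable log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(one_hot_labels * log_probs).sum(axis=1)

def total_cost(logits, one_hot_labels, variables, lambda_loss_amount=0.0015):
    """Mean cross-entropy plus the weighted L2 penalty over all variables."""
    penalty = lambda_loss_amount * sum(l2_loss(v) for v in variables)
    return softmax_cross_entropy(logits, one_hot_labels).mean() + penalty

# Toy example: uniform logits over 6 classes give a cross-entropy of ln(6).
logits = np.zeros((4, 6))
labels = np.eye(6)[[0, 1, 2, 3]]
variables = [np.ones((2, 2)), np.ones(3)]  # placeholder "weights"
cost = total_cost(logits, labels, variables)
print(cost, np.log(6) + 0.0015 * 3.5)
```

Because the penalty is part of `cost`, the "Batch Loss" values printed during training include the L2 term as well as the classification error.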






## Train the network:

In [7]:
test_losses = []
test_accuracies = []
train_losses = []
train_accuracies = []
sess = tf.InteractiveSession(config=tf.ConfigProto(log_device_placement=True))
init = tf.global_variables_initializer()
sess.run(init)

# Perform Training steps with "batch_size" amount of data at each loop. 
# Elements of each batch are chosen randomly, without replacement, from X_train, 
# restarting when remaining datapoints < batch_size
step = 1
time_start = time.time()
unsampled_indices = list(range(0,len(X_train)))

while step * batch_size <= training_iters:
    #print (sess.run(learning_rate)) #decaying learning rate
    #print (sess.run(global_step)) # global number of iterations
    if len(unsampled_indices) < batch_size:
        unsampled_indices = list(range(0,len(X_train))) 
    batch_xs, raw_labels, unsampled_indices = extract_batch_size(X_train, y_train, unsampled_indices, batch_size)
    batch_ys = one_hot(raw_labels)
    # check that encoded output is same length as num_classes, if not, pad it 
    if len(batch_ys[0]) < n_classes:
        temp_ys = np.zeros((batch_size, n_classes))
        temp_ys[:batch_ys.shape[0],:batch_ys.shape[1]] = batch_ys
        batch_ys = temp_ys
       
    

    # Fit training using batch data
    _, loss, acc = sess.run(
        [optimizer, cost, accuracy],
        feed_dict={
            x: batch_xs, 
            y: batch_ys
        }
    )
    train_losses.append(loss)
    train_accuracies.append(acc)
    
    # Evaluate network only at some steps for faster training: 
    if (step*batch_size % display_iter == 0) or (step == 1) or (step * batch_size > training_iters):
        
        # To not spam console, show training accuracy/loss in this "if"
        print(("Iter #" + str(step*batch_size) + \
              ":  Learning rate = " + "{:.6f}".format(sess.run(learning_rate)) + \
              ":   Batch Loss = " + "{:.6f}".format(loss) + \
              ", Accuracy = {}".format(acc)))
        
        # Evaluation on the test set (no learning made here - just evaluation for diagnosis)
        loss, acc = sess.run(
            [cost, accuracy], 
            feed_dict={
                x: X_test,
                y: one_hot(y_test)
            }
        )
        test_losses.append(loss)
        test_accuracies.append(acc)
        print(("PERFORMANCE ON TEST SET:             " + \
              "Batch Loss = {}".format(loss) + \
              ", Accuracy = {}".format(acc)))

    step += 1

print("Optimization Finished!")

# Accuracy for test data

one_hot_predictions, accuracy, final_loss = sess.run(
    [pred, accuracy, cost],
    feed_dict={
        x: X_test,
        y: one_hot(y_test)
    }
)

test_losses.append(final_loss)
test_accuracies.append(accuracy)

print(("FINAL RESULT: " + \
      "Batch Loss = {}".format(final_loss) + \
      ", Accuracy = {}".format(accuracy)))
time_stop = time.time()
print(("TOTAL TIME:  {}".format(time_stop - time_start)))

Iter #512:  Learning rate = 0.005000:   Batch Loss = 3.443691, Accuracy = 0.193359375
PERFORMANCE ON TEST SET:             Batch Loss = 3.2098851203918457, Accuracy = 0.24395757913589478
Iter #4096:  Learning rate = 0.005000:   Batch Loss = 2.988659, Accuracy = 0.326171875
PERFORMANCE ON TEST SET:             Batch Loss = 2.981539011001587, Accuracy = 0.2804729640483856
Iter #8192:  Learning rate = 0.005000:   Batch Loss = 2.966253, Accuracy = 0.271484375
PERFORMANCE ON TEST SET:             Batch Loss = 2.9049172401428223, Accuracy = 0.3110763430595398
Iter #12288:  Learning rate = 0.005000:   Batch Loss = 2.824858, Accuracy = 0.30078125
PERFORMANCE ON TEST SET:             Batch Loss = 2.819492816925049, Accuracy = 0.3625456392765045
Iter #16384:  Learning rate = 0.005000:   Batch Loss = 2.715844, Accuracy = 0.37109375
PERFORMANCE ON TEST SET:             Batch Loss = 2.7031614780426025, Accuracy = 0.4246217906475067
Iter #20480:  Learning rate = 0.005000:   Batch Loss = 2.651761, Ac

Iter #180224:  Learning rate = 0.004800:   Batch Loss = 1.489767, Accuracy = 0.642578125
PERFORMANCE ON TEST SET:             Batch Loss = 1.4729933738708496, Accuracy = 0.6645800471305847
Iter #184320:  Learning rate = 0.004800:   Batch Loss = 1.393531, Accuracy = 0.755859375
PERFORMANCE ON TEST SET:             Batch Loss = 1.479292869567871, Accuracy = 0.7108328938484192
Iter #188416:  Learning rate = 0.004800:   Batch Loss = 1.377377, Accuracy = 0.73828125
PERFORMANCE ON TEST SET:             Batch Loss = 1.4335952997207642, Accuracy = 0.7303077578544617
Iter #192512:  Learning rate = 0.004800:   Batch Loss = 1.510360, Accuracy = 0.697265625
PERFORMANCE ON TEST SET:             Batch Loss = 1.4225130081176758, Accuracy = 0.7268301248550415
Iter #196608:  Learning rate = 0.004800:   Batch Loss = 1.525767, Accuracy = 0.67578125
PERFORMANCE ON TEST SET:             Batch Loss = 1.5131696462631226, Accuracy = 0.684750497341156
Iter #200704:  Learning rate = 0.004608:   Batch Loss = 1.4

Iter #360448:  Learning rate = 0.004424:   Batch Loss = 1.145965, Accuracy = 0.787109375
PERFORMANCE ON TEST SET:             Batch Loss = 1.1548290252685547, Accuracy = 0.7864719033241272
Iter #364544:  Learning rate = 0.004424:   Batch Loss = 1.148329, Accuracy = 0.775390625
PERFORMANCE ON TEST SET:             Batch Loss = 1.1318941116333008, Accuracy = 0.7977743148803711
Iter #368640:  Learning rate = 0.004424:   Batch Loss = 1.098094, Accuracy = 0.82421875
PERFORMANCE ON TEST SET:             Batch Loss = 1.140174150466919, Accuracy = 0.7960354685783386
Iter #372736:  Learning rate = 0.004424:   Batch Loss = 1.131756, Accuracy = 0.78125
PERFORMANCE ON TEST SET:             Batch Loss = 1.1669214963912964, Accuracy = 0.7866458296775818
Iter #376832:  Learning rate = 0.004424:   Batch Loss = 1.131867, Accuracy = 0.806640625
PERFORMANCE ON TEST SET:             Batch Loss = 1.264470100402832, Accuracy = 0.7593461871147156
Iter #380928:  Learning rate = 0.004424:   Batch Loss = 1.4897

Iter #540672:  Learning rate = 0.004077:   Batch Loss = 1.162638, Accuracy = 0.791015625
PERFORMANCE ON TEST SET:             Batch Loss = 1.11753511428833, Accuracy = 0.7882107496261597
Iter #544768:  Learning rate = 0.004077:   Batch Loss = 1.050351, Accuracy = 0.814453125
PERFORMANCE ON TEST SET:             Batch Loss = 1.1103224754333496, Accuracy = 0.7918623089790344
Iter #548864:  Learning rate = 0.004077:   Batch Loss = 1.073121, Accuracy = 0.78125
PERFORMANCE ON TEST SET:             Batch Loss = 1.0775294303894043, Accuracy = 0.7974265217781067
Iter #552960:  Learning rate = 0.004077:   Batch Loss = 1.071327, Accuracy = 0.77734375
PERFORMANCE ON TEST SET:             Batch Loss = 1.2235503196716309, Accuracy = 0.7183098793029785
Iter #557056:  Learning rate = 0.004077:   Batch Loss = 1.012646, Accuracy = 0.818359375
PERFORMANCE ON TEST SET:             Batch Loss = 1.060521125793457, Accuracy = 0.8012519478797913
Iter #561152:  Learning rate = 0.004077:   Batch Loss = 1.12804

Iter #720896:  Learning rate = 0.003757:   Batch Loss = 0.962194, Accuracy = 0.81640625
PERFORMANCE ON TEST SET:             Batch Loss = 1.0032888650894165, Accuracy = 0.8062945604324341
Iter #724992:  Learning rate = 0.003757:   Batch Loss = 0.968796, Accuracy = 0.8046875
PERFORMANCE ON TEST SET:             Batch Loss = 0.9442123174667358, Accuracy = 0.8233350515365601
Iter #729088:  Learning rate = 0.003757:   Batch Loss = 0.872686, Accuracy = 0.83984375
PERFORMANCE ON TEST SET:             Batch Loss = 0.9173749685287476, Accuracy = 0.8355068564414978
Iter #733184:  Learning rate = 0.003757:   Batch Loss = 1.006192, Accuracy = 0.810546875
PERFORMANCE ON TEST SET:             Batch Loss = 1.0161293745040894, Accuracy = 0.8003825545310974
Iter #737280:  Learning rate = 0.003757:   Batch Loss = 0.904955, Accuracy = 0.8515625
PERFORMANCE ON TEST SET:             Batch Loss = 0.9284632205963135, Accuracy = 0.8287254571914673
Iter #741376:  Learning rate = 0.003757:   Batch Loss = 0.876

Iter #901120:  Learning rate = 0.003463:   Batch Loss = 0.879590, Accuracy = 0.865234375
PERFORMANCE ON TEST SET:             Batch Loss = 0.8341667652130127, Accuracy = 0.8715006113052368
Iter #905216:  Learning rate = 0.003463:   Batch Loss = 0.807900, Accuracy = 0.87890625
PERFORMANCE ON TEST SET:             Batch Loss = 0.8246880769729614, Accuracy = 0.8628064393997192
Iter #909312:  Learning rate = 0.003463:   Batch Loss = 0.770255, Accuracy = 0.904296875
PERFORMANCE ON TEST SET:             Batch Loss = 0.8105103969573975, Accuracy = 0.874978244304657
Iter #913408:  Learning rate = 0.003463:   Batch Loss = 0.793084, Accuracy = 0.890625
PERFORMANCE ON TEST SET:             Batch Loss = 0.8273019790649414, Accuracy = 0.8716744780540466
Iter #917504:  Learning rate = 0.003463:   Batch Loss = 0.764719, Accuracy = 0.884765625
PERFORMANCE ON TEST SET:             Batch Loss = 0.7962151169776917, Accuracy = 0.8815858364105225
Iter #921600:  Learning rate = 0.003463:   Batch Loss = 0.81

Iter #1081344:  Learning rate = 0.003324:   Batch Loss = 0.737302, Accuracy = 0.896484375
PERFORMANCE ON TEST SET:             Batch Loss = 0.7985130548477173, Accuracy = 0.8725439310073853
Iter #1085440:  Learning rate = 0.003324:   Batch Loss = 0.714041, Accuracy = 0.916015625
PERFORMANCE ON TEST SET:             Batch Loss = 0.7493040561676025, Accuracy = 0.8855851292610168
Iter #1089536:  Learning rate = 0.003324:   Batch Loss = 0.731184, Accuracy = 0.900390625
PERFORMANCE ON TEST SET:             Batch Loss = 0.7733303308486938, Accuracy = 0.887671709060669
Iter #1093632:  Learning rate = 0.003324:   Batch Loss = 0.730109, Accuracy = 0.896484375
PERFORMANCE ON TEST SET:             Batch Loss = 0.7900664806365967, Accuracy = 0.8732394576072693
Iter #1097728:  Learning rate = 0.003324:   Batch Loss = 0.734137, Accuracy = 0.904296875
PERFORMANCE ON TEST SET:             Batch Loss = 0.7546867728233337, Accuracy = 0.8892366290092468
Iter #1101824:  Learning rate = 0.003191:   Batch L

PERFORMANCE ON TEST SET:             Batch Loss = 0.7091628313064575, Accuracy = 0.9005390405654907
Iter #1261568:  Learning rate = 0.003064:   Batch Loss = 0.687014, Accuracy = 0.9140625
PERFORMANCE ON TEST SET:             Batch Loss = 0.7221885323524475, Accuracy = 0.89393150806427
Iter #1265664:  Learning rate = 0.003064:   Batch Loss = 0.702498, Accuracy = 0.900390625
PERFORMANCE ON TEST SET:             Batch Loss = 0.7529261112213135, Accuracy = 0.8744566440582275
Iter #1269760:  Learning rate = 0.003064:   Batch Loss = 0.729218, Accuracy = 0.876953125
PERFORMANCE ON TEST SET:             Batch Loss = 0.7567006945610046, Accuracy = 0.8655886054039001
Iter #1273856:  Learning rate = 0.003064:   Batch Loss = 0.754587, Accuracy = 0.876953125
PERFORMANCE ON TEST SET:             Batch Loss = 0.7482759952545166, Accuracy = 0.8725439310073853
Iter #1277952:  Learning rate = 0.003064:   Batch Loss = 0.712511, Accuracy = 0.88671875
PERFORMANCE ON TEST SET:             Batch Loss = 0.739

Iter #1437696:  Learning rate = 0.002823:   Batch Loss = 0.671382, Accuracy = 0.90234375
PERFORMANCE ON TEST SET:             Batch Loss = 0.67801833152771, Accuracy = 0.903147280216217
Iter #1441792:  Learning rate = 0.002823:   Batch Loss = 0.691365, Accuracy = 0.916015625
PERFORMANCE ON TEST SET:             Batch Loss = 0.7490030527114868, Accuracy = 0.8845418095588684
Iter #1445888:  Learning rate = 0.002823:   Batch Loss = 0.631433, Accuracy = 0.92578125
PERFORMANCE ON TEST SET:             Batch Loss = 0.6797589063644409, Accuracy = 0.9057555198669434
Iter #1449984:  Learning rate = 0.002823:   Batch Loss = 0.713754, Accuracy = 0.892578125
PERFORMANCE ON TEST SET:             Batch Loss = 0.7087175846099854, Accuracy = 0.889758288860321
Iter #1454080:  Learning rate = 0.002823:   Batch Loss = 0.705267, Accuracy = 0.89453125
PERFORMANCE ON TEST SET:             Batch Loss = 0.7114067077636719, Accuracy = 0.89393150806427
Iter #1458176:  Learning rate = 0.002823:   Batch Loss = 0.

PERFORMANCE ON TEST SET:             Batch Loss = 0.6663814783096313, Accuracy = 0.8953225612640381
Iter #1617920:  Learning rate = 0.002602:   Batch Loss = 0.695652, Accuracy = 0.896484375
PERFORMANCE ON TEST SET:             Batch Loss = 0.683653712272644, Accuracy = 0.8974091410636902
Iter #1622016:  Learning rate = 0.002602:   Batch Loss = 0.628829, Accuracy = 0.927734375
PERFORMANCE ON TEST SET:             Batch Loss = 0.6696704626083374, Accuracy = 0.8956702947616577
Iter #1626112:  Learning rate = 0.002602:   Batch Loss = 0.694410, Accuracy = 0.888671875
PERFORMANCE ON TEST SET:             Batch Loss = 0.6719844341278076, Accuracy = 0.9015823602676392
Iter #1630208:  Learning rate = 0.002602:   Batch Loss = 0.673177, Accuracy = 0.896484375
PERFORMANCE ON TEST SET:             Batch Loss = 0.6546385288238525, Accuracy = 0.9034950733184814
Iter #1634304:  Learning rate = 0.002602:   Batch Loss = 0.632178, Accuracy = 0.919921875
PERFORMANCE ON TEST SET:             Batch Loss = 0

Iter #1794048:  Learning rate = 0.002498:   Batch Loss = 0.652894, Accuracy = 0.908203125
PERFORMANCE ON TEST SET:             Batch Loss = 0.6250854134559631, Accuracy = 0.9142757654190063
Iter #1798144:  Learning rate = 0.002498:   Batch Loss = 0.621490, Accuracy = 0.908203125
PERFORMANCE ON TEST SET:             Batch Loss = 0.6258690357208252, Accuracy = 0.9130585789680481
Iter #1802240:  Learning rate = 0.002398:   Batch Loss = 0.614344, Accuracy = 0.908203125
PERFORMANCE ON TEST SET:             Batch Loss = 0.6502673625946045, Accuracy = 0.9038428068161011
Iter #1806336:  Learning rate = 0.002398:   Batch Loss = 0.610696, Accuracy = 0.923828125
PERFORMANCE ON TEST SET:             Batch Loss = 0.6530312299728394, Accuracy = 0.8908016085624695
Iter #1810432:  Learning rate = 0.002398:   Batch Loss = 0.554605, Accuracy = 0.9453125
PERFORMANCE ON TEST SET:             Batch Loss = 0.6569821834564209, Accuracy = 0.8961919546127319
Iter #1814528:  Learning rate = 0.002398:   Batch Lo

PERFORMANCE ON TEST SET:             Batch Loss = 0.7013863325119019, Accuracy = 0.8694140315055847
Iter #1974272:  Learning rate = 0.002302:   Batch Loss = 0.692620, Accuracy = 0.869140625
PERFORMANCE ON TEST SET:             Batch Loss = 0.7767398357391357, Accuracy = 0.8440271019935608
Iter #1978368:  Learning rate = 0.002302:   Batch Loss = 0.687622, Accuracy = 0.873046875
PERFORMANCE ON TEST SET:             Batch Loss = 0.7089723348617554, Accuracy = 0.8683707118034363
Iter #1982464:  Learning rate = 0.002302:   Batch Loss = 0.709499, Accuracy = 0.8671875
PERFORMANCE ON TEST SET:             Batch Loss = 0.6897467374801636, Accuracy = 0.8781081438064575
Iter #1986560:  Learning rate = 0.002302:   Batch Loss = 0.710552, Accuracy = 0.859375
PERFORMANCE ON TEST SET:             Batch Loss = 0.7566589117050171, Accuracy = 0.848721981048584
Iter #1990656:  Learning rate = 0.002302:   Batch Loss = 0.659780, Accuracy = 0.900390625
PERFORMANCE ON TEST SET:             Batch Loss = 0.7025

Iter #2150400:  Learning rate = 0.002122:   Batch Loss = 0.629214, Accuracy = 0.896484375
PERFORMANCE ON TEST SET:             Batch Loss = 0.7490875720977783, Accuracy = 0.8561989068984985
Iter #2154496:  Learning rate = 0.002122:   Batch Loss = 0.675553, Accuracy = 0.873046875
PERFORMANCE ON TEST SET:             Batch Loss = 0.6628682613372803, Accuracy = 0.8794991970062256
Iter #2158592:  Learning rate = 0.002122:   Batch Loss = 0.644423, Accuracy = 0.896484375
PERFORMANCE ON TEST SET:             Batch Loss = 0.6688339114189148, Accuracy = 0.8817597031593323
Iter #2162688:  Learning rate = 0.002122:   Batch Loss = 0.640624, Accuracy = 0.900390625
PERFORMANCE ON TEST SET:             Batch Loss = 0.6431322693824768, Accuracy = 0.8864545226097107
Iter #2166784:  Learning rate = 0.002122:   Batch Loss = 0.590928, Accuracy = 0.912109375
PERFORMANCE ON TEST SET:             Batch Loss = 0.649789571762085, Accuracy = 0.8861067891120911
Iter #2170880:  Learning rate = 0.002122:   Batch L

PERFORMANCE ON TEST SET:             Batch Loss = 0.6281614899635315, Accuracy = 0.8855851292610168
Iter #2330624:  Learning rate = 0.001955:   Batch Loss = 0.607813, Accuracy = 0.89453125
PERFORMANCE ON TEST SET:             Batch Loss = 0.6549673080444336, Accuracy = 0.8848896026611328
Iter #2334720:  Learning rate = 0.001955:   Batch Loss = 0.620078, Accuracy = 0.8984375
PERFORMANCE ON TEST SET:             Batch Loss = 0.7026662826538086, Accuracy = 0.8650669455528259
Iter #2338816:  Learning rate = 0.001955:   Batch Loss = 0.652165, Accuracy = 0.876953125
PERFORMANCE ON TEST SET:             Batch Loss = 0.6594418287277222, Accuracy = 0.8739349842071533
Iter #2342912:  Learning rate = 0.001955:   Batch Loss = 0.640375, Accuracy = 0.884765625
PERFORMANCE ON TEST SET:             Batch Loss = 0.6762666702270508, Accuracy = 0.8732394576072693
Iter #2347008:  Learning rate = 0.001955:   Batch Loss = 0.644694, Accuracy = 0.876953125
PERFORMANCE ON TEST SET:             Batch Loss = 0.6

Iter #2506752:  Learning rate = 0.001802:   Batch Loss = 0.575405, Accuracy = 0.916015625
PERFORMANCE ON TEST SET:             Batch Loss = 0.6037972569465637, Accuracy = 0.9029734134674072
Iter #2510848:  Learning rate = 0.001802:   Batch Loss = 0.601929, Accuracy = 0.90625
PERFORMANCE ON TEST SET:             Batch Loss = 0.6345435380935669, Accuracy = 0.889758288860321
Iter #2514944:  Learning rate = 0.001802:   Batch Loss = 0.563745, Accuracy = 0.919921875
PERFORMANCE ON TEST SET:             Batch Loss = 0.6004548668861389, Accuracy = 0.9026256203651428
Iter #2519040:  Learning rate = 0.001802:   Batch Loss = 0.627489, Accuracy = 0.90234375
PERFORMANCE ON TEST SET:             Batch Loss = 0.6211704015731812, Accuracy = 0.8981046676635742
Iter #2523136:  Learning rate = 0.001802:   Batch Loss = 0.588544, Accuracy = 0.908203125
PERFORMANCE ON TEST SET:             Batch Loss = 0.6182491779327393, Accuracy = 0.8942792415618896
Iter #2527232:  Learning rate = 0.001802:   Batch Loss =

PERFORMANCE ON TEST SET:             Batch Loss = 0.6377271413803101, Accuracy = 0.8761954307556152
Iter #2686976:  Learning rate = 0.001730:   Batch Loss = 0.612992, Accuracy = 0.892578125
PERFORMANCE ON TEST SET:             Batch Loss = 0.6650246381759644, Accuracy = 0.8739349842071533
Iter #2691072:  Learning rate = 0.001730:   Batch Loss = 0.576297, Accuracy = 0.912109375
PERFORMANCE ON TEST SET:             Batch Loss = 0.6212512254714966, Accuracy = 0.8934098482131958
Iter #2695168:  Learning rate = 0.001730:   Batch Loss = 0.640105, Accuracy = 0.892578125
PERFORMANCE ON TEST SET:             Batch Loss = 0.6328848600387573, Accuracy = 0.889062762260437
Iter #2699264:  Learning rate = 0.001730:   Batch Loss = 0.562501, Accuracy = 0.91015625
PERFORMANCE ON TEST SET:             Batch Loss = 0.5927513837814331, Accuracy = 0.9045383334159851
Iter #2703360:  Learning rate = 0.001661:   Batch Loss = 0.527490, Accuracy = 0.927734375
PERFORMANCE ON TEST SET:             Batch Loss = 0.

Iter #2863104:  Learning rate = 0.001594:   Batch Loss = 0.565066, Accuracy = 0.91015625
PERFORMANCE ON TEST SET:             Batch Loss = 0.5760874152183533, Accuracy = 0.9101026058197021
Iter #2867200:  Learning rate = 0.001594:   Batch Loss = 0.597264, Accuracy = 0.896484375
PERFORMANCE ON TEST SET:             Batch Loss = 0.6043939590454102, Accuracy = 0.8963658213615417
Iter #2871296:  Learning rate = 0.001594:   Batch Loss = 0.581056, Accuracy = 0.90234375
PERFORMANCE ON TEST SET:             Batch Loss = 0.5751151442527771, Accuracy = 0.910276472568512
Iter #2875392:  Learning rate = 0.001594:   Batch Loss = 0.582533, Accuracy = 0.8984375
PERFORMANCE ON TEST SET:             Batch Loss = 0.5867111086845398, Accuracy = 0.9029734134674072
Iter #2879488:  Learning rate = 0.001594:   Batch Loss = 0.568077, Accuracy = 0.923828125
PERFORMANCE ON TEST SET:             Batch Loss = 0.5981618165969849, Accuracy = 0.9062771797180176
Iter #2883584:  Learning rate = 0.001594:   Batch Loss 

PERFORMANCE ON TEST SET:             Batch Loss = 1.3771271705627441, Accuracy = 0.5605981349945068
Iter #3043328:  Learning rate = 0.001469:   Batch Loss = 1.427388, Accuracy = 0.53515625
PERFORMANCE ON TEST SET:             Batch Loss = 1.3852101564407349, Accuracy = 0.5705094933509827
Iter #3047424:  Learning rate = 0.001469:   Batch Loss = 1.426882, Accuracy = 0.513671875
PERFORMANCE ON TEST SET:             Batch Loss = 1.3768491744995117, Accuracy = 0.5564249753952026
Iter #3051520:  Learning rate = 0.001469:   Batch Loss = 1.305362, Accuracy = 0.62109375
PERFORMANCE ON TEST SET:             Batch Loss = 1.4456404447555542, Accuracy = 0.5336463451385498
Iter #3055616:  Learning rate = 0.001469:   Batch Loss = 1.332794, Accuracy = 0.6015625
PERFORMANCE ON TEST SET:             Batch Loss = 1.3926156759262085, Accuracy = 0.5579898953437805
Iter #3059712:  Learning rate = 0.001469:   Batch Loss = 1.300701, Accuracy = 0.60546875
PERFORMANCE ON TEST SET:             Batch Loss = 1.322

Iter #3219456:  Learning rate = 0.001354:   Batch Loss = 1.119003, Accuracy = 0.6953125
PERFORMANCE ON TEST SET:             Batch Loss = 1.2012624740600586, Accuracy = 0.6534515619277954
Iter #3223552:  Learning rate = 0.001354:   Batch Loss = 1.128667, Accuracy = 0.6953125
PERFORMANCE ON TEST SET:             Batch Loss = 1.178431749343872, Accuracy = 0.6531038284301758
Iter #3227648:  Learning rate = 0.001354:   Batch Loss = 1.150551, Accuracy = 0.67578125
PERFORMANCE ON TEST SET:             Batch Loss = 1.1890510320663452, Accuracy = 0.648756742477417
Iter #3231744:  Learning rate = 0.001354:   Batch Loss = 1.123146, Accuracy = 0.685546875
PERFORMANCE ON TEST SET:             Batch Loss = 1.1878457069396973, Accuracy = 0.6492784023284912
Iter #3235840:  Learning rate = 0.001354:   Batch Loss = 1.175129, Accuracy = 0.666015625
PERFORMANCE ON TEST SET:             Batch Loss = 1.205246925354004, Accuracy = 0.6327595114707947
Iter #3239936:  Learning rate = 0.001354:   Batch Loss = 1

PERFORMANCE ON TEST SET:             Batch Loss = 1.1336591243743896, Accuracy = 0.6727525591850281
Iter #3399680:  Learning rate = 0.001300:   Batch Loss = 1.093849, Accuracy = 0.685546875
PERFORMANCE ON TEST SET:             Batch Loss = 1.1492259502410889, Accuracy = 0.6670144200325012
Iter #3403776:  Learning rate = 0.001248:   Batch Loss = 1.134454, Accuracy = 0.666015625
PERFORMANCE ON TEST SET:             Batch Loss = 1.1465977430343628, Accuracy = 0.6567553281784058
Iter #3407872:  Learning rate = 0.001248:   Batch Loss = 1.096257, Accuracy = 0.685546875
PERFORMANCE ON TEST SET:             Batch Loss = 1.1354620456695557, Accuracy = 0.679881751537323
Iter #3411968:  Learning rate = 0.001248:   Batch Loss = 1.062147, Accuracy = 0.724609375
PERFORMANCE ON TEST SET:             Batch Loss = 1.1351237297058105, Accuracy = 0.6680577397346497
Iter #3416064:  Learning rate = 0.001248:   Batch Loss = 1.069797, Accuracy = 0.703125
PERFORMANCE ON TEST SET:             Batch Loss = 1.14

Iter #3575808:  Learning rate = 0.001198:   Batch Loss = 1.053490, Accuracy = 0.712890625
PERFORMANCE ON TEST SET:             Batch Loss = 1.1346937417984009, Accuracy = 0.6687532663345337
Iter #3579904:  Learning rate = 0.001198:   Batch Loss = 1.049306, Accuracy = 0.708984375
PERFORMANCE ON TEST SET:             Batch Loss = 1.1239850521087646, Accuracy = 0.6718831658363342
Iter #3584000:  Learning rate = 0.001198:   Batch Loss = 1.028569, Accuracy = 0.7109375
PERFORMANCE ON TEST SET:             Batch Loss = 1.1275895833969116, Accuracy = 0.6654495000839233
Iter #3588096:  Learning rate = 0.001198:   Batch Loss = 1.047595, Accuracy = 0.720703125
PERFORMANCE ON TEST SET:             Batch Loss = 1.1068658828735352, Accuracy = 0.6760563254356384
Iter #3592192:  Learning rate = 0.001198:   Batch Loss = 1.028149, Accuracy = 0.6953125
PERFORMANCE ON TEST SET:             Batch Loss = 1.0957376956939697, Accuracy = 0.6821422576904297
Iter #3596288:  Learning rate = 0.001198:   Batch Loss

PERFORMANCE ON TEST SET:             Batch Loss = 1.1009389162063599, Accuracy = 0.6685793995857239
Iter #3756032:  Learning rate = 0.001104:   Batch Loss = 0.952083, Accuracy = 0.765625
PERFORMANCE ON TEST SET:             Batch Loss = 1.0631070137023926, Accuracy = 0.7026603817939758
Iter #3760128:  Learning rate = 0.001104:   Batch Loss = 1.009430, Accuracy = 0.71875
PERFORMANCE ON TEST SET:             Batch Loss = 1.084692120552063, Accuracy = 0.6882281303405762
Iter #3764224:  Learning rate = 0.001104:   Batch Loss = 1.027204, Accuracy = 0.712890625
PERFORMANCE ON TEST SET:             Batch Loss = 1.0526703596115112, Accuracy = 0.6998782753944397
Iter #3768320:  Learning rate = 0.001104:   Batch Loss = 0.991101, Accuracy = 0.7265625
PERFORMANCE ON TEST SET:             Batch Loss = 1.0565831661224365, Accuracy = 0.6917057633399963
Iter #3772416:  Learning rate = 0.001104:   Batch Loss = 0.976948, Accuracy = 0.748046875
PERFORMANCE ON TEST SET:             Batch Loss = 1.05484700

Iter #3932160:  Learning rate = 0.001018:   Batch Loss = 1.000557, Accuracy = 0.734375
PERFORMANCE ON TEST SET:             Batch Loss = 1.0626215934753418, Accuracy = 0.6868370771408081
Iter #3936256:  Learning rate = 0.001018:   Batch Loss = 0.986354, Accuracy = 0.751953125
PERFORMANCE ON TEST SET:             Batch Loss = 1.0877684354782104, Accuracy = 0.6934446096420288
Iter #3940352:  Learning rate = 0.001018:   Batch Loss = 0.976178, Accuracy = 0.7265625
PERFORMANCE ON TEST SET:             Batch Loss = 1.05836820602417, Accuracy = 0.6885759234428406
Iter #3944448:  Learning rate = 0.001018:   Batch Loss = 0.935875, Accuracy = 0.75
PERFORMANCE ON TEST SET:             Batch Loss = 1.0496618747711182, Accuracy = 0.6946617960929871
Iter #3948544:  Learning rate = 0.001018:   Batch Loss = 0.938619, Accuracy = 0.734375
PERFORMANCE ON TEST SET:             Batch Loss = 1.030840516090393, Accuracy = 0.7016171216964722
Iter #3952640:  Learning rate = 0.001018:   Batch Loss = 1.023839, A

Iter #4112384:  Learning rate = 0.000938:   Batch Loss = 0.896551, Accuracy = 0.759765625
PERFORMANCE ON TEST SET:             Batch Loss = 1.0184893608093262, Accuracy = 0.7049208879470825
Iter #4116480:  Learning rate = 0.000938:   Batch Loss = 0.967005, Accuracy = 0.728515625
PERFORMANCE ON TEST SET:             Batch Loss = 1.0276590585708618, Accuracy = 0.7070074677467346
Iter #4120576:  Learning rate = 0.000938:   Batch Loss = 0.979365, Accuracy = 0.724609375
PERFORMANCE ON TEST SET:             Batch Loss = 1.0111485719680786, Accuracy = 0.7127456068992615
Iter #4124672:  Learning rate = 0.000938:   Batch Loss = 0.967235, Accuracy = 0.74609375
PERFORMANCE ON TEST SET:             Batch Loss = 1.0207183361053467, Accuracy = 0.705964207649231
Iter #4128768:  Learning rate = 0.000938:   Batch Loss = 0.986807, Accuracy = 0.732421875
PERFORMANCE ON TEST SET:             Batch Loss = 1.0316143035888672, Accuracy = 0.7090940475463867
Iter #4132864:  Learning rate = 0.000938:   Batch Lo

Iter #4292608:  Learning rate = 0.000900:   Batch Loss = 0.949747, Accuracy = 0.748046875
PERFORMANCE ON TEST SET:             Batch Loss = 1.0108600854873657, Accuracy = 0.7070074677467346
Iter #4296704:  Learning rate = 0.000900:   Batch Loss = 0.951696, Accuracy = 0.728515625
PERFORMANCE ON TEST SET:             Batch Loss = 1.0118211507797241, Accuracy = 0.7106590270996094
Iter #4300800:  Learning rate = 0.000864:   Batch Loss = 0.940695, Accuracy = 0.73828125
PERFORMANCE ON TEST SET:             Batch Loss = 1.0058324337005615, Accuracy = 0.7087463140487671
Iter #4304896:  Learning rate = 0.000864:   Batch Loss = 0.901848, Accuracy = 0.7734375
PERFORMANCE ON TEST SET:             Batch Loss = 0.9863067865371704, Accuracy = 0.7245696187019348
Iter #4308992:  Learning rate = 0.000864:   Batch Loss = 0.931673, Accuracy = 0.751953125
PERFORMANCE ON TEST SET:             Batch Loss = 0.988301157951355, Accuracy = 0.7181359529495239
Iter #4313088:  Learning rate = 0.000864:   Batch Loss

PERFORMANCE ON TEST SET:             Batch Loss = 0.961143970489502, Accuracy = 0.7346548438072205
Iter #4472832:  Learning rate = 0.000830:   Batch Loss = 0.871040, Accuracy = 0.779296875
PERFORMANCE ON TEST SET:             Batch Loss = 0.9829246401786804, Accuracy = 0.7259607315063477
Iter #4476928:  Learning rate = 0.000830:   Batch Loss = 0.870632, Accuracy = 0.765625
PERFORMANCE ON TEST SET:             Batch Loss = 0.9581794142723083, Accuracy = 0.7417840361595154
Iter #4481024:  Learning rate = 0.000830:   Batch Loss = 0.884204, Accuracy = 0.771484375
PERFORMANCE ON TEST SET:             Batch Loss = 0.9626569747924805, Accuracy = 0.7376108765602112
Iter #4485120:  Learning rate = 0.000830:   Batch Loss = 0.848904, Accuracy = 0.791015625
PERFORMANCE ON TEST SET:             Batch Loss = 0.9930193424224854, Accuracy = 0.7334376573562622
Iter #4489216:  Learning rate = 0.000830:   Batch Loss = 0.930747, Accuracy = 0.740234375
PERFORMANCE ON TEST SET:             Batch Loss = 0.96

Iter #4648960:  Learning rate = 0.000765:   Batch Loss = 0.705964, Accuracy = 0.865234375
PERFORMANCE ON TEST SET:             Batch Loss = 0.7369072437286377, Accuracy = 0.8351591229438782
Iter #4653056:  Learning rate = 0.000765:   Batch Loss = 0.680278, Accuracy = 0.87109375
PERFORMANCE ON TEST SET:             Batch Loss = 0.765882134437561, Accuracy = 0.8238567113876343
Iter #4657152:  Learning rate = 0.000765:   Batch Loss = 0.695035, Accuracy = 0.84765625
PERFORMANCE ON TEST SET:             Batch Loss = 0.7637059688568115, Accuracy = 0.8285515308380127
Iter #4661248:  Learning rate = 0.000765:   Batch Loss = 0.678261, Accuracy = 0.8671875
PERFORMANCE ON TEST SET:             Batch Loss = 0.7380936145782471, Accuracy = 0.8400278091430664
Iter #4665344:  Learning rate = 0.000765:   Batch Loss = 0.651109, Accuracy = 0.880859375
PERFORMANCE ON TEST SET:             Batch Loss = 0.7592126131057739, Accuracy = 0.8271604776382446
Iter #4669440:  Learning rate = 0.000765:   Batch Loss 

PERFORMANCE ON TEST SET:             Batch Loss = 0.7464958429336548, Accuracy = 0.8262910842895508
Iter #4829184:  Learning rate = 0.000705:   Batch Loss = 0.654121, Accuracy = 0.86328125
PERFORMANCE ON TEST SET:             Batch Loss = 0.7421196699142456, Accuracy = 0.8342896699905396
Iter #4833280:  Learning rate = 0.000705:   Batch Loss = 0.735023, Accuracy = 0.83203125
PERFORMANCE ON TEST SET:             Batch Loss = 0.7614938020706177, Accuracy = 0.8235089778900146
Iter #4837376:  Learning rate = 0.000705:   Batch Loss = 0.747496, Accuracy = 0.845703125
PERFORMANCE ON TEST SET:             Batch Loss = 0.7486448287963867, Accuracy = 0.826812744140625
Iter #4841472:  Learning rate = 0.000705:   Batch Loss = 0.700068, Accuracy = 0.833984375
PERFORMANCE ON TEST SET:             Batch Loss = 0.7293556928634644, Accuracy = 0.8414188623428345
Iter #4845568:  Learning rate = 0.000705:   Batch Loss = 0.716846, Accuracy = 0.830078125
PERFORMANCE ON TEST SET:             Batch Loss = 0.7

Iter #5005312:  Learning rate = 0.000649:   Batch Loss = 0.694814, Accuracy = 0.841796875
PERFORMANCE ON TEST SET:             Batch Loss = 0.6975452899932861, Accuracy = 0.8568944334983826
Iter #5009408:  Learning rate = 0.000649:   Batch Loss = 0.642920, Accuracy = 0.859375
PERFORMANCE ON TEST SET:             Batch Loss = 0.7268772125244141, Accuracy = 0.8405494689941406
Iter #5013504:  Learning rate = 0.000649:   Batch Loss = 0.686742, Accuracy = 0.84765625
PERFORMANCE ON TEST SET:             Batch Loss = 0.7214590311050415, Accuracy = 0.8388106226921082
Iter #5017600:  Learning rate = 0.000649:   Batch Loss = 0.659742, Accuracy = 0.8671875
PERFORMANCE ON TEST SET:             Batch Loss = 0.6953147649765015, Accuracy = 0.8601981997489929
Iter #5021696:  Learning rate = 0.000649:   Batch Loss = 0.676276, Accuracy = 0.853515625
PERFORMANCE ON TEST SET:             Batch Loss = 0.6917251348495483, Accuracy = 0.8647191524505615
Iter #5025792:  Learning rate = 0.000649:   Batch Loss =

PERFORMANCE ON TEST SET:             Batch Loss = 0.7111485004425049, Accuracy = 0.8379412293434143
Iter #5185536:  Learning rate = 0.000623:   Batch Loss = 0.700493, Accuracy = 0.86328125
PERFORMANCE ON TEST SET:             Batch Loss = 0.7200198769569397, Accuracy = 0.8428099751472473
Iter #5189632:  Learning rate = 0.000623:   Batch Loss = 0.652307, Accuracy = 0.85546875
PERFORMANCE ON TEST SET:             Batch Loss = 0.6861482262611389, Accuracy = 0.8548078536987305
Iter #5193728:  Learning rate = 0.000623:   Batch Loss = 0.708455, Accuracy = 0.84375
PERFORMANCE ON TEST SET:             Batch Loss = 0.6951603889465332, Accuracy = 0.8475047945976257
Iter #5197824:  Learning rate = 0.000623:   Batch Loss = 0.687572, Accuracy = 0.85546875
PERFORMANCE ON TEST SET:             Batch Loss = 0.6823391318321228, Accuracy = 0.8628064393997192
Iter #5201920:  Learning rate = 0.000599:   Batch Loss = 0.710845, Accuracy = 0.841796875
PERFORMANCE ON TEST SET:             Batch Loss = 0.70589

Iter #5361664:  Learning rate = 0.000575:   Batch Loss = 0.674766, Accuracy = 0.86328125
PERFORMANCE ON TEST SET:             Batch Loss = 0.6858168244361877, Accuracy = 0.863502025604248
Iter #5365760:  Learning rate = 0.000575:   Batch Loss = 0.616399, Accuracy = 0.87109375
PERFORMANCE ON TEST SET:             Batch Loss = 0.6762793660163879, Accuracy = 0.8588071465492249
Iter #5369856:  Learning rate = 0.000575:   Batch Loss = 0.631404, Accuracy = 0.873046875
PERFORMANCE ON TEST SET:             Batch Loss = 0.6827919483184814, Accuracy = 0.8528951406478882
Iter #5373952:  Learning rate = 0.000575:   Batch Loss = 0.673075, Accuracy = 0.8515625
PERFORMANCE ON TEST SET:             Batch Loss = 0.6848512291908264, Accuracy = 0.8565467000007629
Iter #5378048:  Learning rate = 0.000575:   Batch Loss = 0.631776, Accuracy = 0.869140625
PERFORMANCE ON TEST SET:             Batch Loss = 0.6643087267875671, Accuracy = 0.8675013184547424
Iter #5382144:  Learning rate = 0.000575:   Batch Loss 

PERFORMANCE ON TEST SET:             Batch Loss = 0.6752297878265381, Accuracy = 0.8603721261024475
Iter #5541888:  Learning rate = 0.000530:   Batch Loss = 0.585315, Accuracy = 0.88671875
PERFORMANCE ON TEST SET:             Batch Loss = 0.6766279935836792, Accuracy = 0.8542861938476562
Iter #5545984:  Learning rate = 0.000530:   Batch Loss = 0.613935, Accuracy = 0.890625
PERFORMANCE ON TEST SET:             Batch Loss = 0.6571332812309265, Accuracy = 0.872022271156311
Iter #5550080:  Learning rate = 0.000530:   Batch Loss = 0.621024, Accuracy = 0.900390625
PERFORMANCE ON TEST SET:             Batch Loss = 0.6688487529754639, Accuracy = 0.8582854866981506
Iter #5554176:  Learning rate = 0.000530:   Batch Loss = 0.620651, Accuracy = 0.873046875
PERFORMANCE ON TEST SET:             Batch Loss = 0.6586526036262512, Accuracy = 0.8626325726509094
Iter #5558272:  Learning rate = 0.000530:   Batch Loss = 0.656878, Accuracy = 0.853515625
PERFORMANCE ON TEST SET:             Batch Loss = 0.657

Iter #5718016:  Learning rate = 0.000488:   Batch Loss = 0.666169, Accuracy = 0.865234375
PERFORMANCE ON TEST SET:             Batch Loss = 0.6524797081947327, Accuracy = 0.8721961379051208
Iter #5722112:  Learning rate = 0.000488:   Batch Loss = 0.607160, Accuracy = 0.884765625
PERFORMANCE ON TEST SET:             Batch Loss = 0.6506219506263733, Accuracy = 0.8683707118034363
Iter #5726208:  Learning rate = 0.000488:   Batch Loss = 0.566935, Accuracy = 0.90234375
PERFORMANCE ON TEST SET:             Batch Loss = 0.6493678092956543, Accuracy = 0.8708050847053528
Iter #5730304:  Learning rate = 0.000488:   Batch Loss = 0.601438, Accuracy = 0.8671875
PERFORMANCE ON TEST SET:             Batch Loss = 0.6563680171966553, Accuracy = 0.8612415194511414
Iter #5734400:  Learning rate = 0.000488:   Batch Loss = 0.661971, Accuracy = 0.87890625
PERFORMANCE ON TEST SET:             Batch Loss = 0.6619704961776733, Accuracy = 0.8681968450546265
Iter #5738496:  Learning rate = 0.000488:   Batch Loss

PERFORMANCE ON TEST SET:             Batch Loss = 0.669676661491394, Accuracy = 0.8622848391532898
Iter #5898240:  Learning rate = 0.000468:   Batch Loss = 0.644259, Accuracy = 0.875
PERFORMANCE ON TEST SET:             Batch Loss = 0.6751188039779663, Accuracy = 0.8549817204475403
Iter #5902336:  Learning rate = 0.000450:   Batch Loss = 0.573026, Accuracy = 0.896484375
PERFORMANCE ON TEST SET:             Batch Loss = 0.6829656958580017, Accuracy = 0.863502025604248
Iter #5906432:  Learning rate = 0.000450:   Batch Loss = 0.594295, Accuracy = 0.884765625
PERFORMANCE ON TEST SET:             Batch Loss = 0.6445702314376831, Accuracy = 0.8718483448028564
Iter #5910528:  Learning rate = 0.000450:   Batch Loss = 0.561658, Accuracy = 0.9140625
PERFORMANCE ON TEST SET:             Batch Loss = 0.6395452618598938, Accuracy = 0.87984699010849
Iter #5914624:  Learning rate = 0.000450:   Batch Loss = 0.626376, Accuracy = 0.87109375
PERFORMANCE ON TEST SET:             Batch Loss = 0.65097415447

Iter #6074368:  Learning rate = 0.000432:   Batch Loss = 0.562220, Accuracy = 0.904296875
PERFORMANCE ON TEST SET:             Batch Loss = 0.635057270526886, Accuracy = 0.8774126172065735
Iter #6078464:  Learning rate = 0.000432:   Batch Loss = 0.639409, Accuracy = 0.873046875
PERFORMANCE ON TEST SET:             Batch Loss = 0.6433807611465454, Accuracy = 0.8741088509559631
Iter #6082560:  Learning rate = 0.000432:   Batch Loss = 0.573852, Accuracy = 0.890625
PERFORMANCE ON TEST SET:             Batch Loss = 0.6462985277175903, Accuracy = 0.8666318655014038
Iter #6086656:  Learning rate = 0.000432:   Batch Loss = 0.579597, Accuracy = 0.8984375
PERFORMANCE ON TEST SET:             Batch Loss = 0.6365917325019836, Accuracy = 0.8727177977561951
Iter #6090752:  Learning rate = 0.000432:   Batch Loss = 0.581640, Accuracy = 0.888671875
PERFORMANCE ON TEST SET:             Batch Loss = 0.6421946287155151, Accuracy = 0.8699356913566589
Iter #6094848:  Learning rate = 0.000432:   Batch Loss =

PERFORMANCE ON TEST SET:             Batch Loss = 0.6362336874008179, Accuracy = 0.874978244304657
Iter #6254592:  Learning rate = 0.000398:   Batch Loss = 0.607427, Accuracy = 0.865234375
PERFORMANCE ON TEST SET:             Batch Loss = 0.6354473233222961, Accuracy = 0.8777604103088379
Iter #6258688:  Learning rate = 0.000398:   Batch Loss = 0.551105, Accuracy = 0.904296875
PERFORMANCE ON TEST SET:             Batch Loss = 0.6358391046524048, Accuracy = 0.8761954307556152
Iter #6262784:  Learning rate = 0.000398:   Batch Loss = 0.593119, Accuracy = 0.900390625
PERFORMANCE ON TEST SET:             Batch Loss = 0.6282408237457275, Accuracy = 0.8777604103088379
Iter #6266880:  Learning rate = 0.000398:   Batch Loss = 0.559768, Accuracy = 0.91015625
PERFORMANCE ON TEST SET:             Batch Loss = 0.6456176042556763, Accuracy = 0.8754999041557312
Iter #6270976:  Learning rate = 0.000398:   Batch Loss = 0.559248, Accuracy = 0.90234375
PERFORMANCE ON TEST SET:             Batch Loss = 0.6

Iter #6430720:  Learning rate = 0.000367:   Batch Loss = 0.637378, Accuracy = 0.87109375
PERFORMANCE ON TEST SET:             Batch Loss = 0.6312770247459412, Accuracy = 0.8800208568572998
Iter #6434816:  Learning rate = 0.000367:   Batch Loss = 0.560880, Accuracy = 0.884765625
PERFORMANCE ON TEST SET:             Batch Loss = 0.6256906390190125, Accuracy = 0.8819335699081421
Iter #6438912:  Learning rate = 0.000367:   Batch Loss = 0.600901, Accuracy = 0.890625
PERFORMANCE ON TEST SET:             Batch Loss = 0.6632147431373596, Accuracy = 0.8577638864517212
Iter #6443008:  Learning rate = 0.000367:   Batch Loss = 0.640434, Accuracy = 0.87890625
PERFORMANCE ON TEST SET:             Batch Loss = 0.6397436261177063, Accuracy = 0.8821074366569519
Iter #6447104:  Learning rate = 0.000367:   Batch Loss = 0.560348, Accuracy = 0.8984375
PERFORMANCE ON TEST SET:             Batch Loss = 0.6253378391265869, Accuracy = 0.8772387504577637
Iter #6451200:  Learning rate = 0.000367:   Batch Loss = 

PERFORMANCE ON TEST SET:             Batch Loss = 0.6732109189033508, Accuracy = 0.8591549396514893
Iter #6610944:  Learning rate = 0.000338:   Batch Loss = 0.573221, Accuracy = 0.87890625
PERFORMANCE ON TEST SET:             Batch Loss = 0.6182905435562134, Accuracy = 0.8822813630104065
Iter #6615040:  Learning rate = 0.000338:   Batch Loss = 0.599846, Accuracy = 0.892578125
PERFORMANCE ON TEST SET:             Batch Loss = 0.6562848091125488, Accuracy = 0.8708050847053528
Iter #6619136:  Learning rate = 0.000338:   Batch Loss = 0.548931, Accuracy = 0.912109375
PERFORMANCE ON TEST SET:             Batch Loss = 0.6370408535003662, Accuracy = 0.8702834248542786
Iter #6623232:  Learning rate = 0.000338:   Batch Loss = 0.651647, Accuracy = 0.8828125
PERFORMANCE ON TEST SET:             Batch Loss = 0.6496499180793762, Accuracy = 0.8702834248542786
Iter #6627328:  Learning rate = 0.000338:   Batch Loss = 0.590708, Accuracy = 0.892578125
PERFORMANCE ON TEST SET:             Batch Loss = 0.6

Iter #6787072:  Learning rate = 0.000324:   Batch Loss = 0.614660, Accuracy = 0.8828125
PERFORMANCE ON TEST SET:             Batch Loss = 0.624006986618042, Accuracy = 0.8800208568572998
Optimization Finished!
FINAL RESULT: Batch Loss = 0.624006986618042, Accuracy = 0.8800208568572998
TOTAL TIME:  2821.4196779727936
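
The learning rates printed in the log above are consistent with a staircase exponential decay. The sketch below is an inference from those printed values, not the schedule taken from the training code: the constants `initial_rate=0.005`, `decay_rate=0.96` and `decay_every=100000` are assumptions chosen because they reproduce the logged values, and `staircase_learning_rate` is a hypothetical helper name.

```python
def staircase_learning_rate(iteration, initial_rate=0.005,
                            decay_rate=0.96, decay_every=100000):
    """Decay the rate by `decay_rate` once per `decay_every` iterations (assumed constants)."""
    return initial_rate * decay_rate ** (iteration // decay_every)


# (iteration, learning rate as printed in the log above)
logged = [
    (2527232, "0.001802"),
    (2703360, "0.001661"),
    (3403776, "0.001248"),
    (5201920, "0.000599"),
    (6787072, "0.000324"),
]

# Format each computed rate the same way the log does (6 decimal places)
computed = ["{:.6f}".format(staircase_learning_rate(it)) for it, _ in logged]
matches = [c == printed for c, (_, printed) in zip(computed, logged)]

# The reported total time, converted to minutes
total_minutes = 2821.4196779727936 / 60.0
```

If the assumed constants are right, every entry in `matches` is `True`, and `total_minutes` comes out at roughly 47 minutes.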


## Results:



In [8]:
# (Inline plots: )
%matplotlib inline

font = {
    'family' : 'Bitstream Vera Sans',
    'weight' : 'bold',
    'size'   : 18
}
matplotlib.rc('font', **font)

width = 12
height = 12
plt.figure(figsize=(width, height))

indep_train_axis = np.array(range(batch_size, (len(train_losses)+1)*batch_size, batch_size))
#plt.plot(indep_train_axis, np.array(train_losses),     "b--", label="Train losses")
plt.plot(indep_train_axis, np.array(train_accuracies), "g--", label="Train accuracies")

indep_test_axis = np.append(
    np.array(range(batch_size, len(test_losses)*display_iter, display_iter)[:-1]),
    [training_iters]
)
#plt.plot(indep_test_axis, np.array(test_losses), "b-", linewidth=2.0, label="Test losses")
plt.plot(indep_test_axis, np.array(test_accuracies), "b-", linewidth=2.0, label="Test accuracies")
print(len(test_accuracies))
print(len(train_accuracies))

plt.title("Training session's Accuracy over Iterations")
plt.legend(loc='lower right', shadow=True)
plt.ylabel('Training Accuracy')
plt.xlabel('Training Iteration')

plt.show()

# Results

predictions = one_hot_predictions.argmax(1)

print("Testing Accuracy: {}%".format(100*accuracy))

print("")
print("Precision: {}%".format(100*metrics.precision_score(y_test, predictions, average="weighted")))
print("Recall: {}%".format(100*metrics.recall_score(y_test, predictions, average="weighted")))
print("f1_score: {}%".format(100*metrics.f1_score(y_test, predictions, average="weighted")))

print("")
print("Confusion Matrix:")
print("Created using test set of {} datapoints, normalised to % of total test data".format(len(y_test)))
confusion_matrix = metrics.confusion_matrix(y_test, predictions)


#print(confusion_matrix)
# Divide by the total number of test datapoints, so that all cells together sum to 100%
normalised_confusion_matrix = np.array(confusion_matrix, dtype=np.float32)/np.sum(confusion_matrix)*100


# Plot Results: 
width = 12
height = 12
plt.figure(figsize=(width, height))
plt.imshow(
    normalised_confusion_matrix, 
    interpolation='nearest', 
    cmap=plt.cm.Blues
)
plt.title("Confusion matrix \n(normalised to % of total test data)")
plt.colorbar()
tick_marks = np.arange(n_classes)
plt.xticks(tick_marks, LABELS, rotation=90)
plt.yticks(tick_marks, LABELS)
plt.tight_layout()
plt.ylabel('True label')
plt.xlabel('Predicted label')
plt.show()

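
The cell above normalises the confusion matrix by the total number of test datapoints. If the classes are not equally represented, a per-class view can be easier to read. The following is a minimal sketch of that alternative, assuming rows are true labels as in `metrics.confusion_matrix`; `normalise_rows` and the toy matrix are hypothetical and not part of the original notebook.

```python
import numpy as np


def normalise_rows(confusion_matrix):
    """Express each row (true class) as percentages of that class's test datapoints."""
    cm = np.asarray(confusion_matrix, dtype=np.float64)
    row_totals = cm.sum(axis=1, keepdims=True)
    # A class absent from the test set gives an all-zero row; avoid dividing by zero
    safe_totals = np.where(row_totals == 0, 1.0, row_totals)
    return cm / safe_totals * 100.0


# Toy 3-class confusion matrix (made-up counts, for illustration only)
toy_confusion = np.array([[8, 2, 0],
                          [1, 9, 0],
                          [0, 5, 5]])
per_class = normalise_rows(toy_confusion)
```

With this normalisation each row sums to 100, so the diagonal reads directly as per-class recall.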

In [None]:
#
#X_val_path = DATASET_PATH + "X_val.txt"
#X_val = load_X(X_val_path)
#print(X_val)
#
#preds = sess.run(
#    [pred],
#    feed_dict={
#        x: X_val
#   }
#)
#
#print(preds)

In [None]:
#sess.close()
print(test_accuracies)

## Conclusion

The final test accuracy in the run logged above is 88.0%, with test accuracy exceeding 91% at points earlier in training. This is a good result for 2D pose input alone, although the logged run took about 47 minutes (2821 s) in total.

There is noticeable confusion between the activities Clapping Hands and Boxing, and between Jumping Jacks and Waving Two Hands, which is understandable, as these pairs involve similar movements.

In terms of applicability to a wider dataset, I would expect this approach to work for any activity, provided the training data includes views from every angle that will be tested. It would be interesting to see its applicability to camera angles in between the 4 used in this dataset, without training on them specifically.

Overall, this experiment validates the idea that 2D pose can be used for at least human activity recognition, and supports continuing on to the use of 2D pose for behaviour estimation in both people and animals.
 

### With regards to using LSTM-RNNs
 - Batch sampling
     - It is necessary to ensure you are not just sampling classes one at a time (i.e. `y_train` is ordered by class and batches are chosen in order). Random sampling of batches without replacement from the training data resolves this.
 
 - Architecture
     - Testing has been run using a variety of hidden units per LSTM cell, with results showing that testing accuracy is highest when the number of hidden units is approximately equal to the input size, i.e. 34. The following figure displays the final accuracy achieved on the testing dataset for a variety of hidden units, all using a batch size of 4096 and 300 epochs (a total of 1657 iterations, with testing performed every 8th iteration).
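
The batch sampling point above can be sketched as follows. This is a minimal illustration, not the notebook's own implementation: `iterate_minibatches` is a hypothetical helper, and the toy arrays only stand in for `X_train` and a class-ordered `y_train`.

```python
import numpy as np


def iterate_minibatches(X, y, batch_size, rng):
    """Yield (X_batch, y_batch) pairs that cover the data once, in shuffled order."""
    order = rng.permutation(len(X))  # every index appears exactly once
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], y[idx]


# Toy stand-in: 12 sequences of 3 time steps and 4 features, labels ordered by class
X_train = np.arange(12 * 3 * 4, dtype=np.float32).reshape(12, 3, 4)
y_train = np.repeat(np.arange(3), 4)  # [0 0 0 0 1 1 1 1 2 2 2 2]

rng = np.random.default_rng(0)
batches = list(iterate_minibatches(X_train, y_train, batch_size=5, rng=rng))
batch_sizes = [len(yb) for _, yb in batches]
seen_labels = np.concatenate([yb for _, yb in batches])
```

Because the indices are permuted before slicing, each batch mixes classes even though `y_train` is ordered, and no sample is drawn twice within an epoch. Calling the generator again for each epoch gives a fresh shuffle.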
   
 
 

## Future Works

Inclusion of:

 - A pipeline for qualitative results
 - A validation dataset
 - Momentum
 - Input data normalisation (each point with respect to its own distribution only)
 - Dropout
 - Comparison of the effect of changing batch size
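
As an illustration of the input normalisation item above, the sketch below standardises each input feature using statistics of that feature alone, computed on the training data. It is only one possible reading of that item. The function names are hypothetical, the random data is a placeholder, the feature count of 34 follows the input size mentioned earlier, and the sequence length of 32 is an arbitrary assumption.

```python
import numpy as np


def fit_feature_stats(X):
    """Per-feature mean and std over all samples and time steps.

    X has shape (n_samples, n_steps, n_features).
    """
    mean = X.mean(axis=(0, 1))
    std = X.std(axis=(0, 1))
    std = np.where(std == 0, 1.0, std)  # leave constant features unscaled
    return mean, std


def apply_feature_stats(X, mean, std):
    """Standardise X with statistics fitted on the training set."""
    return (X - mean) / std


# Placeholder data: 50 sequences, 32 time steps (assumed), 34 features
rng = np.random.default_rng(0)
X_train_toy = rng.normal(loc=100.0, scale=20.0, size=(50, 32, 34))
X_test_toy = rng.normal(loc=100.0, scale=20.0, size=(10, 32, 34))

mean, std = fit_feature_stats(X_train_toy)
X_train_norm = apply_feature_stats(X_train_toy, mean, std)
X_test_norm = apply_feature_stats(X_test_toy, mean, std)  # reuse training statistics
```

Fitting the statistics on the training set only, and reusing them for the test set, keeps test information out of the preprocessing step.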
 

Further research will be carried out on more subtle activity classes, such as walking versus running, agitated movement versus calm movement, and perhaps normal versus abnormal behaviour, based on a baseline of normal motion.


## References

The dataset can be found at http://tele-immersion.citris-uc.org/berkeley_mhad, released under the BSD-2 license.
>Copyright (c) 2013, Regents of the University of California All rights reserved.

The network used in this experiment is based on the following, available under the [MIT License](https://github.com/guillaume-chevalier/LSTM-Human-Activity-Recognition/blob/master/LICENSE):
> Guillaume Chevalier, LSTMs for Human Activity Recognition, 2016
> https://github.com/guillaume-chevalier/LSTM-Human-Activity-Recognition



In [None]:
# Let's convert this notebook to a README for the GitHub project's title page:
!jupyter nbconvert --to markdown LSTM.ipynb
!mv LSTM.md README.md
