# Using MI-RNN on the HAR Dataset

This is an example of how the existing MI-RNN implementation can be used to train a model with shorter input sequence length on the HAR dataset. We are actively working on releasing a better implementation of both `MI-RNN` and `EMI-RNN`. This notebook only illustrates some of the features/methods we have. For instance, usage of features like embeddings, regularizers, various losses, dropout layers, various RNN cells etc are not illustrated here.

Please note that, in the preprint of our work, we use the terms *bag* and *instance* to refer to the LSTM input sequence of original length and the shorter ones we want to learn to predict on, respectively. In the code though, *bag* is replaced with *instance* and *instance* is replaced with *sub-instance*. To avoid ambiguity, we will use the terms *bag* and *sub-instance*  throughout this document.

The network used here is a simple LSTM + Linear classifier network. 

The UCI [Human Activity Recognition](https://archive.ics.uci.edu/ml/datasets/human+activity+recognition+using+smartphones) dataset.

In [7]:
import numpy as np
import os
import tensorflow as tf
import time
import sys
sys.path.insert(0, '../')
# TODO: Explain these methods
from edgeml.mi_rnn import analysisModelMultiClass
from edgeml.mi_rnn import NetworkV2
from edgeml.mi_rnn import updateYPolicy4

# Loading Data
Please download the UCI datset from the above link and use your favorite data loading methods to set up (`x_train`, `y_train`) and (`x_val`, `y_val`) numpy arrays.

### Data Preparation

[Typical RNN models](https://github.com/aymericdamien/TensorFlow-Examples/blob/master/notebooks/3_NeuralNetworks/recurrent_network.ipynb) by convention, use a 3 dimensional tensor for the input data. This tensor is of shape `[number of examples, number of time steps, number of features]`. To incorporate the notion of *bags* and *sub-instances*, we extend this by adding an additional fourth dimension, thus making our input data shape - `[number of bags, number of sub-instances, number of time steps, number of features]`. Additionally, the typical shape of the one-hot encoded label tensor - `[number of examples, number of outputs]` is extended to incorporate sub-instance level labels, thus making it `[number of bags, number of sub-instances, number of output classes]`.

Specifically for HAR dataset, the data creation algorithm looks something like this.

```
def createData(X, Y, subinstanceWidth, subinstanceStride):
    '''
    Here X and Y are time series input from HAR and their labels. This methods
    chops the sequences into temporarily ordered set of sub-instances. All 
    sub-instances are given the same label as the bag.
    '''
    assert len(X) == len(Y)
    assert len(X.shape) == 3
    assert len(Y.shape) == 2
    
    X_out = []
    Y_out = []
    
    for i in range(len(X)):
        bag = X[i]
        bagLabel = Y[i]
        
        instances = breakBagIntoInstances(bag, subinstanceWidth, subinstaceStride)
        instanceLabels = [Y[i]] * len(instances)
        X_out.append(instances)
        Y_out.append(instanceLabels)
```

In [2]:
x_train, y_train = np.load('./HAR/48_16/x_train.npy'), np.load('./HAR/48_16/y_train.npy')
x_test, y_test = np.load('./HAR/48_16/x_test.npy'), np.load('./HAR/48_16/y_test.npy')
x_val, y_val = np.load('./HAR/48_16/x_val.npy'), np.load('./HAR/48_16/y_val.npy')

# BAG_TEST, BAG_TRAIN, BAG_VAL are used as part of some of the analysis methods
# These are BAG level labels.
BAG_TEST = np.argmax(y_test[:, 0, :], axis=1)
BAG_TRAIN = np.argmax(y_train[:, 0, :], axis=1)
BAG_VAL = np.argmax(y_val[:, 0, :], axis=1)

print("x_train shape is:", x_train.shape)
print("y_train shape is:", y_train.shape)
print("x_test shape is:", x_val.shape)
print("y_test shape is:", y_val.shape)

x_train shape is: (6220, 6, 48, 9)
y_train shape is: (6220, 6, 6)
x_test shape is: (1132, 6, 48, 9)
y_test shape is: (1132, 6, 6)


In [3]:
# These are the parameters that are required to create the training graph
SUBINSTANCE_WIDTH = 48
SUBINSTANCE_STRIDE = 16
NUM_SUBINSTANCE = x_val.shape[1]
NUM_TIME_STEPS = x_val.shape[2]
NUM_FEATS = x_val.shape[3]
NUM_HIDDEN = 16
# Even though we are using a linear layer, the linear matrix is
# decomposed into two matrices. That is, W = W1 * W2
# NUM_FC is the common dimension of W1 and W2. Its value is of
# no consequence without a non-linearity or low-rank restrictions
# and hence NUM_FC= NUM_HIDDEN  is a good default.
NUM_FC = NUM_HIDDEN
NUM_OUTPUT = 6
NUM_ITER = 3
NUM_ROUNDS = 5
MODELDIR = '/tmp/model_dump/'
# After each `round` of MI-RNN, the labels are updated based on a policy
# of picking the top-k `most likely positive` elements from a bag and
# setting the label of everything else to `noise/negative`. This happens
# only if k >= MIN_SUBSEQUENCE_LEN
MIN_SUBSEQUENCE_LEN = 3

# Training parameters. To make sure datset API is used efficiently,
# these parameters need to be known apriori.
trainingParams = {
    'batch_size': 256,
    'max_epochs': 50,
    'learning_rate_start': 0.001,
}
print('Num subinstance', NUM_SUBINSTANCE)
print('Num time steps', NUM_TIME_STEPS)
print('Num feats', NUM_FEATS)

Num subinstance 6
Num time steps 48
Num feats 9


# Training

Both *MI-RNN* and *EMI-RNN* training happens in multiple *rounds*. Each round consists of two phases, the training phase where we learn the best possible model for the current information of sub-instance labels, followed by the label update phase where we use the best model we have to update the label information of the instances.

In [4]:
currentRound = 0
reuse = False
currY = np.array(y_train)
while currentRound < NUM_ROUNDS:
    print("%s" %( '-' * 10))
    print("Round %d" % (currentRound))
    print("%s" %( '-' * 10))
    globalStepBase = 20000 + currentRound * 100
    accList = []
    modelList = []
    # Start training. We save a model after each interation and 
    # reload the one with the best validation set performance later
    for i in range(NUM_ITER):
        print("Iteration %d " % (i))
        if reuse is False:
            print("Generating graph %d" % i)
            tf.reset_default_graph()
            network = NetworkV2(NUM_SUBINSTANCE, NUM_FEATS,
                                NUM_TIME_STEPS, NUM_HIDDEN,
                                NUM_FC, NUM_OUTPUT)
            network.createGraph(stepSize=trainingParams['learning_rate_start'])
        
        network.trainModel(x_train, y_train, x_val, y_val, trainingParams, reuse=reuse)
        network.checkpointModel(MODELDIR, max_to_keep=1000,
                                global_step = globalStepBase + i)
        rawOut, softmaxOut, labelOut = network.inference(x_val, 50000)
        trueLabels = np.argmax(y_val, axis=2)
        f = open(os.devnull, 'w')
        df = analysisModelMultiClass(labelOut, trueLabels, BAG_VAL,
                                     NUM_SUBINSTANCE, numClass=NUM_OUTPUT,
                                     redirFile=f)
        f.close()
        acc = np.max(df.acc.values)
        # ssl: subsequence length
        print("Val Accuracy: %f @ssl %d" % (acc, np.argmax(df.acc.values) + 1))
        accList.append(acc)
        modelList.append((MODELDIR, globalStepBase + i))
        reuse=True

    # Load the next best model
    idx = np.argmax(accList)
    modelname, global_step = modelList[idx]
    tf.reset_default_graph()
    network = NetworkV2(NUM_SUBINSTANCE, NUM_FEATS, NUM_TIME_STEPS,
                        NUM_HIDDEN, NUM_FC, NUM_OUTPUT, useCudnn=False)
    graph = network.importModelTF(modelname, global_step)
    rawOut, softmaxOut, labelOut = network.inference(x_val, 50000)
    trueLabels = np.argmax(y_val, axis=2)
    f = open(os.devnull, 'w')
    df = analysisModelMultiClass(labelOut, trueLabels, BAG_VAL,
                                NUM_SUBINSTANCE, numClass=NUM_OUTPUT, redirFile=f)
    f.close()
    print("\nVal Accuracy: %f @ssl %d" % (np.max(df.acc.values), np.argmax(df.acc.values) + 1))
  
    # Update label information
    _, softmaxOut, _ = network.inference(x_train, 50000)
    newY = updateYPolicy4(currY, softmaxOut, BAG_TRAIN,
                          numClasses=NUM_OUTPUT, k=MIN_SUBSEQUENCE_LEN)
    currY = newY
    currentRound += 1

----------
Round 0
----------
Iteration 0 
Generating graph 0
Using softmax loss
GPU Fraction: 1.0
Executing 50 epochs
Epoch  48 Batch     0 ( 1200) Loss 0.11596 Accuracy 0.94792 
Model saved to /tmp/model_dump/, global_step 20000
Val Accuracy: 0.961131 @ssl 2
Iteration 1 
Reusing previous session
Reusing previous init
Executing 50 epochs
Epoch  48 Batch     0 ( 1200) Loss 0.10304 Accuracy 0.94336 
Model saved to /tmp/model_dump/, global_step 20001
Val Accuracy: 0.945230 @ssl 3
Iteration 2 
Reusing previous session
Reusing previous init
Executing 50 epochs
Epoch  48 Batch     0 ( 1200) Loss 0.09217 Accuracy 0.94727 
Model saved to /tmp/model_dump/, global_step 20002
Val Accuracy: 0.954947 @ssl 2
INFO:tensorflow:Restoring parameters from /tmp/model_dump/-20000
Restoring /tmp/model_dump/-20000

Val Accuracy: 0.961131 @ssl 2
----------
Round 1
----------
Iteration 0 
Reusing previous session
Reusing previous init
Executing 50 epochs
Epoch  48 Batch     0 ( 1200) Loss 0.10374 Accuracy 0.94

## Saving and Restoring 


In [5]:
network.checkpointModel('/tmp/model00_', 1000)
tf.reset_default_graph()
network = NetworkV2(NUM_SUBINSTANCE, NUM_FEATS, NUM_TIME_STEPS, NUM_HIDDEN, NUM_FC, NUM_OUTPUT, useCudnn=False)
_ = network.importModelTF('/tmp/model00_', 1000)

Model saved to /tmp/model00_, global_step 1000
INFO:tensorflow:Restoring parameters from /tmp/model00_-1000
Restoring /tmp/model00_-1000


## Test Stats

In [6]:
_, softmaxOut, predictions = network.inference(x_test, 1000)
trueLabels = np.argmax(y_test, axis=2)
bagTest = np.argmax(y_test, axis=2)[:, 0]
df = analysisModelMultiClass(predictions, trueLabels,
                        bagTest, NUM_SUBINSTANCE,
                        numClass=NUM_OUTPUT)

   len       acc  macro-fsc  macro-pre  macro-rec  micro-fsc  micro-pre  \
0    1  0.893112   0.893449   0.897172   0.896078   0.893112   0.893112   
1    2  0.908381   0.909552   0.910828   0.911163   0.908381   0.908381   
2    3  0.913811   0.915271   0.915584   0.916341   0.913811   0.913811   
3    4  0.895148   0.896470   0.901949   0.896227   0.895148   0.895148   
4    5  0.875467   0.877612   0.891945   0.875583   0.875467   0.875467   
5    6  0.857822   0.861301   0.886443   0.856575   0.857822   0.857822   

   micro-rec  
0   0.893112  
1   0.908381  
2   0.913811  
3   0.895148  
4   0.875467  
5   0.857822  
Max accuracy 0.913811 at subsequencelength 3
Max micro-f 0.913811 at subsequencelength 3
Micro-precision 0.913811 at subsequencelength 3
Micro-recall 0.913811 at subsequencelength 3
Max macro-f 0.915271 at subsequencelength 3
macro-precision 0.915584 at subsequencelength 3
macro-recall 0.916341 at subsequencelength 3
Fraction false alarm 0.103495 (308/2976) 
