<a href="https://colab.research.google.com/github/nyp-sit/it3103/blob/main/week15/human_activity_recognition.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Human Activity Recognition using 2D-Pose

In this practical, we will develop a model to recognise activities such as jumping, boxing and waving one hand. An activity is defined as a sequence of human poses (given by keypoints of skeletal joints), and these poses are estimated by a pretrained model (Google's PoseNet).


## Section 1 - Import Libraries and Setup Folders

Let's import all the necessary libraries.

In [None]:
import pandas as pd
import os
import numpy as np
import math
from tqdm import tqdm
from datetime import datetime
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from matplotlib import rc

import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Input, Bidirectional, Dropout, LSTM, TimeDistributed, Flatten


## Section 2 - Dataset

We will be using the following dataset from 
https://github.com/stuarteiffert/RNN-for-Human-Activity-Recognition-using-2D-Pose-Input

The data is 2D positions (x,y coordinates) of 18 joints across a timeseries of 32 frames (window-width), with an associated class label for the frame series.

The dataset consists of the following files:
- X_test.txt : testing dataset x inputs (36 values per line, i.e. x,y for 18 keypoints; 32 lines per datapoint)
- X_train.txt : training dataset x inputs (36 values per line, i.e. x,y for 18 keypoints; 32 lines per datapoint)
- X_val.txt : validation dataset x inputs (36 values per line, i.e. x,y for 18 keypoints; 32 lines per datapoint)
- Y_test.txt : testing class labels
- Y_train.txt : training class labels
- Y_val.txt : validation class labels

In [None]:
!wget https://nyp-aicourse.s3.ap-southeast-1.amazonaws.com/it3103/2D-Pose-Data.zip
!unzip 2D-Pose-Data.zip

In [None]:
train_df = pd.read_csv('2D-Pose-Data/X_train.txt', header=None)
train_label_df = pd.read_csv('2D-Pose-Data/Y_train.txt', header=None)
test_df = pd.read_csv('2D-Pose-Data/X_test.txt', header=None)
test_label_df = pd.read_csv('2D-Pose-Data/Y_test.txt', header=None)

In [None]:
#examine first few rows
train_df.head()

In [None]:
#check the distribution of the labels
train_label_df.value_counts()

## Section 3 - Create the input data

We cannot use the pandas DataFrame (which is 2D) directly with our LSTM network. We need to create a dataset that consists of sequences of 32 timesteps (frames), each holding 36 values (the x,y coordinates of 18 keypoints). In other words, we need our data to be of the shape (batch_size, 32, 36).

In addition, we saw earlier that our labels run from 1 to 6 (a total of 6 classes). However, the deep learning model will predict labels from 0 to 5, so we need to map the labels to 0-5 by subtracting 1 from the original values.

Our labels are the following: 

```
labels = ["JUMPING", "JUMPING_JACKS", "BOXING", "WAVING_2HANDS", "WAVING_1HAND", "CLAPPING_HANDS"]
```
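
To make the reshape and the label shift concrete, here is a minimal sketch on synthetic data. The array `flat` and the two labels are made up for illustration; only the shapes mirror the real dataset.

```python
import numpy as np

# Synthetic stand-in for the dataframe: 2 sequences x 32 frames, 36 values per frame.
n_sequences, n_frames, n_values = 2, 32, 36
flat = np.arange(n_sequences * n_frames * n_values, dtype=float).reshape(-1, n_values)
print(flat.shape)        # (64, 36)

# Group every 32 consecutive rows into one sequence.
sequences = flat.reshape(-1, n_frames, n_values)
print(sequences.shape)   # (2, 32, 36)

# Row 32 of the flat array becomes frame 0 of the second sequence.
assert np.array_equal(sequences[1, 0], flat[32])

# Hypothetical labels in the original 1-6 range, shifted to 0-5.
raw_labels = np.array([[1], [6]])
shifted = raw_labels - 1
print(shifted.ravel())   # [0 5]
```

Note that the reshape only works if the number of rows is an exact multiple of 32; otherwise NumPy raises a `ValueError`.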

In [None]:
# convert the dataframe to numpy array and bunch every 32 rows together as a sequence of 32 timesteps
X_train = train_df.to_numpy().reshape(-1, 32, 36)

# convert labels from 1-6 to 0-5.
y_train = train_label_df.to_numpy() - 1

print(X_train.shape)
print(y_train.shape)

In [None]:
X_test = test_df.to_numpy().reshape(-1, 32, 36)
y_test = test_label_df.to_numpy() - 1

print(X_test.shape)
print(y_test.shape)

## Section 4 - Visualize Our Dataset



We can view each frame as a timestep on the x-axis, and each of the 36 numbers (the x and y coordinates of 18 joints) as an individual line plot. This gives some visual clue as to how the different joints move over time, but the movement itself is still difficult to picture.

In [None]:
%matplotlib inline 
sample = 0
plt.plot(X_train[sample])
plt.show()

A better way to visualize the data is to draw a scatter plot of the x and y coordinates of the joints and animate it, so that we can see the movement over time.

NOTE: These are the types of actions captured in the dataset:
JUMPING, JUMPING_JACKS, BOXING, WAVING_2HANDS, WAVING_1HAND, CLAPPING_HANDS
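
Each frame stores its coordinates interleaved as `[x0, y0, x1, y1, ...]`, so a scatter plot first needs the x and y values pulled apart. The sketch below shows the slicing on a hypothetical 3-joint frame (the real frames have 18 joints); the values are invented for illustration.

```python
import numpy as np

# One hypothetical frame with 3 joints stored as [x0, y0, x1, y1, x2, y2].
frame = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])

xs = frame[0::2]   # even indices -> x-coordinates
ys = frame[1::2]   # odd indices  -> y-coordinates
print(xs)          # [10. 30. 50.]
print(ys)          # [20. 40. 60.]

# Equivalent view: one (x, y) row per joint.
points = frame.reshape(-1, 2)
print(points.shape)  # (3, 2)
```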

In [None]:
sample = 0

# This function returns a set of data for every frame that is
# called from the animation.FuncAnimation below.
#
def animate_pose(frame):
    # Retrieve the even number values as X-coordinates
    # and the odd number values as Y-coordinates
    #
    # Once you have these 2 sets of values, you can
    # pass them into the line.set_data to get matplotlib
    # to draw a scatter plot 
    #
    graph_x = X_train[sample][frame][0::2]
    graph_y = X_train[sample][frame][1::2]
    line.set_data(graph_x, graph_y)
    return line,

fig, ax = plt.subplots()
plt.close()

ax.set_xlim(0, 800)
ax.set_ylim(600, 0)

line, = ax.plot([], [], 'o', color='black');

anim = animation.FuncAnimation(fig, animate_pose, 32,  interval=50, blit=True)
rc('animation', html='jshtml')
anim

## Section 5 - Define and Train Your Model

Next, we will create a model that uses an LSTM layer to process sequence (time-series) data. We will start with a very simple model, consisting of a single LSTM layer followed by a Dense layer for classification. We will also add a Dropout layer.

Since our target label is not one-hot-encoded, we will specify `sparse_categorical_crossentropy` as our loss function.
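
As a rough illustration of why the sparse variant fits integer labels, the NumPy sketch below computes the cross-entropy both ways on made-up probabilities. It is not the Keras implementation, only the same arithmetic written out by hand.

```python
import numpy as np

# Hypothetical softmax outputs for 2 samples over 3 classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])
int_labels = np.array([0, 2])          # integer class ids

# Sparse form: pick the probability of the true class directly.
sparse_loss = -np.log(probs[np.arange(len(int_labels)), int_labels])

# One-hot form: same loss, but the labels must be expanded first.
one_hot = np.eye(probs.shape[1])[int_labels]
dense_loss = -np.sum(one_hot * np.log(probs), axis=1)

assert np.allclose(sparse_loss, dense_loss)
print(sparse_loss.mean())
```

Both forms give the same value; the sparse form simply avoids building the one-hot matrix.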

You may find that a good validation accuracy for your model hovers around 85-90%. Note that the training cell below passes the test set as the validation data.

In [None]:
# Create our LSTM model here
#
def create_model():

    # Use Keras to create a Sequential model here with any layers that 
    # you'd like.
    #
    model = Sequential()

    model.add(LSTM(128, input_shape=(32, 36)))
    model.add(Dropout(0.2))

    model.add(Dense(6, activation='softmax'))

    model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.summary()
    return model


In [None]:
model = create_model()

# create tensorboard log directory 
root_logdir = os.path.join(os.curdir, "tb_logs")

def get_run_logdir():    # use a new directory for each run
    import time
    run_id = time.strftime("run_%Y_%m_%d-%H_%M_%S")
    logdir = os.path.join(root_logdir, run_id)
    os.makedirs(logdir, exist_ok=True)
    return logdir 

run_logdir = get_run_logdir()
tensorboard_callback = keras.callbacks.TensorBoard(log_dir=run_logdir)
checkpoint_callback = keras.callbacks.ModelCheckpoint(filepath=run_logdir + '/model.{epoch:04d}-val_acc-{val_accuracy:4.2f}-loss-{val_loss:4.2f}.h5',
                                                      monitor='val_loss', save_best_only=True)
earlystop_callback = keras.callbacks.EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)

# Train our model
#
model.fit(x=X_train, y=y_train, 
          batch_size=256, 
          epochs=40,
          validation_data=(X_test, y_test),
          callbacks=[tensorboard_callback, checkpoint_callback, earlystop_callback], 
          verbose=1)

In [None]:
%load_ext tensorboard 
%tensorboard --logdir tb_logs

## Section 6 - Scale / Translation Normalization

So far, we have not talked about how we can normalize our skeletal keypoints so that the pose data is scale / translation invariant. This means that regardless of how far the person is from the camera, or when the person moves left or right or up or down, the coordinates of all joint positions should always be relative to a fixed frame of reference.

To take care of translation (left / right / up / down) invariance, we shift all points together so that the neck point is always placed at (0, 0).

To take care of scale invariance, we estimate the torso length as the distance from the neck point to either hip. We then divide all joint coordinates by this length.

To do so, we will create a `process_joints()` function to include code to normalize the skeleton key points as described above:

1. ref = P[1] or the midpoint of P[2], P[5]
2. reflength = length(ref to P[8]) or length(ref to P[11]) 
3. Compute 
   - P[i].x = (P[i].x - ref.x) / reflength
   - P[i].y = (P[i].y - ref.y) / reflength
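
The three steps can be sketched for a single frame as follows. This is a simplified illustration, not the `process_joints()` function itself: the helper `normalize_frame` is hypothetical, and it assumes the neck (P[1]) and right hip (P[8]) are always detected, so it has no fallback for missing joints. The pose is random data.

```python
import numpy as np

def normalize_frame(keypoints):
    """Translate so the neck (P[1]) is at the origin, then scale by the
    neck-to-right-hip (P[8]) distance. Assumes both joints were detected."""
    pts = np.asarray(keypoints, dtype=float).reshape(18, 2)
    ref = pts[1]
    reflength = np.linalg.norm(pts[8] - ref)
    return ((pts - ref) / reflength).ravel()

# Hypothetical pose: random joints inside an 800x600 image.
rng = np.random.default_rng(0)
pose = rng.uniform([0, 0], [800, 600], size=(18, 2))

# The same pose seen twice as large and shifted by (50, -30).
moved = pose * 2.0 + np.array([50.0, -30.0])

a = normalize_frame(pose.ravel())
b = normalize_frame(moved.ravel())
assert np.allclose(a, b)             # same normalized skeleton
assert np.allclose(a[2:4], 0.0)      # neck sits at the origin
```

The two assertions check the property we are after: scaling and shifting the raw pose leaves the normalized coordinates unchanged.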


In [None]:
# Declare a function that can compute length (euclidean distance) between two points
#   (x1,y1) - (x2,y2)
def compute_length(x1, y1, x2, y2):
    return math.sqrt((x1-x2)*(x1-x2) + (y1-y2)*(y1-y2))

# Process OpenPose's Joints

# NOTE: The "keypoints" parameter consists of an array of consecutive x and y values 
# within the same array.
# keypoints = [p0.x, p0.y, p1.x, p1.y, p2.x, p2.y, ..., p17.x, p17.y] (a total of 36 values) 
def process_joints(keypoints):

    normalized_keypoints = [0] * 36

    refx = 0
    refy = 0
    reflength = 1

    # Step 1: Let's find the reference point (neck)
    #
    if keypoints[2] != 0 or keypoints[3] != 0:         
        refx = keypoints[2]                # use the neck X, Y
        refy = keypoints[3]
    elif (keypoints[4] != 0 or keypoints[5] != 0) and (keypoints[10] != 0 or keypoints[11] != 0):
        refx = (keypoints[4] + keypoints[10]) / 2  # estimate the neck X, Y from the mid point
        refy = (keypoints[5] + keypoints[11]) / 2  # of the left/right shoulder
    
    # Step 2: Let's estimate the torso length.
    #
    if keypoints[16] != 0 and keypoints[17] != 0:             
        reflength = compute_length(keypoints[16], keypoints[17], refx, refy)   # neck to right hip
    elif keypoints[22] != 0 and keypoints[23] != 0:
        reflength = compute_length(keypoints[22], keypoints[23], refx, refy)   # neck to left hip

    # Step 3:
    # Perform the translation and the scaling.
    #
    for i in range(0, 18):
        normalized_keypoints[i*2] = (keypoints[i*2] - refx) / reflength
        normalized_keypoints[i*2 + 1] = (keypoints[i*2 + 1] - refy) / reflength
    
    # Return the re-mapped and normalized result
    #
    return normalized_keypoints


We will apply the normalization to each row (36 values, i.e. 18 keypoints). We use `itertuples()` to iterate through each row of the DataFrame.

In [None]:
def normalize(df):
    X = []
    for row in tqdm(df.itertuples(index=False)):
        X.append(process_joints(row))
    
    X = np.array(X)
    return X

In [None]:
X_train_normalized = normalize(train_df).reshape(-1, 32, 36)
X_test_normalized = normalize(test_df).reshape(-1, 32, 36)

In [None]:
model = create_model()

run_logdir = get_run_logdir()
tensorboard_callback = keras.callbacks.TensorBoard(log_dir=run_logdir)
checkpoint_callback = keras.callbacks.ModelCheckpoint(filepath=run_logdir + '/model.{epoch:04d}-val_acc-{val_accuracy:4.2f}-loss-{val_loss:4.2f}.h5',
                                                      monitor='val_loss', save_best_only=True)
earlystop_callback = keras.callbacks.EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)

# Train our model
model.fit(x=X_train_normalized, y=y_train, 
          batch_size=256, 
          epochs=40,
          validation_data=(X_test_normalized, y_test),
          callbacks=[tensorboard_callback, checkpoint_callback, earlystop_callback], 
          verbose=1)


## Section 7 - Evaluate Model Performance

In [None]:
labels = ["JUMPING", "JUMPING_JACKS", "BOXING", "WAVING_2HANDS", "WAVING_1HAND", "CLAPPING_HANDS"]

# numpy, pandas and matplotlib were already imported in Section 1
from sklearn.metrics import confusion_matrix, classification_report
import seaborn as sns
 
def display_classification_confusion_matrix(keras_model, x_train, y_train, x_test, y_test, labels):
    
    '''
    Plot confusion matrices and print classification reports for the
    train and test sets. y_train / y_test hold integer class ids (0-5).
    '''
 
    print(x_train.shape)
    train_preds = keras_model.predict(x_train)
    test_preds = keras_model.predict(x_test)
    train_preds = np.argmax(train_preds, axis=1)
    test_preds = np.argmax(test_preds, axis=1)
    
    plt.figure(figsize=(20,6))  

    labels = np.array(labels)
    # Print the first Confusion Matrix for the training data
    #
    cm = confusion_matrix(y_train, train_preds)

    cm_df = pd.DataFrame(cm, labels, labels)          
    plt.subplot(1, 2, 1)
    plt.title('Confusion Matrix (Train Data)')
    sns.heatmap(cm_df, annot=True, fmt='d')   # fmt='d' shows counts as plain integers
    plt.ylabel('Actual')
    plt.xlabel('Predicted')        
    
    # Print the second Confusion Matrix for the test data
    #    
    cm = confusion_matrix(y_test, test_preds)
    
    cm_df = pd.DataFrame(cm, labels, labels)          
    plt.subplot(1, 2, 2)
    plt.title('Confusion Matrix (Test Data)')
    sns.heatmap(cm_df, annot=True, fmt='d')   # fmt='d' shows counts as plain integers
    plt.ylabel('Actual')
    plt.xlabel('Predicted')        
    
    plt.show()

    # Finally display the classification reports
    #
    print ("Train Data:")
    print ("--------------------------------------------------------")
    print(classification_report(y_train, train_preds, target_names=labels))
    print ("")
    print ("Test Data:")
    print ("--------------------------------------------------------")
    print(classification_report(y_test, test_preds, target_names=labels))
    

display_classification_confusion_matrix(model, X_train_normalized, y_train, X_test_normalized, y_test, labels)

## Section 8 - Save and Download Model

Run the following cell to save your model. 


In [None]:
model.save("activity_model")

Run the following to zip the "activity_model" folder into a single zip file.

Download that zip file from Colab once you are done! We will be using this for the next practical exercise.

In [None]:
!zip activity_model.zip -r activity_model

In [None]:
model = keras.models.load_model('activity_model')

In [None]:
model.summary()

In [None]:
sample_index = 2000
sample = X_test_normalized[sample_index]
label = np.squeeze(y_test)[sample_index]


In [None]:
sample = np.expand_dims(sample, axis=0)

In [None]:
pred = model(sample)
print(pred)

In [None]:
print('actual = {}'.format(labels[label]))
print('predicted = {}'.format(labels[np.argmax(pred)]))