# Action Recognition - LSTM Model Implementation Study

This script implements a LSTM model for ASL. It will be used for study purposes.

Created by:
- Marcus Vinicius da Silva Fernandes.
- Yamini Sharma.

2023-06-05.

#### References:
- https://www.youtube.com/watch?v=pG4sUNDOZFg
- https://numpy.org/doc/stable/reference/generated/numpy.pad.html

### Importing necessary libraries

In [1]:
import numpy as np
import os
import csv
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Masking
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.callbacks import TensorBoard
from sklearn.model_selection import train_test_split
from sklearn.metrics import multilabel_confusion_matrix, accuracy_score

### Accessing the landmarks

Set up the paths of folders to locate the landmarks and the list (csv file) that associates the name of the video to the corresponding word in English.

In [2]:
# Set up of the extracted landmarks save path
landmarks_path = 'C:/Users/marcu/OneDrive/Documentos/Loyalist_College/AISC2006/ASL/extracted_landmarks_dummy_npy/'

Creation of the dictionary to associate the videos and the words.

In [3]:
# Opening the file dataset_analysis.csv to load the association of landmark ids to words and its number of frames
id_dict = {}  # initializing the dictionary that will receive the data
num_frames = []  # initializing the list that will contain the number of frames of each landmark
with open(landmarks_path + "dataset_analysis.csv", "r") as csv_file:
    csv_reader = csv.reader(csv_file)  # reading the data
    next(csv_reader)  # to skip the header
    for row in csv_reader:
        id_dict['0' * (5 - len(row[0])) + row[0]] = row[1]  # storing the content into a dictionary
        num_frames.append(int(row[7]))

In [4]:
# Maximum number of frames of all the landmarks
max_num_frames = max(num_frames)
print('Maximum number of frames of all the landmarks =', max_num_frames)

# Minimum number of frames of all the landmarks
min_num_frames = min(num_frames)
print('Minimum number of frames of all the landmarks =', min_num_frames)

Maximum number of frames of all the landmarks = 181
Minimum number of frames of all the landmarks = 26


### Shaping the data for the LSTM model

Desired number of frames
- each video will be reshaped to have the number of rows (or frames) equal to the desired number of frames defined below.

In [5]:
NUM_FRAMES = 30

Creation of the X array

- Time-based sampling: we will reduce the dimension of the array to the desired NUM_FRAMES.
- Padding the videos: we will add rows with zeros to increase the dimension of the array to the desired NUM_FRAMES.
- No change: the array already has the desired NUM_FRAMES.

In [6]:
videos, labels = [], []

for item in os.listdir(landmarks_path):
    if item.endswith('.npy'):  # working with npy files only
        data = np.load(os.path.join(landmarks_path, item))  # loading the numpy array into memory
        if data.shape[0] > NUM_FRAMES:  # time-based sampling
            indices = np.arange(0, data.shape[0], data.shape[0] // NUM_FRAMES)[:30]
            data = data[indices]
            videos.append(data)
        elif data.shape[0] < NUM_FRAMES:  # padding the videos
            data = np.pad(data, ((0, NUM_FRAMES - data.shape[0]), (0, 0)), mode='constant')
            videos.append(data)
        else:  # no change
            videos.append(data)
        labels.append(id_dict[item.split('.npy')[0]])

X = np.array(videos)
print(X.shape)

(131, 30, 1662)


Creation of the Y array

In [7]:
labels_unique = np.unique(labels)

labels_encoded = []
for i in labels:
    labels_encoded = np.append(labels_encoded, np.where(labels_unique == i))

Y = to_categorical(labels_encoded).astype(int)
print(Y.shape)

(131, 25)


Splitting the data

In [8]:
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.20)

### LSTM model

Model build

In [9]:
# Second try - masking layer added
model = Sequential()
model.add(Masking(mask_value=0, input_shape=(X.shape[1], X.shape[2])))  # Input shape with variable-length sequences
model.add(LSTM(64, activation='sigmoid'))
model.add(Dense(y_train.shape[1], activation='softmax'))



Model compile

In [10]:
log_dir = os.path.join('Logs')
tb_callback = TensorBoard(log_dir=log_dir)

optimizer = tf.keras.optimizers.legacy.Adam(learning_rate=0.001)
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['categorical_accuracy'])

Model fitment

In [11]:
model.fit(X_train, y_train, epochs=200, callbacks=[tb_callback])

Epoch 1/200


2023-07-10 09:15:29.288567: W tensorflow/tsl/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz


Epoch 2/200
Epoch 3/200
Epoch 4/200
Epoch 5/200
Epoch 6/200
Epoch 7/200
Epoch 8/200
Epoch 9/200
Epoch 10/200
Epoch 11/200
Epoch 12/200
Epoch 13/200
Epoch 14/200
Epoch 15/200
Epoch 16/200
Epoch 17/200
Epoch 18/200
Epoch 19/200
Epoch 20/200
Epoch 21/200
Epoch 22/200
Epoch 23/200
Epoch 24/200
Epoch 25/200
Epoch 26/200
Epoch 27/200
Epoch 28/200
Epoch 29/200
Epoch 30/200
Epoch 31/200
Epoch 32/200
Epoch 33/200
Epoch 34/200
Epoch 35/200
Epoch 36/200
Epoch 37/200
Epoch 38/200
Epoch 39/200
Epoch 40/200
Epoch 41/200
Epoch 42/200
Epoch 43/200
Epoch 44/200
Epoch 45/200
Epoch 46/200
Epoch 47/200
Epoch 48/200
Epoch 49/200
Epoch 50/200
Epoch 51/200
Epoch 52/200
Epoch 53/200
Epoch 54/200
Epoch 55/200
Epoch 56/200
Epoch 57/200
Epoch 58/200
Epoch 59/200
Epoch 60/200
Epoch 61/200
Epoch 62/200
Epoch 63/200
Epoch 64/200
Epoch 65/200
Epoch 66/200
Epoch 67/200
Epoch 68/200
Epoch 69/200
Epoch 70/200
Epoch 71/200
Epoch 72/200
Epoch 73/200
Epoch 74/200
Epoch 75/200
Epoch 76/200
Epoch 77/200
Epoch 78/200
Epoch 7

<keras.callbacks.History at 0x298e3dc60>

In [12]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 masking (Masking)           (None, 30, 1662)          0         
                                                                 
 lstm (LSTM)                 (None, 64)                442112    
                                                                 
 dense (Dense)               (None, 25)                1625      
                                                                 
Total params: 443,737
Trainable params: 443,737
Non-trainable params: 0
_________________________________________________________________


Saving the model

In [13]:
# model.save('ARM_LSTM_second_run.h5')

### Prediction

In [14]:
yhat = model.predict(X_test)

ytrue = np.argmax(y_test, axis=1).tolist()
yhat = np.argmax(yhat, axis=1).tolist()

multilabel_confusion_matrix(ytrue, yhat)



array([[[23,  2],
        [ 2,  0]],

       [[25,  0],
        [ 1,  1]],

       [[25,  0],
        [ 2,  0]],

       [[21,  5],
        [ 0,  1]],

       [[19,  4],
        [ 2,  2]],

       [[26,  0],
        [ 1,  0]],

       [[25,  0],
        [ 2,  0]],

       [[24,  3],
        [ 0,  0]],

       [[24,  3],
        [ 0,  0]],

       [[26,  0],
        [ 1,  0]],

       [[26,  0],
        [ 1,  0]],

       [[25,  2],
        [ 0,  0]],

       [[25,  1],
        [ 1,  0]],

       [[26,  0],
        [ 1,  0]],

       [[25,  0],
        [ 2,  0]],

       [[26,  0],
        [ 1,  0]],

       [[26,  0],
        [ 1,  0]],

       [[25,  0],
        [ 2,  0]],

       [[25,  0],
        [ 2,  0]],

       [[25,  2],
        [ 0,  0]],

       [[25,  1],
        [ 1,  0]]])

In [15]:
print('Prediction accuracy score:')
accuracy_score(ytrue, yhat)

Prediction accuracy score:


0.14814814814814814

In [16]:
for i in range(len(yhat)):
    print('Expected result = ' + labels[ytrue[i]])
    print('Model result = ' + labels[yhat[i]])
    print()

Expected result = across
Model result = adapt

Expected result = adapt
Model result = adapt

Expected result = adjust
Model result = accept

Expected result = adapt
Model result = able

Expected result = admit
Model result = adapt

Expected result = admit
Model result = accident

Expected result = adjective
Model result = adapt

Expected result = adapt
Model result = admit

Expected result = accident
Model result = accept

Expected result = able
Model result = accept

Expected result = admit
Model result = admit

Expected result = adjust
Model result = adapt

Expected result = admit
Model result = admit

Expected result = adapt
Model result = able

Expected result = accident
Model result = accident

Expected result = adapt
Model result = admit

Expected result = admit
Model result = admit

Expected result = admit
Model result = adapt

Expected result = accept
Model result = accept

Expected result = adjective
Model result = accident

Expected result = across
Model result = accident

Ex