# CNN-LSTM Model

The following notebook is a guide into using a hybrid model, that is when two models are trained together in order to increase \
the accuracy of the predicition. In this case, to detect someones sitting posture (3 classifications: leaning left, leaning right, sitting straight), \
we are using a Convolutional Neural Network (CNN) to grab the spatial features of the images and a Long Short Term Memory (LSTM) model to grab \
the temporal features of the images that are organized in sequence to better study the movement of the subject.

## Imports

In [1]:
import numpy as np
import pandas as pd
import keras
import matplotlib.pyplot as plt
%matplotlib inline
import json
import tensorflow as tf
import cv2

## Resizing & Dataset Preparation

Resizing is not necessary but it makes the trainin quicker. For this use case, more pixels does not necessary help. \
The rest of the code takes care of preparing all the classified images so that they are \
in order of time e.g. a person moving frame by frame to the right. This is needed for the LSTM model.

Function to Resize the image from (720,1280,3) to (180,320,3)

In [3]:
def get_img(img_path):
    original_img = cv2.imread(img_path, cv2.IMREAD_COLOR)
    resized_img = cv2.resize(original_img, (320,180), interpolation=cv2.INTER_CUBIC)
    return resized_img

**To Concatanate 8 Images in a sample for our TimeDistributed Layer**

In [6]:
prefix = "frames/train/left/1/"          ## change to your own folder path

X_sample = []
for idx in range(1, 9):
    img_path = prefix + str(idx) + ".jpg"
    img = get_img(img_path)
    X_sample.append(img)

print (np.array(X_sample).shape)

(8, 180, 320, 3)


**To Convert the Training data into Required format to feed the data into model.**

In [None]:
main_prefix="frames-full-9/train/"   ## change to your own folder path
x_left=[]
x_right=[]
x_straight=[]

for i in range(1,9):
    path=main_prefix+'/left/'+str(i)+"/"
    X_sample = []
    for idx in range(1, 10):
        img_path = path + str(idx) + ".jpg"
        img = get_img(img_path)
        X_sample.append(img)
    x_left.append(np.array(X_sample))
        
for i in range(1,9):
    path=main_prefix+'/right/'+str(i)+"/"
    X_sample = []
    for idx in range(1, 10):
        img_path = path + str(idx) + ".jpg"
        img = get_img(img_path)
        X_sample.append(img)
    x_right.append(np.array(X_sample))
        
for i in range(1,9):
    path=main_prefix+'straight/'+str(i)+"/"
    X_sample = []
    for idx in range(1, 10):
        img_path = path + str(idx) + ".jpg"
        img = get_img(img_path)
        X_sample.append(img)
    x_straight.append(np.array(X_sample))



In [8]:
# label encoding
from sklearn.preprocessing import LabelEncoder

y_train = ['left'] * 90 + ['right'] * 90 + ['straight'] * 90
label_encoder = LabelEncoder()
y_train_encoded = label_encoder.fit_transform(y_train)
num_classes = len(label_encoder.classes_)
y_train_onehot = keras.utils.to_categorical(y_train_encoded, num_classes)

In [9]:
print(np.array(x_left).shape)
print(np.array(x_right).shape)
print(np.array(x_straight).shape)

(8, 9, 180, 320, 3)
(8, 9, 180, 320, 3)
(8, 9, 180, 320, 3)


## Model Layer Configuration

In the following code the data is inputted in the CNN model first, then flattened such as is output is the input of the LSTM layer.\
In between this process is a TimeDistributed Layer so that the LSTM model can read each video frame (an image) in sequence and \
understand the temporal differences of movement.

In [54]:
from tensorflow import keras
from tensorflow.keras.optimizers.legacy import Adam
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense, Flatten, Dropout, TimeDistributed, LSTM

model=None

cnn = Sequential()

cnn.add(Conv2D(16, (3,3), 1, activation='relu', input_shape=(180,320,3)))
cnn.add(MaxPooling2D())
cnn.add(Dropout(0.75))
cnn.add(Conv2D(32, (3,3), 1, activation='relu'))
cnn.add(MaxPooling2D())
cnn.add(Dropout(0.75))
cnn.add(Conv2D(64, (3,3), 1, activation='relu'))
cnn.add(MaxPooling2D())
cnn.add(Dropout(0.5))
cnn.add(Flatten())

model = Sequential()

model.add(TimeDistributed(cnn, input_shape=(10, 180, 320, 3)))                          ###Using TimeDistributed Layer to Feed the Image Sequence

# now, flatten  each output to send 8 outputs with one dimension to LSTM
model.add( TimeDistributed( Flatten()  ))
model.add(LSTM(256, activation='relu', return_sequences=False))                                   ##Added LSTM to Capture the Sequence Information
# finalize with standard Dense, Dropout...
model.add(Dense(64, activation='relu'))                                           ##Final Adding an Dense Layer
model.add(Dropout(.5))    

optimizer = tf.keras.optimizers.legacy.Adam(0.00001)
optimizer.learning_rate.assign(0.00001)
model.add(Dense(3))                                         
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])


In [53]:
model.summary()

Model: "sequential_171"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 time_distributed_169 (Time  (None, 10, 2592)          18960     
 Distributed)                                                    
                                                                 
 time_distributed_170 (Time  (None, 10, 2592)          0         
 Distributed)                                                    
                                                                 
 lstm_31 (LSTM)              (None, 256)               2917376   
                                                                 
 dense_167 (Dense)           (None, 64)                16448     
                                                                 
 dropout_342 (Dropout)       (None, 64)                0         
                                                                 
 dense_168 (Dense)           (None, 3)              

In [16]:
## standard training

x_train = np.concatenate((x_left, x_right, x_straight))

r=model.fit(x_train ,y_train_onehot,validation_split=0.2,batch_size=10,epochs=50, shuffle=True)  


Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


## Visualizations

Visualizing the loss

In [None]:

fig = plt.figure()
plt.plot(r.history['loss'], color='teal', label='loss')
plt.plot(r.history['val_loss'], color='orange', label='val_loss')
fig.suptitle('Loss', fontsize=20)
plt.legend(loc="upper right")
plt.show()

Visualizing the model accuracy


In [None]:

fig = plt.figure()
plt.plot(r.history['accuracy'], color='teal', label='accuracy')
plt.plot(r.history['val_accuracy'], color='orange', label='val_accuracy')
fig.suptitle('Accuracy', fontsize=20)
plt.legend(loc="lower right")
plt.show()

## K-fold Testing

In [59]:
import os
import numpy as np
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from sklearn.model_selection import KFold
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import TimeDistributed, LSTM, Dense, Dropout, Flatten, GRU
from tensorflow.keras.utils import to_categorical


dataset_path = "path/to/testing/folder"

def load_sequence(folder_path):
    image_list = os.listdir(folder_path)
    image_list.sort()  # Sort images in ascending order

    # Load and preprocess the images
    images = []
    for image_name in image_list:
        image_path = os.path.join(folder_path, image_name)
        if ".DS_Store" not in image_path:
            image = load_img(image_path, target_size=(180, 320))
            image = img_to_array(image)
            images.append(image)

    # Stack the images to create a sequence
    sequence = np.stack(images, axis=0)
    return sequence

class_folders = os.listdir(dataset_path)

sequences = []
labels = []

# Iterate over each class folder
for class_folder in class_folders:
    class_path = os.path.join(dataset_path, class_folder)

    # Iterate over each sequence folder
    if ".DS_Store" not in class_path:
        sequence_folders = os.listdir(class_path)

        for sequence_folder in sequence_folders:
            if ".DS_Store" not in sequence_folder:
                sequence_path = os.path.join(class_path, sequence_folder)

                # Load the sequence
                sequence = load_sequence(sequence_path)
                sequences.append(sequence)
                labels.append(class_folder)

# Pad or truncate the sequences to have a length of x
sequences = pad_sequences(sequences, maxlen=10, padding='post', truncating='post')

sequences = tf.convert_to_tensor(sequences)

# Convert labels to integer format
label_encoder = LabelEncoder()
integer_labels = label_encoder.fit_transform(labels)
labels = tf.convert_to_tensor(integer_labels)

k = 5
kfold = KFold(n_splits=k, shuffle=True, random_state=42)

fold_scores = []

for train_indices, val_indices in kfold.split(sequences):
    train_indices = tf.constant(train_indices, dtype=tf.int64)
    val_indices = tf.constant(val_indices, dtype=tf.int64)

    train_sequences = tf.gather(sequences, train_indices)
    train_labels = tf.gather(labels, train_indices)
    val_sequences = tf.gather(sequences, val_indices)
    val_labels = tf.gather(labels, val_indices)
    
    one_hot_labels = to_categorical(train_labels)
    train_labels = tf.convert_to_tensor(one_hot_labels)
    
    one_hot_labels_val = to_categorical(val_labels)
    val_labels = tf.convert_to_tensor(one_hot_labels_val)

    cnn = Sequential()

    cnn.add(Conv2D(16, (3,3), 1, activation='relu', input_shape=(180,320,3)))
    cnn.add(MaxPooling2D())
    cnn.add(Dropout(0.75))
    cnn.add(Conv2D(32, (3,3), 1, activation='relu'))
    cnn.add(MaxPooling2D())
    cnn.add(Dropout(0.75))
    cnn.add(Conv2D(64, (3,3), 1, activation='relu'))
    cnn.add(MaxPooling2D())
    cnn.add(Dropout(0.5))
    cnn.add(Flatten())

    model = Sequential()

    model.add(TimeDistributed(cnn, input_shape=(10, 180, 320, 3)))                         
    model.add(TimeDistributed( Flatten()))
    model.add(LSTM(256, activation='relu', return_sequences=False))                                 
    # model.add(GRU(64, activation='relu', return_sequences=False))                                 
    model.add(Dense(64, activation='relu'))                                         
    model.add(Dropout(.5))    

    optimizer = tf.keras.optimizers.legacy.Adam(0.001)
    optimizer.learning_rate.assign(0.001)

    model.add(Dense(3, activation='softmax'))                                      
    model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
    model.fit(train_sequences, train_labels, epochs=50)
    _, model_eval = model.evaluate(val_sequences, val_labels)
    fold_scores.append(model_eval)
    print(model_eval)

print(np.mean(fold_scores))

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50
0.9090909361839294
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50


Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50
0.800000011920929
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50
1.0
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50


Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50
0.800000011920929
0.9018181920051574
