<a href="https://colab.research.google.com/github/schmuecker/transfer-learning/blob/main/computer_vision/classification_from_scratch/cnn_human_action.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<h2 style='color:blue' align='center'>Human Action Recognition using a CNN (from scratch)</h2>

**In this notebook we will build a CNN from scratch to classify images of human actions.**

In [1]:
%pip install datasets

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [2]:
import matplotlib.pyplot as plt
import numpy as np
import cv2
import os
import PIL
import tensorflow as tf

from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
from datasets import load_dataset
from PIL import Image

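We will download the Human Action Recognition dataset (`Bingsu/Human_Action_Recognition`) from the Hugging Face Hub with the `datasets` library. `load_dataset` caches the downloaded files locally, so later runs do not need to fetch them again. The call returns a `DatasetDict` with a `train` and a `test` split.

<h3 style='color:purple'>Load human action dataset</h3>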

In [3]:
dataset = load_dataset("Bingsu/Human_Action_Recognition")
dataset




DatasetDict({
    train: Dataset({
        features: ['image', 'labels'],
        num_rows: 12600
    })
    test: Dataset({
        features: ['image', 'labels'],
        num_rows: 5400
    })
})

<h3 style='color:purple'>Preprocess human action data</h3>

<h3 style='color:purple'>Training data</h3>

In [22]:
X_train, y_train = [], []

for a in dataset['train']:
  image, labels = a['image'], a['labels']
  img = np.asarray(image.resize((160,160)), dtype=np.float32)
  X_train.append(img)
  y_train.append(labels)

In [23]:
X_train = np.array(X_train)
y_train = np.array(y_train)

X_train.shape, y_train.shape

((12600, 160, 160, 3), (12600,))
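Storing every image as `float32` is convenient but memory-hungry. A rough estimate of the array's footprint can be derived from the shape reported above alone. This is only a sketch, and the helper name `array_size_gib` is ours, not part of the notebook:

```python
import numpy as np

def array_size_gib(shape, dtype):
    """Return the size in GiB of a dense array with the given shape and dtype."""
    n_elements = int(np.prod(shape, dtype=np.int64))
    return n_elements * np.dtype(dtype).itemsize / 2**30

train_shape = (12600, 160, 160, 3)  # shape of X_train as printed above
float32_gib = array_size_gib(train_shape, np.float32)
uint8_gib = array_size_gib(train_shape, np.uint8)
print(f"float32: {float32_gib:.2f} GiB, uint8: {uint8_gib:.2f} GiB")
```

The `float32` array takes roughly 3.6 GiB, four times what the same pixels would need as `uint8`, which is worth keeping in mind on a memory-limited Colab runtime.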

In [24]:
X_train[0].shape

(160, 160, 3)

In [25]:
X_train[0][0][0]

array([234., 161.,  89.], dtype=float32)
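The pixel above shows raw channel values in the 0 to 255 range. Inputs are often scaled to [0, 1] before being fed to a CNN. The notebook does not do this, so the snippet below is only a sketch of one common option, demonstrated on a small synthetic batch (`scale_pixels` is a hypothetical helper):

```python
import numpy as np

def scale_pixels(images):
    """Map 0..255 pixel values to the range [0, 1] as float32."""
    return np.asarray(images, dtype=np.float32) / 255.0

# Synthetic stand-in for a batch of two 160x160 RGB images.
rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(2, 160, 160, 3)).astype(np.float32)
scaled = scale_pixels(batch)
print(scaled.dtype, scaled.min(), scaled.max())
```

With the notebook's variables this would be `scale_pixels(X_train)` and `scale_pixels(X_test)`. The same scaling could instead live inside the model as a first `layers.Rescaling(1./255)` layer, assuming a TensorFlow version that provides it.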

<h3 style='color:purple'>Test data</h3>

In [38]:
X_test, y_test = [], []

for a in dataset['test']:
  image, labels = a['image'], a['labels']
  img = np.asarray(image.resize((160,160)), dtype=np.float32)
  X_test.append(img)
  y_test.append(labels)

X_test = np.array(X_test)
y_test = np.array(y_test)
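The train and test loops are identical apart from the split name, so they could share one helper. A minimal sketch, assuming each record is a dict with a PIL image under `'image'` and an integer under `'labels'`, as the dataset printout suggests. `split_to_arrays` is a hypothetical name, and the demo uses synthetic images so that it runs without downloading anything:

```python
import numpy as np
from PIL import Image

def split_to_arrays(records, size=(160, 160)):
    """Resize every image to `size` and return (images, labels) as NumPy arrays."""
    images, labels = [], []
    for record in records:
        # convert("RGB") guards against grayscale or RGBA inputs (an added precaution).
        img = record["image"].convert("RGB").resize(size)
        images.append(np.asarray(img, dtype=np.float32))
        labels.append(record["labels"])
    return np.array(images), np.array(labels)

# Synthetic records standing in for dataset['train'] or dataset['test'].
fake_records = [
    {"image": Image.new("RGB", (320, 240), color=(234, 161, 89)), "labels": 3},
    {"image": Image.new("L", (100, 100), color=128), "labels": 7},
]
X_demo, y_demo = split_to_arrays(fake_records)
print(X_demo.shape, y_demo.shape)
```

With the real data the calls would be `split_to_arrays(dataset['train'])` and `split_to_arrays(dataset['test'])`.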

<h3 style='color:purple'>Build convolutional neural network and train it</h3>

In [30]:
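class myCallback(tf.keras.callbacks.Callback):
  def on_epoch_end(self, epoch, logs=None):
    '''
    Halts training once the loss drops below 0.3 and the accuracy exceeds 0.8

    Args:
      epoch (integer) - index of epoch (required but unused in the function definition below)
      logs (dict) - metric results from the training epoch
    '''
    # Avoid a mutable default argument; fall back to an empty dict if no logs are passed
    logs = logs or {}
    loss, accuracy = logs.get('loss'), logs.get('accuracy')

    # Check both thresholds (skip the check if a metric is missing)
    if loss is not None and accuracy is not None and loss < 0.3 and accuracy > 0.8:

      # Stop if both thresholds are met
      print("\nLoss is below 0.3 and accuracy is above 0.8, so cancelling training!")
      self.model.stop_training = True

# Instantiate class
callbacks = myCallback()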
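The stopping rule in `on_epoch_end` can be isolated as a plain function, which makes it easy to check without training a model. The sketch below is one way to write it. `should_stop` and its parameter names are illustrative, the default thresholds mirror the values used in the callback, and a missing metric is treated as "do not stop":

```python
def should_stop(logs, max_loss=0.3, min_accuracy=0.8):
    """Return True when loss is below max_loss and accuracy is above min_accuracy."""
    logs = logs or {}
    loss = logs.get("loss")
    accuracy = logs.get("accuracy")
    if loss is None or accuracy is None:
        # A metric was not reported for this epoch, so keep training.
        return False
    return loss < max_loss and accuracy > min_accuracy

print(should_stop({"loss": 0.25, "accuracy": 0.91}))  # True
print(should_stop({"loss": 0.25, "accuracy": 0.75}))  # False
print(should_stop(None))                              # False
```

Inside the callback, the condition would then reduce to `if should_stop(logs): self.model.stop_training = True`.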

Model architecture: https://www.kaggle.com/code/debanjan2002/human-action-recognition-classification

In [35]:
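num_classes = 15

model = Sequential([
  # Two convolution + max-pooling stages: 160x160 -> 80x80 -> 40x40
  layers.Conv2D(32, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(pool_size=2, strides=2),
  layers.Conv2D(32, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(pool_size=2, strides=2),
  # Classifier head
  layers.Flatten(),
  layers.Dense(512, activation='relu'),
  # One output per class; note that softmax is the usual choice for single-label classification
  layers.Dense(num_classes, activation='sigmoid')
])

# Integer labels pair with the sparse variant of categorical cross-entropy
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])

# Train for up to 100 epochs; the callback stops earlier once its thresholds are met
model.fit(X_train, y_train, epochs=100, callbacks=[callbacks])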

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Loss is lower than 0.4 so cancelling training!


<keras.callbacks.History at 0x7fa8393af410>
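To get a feel for the size of the network defined above, its trainable parameters can be counted by hand. The sketch assumes 160x160x3 inputs (the shape of `X_train[0]`), `padding='same'` convolutions, and 2x2 pooling with stride 2, as in the model cell. The helper names are ours:

```python
def conv2d_params(in_channels, filters, kernel=3):
    """Weights plus biases of a Conv2D layer with a square kernel."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(in_units, out_units):
    """Weights plus biases of a Dense layer."""
    return in_units * out_units + out_units

side, channels = 160, 3
conv1 = conv2d_params(channels, 32)   # 'same' padding keeps 160x160
side //= 2                            # first 2x2 max pool -> 80x80
conv2 = conv2d_params(32, 32)
side //= 2                            # second 2x2 max pool -> 40x40
flat = side * side * 32               # length of the Flatten output
dense1 = dense_params(flat, 512)
dense2 = dense_params(512, 15)
total = conv1 + conv2 + dense1 + dense2
print(flat, dense1, total)
```

Under these assumptions almost all of the roughly 26 million parameters sit in the first `Dense` layer, so the model has far more parameters than the 12,600 training images. `model.summary()` can be used to confirm the counts on the actual model.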

In [39]:
model.evaluate(X_test,y_test)



[21.869091033935547, 0.0640740767121315]
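The second number returned by `model.evaluate` is the accuracy, about 0.064 here. For comparison, a classifier that guesses uniformly among 15 classes is right about 1/15 ≈ 0.067 of the time. The sketch below computes that chance level and a majority-class baseline for any label array. `y_demo_labels` is synthetic and the helper names are illustrative:

```python
import numpy as np

def chance_accuracy(num_classes):
    """Expected accuracy of uniform random guessing."""
    return 1.0 / num_classes

def majority_baseline(y_true):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = np.bincount(np.asarray(y_true))
    return counts.max() / counts.sum()

y_demo_labels = np.array([0, 0, 0, 1, 2, 2])  # synthetic labels, not the real test set
print(round(chance_accuracy(15), 4))
print(majority_baseline(y_demo_labels))
```

Running `majority_baseline(y_test)` on the real labels would show how balanced the test split is, which matters when interpreting an accuracy this close to chance.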

In [40]:
predictions = model.predict(X_test)
predictions



array([[1.25805836e-05, 1.18464285e-07, 9.99999821e-01, ...,
        1.61897391e-04, 3.17071946e-09, 9.99377429e-01],
       [2.14241984e-23, 2.49052420e-14, 1.15959365e-05, ...,
        2.44274739e-23, 1.00000000e+00, 6.46187804e-17],
       [9.99999344e-01, 1.03614957e-14, 4.44925184e-13, ...,
        1.63464529e-35, 2.53704154e-32, 0.00000000e+00],
       ...,
       [3.89393717e-01, 1.73525512e-02, 1.30133168e-03, ...,
        8.87449714e-04, 4.02354747e-01, 6.25553960e-03],
       [9.94157314e-01, 9.99826252e-01, 9.54675302e-03, ...,
        6.56399934e-10, 9.99969661e-01, 7.39748776e-02],
       [4.48311172e-21, 1.47763203e-04, 4.39406902e-01, ...,
        1.45447309e-04, 1.00000000e+00, 7.88521464e-33]], dtype=float32)
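Several rows above contain more than one value close to 1, so they cannot be probability distributions over the classes. That pattern is what independent per-class sigmoid outputs look like, whereas softmax outputs always sum to 1. A small NumPy sketch of the difference, using made-up scores rather than the model's:

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Numerically stable softmax over the last axis."""
    shifted = z - np.max(z, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

logits = np.array([8.0, -3.0, 9.0, 0.5])  # made-up scores for four classes
sig = sigmoid(logits)
soft = softmax(logits)
print(sig.round(3), sig.sum().round(3))    # two entries near 1, sum well above 1
print(soft.round(3), soft.sum().round(3))  # entries sum to 1
print(np.argmax(sig) == np.argmax(soft))   # both pick the same class
```

Because both functions are monotonic in each score, the predicted class is the same either way. Only the interpretation of the values as probabilities differs.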

In [41]:
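# predictions[0] already holds output activations, not logits; softmax here only rescales them and leaves the argmax unchanged
score = tf.nn.softmax(predictions[0])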

In [42]:
np.argmax(score)

2

In [43]:
y_test[0]

0
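Comparing one prediction with one label says little about the model as a whole. The following sketch shows how predicted classes and per-class accuracy could be computed for a full test set, demonstrated on a tiny synthetic score matrix. `per_class_accuracy` is a hypothetical helper:

```python
import numpy as np

def per_class_accuracy(scores, y_true, num_classes):
    """Return overall accuracy and an array of per-class accuracies (NaN if a class is absent)."""
    y_pred = np.argmax(scores, axis=1)
    y_true = np.asarray(y_true)
    overall = float(np.mean(y_pred == y_true))
    per_class = np.full(num_classes, np.nan)
    for c in range(num_classes):
        mask = y_true == c
        if mask.any():
            per_class[c] = np.mean(y_pred[mask] == c)
    return overall, per_class

# Synthetic scores for four samples and three classes.
demo_scores = np.array([[0.9, 0.1, 0.0],
                        [0.2, 0.7, 0.1],
                        [0.6, 0.3, 0.1],
                        [0.1, 0.2, 0.7]])
demo_labels = [0, 1, 1, 2]
overall, per_class = per_class_accuracy(demo_scores, demo_labels, 3)
print(overall, per_class)
```

With the notebook's variables the call would be `per_class_accuracy(predictions, y_test, 15)`. A breakdown by class helps distinguish a model that fails uniformly from one that collapses onto a few classes.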