<a href="https://colab.research.google.com/github/schmuecker/transfer-learning/blob/main/computer_vision/classification_from_scratch/cnn_human_action.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<h2 style='color:blue' align='center'>Human Action Recognition using a CNN (from scratch)</h2>

**In this notebook we will build a CNN to classify human action images.**

In [1]:
%pip install datasets


In [2]:
import matplotlib.pyplot as plt
import numpy as np
import cv2
import os
import PIL
import tensorflow as tf

from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
from datasets import load_dataset
from PIL import Image

The Human Action Recognition dataset is downloaded from the Hugging Face Hub further below. First, check that a GPU is available, since training the CNN on a CPU would be slow. The cell raises an error if no GPU is found.

In [3]:
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))

Found GPU at: /device:GPU:0


<h3 style='color:purple'>Load human action dataset</h3>

In [4]:
dataset = load_dataset("Bingsu/Human_Action_Recognition")
dataset


DatasetDict({
    train: Dataset({
        features: ['image', 'labels'],
        num_rows: 12600
    })
    test: Dataset({
        features: ['image', 'labels'],
        num_rows: 5400
    })
})

<h3 style='color:purple'>Preprocess human action data</h3>

<h3 style='color:purple'>Training data</h3>

In [5]:
X_train, y_train = [], []

for a in dataset['train']:
  image, labels = a['image'], a['labels']
  img = np.asarray(image.resize((160,160)), dtype=np.float32)
  X_train.append(img)
  y_train.append(labels)

In [6]:
X_train = np.array(X_train)
y_train = np.array(y_train)

X_train.shape, y_train.shape

((12600, 160, 160, 3), (12600,))

In [7]:
X_train[0].shape

(160, 160, 3)

In [8]:
X_train[0][0][0]

array([234., 161.,  89.], dtype=float32)
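The pixel printed above shows that the arrays hold raw intensities in the 0 to 255 range, stored as float32. Convolutional networks often train more stably on inputs scaled to [0, 1]. The notebook does not do this, so the following is only a sketch of one common option; `scale_to_unit` is a hypothetical helper name, and `pixel` reuses the values printed above.

```python
import numpy as np

# The first pixel of X_train[0], as printed above (raw 0-255 intensities).
pixel = np.array([234., 161., 89.], dtype=np.float32)

def scale_to_unit(images):
    """Scale raw 0-255 intensities to [0, 1] (assumed preprocessing, not in the notebook)."""
    return np.asarray(images, dtype=np.float32) / 255.0

scaled = scale_to_unit(pixel)
print(scaled)
```

The same function could be applied to the whole array, e.g. `scale_to_unit(X_train)`, as long as the test data is scaled the same way.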

<h3 style='color:purple'>Test data</h3>

In [9]:
X_test, y_test = [], []

for a in dataset['test']:
  image, labels = a['image'], a['labels']
  img = np.asarray(image.resize((160,160)), dtype=np.float32)
  X_test.append(img)
  y_test.append(labels)

X_test = np.array(X_test)
y_test = np.array(y_test)
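The training and test loops above are identical apart from the split they read. A minimal sketch of a shared helper is shown below. The name `split_to_arrays` is hypothetical, the `convert('RGB')` call is an added safeguard against non-RGB images that the notebook does not use, and `fake_split` is synthetic data that only exercises the function.

```python
import numpy as np
from PIL import Image

def split_to_arrays(split, size=(160, 160)):
    """Convert records of the form {'image': PIL image, 'labels': int} to NumPy arrays."""
    images, labels = [], []
    for record in split:
        # convert('RGB') guarantees three channels (assumption: some images may not be RGB).
        img = record['image'].convert('RGB').resize(size)
        images.append(np.asarray(img, dtype=np.float32))
        labels.append(record['labels'])
    return np.stack(images), np.array(labels)

# Tiny synthetic stand-in for dataset['train'].
fake_split = [
    {'image': Image.new('RGB', (200, 100), color=(234, 161, 89)), 'labels': 0},
    {'image': Image.new('L', (50, 80), color=128), 'labels': 3},
]
X_demo, y_demo = split_to_arrays(fake_split)
print(X_demo.shape, y_demo.shape)
```

With the real data this would be called as `X_train, y_train = split_to_arrays(dataset['train'])` and likewise for `dataset['test']`.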

<h3 style='color:purple'>Build convolutional neural network and train it</h3>

In [10]:
class myCallback(tf.keras.callbacks.Callback):
  def on_epoch_end(self, epoch, logs=None):
    '''
    Halts the training once loss drops below 0.3 and accuracy exceeds 0.8

    Args:
      epoch (integer) - index of epoch (required but unused in the function definition below)
      logs (dict) - metric results from the training epoch
    '''
    logs = logs or {}
    loss, accuracy = logs.get('loss'), logs.get('accuracy')

    # Check both thresholds; skip the check if either metric is missing
    if loss is not None and accuracy is not None and loss < 0.3 and accuracy > 0.8:

      # Stop if both thresholds are met
      print("\nLoss is lower than 0.3 and accuracy higher than 0.8. Cancelling training!")
      self.model.stop_training = True

# Instantiate class
callbacks = myCallback()

Model architecture: https://www.kaggle.com/code/debanjan2002/human-action-recognition-classification

In [11]:
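num_classes = 15

model = Sequential([
  # Two convolution + max-pooling stages extract image features
  layers.Conv2D(32, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(pool_size=2, strides=2),
  layers.Conv2D(32, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(pool_size=2, strides=2),
  # Fully connected classifier head
  layers.Flatten(),
  layers.Dense(512, activation='relu'),
  # Note: sigmoid gives independent per-class scores that do not sum to 1;
  # softmax is the usual choice for single-label classification
  layers.Dense(num_classes, activation='sigmoid')
])

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])

# Training stops early once the callback thresholds are met
model.fit(X_train, y_train, epochs=100, callbacks=[callbacks])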

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Loss is lower than 0.3 and accuracy higher than 0.8. Cancelling training!


<keras.callbacks.History at 0x7f7f24ef9d90>
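The run above is stopped on training loss and accuracy alone, so it cannot tell whether the model is overfitting. Below is a minimal sketch of a variant under assumptions that are not part of the original notebook: pixel values are rescaled to [0, 1] with a `Rescaling` layer, the last layer outputs raw logits paired with `from_logits=True`, and part of the training data is held out for validation. `build_model` is a hypothetical helper name, and the code assumes a reasonably recent TensorFlow release in which `layers.Rescaling` is available.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

def build_model(input_shape=(160, 160, 3), num_classes=15):
    """Same convolutional stack as above, with rescaled inputs and a logits output."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        layers.Rescaling(1.0 / 255),           # map 0-255 intensities to [0, 1]
        layers.Conv2D(32, 3, padding='same', activation='relu'),
        layers.MaxPooling2D(pool_size=2, strides=2),
        layers.Conv2D(32, 3, padding='same', activation='relu'),
        layers.MaxPooling2D(pool_size=2, strides=2),
        layers.Flatten(),
        layers.Dense(512, activation='relu'),
        layers.Dense(num_classes),             # raw logits, no activation
    ])
    model.compile(
        optimizer='adam',
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=['accuracy'],
    )
    return model

# Possible usage (not run here): stop on validation loss instead of training metrics.
# early_stop = tf.keras.callbacks.EarlyStopping(
#     monitor='val_loss', patience=3, restore_best_weights=True)
# model_v2 = build_model()
# model_v2.fit(X_train, y_train, epochs=100,
#              validation_split=0.2, callbacks=[early_stop])
```

Whether this variant actually improves test accuracy on this dataset is untested here; it only makes overfitting visible during training.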

In [12]:
model.evaluate(X_test,y_test)



[11.72130298614502, 0.11388888955116272]
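`model.evaluate` returns `[loss, accuracy]`, so the test accuracy is about 0.114, although training accuracy had exceeded 0.8 when the callback stopped the run. To put that number in context, it helps to compare it with chance level for 15 classes. The sketch below does this arithmetic and also shows a majority-class baseline; `majority_baseline` is a hypothetical helper, and `toy_labels` is made-up data rather than the dataset's labels.

```python
import numpy as np

num_classes = 15
# Values reported by model.evaluate above.
test_loss, test_accuracy = 11.72130298614502, 0.11388888955116272

# Accuracy of uniform random guessing over 15 classes.
chance_accuracy = 1.0 / num_classes

def majority_baseline(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = np.bincount(np.asarray(labels))
    return counts.max() / counts.sum()

print(f"test accuracy:   {test_accuracy:.3f}")
print(f"chance accuracy: {chance_accuracy:.3f}")

toy_labels = [0, 0, 1, 2]
print(majority_baseline(toy_labels))
```

A test accuracy this close to chance, next to a training accuracy above 0.8, suggests that the model has largely memorized the training set rather than learned features that generalize. Calling `majority_baseline(y_test)` would give the corresponding baseline for the real labels.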

In [13]:
predictions = model.predict(X_test)
predictions



array([[3.0362858e-03, 2.5385980e-02, 7.2486049e-01, ..., 6.3100600e-01,
        3.8918741e-02, 9.6313983e-01],
       [6.2990264e-04, 9.9502748e-01, 9.9999702e-01, ..., 2.3399815e-16,
        1.0385875e-06, 1.5593482e-10],
       [9.3933153e-01, 9.9999595e-01, 9.9955553e-01, ..., 6.9226772e-01,
        5.6081069e-07, 1.9272343e-07],
       ...,
       [9.7590828e-01, 7.7587605e-02, 8.5543013e-01, ..., 1.9243063e-03,
        9.7885072e-01, 3.0389350e-02],
       [9.9965358e-01, 9.9626690e-01, 1.8007828e-06, ..., 3.4540051e-04,
        9.8924202e-01, 6.9602072e-01],
       [1.3043885e-10, 9.9999917e-01, 9.9999976e-01, ..., 4.1564316e-13,
        6.2772854e-10, 3.4951555e-04]], dtype=float32)

In [14]:
score = tf.nn.softmax(predictions[0])

In [15]:
np.argmax(score)

4
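Because the last layer uses a sigmoid activation, each row of `predictions` holds independent per-class scores that need not sum to 1. Applying softmax afterwards rescales them into a distribution but does not change which class ranks highest. A NumPy sketch with made-up scores (not taken from the model output):

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    x = np.asarray(x, dtype=np.float64)
    exp = np.exp(x - x.max())   # subtract the max to avoid overflow
    return exp / exp.sum()

# Illustrative sigmoid-style scores, each in [0, 1] but not summing to 1.
sigmoid_scores = np.array([0.003, 0.025, 0.725, 0.990, 0.631])
probs = softmax(sigmoid_scores)

print(sigmoid_scores.sum())   # not 1
print(probs.sum())            # 1 up to rounding
print(np.argmax(sigmoid_scores) == np.argmax(probs))
```

So `np.argmax(predictions[0])` would give the same class index as `np.argmax(score)`.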

In [16]:
y_test[0]

0
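The first test image is predicted as class 4 while its true label is 0. A single sample says little, so it is more informative to compare predictions and labels over the whole test set. The sketch below shows one way to do that with NumPy alone; `predicted_labels` and `confusion_matrix` are hypothetical helpers, and the toy arrays stand in for `predictions` and `y_test`.

```python
import numpy as np

def predicted_labels(scores):
    """Pick the highest-scoring class for every row of a (samples, classes) array."""
    return np.argmax(scores, axis=1)

def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)   # accumulate counts, including repeats
    return cm

# Toy stand-ins for `predictions` and `y_test`.
toy_scores = np.array([[0.1, 0.9, 0.0],
                       [0.8, 0.1, 0.1],
                       [0.2, 0.3, 0.5],
                       [0.6, 0.3, 0.1]])
toy_true = np.array([1, 0, 1, 0])

toy_pred = predicted_labels(toy_scores)
accuracy = np.mean(toy_pred == toy_true)
cm = confusion_matrix(toy_true, toy_pred, num_classes=3)
print(toy_pred, accuracy)
print(cm)
```

On the real data, `confusion_matrix(y_test, predicted_labels(predictions), 15)` would show which actions the model confuses most often.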