## Fine tuning VGG16, VGG19 and ResNetRS101 for Human Activity Classification

### Group: 11 - Darshan Avaiya | Yukta Patel | Vaibhav Sheth | Taranjot Singh

### Professor - Ran Feldesh

Date - August 6, 2023

## Abstract

The aim of this project is to fine-tune three CNN models for human activity classification. The project uses the popular pre-trained models VGG16, VGG19, and ResNetRS101 to achieve high prediction accuracy.

The project involves preprocessing input images to match the input size required by the models. The chosen model is loaded and fine-tuned for the specific image classification task. The models' architectures are adjusted by freezing certain layers to prevent overfitting. Additionally, techniques such as dropout and regularization are employed to enhance the models' generalization capabilities.

The Human Activity Recognition (HAR) dataset used for training and evaluation consists of labeled images belonging to different classes. The models are trained on one subset of the dataset and validated on another. The dataset contains 15 classes representing various human activities.

The project's report highlights the technical aspects of implementing deep learning models, including model selection, architecture, preprocessing, and training. It emphasizes the results achieved through experimentation and provides insights into the challenges and solutions encountered during the project. By focusing on accurate image classification using deep learning, this project contributes to the broader field of computer vision and pattern recognition.

## Introduction

In the field of computer vision and artificial intelligence, image classification is a fundamental task with a wide range of applications. Accurate and automated recognition of objects and activities within images is important for numerous domains, including healthcare, surveillance, and entertainment. This project addresses the task with deep learning techniques, leveraging well-established pre-trained models such as VGG16, VGG19, and ResNetRS101.

The primary objective of this project is to develop robust and accurate models capable of classifying human activities into various predefined categories. The project's focus lies in understanding the training process of these deep learning models, their adaptation for the given dataset, and the effective implementation of image classification methodologies.

The core problem addressed in this project is Human Activity Recognition (HAR) using image classification techniques. HAR involves the identification and categorization of different human activities based on images. The goal is to develop a deep learning model capable of accurately classifying images into predefined categories that correspond to various human activities.

Human Activity Recognition (HAR) through image classification is important because of its wide-ranging applications and potential benefits in domains such as healthcare, fitness and sports, surveillance and security, industrial safety, and many more.

## Related work

In the domain of Human Activity Recognition (HAR) and image classification, several prior works have laid the foundation for this project by exploring similar problems, methodologies, and challenges. Some examples of related work are:

1. "Deep Convolutional Neural Networks for Human Activity Recognition Using Mobile Sensors" (2015): This paper introduced the concept of using deep convolutional neural networks (CNNs) for HAR using sensor data from mobile devices. While not focused on images, it paved the way for applying deep learning to HAR.

2. "Human Activity Recognition: A Review" (2019): This comprehensive review article summarizes the evolution of HAR techniques, including both traditional machine learning approaches and deep learning methods. It discusses datasets, features, and challenges in HAR.

3. "Deep Learning-Based Human Activity Recognition: A Survey" (2021): Another survey paper provides an up-to-date overview of deep learning techniques applied to HAR. It discusses various architectures, datasets, and challenges associated with training deep models for HAR.

4. Competitions and Challenges: Various data science competitions, such as Kaggle challenges, have provided platforms for researchers to showcase their HAR models. Studying winning solutions and approaches from such challenges can provide insights into effective strategies. The dataset used in this project also comes from a challenge hosted by AI Planet.

## Use Cases of Human Activity Classification/Recognition

1. **Healthcare and Wellness Monitoring:** HAR can be utilized to monitor human activities for healthcare purposes. It can aid in tracking patient movements, identifying anomalies in daily activities, and ensuring the well-being of individuals, especially the elderly or those with medical conditions.
<br>

2. **Fitness and Sports Analysis:** HAR can be employed in fitness tracking and sports analysis. It enables the assessment of exercise routines, tracking of performance metrics, and providing personalized feedback to individuals aiming for physical fitness and athletic excellence.
<br>

3. **Surveillance and Security:** HAR is crucial for security and surveillance applications. It can assist in identifying suspicious or abnormal activities in public spaces, ensuring public safety, and enhancing security measures.
<br>

4. **Industrial Safety:** In industrial settings, HAR can be used to monitor worker activities and ensure compliance with safety protocols. It helps prevent accidents and provides insights into optimizing workflow efficiency.
<br>

5. **Behavioral Analysis:** HAR aids in understanding human behavior and patterns. This is valuable in fields like psychology, sociology, and market research, enabling researchers to gather insights into daily routines and habits.
<br>

6. **Automated Processes:** In automation and robotics, HAR contributes to creating smarter systems that can react appropriately to human actions. This is essential in scenarios like industrial automation, robot-human collaboration, and autonomous vehicles.
<br>

7. **Smart Environments:** HAR contributes to the development of smart environments that respond intelligently to human presence and actions. This includes energy-efficient lighting, climate control, and home automation.

## Introduction to Data

The dataset used in this project comes from the Data Sprint 76 - Human Activity Recognition challenge hosted by AI Planet. It comprises a diverse collection of images capturing human activities, each associated with a label that denotes the corresponding activity. The dataset is structured for supervised learning: the images are the input features, and the activity labels serve as the ground truth for model training and assessment.

The dataset includes train and test folders of images and a CSV file that lists the training image names and their associated labels.

During preprocessing, the images were resized to 120x120 pixels to fit the available computational capacity. The images were then converted to NumPy arrays, and the labels were one-hot encoded for model training.
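
As a rough illustration of the label-encoding step, the sketch below one-hot encodes string labels with NumPy. The label names and the `one_hot_encode` helper are hypothetical and are not taken from the project code; image loading and resizing are omitted.

```python
import numpy as np

def one_hot_encode(labels):
    """Map string labels to one-hot rows; returns (matrix, sorted class names)."""
    classes = sorted(set(labels))                      # fixed, reproducible class order
    index = {name: i for i, name in enumerate(classes)}
    ids = np.array([index[name] for name in labels])
    # Row i of the identity matrix is the one-hot vector for class i
    return np.eye(len(classes), dtype=np.float32)[ids], classes

# Hypothetical labels; the real dataset has 15 activity classes
labels = ["running", "sitting", "running", "cycling"]
y, classes = one_hot_encode(labels)
```

Each row of `y` contains a single 1 in the column of its class, which is the target format expected by a categorical cross-entropy loss.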

The training data was split into two parts: 70% for training and 30% for validation. A stratified train-validation split was used to preserve the class balance across the human activity categories.
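
To make the idea of a stratified split concrete, here is a minimal standard-library sketch that sends roughly 30% of each class to the validation set. The `stratified_split` helper, the seed, and the toy labels are assumptions for illustration; the project may well have used a library routine such as scikit-learn's `train_test_split` with its `stratify` argument.

```python
import random
from collections import defaultdict

def stratified_split(labels, val_fraction=0.3, seed=0):
    """Return (train_idx, val_idx) with about val_fraction of each class in validation."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    rng = random.Random(seed)
    train_idx, val_idx = [], []
    for label in sorted(by_class):
        idx = by_class[label]
        rng.shuffle(idx)                          # shuffle within the class
        n_val = round(len(idx) * val_fraction)    # per-class validation count
        val_idx.extend(idx[:n_val])
        train_idx.extend(idx[n_val:])
    return sorted(train_idx), sorted(val_idx)

# Toy, imbalanced labels used only for illustration
labels = ["running"] * 10 + ["sitting"] * 20
train_idx, val_idx = stratified_split(labels)
```

Because the split is made class by class, the 1:2 ratio between the two toy classes is the same in both subsets.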

**Data Augmentation:** To increase diversity and control overfitting, the following augmentations were applied to the training set:

1. **Rotation:** The images are rotated by a certain angle (up to the specified rotation range). This helps the model become invariant to different orientations of the activities.
<br>

2. **Shift:** Images are shifted horizontally and vertically within a specified range. This simulates variations in camera angles and positioning.
<br>
3. **Shear:** Shearing involves shifting one part of an image in a specific direction while keeping the other part fixed. This can simulate different viewpoints.
<br>
4. **Zoom:** Images can be zoomed in or out, mimicking scenarios where the camera captures activities from different distances.
<br>
5. **Horizontal Flip:** Images are flipped horizontally. This is particularly useful when the orientation of the activity doesn't impact its classification.
<br>
6. **Fill Mode:** This specifies how to fill in the gaps created by transformations like rotation or shifting. The 'nearest' mode fills gaps with the nearest pixel values.
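
The transformations above can be illustrated with a small NumPy sketch of two of them: a horizontal flip and a horizontal shift whose gap is filled with the nearest pixel values. The option names in the list resemble those of Keras's `ImageDataGenerator`, but the project's actual settings are not given here, so the helper functions and the toy image below are purely hypothetical.

```python
import numpy as np

def horizontal_flip(image):
    """Mirror an (H, W, C) image left to right."""
    return image[:, ::-1, :]

def shift_right_nearest(image, pixels):
    """Shift an (H, W, C) image right, filling the gap with the nearest edge column."""
    if pixels <= 0:
        return image.copy()
    pixels = min(pixels, image.shape[1])               # cannot shift further than the width
    edge = np.repeat(image[:, :1, :], pixels, axis=1)  # replicate the left-most column
    return np.concatenate([edge, image[:, :image.shape[1] - pixels, :]], axis=1)

# Toy 2x3 single-channel image, used only for illustration
image = np.arange(6).reshape(2, 3, 1)
flipped = horizontal_flip(image)
shifted = shift_right_nearest(image, 1)
```

Both outputs keep the original shape, so augmented images can be fed to the model exactly like the originals.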

## Methods

Three models were trained using the TensorFlow framework in Python:

1. VGG16
2. VGG19
3. ResNetRS101

The pre-trained weights were frozen, and a few new layers were added on top of each of the above architectures.

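The layers added on top of each model were as follows:

    from tensorflow.keras.layers import Dense, Dropout, Flatten

    model.add(Flatten())                        # flatten the base model's feature maps
    model.add(Dense(256, activation="relu"))    # fully connected layer
    model.add(Dropout(0.5))                     # drop 50% of units during training
    model.add(Dense(15, activation="softmax"))  # one output per activity class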

The models were trained for 20 epochs with the "adam" optimizer, "categorical_crossentropy" as the loss function, and "accuracy" as the evaluation metric.
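
Putting the pieces of this section together, the following is a minimal sketch of how one of the three models (VGG16) might be assembled and compiled in Keras. It is not the project's actual code: the `build_classifier` helper, the choice to freeze the entire convolutional base, and the 120x120x3 input shape are assumptions based on the description above.

```python
from tensorflow.keras import Input, Sequential
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Dropout, Flatten

def build_classifier(weights="imagenet", input_shape=(120, 120, 3), num_classes=15):
    # Pre-trained convolutional base without its original classifier
    base = VGG16(weights=weights, include_top=False, input_shape=input_shape)
    base.trainable = False  # assumption: the whole base is frozen

    model = Sequential([
        Input(shape=input_shape),
        base,
        Flatten(),
        Dense(256, activation="relu"),
        Dropout(0.5),
        Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# weights=None skips the ImageNet download; use "imagenet" for actual fine-tuning
model = build_classifier(weights=None)
```

Training would then be a call such as `model.fit(train_data, validation_data=val_data, epochs=20)`, where `train_data` and `val_data` are placeholders for the prepared training and validation sets.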