## Fine tuning VGG16, VGG19 and ResNetRS101 for Human Activity Classification

### Group: 11 - Darshan Avaiya | Yukta Patel | Vaibhav Sheth | Taranjot Singh

### Professor - Ran Feldesh

Date - August 6, 2023

## Abstract

The aim of this project is to fine-tune three CNN models for human activity classification. The project uses popular pre-trained models, VGG16, VGG19, and ResNetRS101, to achieve high prediction accuracy.

The project involves preprocessing input images to match the input size required by the models. Each chosen model is loaded and fine-tuned for the specific image classification task. The pre-trained layers are frozen and only a new classification head is trained, which limits the number of trainable parameters and helps prevent overfitting. Additionally, techniques such as dropout, data augmentation, and L2 regularization are used to improve the models' generalization.

The Human Activity Recognition (HAR) dataset used for training and evaluation consists of labeled images belonging to 15 classes, each representing a different human activity. The models are trained on 70% of the labeled images and validated on the remaining 30%.

The project's report highlights the technical aspects of implementing deep learning models, including model selection, architecture, preprocessing, and training. It emphasizes the results achieved through experimentation and provides insights into the challenges and solutions encountered during the project. By focusing on accurate image classification using deep learning, this project contributes to the broader field of computer vision and pattern recognition.

## Introduction

In the fields of computer vision and artificial intelligence, image classification is a fundamental task with a wide range of applications. Accurate and automated recognition of objects and activities within images is important in many domains, including healthcare, surveillance, and entertainment. This project addresses the task using deep learning, leveraging well-established pre-trained models such as VGG16, VGG19, and ResNetRS101.

The primary objective of this project is to develop robust and accurate models capable of classifying human activities into various predefined categories. The project's focus lies in understanding the training process of these deep learning models, their adaptation for the given dataset, and the effective implementation of image classification methodologies.

The core problem addressed in this project is Human Activity Recognition (HAR) using image classification techniques. HAR involves the identification and categorization of different human activities based on images. The goal is to develop a deep learning model capable of accurately classifying images into predefined categories that correspond to various human activities.

Human Activity Recognition (HAR) through image classification is important because of its wide-ranging applications and potential benefits in domains such as healthcare, fitness and sports, surveillance and security, and industrial safety.

## Related work

In the domain of Human Activity Recognition (HAR) and image classification, several prior works have laid the foundation for this project by exploring similar problems, methodologies, and challenges. Here are some examples of related work in this domain:

1. "Deep Convolutional Neural Networks for Human Activity Recognition Using Mobile Sensors" (2015): This paper introduced the concept of using deep convolutional neural networks (CNNs) for HAR using sensor data from mobile devices. While not focused on images, it paved the way for applying deep learning to HAR.

2. "Human Activity Recognition: A Review" (2019): This comprehensive review article summarizes the evolution of HAR techniques, including both traditional machine learning approaches and deep learning methods. It discusses datasets, features, and challenges in HAR.

3. "Deep Learning-Based Human Activity Recognition: A Survey" (2021): Another survey paper provides an up-to-date overview of deep learning techniques applied to HAR. It discusses various architectures, datasets, and challenges associated with training deep models for HAR.

4. Competitions and Challenges: Various data science competitions, such as Kaggle challenges, have provided platforms for researchers to showcase their HAR models. Studying winning solutions and approaches from such challenges can provide insights into effective strategies. The dataset used in this project also comes from a challenge hosted by AI Planet.

## Use Cases of Human Activity Classification/Recognition

1. **Healthcare and Wellness Monitoring:** HAR can be utilized to monitor human activities for healthcare purposes. It can aid in tracking patient movements, identifying anomalies in daily activities, and ensuring the well-being of individuals, especially the elderly or those with medical conditions.

2. **Fitness and Sports Analysis:** HAR can be employed in fitness tracking and sports analysis. It enables the assessment of exercise routines, tracking of performance metrics, and providing personalized feedback to individuals aiming for physical fitness and athletic excellence.

3. **Surveillance and Security:** HAR is crucial for security and surveillance applications. It can assist in identifying suspicious or abnormal activities in public spaces, ensuring public safety, and enhancing security measures.

4. **Industrial Safety:** In industrial settings, HAR can be used to monitor worker activities and ensure compliance with safety protocols. It helps prevent accidents and provides insights into optimizing workflow efficiency.

5. **Behavioral Analysis:** HAR aids in understanding human behavior and patterns. This is valuable in fields like psychology, sociology, and market research, enabling researchers to gather insights into daily routines and habits.

6. **Automated Processes:** In automation and robotics, HAR contributes to creating smarter systems that can react appropriately to human actions. This is essential in scenarios like industrial automation, robot-human collaboration, and autonomous vehicles.

7. **Smart Environments:** HAR contributes to the development of smart environments that respond intelligently to human presence and actions. This includes energy-efficient lighting, climate control, and home automation.

## Introduction to Data

The dataset used in this project is taken from the Data Sprint 76 - Human Activity Recognition challenge hosted by AI Planet. It comprises a diverse collection of images capturing human activities, each associated with a label that denotes the corresponding activity. The dataset is structured for supervised learning: the images are the input features, and the activity labels serve as the ground truth for model training and assessment.

The dataset includes train and test folders containing images, and a CSV file listing the training image file names and their associated labels.

During preprocessing, images were resized to 120x120 pixels to fit the available computational resources. The images were converted into NumPy arrays, and the labels were one-hot encoded for training.
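
A minimal sketch of these preprocessing steps, using Pillow and NumPy. The report does not show its loading code, so the helper names `load_images` and `one_hot` are hypothetical. No pixel scaling is applied because the report does not describe one.

```python
import numpy as np
from PIL import Image

IMG_SIZE = (120, 120)  # reduced resolution used in this project

def load_images(sources, size=IMG_SIZE):
    """Open each image (path or PIL image), force RGB, resize, and stack into one array."""
    arrays = []
    for src in sources:
        img = src if isinstance(src, Image.Image) else Image.open(src)
        img = img.convert("RGB").resize(size)  # 3 channels, 120x120
        arrays.append(np.asarray(img, dtype=np.float32))
    return np.stack(arrays)  # shape: (n_images, 120, 120, 3)

def one_hot(labels, class_names):
    """Map string labels to one-hot rows in the order given by class_names."""
    index = {name: i for i, name in enumerate(class_names)}
    ids = np.array([index[label] for label in labels])
    return np.eye(len(class_names), dtype=np.float32)[ids]
```

In practice, `sources` would come from the image-name column of the training CSV and `labels` from its label column. The report does not give those column names.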

The training data was split into two parts: 70% for training and 30% for validation. A stratified split was used to preserve the balance of the human activity classes in both parts.
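
The stratified 70/30 split can be sketched with scikit-learn's `train_test_split`. This is not the report's code: the function name `stratified_split` and the fixed `random_state` are assumptions. Stratifying on the integer class ids keeps each class's share the same in both subsets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def stratified_split(x, y_onehot, val_fraction=0.3, seed=42):
    """Return x_train, x_val, y_train, y_val with class proportions preserved."""
    class_ids = np.argmax(y_onehot, axis=1)  # stratify on integer class ids
    return train_test_split(
        x, y_onehot,
        test_size=val_fraction,  # 30% held out for validation
        stratify=class_ids,
        random_state=seed,       # assumption: fixed seed for reproducibility
    )
```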

**Data Augmentation:** To increase diversity and reduce overfitting, the following data augmentation techniques were applied to the training set:

1. **Rotation:** The images are rotated by a certain angle (up to the specified rotation range). This helps the model become invariant to different orientations of the activities.

2. **Shift:** Images are shifted horizontally and vertically within a specified range. This simulates variations in camera angles and positioning.

3. **Shear:** Shearing involves shifting one part of an image in a specific direction while keeping the other part fixed. This can simulate different viewpoints.

4. **Zoom:** Images can be zoomed in or out, mimicking scenarios where the camera captures activities from different distances.

5. **Horizontal Flip:** Images are flipped horizontally. This is particularly useful when the orientation of the activity doesn't impact its classification.

6. **Fill Mode:** This specifies how to fill in the gaps created by transformations like rotation or shifting. The 'nearest' mode fills gaps with the nearest pixel values.
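
The six transformations above correspond directly to the arguments of Keras's `ImageDataGenerator`, a legacy API in recent TensorFlow releases. The sketch below assumes that API. The numeric ranges are placeholders because the report does not state them, and `augmented_batches` is a hypothetical helper.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Assumed values: the report names these transforms but not their ranges.
train_datagen = ImageDataGenerator(
    rotation_range=20,        # 1. rotate by up to +/-20 degrees
    width_shift_range=0.1,    # 2. shift horizontally by up to 10% of the width
    height_shift_range=0.1,   #    and vertically by up to 10% of the height
    shear_range=0.1,          # 3. shear intensity (angle in degrees)
    zoom_range=0.1,           # 4. random zoom factor in [0.9, 1.1]
    horizontal_flip=True,     # 5. random left-right flips
    fill_mode="nearest",      # 6. fill exposed pixels with the nearest value
)

def augmented_batches(x_train, y_train, batch_size=32):
    """Return an iterator of freshly augmented (images, labels) training batches."""
    return train_datagen.flow(np.asarray(x_train, dtype=np.float32), y_train,
                              batch_size=batch_size, shuffle=True)
```

Only the training split goes through the generator. The validation arrays are passed to `model.fit(..., validation_data=(x_val, y_val))` unchanged, so the validation accuracy is measured on real images.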

## Methods

Three models were trained using the TensorFlow framework in Python:

1. VGG16
2. VGG19
3. ResNetRS101

The weights of the pre-trained base layers were frozen, and a few new layers were added on top of each of the above architectures.

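The classification head added on top of every base model was:

```python
from tensorflow.keras.layers import Dense, Dropout, Flatten

# `model` is a Sequential model whose first layer is the frozen pre-trained base
model.add(Flatten())
model.add(Dense(256, activation="relu"))
model.add(Dropout(0.5))  # randomly drops 50% of units during training
model.add(Dense(15, activation="softmax"))  # one probability per activity class
```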

The models were trained for 20 epochs with the "adam" optimizer, "categorical_crossentropy" as the loss function, and "accuracy" as the evaluation metric.
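
Putting these pieces together, here is a minimal sketch of how such a model could be assembled and compiled in Keras. The report does not state several details, so the sketch assumes them:

- The backbone is created with `include_top=False` and `pooling="avg"`. Global average pooling is consistent with the trainable-parameter counts quoted in the model descriptions below.
- The whole base is frozen.
- The optional L2 penalty goes on the `Dense(256)` kernel.

`build_model` and `count_trainable_params` are hypothetical helper names. The backbone class is looked up by name with `getattr` so the function does not fail at import time if a Keras version lacks one of them.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import Sequential, regularizers
from tensorflow.keras.layers import Dense, Dropout, Flatten

def build_model(backbone="VGG16", num_classes=15, input_shape=(120, 120, 3),
                l2_penalty=None, weights="imagenet"):
    """Frozen pre-trained base plus the dense head described above (sketch)."""
    # e.g. tf.keras.applications.VGG16 / VGG19 / ResNetRS101
    base_cls = getattr(tf.keras.applications, backbone)
    base = base_cls(
        include_top=False,        # drop the original 1,000-class ImageNet head
        weights=weights,          # "imagenet" for transfer learning
        input_shape=input_shape,
        pooling="avg",            # assumption: global average pooling
    )
    base.trainable = False        # freeze every pre-trained layer

    # Assumption: L2 penalty on the hidden Dense layer's kernel only
    reg = regularizers.L2(l2_penalty) if l2_penalty else None
    model = Sequential([
        tf.keras.Input(shape=input_shape),
        base,
        Flatten(),
        Dense(256, activation="relu", kernel_regularizer=reg),
        Dropout(0.5),
        Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

def count_trainable_params(model):
    """Number of weights the optimizer updates (the frozen base is excluded)."""
    return int(sum(np.prod(tuple(w.shape)) for w in model.trainable_weights))
```

For example, a configuration like Model 7 would correspond to `build_model("ResNetRS101", l2_penalty=0.005)` followed by `model.fit(..., epochs=20)`.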

### 1. VGG16

VGG16, as its name suggests, has 16 weight layers (13 convolutional and 3 fully connected). With a total of 138 million parameters, it is a large network even by today's standards; its main attraction is the simplicity of its architecture. Because the original 1,000-class ImageNet classifier is replaced by a new head for our 15 activity classes, only the convolutional base is reused, and it contains about 14.7 million parameters. The new architecture with a VGG16 backbone has about 135 thousand trainable parameters.
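
The drop from 138 million parameters to 14.7 million comes from VGG16's three fully connected layers. These layers assume a 224 × 224 input, which gives a flattened 7 × 7 × 512 = 25,088-value feature map, and 1,000 ImageNet classes. Using the layer sizes of the standard Keras VGG16 model (weights plus biases per layer):

$$
\begin{aligned}
\text{fc1} &: 25088 \times 4096 + 4096 = 102{,}764{,}544 \\
\text{fc2} &: 4096 \times 4096 + 4096 = 16{,}781{,}312 \\
\text{predictions} &: 4096 \times 1000 + 1000 = 4{,}097{,}000 \\
138{,}357{,}544 - (102{,}764{,}544 + 16{,}781{,}312 + 4{,}097{,}000) &= 14{,}714{,}688
\end{aligned}
$$

These are the parameters discarded when the classifier is removed, which leaves the roughly 14.7 million parameters of the convolutional base.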
 
The VGG architecture stacks small 3x3 convolutions followed by max pooling, a simple and uniform design that became a standard building block for convolutional neural networks.

![VGG16 Architecture](../images/VGG16.png "VGG16 Architecture")
Figure 1: Architecture of VGG16 model

<br>

### 2. VGG19

VGG19 is a variant of the VGG model with 19 weight layers (16 convolutional layers and 3 fully connected layers), along with 5 max-pooling layers and a final softmax layer. With its original classifier removed, the VGG19 base has about 20 million parameters, and the overall architecture has about 135 thousand trainable parameters.

![VGG19 Architecture](../images/VGG19.png "VGG19 Architecture")
Figure 2: Architecture of VGG19 model

<br>

### 3. ResNetRS101

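ResNetRS101 belongs to the ResNet-RS family, a revised version of ResNet with improved training and scaling strategies; like ResNet-101, it is 101 layers deep. With its classification head removed, the base model has about 61 million parameters. Our overall model architecture has about 528 thousand trainable parameters.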
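
The trainable-parameter counts quoted for the three backbones can be reproduced by counting only the new head. This assumes each frozen base ends in a global pooling layer that outputs 512 features (VGG16, VGG19) or 2,048 features (ResNetRS101). The report does not state the pooling choice, so treat these widths as assumptions.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def head_params(feature_dim, hidden=256, num_classes=15):
    """Trainable parameters of Flatten -> Dense(256) -> Dropout -> Dense(15)."""
    # Flatten and Dropout have no weights; only the two Dense layers count.
    return dense_params(feature_dim, hidden) + dense_params(hidden, num_classes)

# Assumed width of the pooled feature vector produced by each frozen backbone
FEATURE_DIMS = {"VGG16": 512, "VGG19": 512, "ResNetRS101": 2048}

if __name__ == "__main__":
    for name, dim in FEATURE_DIMS.items():
        print(f"{name}: {head_params(dim):,} trainable parameters")
```

This prints 135,183 for both VGG models and 528,399 for ResNetRS101, in line with the roughly 135 thousand and 528 thousand figures above. The larger head of ResNetRS101 comes entirely from its four-times-wider feature vector.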

![ResNet101 Architecture](../images/ResNet101.png "ResNet101 Architecture")
Figure 3: Architecture of ResNet101

<br>

## Experiments

To explore appropriate training methods for the models above, techniques such as dropout, image augmentation, and L2 regularization were introduced step by step.

### Model 1 : VGG16

The training and validation results of fine-tuning the VGG16 model are shown below:

![Model 1 results](../images/model_1.png "Model 1 results")
Figure 4: Training and evaluation of model 1

Training accuracy reaches about 67.5%, while validation accuracy stays at about 47.4%. This gap of roughly 20 percentage points shows that the model is overfitting the training dataset.

### Model 2 : VGG16 with L2 regularization and data augmentation

![Model 2 results](../images/model_2.png "Model 2 results")
Figure 5: Training and evaluation of model 2

We used L2 regularization with l2 = 0.01. The model's accuracy on both the training and validation data is now around 45% (44.5% and 46.9%). Although this accuracy is lower than desired, overfitting is significantly reduced.
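
For reference, an L2 regularizer adds a penalty to the training loss equal to the coefficient times the sum of squared weights. `tf.keras.regularizers.L2(0.01)` computes this same quantity for the layer it is attached to. The NumPy helpers below are a hypothetical illustration of that arithmetic. The report does not state which layers carried the penalty.

```python
import numpy as np

def l2_penalty(weights, l2=0.01):
    """L2 term: coefficient times the sum of squared weights."""
    return l2 * float(np.sum(np.square(weights)))

def regularized_loss(data_loss, weight_arrays, l2=0.01):
    """Data loss (e.g. cross-entropy) plus the L2 term of each regularized kernel."""
    return data_loss + sum(l2_penalty(w, l2) for w in weight_arrays)
```

With the same weights, switching from l2 = 0.01 to 0.001 shrinks the penalty tenfold. Models 5, 6 and 7 vary exactly this coefficient.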

### Model 3: VGG19

![Model 3 results](../images/model_3.png "Model 3 results")
Figure 6: Training and evaluation of model 3

Using the VGG19 model, we obtained about 69% accuracy on the training data and 48% on the validation data. Like VGG16, VGG19 overfits the training dataset, so we applied data augmentation and regularization to this model as well.

### Model 4 : VGG19 with regularization and data augmentation

![Model 4 results](../images/model_4.png "Model 4 results")
Figure 7: Training and evaluation of model 4

The accuracy chart shows that model 4's accuracy on the training and validation data is nearly the same (45.0% and 47.0%), which indicates that the overfitting problem has been overcome.

### Model 5 : ResNetRS101 with data augmentation and regularization

As seen with VGG16 and VGG19, data augmentation and L2 regularization with l2 = 0.01 significantly reduced overfitting. We therefore trained the ResNetRS101 model with the same regularization and data augmentation techniques. The results are:

![Model 5 results](../images/model_5.png "Model 5 results")
Figure 8: Training and evaluation of model 5

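The accuracy chart for this model shows lower accuracy on the training data (55.2%) than on the validation data (61.1%). Part of this gap is expected, because dropout and data augmentation are applied only during training. It also suggests that the model is underfitting the training set, so the regularization penalty may need to be reduced.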

### Model 6 : ResNetRS101 with data augmentation and regularization with l2=0.001

We reduced regularization to l2 = 0.001 and got these results:

![Model 6 results](../images/model_6.png "Model 6 results")
Figure 9: Training and evaluation of model 6

The training and validation accuracy of model 6 (83.0% and 64.9%) show that the model is overfitting again, so the regularization penalty needs to be increased.

### Model 7 : ResNetRS101 with data augmentation and regularization with l2=0.005

With increased regularization term, here are results of model 7:

![Model 7 results](../images/model_7.png "Model 7 results")
Figure 10: Training and evaluation of model 7

These charts show that, with the increased regularization term, model 7 overcomes the overfitting problem. At epoch 20, accuracy on both the training and validation data is still rising, so training for more epochs may yield higher accuracy for this model.
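
One way to act on this observation is to raise the epoch budget and let a callback stop training once validation accuracy stops improving. Below is a sketch with Keras's `EarlyStopping` callback. The patience of 5 epochs and the 50-epoch budget are assumptions, not values from the report.

```python
import tensorflow as tf

# Stop when validation accuracy has not improved for 5 consecutive epochs,
# then roll the model back to its best epoch.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_accuracy",
    mode="max",
    patience=5,
    restore_best_weights=True,
)

# Usage (hypothetical data objects):
# model.fit(train_batches, validation_data=(x_val, y_val),
#           epochs=50, callbacks=[early_stop])
```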

<br>

## Conclusion

Out of all three architectures, ResNetRS101 (model 7) achieved the highest validation accuracy, 65.7%, likely due to its deeper architecture and larger number of parameters. Robust and scalable human activity classification algorithms can lead to improved safety, better healthcare monitoring, enhanced user experiences, and increased efficiency in various applications. This project demonstrates the practical impact of deep learning and computer vision techniques in solving real-world problems.

| Model | Architecture | Data Augmentation | Regularization | Train Accuracy | Validation Accuracy |
| ----- | ------------ | ----------------- | -------------- | -------------- | ------------------- |
|Model 1 | VGG16        | No   | -  | 67.5% | 47.4% |
|Model 2 | VGG16        | Yes  | l2=0.01  | 44.5% | 46.9% |
|Model 3 | VGG19        | No   | -  | 69.2% | 48.2% |
|Model 4 | VGG19        | Yes  | l2 = 0.01  | 45.0% | 47.0% |
|Model 5 | ResNetRS101  | Yes  | l2 = 0.01  | 55.2% | 61.1% |
|Model 6 | ResNetRS101  | Yes  | l2 = 0.001  | 83.0% | 64.9% |
|Model 7 | ResNetRS101  | Yes  | l2 = 0.005  | 64.3% | 65.7% |
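
The train-minus-validation gap summarizes the overfitting analysis above. Large positive gaps (Models 1, 3 and 6) indicate overfitting. Gaps near or below zero appear once augmentation and stronger regularization are added. The snippet only recomputes these gaps from the accuracies reported in the table and picks the model with the best validation accuracy. The names `RESULTS` and `generalization_gap` are illustrative.

```python
# (train accuracy %, validation accuracy %) copied from the table above
RESULTS = {
    "Model 1": (67.5, 47.4),
    "Model 2": (44.5, 46.9),
    "Model 3": (69.2, 48.2),
    "Model 4": (45.0, 47.0),
    "Model 5": (55.2, 61.1),
    "Model 6": (83.0, 64.9),
    "Model 7": (64.3, 65.7),
}

def generalization_gap(train_acc, val_acc):
    """Train minus validation accuracy, in percentage points."""
    return round(train_acc - val_acc, 1)

best = max(RESULTS, key=lambda name: RESULTS[name][1])  # highest validation accuracy

if __name__ == "__main__":
    for name, (train, val) in RESULTS.items():
        print(f"{name}: gap {generalization_gap(train, val):+.1f} points")
    print("Best validation accuracy:", best)
```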

**Challenges Faced:**

**1. Data Variability:** The dataset contains images capturing various lighting conditions, camera angles, and human poses. This variability can impact the model's ability to generalize across different scenarios.

**2. Data Preprocessing:** Preprocessing the images to ensure consistency in terms of size, aspect ratio, and color channels is crucial. The choice of preprocessing techniques can influence the model's performance.

**3. Model Selection:** Selecting the appropriate pre-trained models (VGG16, VGG19, ResNetRS101) involves considerations such as model complexity, computational resources, and expected accuracy. Choosing the model that best suits the problem is vital.

**4. Overfitting:** Preventing overfitting is a significant concern. Fine-tuning the selected models and implementing techniques like dropout and regularization are essential to ensure the model generalizes well to unseen data.

**5. Validation and Metrics:** Selecting appropriate evaluation metrics to assess the model's performance is crucial. Balancing accuracy, precision, recall, and F1-score based on the project's goals is a challenge.
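
Accuracy is the only metric reported above. The sketch below shows how macro-averaged precision, recall and F1 could be computed from the softmax outputs with scikit-learn. It assumes one-hot ground-truth labels, such as the validation split. `evaluate_predictions` is a hypothetical helper. Macro averaging weights all 15 classes equally, so a poorly recognized minority activity lowers the score.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def evaluate_predictions(y_true_onehot, y_prob):
    """Accuracy plus macro precision/recall/F1 from one-hot labels and softmax outputs."""
    y_true = np.argmax(y_true_onehot, axis=1)
    y_pred = np.argmax(y_prob, axis=1)  # most probable class per image
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0
    )
    return {"accuracy": accuracy_score(y_true, y_pred),
            "precision": precision, "recall": recall, "f1": f1}
```

Typical usage would be `evaluate_predictions(y_val, model.predict(x_val))`.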

### Deployment

The VGG16 and VGG19 models are deployed with Streamlit and are available to use. However, the best-performing model (ResNetRS101) could not be deployed because of its large size.

Deployed model - https://human-activity-classification.streamlit.app/

## References

1. TensorFlow 2 API documentation, available at https://www.tensorflow.org/api_docs
2. Understanding VGG16: Concepts, Architecture, and Performance, available at https://datagen.tech/guides/computer-vision/vgg16/#
3. VGG19 documentation (MathWorks), available at https://www.mathworks.com/help/deeplearning/ref/vgg19.html
4. ResNet-101 (Kaggle), available at https://www.kaggle.com/datasets/pytorch/resnet101
5. Dataset: Data Sprint 76 - Human Activity Recognition (AI Planet), available at https://aiplanet.com/challenges/data-sprint-76-human-activity-recognition/233/overview/about
6. Streamlit, available at https://streamlit.io/
7. GitHub repository of the project, available at https://github.com/idarshan07/human-activity-classification