# DAAN570 - Final Project - RSI Prevention by Yoga - Modelling notebook
<br>Team 11: Suradech Kongkiatpaiboon and Burq Latif
Course: DAAN 570 – Deep Learning (Fall, 2021) - Penn State World Campus

> Problem statement: Repetitive stress injury (RSI) is extremely prevalent among people who work in the same position for long periods, especially since the onset of COVID-19 and the resulting increase in working from home. We identified several shortcomings in the traditional RSI prevention software on the market, and we noted that yoga is growing in popularity because of the physical, mental, and spiritual benefits it can provide. Many people follow this trend and practice yoga without the help of a professional. However, doing yoga incorrectly or without adequate instruction can lead to serious health problems such as strokes and nerve damage, so performing each pose correctly is vital.
In this work, we present a method for identifying the user's posture and providing visual guidance. To keep the user engaged, the method runs in real time and uses the standard webcam on a laptop or desktop.

Keywords: yoga, posture, classification, MoveNet, keypoint

Data Collection:
We took images from the yoga posture datasets at the following three sites and cleaned them manually (e.g. removing corrupted images and images labeled with the wrong pose).
1. Open source dataset from https://www.kaggle.com/general/192938
2. 3D synthetic dataset from https://laurencemoroney.com/2021/08/23/yogapose-dataset.html
3. Yoga-82 dataset from https://sites.google.com/view/yoga-82/home

This is the 4th of 5 notebooks in this series, listed below:
1. EDA and image augmentation notebook >> https://www.kaggle.com/suradechk/01-eda-and-image-augmentation-v2
2. Setting up a baseline model using CNN >> https://www.kaggle.com/suradechk/02-baseline-model-using-cnn-v2
3. Keypoint generation using movenet >> https://www.kaggle.com/suradechk/03-keypoint-movenet-v2
4. Classification keypoint output using classical ML >> https://www.kaggle.com/suradechk/04-classification-using-keypoints-output-v2
5. Classification keypoint output using ANN >> https://www.kaggle.com/suradechk/05-classification-using-ann-v2

# 1. Importing keypoint data from previous notebook

In [1]:
import os
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.image import imread
%matplotlib inline

In [1]:
my_data_dir = r'/kaggle/input/03-keypoint-movenet-v2/'

In [1]:
os.listdir(my_data_dir)

In [1]:
# Load the keypoint data produced by the previous notebook (the first CSV column is the index)
train_df = pd.read_csv(os.path.join(my_data_dir, 'train_df.csv'), index_col=0)
test_df = pd.read_csv(os.path.join(my_data_dir, 'test_df.csv'), index_col=0)
val_df = pd.read_csv(os.path.join(my_data_dir, 'val_df.csv'), index_col=0)
train_df.shape, test_df.shape, val_df.shape

In [1]:
train_df.head()

In [1]:
# Drop the non-feature columns; what remains is the model input plus the 'category' label
train_df = train_df.drop(['image_name', 'keypoint'], axis=1)
test_df = test_df.drop(['image_name', 'keypoint'], axis=1)
val_df = val_df.drop(['image_name', 'keypoint'], axis=1)

# 2. Setting up train/test/validation dataframes

In [1]:
X_train, y_train = train_df.drop('category', axis = 1), train_df['category']
X_test, y_test = test_df.drop('category', axis = 1), test_df['category']
X_val, y_val = val_df.drop('category', axis = 1), val_df['category']
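
Before fitting anything, it can help to confirm that the three splits share the same columns and have broadly similar class proportions. The sketch below is one possible check, not part of the original notebook: `check_splits` is a hypothetical helper, and the toy frames (with made-up `nose_x`/`nose_y` columns and pose names) only stand in for `train_df`, `test_df`, and `val_df`.

```python
import pandas as pd

def check_splits(splits, label_col="category"):
    """Verify that all splits share the same columns, then return class proportions per split."""
    names = list(splits)
    ref_cols = list(splits[names[0]].columns)
    for name in names[1:]:
        if list(splits[name].columns) != ref_cols:
            raise ValueError(f"Column mismatch between {names[0]} and {name}")
    # One column per split, one row per class; classes missing from a split get 0.0
    proportions = pd.DataFrame(
        {name: df[label_col].value_counts(normalize=True) for name, df in splits.items()}
    )
    return proportions.fillna(0.0)

# Hypothetical toy data standing in for the real keypoint dataframes
toy_train = pd.DataFrame({
    "nose_x": [0.1, 0.2, 0.3, 0.4],
    "nose_y": [0.5, 0.6, 0.7, 0.8],
    "category": ["tree", "tree", "plank", "plank"],
})
toy_val = pd.DataFrame({
    "nose_x": [0.15, 0.35],
    "nose_y": [0.55, 0.75],
    "category": ["tree", "plank"],
})

proportions = check_splits({"train": toy_train, "val": toy_val})
print(proportions)
```

With the real data, the call would presumably be `check_splits({"train": train_df, "test": test_df, "val": val_df})`.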

In [1]:
from sklearn.metrics import accuracy_score

def model_fit_predict_acc(model, X_train, y_train, X_test, y_test, X_val, y_val):
    """Fit the model on the training split and return (train, test, val) accuracy."""
    model.fit(X_train, y_train)
    y_pred_train = model.predict(X_train)
    y_pred_test = model.predict(X_test)
    y_pred_val = model.predict(X_val)
    # accuracy_score expects (y_true, y_pred)
    acc_train = accuracy_score(y_train, y_pred_train)
    acc_test = accuracy_score(y_test, y_pred_test)
    acc_val = accuracy_score(y_val, y_pred_val)
    return (acc_train, acc_test, acc_val)

# 3. Trying various classical machine learning algorithms to classify the yoga poses

In [1]:
from catboost import CatBoostClassifier
model_train, model_test, model_val = model_fit_predict_acc(CatBoostClassifier(verbose = False),X_train, y_train, X_test, y_test, X_val, y_val)
print("Accuracy of CatBoost Classifier Model >> Train: {:.4f}, Test: {:.4f}, Val: {:.4f}".format(model_train, model_test, model_val))

The accuracies above suggest that the default CatBoost model overfits easily. Let's constrain it with fewer iterations, shallower trees, and stronger L2 regularization to get a better fit.

In [1]:
model_train, model_test, model_val = model_fit_predict_acc(CatBoostClassifier(iterations=200, depth = 4, l2_leaf_reg=10, verbose = False),X_train, y_train, X_test, y_test, X_val, y_val)
print("Accuracy of CatBoost Classifier Model >> Train: {:.4f}, Test: {:.4f}, Val: {:.4f}".format(model_train, model_test, model_val))

In [1]:
from xgboost import XGBClassifier
model_train, model_test, model_val = model_fit_predict_acc(XGBClassifier(max_depth = 3),X_train, y_train, X_test, y_test, X_val, y_val)
print("Accuracy of XGBClassifier Model >> Train: {:.4f}, Test: {:.4f}, Val: {:.4f}".format(model_train, model_test, model_val))
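
Depending on the installed xgboost version, `XGBClassifier` may reject string class labels and expect integers `0..n_classes-1` instead. If the `category` column holds pose names, one common workaround is to encode the labels with scikit-learn's `LabelEncoder` first. The sketch below shows only the encoding step; the pose names are hypothetical stand-ins, not values taken from the dataset.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical string labels standing in for the 'category' column
y_train_demo = np.array(["tree", "plank", "goddess", "tree"])
y_val_demo = np.array(["plank", "goddess"])

encoder = LabelEncoder()
y_train_enc = encoder.fit_transform(y_train_demo)  # learn the mapping from training labels only
y_val_enc = encoder.transform(y_val_demo)          # reuse the same mapping for other splits

# Map integer predictions back to pose names for reporting
decoded = encoder.inverse_transform(y_val_enc)

print(list(encoder.classes_))
print(y_train_enc, y_val_enc, decoded)
```

The encoded arrays would then be passed to the model in place of the original labels, with `inverse_transform` applied to its predictions.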

In [1]:
from lightgbm import LGBMClassifier
model_train, model_test, model_val = model_fit_predict_acc(LGBMClassifier(max_depth = 3),X_train, y_train, X_test, y_test, X_val, y_val)
print("Accuracy of LGBM Classifier Model >> Train: {:.4f}, Test: {:.4f}, Val: {:.4f}".format(model_train, model_test, model_val))

In [1]:
from sklearn.neighbors import KNeighborsClassifier
# Sweep the number of neighbors k from 2 to 29
for k in range(2, 30):
    model_train, model_test, model_val = model_fit_predict_acc(KNeighborsClassifier(n_neighbors = k),X_train, y_train, X_test, y_test, X_val, y_val)
    print("Accuracy of kNNClassifier Model with k = {} >> Train: {:.4f}, Test: {:.4f}, Val: {:.4f}".format(k, model_train, model_test, model_val))
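
The loop above prints one line per neighbor count but does not pick a value. A minimal sketch of one way to do that is shown below: choose the neighbor count with the best validation accuracy and touch the test split only once at the end. It runs on synthetic data from `make_classification` as a stand-in for the keypoint features, so the numbers it prints say nothing about the yoga dataset.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the keypoint features (assumption: not the project data)
X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)
X_tr, X_tmp, y_tr, y_tmp = train_test_split(X, y, test_size=0.4, stratify=y, random_state=0)
X_va, X_te, y_va, y_te = train_test_split(X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=0)

# Record validation accuracy for each candidate neighbor count
val_acc = {}
for k in range(2, 30):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    val_acc[k] = accuracy_score(y_va, knn.predict(X_va))

# Select on validation accuracy, then evaluate once on the held-out test split
best_k = max(val_acc, key=val_acc.get)
final_knn = KNeighborsClassifier(n_neighbors=best_k).fit(X_tr, y_tr)
test_acc = accuracy_score(y_te, final_knn.predict(X_te))

print("best k = {}, val acc = {:.4f}, test acc = {:.4f}".format(best_k, val_acc[best_k], test_acc))
```

Selecting on the validation split keeps the test accuracy an unbiased estimate for the chosen setting.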

In [1]:
from sklearn.neural_network import MLPClassifier
model_train, model_test, model_val = model_fit_predict_acc(MLPClassifier(),X_train, y_train, X_test, y_test, X_val, y_val)
print("Accuracy of MLPClassifier Model >> Train: {:.4f}, Test: {:.4f}, Val: {:.4f}".format(model_train, model_test, model_val))

In [1]:
from sklearn.tree import DecisionTreeClassifier
model_train, model_test, model_val = model_fit_predict_acc(DecisionTreeClassifier(),X_train, y_train, X_test, y_test, X_val, y_val)
print("Accuracy of DecisionTreeClassifier Model >> Train: {:.4f}, Test: {:.4f}, Val: {:.4f}".format(model_train, model_test, model_val))

In [1]:
from sklearn.ensemble import RandomForestClassifier
model_train, model_test, model_val = model_fit_predict_acc(RandomForestClassifier(),X_train, y_train, X_test, y_test, X_val, y_val)
print("Accuracy of RandomForestClassifier Model >> Train: {:.4f}, Test: {:.4f}, Val: {:.4f}".format(model_train, model_test, model_val))

In [1]:
from sklearn.ensemble import AdaBoostClassifier
model_train, model_test, model_val = model_fit_predict_acc(AdaBoostClassifier(CatBoostClassifier(verbose = False)),X_train, y_train, X_test, y_test, X_val, y_val)
print("Accuracy of Ada + Cat Model >> Train: {:.4f}, Test: {:.4f}, Val: {:.4f}".format(model_train, model_test, model_val))

Once the keypoints have been extracted, every model we tried gives acceptable accuracy. We pick a single model at this step for our custom software; in this case we use AdaBoost with a CatBoost base estimator, `AdaBoostClassifier(CatBoostClassifier())`, for the next step.
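
To make a comparison like this easier to read, the per-model accuracies can be collected into one table and sorted by validation accuracy. The following is a sketch under stated assumptions: it uses synthetic data and three scikit-learn models as stand-ins (CatBoost, XGBoost, and LightGBM are left out so the example has no extra dependencies), and the `train_val_gap` column is an illustrative addition for spotting overfitting.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the keypoint features (assumption: not the project data)
X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)
X_tr, X_tmp, y_tr, y_tmp = train_test_split(X, y, test_size=0.4, stratify=y, random_state=0)
X_va, X_te, y_va, y_te = train_test_split(X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=0)

candidates = {
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "RandomForest": RandomForestClassifier(n_estimators=50, random_state=0),
    "kNN (k=5)": KNeighborsClassifier(n_neighbors=5),
}

# Fit each candidate once and record accuracy on all three splits
rows = []
for name, model in candidates.items():
    model.fit(X_tr, y_tr)
    rows.append({
        "model": name,
        "train": accuracy_score(y_tr, model.predict(X_tr)),
        "val": accuracy_score(y_va, model.predict(X_va)),
        "test": accuracy_score(y_te, model.predict(X_te)),
    })

results = pd.DataFrame(rows).set_index("model").sort_values("val", ascending=False)
results["train_val_gap"] = results["train"] - results["val"]  # large gap hints at overfitting
print(results.round(4))
```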

# 4. Exploring the confusion matrix

In [1]:
from sklearn.metrics import classification_report,confusion_matrix

In [1]:
# Refit the selected model and report per-class metrics on the training split
model = AdaBoostClassifier(CatBoostClassifier(verbose = False))
model.fit(X_train, y_train)
predictions = model.predict(X_train)
print(classification_report(y_train,predictions))

In [1]:
predictions_test = model.predict(X_test)
print(classification_report(y_test,predictions_test))

In [1]:
predictions_val = model.predict(X_val)
print(classification_report(y_val,predictions_val))
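
The cells above print classification reports; the confusion matrix itself can be computed with the `confusion_matrix` function imported earlier. The sketch below shows the idea on a handful of hypothetical labels (the pose names are placeholders), wrapping the result in a labeled DataFrame so rows read as true classes and columns as predicted classes.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels standing in for y_val and predictions_val
y_true_demo = ["tree", "tree", "plank", "plank", "goddess", "goddess"]
y_pred_demo = ["tree", "plank", "plank", "plank", "goddess", "tree"]

# Fix the label order explicitly so rows and columns are aligned and reproducible
labels = sorted(set(y_true_demo))
cm = confusion_matrix(y_true_demo, y_pred_demo, labels=labels)

cm_df = pd.DataFrame(
    cm,
    index=["true_" + label for label in labels],
    columns=["pred_" + label for label in labels],
)
print(cm_df)
```

In the notebook, the same call would presumably take `y_val` and `predictions_val`, and the resulting frame could be passed to `sns.heatmap(cm_df, annot=True, fmt="d")` for a visual view of which poses are confused with each other.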