## Classification of Hand Gestures

In this notebook, we train and evaluate different classification models for the task of hand gesture classification. We use the data generated in the previous notebook.

### Setup

---

To reload custom scripts automatically:

In [2]:
%load_ext autoreload
%autoreload 2

Define dependencies:

In [10]:
from matplotlib import pyplot as plt
import matplotlib.patches as mpatches
import seaborn as sns

import numpy as np
import pandas as pd
from scipy import signal

# Modelling
from sklearn.utils import shuffle
from sklearn.metrics import accuracy_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.linear_model import SGDClassifier

import os
import pickle
from tqdm import tqdm
import random
import sys
sys.path.insert(0, '../')

# Ignore warnings
import warnings
warnings.filterwarnings('ignore')

from src import utils

Define global variables:

In [4]:
DATA_ROOT = os.path.join("..", "data")
NINAPRO_ROOT = os.path.join("..", "data", "ninapro")
SAMPLING_EMG_RATE = 100

### Load the data preprocessed in the previous notebook

---

Before loading the data, make sure you have downloaded it ([link](https://drive.google.com/file/d/1tz8tb6rruNvnlBkwDc80aT7D8jAXvwkF/view?usp=sharing)) and placed it in the `data` folder!

We start by loading the data preprocessed in the previous notebook.

In [5]:
with open(os.path.join(DATA_ROOT, "exercise_1_processed.pkl"), 'rb') as file:
    subjects_features = pickle.load(file)

print("✅ Successfully loaded data, there are {} subjects.".format(len(subjects_features)))

✅ Successfully loaded data, there are 27 subjects.


Now, let's inspect the data for the first subject:

In [6]:
# Get the data for the first subject
X_train, y_train, X_val, y_val, X_test, y_test = subjects_features[1]

# Print the shapes of the data
print(f"ℹ️ Number of windows: {X_train.shape[0]} (train), {X_val.shape[0]} (val), {X_test.shape[0]} (test)")
print(f"ℹ️ All have {X_train.shape[1]} features.")

ℹ️ Number of windows: 10143 (train), 5984 (val), 3977 (test)
ℹ️ All have 120 features.


### Train and Evaluate the Models (Per Subject)

---

In this section, we train and evaluate models on the data from 10 randomly selected subjects. The same subjects are used for both training and evaluation: we train a separate model for each subject and then evaluate it on held-out data from that same subject.
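The selection relies on `random.sample` without a fixed seed, so each run of the notebook may pick a different set of subjects. A minimal sketch of a reproducible variant is shown below. The subject IDs `1..27` and the seed value `42` are assumptions made for illustration, not values taken from the notebook.

```python
import random

# Hypothetical subject IDs 1..27, standing in for subjects_features.keys()
subject_ids = list(range(1, 28))

# A dedicated generator with a fixed seed (42 is an arbitrary choice)
rng = random.Random(42)
selected_subjects = rng.sample(subject_ids, 10)

# Re-creating the generator with the same seed yields the same selection
repeated = random.Random(42).sample(subject_ids, 10)
print(sorted(selected_subjects))
```

Using a separate `random.Random` instance keeps the seeding local, so other code that uses the global `random` module is unaffected.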

In [7]:
# Randomly select 10 subjects
selected_subjects = random.sample(list(subjects_features.keys()), 10)

# Collect the data for the selected subjects
selected = dict()
for subject in selected_subjects:
    # Get the data for the subject
    X_train, y_train, X_val, y_val, X_test, y_test = subjects_features[subject]

    # Impute the missing values for features
    X_train = utils.impute_missing_values(X_train)
    X_val = utils.impute_missing_values(X_val)
    X_test = utils.impute_missing_values(X_test)

    # Add the data to the dictionary
    selected[subject] = (X_train, y_train, X_val, y_val, X_test, y_test)
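The implementation of `utils.impute_missing_values` is not shown in this notebook. As a rough illustration of what feature imputation can look like, the sketch below uses scikit-learn's `SimpleImputer` with column means on toy arrays. The mean strategy, the toy data, and the choice to fit on the training split only are all assumptions, and the project helper may behave differently.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy feature matrices with missing entries (hypothetical data)
X_train_toy = np.array([[1.0, 2.0], [np.nan, 4.0], [3.0, np.nan]])
X_val_toy = np.array([[np.nan, 10.0]])

# Learn the column means on the training split only ...
imputer = SimpleImputer(strategy="mean")
X_train_imp = imputer.fit_transform(X_train_toy)

# ... and reuse them to fill the gaps in the validation split
X_val_imp = imputer.transform(X_val_toy)

print(X_train_imp)
print(X_val_imp)
```

Fitting the imputer on the training split and reusing its statistics for the other splits keeps validation and test information out of the preprocessing step.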

Next, we define our model and train it:

In [13]:
# Define the models storage for each subject
models = dict()

for subject in tqdm(selected.keys()):

    # Get the train data
    X_train, y_train, _, _, _, _ = selected[subject]

    # Define the pipeline
    model = Pipeline([
        ('scaler', StandardScaler()),
        # Hinge loss with an L2 penalty gives a linear SVM trained with SGD (not logistic regression)
        ('linear_svm', SGDClassifier(loss="hinge", penalty="l2", max_iter=100))
    ])

    # Train the pipeline on the training data
    model.fit(X_train, y_train)

    # Add the model to the dictionary
    models[subject] = model

100%|██████████| 10/10 [00:35<00:00,  3.57s/it]
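With `loss="hinge"`, `SGDClassifier` fits a linear support vector machine rather than a logistic regression, so the fitted model exposes decision scores but no class probabilities. The sketch below illustrates this on synthetic data. The dataset from `make_classification`, its dimensions, and the `random_state` values are assumptions and are unrelated to the EMG features.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic 3-class problem (hypothetical stand-in for the EMG windows)
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=10, n_classes=3, random_state=0
)

# Same pipeline structure as in the training cell
model = Pipeline([
    ("scaler", StandardScaler()),
    ("linear_svm", SGDClassifier(loss="hinge", penalty="l2", max_iter=100, random_state=0)),
])
model.fit(X, y)

# Hinge loss provides margins (one column per class), but no predict_proba
clf = model.named_steps["linear_svm"]
has_proba = hasattr(clf, "predict_proba")
scores = model.decision_function(X)
print(has_proba, scores.shape)
```

If class probabilities are needed, a probabilistic loss or a calibration step would be required instead of the hinge loss.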


Finally, we evaluate each model on the training and validation data of its subject:

In [14]:

for subject in tqdm(selected.keys()):

    # Get the data on which we want to evaluate the model
    X_train, y_train, X_val, y_val, _, _ = selected[subject]

    # Get the model
    model = models[subject]

    # Make predictions on the train and validation sets
    y_train_pred = model.predict(X_train)
    y_val_pred = model.predict(X_val)

    # Compute the accuracy
    train_acc = accuracy_score(y_train, y_train_pred)
    val_acc = accuracy_score(y_val, y_val_pred)

    # Print the results
    print(f"ℹ️ Subject {subject} - Train acc: {train_acc:.3f}, Val acc: {val_acc:.3f}")

100%|██████████| 10/10 [00:00<00:00, 77.40it/s]

ℹ️ Subject 18 - Train acc: 0.767, Val acc: 0.717
ℹ️ Subject 25 - Train acc: 0.811, Val acc: 0.785
ℹ️ Subject 23 - Train acc: 0.801, Val acc: 0.746
ℹ️ Subject 26 - Train acc: 0.814, Val acc: 0.799
ℹ️ Subject 4 - Train acc: 0.793, Val acc: 0.743
ℹ️ Subject 21 - Train acc: 0.888, Val acc: 0.862
ℹ️ Subject 19 - Train acc: 0.820, Val acc: 0.745
ℹ️ Subject 12 - Train acc: 0.797, Val acc: 0.738
ℹ️ Subject 24 - Train acc: 0.886, Val acc: 0.826
ℹ️ Subject 2 - Train acc: 0.870, Val acc: 0.823
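To compare subjects more easily, the printed accuracies can be collected in a table and summarized. The sketch below copies the values from the output above into a `pandas` DataFrame. The column names and the `gap` column are choices made for illustration.

```python
import pandas as pd

# Accuracies copied from the printed output above
results = pd.DataFrame(
    {
        "subject": [18, 25, 23, 26, 4, 21, 19, 12, 24, 2],
        "train_acc": [0.767, 0.811, 0.801, 0.814, 0.793, 0.888, 0.820, 0.797, 0.886, 0.870],
        "val_acc": [0.717, 0.785, 0.746, 0.799, 0.743, 0.862, 0.745, 0.738, 0.826, 0.823],
    }
).set_index("subject")

# Difference between train and validation accuracy as a rough overfitting indicator
results["gap"] = results["train_acc"] - results["val_acc"]

# Summary statistics across the 10 subjects
summary = results.agg(["mean", "std", "min", "max"])
print(summary.round(3))
```

For this run, the mean validation accuracy is about 0.778 against a mean training accuracy of about 0.825, with subject 21 scoring highest and subject 18 lowest on the validation set.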





---