# **Training a Random Forest model to classify different gesture classes using EMG signals as input**

In this practice, you will use EMG data as input to train a classical machine learning model, the Random Forest.

Eight bipolar EMG electrodes were used to measure myoelectric activity. Signals were acquired at a sampling rate of 1,200 Hz with a g.tec g.USBamp bioamplifier, which applied a Butterworth band-pass filter (5 Hz to 500 Hz), and power-line noise was removed with a 50 Hz notch filter. For offline data collection, the participant performed four grasp types, chosen as the most common gestures used to pick up a bowl. For each gesture, the participant first rested for 10 seconds with the hand completely relaxed and then performed the gesture for 10 seconds. A 3-second countdown on a computer screen provided visual cues for each transition between rest and gesture.
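The data provided here is already filtered, so this step is not part of the tasks. As an illustration only, a comparable offline filtering chain can be sketched with SciPy. The 4th-order Butterworth design, the notch quality factor `Q=30`, and zero-phase filtering with `sosfiltfilt`/`filtfilt` are assumptions for this sketch, not the documented settings of the g.USBamp.

```python
import numpy as np
from scipy import signal

FS = 1200.0  # sampling rate in Hz, as stated above

def filter_emg(x, fs=FS, band=(5.0, 500.0), notch_hz=50.0, order=4, q=30.0):
    """Band-pass and then notch-filter EMG along the last axis (time).

    order and q are illustrative assumptions, not the amplifier's settings.
    """
    # Butterworth band-pass as second-order sections for numerical stability
    sos = signal.butter(order, band, btype="bandpass", fs=fs, output="sos")
    x = signal.sosfiltfilt(sos, x, axis=-1)
    # IIR notch centred on the power-line frequency
    b, a = signal.iirnotch(notch_hz, q, fs=fs)
    return signal.filtfilt(b, a, x, axis=-1)

# Example: 8 channels x 2 s of noise plus a strong 50 Hz component
t = np.arange(int(2 * FS)) / FS
rng = np.random.default_rng(0)
raw = rng.standard_normal((8, t.size)) + 2.0 * np.sin(2 * np.pi * 50.0 * t)
clean = filter_emg(raw)
print(clean.shape)  # (8, 2400)
```

Filtering forward and backward avoids phase distortion, which suits offline analysis; a real-time system would have to use causal filtering instead.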

The code cells below download four files from Google Drive:

* **X_train**: Model inputs for training. It contains the EMG signals the model learns to decode, taken from 9 of the 10 repetitions.
* **X_test**: Model inputs for testing. It contains the EMG signals used to evaluate performance, taken from the remaining repetition, which is not seen during training.
* **Y_train**: Labels (classes) for the corresponding training inputs. Each label is one of the 4 gestures or rest, for a total of 5 classes.
* **Y_test**: Labels (classes) for the corresponding test inputs, drawn from the same 5 classes.
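Since the labels cover five classes (four gestures plus rest), a quick sanity check after loading is to count the samples per class with `np.unique`. The sketch below uses a small synthetic array, `Y_demo`, in place of `Y_train`; the integer codes 0 to 4 are an assumption, since the actual label encoding is not stated here.

```python
import numpy as np

# Synthetic stand-in for Y_train, shaped (number_of_samples, 1) like the real labels.
# The codes 0-4 are assumed; the real files may use different values.
Y_demo = np.repeat(np.arange(5), 4).reshape(-1, 1)

classes, counts = np.unique(Y_demo, return_counts=True)
print(dict(zip(classes.tolist(), counts.tolist())))
# {0: 4, 1: 4, 2: 4, 3: 4, 4: 4}
```

On the real data, `np.unique(Y_train, return_counts=True)` should report five classes, and similar counts per class if the data is balanced.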

Input data has shape `(number_of_samples, 8, 200, 1)`, while label data has shape `(number_of_samples, 1)`.
To feed the inputs to the Random Forest implementation provided, reshape them to two dimensions, `(number_of_samples, 1600)`, since 8 × 200 × 1 = 1,600. Passing `-1` as the first dimension lets NumPy infer the number of samples: `X.reshape(-1, 1600)`.
The data is already balanced and filtered.
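The reshape can be checked on a small synthetic array. Each sample holds 8 × 200 × 1 = 1,600 values, so `reshape(-1, 1600)` keeps one row per sample and lets NumPy infer the number of rows from `-1`. The labels are also flattened with `ravel()`, since scikit-learn classifiers expect a 1-D target. `X_demo` and `Y_demo` are placeholders, not the actual files.

```python
import numpy as np

n_samples = 10  # placeholder; the real arrays hold thousands of samples
X_demo = np.random.default_rng(0).standard_normal((n_samples, 8, 200, 1))
Y_demo = (np.arange(n_samples) % 5).reshape(-1, 1)  # labels shaped (N, 1)

# -1 lets NumPy infer the number of rows: 8 * 200 * 1 = 1600 columns per sample
X_flat = X_demo.reshape(-1, 8 * 200 * 1)
y_flat = Y_demo.ravel()  # (N, 1) -> (N,)

print(X_flat.shape, y_flat.shape)  # (10, 1600) (10,)
```

Because NumPy reshapes in row-major order, row `i` of `X_flat` is exactly sample `i` flattened, so inputs and labels stay aligned.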

A few tasks follow, but first let’s load the data:

```python
!pip install -U gdown
```



```python
import gdown
import numpy as np

# Google Drive folder ID of the shared dataset (replace it if you host the data yourself)
folder_id = '14pZ46-ySb3PernSKjuGnIhIMJl-B3xZ5'

# Download every file in the folder into a local directory named after the Drive folder
gdown.download_folder(id=folder_id, quiet=False, use_cookies=False)
```


Retrieving folder contents


Processing file 1yqA6a7pfLeKwoBqDXtjDAsHALVw5a82B X_test.npy
Processing file 1dlcf9dezUOGLDxesLxsUS9ZD9GZHghC4 X_train.npy
Processing file 1Srw2qJXopeQ7LubCrfgBk7v8DGGlOq28 Y_test.npy
Processing file 1dZ8jtelyJJPWwwlDWlMWa1lTjYYjhrBw Y_train.npy


Retrieving folder contents completed
Building directory structure
Building directory structure completed
Downloading...
From: https://drive.google.com/uc?id=1yqA6a7pfLeKwoBqDXtjDAsHALVw5a82B
To: /content/Pratica de python 2/X_test.npy
100%|██████████| 33.9M/33.9M [00:00<00:00, 87.2MB/s]
Downloading...
From (original): https://drive.google.com/uc?id=1dlcf9dezUOGLDxesLxsUS9ZD9GZHghC4
From (redirected): https://drive.google.com/uc?id=1dlcf9dezUOGLDxesLxsUS9ZD9GZHghC4&confirm=t&uuid=c27e90cb-4155-473c-abcd-338166cbf6ea
To: /content/Pratica de python 2/X_train.npy
100%|██████████| 204M/204M [00:02<00:00, 93.1MB/s]
Downloading...
From: https://drive.google.com/uc?id=1Srw2qJXopeQ7LubCrfgBk7v8DGGlOq28
To: /content/Pratica de python 2/Y_test.npy
100%|██████████| 21.3k/21.3k [00:00<00:00, 32.2MB/s]
Downloading...
From: https://drive.google.com/uc?id=1dZ8jtelyJJPWwwlDWlMWa1lTjYYjhrBw
To: /content/Pratica de python 2/Y_train.npy
100%|██████████| 128k/128k [00:00<00:00, 3.18MB/s]
Download completed

['/content/Pratica de python 2/X_test.npy',
 '/content/Pratica de python 2/X_train.npy',
 '/content/Pratica de python 2/Y_test.npy',
 '/content/Pratica de python 2/Y_train.npy']

```python
%cd Pratica\ de\ python\ 2
```


The data is now loaded into the Google Colab environment.

**Remember: this practice is graded and must be submitted on Canvas.**

**Any changes you make in this notebook will not be saved automatically.**

**So I recommend going to "File" > "Save a copy in Drive".**

Now you can freely make changes and they will be saved.

**The file saved in your Drive will probably be named "Programming_exercises_1.ipynb" or something similar, and may be inside a folder called "Colab Notebooks".**

**Grades are assigned per group, but every member must submit the assignment. Since this is a group activity, files submitted by members of the same group should be the same. However, identical files submitted by members of different groups will receive a grade of 0.**

This assignment counts as a complementary activity grade.

**Note that scikit-learn's `RandomForestClassifier` runs on the CPU, so switching to a GPU runtime ('Runtime > Change runtime type > GPU') will not speed it up. To shorten training, set `n_jobs=-1` so the trees are built in parallel on all available CPU cores.**

Good luck!

## **Task 1: Load the files into variables and train a Random Forest model on the raw data (without feature extraction, which comes in the next task)**

Remember to reshape the data into the 2-D format described above.
The model implementation is already provided for you.

```python
# Your code here
```

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

print('Training the model...')
# Create a Random Forest classifier with 100 trees (n_jobs=-1 uses all CPU cores)
clf = RandomForestClassifier(n_estimators=100, n_jobs=-1)
# ravel() turns labels of shape (N, 1) into the 1-D shape scikit-learn expects
clf.fit(X_train, Y_train.ravel())
# Prediction step
rf_prediction = clf.predict(X_test)

# Overall accuracy: fraction of test samples whose predicted class matches the label
# (equivalent to clf.score(X_test, Y_test.ravel()))
accuracy = accuracy_score(Y_test.ravel(), rf_prediction)
print(accuracy)
```


## **Task 2: Load the files again into new variables, extract the features from the table below, and train another Random Forest model**

Compute each feature separately for each channel. With 8 channels and 8 features per channel, the result should have shape `(number_of_samples, 8, 8)`: one set of 8 feature values per channel. Flatten it to `(number_of_samples, 64)` before passing it to the Random Forest, which expects 2-D input.

You can implement the features yourself or use any library you like.
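The table of features is not reproduced in this section, so the sketch below does not implement it. Instead, it uses four common time-domain EMG features (mean absolute value, root mean square, waveform length and zero crossings), chosen only to illustrate the per-channel layout: each feature is computed along the time axis of every channel, the results are stacked into shape `(number_of_samples, 8, n_features)`, and the array is flattened for the Random Forest. Swapping in the eight features from the table would give `(number_of_samples, 8, 8)` and 64 columns after flattening. The function name `extract_features` is hypothetical.

```python
import numpy as np

def extract_features(X):
    """Per-channel time-domain features (illustrative set, not the table's).

    X: array of shape (N, 8, 200, 1). Returns an array of shape (N, 8, 4).
    """
    x = X[..., 0]                                       # (N, 8, 200): drop trailing axis
    mav = np.mean(np.abs(x), axis=-1)                   # mean absolute value
    rms = np.sqrt(np.mean(x ** 2, axis=-1))             # root mean square
    wl = np.sum(np.abs(np.diff(x, axis=-1)), axis=-1)   # waveform length
    # Zero crossings: sign changes between consecutive samples (no amplitude threshold)
    zc = np.sum(np.signbit(x[..., 1:]) != np.signbit(x[..., :-1]), axis=-1)
    return np.stack([mav, rms, wl, zc], axis=-1)        # (N, 8, 4)

X_demo = np.random.default_rng(0).standard_normal((10, 8, 200, 1))
feats = extract_features(X_demo)
X_rf = feats.reshape(len(feats), -1)  # (N, 8 * n_features) for the Random Forest
print(feats.shape, X_rf.shape)  # (10, 8, 4) (10, 32)
```

Computing features on the whole `(N, 8, 200)` array with `axis=-1` avoids Python loops over samples and channels, which matters with thousands of samples.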

```python
# Your code here
```

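```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# X_train / X_test here must be the flattened feature arrays, shape (number_of_samples, 64);
# rename them if you stored the features in new variables
print('Training the model...')
# Create a Random Forest classifier with 100 trees (n_jobs=-1 uses all CPU cores)
clf = RandomForestClassifier(n_estimators=100, n_jobs=-1)
# ravel() turns labels of shape (N, 1) into the 1-D shape scikit-learn expects
clf.fit(X_train, Y_train.ravel())
# Prediction step
rf_prediction = clf.predict(X_test)

# Overall accuracy: fraction of test samples whose predicted class matches the label
# (equivalent to clf.score(X_test, Y_test.ravel()))
accuracy = accuracy_score(Y_test.ravel(), rf_prediction)
print(accuracy)
```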

## **Task 3: Discuss which model achieved the best result**

Your explanation here:

## **Task 4: For the model with the best result, evaluate its performance using metrics other than accuracy. Also generate a confusion matrix**
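scikit-learn's `sklearn.metrics` module provides metrics beyond accuracy. The sketch below applies `classification_report` (per-class precision, recall and F1-score), macro-averaged `f1_score` and `confusion_matrix` to small made-up label arrays; with your model you would pass `Y_test.ravel()` and `rf_prediction` instead of `y_true` and `y_pred`.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix, f1_score

# Hypothetical true and predicted labels for five classes (0-4)
y_true = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4])
y_pred = np.array([0, 0, 1, 2, 2, 2, 3, 3, 4, 0])

# Per-class precision, recall and F1-score, plus averages
print(classification_report(y_true, y_pred, digits=3))
print('Macro F1:', f1_score(y_true, y_pred, average='macro'))

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)
print(cm)
```

Off-diagonal cells show which gestures are confused with each other (here, one class-1 sample predicted as class 2 and one class-4 sample predicted as class 0). To plot the matrix, `ConfusionMatrixDisplay.from_predictions(y_true, y_pred)` from `sklearn.metrics` (scikit-learn 1.0 or later) draws it with Matplotlib.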

```python
# Your code here
```

## **Task 5: Vary the `n_estimators` parameter in Random Forest and check whether performance improves or decreases.**
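One way to approach this task is to train one forest per value of `n_estimators` and compare the test scores. The sketch below runs on a synthetic dataset from `make_classification` so that it is self-contained; the values in `n_values` and the dataset parameters are arbitrary choices for illustration, and `random_state` is fixed so the results are reproducible. With the real data, you would use your own training and test arrays instead of `X_tr`, `X_te`, `y_tr` and `y_te`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic 5-class stand-in for the EMG features
X, y = make_classification(n_samples=600, n_features=32, n_informative=10,
                           n_classes=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

n_values = [1, 10, 50, 100, 200]  # arbitrary grid of forest sizes
scores = {}
for n in n_values:
    clf = RandomForestClassifier(n_estimators=n, random_state=0, n_jobs=-1)
    clf.fit(X_tr, y_tr)
    scores[n] = clf.score(X_te, y_te)  # mean accuracy on the held-out split
    print(f'n_estimators={n:4d}  accuracy={scores[n]:.3f}')
```

Beyond some point, adding trees typically gives diminishing returns while training time keeps growing, so plotting `scores` against `n_values` helps show where the curve flattens.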

```python
# Your code here
```