<a href="https://colab.research.google.com/github/michelleaeh/Dissertation/blob/master/FinalHorizontal2MyoASL.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Myo armband dataset from https://data.mendeley.com/datasets/wgswcr8z24/2**


**The dataset consits of .csv files collected from two Myo armbands. The format of the files are [word_name]_[id]. The ‘word_name’ is the English translation of the American Sign Language word used and the ‘id’ is a unique identifier. The .zip for each of the above links has sub-folders for each User.**

**Each file has 50 columns. They represent a sub-sampled data collection from two Myo devices worn on left and right hands of the signer. The first column is the ‘Counter’ that goes from 1 to 50.**

**The following columns are of the format: [Sensor][pod/direction][left/right]. For instance the EMG reading for the first EMG pod (out of 8) on the left hand would be called EMG0R and the accelerometer reading for the Z axis on the left hand would be called: AXL**

**If you use this dataset please cite the following papers:**

**@inproceedings{paudyal2016sceptre,
title={Sceptre: a pervasive, non-invasive, and programmable gesture recognition technology},
author={Paudyal, Prajwal and Banerjee, Ayan and Gupta, Sandeep KS},
booktitle={Proceedings of the 21st International Conference on Intelligent User Interfaces},
pages={282--293},
year={2016},
organization={ACM}
}**

**@inproceedings{paudyal2017dyfav,
title={Dyfav: Dynamic feature selection and voting for real-time recognition of fingerspelled alphabet using wearables},
author={Paudyal, Prajwal and Lee, Junghyo and Banerjee, Ayan and Gupta, Sandeep KS},
booktitle={Proceedings of the 22nd International Conference on Intelligent User Interfaces},
pages={457--467},
year={2017},
organization={ACM}
}**

**Frequency:**

50Hz sampling rate

**Words:**

*36 total words*

allmorning, bird, blue, cantsleep, cat, colrunnynose, continuouslyforanhour, cost, day, dollar, everymorning, everynight, gold, goodnight, happy, headache, home, horse, hot, hurt, itching, large, mom, monthly, notfeelgood, orange, pizza, please, shirt, soreness, swelling, takeliquidmedicine, thatsterrible, tired, upsetstomach, wash


**Filenames:**

*849 total files*

(word)_(user#)(try#)


**Columns of files:**

Counter  (1 -> 50)

EMG0L -> EMG7L  (EMG sensor readings)

AXL, AYL, AZL  (accelerometer readings)

GXL, GYL, GZL  (gyroscope readings)

ORL, OPL, OYL  (magnetometer readings?)

EMG0R -> EMG7R  (EMG sensor readings)

AXR, AYR, AZR  (accelerometer readings)

GXR, GYR, GZR  (gyroscope readings)

ORR, OPR, OYR  (magnetometer readings?)

features=['EMG0L', 'EMG1L', 'EMG2L', 'EMG3L', 'EMG4L', 'EMG5L', 'EMG6L', 'EMG7L', 'AXL', 'AYL', 'AZL', 'GXL', 'GYL', 'GZL', 'ORL', 'OPL', 'OYL', 'EMG0R', 'EMG1R', 'EMG2R', 'EMG3R', 'EMG4R', 'EMG5R', 'EMG6R', 'EMG7R', 'AXR', 'AYR', 'AZR', 'GXR', 'GYR', 'GZR', 'ORR', 'OPR', 'OYR']


**Size of files:**

All files are 50 rows x 35 columns except continuouslyforanhour_22.csv, headache_52.csv, home_61.csv, and mom_82.csv which are 101 rows x 35 columns

**Steps:**

1. Combine files
2. Normalize or standardize matrix
3. Apply Butterworth
4. Apply PCA
5. Input to SVM


In [1]:
import matplotlib.pyplot as plt
import numpy as np
import os.path
import pandas as pd
import scipy as sp
import warnings

from google.colab import files
from mpl_toolkits import mplot3d
from scipy import signal
from scipy.io import loadmat
from sklearn import metrics
from sklearn.decomposition import PCA
from sklearn.metrics import classification_report
from sklearn.metrics import plot_confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from zipfile import ZipFile

# Import zip containing all files
file_name = "/content/2MyoASL.zip"

# Extract all files from zip
with ZipFile(file_name, 'r') as zip:
  zip.extractall()

In [2]:
# Generate matrices for all combinations of sensors (E=3, A=5, G=7, O=11)
products=[3, 5, 7, 11, 15, 21, 33, 35, 55, 77, 105, 165, 231, 385, 1155]
comb=['e', 'a', 'g', 'o', 'ea', 'eg', 'eo', 'ag', 'ao', 'go', 'eag', 'eao', 'ego', 'ago', 'eago']
emg=['EMG0L', 'EMG1L', 'EMG2L', 'EMG3L', 'EMG4L', 'EMG5L', 'EMG6L', 'EMG7L', 
    'EMG0R', 'EMG1R', 'EMG2R', 'EMG3R', 'EMG4R', 'EMG5R', 'EMG6R', 'EMG7R']
acc=['AXL', 'AYL', 'AZL', 'AXR', 'AYR', 'AZR']
gyro=['GXL', 'GYL', 'GZL', 'GXR', 'GYR', 'GZR']
ori=['ORL', 'OPL', 'OYL', 'ORR', 'OPR', 'OYR']

# Initialization of counters
words=['allmorning', 'bird', 'blue', 'cantsleep', 'cat', 'coldrunnynose', 'continuouslyforanhour', 'cost', 'day', 
       'dollar', 'everymorning', 'everynight', 'gold', 'goodnight', 'happy', 'headache', 'home', 'horse', 'hot', 
       'hurt', 'itching', 'large', 'mom', 'monthly', 'notfeelgood', 'orange', 'pizza', 'please', 'shirt', 
       'soreness', 'swelling', 'takeliquidmedicine', 'thatsterrible', 'tired', 'upsetstomach', 'wash']
colnames=['EMG0L', 'EMG1L', 'EMG2L', 'EMG3L', 'EMG4L', 'EMG5L', 'EMG6L', 'EMG7L', 'AXL', 'AYL', 'AZL', 'GXL', 'GYL', 'GZL', 'ORL', 'OPL', 'OYL', 
          'EMG0R', 'EMG1R', 'EMG2R', 'EMG3R', 'EMG4R', 'EMG5R', 'EMG6R', 'EMG7R', 'AXR', 'AYR', 'AZR', 'GXR', 'GYR', 'GZR', 'ORR', 'OPR', 'OYR']
lengths=np.zeros(849, dtype=int)
reps=np.zeros(36,dtype=int)
headers=np.empty(1701, dtype=object)
features=np.zeros(15)
target=np.zeros(15)
matrix=np.zeros(1)
norm=products
stand=products

fn=np.arange(1701)
wordnum=-1
counter=-1
num=0
n=0

**1. Combine all files**

In [3]:
# Combine all files
for w in words:
  repcount=0
  wordnum+=1

  for i in range (10, 120):
    path='/content/2MyoASL/' + w + '_' + str(i) + '.csv'

    if os.path.exists(path)==True:
      counter+=1
      repcount+=1
      trial=pd.read_csv(path)

      trial.reset_index(drop=True)
      
      # Assign word number per row
      row=np.zeros(1)
      for t in range(35):
        if t==0:
          row[0]=wordnum
        else:
          sensor=trial.iloc[0:50,t].values
          sensor.reshape([1,50])
          row=np.concatenate((row, sensor))
      prev=row
      
      # Combine all trials
      if counter==0:
        matrix=prev
      else:
        matrix=np.concatenate([matrix,prev])

# Create header name array
headers[0]='Word'
for c in colnames:
  for t in range(50):
    num+=1
    headers[num]=c

# Give format to final matrix 
matrix=matrix.reshape([849,1701])
matrix=pd.DataFrame(matrix, columns=headers) ####### matrix stores raw data of all 849 files

**1.1 Create combinatory matrices**

In [4]:
# Eliminate unnecessary columns to create combinatory matrices
for v in products:
  m=matrix
  if v%3!=0:
    m=m.drop(emg,axis=1)
  if v%5!=0:
    m=m.drop(acc,axis=1)
  if v%7!=0:
    m=m.drop(gyro,axis=1)
  if v%11!=0:
    m=m.drop(ori,axis=1)
  products[n]=m ####### products stores the raw data of all 849 files according to combinatory sensors
  exec(comb[n]+"=products[n]")
  n+=1
print(products[0])

     Word  EMG0L  EMG0L  EMG0L  EMG0L  ...  EMG7R  EMG7R  EMG7R  EMG7R  EMG7R
0     0.0    0.0    0.0   -3.0   -1.0  ...   -2.0    9.0  -14.0   11.0  -20.0
1     0.0    0.0    0.0    4.0   -2.0  ...    1.0   -3.0    1.0    0.0    2.0
2     0.0    0.0    0.0   -8.0   -1.0  ...    2.0   -2.0   -1.0    2.0   14.0
3     0.0    0.0    0.0   -1.0  -19.0  ...    0.0   -2.0   -1.0   -3.0    1.0
4     0.0    0.0    0.0    1.0  -16.0  ...    3.0   -3.0    2.0    0.0   -1.0
..    ...    ...    ...    ...    ...  ...    ...    ...    ...    ...    ...
844  35.0    0.0    0.0   -2.0    0.0  ...   -2.0   -2.0    1.0   11.0  -10.0
845  35.0    0.0    0.0    2.0   -8.0  ...    1.0    0.0   -1.0   -2.0   -2.0
846  35.0    0.0    0.0   -8.0   26.0  ...   -2.0    1.0   -2.0    7.0   -3.0
847  35.0    0.0    0.0   -1.0   21.0  ...   -4.0   -1.0   -4.0    0.0   -2.0
848  35.0    0.0    0.0    8.0  -19.0  ...   -3.0    4.0    9.0   -4.0    4.0

[849 rows x 801 columns]


**2. Normalize and standardize each combinatory matrix**

In [12]:
n=0
for m in products:
  # Separate features from target values
  x = m.iloc[:, m.columns!='Word'].values   # Features
 
  # Normalize features so that each column is between 0 and 1
  norm_matrix=(m-m.min())/(m.max()-m.min())
  norm_matrix=pd.DataFrame(norm_matrix)
  norm_matrix=norm_matrix.dropna(axis=1)
  for r in range(849):
    wn=int(35*norm_matrix.iloc[r,0])
    norm_matrix.iloc[r,0]=words[wn]
  norm[n]=norm_matrix
  print(m)

  # Standardize features with mean=0 and deviation=1
  standardized_matrix=StandardScaler().fit_transform(x)
  standardized_matrix=pd.DataFrame(standardized_matrix)
  standardized_matrix=standardized_matrix.dropna(axis=1)
  stand[n]=standardized_matrix

  n+=1

cm=norm+stand ####### cm stores all 15 matrices normalized first and then standardized

print(m)
print(products[14])
print(norm[14])
print(stand[14])

  result = op(self.values, np.asarray(other))


     0         1         2         3    ...       796       797       798       799
0    0.0 -0.008979 -0.230710 -0.008521  ...  1.038845 -1.499971  1.623874 -2.743880
1    0.0 -0.008979  0.576775 -0.123346  ... -0.262970  0.215995  0.152081  0.403779
2    0.0 -0.008979 -0.807486 -0.008521  ... -0.154485 -0.012801  0.419679  2.120685
3    0.0 -0.008979  0.000000 -2.075372  ... -0.154485 -0.012801 -0.249318  0.260704
4    0.0 -0.008979  0.230710 -1.730897  ... -0.262970  0.330392  0.152081 -0.025447
..   ...       ...       ...       ...  ...       ...       ...       ...       ...
844  0.0 -0.008979 -0.115355  0.106304  ... -0.154485  0.215995  1.623874 -1.313126
845  0.0 -0.008979  0.346065 -0.812296  ...  0.062484 -0.012801 -0.115518 -0.168522
846  0.0 -0.008979 -0.807486  3.091757  ...  0.170969 -0.127198  1.088676 -0.311598
847  0.0 -0.008979  0.000000  2.517631  ... -0.046000 -0.355994  0.152081 -0.168522
848  0.0 -0.008979  1.038196 -2.075372  ...  0.496422  1.131176 -0.383117  0

  result = op(self.values, np.asarray(other))


          0         1         2    ...       297       298       299
0    0.868317  0.710788  0.299576  ...  0.864979  0.792709  0.792012
1    1.075622  1.016774  1.342839  ...  0.735316  0.816698  0.761102
2    0.939312  0.919926  0.861120  ...  1.015471  0.782078  0.853924
3   -0.043259  0.421646  1.379999  ...  0.898711  0.896517  0.775137
4   -0.240624 -0.332088 -0.011475  ...  1.075126  0.515853 -0.394692
..        ...       ...       ...  ...       ...       ...       ...
844 -1.119541 -1.283731 -0.910223  ...  0.626375  0.497217  0.493951
845  0.484945  0.466562 -0.051389  ... -0.948064 -1.339665 -0.434062
846 -1.067003 -0.906161  0.190845  ... -0.022079  0.057960 -0.436873
847  1.095501  0.752896  0.371145  ... -0.203641 -0.549010 -0.442500
848 -0.913655 -0.948271 -1.383684  ... -0.190674  0.313526  0.817353

[849 rows x 300 columns]


  result = op(self.values, np.asarray(other))


          0         1         2    ...       297       298       299
0    0.015394  0.204646  0.414048  ...  0.392949  0.376767  0.355482
1    1.555134  1.452890  1.152575  ...  0.403843  0.396188  0.378354
2    0.979916  1.096041  0.604727  ...  0.435315  0.402662  0.379784
3    2.178021  0.969323  0.396972  ...  0.362687  0.401367  0.392649
4    0.039229 -0.105594 -1.256533  ...  0.102438  0.056967  0.294013
..        ...       ...       ...  ...       ...       ...       ...
844 -3.067264 -2.469537 -2.855965  ... -2.684049 -2.980485 -2.766586
845 -1.826256 -1.805361 -0.472470  ...  0.591465  0.119114  0.464125
846 -0.195943 -0.608096 -0.970514  ... -1.122551 -0.214928  0.229684
847 -3.504251 -3.796445  0.611842  ...  0.717353  0.143714  0.539890
848  0.005860 -0.012376 -1.122773  ...  0.079439  0.313325  0.509870

[849 rows x 300 columns]


  result = op(self.values, np.asarray(other))


          0         1         2    ...       297       298       299
0    2.380355  2.377798  2.355299  ... -0.062523 -0.057314 -0.053570
1    1.458864  1.460059  1.631281  ... -0.062523 -0.057314 -0.053570
2    2.103908  2.194250  2.355299  ... -0.041058 -0.035783 -0.031995
3    1.551013  1.551833  1.631281  ... -0.062523 -0.057314 -0.053570
4   -0.199819 -0.191871 -0.269268  ...  0.109193  0.114933  0.119033
..        ...       ...       ...  ...       ...       ...       ...
844 -0.844863 -0.926063 -1.355297  ...  0.302373  0.265649  0.205335
845 -1.213459 -1.293158 -1.445799  ...  1.955137  1.966589  1.974518
846 -1.029161 -1.017836 -1.264794  ...  1.912208  1.923528  1.952943
847 -1.397757 -1.476706 -1.626804  ...  1.998066  2.009651  2.017669
848 -0.752714 -0.742515 -0.812282  ...  1.998066  2.009651 -1.844329

[849 rows x 300 columns]


  result = op(self.values, np.asarray(other))


     0         1         2         3     ...      1096      1097      1098      1099
0     0.0 -0.008979 -0.230710 -0.008521  ...  0.922006  0.864979  0.792709  0.792012
1     0.0 -0.008979  0.576775 -0.123346  ...  0.733398  0.735316  0.816698  0.761102
2     0.0 -0.008979 -0.807486 -0.008521  ...  0.846092  1.015471  0.782078  0.853924
3     0.0 -0.008979  0.000000 -2.075372  ...  0.721105  0.898711  0.896517  0.775137
4     0.0 -0.008979  0.230710 -1.730897  ... -0.410749  1.075126  0.515853 -0.394692
..    ...       ...       ...       ...  ...       ...       ...       ...       ...
844   0.0 -0.008979 -0.115355  0.106304  ...  0.456532  0.626375  0.497217  0.493951
845   0.0 -0.008979  0.346065 -0.812296  ...  0.334035 -0.948064 -1.339665 -0.434062
846   0.0 -0.008979 -0.807486  3.091757  ...  0.343839 -0.022079  0.057960 -0.436873
847   0.0 -0.008979  0.000000  2.517631  ...  0.924465 -0.203641 -0.549010 -0.442500
848   0.0 -0.008979  1.038196 -2.075372  ... -0.153507 -0.190674 

  result = op(self.values, np.asarray(other))


     0         1         2         3     ...      1096      1097      1098      1099
0     0.0 -0.008979 -0.230710 -0.008521  ...  0.363894  0.392949  0.376767  0.355482
1     0.0 -0.008979  0.576775 -0.123346  ...  0.388662  0.403843  0.396188  0.378354
2     0.0 -0.008979 -0.807486 -0.008521  ...  0.444096  0.435315  0.402662  0.379784
3     0.0 -0.008979  0.000000 -2.075372  ...  0.367433  0.362687  0.401367  0.392649
4     0.0 -0.008979  0.230710 -1.730897  ... -0.749490  0.102438  0.056967  0.294013
..    ...       ...       ...       ...  ...       ...       ...       ...       ...
844   0.0 -0.008979 -0.115355  0.106304  ... -1.920675 -2.684049 -2.980485 -2.766586
845   0.0 -0.008979  0.346065 -0.812296  ...  0.830950  0.591465  0.119114  0.464125
846   0.0 -0.008979 -0.807486  3.091757  ... -1.707189 -1.122551 -0.214928  0.229684
847   0.0 -0.008979  0.000000  2.517631  ...  0.582089  0.717353  0.143714  0.539890
848   0.0 -0.008979  1.038196 -2.075372  ... -0.670468  0.079439 

  result = op(self.values, np.asarray(other))


     0         1         2         3     ...      1096      1097      1098      1099
0     0.0 -0.008979 -0.230710 -0.008521  ... -0.072356 -0.062523 -0.057314 -0.053570
1     0.0 -0.008979  0.576775 -0.123346  ... -0.072356 -0.062523 -0.057314 -0.053570
2     0.0 -0.008979 -0.807486 -0.008521  ... -0.050930 -0.041058 -0.035783 -0.031995
3     0.0 -0.008979  0.000000 -2.075372  ... -0.072356 -0.062523 -0.057314 -0.053570
4     0.0 -0.008979  0.230710 -1.730897  ...  0.099058  0.109193  0.114933  0.119033
..    ...       ...       ...       ...  ...       ...       ...       ...       ...
844   0.0 -0.008979 -0.115355  0.106304  ...  0.377606  0.302373  0.265649  0.205335
845   0.0 -0.008979  0.346065 -0.812296  ...  1.920333  1.955137  1.966589  1.974518
846   0.0 -0.008979 -0.807486  3.091757  ...  1.920333  1.912208  1.923528  1.952943
847   0.0 -0.008979  0.000000  2.517631  ...  1.963186  1.998066  2.009651  2.017669
848   0.0 -0.008979  1.038196 -2.075372  ...  1.984613  1.998066 

  result = op(self.values, np.asarray(other))


          0         1         2    ...       597       598       599
0    0.868317  0.710788  0.299576  ...  0.392949  0.376767  0.355482
1    1.075622  1.016774  1.342839  ...  0.403843  0.396188  0.378354
2    0.939312  0.919926  0.861120  ...  0.435315  0.402662  0.379784
3   -0.043259  0.421646  1.379999  ...  0.362687  0.401367  0.392649
4   -0.240624 -0.332088 -0.011475  ...  0.102438  0.056967  0.294013
..        ...       ...       ...  ...       ...       ...       ...
844 -1.119541 -1.283731 -0.910223  ... -2.684049 -2.980485 -2.766586
845  0.484945  0.466562 -0.051389  ...  0.591465  0.119114  0.464125
846 -1.067003 -0.906161  0.190845  ... -1.122551 -0.214928  0.229684
847  1.095501  0.752896  0.371145  ...  0.717353  0.143714  0.539890
848 -0.913655 -0.948271 -1.383684  ...  0.079439  0.313325  0.509870

[849 rows x 600 columns]


  result = op(self.values, np.asarray(other))


          0         1         2    ...       597       598       599
0    0.868317  0.710788  0.299576  ... -0.062523 -0.057314 -0.053570
1    1.075622  1.016774  1.342839  ... -0.062523 -0.057314 -0.053570
2    0.939312  0.919926  0.861120  ... -0.041058 -0.035783 -0.031995
3   -0.043259  0.421646  1.379999  ... -0.062523 -0.057314 -0.053570
4   -0.240624 -0.332088 -0.011475  ...  0.109193  0.114933  0.119033
..        ...       ...       ...  ...       ...       ...       ...
844 -1.119541 -1.283731 -0.910223  ...  0.302373  0.265649  0.205335
845  0.484945  0.466562 -0.051389  ...  1.955137  1.966589  1.974518
846 -1.067003 -0.906161  0.190845  ...  1.912208  1.923528  1.952943
847  1.095501  0.752896  0.371145  ...  1.998066  2.009651  2.017669
848 -0.913655 -0.948271 -1.383684  ...  1.998066  2.009651 -1.844329

[849 rows x 600 columns]


  result = op(self.values, np.asarray(other))


          0         1         2    ...       597       598       599
0    0.015394  0.204646  0.414048  ... -0.062523 -0.057314 -0.053570
1    1.555134  1.452890  1.152575  ... -0.062523 -0.057314 -0.053570
2    0.979916  1.096041  0.604727  ... -0.041058 -0.035783 -0.031995
3    2.178021  0.969323  0.396972  ... -0.062523 -0.057314 -0.053570
4    0.039229 -0.105594 -1.256533  ...  0.109193  0.114933  0.119033
..        ...       ...       ...  ...       ...       ...       ...
844 -3.067264 -2.469537 -2.855965  ...  0.302373  0.265649  0.205335
845 -1.826256 -1.805361 -0.472470  ...  1.955137  1.966589  1.974518
846 -0.195943 -0.608096 -0.970514  ...  1.912208  1.923528  1.952943
847 -3.504251 -3.796445  0.611842  ...  1.998066  2.009651  2.017669
848  0.005860 -0.012376 -1.122773  ...  1.998066  2.009651 -1.844329

[849 rows x 600 columns]


  result = op(self.values, np.asarray(other))


     0         1         2         3     ...      1396      1397      1398      1399
0     0.0 -0.008979 -0.230710 -0.008521  ...  0.363894  0.392949  0.376767  0.355482
1     0.0 -0.008979  0.576775 -0.123346  ...  0.388662  0.403843  0.396188  0.378354
2     0.0 -0.008979 -0.807486 -0.008521  ...  0.444096  0.435315  0.402662  0.379784
3     0.0 -0.008979  0.000000 -2.075372  ...  0.367433  0.362687  0.401367  0.392649
4     0.0 -0.008979  0.230710 -1.730897  ... -0.749490  0.102438  0.056967  0.294013
..    ...       ...       ...       ...  ...       ...       ...       ...       ...
844   0.0 -0.008979 -0.115355  0.106304  ... -1.920675 -2.684049 -2.980485 -2.766586
845   0.0 -0.008979  0.346065 -0.812296  ...  0.830950  0.591465  0.119114  0.464125
846   0.0 -0.008979 -0.807486  3.091757  ... -1.707189 -1.122551 -0.214928  0.229684
847   0.0 -0.008979  0.000000  2.517631  ...  0.582089  0.717353  0.143714  0.539890
848   0.0 -0.008979  1.038196 -2.075372  ... -0.670468  0.079439 

  result = op(self.values, np.asarray(other))


     0         1         2         3     ...      1396      1397      1398      1399
0     0.0 -0.008979 -0.230710 -0.008521  ... -0.072356 -0.062523 -0.057314 -0.053570
1     0.0 -0.008979  0.576775 -0.123346  ... -0.072356 -0.062523 -0.057314 -0.053570
2     0.0 -0.008979 -0.807486 -0.008521  ... -0.050930 -0.041058 -0.035783 -0.031995
3     0.0 -0.008979  0.000000 -2.075372  ... -0.072356 -0.062523 -0.057314 -0.053570
4     0.0 -0.008979  0.230710 -1.730897  ...  0.099058  0.109193  0.114933  0.119033
..    ...       ...       ...       ...  ...       ...       ...       ...       ...
844   0.0 -0.008979 -0.115355  0.106304  ...  0.377606  0.302373  0.265649  0.205335
845   0.0 -0.008979  0.346065 -0.812296  ...  1.920333  1.955137  1.966589  1.974518
846   0.0 -0.008979 -0.807486  3.091757  ...  1.920333  1.912208  1.923528  1.952943
847   0.0 -0.008979  0.000000  2.517631  ...  1.963186  1.998066  2.009651  2.017669
848   0.0 -0.008979  1.038196 -2.075372  ...  1.984613  1.998066 

  result = op(self.values, np.asarray(other))


     0         1         2         3     ...      1396      1397      1398      1399
0     0.0 -0.008979 -0.230710 -0.008521  ... -0.072356 -0.062523 -0.057314 -0.053570
1     0.0 -0.008979  0.576775 -0.123346  ... -0.072356 -0.062523 -0.057314 -0.053570
2     0.0 -0.008979 -0.807486 -0.008521  ... -0.050930 -0.041058 -0.035783 -0.031995
3     0.0 -0.008979  0.000000 -2.075372  ... -0.072356 -0.062523 -0.057314 -0.053570
4     0.0 -0.008979  0.230710 -1.730897  ...  0.099058  0.109193  0.114933  0.119033
..    ...       ...       ...       ...  ...       ...       ...       ...       ...
844   0.0 -0.008979 -0.115355  0.106304  ...  0.377606  0.302373  0.265649  0.205335
845   0.0 -0.008979  0.346065 -0.812296  ...  1.920333  1.955137  1.966589  1.974518
846   0.0 -0.008979 -0.807486  3.091757  ...  1.920333  1.912208  1.923528  1.952943
847   0.0 -0.008979  0.000000  2.517631  ...  1.963186  1.998066  2.009651  2.017669
848   0.0 -0.008979  1.038196 -2.075372  ...  1.984613  1.998066 

  result = op(self.values, np.asarray(other))


          0         1         2    ...       897       898       899
0    0.868317  0.710788  0.299576  ... -0.062523 -0.057314 -0.053570
1    1.075622  1.016774  1.342839  ... -0.062523 -0.057314 -0.053570
2    0.939312  0.919926  0.861120  ... -0.041058 -0.035783 -0.031995
3   -0.043259  0.421646  1.379999  ... -0.062523 -0.057314 -0.053570
4   -0.240624 -0.332088 -0.011475  ...  0.109193  0.114933  0.119033
..        ...       ...       ...  ...       ...       ...       ...
844 -1.119541 -1.283731 -0.910223  ...  0.302373  0.265649  0.205335
845  0.484945  0.466562 -0.051389  ...  1.955137  1.966589  1.974518
846 -1.067003 -0.906161  0.190845  ...  1.912208  1.923528  1.952943
847  1.095501  0.752896  0.371145  ...  1.998066  2.009651  2.017669
848 -0.913655 -0.948271 -1.383684  ...  1.998066  2.009651 -1.844329

[849 rows x 900 columns]


  result = op(self.values, np.asarray(other))


     0         1         2         3     ...      1696      1697      1698      1699
0     0.0 -0.008979 -0.230710 -0.008521  ... -0.072356 -0.062523 -0.057314 -0.053570
1     0.0 -0.008979  0.576775 -0.123346  ... -0.072356 -0.062523 -0.057314 -0.053570
2     0.0 -0.008979 -0.807486 -0.008521  ... -0.050930 -0.041058 -0.035783 -0.031995
3     0.0 -0.008979  0.000000 -2.075372  ... -0.072356 -0.062523 -0.057314 -0.053570
4     0.0 -0.008979  0.230710 -1.730897  ...  0.099058  0.109193  0.114933  0.119033
..    ...       ...       ...       ...  ...       ...       ...       ...       ...
844   0.0 -0.008979 -0.115355  0.106304  ...  0.377606  0.302373  0.265649  0.205335
845   0.0 -0.008979  0.346065 -0.812296  ...  1.920333  1.955137  1.966589  1.974518
846   0.0 -0.008979 -0.807486  3.091757  ...  1.920333  1.912208  1.923528  1.952943
847   0.0 -0.008979  0.000000  2.517631  ...  1.963186  1.998066  2.009651  2.017669
848   0.0 -0.008979  1.038196 -2.075372  ...  1.984613  1.998066 

**3. Apply Butterworth**

In [6]:
def butterworth(inmatrix_b):
  high = 1/(50/2)
  low = 23/(50/2)

  b, a = sp.signal.butter(4, [high,low], btype='bandpass')

  for r in emg:
    if r in inmatrix_b:
      # process EMG signal: filter EMG
      emg_filtered = sp.signal.lfilter(b, a, inmatrix_b[[r]])
      inmatrix_b[[r]]=emg_filtered
  return inmatrix_b

**Separate data from target**

In [7]:
def datasplit(inmatrix_p):
    x = inmatrix_p.iloc[:, inmatrix_p.columns!='Word'].values   # Features
    y = inmatrix_p.loc[:,'Word'].values     # Target
    x_train_p, x_test_p, y_train_p, y_test_p = train_test_split(x, y, test_size=0.6)
    return x_train_p, x_test_p, y_train_p, y_test_p

**4. Apply PCA**

In [8]:
def pca(x_train_c, x_test_c, y_train_c, y_test_c):
  pca = PCA(n_components=100)
  pca.fit(x_train_c)
  x_t_train_pca = pca.transform(x_train_c)
  x_t_test_pca = pca.transform(x_test_c)

  # Plot
  print("Normalized matrix")
  print(pca.explained_variance_ratio_)
  print(pca.singular_values_)
  plt.figure()
  plt.bar(fn[:100], pca.explained_variance_ratio_)
  plt.show()
  plt.bar(fn[:100], pca.singular_values_)
  plt.show()
  return x_train_c, x_test_c, y_train_c, y_test_c, x_t_train_pca, x_t_test_pca

**5. Apply SVM**

In [9]:
def svm(x_train_s, x_test_s, y_train_s, y_test_s, x_t_train_s="", x_t_test_s=""):
    if x_t_train_s=="":
      x_t_train_s=x_train_s
    if x_t_test_s=="":
      x_t_test_s=x_test_s
    clf = SVC()
    clf.fit(x_t_train_s, y_train_s)
    print ('score', clf.score(x_t_test_s, y_test_s))
    y_pred=clf.predict(x_t_test_s)
    print ('pred label', y_pred)
    print('length',len(clf.predict(x_t_test_s)))

    # Confusion matrix
    plot_confusion_matrix(clf, x_t_test_s, y_test_s,
                                    cmap=plt.cm.Blues)
    plt.figure(figsize=(50,50))
    plt.show()

    svmresult=classification_report(y_test_s, y_pred, target_names=words)
    return svmresult

Combinations of steps for classification

In [10]:
for u in products:
  print(u)
  #svm(datasplit(u))


     0         1         2         3    ...       796       797       798       799
0    0.0 -0.008979 -0.230710 -0.008521  ...  1.038845 -1.499971  1.623874 -2.743880
1    0.0 -0.008979  0.576775 -0.123346  ... -0.262970  0.215995  0.152081  0.403779
2    0.0 -0.008979 -0.807486 -0.008521  ... -0.154485 -0.012801  0.419679  2.120685
3    0.0 -0.008979  0.000000 -2.075372  ... -0.154485 -0.012801 -0.249318  0.260704
4    0.0 -0.008979  0.230710 -1.730897  ... -0.262970  0.330392  0.152081 -0.025447
..   ...       ...       ...       ...  ...       ...       ...       ...       ...
844  0.0 -0.008979 -0.115355  0.106304  ... -0.154485  0.215995  1.623874 -1.313126
845  0.0 -0.008979  0.346065 -0.812296  ...  0.062484 -0.012801 -0.115518 -0.168522
846  0.0 -0.008979 -0.807486  3.091757  ...  0.170969 -0.127198  1.088676 -0.311598
847  0.0 -0.008979  0.000000  2.517631  ... -0.046000 -0.355994  0.152081 -0.168522
848  0.0 -0.008979  1.038196 -2.075372  ...  0.496422  1.131176 -0.383117  0

In [11]:
# Directly to SVM
x = attempt.iloc[:, attempt.columns!='Word'].values   # Features
y = attempt.loc[:,'Word'].values     # Target
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.6)

clf = SVC()
clf.fit(x_train, y_train)
print ('score', clf.score(x_test, y_test))
y_pred=clf.predict(x_test)
print ('pred label', y_pred)
print('length',len(clf.predict(x_test)))

# Confusion matrix
plot_confusion_matrix(clf, x_test, y_test,
                                 cmap=plt.cm.Blues)
plt.figure(figsize=(50,50))
plt.show()

#print(classification_report(y_test, y_pred, target_names=words))
directresults=classification_report(y_test, y_pred, target_names=words)

NameError: ignored

In [None]:
# Normalized to SVM
x = nmatrix.iloc[:, nmatrix.columns!='Word'].values   # Features
y = nmatrix.loc[:,'Word'].values     # Target

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.6)

clf = SVC()
clf.fit(x_train, y_train)
print ('score', clf.score(x_test, y_test))
y_pred=clf.predict(x_test)
print ('pred label', y_pred)
print('length',len(clf.predict(x_test)))

# Confusion matrix
plot_confusion_matrix(clf, x_test, y_test,
                                 cmap=plt.cm.Blues)
plt.figure(figsize=(50,50))
plt.show()

#print(classification_report(y_test, y_pred, target_names=words))
normresults=classification_report(y_test, y_pred, target_names=words)

In [None]:
# Normalized, Butterworth to SVM
x = buttermatrix.iloc[:, buttermatrix.columns!='Word'].values   # Features
y = buttermatrix.loc[:,'Word'].values     # Target

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.6)

clf = SVC()
clf.fit(x_train, y_train)
print ('score', clf.score(x_test, y_test))
y_pred=clf.predict(x_test)
print ('pred label', y_pred)
print('length',len(clf.predict(x_test)))

# Confusion matrix
plot_confusion_matrix(clf, x_test, y_test,
                                 cmap=plt.cm.Blues)
plt.figure(figsize=(50,50))
plt.show()

#print(classification_report(y_test, y_pred, target_names=words))
butterresults=classification_report(y_test, y_pred, target_names=words)

In [None]:
# PCA for norm_matrix

x = norm_matrix_eago.iloc[:, norm_matrix_eago.columns!='Word'].values   # Features
y = norm_matrix_eago.loc[:,'Word'].values     # Target
print(y)

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.6)

pca = PCA(n_components=100)
pca.fit(x_train)
x_t_train = pca.transform(x_train)
x_t_test = pca.transform(x_test)

#### plot
print("Normalized matrix")
print(pca.explained_variance_ratio_)
print(pca.singular_values_)
plt.figure()
plt.bar(fn[:100],pca.explained_variance_ratio_)
plt.show()
plt.bar(fn[:100],pca.singular_values_)
plt.show()

In [None]:
# Normalized, Butterworth, PCA to SVM
clf = SVC()
clf.fit(x_t_train, y_train)
print ('score', clf.score(x_t_test, y_test))
y_pred=clf.predict(x_t_test)
print ('pred label', y_pred)
print('length',len(clf.predict(x_t_test)))

# Confusion matrix
plot_confusion_matrix(clf, x_t_test, y_test,
                                 cmap=plt.cm.Blues)
plt.figure(figsize=(50,50))
plt.show()

#print(classification_report(y_test, y_pred, target_names=words))
pcaresults=classification_report(y_test, y_pred, target_names=words)

**Summary of results**


In [None]:
print('Directly:')
print(directresults)
print('Normalized:')
print(normresults)
print('Butterworth:')
print(butterresults)
print('PCA:')
print(pcaresults)

TODO:

-check what happens when combining diff data sources

-graph components

find correlations

-interpret pca results

-try pca per individual/word

-read dataset papers

correlation component with word

-try straight to svm

-resend email

check what is the data

-merge timeseries word index

see how stable each feature are

standard deviation timeseries

-try without filter

multi-class classifier SVM



https://www.researchgate.net/publication/303707429_Combining_Smartphone_and_Smartwatch_Sensor_Data_in_Activity_Recognition_Approaches_an_Experimental_Evaluation

References

https://stackoverflow.com/questions/56449262/how-to-upload-folders-to-google-colab

https://github.com/datarail/datarail/issues/39

https://dbader.org/blog/python-check-if-file-exists

https://datacarpentry.org/python-socialsci/11-joins/index.html

https://stackoverflow.com/questions/26414913/normalize-columns-of-pandas-data-frame

https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html

https://towardsdatascience.com/pca-using-python-scikit-learn-e653f8989e60

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html

https://www.tutorialspoint.com/matplotlib/matplotlib_bar_plot.htm

https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.signal.butter.html

https://stackoverflow.com/questions/32194967/how-to-do-pca-and-svm-for-classification-in-python

https://stackoverflow.com/questions/6822725/rolling-or-sliding-window-iterator

https://scikit-learn.org/stable/modules/cross_validation.html

https://stackoverflow.com/questions/47684606/merge-cells-with-pandas

https://scientificallysound.org/2016/08/18/python-analysing-emg-signals-part-3/

https://stackoverflow.com/questions/58374492/python-valueerror-the-length-of-the-input-vector-x-must-be-greater-than-padle

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.set_index.html

https://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html

https://statinfer.com/204-4-2-calculating-sensitivity-and-specificity-in-python/

https://www.datacamp.com/community/tutorials/pandas-multi-index

https://www.geeksforgeeks.org/how-to-get-rows-index-names-in-pandas-dataframe/

https://stackoverflow.com/questions/28140771/select-only-one-index-of-multiindex-dataframe

https://www.educative.io/edpresso/how-to-create-a-confusion-matrix-in-python-using-scikit-learn

https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html

http://www.datasciencemadesimple.com/standard-deviation-function-python-pandas-row-column/

https://stackoverflow.com/questions/39047915/concat-series-onto-dataframe-with-column-name


In [None]:
"""
### Combine all files
words=['allmorning', 'bird', 'blue', 'cantsleep', 'cat', 'coldrunnynose', 'continuouslyforanhour', 'cost', 'day', 'dollar', 'everymorning', 'everynight', 'gold', 'goodnight', 'happy', 'headache', 'home', 'horse', 'hot', 'hurt', 'itching', 'large', 'mom', 'monthly', 'notfeelgood', 'orange', 'pizza', 'please', 'shirt', 'soreness', 'swelling', 'takeliquidmedicine', 'thatsterrible', 'tired', 'upsetstomach', 'wash']
lengths=np.zeros(849, dtype=int)
reps=np.zeros(36,dtype=int)
wordnum=-1
counter=0

for w in words:
  repcount=0
  wordnum+=1

  for i in range (10, 120):
    path='/content/2MyoASL/' + w + '_' + str(i) + '.csv'

    if os.path.exists(path)==True:
      counter+=1
      repcount+=1
      trial=pd.read_csv(path)
      lengths[counter-1]=len(trial)
      
      # Replace counter with evaluated term
      for r in range(0,len(trial)):
        trial.iloc[r,0]=wordnum
        
      # Combine all trials
      if path=='/content/2MyoASL/allmorning_10.csv':
          matrix=trial
          
          # Create plot for EMG

          # Create plot for accelerometer
          plt.figure()
          alx=plt.axes(projection='3d')
          alx.plot3D(trial.loc[:, 'AXL'].values, trial.loc[:, 'AYL'].values, trial.loc[:, 'AZL'].values)
          alx.set_title('Accelerometer on left arm')

          plt.figure()
          arx=plt.axes(projection='3d')
          arx.plot3D(trial.loc[:, 'AXR'].values, trial.loc[:, 'AYR'].values, trial.loc[:, 'AZR'].values)
          arx.set_title('Accelerometer on right arm')

          # Create plot for gyroscope
          plt.figure()
          glx=plt.axes(projection='3d')
          glx.plot3D(trial.loc[:, 'GXL'].values, trial.loc[:, 'GYL'].values, trial.loc[:, 'GZL'].values)
          glx.set_title('Gyroscope on left arm')

          plt.figure()
          grx=plt.axes(projection='3d')
          grx.plot3D(trial.loc[:, 'GXR'].values, trial.loc[:, 'GYR'].values, trial.loc[:, 'GZR'].values)
          grx.set_title('Gyroscope on right arm')
      else:
          matrix=pd.concat([matrix, trial])

          ###############################################
          # Create plot for accelerometer
          alx.plot3D(trial.loc[:, 'AXL'].values, trial.loc[:, 'AYL'].values, trial.loc[:, 'AZL'].values)
          arx.plot3D(trial.loc[:, 'AXR'].values, trial.loc[:, 'AYR'].values, trial.loc[:, 'AZR'].values)
          #plt.show()

          # Create plot for gyroscope
          glx.plot3D(trial.loc[:, 'GXL'].values, trial.loc[:, 'GYL'].values, trial.loc[:, 'GZL'].values)
          grx.plot3D(trial.loc[:, 'GXR'].values, trial.loc[:, 'GYR'].values, trial.loc[:, 'GZR'].values)
          ###############################################

  reps[wordnum]=repcount
plt.show()
""" 

In [None]:
"""
# PCA for normmatrix
pca = PCA(n_components=35)
comp=pca.fit_transform(norm_matrix_eago)
principal=pd.DataFrame(data=comp, columns=['PC 0', 'PC 1', 'PC 2', 'PC 3', 'PC 4', 'PC 5', 'PC 6', 'PC 7', 'PC 8', 'PC 9', 'PC 10', 
                                           'PC 11', 'PC 12', 'PC 13', 'PC 14', 'PC 16', 'PC 17', 'PC 18', 'PC 19', 'PC 20', 
                                           'PC 21', 'PC 22', 'PC 23', 'PC 24', 'PC 25', 'PC 26', 'PC 27', 'PC 28', 'PC 29', 
                                           'PC 30', 'PC 31', 'PC 32', 'PC 33', 'PC 34', 'PC35'])
#principal.reset_index(drop=True, inplace=True)
#norm_matrix_eago[['Counter']].reset_index(drop=True, inplace=True)
norm_matrix_eago.reset_index(drop=True, inplace=True)
m.reset_index(drop=True, inplace=True)
#finaldf=pd.join([principal, norm_matrix_eago[['Counter']]], axis=1, ignore_index=True).reset_index()
finaldf=principal.join(m[['Counter']],how='outer')
finaldf=finaldf.drop(columns=['PC 0'])
####finaldf=principal
####print(m['Counter'])
####finaldf['Word']=m['Counter']
#finaldf=pd.concat([principal,norm_matrix_eago[['Counter']]], axis=1, ignore_index=True)

print("Normalized matrix")
print(pca.explained_variance_ratio_)
print(pca.singular_values_)
plt.figure()
plt.bar(fn,pca.explained_variance_ratio_)
plt.show()
plt.bar(fn,pca.singular_values_)
plt.show()
print(finaldf)
#print(principal.join(norm_matrix_eago[['Counter']],how='inner'))
"""
"""
# PCA for normalizedmatrix
pca = PCA(n_components=34)
pca.fit(normalized_matrix)
print("Normalized features")
print(pca.explained_variance_ratio_)
print(pca.singular_values_)
plt.figure()
plt.bar(fn[0:34],pca.explained_variance_ratio_)
plt.show()
plt.bar(fn[0:34],pca.singular_values_)
plt.show()

# PCA for standardizedmatrix
pca = PCA(n_components=34)
pca.fit(standardized_matrix)
print("Standardized features")
print(pca.explained_variance_ratio_)
print(pca.singular_values_)
plt.figure()
plt.bar(fn[0:34],pca.explained_variance_ratio_)
plt.show()
plt.bar(fn[0:34],pca.singular_values_)
plt.show()
"""