
*Credit: This activity was prepared by Taher Ahmadi (former Rosie Lab M.Sc.) and adapted by Dr. Lim*

## Pre-Processing Functions

The code below can be used to perform pre-processing on your gesture data from the Week 3 Activity. While you can analyze your raw gesture data, some pre-processing (or "cleaning") of the data can help you obtain better results.

Try the different pre-processing methods and see how each one affects your results.
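
As a minimal sketch of what two of these pre-processing options do, the example below applies linear interpolation and min-max normalization to a small made-up signal. The column name `x` and its values are invented for illustration and are not taken from the gesture dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy signal with one missing sample (not real gesture data)
toy = pd.DataFrame({"x": [2.0, 4.0, np.nan, 8.0, 10.0]})

# Linear interpolation fills the gap from its neighbours (NaN becomes 6.0)
filled = toy["x"].interpolate(method="linear")

# Min-max normalization rescales the values to the range [0, 1]
normalized = (filled - filled.min()) / (filled.max() - filled.min())

print(normalized.tolist())  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

Normalizing this way removes differences in amplitude between recordings, so a later comparison depends more on the shape of the signal than on its scale.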

In [None]:
import os
import pandas as pd
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats
from scipy import signal

def date_time_to_elapsed_time(result):
    """Convert the 'time' column from timestamps to seconds elapsed since the first sample."""
    result['time'] = pd.to_datetime(result['time'])
    # Subtracting the first timestamp makes the first row exactly 0 seconds
    result['time'] = (result['time'] - result['time'].iloc[0]).dt.total_seconds()
    return result

def preprocess(df):
    result = df.copy()

    ## converting date_time column to float time elapsed
    if result['time'].dtype == object:
        result = date_time_to_elapsed_time(result)

    ## Options to try as pre-processing: Use alt+/ to block un-comment / comment
    ## Note: this loop also visits the 'time' column
    for feature_name in df.columns:
        ## min-max normalization (rescales each column to the range [0, 1])
        # max_value = result[feature_name].max()
        # min_value = result[feature_name].min()
        # result[feature_name] = (result[feature_name] - min_value) / (max_value - min_value)

        ## blanking the first and last 20% of the signal
        ## (assigning the slice back leaves NaN in the cropped rows; drop them below)
        # cropping_percentage = int(len(result[feature_name])*(2/10))
        # result[feature_name] = result[feature_name][cropping_percentage:-cropping_percentage]

        ## interpolate missing values
        # result[feature_name] = result[feature_name].interpolate(method='linear')
        pass

    ## or simply drop rows with missing values
    # result = result.dropna()

    ## re-sampling
    ## if you want to compare signals of different length you can do this step
    # resampled =  signal.resample(result, 100)
    # result = pd.DataFrame(resampled, columns=result.columns)

    ## setting 'time' column as index
    result.set_index('time', inplace=True)

    return result

## Loading the Dataset

Download `gesture_dataset.zip` from Canvas and extract the data folder into the current directory. The dataset contains the class's gestures, organized as follows:

./turtle

./woof

... etc.

Each person's folder contains their generated .csv files (knock_a.csv, knock_b.csv, ..., wave_a.csv, wave_b.csv, ...).

*Note: Not all students' gesture data was included in the dataset.*
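
Because some recordings may be absent, it can help to check which of the expected files each folder actually contains before loading. The sketch below assumes the file naming used in this dataset (`knock_a.csv` through `wave_d.csv`); the helper name `missing_files` is made up for this example.

```python
from pathlib import Path

# Expected recordings per folder, following the dataset's naming scheme
EXPECTED = [f"{gesture}_{clip}.csv"
            for gesture in ("knock", "wave")
            for clip in "abcd"]

def missing_files(folder):
    """Return the expected CSV names that are absent from `folder`."""
    folder = Path(folder)
    return [name for name in EXPECTED if not (folder / name).is_file()]

# Report every sub-folder of the current directory that lacks a recording
for folder in sorted(p for p in Path(".").iterdir() if p.is_dir()):
    missing = missing_files(folder)
    if missing:
        print(folder.name, "is missing", missing)
```

Folders that are not part of the dataset will also show up in this report, which makes them easy to spot and exclude.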

In [None]:
## to unzip the uploaded gesture_dataset.zip in the Colab directory, use the following command:
!unzip gesture_dataset.zip

In [None]:
from glob import glob
# Download gesture_dataset.zip from Canvas and extract the data folder into the current directory
# The file hierarchy is as follows:
#  ./turtle
#  ./woof
# ...
# Each student's folder contains recording files named as follows:
# ['knock_a.csv', 'knock_b.csv', 'knock_c.csv', 'knock_d.csv',
#  'wave_a.csv', 'wave_b.csv', 'wave_c.csv', 'wave_d.csv']

# Sorted so that each index i refers to the same folder on every run.
# Note: this matches every sub-folder, including ones that are not part of the dataset.
records = sorted(glob("*/"))

# The following code reads the data into a list with one entry per person;
# each entry holds that person's knock and wave recordings as pandas dataframes
df = []
for i in range(len(records)):
    knocks = {}
    waves = {}
    print(i, records[i])
    for ch in ['a', 'b', 'c', 'd']:
        try:
            knocks[ch] = preprocess(pd.read_csv(records[i]+'knock_'+ch+'.csv'))
            waves[ch] = preprocess(pd.read_csv(records[i]+'wave_'+ch+'.csv'))
        except Exception as err:
            # Report which clip failed and why, instead of hiding the error
            print("Failed to read clip", ch, "in folder", records[i], "-", err)
    df.append({'knock':knocks,
               'wave': waves})

## Visualizing the Data

Explore the data by changing `i` to select a different person's folder.

In [None]:
fig, axes = plt.subplots(nrows=4, ncols=2, figsize=(18, 16), dpi=80)
plt.subplots_adjust(wspace=0.1, hspace=0.5)

# Plotting the data record at index 8 as an example
i = 8
print(records[i])

# df[i] is the ith person's data; knocks go in the left column, waves in the right
for col, gesture in enumerate(['knock', 'wave']):
    for row, ch in enumerate(['a', 'b', 'c', 'd']):
        df[i][gesture][ch].plot(ax=axes[row, col])
        axes[row, col].set_title(gesture + ' ' + ch.upper())

## Dynamic Time Warping

Now that you've loaded the data into a dataframe, compute the DTW distance between various data samples. Use this code to fill in the confusion matrix as described in the Canvas activity.
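
Dynamic time warping (DTW) compares two sequences that may differ in speed by finding the alignment that minimizes the total point-to-point cost. To make the idea concrete, here is a minimal from-scratch sketch for two 1-D sequences. It is not how the `dtw` package is implemented internally; it only illustrates the standard recurrence, in which each cell of the accumulated cost matrix adds its local distance to the cheapest of its three predecessors. The function name `dtw_distance` and the toy sequences are invented for this example.

```python
import math

def dtw_distance(a, b):
    """DTW distance between two 1-D sequences, using |x - y| as the local cost."""
    n, m = len(a), len(b)
    # acc[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j],      # advance in a only
                                   acc[i][j - 1],      # advance in b only
                                   acc[i - 1][j - 1])  # advance in both
    return acc[n][m]

slow = [0, 0, 1, 1, 2, 2, 1, 1, 0, 0]   # same shape, performed at half speed
fast = [0, 1, 2, 1, 0]
shifted = [1, 2, 3, 2, 1]               # same speed, different values

print(dtw_distance(slow, fast))     # 0.0
print(dtw_distance(fast, shifted))  # 3.0
```

The first pair has distance zero even though the sequences have different lengths, because warping absorbs the difference in speed. The second pair differs in value, which warping cannot fully remove.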

In [None]:
## you may need to run the following command (by uncommenting it):
#!pip install dtw

from dtw import dtw,accelerated_dtw

# compute the DTW distance for the first person (df[0]) knocking with audio clips A and B
d1 = df[0]['knock']['a'].interpolate().values
d2 = df[0]['knock']['b'].interpolate().values
d, cost_matrix, acc_cost_matrix, path = accelerated_dtw(d1,d2, dist='euclidean')

plt.imshow(acc_cost_matrix.T, origin='lower', cmap='gray', interpolation='nearest')
plt.plot(path[0], path[1], 'w')
plt.xlabel('person1 knock with clip A')
plt.ylabel('person1 knock with clip B')
plt.title(f'DTW Minimum Path with minimum distance: {np.round(d,2)}')
plt.show()

In [None]:
list1 = [(i, j) for i in ['knock', 'wave'] for j in ['a','b','c','d']]
list2 = [(i, j) for i in ['knock', 'wave'] for j in ['a','b','c','d']]

# compare DTW distances across every pair of gesture/clip recordings from one person
# you can also compare two different people's data here, but be careful with the lengths of their recordings
for (signal1, signal2) in [(i, j) for i in list1 for j in list2]:
    print(signal1, signal2)
    d1 = df[0][signal1[0]][signal1[1]].interpolate().values
    d2 = df[0][signal2[0]][signal2[1]].interpolate().values
    d, cost_matrix, acc_cost_matrix, path = accelerated_dtw(d1,d2, dist='euclidean')

    plt.imshow(acc_cost_matrix.T, origin='lower', cmap='gray', interpolation='nearest')
    plt.plot(path[0], path[1], 'w')
    plt.xlabel('person1 '+str(signal1))
    plt.ylabel('person1 '+str(signal2))
    plt.title(f'DTW Minimum Path with minimum distance: {np.round(d,2)}')
    plt.show()
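
The loop above produces one plot per pair, which is hard to compare by eye. One way to summarize the results is to collect the distances into a table. The sketch below assumes a dictionary that maps labels to signals and a distance function of your choice; `pairwise_table` and the stand-in `toy_distance` are made-up names. With the `dtw` package, you could pass `lambda a, b: accelerated_dtw(a, b, dist='euclidean')[0]` as the distance function instead.

```python
def pairwise_table(signals, distance_fn):
    """Return {row_label: {col_label: distance}} for every ordered pair of signals."""
    return {row: {col: distance_fn(signals[row], signals[col]) for col in signals}
            for row in signals}

# Stand-in distance for illustration only: absolute difference of the signal means
def toy_distance(a, b):
    return abs(sum(a) / len(a) - sum(b) / len(b))

# Hypothetical toy signals keyed by (gesture, clip), not real gesture data
toy_signals = {
    ("knock", "a"): [0.0, 1.0, 0.0, 1.0],
    ("knock", "b"): [0.0, 1.0, 1.0, 0.0],
    ("wave", "a"): [2.0, 3.0, 2.0, 3.0],
}

table = pairwise_table(toy_signals, toy_distance)
for row, cols in table.items():
    print(row, [round(v, 2) for v in cols.values()])
```

Small values between recordings of the same gesture and large values between different gestures suggest that the distance separates knocks from waves.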
