<a href="https://colab.research.google.com/github/rpandya5/gaitanalysis/blob/main/Window_Sampling_and_Feature_Extraction_%2B_CNN_Layers.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#### Data Processing and Feature Extraction

The following processing is applied to each data sample CSV file:

1. Window Sampling: Each 15-second sample (3000 readings at 200 Hz) is split into 2-second windows (400 samples @ 200 Hz) with a 1-second overlap (200 samples @ 200 Hz), which gives 14 windows per sample. These sizes were chosen based on previous work so that each window can capture temporal dependencies. (done by *window_sample_main()*)

Note: Since different activities have different durations, we use the following techniques to bring every sample to 15 seconds before window sampling:

  - Fall Activities: Fall samples are not otherwise processed; their 15-second length is the benchmark for all samples. Some fall recordings are one reading short (2999 instead of 3000 samples), so we repeat the last reading at the end to keep the sample size consistent. This does not affect the data, as most falls occur in the first 10 seconds and the last 5 seconds are mostly irrelevant (see the EDA for more information). (done by *window_sample_fall()*)

  - ADLs (1-4): These are activities such as Jogging and Walking. Since these are continuous activities (see the EDA), we can split the 100-second recordings into five 15-second samples (taken from the first 75 seconds) and process them separately. This also increases the number of such samples, as each subject does only one trial for these activities.
  (done by *window_sample_adl_3()*)
  - ADLs (5-6): These activities are 25 seconds long and consist of going up and down the stairs. To capture the two movements separately, we take the first 15 seconds as one sample and the last 15 seconds as the other, so each 25-second recording yields two 15-second samples that overlap by 5 seconds in the middle. Since the switch between the two movements occurs near the halfway mark for most recordings, the data is left unchanged; the overlapping portion acts as noise, which we expect to add to the model's robustness.
  (done by *window_sample_adl_2()*)

  - ADLs (7-19): These are the other ADL activities, each 12 seconds long. Most of the relevant activity occurs in the first 10 seconds (see the EDA), and the last few seconds only contain irrelevant data where the person is still. We therefore extend these samples to 15 seconds and use forward fill to replace the NaN values in the added rows. This adds no new information that could change the class/activity type; it only lengthens the irrelevant tail to match the 15-second length. (done by *window_sample_adl_1()*)

  2. Feature Extraction: Since feeding the raw 200 Hz sensor data to the model is impractical for us, we use window sampling to downsample the data. To avoid losing substantial information, we extract time-domain statistical features from each window. This gives the model a compact summary of every window without requiring significant computational resources. The features are based on those used in the UCI HAR Dataset (https://archive.ics.uci.edu/dataset/240/human+activity+recognition+using+smartphones):

    1. Mean- Computes the average of the window
    2. Median- Computes the middle value of the window data (implemented as *getmedian()*, but not part of the saved feature vector)
    3. Max- Computes the max value in the window
    4. Min- Computes the min value in the window
    5. Standard Deviation- Measures the amount of variation in the window data
    6. IQR- Gives the spread of the middle 50% of the data in the window
    7. Skewness- Measures the asymmetry of the distribution of the window data
    8. Jerk- Gives the average jerk (rate of change) of the data in the window
    9. Kurtosis- Measures the tailedness of the distribution of the window data
    10. Energy- Gives the mean of the squared values of the data in the window
    11. Covariance- Calculates the covariance between the x and y data in the window

  Note: We extract the above features for each of the x, y, and z components of both the accelerometer and gyroscope data (the covariance is computed once per sensor, between x and y), giving 56 features per window.

  3. The last step runs every file in the "Preprocessed Data" folder through the window sampling above and saves the results in the "Processed Data" folder on Google Drive. This folder contains one CSV file per sample (with the original naming convention preserved) holding the windowed data and extracted features. The step also returns two variables, data (a list of all DataFrames) and labels (1 for Fall and 0 for ADL), for further reshaping and normalization as needed.

  **Please Note:** We have used Sensor 1 (ADXL345) for training
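
The window layout described in step 1 can be sketched as follows. This is a minimal illustration, assuming a 3000-reading sample; the helper name `window_starts` is hypothetical and not part of the notebook.

```python
def window_starts(n_samples=3000, window_size=400, overlap=200):
    """Start index of every full window (hypothetical helper for illustration)."""
    step = window_size - overlap  # consecutive windows start 200 samples apart
    return list(range(0, n_samples - window_size + 1, step))

starts = window_starts()
print(len(starts))             # 14 windows per 15 second sample
print(starts[0], starts[-1])   # 0 2600
```

The last window starts at reading 2600 and ends at reading 3000, so every reading falls in at least one window.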

In [None]:
#Mounting google drive to access the preprocessed dataset
from google.colab import drive

drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [None]:
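#Creating the required sub-directories on the mounted drive
import os
os.chdir('/content/gdrive/MyDrive')
#exist_ok=True lets this cell be re-run without raising FileExistsError
os.makedirs('Processed Data/FALL', exist_ok=True)
os.makedirs('Processed Data/NO FALL', exist_ok=True)
os.chdir('Processed Data')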

In [None]:
#Importing libraries
import os
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.signal import butter, filtfilt #Needed by butter_lowpass_filter

class DataProcessing:
  def __init__(self, input_path, window_size = 400, window_overlap = 200):
    self.input_path = input_path #Input path of the Preprocessed data folder

    #Features we extract from the windows
    self.features = ['Window', 'X_Acc_Mean', 'X_Acc_Max', 'X_Acc_Std', 'X_Acc_Min',
                     'X_Acc_IQR', 'X_Acc_Skew', 'X_Acc_Kurtosis', 'X_Acc_Energy', 'X_Acc_Jerk',
                     'Y_Acc_Mean', 'Y_Acc_Max', 'Y_Acc_Std', 'Y_Acc_Min',
                     'Y_Acc_IQR', 'Y_Acc_Skew', 'Y_Acc_Kurtosis', 'Y_Acc_Energy', 'Y_Acc_Jerk',
                     'Z_Acc_Mean', 'Z_Acc_Max', 'Z_Acc_Std', 'Z_Acc_Min',
                     'Z_Acc_IQR', 'Z_Acc_Skew', 'Z_Acc_Kurtosis', 'Z_Acc_Energy', 'Z_Acc_Jerk',
                     'XY_Acc_Cov', 'X_Gyro_Mean', 'X_Gyro_Max', 'X_Gyro_Std', 'X_Gyro_Min',
                     'X_Gyro_IQR', 'X_Gyro_Skew', 'X_Gyro_Kurtosis', 'X_Gyro_Energy', 'X_Gyro_Jerk',
                     'Y_Gyro_Mean', 'Y_Gyro_Max', 'Y_Gyro_Std', 'Y_Gyro_Min',
                     'Y_Gyro_IQR', 'Y_Gyro_Skew', 'Y_Gyro_Kurtosis', 'Y_Gyro_Energy', 'Y_Gyro_Jerk',
                     'Z_Gyro_Mean', 'Z_Gyro_Max', 'Z_Gyro_Std', 'Z_Gyro_Min',
                     'Z_Gyro_IQR', 'Z_Gyro_Skew', 'Z_Gyro_Kurtosis', 'Z_Gyro_Energy', 'Z_Gyro_Jerk',
                     'XY_Gyro_Cov']

    self.window_size = window_size # default of 400 samples is 2 seconds at 200 Hz
    self.window_overlap = window_overlap # default of 200 samples is 1 second at 200 Hz

  #Feature Extraction Done by Anya and Nick
  def get_cov(self, x_list_1, x_bar ,y_list, y_bar):
    """ Calculates the sample covariance of x and y in the selected window.

    Parameters:
    -----------
    x_list_1 : list
    x_bar : float
        Average of x list
    y_list : list
    y_bar : float
        Average of y list

    Returns:
    --------
    covariance of x and y : float

    """
    cov = []
    for i in range(len(x_list_1)):
        cov.append((x_list_1[i] - x_bar) * (y_list[i] - y_bar))
    #Sample covariance: divide the summed products by (n - 1)
    return sum(cov) / (len(x_list_1) - 1)

  def jerk(self, x_list, t_step):
    """ Calculates the mean jerk, i.e. the average rate of change of the signal.

    Parameters:
    -----------
    x_list : list
    t_step : float
        Time between consecutive readings, in seconds

    Returns:
    --------
    mean jerk over the window : float

    """
    jerk_list = []
    for i in range(len(x_list)-1):
        jerk_list.append((x_list[i+1] - x_list[i]) / t_step)

    jerk_list.append(jerk_list[-1])
    return sum(jerk_list) / len(jerk_list)

  def energy(self, x_list):
    """ Averaged energy of the corresponding window

    Parameters:
    -----------
    x_list : list

    Returns:
    --------
    energy : float

    """
    total = 0 #avoid shadowing the built-in sum()
    for i in range(len(x_list)):
        total += x_list[i]**2
    return total / len(x_list)

  #Not sure if we need the Butterworth filter, so its calls are commented out in the functions below
  #Butter Low Pass Filter to remove high frequency noise
  def butter_lowpass_filter(self, data, cutoff, fs, order=5):
    nyq = 0.5 * fs
    normal_cutoff = cutoff / nyq
    b, a = butter(order, normal_cutoff, btype='low', analog=False)
    y = filtfilt(b, a, data)
    return y

  #Returns the mean of the specified column
  def getmean(self,df, col):
    """if col in (0,1,2,6,7,8):
      x = butter_lowpass_filter(df[col], 10, 100)
    else:"""
    x = df[col]
    return x.mean()

  #Returns Standard Deviation of specified column
  def getstd(self,df, col):
    """if col in (0,1,2,6,7,8):
      x = butter_lowpass_filter(df[col], 10, 100)
    else:"""
    x = df[col]
    return x.std()

  #Returns the minimum of the specific column
  def getmin(self, df, col):
    """if col in (0,1,2,6,7,8):
      x = butter_lowpass_filter(df[col], 10, 100)
    else:"""
    x = df[col]
    return x.min()

  #Returns the maximum of the specific column
  def getmax(self, df, col):
    """if col in (0,1,2,6,7,8):
      x = butter_lowpass_filter(df[col], 10, 100)
    else:"""
    x = df[col]
    return x.max()

  #Returns the median of the specific column
  def getmedian(self, df, col):
    """if col in (0,1,2,6,7,8):
      x = butter_lowpass_filter(df[col], 10, 100)
    else:"""
    x = df[col]
    return x.median()

  #Returns the Skewness of the specific column
  def getskewness(self, df, col):
    """if col in (0,1,2,6,7,8):
      x = butter_lowpass_filter(df[col], 10, 100)
    else:"""
    x = df[col]
    return x.skew()

  #Returns the Kurtosis of the specific column
  def getkurtosis(self, df, col):
    """ if col in (0,1,2,6,7,8):
      x = butter_lowpass_filter(df[col], 10, 100)
    else:"""
    x = df[col]
    return x.kurtosis()

  #Returns Inter-quartile range of specified column
  def getiqr(self, df, col):
    """ if col in (0,1,2,6,7,8):
      x = butter_lowpass_filter(df[col], 10, 100)
    else:"""
    x = df[col]
    return x.quantile(0.75) - x.quantile(0.25)

  #Window Sampling done by Krish
  def window_sample_main(self, file_df):
    """
    Description: Converts sensor readings of 15 seconds (200 Hz) into windows

    Inputs:   file_df- DataFrame with the sensor data of one 15 second sample

    Outputs:  DataFrame with the windows and features extracted
    """

    #Window size for activities is 2 seconds with a one second overlap
    window_x = 1
    new_df = []
    for i in range(0, 3000 - self.window_overlap, self.window_overlap):
      window = file_df.iloc[i:i+self.window_size,:]
      sampled = []

      #Calling all functions for the specific window
      sampled.append(window_x)
      sampled.append(self.getmean(window, 'X_Acc_1'))
      sampled.append(self.getmax(window, 'X_Acc_1'))
      sampled.append(self.getstd(window, 'X_Acc_1'))
      sampled.append(self.getmin(window, 'X_Acc_1'))
      sampled.append(self.getiqr(window, 'X_Acc_1'))
      sampled.append(self.getskewness(window, 'X_Acc_1'))
      sampled.append(self.getkurtosis(window, 'X_Acc_1'))
      sampled.append(self.energy(window['X_Acc_1'].tolist()))
      sampled.append(self.jerk(window['X_Acc_1'].tolist(), 0.005))

      sampled.append(self.getmean(window, 'Y_Acc_1'))
      sampled.append(self.getmax(window, 'Y_Acc_1'))
      sampled.append(self.getstd(window, 'Y_Acc_1'))
      sampled.append(self.getmin(window, 'Y_Acc_1'))
      sampled.append(self.getiqr(window, 'Y_Acc_1'))
      sampled.append(self.getskewness(window, 'Y_Acc_1'))
      sampled.append(self.getkurtosis(window, 'Y_Acc_1'))
      sampled.append(self.energy(window['Y_Acc_1'].tolist()))
      sampled.append(self.jerk(window['Y_Acc_1'].tolist(), 0.005))

      sampled.append(self.getmean(window, 'Z_Acc_1'))
      sampled.append(self.getmax(window, 'Z_Acc_1'))
      sampled.append(self.getstd(window, 'Z_Acc_1'))
      sampled.append(self.getmin(window, 'Z_Acc_1'))
      sampled.append(self.getiqr(window, 'Z_Acc_1'))
      sampled.append(self.getskewness(window, 'Z_Acc_1'))
      sampled.append(self.getkurtosis(window, 'Z_Acc_1'))
      sampled.append(self.energy(window['Z_Acc_1'].tolist()))
      sampled.append(self.jerk(window['Z_Acc_1'].tolist(), 0.005))
      sampled.append(self.get_cov(window['X_Acc_1'].tolist(), self.getmean(window, 'X_Acc_1'), window['Y_Acc_1'].tolist(), self.getmean(window, 'Y_Acc_1'), ))


      sampled.append(self.getmean(window, 'X_Gyro'))
      sampled.append(self.getmax(window, 'X_Gyro'))
      sampled.append(self.getstd(window, 'X_Gyro'))
      sampled.append(self.getmin(window, 'X_Gyro'))
      sampled.append(self.getiqr(window, 'X_Gyro'))
      sampled.append(self.getskewness(window, 'X_Gyro'))
      sampled.append(self.getkurtosis(window, 'X_Gyro'))
      sampled.append(self.energy(window['X_Gyro'].tolist()))
      sampled.append(self.jerk(window['X_Gyro'].tolist(), 0.005))

      sampled.append(self.getmean(window, 'Y_Gyro'))
      sampled.append(self.getmax(window, 'Y_Gyro'))
      sampled.append(self.getstd(window, 'Y_Gyro'))
      sampled.append(self.getmin(window, 'Y_Gyro'))
      sampled.append(self.getiqr(window, 'Y_Gyro'))
      sampled.append(self.getskewness(window, 'Y_Gyro'))
      sampled.append(self.getkurtosis(window, 'Y_Gyro'))
      sampled.append(self.energy(window['Y_Gyro'].tolist()))
      sampled.append(self.jerk(window['Y_Gyro'].tolist(), 0.005))

      sampled.append(self.getmean(window, 'Z_Gyro'))
      sampled.append(self.getmax(window, 'Z_Gyro'))
      sampled.append(self.getstd(window, 'Z_Gyro'))
      sampled.append(self.getmin(window, 'Z_Gyro'))
      sampled.append(self.getiqr(window, 'Z_Gyro'))
      sampled.append(self.getskewness(window, 'Z_Gyro'))
      sampled.append(self.getkurtosis(window, 'Z_Gyro'))
      sampled.append(self.energy(window['Z_Gyro'].tolist()))
      sampled.append(self.jerk(window['Z_Gyro'].tolist(), 0.005))
      sampled.append(self.get_cov(window['X_Gyro'].tolist(), self.getmean(window, 'X_Gyro'), window['Y_Gyro'].tolist(), self.getmean(window, 'Y_Gyro'), ))

      new_df.append(sampled)
      window_x += 1

    #Returns the created DataFrame
    return pd.DataFrame(new_df, columns=self.features)

  def window_sample_fall(self, dir, file):
    """
    Processes files containing fall data (adjusts file size as needed for
    fall samples specifically and passes to main window sampler)

    Inputs:  dir- path to the directory where the file is stored
             file- Name of the file to be processed

    Outputs: DataFrame with windows and features extracted
    """

    file_df = pd.read_csv(os.path.join(dir, file))
    new_df = []

    #Adjusting the file sizes for some cases by adding a new row or deleting the last row
    timelen = 3000
    if len(file_df) - timelen == 1:
      file_df = file_df.drop(file_df.index[-1]) #drop() returns a new DataFrame, so reassign it
    elif timelen - len(file_df) == 1:
      file_df.loc[len(file_df)] = file_df.iloc[-1]
      file_df.iloc[-1, 0] += 0.005
    return self.window_sample_main(file_df)

  def window_sample_adl_1(self, dir, file):
    """
    Processes files containing ADL data (adjusts file size as needed for
    ADL samples D07 to D19 specifically and passes to main window sampler)

    Inputs:  dir- path to the directory where the file is stored
             file- Name of the file to be processed

    Outputs: DataFrame with windows and features extracted
    """

    file_df = pd.read_csv(os.path.join(dir, file))
    new_df = []

    #Increasing the file_length using forward fill
    timelen = 3000
    new_rows = pd.DataFrame(np.nan, index=range(0, timelen-len(file_df)), columns=file_df.columns)
    new_rows['Timestamp'] = [(file_df.loc[len(file_df)-1, 'Timestamp'] + 0.005 * i) for i in range(1, timelen-len(file_df)+1)]
    file_df = pd.concat([file_df, new_rows], ignore_index=True)
    file_df.ffill(inplace=True)
    return self.window_sample_main(file_df)

  def window_sample_adl_2(self, dir, file):
    """
    Processes files containing ADL data (adjusts file size as needed for
    ADL samples D05 and D06 specifically and passes to main window sampler)

    Inputs:  dir- path to the directory where the file is stored
             file- Name of the file to be processed

    Outputs: DataFrame with windows and features extracted
    """
    file_df = pd.read_csv(os.path.join(dir, file))

    #Adjusting the file sizes for some cases by adding a new row or deleting the last row
    timelen = 5000
    if len(file_df) - timelen == 1:
      file_df = file_df.drop(file_df.index[-1]) #drop() returns a new DataFrame, so reassign it
    elif timelen - len(file_df) == 1:
      file_df.loc[len(file_df)] = file_df.iloc[-1]
      file_df.iloc[-1, 0] += 0.005

    #Creating 2 samples of the first 15 seconds for first window (to capture going upstairs) and last 15 seconds for second window (to capture going downstairs)
    sample_1 = file_df.iloc[:3000,:]
    sample_2 = file_df.iloc[-3000:,:]

    return self.window_sample_main(sample_1), self.window_sample_main(sample_2)

  def window_sample_adl_3(self, dir, file):
    """
    Processes files containing ADL data (adjusts file size as needed for
    ADL samples D01 to D04 specifically and passes to main window sampler)

    Inputs:  dir- path to the directory where the file is stored
             file- Name of the file to be processed

    Outputs: DataFrame with windows and features extracted
    """

    #Splits the continuous activities into 5 samples of 15 seconds each
    file_df = pd.read_csv(os.path.join(dir, file))
    sample_1 = file_df.iloc[:3000,:]
    sample_2 = file_df.iloc[3000:6000,:]
    sample_3 = file_df.iloc[6000:9000,:]
    sample_4 = file_df.iloc[9000:12000,:]
    sample_5 = file_df.iloc[12000:15000,:]

    sample_1 = self.window_sample_main(sample_1)
    sample_2 = self.window_sample_main(sample_2)
    sample_3 = self.window_sample_main(sample_3)
    sample_4 = self.window_sample_main(sample_4)
    sample_5 = self.window_sample_main(sample_5)

    return sample_1, sample_2, sample_3, sample_4, sample_5

  #Creates window samples for each of the files and saves it in a new directory
  #Also returns the list of dataframes and labels to work with
  #Labels are 1 for Fall and 0 for No Fall
  def window_sampling_all_files(self):
    labels = []
    data = []
    for dir, _, files in os.walk(self.input_path):
      for file in files:
        print(file)
        if file[0] == 'F':
          labels.append(1)
          file_df = self.window_sample_fall(dir, file)
          data.append(file_df)
          file_df.to_csv(os.path.join('/content/gdrive/MyDrive/Processed Data/FALL', file))
        elif file[:3] == 'D01' or file[:3] == 'D02' or file[:3] == 'D03' or file[:3] == 'D04':
          file_df1, file_df2, file_df3, file_df4, file_df5 = self.window_sample_adl_3(dir, file)
          labels.append(0)
          labels.append(0)
          labels.append(0)
          labels.append(0)
          labels.append(0)
          data.append(file_df1)
          data.append(file_df2)
          data.append(file_df3)
          data.append(file_df4)
          data.append(file_df5)
          file_df1.to_csv(os.path.join('/content/gdrive/MyDrive/Processed Data/NO FALL', file[:-4]+'[1]'+'.csv'))
          file_df2.to_csv(os.path.join('/content/gdrive/MyDrive/Processed Data/NO FALL', file[:-4]+'[2]'+'.csv'))
          file_df3.to_csv(os.path.join('/content/gdrive/MyDrive/Processed Data/NO FALL', file[:-4]+'[3]'+'.csv'))
          file_df4.to_csv(os.path.join('/content/gdrive/MyDrive/Processed Data/NO FALL', file[:-4]+'[4]'+'.csv'))
          file_df5.to_csv(os.path.join('/content/gdrive/MyDrive/Processed Data/NO FALL', file[:-4]+'[5]'+'.csv'))
        elif file[:3] == 'D05' or file[:3] == 'D06':
          file_df1, file_df2 = self.window_sample_adl_2(dir, file)
          labels.append(0)
          labels.append(0)
          data.append(file_df1)
          data.append(file_df2)
          file_df1.to_csv(os.path.join('/content/gdrive/MyDrive/Processed Data/NO FALL', file[:-4]+'[1]'+'.csv'))
          file_df2.to_csv(os.path.join('/content/gdrive/MyDrive/Processed Data/NO FALL', file[:-4]+'[2]'+'.csv'))
        else:
          labels.append(0)
          file_df = self.window_sample_adl_1(dir, file)
          data.append(file_df)
          file_df.to_csv(os.path.join('/content/gdrive/MyDrive/Processed Data/NO FALL', file))
    return data, labels
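
As a sanity check on the hand-written feature functions, the same quantities can be computed with NumPy. The sketch below uses made-up toy values (not dataset readings) and assumes that the sample covariance, which divides by n - 1, is what `get_cov` is meant to return.

```python
import numpy as np

# Toy window (made-up values, not taken from the dataset)
x = np.array([1.0, 2.0, 4.0, 7.0])
y = np.array([2.0, 1.0, 5.0, 6.0])
t_step = 0.005  # 200 Hz sampling interval

# Sample covariance: sum of products of deviations divided by (n - 1)
manual_cov = ((x - x.mean()) * (y - y.mean())).sum() / (len(x) - 1)
numpy_cov = np.cov(x, y)[0, 1]  # np.cov also divides by (n - 1) by default

# Energy: mean of the squared readings
energy = np.mean(x ** 2)

# Mean jerk as in jerk(): finite differences, with the last one repeated
diffs = np.diff(x) / t_step
mean_jerk = np.append(diffs, diffs[-1]).mean()

print(np.isclose(manual_cov, numpy_cov))  # True
print(energy)                             # 17.5
```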


In [None]:
Sampler = DataProcessing('/content/gdrive/MyDrive/Preprocessed Data') #Create object of the class

In [None]:
data, labels = Sampler.window_sampling_all_files()
#data is a list of dataframes
#labels is a list of the labels (1 for Fall and 0 for ADL)

F02_SA14_R05.csv
F13_SA14_R01.csv
F03_SA14_R01.csv
F13_SA14_R03.csv
F13_SA14_R02.csv
F03_SA14_R02.csv
F13_SA14_R04.csv
F13_SA14_R05.csv
F03_SA14_R03.csv
F03_SA14_R04.csv
F14_SA14_R01.csv
F03_SA14_R05.csv
F14_SA14_R02.csv
F04_SA14_R01.csv
F04_SA14_R02.csv
F14_SA14_R03.csv
F14_SA14_R05.csv
F08_SA14_R02.csv
F14_SA14_R04.csv
F04_SA14_R03.csv
F08_SA14_R01.csv
F15_SA14_R03.csv
F15_SA14_R01.csv
F08_SA14_R03.csv
F15_SA14_R04.csv
F15_SA14_R05.csv
F15_SA14_R02.csv
F06_SA15_R01.csv
F08_SA15_R01.csv
F06_SA15_R02.csv
F01_SA15_R02.csv
F01_SA15_R01.csv
F11_SA15_R01.csv
F08_SA15_R02.csv
F06_SA15_R03.csv
F01_SA15_R03.csv
F08_SA15_R03.csv
F06_SA15_R04.csv
F01_SA15_R05.csv
F06_SA15_R05.csv
F11_SA15_R02.csv
F01_SA15_R04.csv
F08_SA15_R04.csv
F07_SA15_R01.csv
F11_SA15_R03.csv
F02_SA15_R01.csv
F08_SA15_R05.csv
F11_SA15_R04.csv
F07_SA15_R02.csv
F02_SA15_R02.csv
F07_SA15_R03.csv
F09_SA15_R01.csv
F11_SA15_R05.csv
F07_SA15_R04.csv
F02_SA15_R03.csv
F09_SA15_R02.csv
F02_SA15_R04.csv
F12_SA15_R01.csv
F07_SA15_R05.c

In [None]:
#Creating the DataFrame for SVM
cols_1 = ['X_Acc_Mean_1', 'X_Acc_Max_1', 'X_Acc_Std_1', 'X_Acc_Min_1',
        'X_Acc_IQR_1', 'X_Acc_Skew_1', 'X_Acc_Kurtosis_1', 'X_Acc_Energy_1', 'X_Acc_Jerk_1',
        'Y_Acc_Mean_1', 'Y_Acc_Max_1', 'Y_Acc_Std_1', 'Y_Acc_Min_1',
        'Y_Acc_IQR_1', 'Y_Acc_Skew_1', 'Y_Acc_Kurtosis_1', 'Y_Acc_Energy_1', 'Y_Acc_Jerk_1',
        'Z_Acc_Mean_1', 'Z_Acc_Max_1', 'Z_Acc_Std_1', 'Z_Acc_Min_1',
        'Z_Acc_IQR_1', 'Z_Acc_Skew_1', 'Z_Acc_Kurtosis_1', 'Z_Acc_Energy_1', 'Z_Acc_Jerk_1',
        'XY_Acc_Cov_1', 'X_Gyro_Mean_1', 'X_Gyro_Max_1', 'X_Gyro_Std_1', 'X_Gyro_Min_1',
        'X_Gyro_IQR_1', 'X_Gyro_Skew_1', 'X_Gyro_Kurtosis_1', 'X_Gyro_Energy_1', 'X_Gyro_Jerk_1',
        'Y_Gyro_Mean_1', 'Y_Gyro_Max_1', 'Y_Gyro_Std_1', 'Y_Gyro_Min_1',
        'Y_Gyro_IQR_1', 'Y_Gyro_Skew_1', 'Y_Gyro_Kurtosis_1', 'Y_Gyro_Energy_1', 'Y_Gyro_Jerk_1',
        'Z_Gyro_Mean_1', 'Z_Gyro_Max_1', 'Z_Gyro_Std_1', 'Z_Gyro_Min_1',
        'Z_Gyro_IQR_1', 'Z_Gyro_Skew_1', 'Z_Gyro_Kurtosis_1', 'Z_Gyro_Energy_1', 'Z_Gyro_Jerk_1',
        'XY_Gyro_Cov_1']

#Repeat the feature names for windows 1 to 14, replacing the '_1' suffix with the window number
cols = [f"{s[:-2]}_{w}" for w in range(1, 15) for s in cols_1]

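svm_data = []
#Pair each sample with its own label
for df, label in zip(data, labels):
  df = df.drop('Window', axis=1)
  df_np = np.array(df)
  df_np = df_np.flatten()
  df_np = np.append(df_np, label)
  svm_data.append(df_np)

svm_data = np.array(svm_data)
#Each row holds 14 x 56 features followed by the label, so a 'Label' column is needed
svm_df = pd.DataFrame(svm_data, columns=cols + ['Label'])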
os.chdir('/content/gdrive/MyDrive/')
svm_df.to_csv('/content/gdrive/MyDrive/SVM Data.csv')

#Since SVMs train on only 2D Data, I have flattened the samples such that each row contains the
#features for all windows placed one after another (everything else remains same!)
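
The flattening used for the SVM table can be illustrated on a small placeholder array. The shapes follow the notebook (14 windows, 56 features); the zeros, the three-sample size and the label values are placeholders.

```python
import numpy as np

n_samples, n_windows, n_features = 3, 14, 56
data_3d = np.zeros((n_samples, n_windows, n_features))  # placeholder values
labels = np.array([1, 0, 0])                            # placeholder labels

# Row-major flattening puts window 1's features first, then window 2's, and so on
flat = data_3d.reshape(n_samples, n_windows * n_features)

# Append the label as the last column of each row
table = np.column_stack([flat, labels])
print(table.shape)  # (3, 785)
```

Each row therefore has 784 feature columns plus one label column.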

In [None]:
#Creating the Data for CNN-LSTM Model in a (num_samples, windows, features) format
final_data_2 = []
for df in data:
  df = df.drop('Window', axis='columns')
  final_data_2.append(np.array(df))
final_data_2 = np.array(final_data_2)
labels = np.array(labels)
os.chdir('/content/gdrive/MyDrive/')
np.save('CNN-LSTM Data.npy', final_data_2)
np.save('CNN-LSTM Labels.npy', labels)

Normalization, Data Split

In [None]:
# CNN-LSTM PREPARATION

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

#load data
data = np.load('CNN-LSTM Data.npy')
labels = np.load('CNN-LSTM Labels.npy')

#print shape of existing data
print("Current data shape: ", data.shape)
print("Current labels shape: ", labels.shape)

Current data shape:  (5408, 14, 56)
Current labels shape:  (5408,)


In [None]:
#reshape data
num_samples, num_windows, num_features = data.shape
time_steps = num_windows
data_reshaped = data.reshape((num_samples, time_steps, num_features))

print("Reshaped data shape: ", data_reshaped.shape)

#normalize data across all time
scaler = StandardScaler()
data_normalized = np.zeros_like(data_reshaped)

for feature in range(num_features):
  # to 2d
  feature_data = data_reshaped[:, :, feature].reshape(-1, 1)

  # fit and transform the data
  normalized_feature = scaler.fit_transform(feature_data)

  # back into 3d + store
  data_normalized[:, :, feature] = normalized_feature.reshape(num_samples, time_steps)
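
One thing worth noting: the cell above computes the scaling statistics on all samples before the split, so the test set influences the normalization. A common alternative, sketched here as an assumption rather than as the notebook's method, is to compute the per-feature mean and standard deviation on the training portion only and reuse them for the other sets. The array sizes and random values below are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(5.0, 2.0, size=(8, 14, 56))  # placeholder data
X_test = rng.normal(5.0, 2.0, size=(4, 14, 56))   # placeholder data

# Per-feature statistics over all training samples and time steps
mean = X_train.mean(axis=(0, 1), keepdims=True)  # shape (1, 1, 56)
std = X_train.std(axis=(0, 1), keepdims=True)    # population std, as StandardScaler uses

X_train_norm = (X_train - mean) / std
X_test_norm = (X_test - mean) / std  # reuse the training statistics

print(X_train_norm.shape, X_test_norm.shape)  # (8, 14, 56) (4, 14, 56)
```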

In [None]:
#SPLIT INTO TRAINING + VALIDATION SETS

# divide into training+validation and testing sets
X_temp, X_test, y_temp, y_test = train_test_split(data_normalized, labels, test_size=0.2, random_state= 42)

# divide training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X_temp, y_temp, test_size=0.25, random_state=42)

print("Training data shape: ", X_train.shape)
print("Validation data shape: ", X_val.shape)
print("Testing data shape: ", X_test.shape)

print("Training label shape: ", y_train.shape)
print("Validation label shape: ", y_val.shape)
print("Testing label shape: ", y_test.shape)

#save all the data
np.save('X_train.npy', X_train)
np.save('X_val.npy', X_val)
np.save('X_test.npy', X_test)

np.save('y_train.npy', y_train)
np.save('y_val.npy', y_val)
np.save('y_test.npy', y_test)

print("All data split + saved")
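
The two calls above produce roughly a 60/20/20 split, since 25% of the remaining 80% is 20% of the total. The arithmetic for the 5408 samples can be sketched as below; rounding the held-out share up to a whole sample is an assumption about how the split sizes are resolved, so the exact counts may differ by one.

```python
import math

n_total = 5408  # number of samples reported by data.shape

n_test = math.ceil(0.2 * n_total)   # first split: 20% held out for testing
n_temp = n_total - n_test
n_val = math.ceil(0.25 * n_temp)    # second split: 25% of the remainder
n_train = n_temp - n_val

print(n_train, n_val, n_test)        # 3244 1082 1082
print(round(n_train / n_total, 3))   # 0.6
```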

CNN-LSTM MODEL

In [None]:
import torch
import torch.nn as nn

class Model(nn.Module):
  def __init__(self, input_size):
    super(Model, self).__init__()

    # CNN LAYERS

    self.cnn_layers = nn.Sequential(

        # LAYER 1: 64 filters, kernel size 3, relu, batch norm, maxpooling size 2 (reduce dim)
        nn.Conv1d(in_channels=input_size, out_channels=64, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.BatchNorm1d(64),
        nn.MaxPool1d(kernel_size=2),

        # LAYER 2 128 filters, kernel size 3, relu, batch norm, maxpooling size 2
        nn.Conv1d(in_channels=64, out_channels=128, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.BatchNorm1d(128),
        nn.MaxPool1d(kernel_size=2),

        # LAYER 3 256 filters, kernel size 3, relu, batch norm, maxpooling size 2
        nn.Conv1d(in_channels=128, out_channels=256, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.BatchNorm1d(256),
        nn.MaxPool1d(kernel_size=2),
    )

    # LSTM LAYERS

    # 256 input from cnn, 100 units
    self.lstm_1 = nn.LSTM(input_size=256, hidden_size=100, batch_first=True)

    # 100 input from lstm1, 100 units, return false
    self.lstm_2 = nn.LSTM(input_size=100, hidden_size=100, batch_first=True)

    self.fully_connected = nn.Sequential(

        # First Fully Connected Layer 100 Inputs, 50 Outputs
        nn.Linear(100, 50),
        nn.ReLU(),

        # Dropout Layer
        nn.Dropout(0.5),

        # Second Fully Connected Layer, 50 Inputs, 1 Output (a raw logit, no sigmoid applied here)
        nn.Linear(50, 1),
        #nn.sigmoid = nn.Sigmoid()
    )

  def forward(self, x):
    x = x.permute(0, 2, 1) # change shape to (batch_size, input_channels, time_steps)

    cnn_out = self.cnn_layers(x) #shape is now: (batch_size, 256, time_steps // 8)

    lstm_in = cnn_out.permute(0, 2 , 1) #change to (batch_size, time_steps // 8, 256) for lstm input

    lstm_out, _ = self.lstm_1(lstm_in)
    lstm_out, _ = self.lstm_2(lstm_out)

    last_step = lstm_out[:, -1, :] #keep only the last time step: (batch_size, 100)

    return self.fully_connected(last_step) #shape: (batch_size, 1)
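
To see why the comment in forward() says time_steps // 8, it helps to trace the sequence length through the three convolution and pooling stages. The sketch below uses the standard output-length formulas for a 1D convolution (kernel 3, padding 1, stride 1) and for max pooling (kernel and stride 2); the helper names are hypothetical.

```python
def conv1d_len(length, kernel_size=3, padding=1, stride=1):
    """Output length of a 1D convolution (hypothetical helper)."""
    return (length + 2 * padding - kernel_size) // stride + 1

def maxpool1d_len(length, kernel_size=2):
    """Output length of max pooling with stride equal to the kernel size."""
    return (length - kernel_size) // kernel_size + 1

length = 14  # windows per sample, i.e. time steps entering the CNN
trace = [length]
for _ in range(3):  # three Conv1d + MaxPool1d stages
    length = maxpool1d_len(conv1d_len(length))
    trace.append(length)

print(trace)  # [14, 7, 3, 1]
```

With 14 windows per sample, the LSTM layers therefore receive a sequence of length 1.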


In [None]:
# convert x_train to PYTORCH TENSOR
X_train_tensor = torch.FloatTensor(X_train)

input_channels = X_train.shape[2] # number of features

model = Model(input_channels) #initialize model

print(model) #print model architecture

Model(
  (cnn_layers): Sequential(
    (0): Conv1d(56, 64, kernel_size=(3,), stride=(1,), padding=(1,))
    (1): ReLU()
    (2): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (3): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (4): Conv1d(64, 128, kernel_size=(3,), stride=(1,), padding=(1,))
    (5): ReLU()
    (6): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (7): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (8): Conv1d(128, 256, kernel_size=(3,), stride=(1,), padding=(1,))
    (9): ReLU()
    (10): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (11): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (lstm_1): LSTM(256, 100, batch_first=True)
  (lstm_2): LSTM(100, 100, batch_first=True)
)


In [None]:
#Extraction of D11 Files

sample = DataProcessing('/content/gdrive/MyDrive/Preprocessed Data')
data_d11 = []
for dir, _, files in os.walk('/content/gdrive/MyDrive/Preprocessed Data/NO FALL'):
  for file in files:
    if file[:3] == 'D11':
      file_df = sample.window_sample_adl_1(dir, file)
      file_df = file_df.drop('Window', axis=1)
      data_d11.append(file_df)

In [None]:
data_d11 = np.array(data_d11)
data_d11.shape

(190, 14, 56)

In [None]:
os.chdir('/content/gdrive/MyDrive')
np.save('D11_Data.npy', data_d11)