## Using multiple strides in treadmill-acquired gait data for Multiple Sclerosis prediction 
### Size-N normalizing raw treadmill features (COPX, COPY, ForceZ and belt speed) for each stride

Size-N normalization should work for these features: each signal is divided by a body-size quantity so that the result is dimensionless. COP should be normalized by a length (height), speed by a speed scale built from leg length or a similar length (the code below uses sqrt(g × height)), and force by body weight (mass × g). Regress-N normalization, however, does not make sense for COP metrics in the raw data, because there is no typical COP position to expect.
If raw COP data were used in a regression, the dominant source of variance would be where the person stands on the treadmill, which is not expected to carry meaningful information. To be comparable across strides, COP data would first need the mean subtracted, as is done for the butterfly-plot features.
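
The mean-subtraction step described above is not part of the cells that follow, so here is a minimal sketch of how it could be combined with size-N normalization. It assumes a hypothetical `stride_id` column that marks which stride each sample belongs to (the raw files may delimit strides differently), and it assumes COP is in metres, force in newtons and speed in m/s. The helper name `size_n_normalize` is illustrative only.

```python
import numpy as np
import pandas as pd

G = 9.81  # acceleration due to gravity (m/s^2)

def size_n_normalize(strides, height_m, mass_kg, stride_col="stride_id"):
    """Return a dimensionless copy of `strides` with per-stride mean-centred COP.

    `stride_col` is a hypothetical column naming the stride of each sample.
    """
    out = strides.copy()
    cop_cols = ["COPX", "COPY"]
    # Subtract each stride's mean COP so the standing position on the belt drops out
    out[cop_cols] = out[cop_cols] - out.groupby(stride_col)[cop_cols].transform("mean")
    # Lengths are scaled by height
    out[cop_cols] = out[cop_cols] / height_m
    # Force is scaled by body weight (mass * g)
    out["TreadMill_FZ"] = out["TreadMill_FZ"] / (mass_kg * G)
    # Speed is scaled by sqrt(g * height)
    out["Speed"] = out["Speed"] / np.sqrt(G * height_m)
    return out

# Toy data: two strides with the same COP shape but different belt positions
demo = pd.DataFrame({
    "stride_id": [0, 0, 1, 1],
    "COPX": [0.10, 0.30, 0.50, 0.70],
    "COPY": [1.0, 1.2, 1.0, 1.2],
    "TreadMill_FZ": [700.0, 750.0, 700.0, 750.0],
    "Speed": [1.0, 1.0, 1.2, 1.2],
})
normalized = size_n_normalize(demo, height_m=1.6, mass_kg=76.1)
```

In the toy data the two strides differ only by an offset in `COPX`, so after centring they become identical, which is the property that makes strides comparable.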

In [45]:
import numpy as np
import pandas as pd
import math
import os
import warnings
warnings.filterwarnings('ignore')
import matplotlib.pyplot as plt
import seaborn as sns

In [46]:
#Path to raw grouped strides for the raw treadmill 4 features 
path_to_raw_grouped_strides = 'C:\\Users\\Rachneet Kaur\\Box\\GaitLSTMproject\\raw_treadmill_features\\grouped_5strides\\'
grouped_labels = pd.read_csv(path_to_raw_grouped_strides + '..\\grouped_labels.csv', index_col = 0)
grouped_labels.head()

path_to_sizeN_grouped_strides = 'C:\\Users\\Rachneet Kaur\\Box\\GaitLSTMproject\\raw_treadmill_features\\sizeN_grouped_5strides\\'

In [47]:
#Reading the demographics of the subjects
demographies = pd.read_csv('C:\\Users\\Rachneet Kaur\\Box\\GAIT\\sample_data\\demographics.csv')

#Keeping demographics of only the 35 subjects we have the raw data for 
demographies = demographies[demographies['subject ID'].isin(grouped_labels['PID'].unique())]

#Keeping only the subject ID, height, body mass and shoe size columns 
#Make sure the units match so that the final quantities are dimensionless
demographies = demographies[['subject ID', 'height (m)', 'weight (kg)', 'shoe size (mm)']]
demographies.reset_index(inplace =True, drop = True)
demographies.head()

   subject ID  height (m)  weight (kg)  shoe size (mm)
0         200       1.6           76.1             251
1         201       1.72          97.8             260
2         202       1.651         56.1             245
3         203       1.69          72.1             254
4         204       1.93          80.0             286


In [48]:
#Attaching the height and weight (body mass) to the corresponding subjects 
grouped_labels['height'] = grouped_labels['PID'].map(demographies.set_index('subject ID')['height (m)'])
grouped_labels['weight'] = grouped_labels['PID'].map(demographies.set_index('subject ID')['weight (kg)'])
#Setting file name as the index 
grouped_labels.set_index('FileName', inplace= True)
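
`Series.map` leaves `NaN` for any `PID` that has no row in the demographics table, and those `NaN` values would then spread through every normalized signal for that subject. A small check like the sketch below could be run after the mapping step. The helper name `subjects_missing_demographics` and the toy frames are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

def subjects_missing_demographics(labels, columns=("height", "weight")):
    """Return the sorted PIDs whose mapped demographic values contain NaN."""
    missing_rows = labels[list(columns)].isna().any(axis=1)
    return sorted(labels.loc[missing_rows, "PID"].unique().tolist())

# Toy example: subject 999 has no demographics row
toy_labels = pd.DataFrame({"PID": [200, 200, 999]})
toy_demo = pd.DataFrame({"subject ID": [200], "height (m)": [1.6], "weight (kg)": [76.1]})
lookup = toy_demo.set_index("subject ID")
toy_labels["height"] = toy_labels["PID"].map(lookup["height (m)"])
toy_labels["weight"] = toy_labels["PID"].map(lookup["weight (kg)"])

missing = subjects_missing_demographics(toy_labels)
```

An empty list from `subjects_missing_demographics(grouped_labels)` would indicate that every file has the height and weight needed for normalization.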

In [None]:
g = 9.81 #Acceleration due to gravity (m/s^2)

#Creating the new dimensionless (size-N normalized) dataframes 
for raw_grouped_file in os.listdir(path_to_raw_grouped_strides):
    raw_file = pd.read_csv(path_to_raw_grouped_strides + raw_grouped_file, index_col = 0)
#     display (raw_file.head()) 
    
    #Height (m) and body mass (kg) of the subject this file belongs to 
    height = grouped_labels.loc[raw_grouped_file, 'height']
    weight = grouped_labels.loc[raw_grouped_file, 'weight']
    
    #Size-N normalization 
    #Speed = Speed/sqrt(g*height)
    raw_file['Speed'] = raw_file['Speed']/np.sqrt(g*height)
    #Forces = Forces/(weight*g)
    raw_file['TreadMill_FZ'] = raw_file['TreadMill_FZ']/(weight*g)
    #Normalize COPX, COPY by height (note: the per-stride mean is not subtracted here)
    raw_file['COPX'] = raw_file['COPX']/height
    raw_file['COPY'] = raw_file['COPY']/height
#     display(raw_file.head())
    #Saving the size-N normalized files 
    raw_file.to_csv(path_to_sizeN_grouped_strides + raw_grouped_file)