## Gait Video Study
Calculating stride statistics for each framework (HOA-BW/W, MS-BW/W, PD-BW/W). These statistics give the number of strides in the training and testing sets of each framework: 1. task generalization, a) W to WT and b) T to TT, and 2. subject generalization, a) W, b) WT, c) T, and d) TT.

Use the previously created labels.csv file, which logs the strides of each subject/trial.
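As a rough illustration of the kind of count this notebook is after, the sketch below tallies strides per cohort for a task-generalization split (train on W, test on WT). The toy DataFrame and its values are invented for illustration. Only the column names `PID`, `cohort`, and `scenario` are taken from labels.csv as used in the cells below.

```python
import pandas as pd

# Toy stand-in for labels.csv: one row per stride (all values are invented).
toy = pd.DataFrame({
    'PID':      [212, 212, 212, 310, 310, 403, 403, 403],
    'cohort':   ['HOA', 'HOA', 'HOA', 'MS', 'MS', 'PD', 'PD', 'PD'],
    'scenario': ['W', 'W', 'WT', 'W', 'WT', 'W', 'WT', 'WT'],
})

# Task generalization W -> WT: W strides form the training set, WT strides the test set.
split = toy['scenario'].map({'W': 'train', 'WT': 'test'})
task_counts = pd.crosstab(toy['cohort'], split)
print(task_counts)
```

Each cell of `task_counts` is the number of strides a cohort contributes to the train or test side of the split.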

In [1]:
import numpy as np
import cv2
import os
import glob
import matplotlib.pyplot as plt
import pandas as pd
import warnings
warnings.filterwarnings("ignore")
from IPython.display import display, HTML

In [2]:
#Reading the file with log of each stride 
labels_path = 'C:\\Users\\Rachneet Kaur\\Box\\Gait Video Project\\GaitVideoData\\video\\'
labels_file = pd.read_csv(labels_path+'labels.csv', index_col=0)

In [3]:
#Keep only trial 'W', since our analysis used only the W and WT scenarios
labels_file_reduced = labels_file[labels_file.trial=='W']
labels_file_reduced.reset_index(drop = True, inplace = True)

In [4]:
print ('Total strides: ', labels_file_reduced.shape[0])

labels_W = labels_file_reduced[labels_file_reduced['scenario'] == 'W']
labels_WT = labels_file_reduced[labels_file_reduced['scenario'] == 'WT']

print ('Total strides in W: ', labels_W.shape[0])
print ('Total strides in WT: ', labels_WT.shape[0])

Total strides:  2430
Total strides in W:  1380
Total strides in WT:  1050


In [5]:
print ('Total HOA strides in W: ', labels_W[labels_W['cohort'] == 'HOA'].shape[0])
print ('Total MS strides in W: ', labels_W[labels_W['cohort'] == 'MS'].shape[0])
print ('Total PD strides in W: ', labels_W[labels_W['cohort'] == 'PD'].shape[0])
print ('\n')
print ('Total HOA strides in WT: ', labels_WT[labels_WT['cohort'] == 'HOA'].shape[0])
print ('Total MS strides in WT: ', labels_WT[labels_WT['cohort'] == 'MS'].shape[0])
print ('Total PD strides in WT: ', labels_WT[labels_WT['cohort'] == 'PD'].shape[0])

Total HOA strides in W:  658
Total MS strides in W:  389
Total PD strides in W:  333


Total HOA strides in WT:  351
Total MS strides in WT:  332
Total PD strides in WT:  367


In [6]:
print ('Total HOA, MS, PD subjects in W: ')
display(labels_W[['PID', 'cohort']].groupby('PID').first().reset_index().groupby('cohort').count())

print ('Total HOA, MS, PD subjects in WT: ')
display(labels_WT[['PID', 'cohort']].groupby('PID').first().reset_index().groupby('cohort').count())

Total HOA, MS, PD subjects in W: 


        PID
cohort
HOA      14
MS       10
PD        8


Total HOA, MS, PD subjects in WT: 


        PID
cohort
HOA       8
MS        9
PD        9


In [68]:
print (labels_W.video.unique().shape, labels_WT.video.unique().shape)

(32,) (26,)


In [69]:
#Number of frames after deleting the extra frames before the first stride and after the last stride,
#but before downsampling to 20 frames per stride
print ('Total frames in strides for trial W: ', labels_W['frame_count'].sum())

print ('\nTotal HOA, MS, PD frames in strides for trial W: ')
display(labels_W.groupby('cohort')['frame_count'].sum())

print ('Total frames in strides for trial WT: ', labels_WT['frame_count'].sum())

print ('\nTotal HOA, MS, PD frames in strides for trial WT: ')
display(labels_WT.groupby('cohort')['frame_count'].sum())

Total frames in strides for trial W:  56226

Total HOA, MS, PD frames in strides for trial W: 


cohort
HOA    26541
MS     16187
PD     13498
Name: frame_count, dtype: int64

Total frames in strides for trial WT:  41747

Total HOA, MS, PD frames in strides for trial WT: 


cohort
HOA    13638
MS     13448
PD     14661
Name: frame_count, dtype: int64

In [70]:
#Average +- standard deviation number of strides per subject (in W/WT - across HOA/MS/PD)
print ('Average number of strides per subject in trial W')
display(labels_W.groupby(['PID', 'cohort']).count().groupby('cohort').mean())

print ('Standard deviation number of strides per subject in trial W')
display(labels_W.groupby(['PID', 'cohort']).count().groupby('cohort').std())

print ('Average number of strides per subject in trial WT')
display(labels_WT.groupby(['PID', 'cohort']).count().groupby('cohort').mean())

print ('Standard deviation number of strides per subject in trial WT')
display(labels_WT.groupby(['PID', 'cohort']).count().groupby('cohort').std())

Average number of strides per subject in trial W


         trial  scenario   video  stride_number     key  frame_count   label
cohort
HOA       47.0      47.0    47.0           47.0    47.0         47.0    47.0
MS        38.9      38.9    38.9           38.9    38.9         38.9    38.9
PD      41.625    41.625  41.625         41.625  41.625       41.625  41.625


Standard deviation number of strides per subject in trial W


           trial  scenario     video  stride_number       key  frame_count     label
cohort
HOA     7.874008  7.874008  7.874008       7.874008  7.874008     7.874008  7.874008
MS      8.332667  8.332667  8.332667       8.332667  8.332667     8.332667  8.332667
PD       2.13391   2.13391   2.13391        2.13391   2.13391      2.13391   2.13391


Average number of strides per subject in trial WT


            trial   scenario      video  stride_number        key  frame_count      label
cohort
HOA        43.875     43.875     43.875         43.875     43.875       43.875     43.875
MS      36.888889  36.888889  36.888889      36.888889  36.888889    36.888889  36.888889
PD      40.777778  40.777778  40.777778      40.777778  40.777778    40.777778  40.777778


Standard deviation number of strides per subject in trial WT


           trial  scenario     video  stride_number       key  frame_count     label
cohort
HOA     2.799872  2.799872  2.799872       2.799872  2.799872     2.799872  2.799872
MS       9.64941   9.64941   9.64941        9.64941   9.64941      9.64941   9.64941
PD      3.898005  3.898005  3.898005       3.898005  3.898005     3.898005  3.898005
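The four tables above repeat the same value in every column because `count()` is applied to all columns at once. A more compact way to get the same mean and standard deviation is sketched below, assuming one row per stride with `PID` and `cohort` columns. The toy data and the names `strides_per_subject` and `summary` are illustrative only.

```python
import pandas as pd

# Toy stride log: one row per stride (subjects and values are invented).
toy = pd.DataFrame({
    'PID':    ['a', 'a', 'a', 'b', 'c', 'c', 'd', 'd', 'd', 'd'],
    'cohort': ['HOA', 'HOA', 'HOA', 'HOA', 'MS', 'MS', 'MS', 'MS', 'MS', 'MS'],
})

# One stride count per subject, then mean and sample std within each cohort.
strides_per_subject = toy.groupby(['cohort', 'PID']).size()
summary = strides_per_subject.groupby(level='cohort').agg(['mean', 'std'])
print(summary)
```

`size()` returns a single count per subject, so the result has one `mean` and one `std` column per cohort instead of seven identical ones.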


In [71]:
#Average +- standard deviation of the number of frames per stride (representing walking speed) (in W/WT across HOA/MS/PD)
#More frames per stride = more time taken to complete a single stride = slower walking
#Select 'frame_count' before aggregating so that non-numeric columns are never averaged
print ('Average frames per stride in trial W', labels_W.groupby('cohort')['frame_count'].mean())
print ('Standard deviation frames per stride in trial W', labels_W.groupby('cohort')['frame_count'].std())

print ('Average frames per stride in trial WT', labels_WT.groupby('cohort')['frame_count'].mean())
print ('Standard deviation frames per stride in trial WT', labels_WT.groupby('cohort')['frame_count'].std())

Average frames per stride in trial W cohort
HOA    40.335866
MS     41.611825
PD     40.534535
Name: frame_count, dtype: float64
Standard deviation frames per stride in trial W cohort
HOA    10.047132
MS      9.356951
PD      8.772708
Name: frame_count, dtype: float64
Average frames per stride in trial WT cohort
HOA    38.854701
MS     40.506024
PD     39.948229
Name: frame_count, dtype: float64
Standard deviation frames per stride in trial WT cohort
HOA    9.247948
MS     8.495065
PD     8.837949
Name: frame_count, dtype: float64
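Since frames per stride is used here as a proxy for walking speed, the counts can be converted to stride durations once the video frame rate is known. The sketch below does that conversion for the trial W means printed above. The frame rate `ASSUMED_FPS = 30.0` is a placeholder assumption, not a value stated in this notebook, and `stride_time_seconds` is a hypothetical helper.

```python
# Mean frames per stride for trial W, copied from the output above.
mean_frames_W = {'HOA': 40.335866, 'MS': 41.611825, 'PD': 40.534535}

ASSUMED_FPS = 30.0  # hypothetical frame rate; replace with the camera's actual value

def stride_time_seconds(frames_per_stride, fps=ASSUMED_FPS):
    """Convert a frame count per stride into a stride duration in seconds."""
    return frames_per_stride / fps

stride_times = {cohort: stride_time_seconds(f) for cohort, f in mean_frames_W.items()}
for cohort, seconds in stride_times.items():
    print(f'{cohort}: {seconds:.2f} s per stride (assuming {ASSUMED_FPS:.0f} fps)')
```

The ordering between cohorts does not depend on the assumed frame rate, only the absolute durations do.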


### Stats for the multi-view merged frames (before HSR identification and division of the data into strides)

In [114]:
#Stats for Multi view merged frames (before we did HSR identification and 
#divided data in strides)
#Add cohort, trial, video and count of frames per that video 

frame_path_merged = 'C:\\Users\\Rachneet Kaur\\Box\\Gait Video Project\\GaitVideoData\\video\\multi_view_merged_data\\'
cohorts = ['HOA\\', 'MS\\', 'PD\\', 'ExtraHOA\\']
trials = ['walking\\']
video_wise_df = pd.DataFrame(columns = ['cohort', 'video', 'frame count'])

for cohort in cohorts:
    for trial in trials:
        merged_path = frame_path_merged+cohort+trial 
        if not os.path.exists(merged_path):
            continue  #Skip missing folders rather than reusing the previous listing
        videos = os.listdir(merged_path)
        for video in videos:
            frames = glob.glob(merged_path+'\\'+video+'\\hip_height_normalized\\*.csv')
            count_frames = len(frames)
            print (video, count_frames)
            if ((cohort == 'HOA\\') or (cohort == 'ExtraHOA\\')):
                cohort_ = 'HOA'
            elif cohort == 'MS\\':
                cohort_ = 'MS'
            else:
                cohort_ = 'PD'
            
            if count_frames>2500:
                count_frames = 2500
            video_wise_df.loc[len(video_wise_df)] = [cohort_, video, count_frames]

GVS_212_W_T1 1733
GVS_212_W_T2 1729
GVS_213_W_T1 1741
GVS_213_W_T2 1742
GVS_214_W_T1 1710
GVS_214_W_T2 1704
GVS_215_W_T1 1736
GVS_215_W_T2 1739
GVS_216_W_T1 1732
GVS_216_W_T2 1725
GVS_217_W_T1 1728
GVS_217_W_T2 1724
GVS_218_W_T1 1710
GVS_218_W_T2 1707
GVS_219_W_T1 1679
GVS_219_W_T2 1589
GVS_310_W_T1 1802
GVS_310_W_T2 1728
GVS_311_W_T1 1548
GVS_311_W_T2 1020
GVS_312_W_T2 1726
GVS_313_W_T1 1771
GVS_313_W_T2 1272
GVS_314_W_T1 1744
GVS_314_W_T2 1730
GVS_318_W_T1 727
GVS_318_W_T2 812
GVS_320_W_T1 1760
GVS_320_W_T2 1745
GVS_321_W_T1 1697
GVS_321_W_T2 1757
GVS_322_W_T1 1728
GVS_322_W_T2 1762
GVS_323_W_T1 1673
GVS_323_W_T2 1780
GVS_403_W_T2 1453
GVS_404_W_T1 1518
GVS_404_W_T2 1686
GVS_404_W_T3 1787
GVS_404_W_T4 1748
GVS_405_W_T1 1444
GVS_405_W_T2 1646
GVS_405_W_T3 1648
GVS_405_W_T4 1679
GVS_406_W_T1 1739
GVS_406_W_T2 1684
GVS_407_W_T1 1702
GVS_407_W_T2 1732
GVS_408_W_T1 1741
GVS_408_W_T2 1745
GVS_409_W_T1 1625
GVS_409_W_T2 1564
GVS_410_W_T1 1747
GVS_410_W_T2 1747
GVS_411_W_T1 1722
GVS_411_W_T2
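The per-video frame count above is simply the number of CSV files in each video's subfolder, capped at 2500. A minimal, self-contained sketch of the same idea using `pathlib` follows. The helper `count_frames`, the constant `MAX_FRAMES`, and the `demo_video` folder are illustrative names, and the throwaway directory tree exists only so the sketch runs without the project data.

```python
import tempfile
from pathlib import Path

import pandas as pd

MAX_FRAMES = 2500  # same cap as in the cell above

def count_frames(root, subdir='hip_height_normalized'):
    """Return one row per video folder under root with its capped CSV frame count."""
    rows = []
    root = Path(root)
    if not root.is_dir():  # missing folder: nothing to count
        return pd.DataFrame(rows, columns=['video', 'frame count'])
    for video_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        n = len(list((video_dir / subdir).glob('*.csv')))
        rows.append({'video': video_dir.name, 'frame count': min(n, MAX_FRAMES)})
    return pd.DataFrame(rows, columns=['video', 'frame count'])

# Tiny throwaway directory tree so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    frames_dir = Path(tmp) / 'demo_video' / 'hip_height_normalized'
    frames_dir.mkdir(parents=True)
    for i in range(3):
        (frames_dir / f'{i}.csv').write_text('x,y,z\n')
    demo_counts = count_frames(tmp)
    missing_counts = count_frames(Path(tmp) / 'does_not_exist')

print(demo_counts)
```

Returning an empty frame for a missing folder keeps a stale directory listing from being counted a second time.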

In [115]:
#Use labels.csv to label each video as W or WT so that we can have separate stats for W and WT
#within each cohort
video_wise_df.set_index('video', inplace = True)
video_wise_df['trial'] = labels_file_reduced[['video', 'scenario']].groupby('video').first()['scenario']
video_wise_df.dropna(inplace = True)
video_wise_df.to_csv('C:\\Users\\Rachneet Kaur\\Box\\Gait Video Project\\GaitVideoData\\multi_view_stats.csv')
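The labelling step above relies on pandas index alignment: the per-video scenario Series is matched to `video_wise_df` by video name, and videos with no match become NaN and are dropped. The sketch below shows the same lookup on invented data using `Series.map`. The names `toy_labels`, `toy_counts`, and the `vid_*` identifiers are hypothetical.

```python
import pandas as pd

# Toy per-stride labels (invented values; column names follow labels.csv).
toy_labels = pd.DataFrame({
    'video':    ['vid_A', 'vid_A', 'vid_B', 'vid_B'],
    'scenario': ['W', 'W', 'WT', 'WT'],
})
# Toy per-video frame counts; 'vid_C' has no entry in the labels.
toy_counts = pd.DataFrame({
    'video':       ['vid_A', 'vid_B', 'vid_C'],
    'frame count': [1700, 1650, 900],
})

# One scenario per video, then look it up for every counted video.
video_to_scenario = toy_labels.groupby('video')['scenario'].first()
toy_counts['trial'] = toy_counts['video'].map(video_to_scenario)

# Videos without a label get NaN and are dropped, much like dropna() in the cell above.
labelled = toy_counts.dropna(subset=['trial']).reset_index(drop=True)
print(labelled)
```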

In [116]:
#Computing stats from the saved csv
multi_view_merged_stats = pd.read_csv('C:\\Users\\Rachneet Kaur\\Box\\Gait Video Project\\GaitVideoData\\multi_view_stats.csv')

print ('Total multi-view merged frames before detecting HSR: ', multi_view_merged_stats['frame count'].sum())

print ('Total multi-view merged frames in trial W and WT: ')
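#Select 'frame count' explicitly so that non-numeric columns are not summed
display(multi_view_merged_stats.groupby('trial')[['frame count']].sum())

display(multi_view_merged_stats.groupby(['cohort', 'trial'])[['frame count']].sum())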

Total multi-view merged frames before detecting HSR:  99942
Total multi-view merged frames in trial W and WT: 


       frame count
trial
W            57708
WT           42234


              frame count
cohort trial
HOA    W            28174
HOA    WT           13763
MS     W            16210
MS     WT           13572
PD     W            13324
PD     WT           14899


### Stats for global 3D coordinates computed after 2D->3D conversion and postprocessing 

In [17]:
#Stats for global 3D frames (before multi-view fusion step)
#Add cohort, trial, video, view (front/side) and count of frames per that video 

frame_path_3D = 'C:\\Users\\Rachneet Kaur\\Box\\Gait Video Project\\GaitVideoData\\video\\3D_data\\'
cohorts = ['HOA\\', 'MS\\', 'PD\\', 'ExtraHOA\\']
trials = ['walking\\']
views = ['lower_body\\', 'feet\\']
video3d_wise_df = pd.DataFrame(columns = ['cohort', 'view', 'video', 'frame count'])

for cohort in cohorts:
    for trial in trials:
        for view in views:
            merged_path = frame_path_3D+cohort+trial+view
            if not os.path.exists(merged_path):
                continue  #Skip missing folders rather than reusing the previous listing
            videos = os.listdir(merged_path)
            for video in videos:
                frames = glob.glob(merged_path+'\\'+video+'\\processed3d\\*.csv')
                count_frames = len(frames)
                print (video, count_frames)
                if ((cohort == 'HOA\\') or (cohort == 'ExtraHOA\\')):
                    cohort_ = 'HOA'
                elif cohort == 'MS\\':
                    cohort_ = 'MS'
                else:
                    cohort_ = 'PD'
                
                if view == 'lower_body\\':
                    view_ = 'front'
                else:
                    view_ = 'side'                  
                video_ = video[5:-7]
                if count_frames>2520:
                    count_frames = 2520
                video3d_wise_df.loc[len(video3d_wise_df)] = [cohort_, view_, video_, count_frames]

InkedGVS_212_W_T1_1_Trim 1740
InkedGVS_212_W_T2_1_Trim 1735
InkedGVS_213_W_T1_1_Trim 1758
InkedGVS_213_W_T2_1_Trim 1746
InkedGVS_214_W_T1_1_Trim 1735
InkedGVS_214_W_T2_1_Trim 1745
InkedGVS_215_W_T1_1_Trim 1754
InkedGVS_215_W_T2_1_Trim 1739
InkedGVS_216_W_T1_1_Trim 1737
InkedGVS_216_W_T2_1_Trim 1752
InkedGVS_217_W_T1_1_Trim 1728
InkedGVS_217_W_T2_1_Trim 1738
InkedGVS_218_W_T1_1_Trim 1754
InkedGVS_218_W_T2_1_Trim 1744
InkedGVS_219_W_T1_1_Trim 1760
InkedGVS_219_W_T2_1_Trim 1759
InkedGVS_212_W_T1_0_Trim 1748
InkedGVS_212_W_T2_0_Trim 1753
InkedGVS_213_W_T1_0_Trim 1746
InkedGVS_213_W_T2_0_Trim 1743
InkedGVS_214_W_T1_0_Trim 1741
InkedGVS_214_W_T2_0_Trim 1737
InkedGVS_215_W_T1_0_Trim 1740
InkedGVS_215_W_T2_0_Trim 1756
InkedGVS_216_W_T1_0_Trim 1745
InkedGVS_216_W_T2_0_Trim 1741
InkedGVS_217_W_T1_0_Trim 1745
InkedGVS_217_W_T2_0_Trim 1737
InkedGVS_218_W_T1_0_Trim 1737
InkedGVS_218_W_T2_0_Trim 1762
InkedGVS_219_W_T1_0_Trim 1763
InkedGVS_219_W_T2_0_Trim 1739
InkedGVS_310_W_T1_1_Trim 1804
InkedGVS_3
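The slice `video[5:-7]` in the cell above turns a folder name such as `InkedGVS_212_W_T1_1_Trim` into the video ID by dropping the 5-character `Inked` prefix and the 7-character `_<digit>_Trim` suffix. The sketch below checks the slice on one name from the output and shows a regex-based alternative that fails loudly on names that do not fit this pattern. `clean_video_name` is a hypothetical helper, not part of the original pipeline.

```python
import re

def clean_video_name(folder_name):
    """Strip the 'Inked' prefix and the '_<digit>_Trim' suffix from a folder name."""
    match = re.fullmatch(r'Inked(?P<video>.+)_\d_Trim', folder_name)
    if match is None:
        raise ValueError(f'unexpected folder name: {folder_name!r}')
    return match.group('video')

name = 'InkedGVS_212_W_T1_1_Trim'
print(name[5:-7])              # slice used in the cell above
print(clean_video_name(name))  # regex alternative
```

Both lines print `GVS_212_W_T1`. The regex version raises an error on an unexpected name, whereas the fixed slice would silently produce a wrong ID.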

In [18]:
video3d_wise_df

     cohort   view         video  frame count
0       HOA  front  GVS_212_W_T1         1740
1       HOA  front  GVS_212_W_T2         1735
2       HOA  front  GVS_213_W_T1         1758
3       HOA  front  GVS_213_W_T2         1746
4       HOA  front  GVS_214_W_T1         1735
...     ...    ...           ...          ...
113     HOA   side  GVS_112_W_T1            0
114     HOA   side  GVS_113_W_T1            0
115     HOA   side  GVS_115_W_T1            0
116     HOA   side  GVS_123_W_T1            0


In [22]:
#Use labels.csv to label each video as W or WT so that we can have separate stats for W and WT
#within each cohort
video3d_wise_df.set_index('video', inplace = True)
video3d_wise_df['trial'] = labels_file_reduced[['video', 'scenario']].groupby('video').first()['scenario']
video3d_wise_df.dropna(inplace = True)
video3d_wise_df.to_csv('C:\\Users\\Rachneet Kaur\\Box\\Gait Video Project\\GaitVideoData\\3Dview_stats.csv')

In [23]:
video3d_wise_df

              cohort   view  frame count  trial
video
GVS_212_W_T1     HOA  front         1740     WT
GVS_212_W_T2     HOA  front         1735      W
GVS_213_W_T1     HOA  front         1758      W
GVS_213_W_T2     HOA  front         1746     WT
GVS_214_W_T1     HOA  front         1735      W
...              ...    ...          ...    ...
GVS_112_W_T1     HOA   side            0      W
GVS_113_W_T1     HOA   side            0      W
GVS_115_W_T1     HOA   side            0      W
GVS_123_W_T1     HOA   side            0      W


In [27]:
#Computing stats from the saved csv
stats_3d = pd.read_csv('C:\\Users\\Rachneet Kaur\\Box\\Gait Video Project\\GaitVideoData\\3Dview_stats.csv')

print ('Total 3D frames before detecting HSR: ', stats_3d['frame count'].sum())

print ('Total 3D frames in views: ')
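#Select 'frame count' explicitly so that non-numeric columns are not summed
display(stats_3d.groupby('view')[['frame count']].sum())

print ('Total 3D frames in trial W and WT: ')
display(stats_3d[stats_3d['view']=='front'].groupby('trial')[['frame count']].sum())

display(stats_3d.groupby(['view', 'cohort', 'trial'])[['frame count']].sum())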

Total 3D frames before detecting HSR:  179741
Total 3D frames in views: 


       frame count
view
front       102598
side         77143


Total 3D frames in trial W and WT: 


       frame count
trial
W            58648
WT           43950


                    frame count
view  cohort trial
front HOA    W            28554
front HOA    WT           13959
front MS     W            16398
front MS     WT           14707
front PD     W            13696
front PD     WT           15284
side  HOA    W            13967
side  HOA    WT           13966
side  MS     W            14357
side  MS     WT           13091


### Stats for 2D coordinates computed after OpenPose and postprocessing 

### Original data stats 