## Gait Video Study 
### Validating the estimated 3D poses via CoP computed on the treadmill 
### This code builds (and saves as CSV) a dataframe for each video containing its relevant frames, the corresponding feet coordinates for those frames, and the matching treadmill-extracted CoP values 

We will do qualitative and quantitative validation for CoP. 
* First, we need to align the treadmill GaitCycles.csv file to the video time. This lets us map video frames to the gait events, namely the heel strikes (HSR, HSL) and toe-offs (TOR, TOL). 
* Once we know, for each video, which frame numbers correspond to heel strikes and toe-offs, we compute the sequences of frames that are in the left single support (SSL), right single support (SSR), and double support (DS) phases. Thus, each frame of a video is labelled as SSR, SSL, or DS. 
* For each frame, we use the computed real-world x, y coordinates of the big toe, small toe, and heel to draw a shaded triangular region for each supporting foot, and plot the actual treadmill (COPX, COPY) for that frame as a red dot. If the red dot lies within the shaded region, we claim that the actual CoP lies in the approximate computed CoP region. 
    * DS phase: both feet are on the ground and thus both affect the center of pressure, so we draw two triangular regions, one for the left foot and one for the right foot. 
    * SSL phase: only the left foot rests on the ground, so the region is the triangle spanned by the left big toe, small toe, and heel. 
    * SSR phase: only the right foot rests on the ground, so the region is the triangle spanned by the right big toe, small toe, and heel. 

* For qualitative validation, we plot the computed CoP regions described above, together with the actual CoPs as markers inside or outside those regions, for each video. Over a complete stride, the plot should follow a butterfly pattern. Hence, the inverted triangles and hexagons should also occur in a butterfly pattern.
* For quantitative validation, we will call it success (1) if the actual CoP is bounded by the computed CoP shaded region and failure (0) otherwise for every frame of every video for every trial and cohort. 
* Further, for a more precise quantitative validation, we can compute the lateral, anterior-posterior (AP), and Euclidean distances between the actual (COP_x, COP_y) and the centroid of the drawn region. This gives a numerical value quantifying the error between the true and predicted CoP. This step can be restricted to the wrongly predicted frames, to measure how far off they are. We can also check whether the error is largest in the lateral direction, in the AP direction, or only in the Euclidean distance. 
* Based on the statistics of these success and failure counts, we can quantify the performance of our marker estimation framework using CoP validation. 
* Further, we can look at the correlation (or the distribution pattern) between the correctness of the CoP, quantified using either the binary scores or the numerical scores, and the confidence scores predicted by the OpenPose algorithm. Since we only use the toe and heel coordinates to draw the computed CoP region, we should only use the confidence scores of the heels and toes for this correlation. To be precise, we average the confidence scores of the left foot's heel and 2 toes to get the aggregated confidence score for frames in left single support, average the confidence scores of the right foot's heel and 2 toes for frames in right single support, and average the confidence scores of both feet's heels and toes for frames in double support. The relation between the confidence score and the corresponding CoP correctness metric can be studied on a frame-by-frame basis. Alternatively, we can aggregate all frames over a stride and do the analysis on a stride-by-stride basis, relating an aggregated confidence score per stride to an aggregated CoP correctness score per stride. 
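
The binary success score and the centroid-based errors described above can be sketched for a single triangular region as follows. This is a minimal illustration, not the notebook's implementation: `point_in_triangle` and `centroid_errors` are hypothetical helper names, the toy coordinates are invented, and which of x or y is the lateral or the anterior-posterior axis depends on the coordinate convention of the data, so the errors are returned simply as x, y and Euclidean.

```python
import numpy as np

def point_in_triangle(p, a, b, c):
    """Return True if the 2D point p lies inside or on the triangle (a, b, c)."""
    def cross(o, u, v):
        # z-component of the cross product (u - o) x (v - o)
        return (u[0] - o[0]) * (v[1] - o[1]) - (u[1] - o[1]) * (v[0] - o[0])
    d1, d2, d3 = cross(a, b, p), cross(b, c, p), cross(c, a, p)
    has_neg = (d1 < 0) or (d2 < 0) or (d3 < 0)
    has_pos = (d1 > 0) or (d2 > 0) or (d3 > 0)
    # p is inside (or on an edge) when all three signs agree
    return not (has_neg and has_pos)

def centroid_errors(p, a, b, c):
    """Absolute x error, absolute y error and Euclidean distance from p to the triangle centroid."""
    centroid = np.mean(np.array([a, b, c], dtype=float), axis=0)
    dx, dy = np.asarray(p, dtype=float) - centroid
    return float(abs(dx)), float(abs(dy)), float(np.hypot(dx, dy))

# Toy triangle standing in for (toe 1, toe 2, heel) of one foot, in cm (invented values)
toe_1, toe_2, heel = (0.0, 0.0), (4.0, 0.0), (0.0, 3.0)
success = int(point_in_triangle((1.0, 1.0), toe_1, toe_2, heel))  # 1 = success, 0 = failure
x_err, y_err, euclid_err = centroid_errors((1.0, 1.0), toe_1, toe_2, heel)
```

For double support frames, the same test could be applied to each foot's triangle in turn, if that matches the region actually drawn.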

In [35]:
from importlib import reload
import imports 
reload(imports)
from imports import *

In [36]:
# labels_file = pd.read_csv('C:\\Users\\Rachneet Kaur\\Box\\Gait Video Project\\GaitVideoData\\video\\labels.csv', index_col = 0)
# pd.DataFrame(labels_file.video.unique(), columns = ['video']).to_csv(cop_path+'treamill_video_cop_sync.csv')

In [37]:
#Folder for CoP validation
cop_path = 'C:\\Users\\Rachneet Kaur\\Box\\Gait Video Project\\CoPvalidation\\'
#Subfolder containing the sync files between frame numbers and treadmill identified events for each video
treadmill_video_sync_files = cop_path + 'CoP_treadmill_video_sync\\'
#Path to store the new dataframes to be created for CoP validation 
path_viz_dataframes = cop_path + 'CoP_dataframes_for_viz\\'
#Path to log file corresponding to the sync files between frame numbers and treadmill identified events for each video
sync_log_file = cop_path + 'treamill_video_cop_sync.csv'

#Path for reading the frame coordinates and OpenPose confidence scores from (toes and heel in particular)
frame_path = cop_path + '..\\GaitVideoData\\video\\multi_view_merged_data\\' 
#Path for the RAWDATA.csv containing the COP values extracted by the treadmill wrt the time of the walking trial 
cop_treadmill_path = cop_path + '..\\GaitCSVData\\csv\\'

#Configuration for which to run the code for 
cohorts = ['\\HOA', '\\MS', '\\PD', '\\ExtraHOA']
trials = ['\\beam_walking', '\\walking']

# for every GaitCycle file, a sequence of walk will always start with a heel strike on the right foot.
# Thus the order of the Gait event points would be HSR, TOL, MidSSR, HSL, TOR and MidSSL.
gait_type = np.array(['HSR', 'TOL', 'MidSSR', 'HSL', 'TOR', 'MidSSL'])

trial_dict = {'BW': 'beam_walking', 'W': 'walking'}

In [38]:
# #Need to run only once to create the new sync file 
# #Reading the log of the treadmill and video syncs 
# sync_log = pd.read_csv(sync_log_file, index_col = 0)
# #Setting the new scenario column for marking the video as belonging to one of the W/WT/VBW/VBWT trials 
# sync_log.set_index('video', inplace=True)
# sync_log['scenario'] = labels_file.groupby('video').first()['scenario']
# sync_log.reset_index(inplace=True)
# print ('Total video files: ', sync_log.shape[0])
# #Saving the new sync file with scenario marked 
# sync_log.to_csv(sync_log_file)

### Utility functions 

In [39]:
# Valid strides in the gait_cycles.csv file 
def get_cycle(dataframe):
    stride_start = min(dataframe.loc[dataframe.EventType == 'HSR'].index)
    stride_end = max(dataframe.loc[dataframe.EventType == 'MidSSL'].index)   
    return dataframe.loc[stride_start:stride_end]

In [40]:
# Restore the indexing for the cropped dataframe 
def change_index(dataframe):
    dataframe.index = range(len(dataframe))
    return dataframe

In [41]:
# get all the valid index in order: HSR-TOL-MidSSR-HSL-TOR-MidSSL
def set_complete(data_frame):
    # Input is the dataframe including ONLY valid points 
    # Get all the indices of HSR, since each stride starts with a heel strike on the right foot
    # If the last gait cycle containing an HSR does not have all 6 events, ignore it
    
    HSR = data_frame.loc[data_frame.EventType == 'HSR'].index
    last_idx = HSR[-1]
    last_all_idx = data_frame.index[-1]
    # If the last gait cycle containing an HSR is incomplete, drop that HSR and keep the ones up to the second-to-last HSR.
    if (last_all_idx - last_idx) < 5:
        HSR = HSR[0:-1] 
    
    
    # get all the valid index in order: HSR-TOL-MidSSR-HSL-TOR-MidSSL
    valid = []
    for idx_HSR in HSR:
        if (((idx_HSR + 1) in data_frame.index) & ((idx_HSR + 2) in data_frame.index) &
            ((idx_HSR + 3) in data_frame.index) & ((idx_HSR + 4) in data_frame.index) & 
            ((idx_HSR + 5) in data_frame.index)):
            # the valid index exist in the dataframe.
            if((data_frame.loc[idx_HSR + 1].EventType == 'TOL') & (data_frame.loc[idx_HSR + 2].EventType == 'MidSSR') & 
               (data_frame.loc[idx_HSR + 3].EventType == 'HSL') & (data_frame.loc[idx_HSR + 4].EventType == 'TOR') & 
               (data_frame.loc[idx_HSR + 5].EventType == 'MidSSL')):
                valid.extend(range(idx_HSR, idx_HSR+6))
    #returns the list of valid indices which form complete strides 
    return valid
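
The complete-stride filter above can be tried on a toy event sequence. The sketch below is a compact re-statement of the same idea under a hypothetical name (`complete_stride_indices`), with invented toy data in which the second stride is missing its HSL event. It is meant only to show what "complete" means here.

```python
import pandas as pd

STRIDE_EVENTS = ['HSR', 'TOL', 'MidSSR', 'HSL', 'TOR', 'MidSSL']

def complete_stride_indices(events):
    """events: pd.Series of event types with an integer index.
    Returns the index labels that form complete six-event strides."""
    valid = []
    for idx in events[events == 'HSR'].index:
        labels = list(range(idx, idx + 6))
        # All six consecutive labels must exist and carry the events in the expected order
        if all(label in events.index for label in labels) and \
                list(events.loc[labels]) == STRIDE_EVENTS:
            valid.extend(labels)
    return valid

# Toy data (invented): row 9 (the HSL of the second stride) was dropped, e.g. by dropna
toy_events = pd.Series(
    STRIDE_EVENTS + ['HSR', 'TOL', 'MidSSR', 'TOR', 'MidSSL'] + STRIDE_EVENTS,
    index=list(range(9)) + list(range(10, 18)),
)
valid = complete_stride_indices(toy_events)
print(valid)
```

Only the first and third strides survive, and the jump in the returned labels is what later marks the two retained strides as non-consecutive.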

In [42]:
#Preprocessing the files to delete missing and invalid data 
def cleaning(gaitcycles_dataframe):         
    #Reducing to complete strides data 
    #Making sure we start at the HSR and end at the MidSSL
    gaitcycles_dataframe = get_cycle(gaitcycles_dataframe)
    #Retaining only complete six-event strides 
    indices_complete = set_complete(gaitcycles_dataframe)
    gaitcycles_dataframe = gaitcycles_dataframe.loc[indices_complete]

    #Resetting the index 
    gaitcycles_dataframe = change_index(gaitcycles_dataframe)
    #Returning indices to identify consecutive and non-consecutive strides 
    return indices_complete, gaitcycles_dataframe

In [43]:
def clean_sync_files(video):
    #Read the file for the current video syncing event types to frame numbers 
    video_csv = pd.read_excel(treadmill_video_sync_files + video + '.xlsx')
    #Retaining only time for treadmill's CoP matching, event type for SS/DS group assignment 
    #and frame number for extracting body coordinates
    video_csv = video_csv[['Time', 'EventType', 'frame_number']]
    #Dropping the entries/events that could not be synced to their corresponding video frames 
    video_csv.dropna(inplace = True)
    #Retaining the indices and corresponding rows of dataframe for 'complete' 6 event strides only
    #Indices will help identify consecutive and non-consecutive strides 
    indices_retain, video_csv_retain = cleaning(video_csv)
    #Converting frame number to ints 
    video_csv_retain['frame_number'] = video_csv_retain['frame_number'].astype(int)
#     display(video_csv_retain.head(), video_csv.shape, video_csv_retain.shape)
    return indices_retain, video_csv_retain

In [44]:
def fill_up_body_coordinates(viz_df, coordinate_path, coords_of_interest, coordinate_cols):
    '''
    Filling up the toe and heel coordinates in cm and confidence scores in the viz dataframe for the current video
    '''
    #Iterating through each frame number to read its corresponding file for body coordinates and filling them up in the dataframe
    for frame_number in viz_df.index:
        try: #Since some frames from the video data may be missing 
            frame = pd.read_csv(coordinate_path+str(frame_number)+'.csv', index_col = 0)
            #display(frame)
            #For each frame, we are interested in only feet coordinates for drawing the CoM area
            #Further, we use only x, y coordinates and confidence scores for this validation analysis 
            frame_coords_of_interest = frame.loc[coords_of_interest][['x', 'y', 'confidence']] 
            #Filling up the feet coordinates for a particular frame 
            viz_df.loc[frame_number, coordinate_cols] = frame_coords_of_interest.values.flatten()
        except Exception: 
            #If the particular frame was missing in video data, let the values be NaN for that missing video data frame  
            pass
#     display(viz_df.head())
    return viz_df

In [45]:
def fill_up_treadmill_COP_values(viz_df, cop_path):
    '''
    Filling up the treadmill CoP values 
    '''
    #For each trial/video, reading the corresponding RAWDATA.csv file containing the CoP_x, CoP_y values spaced at 0.002 seconds 
    cop_file = pd.read_csv(cop_path, header = 1)
    #Retaining only time, COPX, COPY columns from the file
    cop_file = cop_file[['Time', 'COPX', 'COPY']]
    #     display(cop_file.head())

    #Since the frame times do not coincide with the 0.002 s spacing of the RAWDATA.csv file, for each frame time in viz_df we 
    #find the first and last RAWDATA.csv samples lying within 1/60 s of it (half the frame interval at 30 FPS)
    #We use the fact that time is sorted in increasing order in the RAWDATA.csv file
    cop_closest_times_left_bound = [cop_file['Time'][cop_file['Time']>(viz_df['Time'].iloc[i]-(1/60))].iloc[0] for i in range(len(viz_df))]
    cop_closest_times_right_bound = [cop_file['Time'][cop_file['Time']<(viz_df['Time'].iloc[i]+(1/60))].iloc[-1] for i in range(len(viz_df))]
    cop_file.set_index('Time', inplace = True)
    #Assigning to each frame time the mean COPX and COPY over its window of RAWDATA.csv samples 
    treadmill_COP_x = [cop_file.loc[i:j]['COPX'].mean() for i, j in zip(cop_closest_times_left_bound, cop_closest_times_right_bound)]
    treadmill_COP_y = [cop_file.loc[i:j]['COPY'].mean() for i, j in zip(cop_closest_times_left_bound, cop_closest_times_right_bound)]
    return treadmill_COP_x, treadmill_COP_y
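
The windowed averaging above can be illustrated on synthetic data. The sketch below uses `np.searchsorted` instead of boolean masks to locate the same open interval (frame time - 1/60, frame time + 1/60). The function name `mean_cop_in_window`, the synthetic 500 Hz signal, and the frame times are all assumptions made for the illustration.

```python
import numpy as np
import pandas as pd

def mean_cop_in_window(cop, frame_times, half_width=1/60):
    """cop: DataFrame with a sorted 'Time' column plus 'COPX' and 'COPY'.
    Averages the samples with frame_time - half_width < Time < frame_time + half_width."""
    t = cop['Time'].to_numpy()
    frame_times = np.asarray(frame_times, dtype=float)
    # First sample strictly after the lower bound, first sample at or after the upper bound
    lo = np.searchsorted(t, frame_times - half_width, side='right')
    hi = np.searchsorted(t, frame_times + half_width, side='left')
    rows = [cop.iloc[i:j][['COPX', 'COPY']].mean().to_numpy() for i, j in zip(lo, hi)]
    return pd.DataFrame(rows, columns=['COPX', 'COPY'])

# Synthetic treadmill signal sampled every 0.002 s: COPX grows linearly with time, COPY is constant
time_axis = np.arange(500) / 500
toy_cop = pd.DataFrame({'Time': time_axis, 'COPX': time_axis, 'COPY': 2.0})
averaged = mean_cop_in_window(toy_cop, frame_times=[0.1, 0.5])
print(averaged)
```

Because COPX equals time in this toy signal and each window is symmetric around the frame time, the averaged COPX comes out close to the frame time itself, which is a quick sanity check on the window bounds.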

In [46]:
def fill_up_support_types(viz_df):
    '''
    Filling up the support types
    '''
    #Marking the frame numbers of frames for HSR, TOL, HSL and TOR events 
    HSR_frames = viz_df[viz_df['EventType'] == 'HSR'].index
    TOL_frames = viz_df[viz_df['EventType'] == 'TOL'].index
    HSL_frames = viz_df[viz_df['EventType'] == 'HSL'].index
    TOR_frames = viz_df[viz_df['EventType'] == 'TOR'].index

    '''For our complete strides with the event sequence HSR-TOL-MidSSR-HSL-TOR-MidSSL-next stride's HSR, we compute the 
    initial double support as the frames from HSR (included) to TOL (not included), right single support as the frames from TOL (included) 
    to HSL (not included), terminal double support as the frames from HSL (included) to TOR (not included), and left single support as the 
    frames from TOR (included) to the next HSR (not included). 
    The left end of each interval is included and the right end is not because these events are typically treated as the 
    boundaries of the different states, so double support is HSR_time < time < TOL_time 
    and, likewise, right single support is TOL_time < time < HSL_time. 
    The frame number we assigned to each event is the float frame number rounded up to the next integer. 
    So, under the HSR_time < time < TOL_time rule, if frame 231 is the TOL frame, it should not belong to 
    double support but to right SS. 
    Hence, in our case, double support frames are HSR_frame<=frames<TOL_frame, right single support frames are TOL_frame<=frames<HSL_frame,
    and so on.
    '''
    #Initial double support
    initial_double_support_indices = [viz_df.loc[HSR_frames[i]:TOL_frames[i]-1].index.values for i in range(len(HSR_frames))]
    #-1 from the right limit of the interval to make sure we do not include frame of TOL to the initial double support frames list
    initial_double_support_list = np.concatenate(initial_double_support_indices).ravel().tolist()

    #Right single support 
    right_single_support_indices = [viz_df.loc[TOL_frames[i]:HSL_frames[i]-1].index.values for i in range(len(TOL_frames))]
    #-1 from the right limit of the interval to make sure we do not include frame of HSL to the right single support frames list
    right_single_support_list = np.concatenate(right_single_support_indices).ravel().tolist()

    #Terminal double support
    terminal_double_support_indices = [viz_df.loc[HSL_frames[i]:TOR_frames[i]-1].index.values for i in range(len(HSL_frames))]
    terminal_double_support_list = np.concatenate(terminal_double_support_indices).ravel().tolist()

    #Left single support
    left_single_support_indices = [viz_df.loc[TOR_frames[i]:HSR_frames[i+1]-1].index.values for i in range(len(HSR_frames)-1)]
    left_single_support_list = np.concatenate(left_single_support_indices).ravel().tolist()

    return initial_double_support_indices, initial_double_support_list, right_single_support_indices, right_single_support_list, \
            terminal_double_support_indices, terminal_double_support_list, left_single_support_indices, left_single_support_list
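
The half-open interval rule described in the docstring can be sketched on a single stride. In the toy events below, the first five frame numbers follow the `GVS_404_W_T1` output shown later in this notebook, while the MidSSL and next-HSR frames are invented. `label_support` is a hypothetical helper that assumes consecutive strides and is not part of the pipeline.

```python
import pandas as pd

def label_support(events):
    """events: list of (frame_number, event_type) sorted by frame number.
    Returns a Series mapping each frame to the support phase it falls in."""
    phase_after = {'HSR': 'initial DS', 'TOL': 'right SS',
                   'HSL': 'terminal DS', 'TOR': 'left SS'}
    # Only HSR/TOL/HSL/TOR act as phase boundaries; MidSSR/MidSSL are ignored
    boundaries = [(frame, event) for frame, event in events if event in phase_after]
    labels = {}
    for (start, event), (end, _) in zip(boundaries[:-1], boundaries[1:]):
        # Left boundary included, right boundary excluded
        for frame in range(start, end):
            labels[frame] = phase_after[event]
    return pd.Series(labels, name='support_type')

toy_events = [(1, 'HSR'), (9, 'TOL'), (12, 'MidSSR'), (15, 'HSL'),
              (24, 'TOR'), (27, 'MidSSL'), (31, 'HSR')]  # last two frames are invented
support = label_support(toy_events)
print(support.value_counts())
```

Frame 9 (the TOL frame) is labelled `right SS` rather than `initial DS`, which is the behaviour the docstring argues for.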

In [47]:
def handle_non_consequetive_strides(indices_retain, video_csv_retain, viz_df):
    #Handling non-consecutive strides 
    #Using the indices of the complete strides to infer the MidSSL position of each stride that is not followed by a consecutive stride
    non_consequetive_stride_MidSSLs = np.where(np.array(list(map(operator.sub, indices_retain[1:], indices_retain[:-1])))!=1)[0]
    print ('No. of non-consequetive strides: ', len(non_consequetive_stride_MidSSLs))
    #Looping through the MidSSL of each non-consecutive stride
    for non_consequetive_stride_MidSSL in non_consequetive_stride_MidSSLs:
        #Getting the frame number for TOR and HSR (since for non-consecutive strides, the left single support from 
        #the current stride's TOR to the next stride's HSR is invalid!)
        non_consequetive_stride_TOR = video_csv_retain.iloc[non_consequetive_stride_MidSSL-1].frame_number
        non_consequetive_stride_HSR = video_csv_retain.iloc[non_consequetive_stride_MidSSL+1].frame_number
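        #Changing the support type of all the frames between TOR (including) and HSR (not including) to the 'non_consequetive_stride' keyword
        #Using a single .loc[rows, column] call so that the assignment modifies viz_df itself rather than a temporary copy
        viz_df.loc[non_consequetive_stride_TOR:non_consequetive_stride_HSR-1, 'support_type'] = 'non_consequetive_stride'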
    return viz_df
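
The gap detection used above can be checked on toy indices. The sketch below uses `np.diff`, which gives the same differences as the `map(operator.sub, ...)` expression, and then relabels a frame range with a single `.loc[rows, column]` call, the form the pandas documentation recommends for assignment because chained indexing may write to a temporary copy. The retained indices and frame numbers are invented.

```python
import numpy as np
import pandas as pd

# Retained row labels of complete strides: one stride (0-5), a gap, then two strides (12-23)
indices_retain = list(range(0, 6)) + list(range(12, 24))

# Positions whose successor is not the next integer, i.e. the MidSSL of a stride
# that is not directly followed by the next retained stride
gap_positions = np.where(np.diff(indices_retain) != 1)[0]
print(gap_positions)

# Toy frame-wise dataframe indexed by frame number
toy_viz_df = pd.DataFrame({'support_type': 'left SS'}, index=range(1, 41))
tor_frame, next_hsr_frame = 24, 31  # invented frame numbers
# .loc slicing is label-based and inclusive, hence the -1 on the right bound
toy_viz_df.loc[tor_frame:next_hsr_frame - 1, 'support_type'] = 'non_consequetive_stride'
```

Position 5 is the MidSSL of the first stride, so position 4 is its TOR and position 6 is the HSR of the next retained stride, which is how the function above locates the two frame numbers.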

In [48]:
def generate_data_grouped_by_support_type(indices, coordinate_cols, viz_df):
    #Starting frame number of each group of the given support type 
    frame_start = [index_list[0] for index_list in indices]
    #Ending frame number of each group of the given support type
    frame_end = [index_list[-1] for index_list in indices]
    support_type = [viz_df.loc[index_list[0]].support_type for index_list in indices]
    coordinate_cols_mean = [viz_df[coordinate_cols].loc[index_list].mean().values for index_list in indices]
    treadmill_cop_mean = [viz_df[['treadmill_COP_x',  'treadmill_COP_y']].loc[index_list].mean().values for index_list in indices]

    time_start = viz_df['Time'].loc[frame_start].values
    time_end = viz_df['Time'].loc[frame_end].values

    x = [sum(list([ [time_start[i]], [time_end[i]], [frame_start[i]], [frame_end[i]], [support_type[i]], list(coordinate_cols_mean[i]), list(treadmill_cop_mean[i])]), []) \
         for i in range(len(frame_start))]
    return pd.DataFrame(x)

In [63]:
def generate_frame_and_support_group_dataframes(index, viz_df_column_names):
    video = index['video']
    cohort = index['cohort']
    trial = trial_dict[index['trial']]
    
    #For each video, we will create a dataframe and a csv file corresponding to frame coordinates and treadmill CoP
    viz_df = pd.DataFrame(columns = viz_df_column_names)
    #Retaining the complete strides only from the synced treadmill six event times and video frames 
    indices_retain, video_csv_retain = clean_sync_files(video)
    
    #Filling up the event type, frame number and time columns
    viz_df[['Time', 'EventType', 'frame_number']] = copy.deepcopy(video_csv_retain)
    viz_df.set_index('frame_number', inplace=True)
    #Listing all the frame numbers from the first HSR to the last MidSSL for viz_df 
    display(viz_df.head(), viz_df.shape)
    print (np.arange(min(viz_df.index), max(viz_df.index)+1))
    viz_df = viz_df.reindex(np.arange(min(viz_df.index), max(viz_df.index)+1))
    display(viz_df.head())
    #Filling up the time using interpolation (this will indeed follow that each frame is 1/30 seconds apart since our FPS=30)
    viz_df.interpolate(method = 'index', inplace= True)
    
    #Filling up the toe and heel coordinates in cm and confidence scores in the viz dataframe for the current video
    #To fill up the feet coordinates for each video, setting the path for body coordinate files 
    try: 
        coordinate_path = frame_path+ cohort + '\\' + trial + '\\' + video + '\\hip_height_normalized\\'
    except: #For ExtraHOA files 
        coordinate_path = frame_path+ 'ExtraHOA' + '\\' + trial + '\\' + video + '\\hip_height_normalized\\'
    viz_df = fill_up_body_coordinates(viz_df, coordinate_path, coords_of_interest, coordinate_cols)

    #Filling up the treadmill CoP values 
    #Note that the video extracted coordinates are in 'cm', but the treadmill's CoP are in 'm'
    cop_path = cop_treadmill_path + cohort + '\\' + trial + '\\' + video  + '_RAWDATA.csv'
    treadmill_COP_x, treadmill_COP_y = fill_up_treadmill_COP_values(viz_df, cop_path) 
    viz_df['treadmill_COP_x'] = treadmill_COP_x
    viz_df['treadmill_COP_y'] = treadmill_COP_y
        
    #Filling up the support types 
    initial_double_support_indices, initial_double_support_list, right_single_support_indices, right_single_support_list, \
    terminal_double_support_indices, terminal_double_support_list, left_single_support_indices, left_single_support_list = fill_up_support_types(viz_df)

    #For frame numbers corresponding to each support group, assigning the relative label to 'support_type' column in the viz dataframe
    viz_df.loc[initial_double_support_list, 'support_type'] = 'initial DS'
    viz_df.loc[right_single_support_list, 'support_type'] = 'right SS'
    viz_df.loc[terminal_double_support_list, 'support_type'] = 'terminal DS'
    viz_df.loc[left_single_support_list, 'support_type'] = 'left SS'
    
    #Since body coordinates are recorded in 'cm', but treadmill CoP in 'm', converting CoP values to 'cm'
    viz_df['treadmill_COP_x'] = 100*viz_df['treadmill_COP_x']
    viz_df['treadmill_COP_y'] = 100*viz_df['treadmill_COP_y']
    
    #Handling non consequetive strides 
    viz_df = handle_non_consequetive_strides(indices_retain, video_csv_retain, viz_df)
    
    #Saving 2 dataframes: one frame wise for each video, and one group wise with 4 groups per stride of the video, namely 
    #initial/terminal double support and left/right single support 
    #Ideally, we should use the frame wise computed coordinates and CoPs for visualization purposes and the support group wise 
    #(basically, averaged over all the frames per support group) computed coordinates and CoPs for validation purposes. 
    '''
    You need the average COP and frame coordinates because you are trying to minimize noise. 
    You are trying to average across the few frames within each event to get an estimate of where the average COP position is.
    '''
    #New dataframe which contains coordinates and treadmill CoPs grouped by support type, i.e. each stride of the video has only 4 entries,
    #for left/right SS and initial/terminal DS
    #Columns for the new reduced dataframe are: frame_number_start, frame_number_end, time_start, time_end, support type, average of all
    #coordinates, confidences and treadmill CoP coordinates

    #Initial double support 
    initial_double_support_grouped_data = generate_data_grouped_by_support_type(initial_double_support_indices, coordinate_cols, viz_df)
    #Right single support 
    right_single_support_grouped_data = generate_data_grouped_by_support_type(right_single_support_indices, coordinate_cols, viz_df)
    #Terminal double support 
    terminal_double_support_grouped_data = generate_data_grouped_by_support_type(terminal_double_support_indices, coordinate_cols, viz_df)
    #Left single support 
    left_single_support_grouped_data = generate_data_grouped_by_support_type(left_single_support_indices, coordinate_cols, viz_df)

    #Concatenating all the four support type groups 
    viz_df_grouped_by_support_type = pd.concat((initial_double_support_grouped_data, right_single_support_grouped_data, terminal_double_support_grouped_data, \
               left_single_support_grouped_data), ignore_index=True)
    viz_df_grouped_by_support_type.columns = viz_df_grouped_by_support_type_column_names
    
    #Sorting by starting frame number to arrange in correct strides order 
    viz_df_grouped_by_support_type = viz_df_grouped_by_support_type.sort_values(by = 'frame_number_start')
    viz_df_grouped_by_support_type.reset_index(inplace = True)
    viz_df_grouped_by_support_type.drop('index', axis = 1, inplace = True)
    
    viz_df.reset_index(inplace = True)
    
    #Saving both the frame wise and support group wise dataframes to .csvs
    viz_df.to_csv(path_viz_dataframes+video+'.csv')
    viz_df_grouped_by_support_type.to_csv(path_viz_dataframes+video+'_grouped_by_support_type.csv')
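
The reindex and time interpolation step inside `generate_frame_and_support_group_dataframes` can be illustrated on a toy event table. The sketch assumes unique frame numbers and interpolates only the numeric `Time` column, which keeps the example independent of how `interpolate` treats non-numeric columns. The event times and frame numbers are invented.

```python
import numpy as np
import pandas as pd

# Toy synced events: one row per event, indexed by frame number
events = pd.DataFrame(
    {'Time': [9.0, 9.3, 9.5], 'EventType': ['HSR', 'TOL', 'HSL']},
    index=pd.Index([1, 10, 16], name='frame_number'),
)

# One row per frame between the first and last event; new rows are NaN
full = events.reindex(np.arange(events.index.min(), events.index.max() + 1))

# Linear interpolation of time against the frame number itself
full['Time'] = full['Time'].interpolate(method='index')
print(full.head())
```

Frames between two events get evenly spaced times, and `EventType` stays NaN for every frame that is not itself an event.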

#### main()

In [64]:
sync_log = pd.read_csv(sync_log_file, index_col = 0)
print ('Total video files: ', sync_log.shape[0])

#Reducing only to video files for which treadmill sync exists 
#(sync time was present in the logs and treadmill software identified valid events)
sync_log = sync_log[sync_log['Sync']=='Exists']
print ('Total video files for which treadmill sync exists (sync time was present in the logs and treadmill software identified valid events): '\
       , sync_log.shape[0])

'''
Columns for the visualization dataframe for each video 
Support type is left single support, right single support, double support or NaN for the left single support when 
the strides are not consecutive 
'HSR' - 'TOL': Initial double support 
'TOL' - 'MidSSR' - 'HSL': Right single support
'HSL' - 'TOR': Terminal double support
'TOR' - 'MidSSL' - Next 'HSR': Left single support (only when the strides are consecutive, else NaN)
Extracted (x, y) for real world 3D coordinates and OpenPose confidence scores for the feet
Treadmill's COP_x, COP_y for validation 
'''
#We are only extracting feet coordinates to draw center of mass trajectory from the body coordinate files
coords_of_interest = ['left toe 1', 'left toe 2', 'left heel', 'right toe 1', 'right toe 2', 'right heel']

coordinate_cols = ['left toe 1-x', 'left toe 1-y', 'left toe 1-conf', 'left toe 2-x', \
                      'left toe 2-y', 'left toe 2-conf', 'left heel-x', 'left heel-y', 'left heel-conf', 'right toe 1-x', 'right toe 1-y',\
                       'right toe 1-conf', 'right toe 2-x', 'right toe 2-y', 'right toe 2-conf', 'right heel-x', 'right heel-y', \
                       'right heel-conf']
viz_df_column_names = ['Time', 'EventType', 'frame_number', 'support_type'] + coordinate_cols + ['treadmill_COP_x',  'treadmill_COP_y']

#Column names for dataframe grouped by support types 
viz_df_grouped_by_support_type_column_names = ['Time_start', 'Time_end', 'frame_number_start', 'frame_number_end', 'support_type'] + coordinate_cols + ['treadmill_COP_x',  'treadmill_COP_y']


Total video files:  107
Total video files for which treadmill sync exists (sync time was present in the logs and treadmill software identified valid events):  102


In [65]:
#For each video with treadmill and video sync available 
for idx in range(len(sync_log)):
    start_time = time.time()
    index = sync_log.loc[78] #sync_log.iloc[idx]
    print ('Running', index['video'])
    generate_frame_and_support_group_dataframes(index, viz_df_column_names)
    print (index['video'], 'completed in ', time.time()-start_time, ' seconds.')
    break

Running GVS_404_W_T1


frame_number,Time,EventType,support_type,left toe 1-x,left toe 1-y,left toe 1-conf,left toe 2-x,left toe 2-y,left toe 2-conf,left heel-x,...,right toe 1-y,right toe 1-conf,right toe 2-x,right toe 2-y,right toe 2-conf,right heel-x,right heel-y,right heel-conf,treadmill_COP_x,treadmill_COP_y
1,9.00027,HSR,,,,,,,,,...,,,,,,,,,,
9,9.27627,TOL,,,,,,,,,...,,,,,,,,,,
12,9.38027,MidSSR,,,,,,,,,...,,,,,,,,,,
15,9.48427,HSL,,,,,,,,,...,,,,,,,,,,
24,9.77827,TOR,,,,,,,,,...,,,,,,,,,,


(192, 23)

[   1    2    3 ... 1511 1512 1513]


ValueError: cannot reindex from a duplicate axis
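
pandas raises this error when `reindex` is called on an index that contains repeated labels, so at least one frame number must appear more than once in the synced events of this video. One plausible cause, which the output above does not confirm, is two gait events being mapped to the same video frame. A minimal sketch for locating such rows, using an invented toy table in place of `video_csv_retain`:

```python
import pandas as pd

# Toy stand-in for video_csv_retain (invented values): two events share frame 12
toy_sync = pd.DataFrame({
    'Time': [9.00, 9.28, 9.38, 9.40, 9.78],
    'EventType': ['HSR', 'TOL', 'MidSSR', 'HSL', 'TOR'],
    'frame_number': [1, 9, 12, 12, 24],
})

# keep=False marks every occurrence of a repeated frame number, not just the later ones
duplicated_rows = toy_sync[toy_sync['frame_number'].duplicated(keep=False)]
print(duplicated_rows)

# The same condition expressed on the index that reindex would operate on
index_is_unique = toy_sync.set_index('frame_number').index.is_unique
```

Running the same check on the real sync file for `GVS_404_W_T1` would show which events collide. How to resolve them (for example dropping the stride or re-syncing the video) is a study design decision that this sketch does not make.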

In [58]:
sync_log.loc[78]

video                GVS_404_W_T1
Sync                       Exists
total_frame_count            1519
cohort                         PD
trial                           W
scenario                       WT
Name: 78, dtype: object