# Sync activity video clips with accelerometer data

Project status:
- COMPLETE: Get start times for videos using Python
- COMPLETE: Use start/stop frame number and convert to UTC
- IN PROGRESS: deal with cisrol12
- Modify GUI function to use my start/stop times to label data for relevant subjects and cycles below

Notes:
- fps = 29.97 aka Video Frame Rate
- 33.367 milliseconds per frame

In [192]:
# Importing the Libraries
import os
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import re
import datetime as dt
from moviepy.video.io.ffmpeg_tools import ffmpeg_extract_subclip

## RTO drive is X:

## test case video clip to get timestamps


In [9]:
# subject 1050
id = 'cisuabn14'
path = r'X:\CIS-PD Videos'
subj_path = os.path.join(path,id)
video_name = 'cisuabn14_cycle2.mp4'
video_clip_path = os.path.join(subj_path,video_name)

In [10]:
video_clip_path

'X:\\CIS-PD Videos\\cisuabn14\\cisuabn14_cycle2.mp4'

In [11]:
# these give my downloaded time not the actual time
import os.path, time
print("Last modified: %s" % time.ctime(os.path.getmtime(video_clip_path)))
print("Created: %s" % time.ctime(os.path.getctime(video_clip_path)))

Last modified: Thu Jul 12 08:55:07 2018
Created: Thu Jul 12 09:04:26 2018


# Load all sec_annotation.csv files for each subj, concatenate into 1 df

In [272]:
# read in timestamp file with mp4 metadata
path = r'X:\CIS-PD Videos\timestamp'
filename = os.path.join(path, 'video_utc_timestamp.csv')
timestamp_df = pd.read_csv(filename)
timestamp_df = timestamp_df.drop(columns=['Unnamed: 0','videoname','CreateDate','ModifyDate','UTC_modify_date'])

In [273]:
# list of subjects without ciscid4, ciscih8, ciccij10 due to it being edited multiple times
# omitted cisrol12 as it doesn't have a sec_annotation file
names_minus3 = ['cisnwh8','cisuabd4','cisuabe5','cisuabf6','cisuabg7','cisnwe5','cisnwf6','cisuabn14']

# create empty list
appended_data = []

# create 1 dataframe from each subject's sec_annotation.csv file
for i, k in enumerate(names_minus3):
    path = r'X:\CIS-PD Videos'
    path_subj = os.path.join(path,k) 
    path_file = os.path.join(path_subj,'sec_annotation.csv')
    data = pd.read_csv(path_file)
    appended_data.append(data)
    
# concatenate list of dataframes
appended_data = pd.concat(appended_data, ignore_index=True)
appended_data = appended_data.drop(columns=['Unnamed: 0'])

# combine subjid and cycle number column to create a key for merge
# combine strings of both columns
timestamp_df.cycle = timestamp_df.cycle.astype(str)
timestamp_df.cycle = timestamp_df.subjid + timestamp_df.cycle
# drop subjid column
timestamp_df = timestamp_df.drop(columns=['subjid'])
# change name of column
timestamp_df = timestamp_df.rename(index=str,columns={'cycle':'subj_cycle'})

In [274]:
# Combine subject code and cycle column to create a key for merge in appended_data dataframe that
# has the activity clip frame annotations
appended_data.cycle = appended_data.cycle.astype(str)
appended_data['subj_cycle'] = appended_data['subject code'] + appended_data.cycle

# Merge dataframes based on subj_cycle columns in both

In [275]:
utc_df = pd.merge(timestamp_df, appended_data, on='subj_cycle',how='outer')

# Transform start and stop frame with UTC create time

In [276]:
utc_df['start_utc'] = utc_df['start frame']*33.367+utc_df.UTC_create_date
utc_df['stop_utc'] = utc_df['stop frame']*33.367+utc_df.UTC_create_date

# Adjust UAB site data by... 1yr 5 hrs
- UAB subject: 1003, 1005, 1007, 1009, 1050
- cisuabd4 cisuabe5 cisuabf6 cisuabg7 cisuabn14

In [198]:
# millisecond conversions
year = 31556952000
fivehr = 18000000
uab_convertor = year + fivehr

In [277]:
# Add 1 year and 5 hrs to uab subjects
uab_names = ('cisuabd4','cisuabe5','cisuabf6','cisuabg7','cisuabn14')
for i, k in enumerate(uab_names):
    utc_df.loc[utc_df['subject code'] == k, 'start_utc'] += uab_convertor
    utc_df.loc[utc_df['subject code'] == k, 'stop_utc'] += uab_convertor

# Combine NtsBts activity split into cycle 6 part 1 and 2 videos into 1 row

In [278]:
# combine rows 503 and 504
utc_df.stop_utc[503] = utc_df.stop_utc[504]

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  


In [295]:
# drop row and reindex
utc_df = utc_df.drop([504]).reset_index(drop=True)

# Change cycle 7 to 6

In [314]:
utc_df.loc[utc_df.cycle == '7', 'cycle'] = '6'

# temp dataframe

In [385]:
df = utc_df.copy()

# Deal with cisrol12 (1048) separately
- utc_df does NOT have cisrol12 data on it
- Create separate script for cisrol12 since it doesn't have sec_annotation.csv

In [345]:
def keeprightstring(string, sep='cisrol12'):
    """Take a string and keep text after specified character.
    Default character is 'cisrol12'."""
    new_string = string.split(sep, 1)[-1]
    return new_string

In [387]:
# Add necessary data for cisrol12
# subject code
df.loc[df['subj_cycle'].str.contains('cisrol12'), 'subject code'] = 'cisrol12'
# start frame
df.loc[df['subj_cycle'].str.contains('cisrol12'), 'start frame'] = 0
# start_utc = create time
df.loc[df['subj_cycle'].str.contains('cisrol12'), 'start_utc'] = df.UTC_create_date
# cycle
df.loc[df['subj_cycle'].str.contains('cisrol12'), 'cycle'] = df.subj_cycle
for i in range(391,433):
    df.cycle[i] = keeprightstring(df.cycle[i])
# SKIP start time
# SKIP stop time

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  # This is added back by InteractiveShellApp.init_path()


In [389]:
df.loc[df['subj_cycle'].str.contains('cisrol12')].head(5)

Unnamed: 0,UTC_create_date,subj_cycle,subject code,start frame,stop frame,activity,cycle,shortname,start time sec,stop time sec,start_utc,stop_utc,subject_number
391,1505134245000,cisrol121,cisrol12,0.0,,,1,,,,1505134000000.0,,Unknown
392,1505134286000,cisrol121,cisrol12,0.0,,,1,,,,1505134000000.0,,Unknown
393,1505134361000,cisrol121,cisrol12,0.0,,,1,,,,1505134000000.0,,Unknown
394,1505137757000,cisrol122,cisrol12,0.0,,,2,,,,1505138000000.0,,Unknown
395,1505137780000,cisrol122,cisrol12,0.0,,,2,,,,1505138000000.0,,Unknown


# can remove next 2 lines

In [190]:
df['subj_cycle'][391] # - 'cisrol12'

'cisrol121'

In [191]:
df['subj_cycle'][391].replace('cisrol12','')

'1'

# Add column containing 4 digit id
- need to complete cisrol12 data first to add its 4 digit id
- make sure to user 'df' dataframe

In [390]:
# Get subject id or code
path_id = r'X:\CIS-PD MUSC\decoded_forms'
filename_id = os.path.join(path_id, 'videoID.csv') # ie. file = 'videoID.csv'
subjid_df = pd.read_csv(filename_id)
subjid_df.SubjectCode = subjid_df.SubjectCode.astype('int')
# get 4 digit subject code
reverse_id_dict = subjid_df.set_index('FoxInsightID').to_dict()['SubjectCode']

In [397]:
# test
reverse_id_dict.get('cisrol12', 'Unknown')

1048

In [398]:
df['subject_number'] = df['subject code']
for i, k in enumerate(df.subject_number):
    df['subject_number'][i] = reverse_id_dict.get(k,1048)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  This is separate from the ipykernel package so we can avoid doing imports until


In [399]:
df.groupby('subject_number').count()
# unknown is cisrol12

Unnamed: 0_level_0,UTC_create_date,subj_cycle,subject code,start frame,stop frame,activity,cycle,shortname,start time sec,stop time sec,start_utc,stop_utc
subject_number,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
1003,90,90,90,90,90,90,90,90,90,90,90,90
1005,60,60,60,60,60,60,60,60,60,60,60,60
1007,60,60,60,60,60,60,60,60,60,60,60,60
1009,45,45,45,45,45,45,45,45,45,45,45,45
1019,15,15,15,15,15,15,15,15,15,15,15,15
1024,15,15,15,15,15,15,15,15,15,15,15,15
1030,90,90,90,90,90,90,90,90,90,90,90,90
1048,58,58,42,42,0,0,42,0,0,0,42,0
1050,75,75,75,75,75,75,75,75,75,75,75,75


In [400]:
df.head(5)

Unnamed: 0,UTC_create_date,subj_cycle,subject code,start frame,stop frame,activity,cycle,shortname,start time sec,stop time sec,start_utc,stop_utc,subject_number
0,1501681699000,cisnwh81,cisnwh8,1118.0,2145.0,Standing,1,Stndg,37.0,71.0,1501682000000.0,1501682000000.0,1030
1,1501681699000,cisnwh81,cisnwh8,2315.0,3333.0,Walking,1,Wlkg,77.0,111.0,1501682000000.0,1501682000000.0,1030
2,1501681699000,cisnwh81,cisnwh8,3608.0,4800.0,Walking while counting,1,WlkgCnt,120.0,160.0,1501682000000.0,1501682000000.0,1030
3,1501681699000,cisnwh81,cisnwh8,5832.0,6518.0,Finger to nose--right hand,1,FtnR,194.0,217.0,1501682000000.0,1501682000000.0,1030
4,1501681699000,cisnwh81,cisnwh8,6518.0,7121.0,Finger to nose--left hand,1,FtnL,217.0,237.0,1501682000000.0,1501682000000.0,1030


# Adjust cycle for Nick's GUI

In [410]:
# Change cycle dtype from str to integer
df.cycle = pd.to_numeric(df.cycle, downcast='integer')

In [417]:
# Start cycle 1 at 0 for GUI
df.cycle += -1

# Save utc_df as csv file

In [419]:
path = r'X:\CIS-PD Videos\timestamp'
fname = 'GUI_timestamp.csv'
filename = os.path.join(path, fname)
with open(filename,'wb') as f:
    df.to_csv(filename, sep=',')

# for cisrol12, add the following data
################### work on this
# stop frame
# stop_utc = ?
# activity
# shortname

# Subjects 1030, 1019, and 1024 UTC timestamps were not adjusted for any offset from watch timestamp.

# combine cisuabn14 cycle 6 part 1 and 2 ntsbts video

Summary of data that is off (info from watch data)
- 1003 is off by 1 yr and 5 hours and some seconds, cycle 3 was off by an additional 9 min
- 1005 is off by 1 yr 5 hrs and some seconds, cycle 2 is off by an additional 5 min
- 1007 is off by 1 yr 5 hrs and some seconds
- 1009 is off by 1 yr 5 hrs and some seconds
- 1050 is off by 1 yr, 5 hrs and some sec
- 1030 is off by several seconds (usually around 30 sec)
- 1019 is off by 29.5 min
- 1024 is off by 50 sec
- 1048 cycle 1 3.5 hours off, cycle 2 missing, cycle 4 is 4 hrs off, cycle 5 is about 4 hrs off

These videos we suspect editing, so Skip these subjects:
- 1023 is off by 18 days, but the watch shaking time for all cycles the same
- 1039 is off by 2 months, 13 days, and variable time but the watch shaking time for all cycles the same
- 1043 is off by 2 months, 1 day, but  the watch shaking time for all cycles the same