# Instructions

To train models with this package, your `data/labels.csv` file needs the columns `video_name, video_frame_filename_with_timestamp, label, split`.

Your labels will likely start in one of the following two formats:
1. `video_name, video_frame_filename_with_timestamp, label`
2. `video_name, timestamp_start, timestamp_end, label`

If your data is in format 2, then run the helper notebook in `notebooks/helper_convert_timestamps_file_to_labels.ipynb` to convert it to format 1.

Once your labels file is in format 1, use this notebook to add the `split` column, which allocates each video to the train, test, or validation split based on a list of video names per split.

Note that this package assumes the train/valid/test split is made at the video level, so all frames of a video belong to the same split. If you have one very long video, you could cut it into several smaller videos first.
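
Once a `split` column exists, one way to check the video-level assumption is to confirm that no video name maps to more than one split. The sketch below uses made-up toy data with the column names from the format above; `videos_in_multiple_splits` is a hypothetical helper, not part of the package.

```python
import pandas as pd

def videos_in_multiple_splits(df, video_col="video_name", split_col="split"):
    """Return the video names that appear in more than one split."""
    splits_per_video = df.groupby(video_col)[split_col].nunique()
    return sorted(splits_per_video[splits_per_video > 1].index)

# Toy data (made up for illustration): video "b" leaks across train and test.
toy = pd.DataFrame({
    "video_name": ["a", "a", "b", "b", "c"],
    "split": ["train", "train", "train", "test", "valid"],
})
leaks = videos_in_multiple_splits(toy)
print(leaks)
```

An empty list means every video sits in exactly one split.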

# Setup

In [19]:
import os
import pandas as pd
import numpy as np
import json

import cv2
from time import time as timer
import sys
sys.path.append('..')

In [20]:
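# Set up paths (assumes the notebook is run from the `notebooks/` directory)
pwd = os.getcwd().replace("notebooks","")
path_cache = pwd + 'cache/'
# Machine-specific data directory; adjust this to your own data location
path_data = '/media/tiesbarendse/DATA/be_ts_k_3350_n_200_0404231811' #pwd + 'data_cnn_ts_3d/'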

In [21]:
labels_df = pd.read_csv(path_data + '/labels.csv')

In [22]:
labels_df

Unnamed: 0,boarding_event,frame_index,label,t
0,trajs_2018-03-11_Ut_3048_door_3,0,pre-deboarding,
1,trajs_2018-03-11_Ut_3048_door_3,1,pre-deboarding,
2,trajs_2018-03-11_Ut_3048_door_3,2,pre-deboarding,
3,trajs_2018-03-11_Ut_3048_door_3,3,pre-deboarding,
4,trajs_2018-03-11_Ut_3048_door_3,4,pre-deboarding,
...,...,...,...,...
669995,trajs_2017-05-23_Ut_3052_door_3,195,post-boarding,
669996,trajs_2017-05-23_Ut_3052_door_3,196,post-boarding,
669997,trajs_2017-05-23_Ut_3052_door_3,197,post-boarding,
669998,trajs_2017-05-23_Ut_3052_door_3,198,post-boarding,
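
The `Unnamed: 0` column in the output above is what pandas typically produces when a CSV was written with its index and then read back without `index_col`. A minimal sketch of that round trip, using a made-up two-row frame rather than the real labels file:

```python
import io
import pandas as pd

toy = pd.DataFrame({"label": ["pre-deboarding", "post-boarding"]})

# Writing with the default index=True stores the index as an unnamed first column.
buffer = io.StringIO()
toy.to_csv(buffer)

buffer.seek(0)
with_extra = pd.read_csv(buffer)                  # index comes back as 'Unnamed: 0'
buffer.seek(0)
without_extra = pd.read_csv(buffer, index_col=0)  # index restored, no extra column

print(list(with_extra.columns), list(without_extra.columns))
```

Whether to drop the column here is a judgment call; it is kept in this notebook so the saved file matches the input.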


In [23]:
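# Boarding-event names: files in path_data starting with 'trajs', with the last 4 characters (the file extension) stripped.
# sorted() keeps the order, and hence the split below, reproducible, since os.listdir returns names in arbitrary order.
bes = sorted(filename[:-4] for filename in os.listdir(path_data) if filename.startswith('trajs'))
bes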

['trajs_2017-03-01_Ut_3020_door_3',
 'trajs_2017-03-01_Ut_3024_door_3',
 'trajs_2017-03-01_Ut_3026_door_3',
 'trajs_2017-03-01_Ut_3028_door_4',
 'trajs_2017-03-01_Ut_3034_door_4',
 'trajs_2017-03-01_Ut_3040_door_4',
 'trajs_2017-03-01_Ut_3048_door_4',
 'trajs_2017-03-01_Ut_3118_door_3',
 'trajs_2017-03-01_Ut_3154_door_3',
 'trajs_2017-03-02_Ut_3042_door_4',
 'trajs_2017-03-02_Ut_3044_door_4',
 'trajs_2017-03-02_Ut_3048_door_3',
 'trajs_2017-03-02_Ut_3050_door_3',
 'trajs_2017-03-02_Ut_3056_door_3',
 'trajs_2017-03-02_Ut_3070_door_3',
 'trajs_2018-01-16_Ut_3042_door_3',
 'trajs_2018-01-16_Ut_3048_door_4',
 'trajs_2018-01-16_Ut_3064_door_4',
 'trajs_2018-01-16_Ut_3078_door_3',
 'trajs_2018-01-16_Ut_3122_door_3',
 'trajs_2018-01-16_Ut_3128_door_3',
 'trajs_2018-01-16_Ut_3130_door_3',
 'trajs_2018-01-16_Ut_3142_door_4',
 'trajs_2018-01-17_Ut_3028_door_3',
 'trajs_2018-01-17_Ut_3034_door_4',
 'trajs_2018-01-17_Ut_3060_door_3',
 'trajs_2018-01-17_Ut_3080_door_3',
 'trajs_2018-01-17_Ut_3118_d

In [24]:
# Hold out 20% of the boarding events for test and 20% for validation; the rest is train
n = len(bes)
test_n = int(np.floor(0.2 * n))
val_n = int(np.floor(0.2 * n))
train_n = n - test_n - val_n

In [25]:
# Test takes the first test_n names, validation the last val_n, train everything in between.
# Slicing from n - val_n (instead of -val_n) stays correct when val_n is 0.
test_bes = bes[:test_n]
val_bes = bes[n - val_n:]
train_bes = bes[test_n:n - val_n]
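
The slicing above assigns splits by position in `bes`, so if the list is in name order, the test set gets the earliest-named events and the validation set the latest. If a random allocation is preferred, one option (an alternative, not what this notebook does) is to shuffle the names with a fixed seed before slicing. `split_names` and its parameters are hypothetical.

```python
import numpy as np

def split_names(names, test_frac=0.2, val_frac=0.2, seed=0):
    """Shuffle names reproducibly and slice them into train/test/val lists."""
    rng = np.random.default_rng(seed)
    shuffled = [names[i] for i in rng.permutation(len(names))]
    n = len(shuffled)
    test_n = int(np.floor(test_frac * n))
    val_n = int(np.floor(val_frac * n))
    test = shuffled[:test_n]
    val = shuffled[test_n:test_n + val_n]
    train = shuffled[test_n + val_n:]
    return train, test, val

# Made-up names for illustration only.
names = [f"video_{i:02d}" for i in range(10)]
train, test, val = split_names(names)
print(len(train), len(test), len(val))
```

Fixing the seed keeps the allocation stable between runs, which matters if the split file is regenerated later.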

In [26]:
# Index by boarding event so all rows of an event can be labeled at once
labels_df.set_index('boarding_event', inplace=True)

for be_name in bes:
    if be_name in train_bes:
        labels_df.loc[be_name, 'split'] = 'train'
    elif be_name in test_bes:
        labels_df.loc[be_name, 'split'] = 'test'
    elif be_name in val_bes:
        labels_df.loc[be_name, 'split'] = 'val'

# Restore 'boarding_event' as a regular column
labels_df.reset_index(drop=False, inplace=True)
labels_df

Unnamed: 0,boarding_event,frame_index,label,t,split
0,trajs_2018-03-11_Ut_3048_door_3,0,pre-deboarding,,val
1,trajs_2018-03-11_Ut_3048_door_3,1,pre-deboarding,,val
2,trajs_2018-03-11_Ut_3048_door_3,2,pre-deboarding,,val
3,trajs_2018-03-11_Ut_3048_door_3,3,pre-deboarding,,val
4,trajs_2018-03-11_Ut_3048_door_3,4,pre-deboarding,,val
...,...,...,...,...,...
669995,trajs_2017-05-23_Ut_3052_door_3,195,post-boarding,,train
669996,trajs_2017-05-23_Ut_3052_door_3,196,post-boarding,,train
669997,trajs_2017-05-23_Ut_3052_door_3,197,post-boarding,,train
669998,trajs_2017-05-23_Ut_3052_door_3,198,post-boarding,,train
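
The same assignment can be sketched without a loop by building a name-to-split dictionary and mapping it over the name column. Counting distinct names per split then gives a quick sanity check. The toy data below is made up, and the column names follow this notebook's `labels_df`.

```python
import pandas as pd

toy = pd.DataFrame({
    "boarding_event": ["e1", "e1", "e2", "e3", "e3"],
    "frame_index": [0, 1, 0, 0, 1],
})
train_bes, test_bes, val_bes = ["e1"], ["e2"], ["e3"]

# One dictionary entry per event name.
split_of = {name: "train" for name in train_bes}
split_of.update({name: "test" for name in test_bes})
split_of.update({name: "val" for name in val_bes})

# Names missing from split_of are left as NaN.
toy["split"] = toy["boarding_event"].map(split_of)

events_per_split = toy.groupby("split")["boarding_event"].nunique().to_dict()
print(events_per_split)
```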


In [27]:
# Save under a new name so the original labels.csv is left untouched
labels_df.to_csv(path_data + '/labels_split_complete.csv', index=False)