# Instructions

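To train models with this package, your `data/labels.csv` file needs one row per video frame with the columns `video_name, video_frame_filename_with_timestamp, label, split`. In the code below these columns are named `video`, `frame`, `label` and `split`.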
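Before training, a small check can confirm that a labels table has this shape. This is a minimal sketch, not part of the package: the helper name `check_labels_format` is hypothetical, and it assumes the column names `video`, `frame`, `label` and `split` used by `add_splits_to_labels_file` in this notebook, with split values `train`, `valid` and `test`.

```python
import pandas as pd

# assumed column names and split values (taken from add_splits_to_labels_file)
REQUIRED_COLUMNS = ["video", "frame", "label", "split"]
ALLOWED_SPLITS = {"train", "valid", "test"}


def check_labels_format(labels: pd.DataFrame) -> list:
    """Return a list of problems found in a labels table (empty if it looks OK).

    Hypothetical helper for illustration only.
    """
    problems = []
    missing = [c for c in REQUIRED_COLUMNS if c not in labels.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
        return problems
    bad_splits = set(labels["split"].unique()) - ALLOWED_SPLITS
    if bad_splits:
        problems.append(f"unexpected split values: {sorted(map(str, bad_splits))}")
    # splits are made per video, so every video should sit in exactly one split
    splits_per_video = labels.groupby("video")["split"].nunique()
    leaky = splits_per_video[splits_per_video > 1].index.tolist()
    if leaky:
        problems.append(f"videos in more than one split: {leaky}")
    return problems


example = pd.DataFrame({
    "video": ["vid_a", "vid_a", "vid_b"],
    "frame": ["vid_a_0.0.jpg", "vid_a_0.5.jpg", "vid_b_0.0.jpg"],
    "label": ["class_1", "class_2", "class_1"],
    "split": ["train", "train", "valid"],
})
print(check_labels_format(example))  # []
```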

It is likely that your labels start in one of the following two formats:
1. `video_name, video_frame_filename_with_timestamp, label` (one row per frame)
2. `video_name, timestamp_start, timestamp_end, label` (one row per labelled time interval)

If your data is in format 2, first run the helper notebook `notebooks/helper_convert_timestamps_file_to_labels.ipynb` to convert it to format 1.
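The core idea of that conversion can be sketched as an interval join: every frame whose timestamp falls inside a labelled `[timestamp_start, timestamp_end]` range gets that range's label. This is not the helper notebook's code. It assumes you already have a table of extracted frames with a numeric `timestamp` column in the same units as the interval table; the function name `intervals_to_frame_labels` is hypothetical.

```python
import pandas as pd


def intervals_to_frame_labels(intervals: pd.DataFrame, frames: pd.DataFrame) -> pd.DataFrame:
    """Label each frame with the interval that contains its timestamp.

    intervals: columns video, timestamp_start, timestamp_end, label (format 2)
    frames:    columns video, frame, timestamp (assumed: one row per extracted frame)
    Returns columns video, frame, label (format 1). Frames outside every interval
    are dropped; a frame covered by several intervals appears once per match.
    """
    # pair every frame with every interval of the same video ...
    merged = frames.merge(intervals, on="video", how="inner")
    # ... and keep only the pairs where the frame lies inside the interval
    inside = (merged["timestamp"] >= merged["timestamp_start"]) & (
        merged["timestamp"] <= merged["timestamp_end"]
    )
    out = merged.loc[inside, ["video", "frame", "label"]]
    return out.sort_values(["video", "frame"]).reset_index(drop=True)


intervals = pd.DataFrame({
    "video": ["vid_a", "vid_a"],
    "timestamp_start": [0.0, 2.0],
    "timestamp_end": [1.9, 4.0],
    "label": ["class_1", "class_2"],
})
frames = pd.DataFrame({
    "video": ["vid_a"] * 3,
    "frame": ["vid_a_0.0.jpg", "vid_a_1.0.jpg", "vid_a_3.0.jpg"],
    "timestamp": [0.0, 1.0, 3.0],
})
print(intervals_to_frame_labels(intervals, frames))
```

The per-video merge is simple but builds one row per frame/interval pair; for very long videos, `pd.merge_asof` on the interval start time is a more memory-friendly alternative.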

Once your labels file is in format 1, use this notebook to add the `split` column, which assigns each video to the train, validation or test split. You specify lists of video names for validation and test; all remaining videos go to train.

Note that this package assumes train/validation/test splits are made at the video level, so all frames of a video end up in the same split. If you have one very long video, you could first cut it into several shorter videos.
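On the labels side, one way to cut a long video is to give each block of consecutive frames its own video name, so each block can be assigned to a split on its own. A minimal sketch under stated assumptions: frames sort chronologically by filename (e.g. zero-padded timestamps), the block size is fixed, and the `_partK` naming and `chunk_long_video` helper are hypothetical. The frame files on disk would also need to be moved into matching per-video folders, since this notebook discovers videos as sub-directories of `path_data`.

```python
import pandas as pd


def chunk_long_video(labels: pd.DataFrame, video: str, frames_per_chunk: int) -> pd.DataFrame:
    """Rename the frames of `video` to `video_part0`, `video_part1`, ... in blocks
    of `frames_per_chunk` consecutive frames. Other videos are left unchanged.

    Hypothetical helper; assumes filename order equals time order.
    """
    if frames_per_chunk < 1:
        raise ValueError("frames_per_chunk must be at least 1")
    labels = labels.reset_index(drop=True)  # unique index so .loc below is safe
    mask = labels["video"] == video
    # row labels of this video's frames in (assumed chronological) filename order
    sorted_idx = labels.loc[mask].sort_values("frame").index
    parts = [f"{video}_part{i // frames_per_chunk}" for i in range(len(sorted_idx))]
    labels.loc[sorted_idx, "video"] = parts
    return labels


long_labels = pd.DataFrame({
    "video": ["vid_long"] * 5 + ["vid_b"],
    "frame": [f"f{i}.jpg" for i in range(5)] + ["g0.jpg"],
    "label": ["class_1"] * 6,
})
chunked = chunk_long_video(long_labels, "vid_long", frames_per_chunk=2)
print(chunked["video"].tolist())
# ['vid_long_part0', 'vid_long_part0', 'vid_long_part1', 'vid_long_part1', 'vid_long_part2', 'vid_b']
```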

# Setup

In [1]:
import os
import sys

import pandas as pd

sys.path.append('..')  # make modules in the repository root importable

In [19]:
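# setup paths (assumes the notebook is run from the repository's notebooks/ folder)
pwd = os.getcwd().replace("notebooks", "")
path_cache = pwd + 'cache/'
# directory holding labels.csv and one sub-directory of frames per video;
# change this to your own data location, e.g. pwd + 'data_cnn_ts_3d/'
path_data = '/media/tiesbarendse/DATA/be_ts/'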

In [8]:
# read video paths: each sub-directory of path_data holds the frames of one video
# (files such as .DS_Store are skipped by the isdir check)
path_videos = [
    os.path.join(path_data, name, '')  # the trailing '' keeps the final '/'
    for name in os.listdir(path_data)
    if os.path.isdir(os.path.join(path_data, name))
]

In [9]:
def add_splits_to_labels_file(vids_valid, vids_test):
    """
    Helper function to add train/valid/test splits to your labels file.

    If your labels file has the columns "video", "frame", "label",
    you can use this function to add a "split" column by specifying,
    as lists of video names, which videos go to valid and which to test.
    All remaining videos go to train.

    Overwrites labels.csv in path_data on disk; only the "video",
    "frame" and "label" columns are kept, plus the new "split" column.

    Sample usage:
        vids_valid = ['vid_a', 'vid_b', 'vid_c']
        vids_test = ['vid_y', 'vid_z']
        add_splits_to_labels_file(vids_valid, vids_test)
    """
    labels = pd.read_csv(path_data + 'labels.csv', usecols=['video', 'frame', 'label'])

    def allocate_set(vid):
        if vid in vids_valid:
            return "valid"
        elif vid in vids_test:
            return "test"
        else:
            return "train"

    # apply split: every frame of a video gets that video's split
    labels['split'] = labels['video'].apply(allocate_set)

    # sort by video, then frame
    labels.sort_values(["video", "frame"], inplace=True)

    # write back to disk (overwrites the existing file)
    labels.to_csv(path_data + 'labels.csv', index=False)

    print(f"Done saving new labels file with splits to {path_data}/labels.csv")
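Because `add_splits_to_labels_file` overwrites the file on disk, it can be handy to try the same allocation on an in-memory DataFrame first. The sketch below mirrors its logic (valid first, then test, everything else train) and additionally rejects a video listed in both lists; the name `assign_splits` and the overlap check are illustrative additions, not part of the original function.

```python
import pandas as pd


def assign_splits(labels: pd.DataFrame, vids_valid, vids_test) -> pd.DataFrame:
    """Return a copy of `labels` with a `split` column, without touching disk."""
    vids_valid, vids_test = set(vids_valid), set(vids_test)
    overlap = vids_valid & vids_test
    if overlap:
        raise ValueError(f"videos listed as both valid and test: {sorted(overlap)}")

    def allocate_set(vid):
        if vid in vids_valid:
            return "valid"
        if vid in vids_test:
            return "test"
        return "train"

    out = labels.copy()
    out["split"] = out["video"].map(allocate_set)
    # same ordering as the file written by add_splits_to_labels_file
    return out.sort_values(["video", "frame"]).reset_index(drop=True)


labels_demo = pd.DataFrame({
    "video": ["vid_c", "vid_a", "vid_b", "vid_a"],
    "frame": ["c0.jpg", "a1.jpg", "b0.jpg", "a0.jpg"],
    "label": ["class_1", "class_2", "class_1", "class_1"],
})
print(assign_splits(labels_demo, vids_valid=["vid_a"], vids_test=["vid_b"]))
```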

# Use the function to add splits to the labels file

> Define the video names for the validation and test sets

In [10]:
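# list the video directories in path_data
# (sorted, because os.listdir order is arbitrary and the split below depends on it)
dirlist = sorted(
    filename for filename in os.listdir(path_data)
    if os.path.isdir(os.path.join(path_data, filename))
)
dirlist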

['trajs_2017-03-01_Ut_3040_door_4',
 'trajs_2017-03-03_Ut_3128_door_4',
 'trajs_2017-03-03_Ut_3132_door_4',
 'trajs_2017-03-05_Ut_3054_door_4',
 'trajs_2017-03-06_Ut_3044_door_4',
 'trajs_2017-03-06_Ut_3124_door_3',
 'trajs_2017-03-06_Ut_3166_door_4',
 'trajs_2017-03-07_Ut_3142_door_3',
 'trajs_2017-03-07_Ut_814_door_3',
 'trajs_2017-03-09_Ut_3052_door_4',
 'trajs_2017-03-09_Ut_3062_door_3',
 'trajs_2017-03-12_Ut_3028_door_3',
 'trajs_2017-03-14_Ut_3156_door_3',
 'trajs_2017-03-16_Ut_3078_door_4',
 'trajs_2017-03-16_Ut_3130_door_4',
 'trajs_2017-03-18_Ut_3150_door_3',
 'trajs_2017-03-20_Ut_3052_door_3',
 'trajs_2017-03-21_Ut_3150_door_3',
 'trajs_2017-03-24_Ut_3156_door_4',
 'trajs_2017-03-29_Ut_3024_door_3',
 'trajs_2017-04-01_Ut_3062_door_3',
 'trajs_2017-04-02_Ut_3050_door_3',
 'trajs_2017-04-04_Ut_3048_door_3',
 'trajs_2017-04-07_Ut_3030_door_4',
 'trajs_2017-04-08_Ut_3024_door_3',
 'trajs_2017-04-08_Ut_3168_door_3',
 'trajs_2017-04-09_Ut_3142_door_4',
 'trajs_2017-04-29_Ut_822_doo

In [15]:
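# first 10 videos for validation, last 10 for test, the rest for train
# (needs at least 20 videos, otherwise the two lists overlap)
vids_valid = dirlist[:10]
vids_test = dirlist[-10:]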
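Slicing `dirlist` puts the first ten directories (the earliest dates, given the listing above) in validation and the last ten in test, which amounts to a time-based split. If you would rather draw the held-out videos at random, a seeded shuffle keeps the choice reproducible. A minimal sketch; `random_video_split` and its parameters are hypothetical, and the demo uses made-up names so it does not overwrite `vids_valid` or `vids_test`.

```python
import random


def random_video_split(videos, n_valid, n_test, seed=0):
    """Shuffle video names with a fixed seed and carve off valid and test lists.

    Names are sorted first so the result does not depend on os.listdir order.
    """
    if n_valid + n_test > len(videos):
        raise ValueError("not enough videos for the requested valid/test sizes")
    shuffled = sorted(videos)  # new list; the input is left untouched
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_valid], shuffled[n_valid:n_valid + n_test]


demo_videos = [f"vid_{i:02d}" for i in range(30)]
demo_valid, demo_test = random_video_split(demo_videos, n_valid=10, n_test=10, seed=42)
print(len(demo_valid), len(demo_test), set(demo_valid) & set(demo_test))  # 10 10 set()
```

To use it here, you could call `random_video_split(dirlist, 10, 10)` in place of the slicing.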

In [16]:
len(vids_test)

10

In [17]:
vids_test

['trajs_2017-05-10_Ut_3064_door_3',
 'trajs_2017-05-13_Ut_3162_door_4',
 'trajs_2017-05-15_Ut_3036_door_3',
 'trajs_2017-05-15_Ut_3048_door_4',
 'trajs_2017-05-17_Ut_3066_door_4',
 'trajs_2017-05-22_Ut_3074_door_4',
 'trajs_2017-05-26_Ut_3034_door_4',
 'trajs_2017-05-26_Ut_3044_door_4',
 'trajs_2017-05-31_Ut_3056_door_3',
 'trajs_2017-05-31_Ut_3116_door_4']

In [20]:
add_splits_to_labels_file(vids_valid, vids_test)

Done saving new labels file with splits to /media/tiesbarendse/DATA/be_ts//labels.csv
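To confirm the result, you can read the file back and count videos and frames per split. A minimal sketch: `summarize_splits` is a hypothetical helper, and in this notebook you would pass it `pd.read_csv(path_data + 'labels.csv')`; the demo below uses a small made-up table instead.

```python
import pandas as pd


def summarize_splits(labels: pd.DataFrame) -> pd.DataFrame:
    """Number of distinct videos and of (non-null) frames in each split."""
    return labels.groupby("split").agg(
        n_videos=("video", "nunique"),
        n_frames=("frame", "count"),
    )


labels_check = pd.DataFrame({
    "video": ["vid_a", "vid_a", "vid_b", "vid_c"],
    "frame": ["a0.jpg", "a1.jpg", "b0.jpg", "c0.jpg"],
    "label": ["class_1"] * 4,
    "split": ["valid", "valid", "test", "train"],
})
print(summarize_splits(labels_check))
```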
