# Instructions

To train models with this package, your `data/labels.csv` file must be in the format `video_name, video_frame_filename_with_timestamp, label, split`.

Your labels most likely start in one of the following two formats:
1. `video_name, video_frame_filename_with_timestamp, label`
2. `video_name, timestamp_start, timestamp_end, label`

If your data is in format 2, run the helper notebook `notebooks/helper_convert_timestamps_file_to_labels.ipynb` to convert it to format 1.
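The helper notebook itself is not reproduced here. As a rough sketch of the idea behind the conversion, the example below assigns each frame the label of the time range it falls into. It assumes a hypothetical `frames` table with a numeric `timestamp` column (in practice the timestamp would have to be parsed from the frame filename), and the function name, file names and labels are made up for illustration.

```python
import pandas as pd


def label_frames_from_ranges(frames, ranges):
    """Assign each frame the label of the time range it falls into.

    frames: DataFrame with columns video, frame, timestamp (hypothetical layout)
    ranges: DataFrame with columns video, timestamp_start, timestamp_end, label
    Frames that fall outside every range are dropped.
    """
    # pair every frame with every labelled range of the same video
    merged = frames.merge(ranges, on="video", how="inner")
    # keep only the pairs where the frame time lies inside the range (inclusive)
    inside = merged["timestamp"].between(
        merged["timestamp_start"], merged["timestamp_end"]
    )
    result = merged.loc[inside, ["video", "frame", "label"]]
    return result.sort_values(["video", "frame"]).reset_index(drop=True)


# toy data, invented for this example
frames = pd.DataFrame({
    "video": ["vid_a"] * 4,
    "frame": ["f0.jpg", "f1.jpg", "f2.jpg", "f3.jpg"],
    "timestamp": [0.0, 1.0, 2.0, 3.0],
})
ranges = pd.DataFrame({
    "video": ["vid_a", "vid_a"],
    "timestamp_start": [0.0, 2.0],
    "timestamp_end": [1.0, 3.0],
    "label": ["open", "closed"],
})
format_1 = label_frames_from_ranges(frames, ranges)
```

The merge builds every frame/range pair per video, which is simple but can get large for long videos with many ranges.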

Once your labels file is in format 1, use this notebook to add the `split` column. You specify a list of video names for the validation split and a list for the test split; all remaining videos are allocated to train.

Note: this package assumes the train/valid/test split is made at the video level, so all frames of a video belong to the same split. If you have one very long video, you could cut it up into several smaller videos first.
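One way to confirm that a labels table respects the video-level assumption is to count how many distinct splits each video appears in. This is a minimal sketch on toy data; the function name `videos_in_multiple_splits` is hypothetical and not part of the package, and the column names `video` and `split` are assumed to match the labels file used in this notebook.

```python
import pandas as pd


def videos_in_multiple_splits(labels):
    """Return the names of videos whose frames appear in more than one split."""
    splits_per_video = labels.groupby("video")["split"].nunique()
    return sorted(splits_per_video[splits_per_video > 1].index)


# toy data: vid_b is (wrongly) spread over two splits
labels = pd.DataFrame({
    "video": ["vid_a", "vid_a", "vid_b", "vid_b"],
    "frame": ["f0.jpg", "f1.jpg", "f0.jpg", "f1.jpg"],
    "label": ["x", "y", "x", "y"],
    "split": ["train", "train", "valid", "test"],
})
leaky = videos_in_multiple_splits(labels)
```

An empty list means every video sits in exactly one split.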

# Setup

In [1]:
import os
import pandas as pd
import numpy as np
import json

import cv2
from time import time as timer
import sys
sys.path.append('..')

In [2]:
# setup paths
pwd = os.getcwd().replace("notebooks","")
path_cache = pwd + 'cache/'
path_data = pwd + 'data_cnn_ts_3d/'

In [3]:
# read video paths: one entry per video directory inside path_data
path_videos = [
    os.path.join(path_data, name) + '/'
    for name in os.listdir(path_data)
    if os.path.isdir(os.path.join(path_data, name))
]

In [23]:
def add_splits_to_labels_file(vids_valid, vids_test):
    """
    Helper function to add splits to your labels file.

    If your labels file has the columns "video", "frame", "label",
    you can use this function to add train/valid/test splits by passing
    lists of video names: one for validation and one for test.
    All remaining videos are allocated to train.

    Overwrites the labels file on disk.

    Sample usage:
        vids_valid = ['vid_a', 'vid_b', 'vid_c']
        vids_test = ['vid_y', 'vid_z']
        add_splits_to_labels_file(vids_valid, vids_test)
    """
    labels_path = os.path.join(path_data, 'labels.csv')
    labels = pd.read_csv(labels_path, usecols=['video', 'frame', 'label'])

    def allocate_set(vid):
        if vid in vids_valid:
            return "valid"
        elif vid in vids_test:
            return "test"
        else:
            return "train"

    # apply split
    labels['split'] = labels['video'].apply(allocate_set)

    # sort
    labels.sort_values(["video", "frame"], inplace=True)

    # output as csv
    labels.to_csv(labels_path, index=False)

    print(f"Done saving new labels file with splits to {labels_path}")
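Because the function above overwrites the labels file on disk, it can be convenient to try the split logic on an in-memory table first. The sketch below is a hypothetical variant (`assign_splits` is not part of the package) that applies the same valid/test/train rule to a DataFrame and returns a copy. It also rejects a video listed in both lists, a check the function above does not perform.

```python
import pandas as pd


def assign_splits(labels, vids_valid, vids_test):
    """Return a sorted copy of `labels` with a `split` column added."""
    overlap = set(vids_valid) & set(vids_test)
    if overlap:
        raise ValueError(f"videos listed as both valid and test: {sorted(overlap)}")

    out = labels.copy()
    out["split"] = "train"  # default for every video not listed below
    out.loc[out["video"].isin(vids_valid), "split"] = "valid"
    out.loc[out["video"].isin(vids_test), "split"] = "test"
    return out.sort_values(["video", "frame"]).reset_index(drop=True)


# toy data, invented for this example
toy = pd.DataFrame({
    "video": ["vid_b", "vid_a", "vid_z", "vid_a"],
    "frame": ["f0.jpg", "f1.jpg", "f0.jpg", "f0.jpg"],
    "label": ["x", "y", "x", "x"],
})
with_splits = assign_splits(toy, vids_valid=["vid_a"], vids_test=["vid_z"])
```

The input table is left unchanged, so nothing is written until you decide to save the result.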

# Use function to add splits to labels file

> Define the video names for the validation and test sets

In [9]:
os.path.isdir(f'{path_data}/labels.csv')

False

In [10]:
dirlist = [filename for filename in os.listdir(path_data) if os.path.isdir(f'{path_data}/{filename}')]
dirlist

['trajs_2017-06-03_Ut_3030_door_4',
 'trajs_2018-05-08_Ut_3038_door_4',
 'trajs_2018-02-06_Ut_3048_door_4',
 'trajs_2018-02-09_Ut_830_door_3',
 'trajs_2018-05-16_Ut_3040_door_3',
 'trajs_2017-12-16_Ut_700852_door_3',
 'trajs_2017-03-08_Ut_3064_door_3',
 'trajs_2017-06-14_Ut_3066_door_3',
 'trajs_2018-05-21_Ut_3072_door_4',
 'trajs_2017-09-28_Ut_3126_door_3']

In [19]:
vids_valid = dirlist[:2]
vids_test = dirlist[9:]

In [20]:
vids_valid

['trajs_2017-06-03_Ut_3030_door_4', 'trajs_2018-05-08_Ut_3038_door_4']

In [21]:
vids_test

['trajs_2017-09-28_Ut_3126_door_3']
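The slices above depend on the order in which `os.listdir` returns the directories, which is arbitrary. If you would rather pick the validation and test videos reproducibly, one option is to sort the names and shuffle them with a fixed seed. This is only a sketch; `pick_split_videos` and its parameters are hypothetical names, not part of the package.

```python
import random


def pick_split_videos(video_names, n_valid, n_test, seed=0):
    """Pick validation and test videos reproducibly; the rest stay in train."""
    if n_valid + n_test > len(video_names):
        raise ValueError("not enough videos for the requested split sizes")
    # sort first so the result does not depend on the directory listing order
    shuffled = sorted(video_names)
    random.Random(seed).shuffle(shuffled)
    vids_valid = shuffled[:n_valid]
    vids_test = shuffled[n_valid:n_valid + n_test]
    return vids_valid, vids_test


# toy names, invented for this example
names = [f"vid_{i}" for i in range(10)]
vids_valid, vids_test = pick_split_videos(names, n_valid=2, n_test=1, seed=42)
```

The two returned lists can be passed directly to `add_splits_to_labels_file`.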

In [22]:
add_splits_to_labels_file(vids_valid, vids_test)

Done saving new labels file with splits to /data/labels.csv
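After the file has been written, a quick summary helps confirm that every split received videos and that the labels are reasonably represented in each one. The sketch below uses a small in-memory table with invented values; on real data you would load the file with `pd.read_csv(path_data + 'labels.csv')` instead.

```python
import pandas as pd

# toy stand-in for the labels file with its new split column
labels = pd.DataFrame({
    "video": ["vid_a", "vid_a", "vid_b", "vid_c", "vid_c", "vid_d"],
    "frame": ["f0.jpg", "f1.jpg", "f0.jpg", "f0.jpg", "f1.jpg", "f0.jpg"],
    "label": ["x", "y", "x", "y", "y", "x"],
    "split": ["train", "train", "train", "valid", "valid", "test"],
})

# number of distinct videos in each split
videos_per_split = labels.groupby("split")["video"].nunique()

# number of frames per split and label, to spot missing or rare classes
frames_per_split_and_label = pd.crosstab(labels["split"], labels["label"])
```

A zero in the crosstab means that label never occurs in that split.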
