In [1]:
import os, sys
import numpy as np
import glob
import random
import cv2

### Dataset
I use the set of cropped images instead of the original ones, since they are object-centric and resemble images from classification datasets.
LASOT has dozens of object classes, and I randomly select 10 of them (the same number of classes as CIFAR-10 and STL-10).
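
A reproducible random choice of classes could look like the sketch below. It is not the code used in this notebook, which hard-codes the class list in the next cell. The helper `pick_classes`, the `seed` argument, and the toy class names are hypothetical and only serve as illustration.

```python
import random

def pick_classes(all_classes, k=10, seed=0):
    """Return k class names drawn without replacement, in sorted order.

    `pick_classes` and `seed` are illustrative names, not part of the notebook.
    """
    rng = random.Random(seed)  # local RNG, so the global random state is untouched
    # sort first so the result does not depend on directory listing order
    return sorted(rng.sample(sorted(all_classes), k))

# toy stand-in for os.listdir(path_dataset)
toy_classes = ['class_{:02d}'.format(i) for i in range(30)]
selected = pick_classes(toy_classes, k=10, seed=0)
print(selected)
```

Fixing the seed keeps the selection stable across runs, which matters because the class ids written to the annotation files depend on the order of the class list.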

In [2]:
path_dataset = '/home/hyunjoon/dataset/lasot/crop/'
# classes = os.listdir(path_dataset)
classes = ['airplane', 'bicycle', 'boat', 'car', 'cat', 'dog', 'horse', 'motorcycle', 'train', 'truck']

In [3]:
# names of resulting annotation files
fn_train = 'train_lasot_10class.txt'
fn_val = 'val_lasot_10class.txt'

Let's check how many images we have for our selected 10 classes.

In [4]:
# show total number of images as well as per-class numbers
total = 0
for cname in classes:
    # frames are stored as <class>/<video>/<frame>.jpg, so match exactly one directory level
    N = len(glob.glob(os.path.join(path_dataset, cname, '*', '*.jpg')))
    if N > 0:
        print('{}: {}'.format(N, cname))
    total += N
total

28565: airplane
13364: bicycle
16568: boat
17981: car
16005: cat
12806: dog
18284: horse
14416: motorcycle
15482: train
23924: truck


177395

### How to sample images for training and validation
LASOT is a dataset for single-object tracking, and is made up of short video clips with various object categories.
Each object category has several video clips, each of them tracking a single object instance.
In LASOT, each video has hundreds to thousands of frames of an object, possibly with lots of near-duplicates.

To make a biased dataset containing near-duplicate images, I use the following approach:
For a few classes (`airplane` and `truck`), densely sample every frame from each video, and for the other classes, sparsely sample every 20th frame from each video.
When sampling video frames, I use the first 70% of frames of each video for training and the last 15% for validation.
The 15% of frames in between are left unused, which reduces [data leakage](https://machinelearningmastery.com/data-leakage-machine-learning/) between near-identical neighboring frames.

The resulting dataset will be biased toward the densely sampled classes, with near-duplicates (especially among the densely sampled frames).
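
The per-video split can be sketched on a toy clip as follows. This is a minimal illustration under the assumption that frames are already in temporal order. The function `split_frames` and its parameter names are hypothetical, and integers stand in for frame file names.

```python
def split_frames(frames, dense, train_frac=0.7, val_start=0.85,
                 sparse_step=20, val_step=10):
    """Split one clip's ordered frame list into training and validation samples."""
    p7 = int(len(frames) * train_frac)  # end of the training pool (first 70%)
    p8 = int(len(frames) * val_start)   # start of the validation pool (last 15%)
    # dense classes keep every training frame, the others keep every 20th
    train = frames[:p7] if dense else frames[:p7:sparse_step]
    # validation is never sampled densely: every 10th frame of the tail
    val = frames[p8::val_step]
    return train, val

frames = list(range(1000))  # toy clip with 1000 frames
dense_train, val = split_frames(frames, dense=True)
sparse_train, _ = split_frames(frames, dense=False)
print(len(dense_train), len(sparse_train), len(val))
```

For the same clip, the dense setting yields 20 times as many training samples as the sparse one, while the unused frames between `p7` and `p8` keep the training and validation samples apart in time.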

In [5]:
# each annotation line has the form "<path relative to path_dataset> <class id>"
with open(fn_train, 'w') as fh_train, open(fn_val, 'w') as fh_val:
    for cid, cname in enumerate(classes):
        # each sub-directory of a class holds the frames of one video;
        # sort so the output order does not depend on the file system
        video_dirs = sorted(os.listdir(os.path.join(path_dataset, cname)))

        for vdir in video_dirs:
            # sort frames by file name
            fn_list = sorted(glob.glob(os.path.join(path_dataset, cname, vdir, '*.jpg')))

            # frames [0, p7) form the training pool (first 70%)
            p7 = int(len(fn_list) * 0.7)
            # frames [p8, end) form the validation pool (last 15%)
            p8 = int(len(fn_list) * 0.85)

            # sample densely for `airplane` and `truck`, every 20th frame otherwise
            if cname in ('airplane', 'truck'):
                train_list = fn_list[:p7]
            else:
                train_list = fn_list[:p7:20]
            # write line by line, so an empty list adds no blank line to the file
            for fn in train_list:
                fh_train.write('{} {}\n'.format(fn.replace(path_dataset, ''), cid))

            # no dense sampling for validation: keep every 10th frame
            val_list = fn_list[p8::10]
            for fn in val_list:
                fh_val.write('{} {}\n'.format(fn.replace(path_dataset, ''), cid))

After everything's done, move the created annotation files to the actual dataset directory.
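
Before moving the files, it may be worth checking that the class distribution is as skewed as intended. The sketch below counts samples per class id from annotation lines in the `<relative path> <class id>` format written above. The helper `class_histogram` and the example paths are hypothetical.

```python
from collections import Counter

def class_histogram(lines):
    """Count samples per class id from '<relative path> <class id>' lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        # split on the last space so paths containing spaces still parse
        _, cid = line.rsplit(' ', 1)
        counts[int(cid)] += 1
    return counts

# made-up annotation lines, for illustration only
sample_lines = [
    'airplane/clip-a/0001.jpg 0',
    'airplane/clip-a/0002.jpg 0',
    'airplane/clip-a/0003.jpg 0',
    'bicycle/clip-b/0001.jpg 1',
    '',
    'truck/clip-c/0001.jpg 9',
]
hist = class_histogram(sample_lines)
print(hist)
```

On the real files, the same function can be applied to an open file handle, for example `class_histogram(open(fn_train))`, since iterating over a file yields its lines.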