This notebook demonstrates the construction of the DeepFish dataset and of the YOLO-preprocessed dataset, and verifies that the data is shuffled deterministically.

1) DeepFish Dataset

In [1]:
import os
from pathlib import Path
import pandas as pd
# Move to the parent directory so the relative 'src/...' paths below resolve.
os.chdir(str(Path.cwd().parent))
from src.dataset_tools.dataset_constructor import DeepFishConstructor
from src.dataset_tools.data_setup import YOLODataSetup

In [2]:
constructor = DeepFishConstructor(dataset_dir='src/datasets')

In [None]:
constructor.construct()

Next, a second DeepFish directory is constructed so that deterministic shuffling can be tested. Before running the next cell, delete the original dataset downloaded from Kaggle so that it can be redownloaded cleanly, and rename the existing DeepFish directory. The comparison cells below read from `DeepFish` and `DeepFish1`.
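The rename step described above could be scripted roughly as follows. This is only a sketch under assumptions: the helper `archive_dir` is hypothetical and not part of the project, and the target name `DeepFish1` is inferred from the paths used in the comparison cells. The location of the Kaggle download is not stated in this notebook, so deleting it is left as a manual step.

```python
from pathlib import Path


def archive_dir(src, dst):
    """Rename directory `src` to `dst`; return True if a rename happened.

    Hypothetical helper for illustration only.
    """
    src_path, dst_path = Path(src), Path(dst)
    if not src_path.is_dir():
        return False  # nothing to archive
    if dst_path.exists():
        # Refuse to clobber an earlier copy.
        raise FileExistsError(f"{dst_path} already exists; remove it first")
    src_path.rename(dst_path)
    return True


# Assumed target name, matching the paths read by the comparison cells.
archive_dir('src/datasets/DeepFish', 'src/datasets/DeepFish1')
```

Refusing to overwrite an existing target keeps a stale copy from an earlier run from being silently mixed into the comparison.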

In [None]:
constructor2 = DeepFishConstructor(dataset_dir='src/datasets')
constructor2.construct()

In [None]:
train1 = pd.read_csv('src/datasets/DeepFish/train.csv')
train2 = pd.read_csv('src/datasets/DeepFish1/train.csv')
train1.equals(train2)

In [None]:
val1 = pd.read_csv('src/datasets/DeepFish/val.csv')
val2 = pd.read_csv('src/datasets/DeepFish1/val.csv')
val1.equals(val2)

In [None]:
test1 = pd.read_csv('src/datasets/DeepFish/test.csv')
test2 = pd.read_csv('src/datasets/DeepFish1/test.csv')
test1.equals(test2)

If all three cells above return True, the dataset was constructed successfully and the train/val/test split is shuffled deterministically.
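When one of these checks returns False, it helps to know where the two files first diverge. The sketch below uses only the standard library and compares the CSVs row by row as text, which is stricter than `DataFrame.equals` (for example, `1` and `1.0` count as different). The helper names `first_mismatch` and `compare_splits` are hypothetical and not part of the project.

```python
import csv
from itertools import zip_longest
from pathlib import Path


def first_mismatch(csv_a, csv_b):
    """Return the 0-based index of the first differing row, or None if identical.

    The header row counts as index 0.
    """
    with open(csv_a, newline='') as fa, open(csv_b, newline='') as fb:
        # zip_longest pads the shorter file with None, so a length
        # difference is reported as a mismatch too.
        rows = zip_longest(csv.reader(fa), csv.reader(fb))
        for index, (row_a, row_b) in enumerate(rows):
            if row_a != row_b:
                return index
    return None


def compare_splits(dir_a, dir_b, splits=('train', 'val', 'test')):
    """Map each split name to its first mismatching row index (None = identical)."""
    return {
        split: first_mismatch(Path(dir_a) / f'{split}.csv',
                              Path(dir_b) / f'{split}.csv')
        for split in splits
    }
```

Calling `compare_splits('src/datasets/DeepFish', 'src/datasets/DeepFish1')` would return a dictionary with one entry per split; every value being `None` corresponds to all three cells above returning True.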

2) Preprocessed YOLO Dataset

In [None]:
yolo_ds = YOLODataSetup({'image_path': 'src/datasets/DeepFish/images',
                         'label_path': 'src/datasets/DeepFish/labels',
                         'train_csv': 'src/datasets/DeepFish/train.csv',
                         'val_csv': 'src/datasets/DeepFish/val.csv',
                         'test_csv': 'src/datasets/DeepFish/test.csv',
                         'num_classes': 1,
                         'class_names': ['fish']
                         })

In [None]:
yolo_ds.run()

Next, check that every image has an associated label file.

In [None]:
# os.listdir returns names in arbitrary order, so sort before comparing.
img_list = sorted(Path(f).stem for f in os.listdir('src/datasets/AL_Train/labeled/images'))
lbl_list = sorted(Path(f).stem for f in os.listdir('src/datasets/AL_Train/labeled/labels'))
img_list == lbl_list
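A single True or False does not say which files are unpaired. As a minimal sketch, the pairing check can be done with set differences so that any offending stems are listed. The helper `unpaired_stems` is hypothetical and not part of the project.

```python
from pathlib import Path


def unpaired_stems(image_dir, label_dir):
    """Return (images_without_labels, labels_without_images) as sorted stem lists."""
    image_stems = {p.stem for p in Path(image_dir).iterdir() if p.is_file()}
    label_stems = {p.stem for p in Path(label_dir).iterdir() if p.is_file()}
    # Set difference is independent of the order in which files are listed.
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)
```

Calling `unpaired_stems('src/datasets/AL_Train/labeled/images', 'src/datasets/AL_Train/labeled/labels')` returns two empty lists when every image has a label file and every label file has an image.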

Next, a second preprocessed directory is created so that deterministic shuffling can be tested. Before running the next cell, rename the directory produced by the previous run. The comparison cells below read from `AL_Train` and `AL_Train1`.

In [None]:
yolo_ds2 = YOLODataSetup({'image_path': 'src/datasets/DeepFish/images',
                         'label_path': 'src/datasets/DeepFish/labels',
                         'train_csv': 'src/datasets/DeepFish/train.csv',
                         'val_csv': 'src/datasets/DeepFish/val.csv',
                         'test_csv': 'src/datasets/DeepFish/test.csv',
                         'num_classes': 1,
                         'class_names': ['fish']
                         })
yolo_ds2.run()

In [None]:
# os.listdir returns names in arbitrary order, so sort before comparing.
sorted(os.listdir('src/datasets/AL_Train/labeled/images')) == sorted(os.listdir('src/datasets/AL_Train1/labeled/images'))

In [None]:
sorted(os.listdir('src/datasets/AL_Train/labeled/labels')) == sorted(os.listdir('src/datasets/AL_Train1/labeled/labels'))

In [None]:
sorted(os.listdir('src/datasets/AL_Train/unlabeled/images')) == sorted(os.listdir('src/datasets/AL_Train1/unlabeled/images'))

In [None]:
sorted(os.listdir('src/datasets/AL_Train/unlabeled/labels')) == sorted(os.listdir('src/datasets/AL_Train1/unlabeled/labels'))

In [None]:
sorted(os.listdir('src/datasets/AL_Train/val/images')) == sorted(os.listdir('src/datasets/AL_Train1/val/images'))

In [None]:
sorted(os.listdir('src/datasets/AL_Train/val/labels')) == sorted(os.listdir('src/datasets/AL_Train1/val/labels'))

In [None]:
sorted(os.listdir('src/datasets/AL_Train/test/images')) == sorted(os.listdir('src/datasets/AL_Train1/test/images'))

In [None]:
sorted(os.listdir('src/datasets/AL_Train/test/labels')) == sorted(os.listdir('src/datasets/AL_Train1/test/labels'))

If all of the above cells return True, the data has been preprocessed into YOLO format and the shuffling is deterministic.
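The eight directory comparisons above can also be run in one pass. The following is a sketch, assuming the `labeled`, `unlabeled`, `val` and `test` subdirectories each contain `images` and `labels` folders as in the cells above. The helper `compare_listings` is hypothetical and not part of the project. It sorts each listing before comparing, because `os.listdir` returns names in arbitrary order.

```python
import os
from itertools import product


def compare_listings(root_a, root_b,
                     subsets=('labeled', 'unlabeled', 'val', 'test'),
                     kinds=('images', 'labels')):
    """Map 'subset/kind' to True when both directories hold the same file names."""
    results = {}
    for subset, kind in product(subsets, kinds):
        files_a = sorted(os.listdir(os.path.join(root_a, subset, kind)))
        files_b = sorted(os.listdir(os.path.join(root_b, subset, kind)))
        results[f'{subset}/{kind}'] = files_a == files_b
    return results
```

With the two runs in place, `all(compare_listings('src/datasets/AL_Train', 'src/datasets/AL_Train1').values())` gives a single pass/fail answer, while the dictionary itself shows which subset differs.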