# Create training and validation sets

When we labeled the images, each image, its mask, and its object coordinates were saved in the `images`, `masks`, and `coordinates` directories.

In this notebook, we split the labeled data into training and validation sets.

The train and validation sets will be saved within the `dataset` folder of the project. 

The split creates the following subdirectories:
* `train_images`
* `train_masks`
* `train_coordinates`
* `val_images`
* `val_masks`
* `val_coordinates`
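
The six folder names above follow a simple `<split>_<kind>` pattern, so their paths can be built in one place instead of being typed out by hand. The helper below is a minimal sketch: `split_directories` is a hypothetical name, not part of `unetTracker`, and it only builds path strings without creating anything on disk.

```python
import os


def split_directories(dataset_dir):
    """Map names such as 'train_images' to their path inside dataset_dir.

    Hypothetical helper for illustration; it is not provided by unetTracker.
    """
    return {
        f"{split}_{kind}": os.path.join(dataset_dir, f"{split}_{kind}")
        for split in ("train", "val")
        for kind in ("images", "masks", "coordinates")
    }
```

In this notebook it could be called as `split_directories(project.dataset_dir)` once `project` has been created.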


We will use the validation folders to estimate our model's accuracy on data it has not seen during training.

In [1]:
# Run setup_project.py, which creates a variable called `project`
%run setup_project.py

import os  # used below to build the dataset paths; harmless if already imported

from unetTracker.dataset import UNetDataset

Project directory: /home/kevin/Documents/trackingProjects/finger_tracker
Getting configuration from config file. Values from config file will be used.
Loading /home/kevin/Documents/trackingProjects/finger_tracker/config.yalm
{'augmentation_HorizontalFlipProb': 0.0, 'augmentation_RandomBrightnessContrastProb': 0.2, 'augmentation_RandomSizedCropProb': 1.0, 'augmentation_RotateProb': 0.3, 'image_extension': '.png', 'image_size': [270, 480], 'labeling_ImageEnlargeFactor': 2.0, 'name': 'finger_tracker', 'normalization_values': {'means': [0.4079657196998596, 0.4543980062007904, 0.5158050656318665], 'stds': [0.23991422355175018, 0.25161123275756836, 0.26905474066734314]}, 'object_colors': [(0.0, 0.0, 255.0), (255.0, 0.0, 0.0), (255.0, 255.0, 0.0), (240.0, 255.0, 255.0)], 'objects': ['f1', 'f2', 'f3', 'f4'], 'target_radius': 6, 'unet_features': [64, 128, 256, 512]}


In [2]:
dataset = UNetDataset(image_dir=project.image_dir,
                      mask_dir=project.mask_dir,
                      coordinate_dir=project.coordinate_dir,
                      image_extension=project.image_extension)

In [3]:
dataset.create_training_validation_dataset(train_image_dir = os.path.join(project.dataset_dir,"train_images"),
                                           train_mask_dir =  os.path.join(project.dataset_dir,"train_masks"),
                                           train_coordinate_dir = os.path.join(project.dataset_dir,"train_coordinates"),
                                           
                                           val_image_dir = os.path.join(project.dataset_dir,"val_images"),
                                           val_mask_dir =  os.path.join(project.dataset_dir,"val_masks"),
                                           val_coordinate_dir = os.path.join(project.dataset_dir,"val_coordinates"),
                                           
                                           test_ratio=0.10) # fraction of images assigned to the validation set; the rest go to the training set

Number of item in dataset: 210
Length of training set: 191
Length of validation set: 19
Actual test ratio: 0.090
Copying files to training and validation directories
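
To make the idea of a ratio-based split concrete, here is a minimal sketch that shuffles a list of file names and slices off a validation subset. It is an illustration only: `split_names`, the seed, and the `img_000.png` style names are assumptions, and this is not necessarily the rule `create_training_validation_dataset` applies. With this rule, 210 items and a ratio of 0.10 give 21 validation items, whereas the run above reports 19, so the library evidently assigns items differently.

```python
import random


def split_names(names, test_ratio=0.10, seed=0):
    """Shuffle a copy of `names` and return (train_names, val_names).

    Hypothetical helper; not the unetTracker implementation.
    """
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(names)         # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    n_val = round(len(shuffled) * test_ratio)
    return shuffled[n_val:], shuffled[:n_val]


# Made-up file names standing in for the 210 labeled images
names = [f"img_{i:03d}.png" for i in range(210)]
train_names, val_names = split_names(names)
print(len(train_names), len(val_names))  # 189 21
```

Whatever rule is used, the important property is that every item ends up in exactly one of the two sets.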


We can use the same `UNetDataset` class to represent the training and validation datasets; each instance simply reads its images from a different set of directories.

In [6]:
train_image_dir = os.path.join(project.dataset_dir,"train_images")
train_mask_dir =  os.path.join(project.dataset_dir,"train_masks")
train_coordinate_dir = os.path.join(project.dataset_dir,"train_coordinates")
trainDataset = UNetDataset(image_dir=train_image_dir,
                           mask_dir=train_mask_dir,
                           coordinate_dir=train_coordinate_dir,
                           image_extension=project.image_extension)
len(trainDataset)

191

In [7]:
val_image_dir = os.path.join(project.dataset_dir,"val_images")
val_mask_dir =  os.path.join(project.dataset_dir,"val_masks")
val_coordinate_dir = os.path.join(project.dataset_dir,"val_coordinates")
valDataset = UNetDataset(image_dir=val_image_dir,
                         mask_dir=val_mask_dir,
                         coordinate_dir=val_coordinate_dir,
                         image_extension=project.image_extension)
len(valDataset)

19
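
Because the validation set is only useful if the model has never seen its images, it can be worth confirming that the two folders share no files. The check below is a sketch under one assumption: files keep their original names when they are copied into the split folders. `check_split` is a hypothetical helper, not part of `unetTracker`.

```python
import os


def check_split(train_dir, val_dir):
    """Return (n_train, n_val) after checking that no file name is in both folders.

    Hypothetical helper; assumes copied files keep their original names.
    """
    train_files = set(os.listdir(train_dir))
    val_files = set(os.listdir(val_dir))
    overlap = train_files & val_files
    if overlap:
        raise ValueError(
            f"{len(overlap)} file(s) appear in both sets, e.g. {sorted(overlap)[:3]}"
        )
    return len(train_files), len(val_files)
```

With the variables defined above, `check_split(train_image_dir, val_image_dir)` should return counts matching the `len(...)` values, provided the folders contain only the image files.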

We now have training and validation datasets.