# Create training and validation sets

When we labeled the images, the images, masks, and object coordinates were saved in the `images`, `masks`, and `coordinates` directories.

In this notebook, we split the labeled data into training and validation sets.

Both sets will be saved in the `dataset` folder of the project.

The split creates the following subdirectories:
* `train_images`
* `train_masks`
* `train_coordinates`
* `val_images`
* `val_masks`
* `val_coordinates`


We will use the validation set to estimate how well our model performs on data it has not seen during training.
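As a rough illustration of the layout listed above, the sketch below builds the six expected paths and counts the files in each one. `expected_split_dirs` and `count_files` are hypothetical helpers, not part of unetTracker, and the temporary directory only stands in for the project's `dataset` folder.

```python
import os
import tempfile

SPLITS = ("train", "val")
KINDS = ("images", "masks", "coordinates")

def expected_split_dirs(dataset_dir):
    """Return {name: path} for the six subdirectories (hypothetical helper)."""
    return {f"{s}_{k}": os.path.join(dataset_dir, f"{s}_{k}")
            for s in SPLITS for k in KINDS}

def count_files(dirs):
    """Return {name: number of files}, or None when a directory is missing."""
    counts = {}
    for name, path in dirs.items():
        if os.path.isdir(path):
            counts[name] = sum(
                os.path.isfile(os.path.join(path, f)) for f in os.listdir(path)
            )
        else:
            counts[name] = None
    return counts

# Demo on a temporary folder that stands in for the `dataset` folder.
with tempfile.TemporaryDirectory() as tmp:
    dirs = expected_split_dirs(tmp)
    os.makedirs(dirs["train_images"])
    open(os.path.join(dirs["train_images"], "0.png"), "w").close()
    demo_counts = count_files(dirs)
    print(demo_counts)
```

In this notebook, the same check could be run on `project.dataset_dir` once the split has been created; a `None` entry would point to a missing subdirectory.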

In [1]:
!pip install albumentations==1.3.0
!git clone https://github.com/kevin-allen/unetTracker
!pip install -r unetTracker/requirements.txt
!pip install -e unetTracker

Collecting albumentations==1.3.0
  Downloading albumentations-1.3.0-py3-none-any.whl (123 kB)
Installing collected packages: albumentations
  Attempting uninstall: albumentations
    Found existing installation: albumentations 1.3.1
    Uninstalling albumentations-1.3.1:
      Successfully uninstalled albumentations-1.3.1
Successfully installed albumentations-1.3.0
Cloning into 'unetTracker'...
remote: Enumerating objects: 965, done.
remote: Counting objects: 100% (344/344), done.
remote: Compressing objects: 100% (146/146), done.
remote: Total 965 (delta 207), reused 327 (delta 196), pack-reused 621
Receiving objects: 100% (965/965), 124.39 MiB | 38.15 MiB/s, done.
Resolving deltas: 100% (597/597), done.
Collecting jupyterlab (from -r unetTracker/requirements.txt (line 2))
  Downloading jupyterlab-4.0.9-py3-none-any.whl (9.2 MB)

Obtaining file:///content/unetTracker
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Installing backend dependencies ... done
  Preparing editable metadata (pyproject.toml) ... done
Building wheels for collected packages: unetTracker
  Building editable for unetTracker (pyproject.toml) ... done
  Created wheel for unetTracker: filename=unetTracker-0.0.1-0.editable-py3-none-any.whl size=16048 sha256=3fca928b3b245627ec2e96645d12744fd8aa2cad2698580ddb2892c82b717631
  Stored in directory: /tmp/pip-ephem-wheel-cache-qrwd5xn4/wheels/62/9b/5a/0cb547490a9187d698861d98e1e803c5e64f31a9d899a8e84c
Successfully built unetTracker
Installing collected packages: unetTracker
Successfully installed unetTracker-0.0.1


In [1]:
from google.colab import drive
import os

drive.mount('/content/drive')

fn = "/content/drive/My Drive/dsfolder"
if os.path.exists(fn):
  print("We can access the dsfolder directory.")
else:
  raise IOError("Problem accessing the dsfolder directory.")

Mounted at /content/drive
We can access the dsfolder directory.


In [2]:
# Run setup_project.py, which creates a variable called `project`.
fn = "/content/drive/My Drive/dsfolder/setup_project.py"
if os.path.exists(fn):
  print("We can access the file.")
else:
  raise IOError("Problem accessing the file.")

%run "/content/drive/My Drive/dsfolder/setup_project.py"

We can access the file.
Project directory: /content/drive/My Drive/dsfolder/trackingProjects/finger_tracker
Getting configuration from config file. Values from config file will be used.
Loading /content/drive/My Drive/dsfolder/trackingProjects/finger_tracker/config.yalm
{'augmentation_HorizontalFlipProb': 0.5, 'augmentation_RandomBrightnessContrastProb': 0.2, 'augmentation_RandomSizedCropProb': 1.0, 'augmentation_RotateProb': 0.3, 'image_extension': '.png', 'image_size': [270, 480], 'labeling_ImageEnlargeFactor': 2.0, 'name': 'finger_tracker', 'normalization_values': {'means': [0.40664952993392944, 0.4527093172073364, 0.5142642259597778], 'stds': [0.2394399791955948, 0.2509937286376953, 0.26815035939216614]}, 'object_colors': [(0.0, 0.0, 255.0), (255.0, 0.0, 0.0), (255.0, 255.0, 0.0), (240.0, 255.0, 255.0)], 'objects': ['f1', 'f2', 'f3', 'f4'], 'target_radius': 6, 'unet_features': [64, 128, 256, 512]}


In [3]:
from unetTracker.dataset import UNetDataset

In [4]:
# Dataset covering all labeled images, masks, and coordinates.
dataset = UNetDataset(image_dir=project.image_dir,
                      mask_dir=project.mask_dir,
                      coordinate_dir=project.coordinate_dir,
                      image_extension=project.image_extension)

In [5]:
dataset.create_training_validation_dataset(train_image_dir = os.path.join(project.dataset_dir,"train_images"),
                                           train_mask_dir =  os.path.join(project.dataset_dir,"train_masks"),
                                           train_coordinate_dir = os.path.join(project.dataset_dir,"train_coordinates"),

                                           val_image_dir = os.path.join(project.dataset_dir,"val_images"),
                                           val_mask_dir =  os.path.join(project.dataset_dir,"val_masks"),
                                           val_coordinate_dir = os.path.join(project.dataset_dir,"val_coordinates"),

                                           test_ratio=0.10) # fraction of images assigned to the validation set; the rest go to the training set

Number of item in dataset: 185
Length of training set: 167
Length of validation set: 18
Actual test ratio: 0.097
Copying files to training and validation directories
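
The counts printed above are consistent with truncating `185 * 0.10` to 18 validation items, which leaves 167 for training. The sketch below shows one way such a random hold-out split could be computed. It illustrates the general technique only and is an assumption, not the actual code of `create_training_validation_dataset`; `split_indices` and its `seed` argument are hypothetical.

```python
import random

def split_indices(n_items, test_ratio, seed=0):
    """Shuffle item indices and hold out a fraction for validation."""
    n_val = int(n_items * test_ratio)      # truncate, e.g. 18.5 -> 18
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)   # seeded so the split is reproducible
    val_idx = sorted(indices[:n_val])
    train_idx = sorted(indices[n_val:])
    return train_idx, val_idx

train_idx, val_idx = split_indices(185, 0.10)
print(len(train_idx), len(val_idx))        # 167 18
print(round(len(val_idx) / 185, 3))        # 0.097
```

Because the number of validation items must be a whole number, the actual ratio (18/185) ends up slightly below the requested 0.10.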


We can use the same `UNetDataset` class to represent both the training and validation datasets. Each instance simply reads its files from a different set of directories.

In [6]:
train_image_dir = os.path.join(project.dataset_dir,"train_images")
train_mask_dir =  os.path.join(project.dataset_dir,"train_masks")
train_coordinate_dir = os.path.join(project.dataset_dir,"train_coordinates")
trainDataset = UNetDataset(image_dir=train_image_dir,
                           mask_dir=train_mask_dir,
                           coordinate_dir=train_coordinate_dir,
                           image_extension=project.image_extension)
len(trainDataset)

167

In [7]:
val_image_dir = os.path.join(project.dataset_dir,"val_images")
val_mask_dir =  os.path.join(project.dataset_dir,"val_masks")
val_coordinate_dir = os.path.join(project.dataset_dir,"val_coordinates")
valDataset = UNetDataset(image_dir=val_image_dir,
                         mask_dir=val_mask_dir,
                         coordinate_dir=val_coordinate_dir,
                         image_extension=project.image_extension)
len(valDataset)

18

We now have training and validation datasets.
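
A validation set is only informative if none of its images also appear in the training set. The sketch below is a minimal check for that, assuming files keep their original names when they are copied into the split directories (an assumption, not something confirmed by unetTracker). `shared_file_names` is a hypothetical helper, and the temporary folders stand in for `train_images` and `val_images`.

```python
import os
import tempfile

def shared_file_names(dir_a, dir_b):
    """Return the sorted file names that appear in both directories."""
    return sorted(set(os.listdir(dir_a)) & set(os.listdir(dir_b)))

# Demo with temporary folders; "b.png" is deliberately placed in both.
with tempfile.TemporaryDirectory() as tmp:
    train_dir = os.path.join(tmp, "train_images")
    val_dir = os.path.join(tmp, "val_images")
    os.makedirs(train_dir)
    os.makedirs(val_dir)
    for name in ("a.png", "b.png"):
        open(os.path.join(train_dir, name), "w").close()
    for name in ("b.png", "c.png"):
        open(os.path.join(val_dir, name), "w").close()
    overlap = shared_file_names(train_dir, val_dir)
    print(overlap)  # ['b.png']
```

Under the same naming assumption, calling `shared_file_names(train_image_dir, val_image_dir)` in this notebook should return an empty list.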