In [None]:
import cv2
import h5py
from pathlib import Path
import random
import os
import sys

# F-RCNN Dataset Generation
## PIV Acquisitions

## Importing the function

In [None]:
sys.path.append("..")  # TODO: Fix it!
from dataset.create_dataset import create_dataset

## Defining the path

An example F-RCNN `.h5` file is provided with this example (`PIV_annotation_files/PIV_dataset.h5`).

This F-RCNN dataset was created using the image annotation tool available [here](https://github.com/sinmec/multilabellerg).

The current implementation can read multiple `.h5` files: simply place all of the dataset files in the same `h5_dataset_path` folder.
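
As a quick sanity check before generating the dataset, the files that sit in the folder can be listed. The sketch below is only an illustration: `list_h5_files` is a hypothetical helper, not part of `create_dataset`, and it assumes the dataset files sit directly in the folder and use the `.h5` extension.

```python
from pathlib import Path


def list_h5_files(folder):
    """Return the .h5 files directly inside `folder`, sorted by name.

    Hypothetical helper for inspection only; how create_dataset itself
    discovers the files is not shown in this notebook.
    """
    folder = Path(folder)
    if not folder.is_dir():
        # A missing folder simply yields an empty list
        return []
    return sorted(folder.glob("*.h5"))


# Show which annotation files would be found in the example folder
for h5_file in list_h5_files("PIV_annotation_files"):
    print(h5_file.name)
```

An empty result usually means that the path is wrong or that the files use a different extension.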

The images are extracted to the `output_path` folder.

In [None]:
h5_dataset_path = Path('PIV_annotation_files')
output_path = Path(r"example_dataset_FRCNN_PIV")

## Defining parameters

In [None]:
# Number of Validation images
N_VALIDATION = 2

# Number of Verification images
N_VERIFICATION = 2

For the dataset generation, two options are provided.

In the first one, the dataset is composed of the annotated contours and the original images, i.e., the F-RCNN object detection is based on the original image information. This option may work well if the image is not complex and the objects are easily distinguished.
For this option, simply set `UNET_model_options = None`, as shown (commented out) below.

In [None]:
# UNET_model_options = None

In some cases, for instance when dealing with [bubbly flow PIV acquisitions](https://www.sciencedirect.com/science/article/pii/S0009250918303269), the objects are not readily visible in the original image.
In this case, the F-RCNN object detection is based on an intermediate representation of the original image. Here, we apply a U-Net model to segment the image and highlight the bubble positions. For this option, `UNET_model_options` must be configured.
Note that this option requires an appropriate U-Net model, trained beforehand following the guidelines detailed in this repository.

In [None]:
UNET_model_options = {'keras_file': Path('PIV_annotation_files', 'UNET_model.keras'),
                      'window_size': 512,
                      'sub_image_size': 128,
                      'stride_division': 16}                      
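
The notebook does not state how `create_dataset` interprets these keys. The sketch below shows one plausible reading, under the loudly stated assumption that the window is scanned with square sub-images of side `sub_image_size` and a stride of `sub_image_size // stride_division`. The helper `sliding_window_starts` is hypothetical and serves only to make the arithmetic concrete.

```python
def sliding_window_starts(window_size, sub_image_size, stride_division):
    """Start offsets of the sub-images along one axis of the window.

    ASSUMPTION (not confirmed by the notebook): the stride equals
    sub_image_size // stride_division.
    """
    stride = sub_image_size // stride_division
    if stride < 1:
        raise ValueError("stride_division must not exceed sub_image_size")
    # Last valid start is the one where the sub-image still fits in the window
    return list(range(0, window_size - sub_image_size + 1, stride))


# Values taken from the UNET_model_options dictionary above
starts = sliding_window_starts(window_size=512, sub_image_size=128, stride_division=16)
print("stride:", starts[1] - starts[0])
print("positions per axis:", len(starts))
```

Under this assumption, a larger `stride_division` means more overlapping sub-images and therefore more U-Net evaluations per window.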

## Generating the dataset

The dataset is generated by calling the `create_dataset` function.

In [None]:
create_dataset(h5_dataset_path, output_path, N_VALIDATION, N_VERIFICATION, UNET_model_options=UNET_model_options)

Running this function creates three new folders in the chosen path:
 - Training: Samples used during the F-RCNN training
 - Validation: Samples used for validation purposes during training (total of `N_VALIDATION` full images)
 - Verification: Samples used to evaluate the F-RCNN accuracy after the training step, i.e., data unseen during training (total of `N_VERIFICATION` full images)
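
The three-way split described above can be illustrated with a short sketch. It assumes that the held-out images are drawn at random, which the notebook does not confirm; `split_images` is a hypothetical helper and not the code used inside `create_dataset`.

```python
import random


def split_images(image_names, n_validation, n_verification, seed=None):
    """Split image names into training, validation and verification lists.

    ASSUMPTION: held-out images are chosen at random. This only
    illustrates the idea; it is not the create_dataset implementation.
    """
    names = list(image_names)
    if n_validation + n_verification > len(names):
        raise ValueError("Not enough images for the requested hold-out sizes")
    rng = random.Random(seed)
    rng.shuffle(names)
    validation = names[:n_validation]
    verification = names[n_validation:n_validation + n_verification]
    training = names[n_validation + n_verification:]
    return training, validation, verification


# Ten placeholder image names, with 2 validation and 2 verification images
example_names = [f"img_{i:03d}" for i in range(10)]
training, validation, verification = split_images(example_names, 2, 2, seed=0)
print(len(training), len(validation), len(verification))
```

Because the three lists are disjoint, the verification images are never seen during training.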



## Option 1
**If the `UNET_model_options` <span style="color:red">is not</span> defined**, the F-RCNN object detection is based on the original image. 
In this case, the folder contains 3 folders:
 - `contours`: Text files that contain the coordinates and parameters of the contours
 - `debug`: All the labelled images from the `.h5` file
 - `images`: All the raw images from the `.h5` file

## Option 2
**If the `UNET_model_options` <span style="color:red">is </span> defined**, the F-RCNN object detection is based on the U-Net (segmented) representation of the image. 
In this case, the folder contains 4 folders:
 - `contours`: Text files that contain the coordinates and parameters of the contours
 - `debug`: All the labelled images from the `.h5` file
 - `images`: All the raw images from the `.h5` file
 - `masks`: U-Net segmented versions of the images labelled in the `.h5` file
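
After the dataset is generated, the expected layout for either option can be checked with a few lines. This is a minimal sketch: `missing_subfolders` is a hypothetical helper, and the folder passed to it is whichever generated folder holds the subfolders listed above.

```python
from pathlib import Path


def missing_subfolders(dataset_folder, uses_unet):
    """Return the expected subfolder names that are absent from `dataset_folder`.

    Hypothetical check based on the folder lists in this notebook:
    `masks` is only expected when UNET_model_options is defined.
    """
    expected = ["contours", "debug", "images"]
    if uses_unet:
        expected.append("masks")
    root = Path(dataset_folder)
    return [name for name in expected if not (root / name).is_dir()]
```

For example, `missing_subfolders(some_folder, uses_unet=UNET_model_options is not None)` returns an empty list when the layout matches the selected option, where `some_folder` is a placeholder for the folder being checked.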

In both options, the folder created in this step should be placed in the `FRCNN/dataset` folder for training.