# YOLO Dataset Image Creation

This notebook creates the images for the YOLO dataset.
The YOLO model predicts a bounding box around the worm's head.
For the model to make successful detections, it must be trained on example images.
In this notebook, we create such example images. As input, we expect images containing the entire arena; the output images have a *pre-defined fixed* size and show the worm.
Each output image is obtained by cropping an input image around the worm detected in it. Each such cropped image is referred to as a sample.
Note that we can create as many samples as we like, and we do not have to extract a sample from every input image.
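
To make the cropping step concrete, here is a minimal sketch of cutting a fixed-size window around a detected box. The helper `crop_around_box`, its `(x, y, w, h)` box layout, and the clamping behavior at the image border are assumptions made for illustration; the project's `SampleExtractor` may handle these details differently.

```python
import numpy as np


def crop_around_box(image, box, target_size):
    """Crop a (width, height) window centered on box = (x, y, w, h).

    Hypothetical helper: the window is shifted, not shrunk, when the box
    lies close to the image border, so the output size is always fixed.
    """
    img_h, img_w = image.shape[:2]
    crop_w, crop_h = target_size
    if crop_w > img_w or crop_h > img_h:
        raise ValueError("target_size must fit inside the image")

    x, y, w, h = box
    # center of the detected box
    cx, cy = x + w // 2, y + h // 2
    # top-left corner of the window, clamped to stay inside the image
    left = min(max(cx - crop_w // 2, 0), img_w - crop_w)
    top = min(max(cy - crop_h // 2, 0), img_h - crop_h)
    return image[top:top + crop_h, left:left + crop_w]


# a blank 1200x1000 "arena" with a box near the top-right corner
frame = np.zeros((1000, 1200), dtype=np.uint8)
sample = crop_around_box(frame, box=(1150, 20, 40, 30), target_size=(384, 384))
print(sample.shape)  # (384, 384)
```

Shifting the window instead of padding keeps every sample filled with real image content, at the cost of the worm not always being centered.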

This notebook relies on the following assumptions and will not work correctly if they do not hold:
1.  In the original footage of the experiment, the background is stationary and always within the field of view.
    That is, the position of the camera that captured the footage is fixed relative to the arena.
2.  In each frame, exactly one instance of the worm is visible.

To detect the worm positions within the input images, the following process is performed:
1.  The background is calculated by sampling a large number of raw input images and taking the pixelwise median across them.
2.  Given an input image, non-background objects are found by subtracting the background from the image and marking the regions where the absolute difference is greater than a pre-defined threshold.
3.  Among the non-background objects detected in the input image, the largest one is treated as the worm.
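
The three steps above can be sketched with NumPy alone. This is an illustration under stated assumptions, not the `BoxCalculator` implementation: the function names are hypothetical, the connected-region search is a plain breadth-first flood fill with 4-connectivity, and the frames are small synthetic arrays.

```python
from collections import deque

import numpy as np


def estimate_background(frames):
    """Step 1: pixelwise median over a stack of grayscale frames."""
    return np.median(np.stack(frames), axis=0).astype(np.uint8)


def foreground_mask(frame, background, diff_thresh):
    """Step 2: mark pixels that differ from the background by more than diff_thresh."""
    # cast to a signed type so the subtraction cannot wrap around in uint8
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff > diff_thresh


def largest_object_box(mask):
    """Step 3: return (x, y, w, h) of the largest 4-connected region, or None."""
    rows, cols = mask.shape
    visited = np.zeros_like(mask, dtype=bool)
    best_size, best_box = 0, None
    for r0, c0 in zip(*np.nonzero(mask)):
        if visited[r0, c0]:
            continue
        visited[r0, c0] = True
        queue = deque([(r0, c0)])
        size, rmin, rmax, cmin, cmax = 0, r0, r0, c0, c0
        while queue:
            r, c = queue.popleft()
            size += 1
            rmin, rmax = min(rmin, r), max(rmax, r)
            cmin, cmax = min(cmin, c), max(cmax, c)
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= nr < rows and 0 <= nc < cols and mask[nr, nc] and not visited[nr, nc]:
                    visited[nr, nc] = True
                    queue.append((nr, nc))
        if size > best_size:
            best_size = size
            best_box = (int(cmin), int(rmin), int(cmax - cmin + 1), int(rmax - rmin + 1))
    return best_box


# synthetic footage: uniform background, a dark 4x10 "worm" that moves every frame
frames = []
for i in range(5):
    f = np.full((60, 80), 100, dtype=np.uint8)
    f[10 + 8 * i:14 + 8 * i, 5 + 12 * i:15 + 12 * i] = 30
    frames.append(f)
frames[2][50:52, 70:72] = 160  # small bright distractor in one frame

background = estimate_background(frames)
mask = foreground_mask(frames[2], background, diff_thresh=10)
print(largest_object_box(mask))  # (29, 26, 10, 4)
```

The median removes the worm from the background estimate because the worm covers any given pixel in only a minority of the sampled frames, which is why the first assumption (a stationary background) matters.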


In [None]:
import cv2 as cv
from utils.frame_reader import FrameReader
from dataset.box_calculator import BoxCalculator
from dataset.sample_extractor import SampleExtractor
from utils.gui_utils import UserPrompt

### Input and output definitions

In [None]:
################################ User Input ################################

# select the folder containing the original frame images 
src_folder = None

# select the folder in which the extracted samples will be saved
output_folder = None

############################################################################

if src_folder is None:
    src_folder = UserPrompt.open_directory("Select the folder containing the original frame images")

if output_folder is None:
    output_folder = UserPrompt.open_directory("Select the folder in which the extracted samples will be saved")

print(f"original frame images folder: {src_folder}")
print(f"output folder: {output_folder}")

In [None]:
# read all files from the source folder
src_frames = FrameReader.create_from_directory(src_folder, read_format=cv.IMREAD_GRAYSCALE)

### Foreground object detection parameters

In [None]:
################################ User Input ################################

# define the parameters by which objects are detected

box_calc = BoxCalculator(
    bg_probes=1000,  # number of images to use to calculate the background
    diff_thresh=10,  # threshold for the difference between the background and the current frame to detect non-background objects
    frame_reader=src_frames,
)

############################################################################

# create sample extractor which uses the BoxCalculator to extract samples
sample_extractor = SampleExtractor(box_calc)

### Extraction of samples from the input images

Note that the following step might take a while, especially when the number of samples to extract is large.
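
The `name_format` argument passed in the cell below is an ordinary Python format string. The short sketch here shows how such a string expands, assuming the placeholder is filled with an integer index; which index `create_samples` actually passes (frame number or sample counter) is not stated in this notebook.

```python
name_format = "img_{:09d}.png"

# {:09d} pads the integer with leading zeros to a width of 9 digits
names = [name_format.format(i) for i in (0, 42, 299)]
print(names)  # ['img_000000000.png', 'img_000000042.png', 'img_000000299.png']

# zero-padding makes alphabetical order match numeric order
assert sorted(names) == names
```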

In [None]:
################################ User Input ################################

# extract samples from the original images

sample_extractor.create_samples(
    count=300,  # number of samples to extract
    target_size=(384, 384),  # size of the extracted samples
    name_format="img_{:09d}.png",  # naming format of the extracted samples
    num_workers=None,  # multiprocessing related, read doc for more info
    chunk_size=50,  # multiprocessing related, read doc for more info
    save_folder=output_folder,
)

############################################################################