# Additional Information
> This notebook contains information on the use of _deepflash2_.

## Required Data Structure and Naming

### Ground Truth Estimation 
- __One parent folder__
- __One folder per expert__
- __Identical names for segmentations__

_Exemplary structure:_

* [folder] parent_folder
    * [folder] expert1
        * [file] mask1.png
        * [file] mask2.png
    * [folder] expert2
        * [file] mask1.png
        * [file] mask2.png
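
The layout above can be checked before starting ground truth estimation. The following is a minimal sketch, assuming one sub-folder per expert inside the parent folder; `check_expert_folders` is a hypothetical helper and not part of the deepflash2 API.

```python
from pathlib import Path


def check_expert_folders(parent_folder):
    """Return {expert folder name: sorted file names}.

    Raises ValueError if the expert folders do not contain identical file names.
    Hypothetical helper for illustration only.
    """
    parent = Path(parent_folder)
    # Collect the file names found in every sub-folder (one per expert).
    experts = {
        d.name: sorted(f.name for f in d.iterdir() if f.is_file())
        for d in sorted(parent.iterdir())
        if d.is_dir()
    }
    if not experts:
        raise ValueError(f"No expert folders found in {parent}")
    # Compare every expert against the first one.
    reference_name, reference_files = next(iter(experts.items()))
    for name, files in experts.items():
        if files != reference_files:
            raise ValueError(
                f"{name} and {reference_name} contain different file names"
            )
    return experts
```

Calling `check_expert_folders("parent_folder")` returns the file names per expert if the structure is consistent and raises an error otherwise.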

### Training

- __One folder for training images__
    - Images must have unique name or ID
    - _0001.tif --> name/ID: 0001; img_5.png --> name/ID: img_5, ..._ 
- __One folder for segmentation masks__
    - Corresponding masks must start with the image name or ID, followed by a mask suffix
        - _0001 -> 0001_mask.png (mask_suffix = "_mask.png")_
        - _0001 -> 0001.png (mask_suffix = ".png")_
        - The mask suffix is inferred automatically

_Exemplary structure:_
* [folder] images
  * [file] 0001.tif
  * [file] 0002.tif
* [folder] masks
  * [file] 0001_mask.png
  * [file] 0002_mask.png
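
The naming rule can be illustrated with a short sketch that pairs each image with its mask and derives the mask suffix from the file names. It is an assumption about how such a check could look, not the inference logic used by deepflash2; `pair_images_with_masks` is a hypothetical helper. Because it matches by prefix, it reports an error for ambiguous IDs (for example `img_1` and `img_10`) rather than guessing.

```python
from pathlib import Path


def pair_images_with_masks(image_dir, mask_dir):
    """Map each image ID to (image path, mask path, mask suffix).

    Hypothetical helper for illustration only.
    """
    masks = sorted(p for p in Path(mask_dir).iterdir() if p.is_file())
    pairs = {}
    for image in sorted(p for p in Path(image_dir).iterdir() if p.is_file()):
        image_id = image.stem  # "0001.tif" -> "0001"
        if image_id in pairs:
            raise ValueError(f"Duplicate image ID: {image_id}")
        # A mask belongs to the image if its file name starts with the image ID.
        candidates = [m for m in masks if m.name.startswith(image_id)]
        if len(candidates) != 1:
            raise ValueError(
                f"Expected exactly one mask for {image_id}, found {len(candidates)}"
            )
        mask = candidates[0]
        # Whatever follows the ID in the mask file name is the mask suffix.
        pairs[image_id] = (image, mask, mask.name[len(image_id):])
    return pairs
```

For the structure shown above, every entry would carry the suffix `"_mask.png"`.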

### Prediction

- __One folder for the images to be predicted__
    - Images must have unique name or ID
        - _0001.tif --> name/ID: 0001; img_5.png --> name/ID: img_5, ..._ 
- __One folder containing trained models (ensemble)__
    - The ensemble folder and the models are created during training
        - Do not change the naming of the models
        - If you want to train different ensembles, simply rename the ensemble folder

_Exemplary structure:_
* [folder] images
  * [file] 0001.tif
  * [file] 0002.tif
* [folder] ensemble
  * [file] unext50_deepflash2_model-1.pth
  * [file] unext50_deepflash2_model-2.pth
  



## Train-validation-split

The train-validation split is defined as _[k-fold cross validation](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html)_ with `n_splits` folds
- `n_splits` is the minimum of the number of files in the dataset and `max_splits` (default: 5)
- By default, the number of models per ensemble is limited to `n_splits`

_Example for a dataset containing 15 images_
- `model_1` is trained on 12 images (3 validation images) 
- `model_2` is trained on 12 images (3 different validation images) 
- ...
- `model_5` is trained on 12 images (3 different validation images) 

_Example for a dataset containing 2 images_
- `model_1` is trained on 1 image (1 validation image) 
- `model_2` is trained on 1 image (1 different validation image) 
- Only two models per ensemble
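
The two examples can be reproduced with a small sketch. The function `fold_sizes` is hypothetical and not part of deepflash2. It assumes the fold sizes documented for scikit-learn's `KFold`, where the first `n_samples % n_splits` folds receive one extra validation sample.

```python
def fold_sizes(n_files, max_splits=5):
    """Return (n_splits, [(n_train, n_valid), ...]) for a k-fold split.

    Hypothetical helper for illustration only.
    """
    if n_files < 2:
        raise ValueError("k-fold cross validation needs at least 2 files")
    n_splits = min(n_files, max_splits)
    # Distribute the files over the folds as evenly as possible.
    base, extra = divmod(n_files, n_splits)
    sizes = []
    for fold in range(n_splits):
        n_valid = base + (1 if fold < extra else 0)
        sizes.append((n_files - n_valid, n_valid))
    return n_splits, sizes


print(fold_sizes(15))  # 5 splits, each with 12 training and 3 validation images
print(fold_sizes(2))   # 2 splits, each with 1 training and 1 validation image
```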

## Training Epochs and Iterations

To streamline the training process and allow an easier comparison across differently sized datasets, we decided to use the number of training _iterations_ instead of _epochs_ to define the length of a [training cycle](https://matjesg.github.io/deepflash2/utils.html#calc_iterations).

Some useful definitions (adapted from [stackoverflow](https://stackoverflow.com/questions/4752626/epoch-vs-iteration-when-training-neural-networks)):
- Epoch: one training pass (one forward pass and one backward pass) over all the training examples
- Batch size: the number of training examples in one forward/backward pass. The higher the batch size, the more memory space you'll need.
- Iteration: One forward pass and one backward pass using [batch size] number of examples.

_Example:_
Your dataset comprises 20 images and you want to train for 1000 iterations given a batch size of 4. The [algorithm](https://matjesg.github.io/deepflash2/utils.html#calc_iterations) calculates the minimum number of epochs needed to train for 1000 iterations:

$\text{Epochs} = \frac{\text{iterations}}{\frac{\#\text{images}}{\text{batch size}}} = \frac{1000}{\frac{20}{4}} = 200$

The number of epochs is rounded up to the next integer.
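
The formula and the rounding step can be written as a short sketch. The function `epochs_for_iterations` is a hypothetical name that mirrors the formula above; it is not the implementation of `calc_iterations` in deepflash2.

```python
def epochs_for_iterations(n_iterations, n_images, batch_size):
    """Smallest integer number of epochs that covers n_iterations.

    Mirrors epochs = iterations / (images / batch size), rounded up.
    Hypothetical helper for illustration only.
    """
    # Integer ceiling division avoids floating-point rounding issues:
    # ceil(n_iterations * batch_size / n_images)
    return (n_iterations * batch_size + n_images - 1) // n_images


print(epochs_for_iterations(1000, 20, 4))  # 200
print(epochs_for_iterations(1000, 15, 4))  # 267 (266.67 rounded up)
```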