Image segmentation
==================

In this notebook we illustrate how to use the script `scripts/image_parsing/main_raw_to_clips.py` to segment (i.e. extract) clips, each containing a single organism, from large-pane images containing multiple organisms.

We first declare the parameters that tell the script where to find the input files and where to save its outputs. In this notebook, we pass these arguments to Python as a dictionary, rather than as variables in a shell (`.sh`) or batch (`.bat`) script, as is done in workflow mode.

The cell below contains paths to the example dataset. To use your own dataset, change the file paths in the `arguments = {}` block to the locations of your own folders; `ROOT_DIR` is where the repository is located.

In [8]:
import yaml
from pathlib import Path

ROOT_DIR = Path(r"D:\mzb-workflow")  # raw string keeps the backslash literal; or use Path("/home/jovyan/work/mzb-workflow")

arguments = {
    "input_dir": ROOT_DIR / "data/mzb_example_data/raw_img", 
    "output_dir": ROOT_DIR / "data/mzb_example_data/derived/blobs/", 
    "save_full_mask_dir": ROOT_DIR / "data/mzb_example_data/derived/full_image_masks/", 
    "config_file": ROOT_DIR / "configs/mzb_example_config.yaml", 
    "verbose": True
}
    
with open(str(arguments["config_file"]), "r") as f:
    cfg = yaml.load(f, Loader=yaml.FullLoader)

cfg["trcl_gpu_ids"] = None # this sets the number of available GPUs to zero, since this script doesn't benefit from GPU compute. 
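
If you point the dictionary at your own folders, it can help to confirm that the input paths exist before starting a long run. The following is a minimal sketch; `check_paths` is a hypothetical helper written for this notebook, not part of `mzbsuite`, and the placeholder paths are only there to make the example self-contained.

```python
from pathlib import Path


def check_paths(arguments):
    """Return a list of problems found in the path entries (hypothetical helper)."""
    problems = []
    # The raw images must already be present in a directory.
    if not Path(arguments["input_dir"]).is_dir():
        problems.append(f"input_dir not found: {arguments['input_dir']}")
    # The configuration file must be an existing file.
    if not Path(arguments["config_file"]).is_file():
        problems.append(f"config_file not found: {arguments['config_file']}")
    return problems


# Placeholder paths for illustration; pass your own `arguments` dictionary instead.
example = {
    "input_dir": Path("does/not/exist/raw_img"),
    "config_file": Path("does/not/exist/config.yaml"),
}
for problem in check_paths(example):
    print(problem)
```

An empty list means both paths were found; the output directories are deliberately not checked here, since they may not exist before the first run.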

Now we use the custom function `cfg_to_arguments` to parse the parameters we have just supplied and the parameters in the configuration file: 

In [9]:
from mzbsuite.utils import cfg_to_arguments

args = cfg_to_arguments(arguments)
cfg = cfg_to_arguments(cfg)
print(str(cfg))

{'glob_random_seed': 222, 'glob_root_folder': '/home/jovyan/work/mzb-workflow/', 'glob_blobs_folder': '/home/jovyan/work/mzb-workflow/data/derived/blobs/', 'glob_local_format': 'pdf', 'model_logger': 'wandb', 'impa_image_format': 'jpg', 'impa_clip_areas': [2700, 4700, -1, -1], 'impa_area_threshold': 5000, 'impa_gaussian_blur': [21, 21], 'impa_gaussian_blur_passes': 3, 'impa_adaptive_threshold_block_size': 351, 'impa_mask_postprocess_kernel': [11, 11], 'impa_mask_postprocess_passes': 5, 'impa_bounding_box_buffer': 200, 'impa_save_clips_plus_features': True, 'lset_class_cut': 'order', 'lset_val_size': 0.1, 'trcl_learning_rate': 0.0001, 'trcl_batch_size': 8, 'trcl_weight_decay': 0, 'trcl_step_size_decay': 5, 'trcl_number_epochs': 75, 'trcl_save_topk': 1, 'trcl_num_classes': 8, 'trcl_model_pretrarch': 'convnext-small', 'trcl_num_workers': 16, 'trcl_wandb_project_name': 'mzb-classifiers', 'trcl_logger': 'wandb', 'trsk_learning_rate': 0.001, 'trsk_batch_size': 32, 'trsk_weight_decay': 0, 'tr
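
The practical effect of this conversion is that parameters can be read as attributes (for example `cfg.impa_clip_areas`) rather than dictionary keys. The sketch below shows the general idea using the standard library's `argparse.Namespace`; `dict_to_namespace` is an illustrative stand-in and is not claimed to match how `cfg_to_arguments` is implemented in `mzbsuite`.

```python
from argparse import Namespace


def dict_to_namespace(d):
    """Illustrative stand-in, not the mzbsuite implementation."""
    return Namespace(**d)


# Two values copied from the configuration printed above.
demo = dict_to_namespace({"impa_area_threshold": 5000, "impa_image_format": "jpg"})
print(demo.impa_area_threshold)  # attribute access instead of demo["..."]
print(vars(demo))                # vars() gives the plain dictionary back
```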

One parameter worth checking is `impa_clip_areas`: if there is a reference scale in the same place in all of the images (as is the case for the example data), you can earmark this area for exclusion from later processing.

> 📝 NOTE: this parameter is defined as a list `[x1, y1, x2, y2]`, giving the pixel coordinates of the top-left `[x1, y1]` and bottom-right `[x2, y2]` corners; a value of -1 means up to the edge of the image.

In [10]:
location_cutout = None  # stays None when no clip area is configured
if cfg.impa_clip_areas is not None:
    location_cutout = [int(a) for a in cfg.impa_clip_areas]
print(location_cutout)

[2700, 4700, -1, -1]
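
To make the `-1` convention concrete, the sketch below resolves the list into explicit corner coordinates for a given image size. Both `resolve_clip_area` and the 3000 x 5000 pixel image size are assumptions made for illustration; the segmentation script may handle this differently internally.

```python
def resolve_clip_area(clip_area, image_width, image_height):
    """Replace -1 entries with the image edge (hypothetical helper)."""
    x1, y1, x2, y2 = clip_area
    if x2 == -1:
        x2 = image_width   # extend to the right edge
    if y2 == -1:
        y2 = image_height  # extend to the bottom edge
    return x1, y1, x2, y2


# Assumed image size, chosen only so that the example region fits inside it.
box = resolve_clip_area([2700, 4700, -1, -1], image_width=3000, image_height=5000)
print(box)  # (2700, 4700, 3000, 5000)
```

With these assumed dimensions, the excluded region is the 300 x 300 pixel square in the bottom-right corner of the image.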


Below we load the main function that processes the images into clips and produces figures. It also saves a `.csv` file in `output_dir` with information about each image and the clips generated from it, such as bounding box coordinates and the pixel area of each mask.

For further details about the logic of this script please refer to the explanation in the section [`Segmentation`](https://mzb-workflow.readthedocs.io/en/latest/files/scripts/processing_scripts.html#segmentation) under [`Processing scripts`](https://mzb-workflow.readthedocs.io/en/latest/files/scripts/processing_scripts.html) in the documentation. 

In [11]:
from scripts.image_parsing.main_raw_to_clips import main as segmentation
?segmentation

Signature: segmentation(args, cfg)
Docstring:
This script takes a folder of raw images and clips them into smaller images, with their mask.

Parameters
----------
args : argparse.Namespace
    Arguments passed to the script. Namely:

        - input_dir: path to directory with raw images
        - output_dir: path to directory where to clip images
        - save_full_mask_dir: path to directory where to save labeled full masks
        - v (verbose): print more info
        - config_file: path to config file with per-script args

cfg : argparse.Namespace
    Configuration with detailed parametrisations.

Returns
-------
None. Everything is saved to disk.
File:      d:\mzb-workflow\scripts\image_parsing\main_raw_to_clips.py
Type:      function

Now we can run the function with the provided parameters, and check the output produced. 

> ⚠️ WARNING: depending on the number of images and how many organisms are present, the processing time can be considerable.

In [None]:
segmentation(args, cfg)
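
Once the run has finished, one way to check the output is to count the files written to the output folder and load any `.csv` tables found there. This is a minimal sketch using only the standard library; `summarise_outputs` is a hypothetical helper, and no assumption is made about the file names or column names the script produces.

```python
import csv
from pathlib import Path


def summarise_outputs(output_dir):
    """Count files by extension and read any CSV tables found (illustrative)."""
    output_dir = Path(output_dir)
    counts = {}  # file extension -> number of files
    tables = {}  # CSV file name -> list of rows (one dict per row)
    for path in sorted(output_dir.rglob("*")):
        if not path.is_file():
            continue
        suffix = path.suffix.lower()
        counts[suffix] = counts.get(suffix, 0) + 1
        if suffix == ".csv":
            with open(path, newline="") as f:
                tables[path.name] = list(csv.DictReader(f))
    return counts, tables
```

In this notebook it could be called as `counts, tables = summarise_outputs(args.output_dir)`, after which `counts` gives a quick view of how many clips were written and `tables` holds the rows of each table for inspection.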