Skeletonization unsupervised
============================

In this notebook we use the script `skeletons/main_unsupervised_skeleton_estimation.py` to automatically estimate the body length of organisms from image clips. This module needs no pre-trained model, which is why it is called unsupervised.

First, we set the run parameters that tell the script where to find the input images and where to write its outputs.

In [1]:
import yaml
from pathlib import Path

# Raw string so the backslash in the Windows path is not read as an escape sequence.
ROOT_DIR = Path(r"D:\mzb-workflow")  # or Path("/home/jovyan/work/mzb-workflow")

arguments = {
    "config_file": ROOT_DIR / "configs/mzb_example_config.yaml", 
    "input_dir": ROOT_DIR / "data/mzb_example_data/derived/blobs", 
    "errors": ROOT_DIR / "data/mzb_example_data/derived/classification", 
    "output_dir": ROOT_DIR / "results/bgb/skeletons/skeletons_unsupervised", 
    "save_masks": ROOT_DIR / "data/bgb/skeletons/skeletons_unsupervised", 
    "list_of_files": None
    }
    
with open(str(arguments["config_file"]), "r") as f:
    cfg = yaml.load(f, Loader=yaml.FullLoader)

cfg["trcl_gpu_ids"] = None # this sets the number of available GPUs to zero, since this script doesn't benefit from GPU compute. 

Convert both dictionaries into argparse-style argument objects, as expected by the Python script, using the custom function `cfg_to_arguments`:

In [2]:
from mzbsuite.utils import cfg_to_arguments

# Transforms configurations dicts to argparse arguments
args = cfg_to_arguments(arguments)
cfg = cfg_to_arguments(cfg)
print(str(cfg))

{'glob_random_seed': 222, 'glob_root_folder': '/home/jovyan/work/mzb-workflow/', 'glob_blobs_folder': '/home/jovyan/work/mzb-workflow/data/derived/blobs/', 'glob_local_format': 'pdf', 'model_logger': 'wandb', 'impa_image_format': 'jpg', 'impa_clip_areas': [2700, 4700, -1, -1], 'impa_area_threshold': 5000, 'impa_gaussian_blur': [21, 21], 'impa_gaussian_blur_passes': 3, 'impa_adaptive_threshold_block_size': 351, 'impa_mask_postprocess_kernel': [11, 11], 'impa_mask_postprocess_passes': 5, 'impa_bounding_box_buffer': 200, 'impa_save_clips_plus_features': True, 'lset_class_cut': 'order', 'lset_val_size': 0.1, 'trcl_learning_rate': 0.0001, 'trcl_batch_size': 8, 'trcl_weight_decay': 0, 'trcl_step_size_decay': 5, 'trcl_number_epochs': 75, 'trcl_save_topk': 1, 'trcl_num_classes': 8, 'trcl_model_pretrarch': 'convnext-small', 'trcl_num_workers': 16, 'trcl_wandb_project_name': 'mzb-classifiers', 'trcl_logger': 'wandb', 'trsk_learning_rate': 0.001, 'trsk_batch_size': 32, 'trsk_weight_decay': 0, 'tr
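
To make the conversion step more concrete, here is a minimal sketch of what turning a dictionary into argparse-style arguments can look like. The helper `dict_to_namespace` is a hypothetical stand-in written for this illustration; it is not the implementation of `cfg_to_arguments` in `mzbsuite.utils`, which may use its own class and behave differently.

```python
from argparse import Namespace
from pathlib import Path


def dict_to_namespace(options):
    """Hypothetical stand-in for cfg_to_arguments: wrap a dict in a Namespace."""
    return Namespace(**options)


# Illustrative values only, not the paths used in this notebook.
example = dict_to_namespace({"input_dir": Path("data/blobs"), "list_of_files": None})

print(example.input_dir)  # attribute access instead of example["input_dir"]
print(vars(example))      # back to a plain dict when needed
```

The practical effect is that downstream code can read options as attributes (`args.input_dir`), the same way it would after parsing command-line arguments.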

In [3]:
from scripts.skeletons.main_unsupervised_skeleton_estimation import main as unsupervised_skeletonization
?unsupervised_skeletonization

Signature: unsupervised_skeletonization(args, cfg)
Docstring:
Main function for skeleton estimation (body size) in the unsupervised setting.

Parameters
----------
args : argparse.Namespace
    Arguments parsed from command line. Namely:
    
        - config_file: path to the configuration file
        - input_dir: path to the directory containing the masks
        - output_dir: path to the directory where to save the results
        - save_masks: path to the directory where to save the masks as jpg
        - list_of_files: path to the csv file containing the classification predictions
        - v (verbose): whether to print more info
        
cfg : argparse.Namespace
    Arguments parsed from the configuration file.

Returns
-------
None. All is saved to disk at specified locations.
File:      d:\mzb-workflow\scripts\skeletons\main_unsupervised_skeleton_estimation.py

Load in clips, excluding those predicted to be `error` by the DL model. 

> 🐛 BUG: the path to the folder with the clips classified as error is currently hardcoded in the script!

In a nutshell, the script uses the configuration parameters provided above to apply a series of morphological operations to the binary mask of each organism's clip. It then thins the mask into one or more segments, connects them, and computes the longest path through them. This path is the skeleton, and its length should approximate the length of the organism well.
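
The last step described above, finding the longest path through the thinned segments, can be illustrated with a small sketch. It is not the code in `main_unsupervised_skeleton_estimation.py`. It assumes the mask has already been thinned to a one-pixel-wide skeleton, treats pixels as 8-connected, and takes the "longest path" to be the largest shortest-path distance between any two skeleton pixels. The names `farthest_distance`, `longest_path_length` and `toy_skeleton` are made up for this example.

```python
import heapq
import math

# 8-connected neighbourhood as (row offset, column offset) pairs.
NEIGHBOURS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]


def farthest_distance(pixels, start):
    """Dijkstra over skeleton pixels: distance from `start` to its farthest reachable pixel."""
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > dist[(r, c)]:
            continue  # stale heap entry, a shorter route was already found
        for dr, dc in NEIGHBOURS:
            nxt = (r + dr, c + dc)
            if nxt not in pixels:
                continue
            nd = d + math.hypot(dr, dc)  # 1 for straight steps, sqrt(2) for diagonal ones
            if nd < dist.get(nxt, math.inf):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return max(dist.values())


def longest_path_length(skeleton):
    """Largest shortest-path distance (in pixels) between any two skeleton pixels."""
    pixels = {(r, c) for r, row in enumerate(skeleton) for c, v in enumerate(row) if v}
    # Running Dijkstra from every pixel is quadratic; fine for a toy, slow for large masks.
    return max((farthest_distance(pixels, p) for p in pixels), default=0.0)


# Toy skeleton: two straight steps followed by two diagonal steps.
toy_skeleton = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

print(f"{longest_path_length(toy_skeleton):.2f}")  # 2 + 2*sqrt(2), about 4.83 pixels
```

The result is in pixels. Converting it to a physical body length would additionally require the image scale, which is not covered in this sketch.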

In [None]:
unsupervised_skeletonization(args, cfg)