# MOQC: a comprehensive tutorial

This is a tutorial notebook to show how to use the MOQC package. The package is designed to be used with any segmentation framework.  

The tutorial is divided into 3 parts:
* data
* training
* evaluation

## Data

First, we need to navigate into the source folder of the repository, i.e., _moqc_.

**NOTE:** you can skip this step if you run the scripts from the command line.

In [None]:
%cd /data/marciano/experiments/multi-organ-qc/moqc

### Visualization

Even though it's not mandatory, visualizing the images gives you a better understanding of the task and domain for the organ segmentation. To do so, we start by simply loading a random image.

**NOTE:** MOQC training was designed to work with the Medical Segmentation Decathlon dataset (MSD). However, you can easily adapt the code to work with your own dataset. 

In [None]:
import pandas as pd
import numpy as np
import os
import nibabel as nib
from utils.common import Visualizer
from utils.dataset import select_valid_imgs, remove_non_common_files

In [None]:
rand_img = nib.load('/data/marciano/experiments/multi-organ-qc/MSD_data/Task01_BrainTumour/labelsTr/BRATS_001.nii.gz').get_fdata().transpose(2, 0, 1)
rand_img.shape

The `Visualizer` is an embedded tool built on _plotly_ (more details in `utils/common`). It allows you to properly visualize volumetric data. It's designed to simulate the behaviour of _ITK-SNAP_, a popular tool for medical image visualization.

In the following example, we visualize the image loaded above. The `Visualizer` class is initialized by passing the image. A convenient slider is then displayed to navigate through the slices. Our goal is to get some insights from the input image and to understand the distribution of non-empty 2D slices.

In [None]:
viz = Visualizer(rand_img=rand_img)
viz.plot_3d()
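
The slider gives a visual impression of where the non-empty slices are. As a complement, the same distribution can be computed numerically. The snippet below is a minimal sketch that uses a small synthetic volume in place of a real label file, assuming the `(slices, height, width)` layout produced by the `transpose(2, 0, 1)` call above. The names `volume`, `nonzero_fraction` and `non_empty_idx` are illustrative and not part of MOQC.

```python
import numpy as np

# Synthetic stand-in for a label volume laid out as (slices, height, width).
volume = np.zeros((6, 10, 10), dtype=np.uint8)
volume[2, 2:5, 2:5] = 1    # 9 labelled pixels in slice 2
volume[3, 0:5, 0:10] = 2   # 50 labelled pixels in slice 3

# Fraction of non-zero pixels in each slice (one value per slice index).
nonzero_fraction = (volume != 0).reshape(volume.shape[0], -1).mean(axis=1)

# Indexes of the slices that contain at least one labelled pixel.
non_empty_idx = np.flatnonzero(nonzero_fraction > 0)

print(nonzero_fraction)
print(non_empty_idx)
```

Replacing `volume` with `rand_img` should give the per-slice fractions for the real image, which can help when choosing a slice range and a threshold in the next step.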

### Loading

After investigating which slice indexes contain the majority of non-empty slices, we proceed to select and load the data. The `select_valid_imgs` function comes in handy: it saves the non-empty slices as 2D NIfTI files.

The available parameters are:
* `data_path`: path to the data folder
* `save_path`: path to the folder where the 2D slices will be saved
* `inter_slice_range`: range of slices to be saved (optional)
* `non_zero_thres`: threshold to consider a slice as non-empty (optional)

As you can see, some parameters are optional, although their best values depend on the task you are running. If you don't know how to set them, simply omit them and the function will fall back to the default values (see `utils/common`).

In [None]:
label_path = '/data/marciano/experiments/multi-organ-qc/MSD_data/Task01_BrainTumour/labelsTr/'
labsave = '/data/marciano/experiments/multi-organ-qc/data/brain/labels'
select_valid_imgs(label_path, labsave, inter_slice_range=[50, 120], non_zero_thres=0.005)

segpath = '/data/marciano/experiments/multi-organ-qc/nnUnet_seg/brain/Tr/'
segsave = '/data/marciano/experiments/multi-organ-qc/data/brain/nnunet/segmentations'
select_valid_imgs(segpath, segsave, inter_slice_range=[50, 120], non_zero_thres=0.005)
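
To make the roles of `inter_slice_range` and `non_zero_thres` more concrete, here is a minimal sketch of a selection rule of this kind. It is not the implementation of `select_valid_imgs`: the helper `keep_slice` is hypothetical, and both the inclusive range and the "greater than or equal" comparison are assumptions made for illustration.

```python
def keep_slice(index, nonzero_fraction, inter_slice_range=None, non_zero_thres=0.0):
    """Hypothetical rule: keep in-range slices whose mask is large enough."""
    # Assumption: the range is inclusive on both ends.
    if inter_slice_range is not None:
        low, high = inter_slice_range
        if index < low or index > high:
            return False
    # Assumption: a slice exactly at the threshold is kept.
    return nonzero_fraction >= non_zero_thres


# Made-up mapping from slice index to fraction of non-zero pixels.
fractions = {10: 0.0300, 50: 0.0050, 80: 0.0049, 120: 0.0400, 121: 0.0500}

kept = [i for i, frac in fractions.items() if keep_slice(i, frac, [50, 120], 0.005)]
print(kept)
```

In this toy example, slices 10 and 121 are dropped because they fall outside the range, and slice 80 is dropped because its mask is slightly below the threshold.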

Sometimes the UNet segmentations are not accurate enough, so the segmented area differs from the area in the labels. This can cause misalignments between ground truths and segmentations during evaluation/inference. For example, suppose a ground-truth 2D slice has a mask covering 0.5% of the entire image and the threshold is set to the same value. If the mask in the corresponding segmentation covers only 0.49%, that segmentation slice is ignored when the function is called, and the label slice is left without a counterpart.

That's the rationale behind the function called `remove_non_common_files`: it deletes the (hopefully small) number of files that are not common to both the ground truths and the segmentations.

In [None]:
remove_non_common_files(labsave, segsave)
assert len(os.listdir(labsave)) == len(os.listdir(segsave))
# list.sort() returns None, so compare sorted copies of the file lists instead
assert sorted(os.listdir(labsave)) == sorted(os.listdir(segsave))
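
The idea of keeping only the files present in both folders can be sketched with plain set operations. The snippet below is an illustration under stated assumptions, not the code of `remove_non_common_files`: the helper `remove_unpaired` and the file names are hypothetical, and files are assumed to be paired by identical file name. It works on temporary folders, so nothing in the real dataset is touched.

```python
import tempfile
from pathlib import Path


def remove_unpaired(dir_a, dir_b):
    """Delete files whose name appears in only one of the two folders."""
    names_a = {p.name for p in Path(dir_a).iterdir() if p.is_file()}
    names_b = {p.name for p in Path(dir_b).iterdir() if p.is_file()}
    common = names_a & names_b
    removed = []
    for folder, names in ((dir_a, names_a), (dir_b, names_b)):
        for name in sorted(names - common):
            (Path(folder) / name).unlink()
            removed.append(f"{Path(folder).name}/{name}")
    return removed


with tempfile.TemporaryDirectory() as root:
    labels = Path(root) / "labels"
    segs = Path(root) / "segmentations"
    labels.mkdir()
    segs.mkdir()

    # The label folder has two slices; the segmentation folder lost one of them
    # because its mask fell just below the threshold.
    for name in ("case_001_slice_050.nii.gz", "case_001_slice_051.nii.gz"):
        (labels / name).touch()
    (segs / "case_001_slice_051.nii.gz").touch()

    removed = remove_unpaired(labels, segs)
    remaining_labels = sorted(p.name for p in labels.iterdir())
    remaining_segs = sorted(p.name for p in segs.iterdir())

print(removed)
print(remaining_labels == remaining_segs)
```

After the call, both folders contain the same file names, which is the property the assertions in the cell above are meant to check.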

### Preprocessing

We need to move up from the _moqc_ folder to the root of the repository.

In [None]:
%cd /data/marciano/experiments/multi-organ-qc

This step is mandatory for the training phase. It creates a new folder structure that can be easily managed both by the training and the evaluation scripts. Please refer to the `README` files for more information. 

The script `moqc/data_preparation.py` accepts several parameters:
- The `-d` or `--data` argument is used to specify the data folder. It expects a string input (`type=str`). If not provided, it defaults to **'data'**.
- The `-mf` or `--mask_folder` argument is used to specify the masks folder. It also expects a string input and defaults to **'labels'**.
- The `-o` or `--output` argument is used to specify the output folder of the structured dataset. It expects a string input and defaults to **'structured/'**.
- The `-pf` or `--pair_folder` argument is a boolean flag used to enable pair folder. It defaults to **False**.
- The `-og` or `--organ` argument is used to specify the selected organ. It expects a string input.
- The `-k` or `--keyword` argument is used to specify a keyword to identify your segmentations. It expects a **list** input.
- The `--verbose` argument is a flag that enables verbose mode. It doesn't expect a value. 
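
For readers who prefer to see the interface as code, the list above roughly corresponds to an `argparse` parser like the one sketched below. This is a reconstruction from the description, not a copy of `moqc/data_preparation.py`. In particular, collecting the keywords with `nargs="+"` is an assumption; the real script may parse its list input differently.

```python
import argparse


def build_parser():
    """Sketch of a parser matching the documented options and defaults."""
    parser = argparse.ArgumentParser(description="Sketch of the data preparation CLI")
    parser.add_argument("-d", "--data", type=str, default="data")
    parser.add_argument("-mf", "--mask_folder", type=str, default="labels")
    parser.add_argument("-o", "--output", type=str, default="structured/")
    parser.add_argument("-pf", "--pair_folder", action="store_true")
    parser.add_argument("-og", "--organ", type=str)
    # Assumption: keywords are space-separated values gathered into a list.
    parser.add_argument("-k", "--keyword", nargs="+")
    parser.add_argument("--verbose", action="store_true")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["-og", "brain", "-k", "brats"])
print(args)
```

Arguments that are not passed keep the defaults listed above, so only the organ and the keyword need to be given in the simplest case.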

In [None]:
!python moqc/data_preparation.py -og brain -k ['brats']
!python moqc/data_preparation.py -d data/brain/nnunet/ -mf segmentations -og '' -k ['brats']

## Training

In [None]:
!python moqc/train.py -og prostate --model small_cae

### Test

In [None]:
!python moqc/test.py -og prostate -m small_cae

## Evaluation

In [None]:
!python moqc/evaluate.py -og prostate -m small_cae -seg nnunet -l -c