[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/intel/e2eAIOK/blob/main/demo/ma/domain_adapter/Model_Adapter_Domain_Adapter_Walkthrough_Unet_KITS19.ipynb)

# Model Adapter Domain Adapter Walkthrough Unet KITS19

In this demo, we show how to use Domain Adapter to transfer knowledge in medical image semantic segmentation.
Unlike the [built-in demo](./Model_Adapter_Domain_Adapter_builtin_Unet_KITS19.ipynb), this walkthrough illustrates how to invoke the Model Adapter API in your own workflow.

# Content

* [Overview](#overview)
    * [Model Adapter Domain Adapter Overview](#Model-Adapter-Domain-Adapter-Overview)
* [Getting Started](#Getting-Started)
    * [1. Environment Setup](#1.-Environment-Setup)
    * [2. Data Prepare](#2.-data-prepare)
    * [3. Model Prepare](#3.-model-prepare)
    * [4. Train](#4.-train)
    * [5. Inference](#5.-inference)

# Overview

## Model Adapter Domain Adapter Overview

Model Adapter is a framework that reduces training and inference time, or data labeling cost, by efficiently utilizing public pre-trained models and datasets from many domains. It contains three components that serve different cases: Finetuner, Distiller, and Domain Adapter.

Directly applying a pre-trained model to a target domain does not always work because of covariate shift and label shift, and fine-tuning is often impractical because labeling is expensive in some domains. Even if users invest resources in labeling, it is time-consuming and delays model deployment.

Domain Adapter aims to reuse transferable knowledge with the help of another labeled dataset that shares the same learning task. The goal is better generalization with only a small labeled target dataset, or competitive performance on a label-free target dataset.

The following picture shows the network structure of domain adaptation. A discriminator is added to the user's base network and tries to differentiate source domain data from target domain data, which forces the feature extractor to learn a feature representation that generalizes across domains.

<p align="center">
  <img src='../imgs/adapter.png' width='80%' height='80%' title='Adapter Architecture'>
</p>
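
The adversarial idea described above can be sketched with a toy scalar model. This is a minimal illustration, assuming a DANN-style gradient reversal. The functions `domain_loss_and_grads` and `adversarial_step`, the scalar weights `w` and `v`, and the `reversal_weight` parameter are all hypothetical names for this sketch and are not part of the Model Adapter API.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def domain_loss_and_grads(w: float, v: float, x: float, domain: int):
    """Forward and backward pass of the domain loss for one sample.

    w: weight of a toy scalar "feature extractor" (feature = w * x)
    v: weight of a toy scalar "discriminator" (logit = v * feature)
    domain: 1 for a source-domain sample, 0 for a target-domain sample
    """
    feature = w * x
    p_source = sigmoid(v * feature)
    # Binary cross-entropy of the domain prediction.
    loss = -(domain * math.log(p_source) + (1 - domain) * math.log(1.0 - p_source))
    d_logit = p_source - domain   # d(loss)/d(logit)
    grad_v = d_logit * feature    # d(loss)/dv
    grad_w = d_logit * v * x      # d(loss)/dw
    return loss, grad_v, grad_w


def adversarial_step(w, v, x, domain, lr=0.1, reversal_weight=1.0):
    """One update: the discriminator descends on the domain loss,
    while the feature extractor receives the reversed gradient."""
    loss, grad_v, grad_w = domain_loss_and_grads(w, v, x, domain)
    v_new = v - lr * grad_v                       # discriminator gets better at telling domains apart
    w_new = w - lr * (-reversal_weight * grad_w)  # extractor makes the domains harder to tell apart
    return w_new, v_new, loss
```

In a real segmentation network the extractor is also trained on the supervised segmentation loss. Only the domain-discrimination part is shown here.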


# Getting Started

- **Note1: This demo cannot run directly on Colab, since it requires you to download the datasets manually and store all files according to the specified directory hierarchy. Please refer to [2. Data Prepare](#2-data-prepare) for more details.**
- **Note2: The performance data in this demo is based on a sampled dataset for demonstration purposes only; any performance data in the cells below does not represent the actual performance of this toolkit.**

## 1. Environment Setup

### (Option 1) Use Pip install

In [None]:
!pip install e2eAIOK-ModelAdapter --pre

### (Option 2) Use Docker

Step1. prepare code
   ``` bash
   git clone https://github.com/intel/e2eAIOK.git
   cd e2eAIOK
   git submodule update --init --recursive
   ```
    
Step2. build docker image
   ``` bash
   python3 scripts/start_e2eaiok_docker.py -b pytorch112 --dataset_path ${dataset_path} -w ${host0} ${host1} ${host2} ${host3} --proxy  "http://addr:ip"
   ```
   
Step3. run docker and start conda env
   ``` bash
   sshpass -p docker ssh ${host0} -p 12347
   conda activate pytorch-1.12.0
   ```
  
Step4. Start the jupyter notebook and tensorboard service
   ``` bash
   nohup jupyter notebook --notebook-dir=/home/vmagent/app/e2eaiok --ip=${hostname} --port=8899 --allow-root &
   nohup tensorboard --logdir /home/vmagent/app/data/tensorboard --host=${hostname} --port=6006 & 
   ```
   Now you can visit the demos at `http://${hostname}:8899/` and see the TensorBoard logs at `http://${hostname}:6006`.

## 2. Data Prepare

### Data Download

* Our source domain is the AMOS dataset (download it from [here](https://amos22.grand-challenge.org/Dataset/)), which provides 500 CT and 100 MRI scans with voxel-level annotations of 15 abdominal organs, including the spleen, right kidney, left kidney, gallbladder, esophagus, liver, stomach, aorta, inferior vena cava, pancreas, right adrenal gland, left adrenal gland, duodenum, bladder, prostate/uterus.
* Our target domain is the KiTS dataset (download it from [here](https://github.com/neheller/kits19)), which provides 300 CT scans with voxel-level annotations of kidneys and kidney tumors.
* Our task is to explore reliable kidney semantic segmentation with the help of the labeled AMOS dataset and the unlabeled KiTS dataset. The evaluation metric is the kidney Dice score in the target domain.

- Then, set up some environment variables
    - They tell the program where to read data and where to write the output model and logs

In [None]:
import os
os.environ['nnUNet_raw_data_base'] = "/home/vmagent/app/data/nnUNet_raw_data_base" 
os.environ['nnUNet_preprocessed'] = "/home/vmagent/app/data/nnUNet_preprocessed"
os.environ['RESULTS_FOLDER'] = "/home/vmagent/app/data/nnUNet_trained_models"

* After downloading the datasets, put all your data in the right places. Your files should be located at:
   - Images at: ```${nnUNet_raw_data_base}/nnUNet_raw_data/TaskId_TaskName/imagesTr/```
   - Labels/Segmentations at: ```${nnUNet_raw_data_base}/nnUNet_raw_data/TaskId_TaskName/labelsTr/```
   - Please refer to [here](https://github.com/MIC-DKFZ/nnUNet) to learn how to arrange your data under `${dataset_path}` in the right format.
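
The expected layout above can be checked before running any preprocessing. This is a minimal sketch, assuming only the `imagesTr` and `labelsTr` folders named above; the helper names `task_dirs` and `missing_dirs` are hypothetical and not part of nnUNet or Model Adapter.

```python
import os
from pathlib import Path
from typing import Dict, List, Optional


def task_dirs(task_name: str, raw_base: Optional[str] = None) -> Dict[str, Path]:
    """Return the expected image and label folders for one task."""
    # Fall back to the environment variable set earlier in this notebook.
    base = Path(raw_base if raw_base is not None else os.environ["nnUNet_raw_data_base"])
    task_root = base / "nnUNet_raw_data" / task_name
    return {"images": task_root / "imagesTr", "labels": task_root / "labelsTr"}


def missing_dirs(task_name: str, raw_base: Optional[str] = None) -> List[Path]:
    """List the expected folders that do not exist yet."""
    return [path for path in task_dirs(task_name, raw_base).values() if not path.is_dir()]
```

Calling `missing_dirs("Task041_KiTS")` returns an empty list once both folders are in place.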

- Now the structure should look like this (*for simplicity, we only take 5 cases of each task for demonstration*):

In [None]:
!tree $nnUNet_raw_data_base/nnUNet_raw_data

/home/vmagent/app/dataset/nnUNet_raw_data_base/nnUNet_raw_data
├── Task041_KiTS
│   ├── dataset.json
│   ├── imagesTr
│   │   ├── case_00000_0000.nii.gz
│   │   ├── case_00001_0000.nii.gz
│   │   ├── case_00002_0000.nii.gz
│   │   ├── case_00003_0000.nii.gz
│   │   └── case_00004_0000.nii.gz
│   └── labelsTr
│       ├── case_00000.nii.gz
│       ├── case_00001.nii.gz
│       ├── case_00002.nii.gz
│       ├── case_00003.nii.gz
│       └── case_00004.nii.gz
└── Task505_AMOS
    ├── imagesTr
    │   ├── amos_0001.nii.gz
    │   ├── amos_0004.nii.gz
    │   ├── amos_0005.nii.gz
    │   ├── amos_0006.nii.gz
    │   └── amos_0007.nii.gz
    ├── labelsTr
    │   ├── amos_0001.nii.g

### Data Preprocess

#### Data Alignment

- In this part, we do the following:
    - Keep both datasets in the same axis ordering. For background knowledge, refer to [here](https://www.jarvis73.com/2019/06/24/Medical-Imaging-Guide/#13-%E5%9D%90%E6%A0%87%E7%B3%BB%E7%BB%9F)
        - Axis ordering: it determines the direction from which we view the medical image. It is adjustable, similar to rotation in natural images, and the two datasets should share the same perspective.
    - Change the tumor annotation in KiTS to kidney, because the source domain AMOS has no tumor annotation

In [None]:
%cd modelzoo/unet/nnUNet/nnunet

In [None]:
%%bash
python dataset_conversion/amos_convert_label.py
python dataset_conversion/kits_convert_label.py basic
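
The tumor-to-kidney relabeling can be pictured as a simple label remap. This is a minimal sketch, assuming the KiTS label convention 0 = background, 1 = kidney, 2 = tumor. It does not reproduce `kits_convert_label.py`; `TUMOR_TO_KIDNEY` and `remap_labels` are hypothetical names.

```python
from typing import Dict, List

# Assumed KiTS label convention: 0 = background, 1 = kidney, 2 = tumor.
TUMOR_TO_KIDNEY: Dict[int, int] = {2: 1}


def remap_labels(slice_labels: List[List[int]], mapping: Dict[int, int]) -> List[List[int]]:
    """Return a copy of a 2D label slice with mapped values replaced.

    Values that do not appear in `mapping` are kept unchanged.
    """
    return [[mapping.get(value, value) for value in row] for row in slice_labels]
```

After this remap, the only label values left are 0 and 1.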

#### Data Verification

- Before going any further, verify that the data is present and that the labels match the images.

In [None]:
!nnUNet_plan_and_preprocess -t 507 --verify_dataset_integrity

Verifying training set
checking case case_00000
checking case case_00001
checking case case_00002
checking case case_00003
checking case case_00004
Verifying label values
Expected label values are [0, 1]
Labels OK
Dataset OK


In [None]:
!nnUNet_plan_and_preprocess -t 508 --verify_dataset_integrity

Verifying training set
checking case amos_0001
checking case amos_0004
checking case amos_0005
checking case amos_0006
checking case amos_0007
Verifying label values
Expected label values are [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
Labels OK
Dataset OK
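
One part of this verification, checking that every label has a matching image, can be approximated with a short script. This is a rough sketch, assuming the file naming visible in the KiTS folder listing (label `case_00000.nii.gz`, image `case_00000_0000.nii.gz`). It is not the check that `nnUNet_plan_and_preprocess` actually runs, and `unmatched_cases` is a hypothetical helper.

```python
from pathlib import Path
from typing import List

SUFFIX = ".nii.gz"


def unmatched_cases(images_dir: str, labels_dir: str, modality: str = "0000") -> List[str]:
    """Return case IDs that have a label file but no image for the given modality."""
    image_names = {path.name for path in Path(images_dir).glob("*" + SUFFIX)}
    missing = []
    for label in sorted(Path(labels_dir).glob("*" + SUFFIX)):
        case_id = label.name[: -len(SUFFIX)]
        # The image carries an extra "_<modality>" part before the suffix.
        if f"{case_id}_{modality}{SUFFIX}" not in image_names:
            missing.append(case_id)
    return missing
```

An empty result means every label file found has a corresponding image.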


#### Data Target Spacing Sample & Normalization

- We need to apply the same target-spacing resampling and normalization to both domains and save the results into the "nnUNet_preprocessed" folder.

    - Voxel spacing: the distance between voxels. It influences the image size and can be understood as the resolution in natural images. Images have different voxel spacings even within a single dataset, which is not suitable for convolution operations according to the literature, so we usually resample to make the voxel spacing the same for every image in both datasets.

    - Intensity: the float value of every pixel in each slice of the grayscale CT images. The same organs usually have similar intensity distributions even if they are captured by different scanners. Currently we use the intensity mean and std from the foreground of the source domain dataset to normalize both datasets: the foreground of the source dataset covers more classes, and we need to segment the target dataset into these classes, so the target dataset uses the same normalization.

- So we first process the source domain to get the dataset characteristics, and then apply them to the target domain
- Some rule-based parameters are also extracted in this step, such as model architecture, learning rate, batch size...

In [None]:
%%bash
nnUNet_plan_and_preprocess -t 508 -pl2d None -pl3d ExperimentPlanner3D_v21_customTargetSpacing_kits19
nnUNet_plan_and_preprocess -t 507 -pl2d None -pl3d ExperimentPlanner3D_v21_customTargetSpacing_kits19 -no_pp
python dataset_conversion/kits_convert_label.py intensity
nnUNet_plan_and_preprocess -t 507 -pl2d None -pl3d ExperimentPlanner3D_v21_customTargetSpacing_kits19 -no_plan
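
The two operations described above can be sketched in a few lines. This is a minimal illustration with hypothetical helper names (`resampled_shape`, `normalize`) and made-up numbers. The real nnUNet preprocessing also interpolates the voxel data and clips intensities, which is left out here.

```python
from typing import List, Sequence, Tuple


def resampled_shape(
    shape: Sequence[int],
    spacing: Sequence[float],
    target_spacing: Sequence[float],
) -> Tuple[int, ...]:
    """Voxel grid size after resampling to the target spacing.

    The physical extent (voxels * spacing) stays the same along each axis,
    so the voxel count scales by spacing / target_spacing.
    """
    return tuple(
        max(1, round(n * s / t)) for n, s, t in zip(shape, spacing, target_spacing)
    )


def normalize(intensities: Sequence[float], source_mean: float, source_std: float) -> List[float]:
    """Z-score normalization using statistics taken from the source domain foreground."""
    return [(value - source_mean) / source_std for value in intensities]


# Hypothetical scan: 512 x 512 x 100 voxels at 0.8 x 0.8 x 5.0 mm spacing,
# resampled to a 1.6 x 1.6 x 2.5 mm target spacing.
new_shape = resampled_shape((512, 512, 100), (0.8, 0.8, 5.0), (1.6, 1.6, 2.5))
```

Applying the same `source_mean` and `source_std` to both datasets is what keeps the intensity scale consistent across the two domains.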

## 3. Model Prepare

### Demo Code Prepare

- First download the workflow preparation script
    ``` bash
    wget https://raw.githubusercontent.com/intel/e2eAIOK/main/demo/ma/domain_adapter/workflow_prepare_ma_da.sh
    ```
- Then run this script to prepare the workflow
    ```bash
    sh workflow_prepare_ma_da.sh
    ```

### Wrap model with Model Adapter

In the demo code, we make some changes to the user's original model by using the Model Adapter API.

The following example converts the user's `backbone` model. After calling the Model Adapter API `make_transferrable_with_domain_adaption`, we get a converted model `converted_model`, which then replaces the original `backbone` model in the user's training loop.

```python
from e2eAIOK.ModelAdapter.backbone.unet.generic_UNet_DA import Generic_UNet_DA
from e2eAIOK.ModelAdapter.engine_core.adapter.adversarial.DA_Loss import CACDomainAdversarialLoss
from e2eAIOK.ModelAdapter.engine_core.transferrable_model import make_transferrable_with_domain_adaption

backbone = Generic_UNet_DA(
    self.threeD, self.num_input_channels, 
    self.base_num_features, self.num_classes,         
    self.conv_per_stage, self.net_num_pool_op_kernel_sizes, 
    self.net_conv_kernel_sizes
)

adv_kwargs = {
    'input_channels': backbone.encoder_channels,
    'threeD': self.threeD,
    'pool_op_kernel_sizes': self.net_num_pool_op_kernel_sizes,
    'loss_weight': self.loss_weights[2:]
}
cac_domain_adv = CACDomainAdversarialLoss(**adv_kwargs)

converted_model = make_transferrable_with_domain_adaption(
    backbone, None, cac_domain_adv, 
    False, self.source_loss_weight, 1.0)
```

## 4. Train

### Pre-train on Source Domain

- We first pre-train the model on the AMOS dataset, and later use this pre-trained model for parameter initialization in domain adaptation
- We use [3D-UNet](https://arxiv.org/abs/1606.06650) to train the model
- *For demonstration, we only train for 1 epoch:*

In [None]:
%%bash
nnUNet_train 3d_fullres nnUNetTrainerV2 508 1 --epochs 1 -p nnUNetPlansv2.1_trgSp_kits19 --disable_postprocessing_on_folds

### Domain Adaptation from AMOS to KiTS

- We use a DANN-like model architecture. The DANN algorithm is illustrated as follows:

<p align="center">
  <img src='../imgs/dann.png' width='80%' height='80%' title='DANN Architecture'>
</p>

- Now we use the Model Adapter API to transfer knowledge from the AMOS dataset to the KiTS dataset

- After calling `make_transferrable_with_domain_adaption`, we get an adapted model and use it for further training. The following command starts training. We omit the full training process since it takes a very long time (hundreds of hours)

In [None]:
%%bash
nnUNet_train_da 3d_fullres nnUNetTrainer_DA_V2 508 507 1 \
    -p nnUNetPlansv2.1_trgSp_kits19 \
    -sp nnUNetPlansv2.1_trgSp_kits19 \
    --epochs 1 --loss_weights 1 0 1 0 0 \
    -pretrained_weights /home/vmagent/app/dataset/nnUNet_trained_models/nnUNet/3d_fullres/Task508_AMOS_kidney/nnUNetTrainerV2__nnUNetPlansv2.1_trgSp_kits19/fold_1/model_final_checkpoint.model 


- Notice: 
    - We do not use **any label** from the target domain KiTS; we only use labels from the source domain AMOS for training
    - *For demonstration, we only train for 1 epoch*

## 5. Inference

### Inference on KiTS Dataset with Adapted Model

- Now we use the adapted model trained in the last section to perform inference on the KiTS dataset

- We use the following command to perform inference. You can find your predictions in `${nnUNet_raw_data_base}/nnUNet_raw_data/Task507_KiTS_kidney/predict/`

In [None]:
!time nnUNet_predict \
    -i ${nnUNet_raw_data_base}/nnUNet_raw_data/Task507_KiTS_kidney/testTr/ \
    -o ${nnUNet_raw_data_base}/nnUNet_raw_data/Task507_KiTS_kidney/predict/ \
    -f 1 \
    -t 507 -m 3d_fullres -p nnUNetPlansv2.1_trgSp_kits19 \
    --disable_tta \
    -tr nnUNetTrainer_DA_V2 \
    --overwrite_existing \
    --disable_mixed_precision 

### Evaluate the Prediction on KiTS Using the Given Label

- Note while evaluating: 
    - The labels are not used in training; they are only used in this evaluation step
    - In practice, if you do not have any labels, you can skip this step

In [1]:
%%bash
nnUNet_evaluate_folder \
    -ref ${nnUNet_raw_data_base}/nnUNet_raw_data/Task507_KiTS_kidney/labelsTr \
    -pred ${nnUNet_raw_data_base}/nnUNet_raw_data/Task507_KiTS_kidney/predict \
    -l 1 \
    --common

The final dice score is 0.89
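
For reference, the Dice score for one label is twice the overlap between reference and prediction, divided by the total number of voxels carrying that label in both. This is a minimal sketch over flattened label lists; `dice_score` is a hypothetical helper, not the implementation behind `nnUNet_evaluate_folder`, and returning NaN when the label is absent from both inputs is a choice made for this sketch.

```python
import math
from typing import Sequence


def dice_score(reference: Sequence[int], prediction: Sequence[int], label: int = 1) -> float:
    """Dice = 2 * |ref AND pred| / (|ref| + |pred|) for a single label."""
    if len(reference) != len(prediction):
        raise ValueError("reference and prediction must have the same length")
    ref_hits = sum(1 for r in reference if r == label)
    pred_hits = sum(1 for p in prediction if p == label)
    overlap = sum(1 for r, p in zip(reference, prediction) if r == label and p == label)
    if ref_hits + pred_hits == 0:
        # The label is absent from both inputs, so the score is undefined.
        return math.nan
    return 2.0 * overlap / (ref_hits + pred_hits)
```

A score of 1.0 means the predicted kidney voxels coincide exactly with the reference, and 0.0 means there is no overlap.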


### Visualization of Data and Segmentations

- Download files from the server:
   - Images from: ```${nnUNet_raw_data_base}/nnUNet_raw_data/Task507_KiTS_kidney/imagesTr/```
   - Segmentations from: ```${nnUNet_raw_data_base}/nnUNet_raw_data/Task507_KiTS_kidney/labelsTr/```
   - Predictions from: ```${nnUNet_raw_data_base}/nnUNet_raw_data/Task507_KiTS_kidney/predict/```
- After downloading these files, you can visualize them with any volumetric visualization program. We recommend [MITK](https://www.mitk.org/wiki/The_Medical_Imaging_Interaction_Toolkit_(MITK)), which already has some great [tutorials](https://www.mitk.org/wiki/Tutorials). 
    - If you have not already downloaded it, here is the [MITK Download Link](https://www.mitk.org/wiki/Downloads) 
- Here is a demonstration of a visualization result from MITK on the KiTS dataset

<p align="center">
  <img src='../imgs/KiTS_visualization.png' width='80%' height='80%' title='KiTS_visualization'>
</p>