<a href="https://colab.research.google.com/github/ssktotoro/neuro/blob/colab_notebook/Neuro%20UNet%20Demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Neuro UNet/MeshNet Tutorial

Authors: Kevin Wang, Alex Fedorov, [Sergey Kolesnikov](https://github.com/Scitator)

[![Catalyst logo](https://raw.githubusercontent.com/catalyst-team/catalyst-pics/master/pics/catalyst_logo.png)](https://github.com/catalyst-team/catalyst)

### Colab setup

First, change the runtime type to GPU. <br/>
To do so, click `Runtime` -> `Change runtime type` -> select `Python 3` and `GPU` -> click `Save`. <br/>
After that, you can click `Runtime` -> `Run all` and watch the tutorial.

## Requirements

Clone the tutorial repository and install Catalyst and the other libraries required for this tutorial (the versions pinned in `requirements.txt`).

In [None]:
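%%bash
# Clone the tutorial repository, or update it if it is already present
if [ -d neuro ]; then git -C neuro pull; else git clone https://github.com/ssktotoro/neuro.git; fi
pip install -r neuro/requirements/requirements.txt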


Collecting alchemy==20.4
  Downloading https://files.pythonhosted.org/packages/e1/d0/29085429e2f6203ee206a4aa93cb20cdafbdc2aa649d7b20de24eeb7fb69/alchemy-20.4-py2.py3-none-any.whl
Collecting catalyst==20.10.1
  Downloading https://files.pythonhosted.org/packages/1c/1f/7c0591a256990e146b377c282f17e2cd2717b25ac7e489c97dc972ed7248/catalyst-20.10.1-py2.py3-none-any.whl (475kB)
Collecting reaction==20.2
  Downloading https://files.pythonhosted.org/packages/75/9b/c549eb02e2b5caf8e2dcfb6386fa82645ffaaf2e7fc3c6d682f0591d8187/reaction-20.2-py2.py3-none-any.whl
Collecting osfclient
  Downloading https://files.pythonhosted.org/packages/2d/2f/b24d24c6376f6087048e1aaf93b0a4a7a6a2f5709ef74b7a0bbe267f8d52/osfclient-0.0.4-py2.py3-none-any.whl
Collecting requests==2.22.0
  Downloading https://files.pythonhosted.org/packages/51/bd/23c926cd341ea6b7dd0b2a00aba99ae0f828be89d72b2190f27c11d4b7fb/requests-2.22.0-py2.py3-none-any.whl (57kB)
Collecting tensorboardX
  Downloading https://files.pythonhosted.org/p

Cloning into 'neuro'...
fatal: not a git repository (or any of the parent directories): .git
ERROR: google-colab 1.0.0 has requirement requests~=2.23.0, but you'll have requests 2.22.0 which is incompatible.
ERROR: datascience 0.10.6 has requirement folium==0.2.1, but you'll have folium 0.8.3 which is incompatible.


In [2]:
from typing import Callable, List, Tuple

import os
import torch
import catalyst
from catalyst import utils

print(f"torch: {torch.__version__}, catalyst: {catalyst.__version__}")

# os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # "" - CPU, "0" - 1 GPU, "0,1" - MultiGPU

SEED = 42
utils.set_global_seed(SEED)
utils.prepare_cudnn(deterministic=True)

torch: 1.7.0+cu101, catalyst: 20.10.1


# Dataset

We'll use the Mindboggle-101 dataset for a multiclass 3D segmentation task.
After registering with OSF, you can download the dataset with the following `osfclient` command:

`osf -p 9ahyp clone .`

Otherwise, you can download it with the Catalyst utility `download-gdrive`, which fetches a copy from the Catalyst Google Drive:

`usage: download-gdrive {FILE_ID} {FILENAME}`

In [4]:
cd neuro

/content/neuro


In [5]:
%%bash
mkdir -p Mindboggle_data data/Mindboggle_101/
# Clone the Mindboggle-101 OSF project (ID 9ahyp)
osf -p 9ahyp clone Mindboggle_data/
cp -r Mindboggle_data/osfstorage/Mindboggle101_volumes/ data/Mindboggle_101/
# Extract every archive into data/Mindboggle_101, then delete the archives
find data/Mindboggle_101 -name '*.tar.gz' | xargs -I {} tar zxvf {} -C data/Mindboggle_101
find data/Mindboggle_101 -name '*.tar.gz' | xargs -I {} rm {}

OASIS-TRT-20_volumes/
OASIS-TRT-20_volumes/OASIS-TRT-20-3/
OASIS-TRT-20_volumes/OASIS-TRT-20-4/
OASIS-TRT-20_volumes/OASIS-TRT-20-5/
OASIS-TRT-20_volumes/OASIS-TRT-20-2/
OASIS-TRT-20_volumes/OASIS-TRT-20-13/
OASIS-TRT-20_volumes/OASIS-TRT-20-14/
OASIS-TRT-20_volumes/OASIS-TRT-20-15/
OASIS-TRT-20_volumes/OASIS-TRT-20-12/
OASIS-TRT-20_volumes/OASIS-TRT-20-7/
OASIS-TRT-20_volumes/OASIS-TRT-20-9/
OASIS-TRT-20_volumes/OASIS-TRT-20-8/
OASIS-TRT-20_volumes/OASIS-TRT-20-1/
OASIS-TRT-20_volumes/OASIS-TRT-20-6/
OASIS-TRT-20_volumes/OASIS-TRT-20-19/
OASIS-TRT-20_volumes/OASIS-TRT-20-17/
OASIS-TRT-20_volumes/OASIS-TRT-20-10/
OASIS-TRT-20_volumes/OASIS-TRT-20-11/
OASIS-TRT-20_volumes/OASIS-TRT-20-16/
OASIS-TRT-20_volumes/OASIS-TRT-20-20/
OASIS-TRT-20_volumes/OASIS-TRT-20-18/
OASIS-TRT-20_volumes/OASIS-TRT-20-18/labels.DKT31.manual.nii.gz
OASIS-TRT-20_volumes/OASIS-TRT-20-18/t1weighted_brain.MNI152.nii.gz
OASIS-TRT-20_volumes/OASIS-TRT-20-18/t1weighted.MNI152.nii.gz
OASIS-TRT-20_volumes/OASIS-TRT-20


Run the data preparation script, which limits the labels to the DKT human labels (60 labels).

`usage: python neuro/scripts/prepare_data.py data/Mindboggle_101 {N_labels}`
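The exact mapping lives in `prepare_data.py`. As a rough illustration only: restricting a label volume to a chosen set of label IDs usually means remapping those IDs to contiguous class indices (and everything else to background 0), because `CrossEntropyLoss` expects targets in `0..n_classes-1`. The helper `remap_labels` and the example IDs below are hypothetical and are not taken from the script.

```python
import numpy as np


def remap_labels(label_volume: np.ndarray, kept_labels) -> np.ndarray:
    """Map each ID in `kept_labels` to 1..len(kept_labels); all other IDs become 0.

    Hypothetical helper for illustration; not the prepare_data.py implementation.
    """
    kept = np.asarray(kept_labels, dtype=np.int64)
    # Lookup table indexed by the original label ID (unlisted IDs stay 0)
    size = int(max(label_volume.max(), kept.max())) + 1
    lut = np.zeros(size, dtype=np.int64)
    lut[kept] = np.arange(1, len(kept) + 1)
    return lut[label_volume]


# Toy 1x2x3 "volume": two kept IDs (1002, 2035) and two discarded ones (0, 7)
toy = np.array([[[0, 1002, 2035], [7, 2035, 1002]]])
print(remap_labels(toy, [1002, 2035]))
```

A lookup table keeps the remapping a single vectorized indexing operation, which matters for 256x256x256 volumes.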

In [7]:
%%bash 

python neuro/scripts/prepare_data.py data/Mindboggle_101/ 60

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100


Import the Catalyst and PyTorch utilities used for training.

In [8]:
import torch
import collections

from catalyst.contrib.utils.pandas import read_csv_data
from torch.utils.data import RandomSampler
from torch.utils.data import DataLoader
from torchvision import transforms
from catalyst.data import Augmentor, ReaderCompose
from torch.nn import CrossEntropyLoss
from torch.optim.lr_scheduler import CosineAnnealingLR
from catalyst.dl import SupervisedRunner
from catalyst.callbacks.logging import TensorboardLogger
from catalyst.callbacks import SchedulerCallback, CheckpointCallback



Here we import `BrainDataset`, which reads the T1 scans and label volumes and samples from them either random 38x38x38 patches (for training) or non-overlapping 38x38x38 patches (for validation). More detail can be found in `brain_dataset.py` and `generator_coords.py`.
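The two sampling modes can be sketched with plain NumPy, assuming a patch is identified by its corner voxel. This is a simplified stand-in, not the code in `generator_coords.py`, which may treat the volume border differently; the function names are hypothetical. Note that a regular grid of 38-voxel patches fits 6 times along each 256-voxel axis, giving 6^3 = 216 patches per volume, the same 216 that appears in the validation sampler's `20*216` in `get_loaders`.

```python
import itertools
from typing import List, Sequence, Tuple

import numpy as np


def random_patch_corner(volume_shape: Sequence[int], patch_shape: Sequence[int],
                        rng: np.random.Generator) -> Tuple[int, ...]:
    # Pick a corner uniformly so that the whole patch fits inside the volume
    return tuple(int(rng.integers(0, v - p + 1)) for v, p in zip(volume_shape, patch_shape))


def grid_patch_corners(volume_shape: Sequence[int],
                       patch_shape: Sequence[int]) -> List[Tuple[int, ...]]:
    # Non-overlapping corners on a regular grid; voxels left over at the end
    # of an axis (256 - 6 * 38 = 28 here) are simply not covered
    axes = [range(0, v - p + 1, p) for v, p in zip(volume_shape, patch_shape)]
    return list(itertools.product(*axes))


def extract_patch(volume: np.ndarray, corner: Sequence[int],
                  patch_shape: Sequence[int]) -> np.ndarray:
    # Slice a patch_shape block starting at corner
    return volume[tuple(slice(c, c + p) for c, p in zip(corner, patch_shape))]


volume = np.zeros((256, 256, 256), dtype=np.float32)
rng = np.random.default_rng(42)
patch = extract_patch(volume, random_patch_corner(volume.shape, (38, 38, 38), rng), (38, 38, 38))
print(patch.shape)                                          # (38, 38, 38)
print(len(grid_patch_corners(volume.shape, (38, 38, 38))))  # 216
```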

In [9]:
cd training/

/content/neuro/training


In [11]:
from brain_dataset import BrainDataset
from reader import NiftiReader_Image, NiftiReader_Mask
from custom_metrics import CustomDiceCallback
from model import UNet

In [None]:
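# Compose two readers that load the NIfTI files referenced in each CSV row:
# the T1 volume goes to "images" and the label volume goes to "targets"
open_fn = ReaderCompose(
    readers=[
        NiftiReader_Image(input_key="images", output_key="images"),
        NiftiReader_Mask(input_key="nii_labels", output_key="targets"),
    ]
)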
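Conceptually, `ReaderCompose` calls every reader on the same CSV row and merges the dictionaries they return into one sample. The pure-Python sketch below mimics that pattern with a hypothetical `ToyReader` that only copies a column (a real NIfTI reader would load the volume from the path instead); it is an illustration, not Catalyst's implementation.

```python
from typing import Callable, Dict, List


class ToyReader:
    """Hypothetical reader: copies row[input_key] into output_key."""

    def __init__(self, input_key: str, output_key: str):
        self.input_key = input_key
        self.output_key = output_key

    def __call__(self, row: Dict) -> Dict:
        # A NIfTI reader would load the file at row[self.input_key] here
        return {self.output_key: row[self.input_key]}


def compose_readers(readers: List[Callable[[Dict], Dict]]) -> Callable[[Dict], Dict]:
    def composed(row: Dict) -> Dict:
        sample: Dict = {}
        for reader in readers:
            sample.update(reader(row))  # later readers may overwrite earlier keys
        return sample
    return composed


toy_open_fn = compose_readers([
    ToyReader(input_key="images", output_key="images"),
    ToyReader(input_key="nii_labels", output_key="targets"),
])
print(toy_open_fn({"images": "t1.nii.gz", "nii_labels": "labels.nii.gz"}))
```

The sketch uses a separate name (`toy_open_fn`) so it does not overwrite the notebook's `open_fn`.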

In [None]:

def get_loaders(
    random_state: int,
    volume_shape: List[int],
    subvolume_shape: List[int],
    in_csv_train: str = None,
    in_csv_valid: str = None,
    in_csv_infer: str = None,
    batch_size: int = 16,
    num_workers: int = 10,
) -> dict:
    """Build the train/valid DataLoaders.

    `random_state` is currently unused; seeding is done globally above.
    NOTE: `get_transforms` is not defined in this notebook and must be
    defined or imported before this function is called.
    """
    df, df_train, df_valid, df_infer = read_csv_data(
        in_csv_train=in_csv_train,
        in_csv_valid=in_csv_valid,
        in_csv_infer=in_csv_infer,
    )

    def make_dataset(list_data, mode: str) -> BrainDataset:
        # mode="train" samples random subvolumes; mode="valid" non-overlapping ones
        return BrainDataset(
            shared_dict={},
            list_data=list_data,
            list_shape=volume_shape,
            list_sub_shape=subvolume_shape,
            open_fn=open_fn,
            dict_transform=get_transforms(None, mode=mode),
            n_samples=100,
            mode=mode,
            input_key="images",
            output_key="targets",
        )

    train_dataset = make_dataset(df_train, mode="train")
    valid_dataset = make_dataset(df_valid, mode="valid")
    test_dataset = make_dataset(df_infer, mode="valid")  # not used by the loaders below

    # Sample patches with replacement: 80 * 128 per training epoch, 20 * 216 per validation epoch
    train_random_sampler = RandomSampler(
        data_source=train_dataset, replacement=True, num_samples=80 * 128
    )
    valid_random_sampler = RandomSampler(
        data_source=valid_dataset, replacement=True, num_samples=20 * 216
    )

    train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size,
                              sampler=train_random_sampler,
                              num_workers=num_workers, pin_memory=True)
    valid_loader = DataLoader(dataset=valid_dataset, batch_size=batch_size,
                              sampler=valid_random_sampler,
                              num_workers=num_workers, pin_memory=True, drop_last=True)

    loaders = collections.OrderedDict()
    loaders["train"] = train_loader
    loaders["valid"] = valid_loader
    return loaders
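The sampler sizes determine how many batches each epoch runs. A small arithmetic check, assuming the default `batch_size=16`: a `DataLoader` over a sampler that yields `num_samples` indices produces `ceil(num_samples / batch_size)` batches, or `floor(...)` with `drop_last=True`. This reproduces the 640 training and 270 validation iterations shown in the training log. The helper name is hypothetical.

```python
import math


def batches_per_epoch(num_samples: int, batch_size: int, drop_last: bool) -> int:
    # Number of batches a DataLoader yields for a sampler of num_samples indices
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


train_batches = batches_per_epoch(80 * 128, batch_size=16, drop_last=False)
valid_batches = batches_per_epoch(20 * 216, batch_size=16, drop_last=True)
print(train_batches, valid_batches)  # 640 270
```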

In [None]:
loaders = get_loaders(0, [256, 256, 256], [38, 38, 38],
                      in_csv_train="../data/dataset_train.csv",
                      in_csv_valid="../data/dataset_valid.csv",
                      in_csv_infer="../data/dataset_infer.csv")

# Model

We'll train the `UNet` defined in `model.py`.

In [None]:

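# n_channels=1: a single T1 input channel; n_classes should match the number of label classes in the targets
unet = UNet(n_channels=1, n_classes=30)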

# Model Training

We'll train the model for 30 epochs.

An Adam optimizer (weight decay 1e-4) with a cosine annealing schedule, starting at a learning rate of 0.01, is used for this experiment.
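PyTorch documents `CosineAnnealingLR` with the closed form η_t = η_min + (η_max − η_min)(1 + cos(π t / T_max)) / 2. The sketch below evaluates that formula with η_max = 0.01, T_max = 30 and PyTorch's default η_min = 0, assuming the scheduler is stepped once per epoch. The function is our own illustration, not part of PyTorch.

```python
import math


def cosine_annealing_lr(epoch: int, base_lr: float = 0.01, t_max: int = 30,
                        eta_min: float = 0.0) -> float:
    # Closed-form cosine annealing (no warm restarts): base_lr at epoch 0, eta_min at t_max
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max)) / 2


for epoch in (0, 1, 15, 29, 30):
    print(epoch, round(cosine_annealing_lr(epoch), 6))
```

The learning rate falls slowly at first, reaches half of its starting value at epoch 15, and approaches zero by epoch 30.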

`CrossEntropyLoss` is the criterion (loss function) to be minimized.
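For 3D segmentation, `CrossEntropyLoss` takes logits of shape `(N, C, D, H, W)` and integer targets of shape `(N, D, H, W)`, applies a log-softmax over the class dimension (`dim=1`) and averages the negative log-likelihood of the true class over all voxels. A NumPy sketch of that computation, assuming the default `reduction="mean"` and no class weights; the function name is hypothetical:

```python
import numpy as np


def voxelwise_cross_entropy(logits: np.ndarray, targets: np.ndarray) -> float:
    """Mean cross-entropy over all voxels.

    logits:  (N, C, D, H, W) raw scores
    targets: (N, D, H, W) integer class indices in [0, C)
    """
    # Numerically stable log-softmax over the class axis (dim=1)
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the true class at every voxel
    true_class_log_probs = np.take_along_axis(log_probs, targets[:, None], axis=1)
    return float(-true_class_log_probs.mean())


rng = np.random.default_rng(0)
logits = np.zeros((2, 30, 4, 4, 4))  # uniform scores over 30 classes
targets = rng.integers(0, 30, size=(2, 4, 4, 4))
print(voxelwise_cross_entropy(logits, targets))  # log(30) ≈ 3.4012
```

With uniform logits every class gets probability 1/30, so the loss equals log(30); confident, correct logits drive it toward 0.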

In [None]:

num_epochs = 30
logdir = "logs/unet"

optimizer = torch.optim.Adam(unet.parameters(), lr=0.01, weight_decay=0.0001)
scheduler = CosineAnnealingLR(optimizer, T_max=30)

runner = SupervisedRunner(input_key='images', input_target_key='labels', output_key='logits')

callbacks = [
    TensorboardLogger(),
    SchedulerCallback(reduced_metric='loss'),
    CustomDiceCallback(),
    CheckpointCallback(),
]

runner.train(model=unet, criterion=CrossEntropyLoss(), optimizer=optimizer, scheduler=scheduler, loaders=loaders,
            callbacks=callbacks, logdir=logdir, num_epochs=num_epochs, verbose=True)

1/30 * Epoch (train):   0% 0/640 [00:00<?, ?it/s]


Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.



1/30 * Epoch (train): 100% 640/640 [21:51<00:00,  2.05s/it, dice=0.952, loss=0.050]   
1/30 * Epoch (valid): 100% 270/270 [10:13<00:00,  2.27s/it, dice=0.970, loss=0.030]    
[2020-11-03 13:43:46,953] 
1/30 * Epoch 1 (_base): lr=0.0099 | momentum=0.9000
1/30 * Epoch 1 (train): dice=0.9164 | loss=0.3170
1/30 * Epoch 1 (valid): dice=0.9828 | loss=0.0175
2/30 * Epoch (train): 100% 640/640 [22:36<00:00,  2.12s/it, dice=0.968, loss=0.034]
2/30 * Epoch (valid): 100% 270/270 [10:22<00:00,  2.31s/it, dice=0.996, loss=0.004]
[2020-11-03 14:16:47,057] 
2/30 * Epoch 2 (_base): lr=0.0098 | momentum=0.9000
2/30 * Epoch 2 (train): dice=0.9314 | loss=0.2609
2/30 * Epoch 2 (valid): dice=0.9854 | loss=0.0161
3/30 * Epoch (train): 100% 640/640 [22:30<00:00,  2.11s/it, dice=0.984, loss=0.016]
3/30 * Epoch (valid): 100% 270/270 [10:12<00:00,  2.27s/it, dice=0.993, loss=0.007]
[2020-11-03 14:49:32,799] 
3/30 * Epoch 3 (_base): lr=0.0096 | momentum=0.9000
3/30 * Epoch 3 (train): dice=0.9407 | loss=0.2247
3/