# Note: How to reproduce nnUNet

### Build the file structure

1. Create a new directory to hold both the network code and the data.
2. Clone the repository: `git clone https://github.com/MIC-DKFZ/nnUNet.git`

In [17]:
import os
import shutil
root_dir = os.getcwd()

# * default paths: the nnUNet code, the input images and the ground-truth segmentations
my_nnunet_dir = os.path.join(root_dir,'my_nnunet')
input_dir = os.path.join(root_dir, 'input')
ground_truth_dir = os.path.join(root_dir, 'ground_truth')

In [2]:
def make_if_dont_exist(folder_path, overwrite=False):
    """
    Create a folder if it does not exist.
    input:
    folder_path : path of the folder to create (relative to the current working directory, or absolute)
    overwrite   : (default: False) if True, delete and recreate an existing folder
    """
    if os.path.exists(folder_path):
        if not overwrite:
            print(f"{folder_path} exists.")
        else:
            shutil.rmtree(folder_path)
            os.makedirs(folder_path)
            print(f"{folder_path} overwritten")
    else:
        os.makedirs(folder_path)
        print(f"{folder_path} created!")

In [3]:
# * go to the root dir and create the folders (existing folders are kept)
os.chdir(root_dir)
make_if_dont_exist('my_nnunet', overwrite=False)
make_if_dont_exist('input', overwrite=False)
make_if_dont_exist('ground_truth', overwrite=False)

os.chdir('my_nnunet')
print(f"Current working directory: {os.getcwd()}")

my_nnunet created!
input created!
ground_truth created!
Current working directory: f:\dataset\mindBoggle\nnUNet\my_nnunet


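3. Follow the README of the repository: run `pip install -e .` inside the cloned folder to install nnUNet and all its dependencies. With this editable install, changes to the cloned source files take effect directly. If nnUNet was installed without `-e`, the imported files are the copies in site-packages (here `/opt/conda/lib/python3.10/site-packages/nnunetv2/`), so those are the files to edit.
4. (Optional) install hiddenlayer, which lets nnUNet plot the network topologies it generates: `pip install --upgrade git+https://github.com/nanohanno/hiddenlayer.git@bugfix/get_trace_graph#egg=hiddenlayer` or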
```
git clone --branch bugfix/get_trace_graph https://github.com/nanohanno/hiddenlayer.git
cd hiddenlayer
pip install --upgrade .
```
5. Set up the environment variables nnUNet needs (see the environment variables section below).

### Prepare a valid dataset for nnUNet

***Requirements***

* Training cases:
  * Each training case has a unique identifier shared by its images and its segmentation. Within a case:
    * images and segmentation have the same shape and spacing, and the channels match
    * input channels are consistent across cases: same order, and every channel is present
  * images
    * naming format: `{CASE_IDENTIFIER}_{XXXX}.{FILE_ENDING}`
    * any number of input channels; each input channel is stored in a *separate image*
    * different input channels must have the same shape and spacing and must be co-registered (if applicable)
    * channels / modalities are identified by the four-digit integer `XXXX` at the end of the file name
    * `dataset.json` maps these channel identifiers to channel names under the `channel_names` key
    * exception: the three channels of an RGB image can be stored in one file
  * segmentations
    * naming format: `{CASE_IDENTIFIER}.{FILE_ENDING}`
    * integer maps in which each value represents a semantic class
    * same geometry (shape and spacing) as their corresponding images
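The naming rules above can be checked mechanically. The sketch below is a hypothetical helper (not part of nnUNet) that splits an image name into its case identifier and channel index and derives the matching segmentation name; it assumes the `.nii.gz` file ending used in this notebook.

```python
import re

# Hypothetical helper, not an nnUNet API. Assumes the file ending is .nii.gz.
# Image names follow {CASE_IDENTIFIER}_{XXXX}.nii.gz, XXXX = four-digit channel id.
IMAGE_PATTERN = re.compile(r"^(?P<case>.+)_(?P<channel>\d{4})\.nii\.gz$")


def parse_image_name(file_name):
    """Return (case_identifier, channel) or None if the name breaks the convention."""
    match = IMAGE_PATTERN.match(file_name)
    if match is None:
        return None
    return match.group("case"), int(match.group("channel"))


def label_name_for(image_name):
    """Segmentation file that belongs to an image: {CASE_IDENTIFIER}.nii.gz."""
    parsed = parse_image_name(image_name)
    if parsed is None:
        raise ValueError(f"{image_name} does not end in _XXXX.nii.gz")
    return parsed[0] + ".nii.gz"


print(parse_image_name("la_003_0000.nii.gz"))  # ('la_003', 0)
print(label_name_for("la_003_0001.nii.gz"))     # la_003.nii.gz
```

The greedy `.+` lets case identifiers contain underscores themselves (`la_003`); only the last `_XXXX` is read as the channel.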

* Data format
  * The same file format must be used for all images and segmentations, in training and test
  * No need to convert everything to .nii.gz (as nnUNet v1 required):
    * the input and output of images + segmentations is abstracted through `BaseReaderWriter`
    * [own data format support](../nnunetv2/imageio/readme.md)
  * 2D input images are supported; store them with lossless compression (e.g. .png)

#### Dataset folder structure

The dataset directory structure looks like this:

    nnUNet_raw/
    ├── Dataset001_BrainTumour        # naming: Dataset<XXX>_<dataset_name>
    │   ├── dataset.json
    │   ├── imagesTr                  # training case images
    │   │   ├── la_003_0000.nii.gz
    │   │   ├── la_004_0000.nii.gz
    │   │   ├── ...
    │   ├── imagesTs                  # test case images (optional)
    │   └── labelsTr                  # ground-truth segmentations of the training cases
    │       ├── la_003.nii.gz
    │       ├── la_004.nii.gz
    │       ├── ...
    ├── Dataset002_Heart
    ├── Dataset003_Liver
    ├── Dataset004_Hippocampus
    ├── Dataset005_Prostate
    ├── ...

Each dataset can define its own image shapes and its own number of input channels.
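Before running nnUNet's own `--verify_dataset_integrity`, a quick pairing check between `imagesTr` and `labelsTr` can catch missing files early. This is a hypothetical sketch (the function name is made up) that assumes `.nii.gz` files and the naming convention above.

```python
import os
import re

# Hypothetical check, not part of nnUNet. Assumes .nii.gz files named as described above.
IMAGE_SUFFIX = re.compile(r"_\d{4}\.nii\.gz$")


def find_unpaired_cases(dataset_path):
    """Return (images_without_label, labels_without_image) as sorted case identifiers."""
    images_dir = os.path.join(dataset_path, "imagesTr")
    labels_dir = os.path.join(dataset_path, "labelsTr")
    # strip the _XXXX.nii.gz suffix to get the case identifier of each image
    image_cases = {IMAGE_SUFFIX.sub("", f) for f in os.listdir(images_dir)
                   if IMAGE_SUFFIX.search(f)}
    # labels are named {CASE_IDENTIFIER}.nii.gz
    label_cases = {f[:-len(".nii.gz")] for f in os.listdir(labels_dir)
                   if f.endswith(".nii.gz")}
    return sorted(image_cases - label_cases), sorted(label_cases - image_cases)
```

For the notebook's dataset, `find_unpaired_cases(dataset_path)` should return two empty lists once the copy step has run.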


#### dataset.json

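* `channel_names`: maps each channel id to its name
* `labels`: maps each label name to its integer id (or, for region-based training, to a tuple of ids)
* other essential parameters, such as `numTraining` and `file_ending` (the file type)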

```
channel_names:
{
    0: 'T1',
    1: 'CT'
}
labels:
(regular)
{
    'background': 0,
    'left atrium': 1,
    'some other label': 2
}
(region based)
{
    'background': 0,
    'whole tumor': (1, 2, 3),
    'tumor core': (2, 3),
    'enhancing tumor': 3
}
```
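To make the region-based entry concrete: a region is the union of the integer labels listed for it, so regions can overlap (a voxel labelled 3 belongs to all three tumor regions). The sketch below uses plain Python lists as a stand-in for a label image; `region_mask` is a hypothetical helper, not nnUNet code. As I understand the nnUNet docs, this overlap is why regions are trained with sigmoid outputs instead of softmax.

```python
# Toy illustration of region-based labels; plain lists stand in for label images.
labels = {
    "background": 0,
    "whole tumor": (1, 2, 3),
    "tumor core": (2, 3),
    "enhancing tumor": 3,
}


def region_mask(segmentation, region):
    """Return a 0/1 mask marking voxels whose label belongs to the region."""
    members = set(region) if isinstance(region, (tuple, list)) else {region}
    return [1 if value in members else 0 for value in segmentation]


segmentation = [0, 1, 2, 3, 3, 0]  # flattened toy label map
for name, region in labels.items():
    if name != "background":
        print(name, region_mask(segmentation, region))
# whole tumor [0, 1, 1, 1, 1, 0]
# tumor core [0, 0, 1, 1, 1, 0]
# enhancing tumor [0, 0, 0, 1, 1, 0]
```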

Use [generate_dataset_json.py](../nnunetv2/dataset_conversion/generate_dataset_json.py) to generate dataset.json automatically.

In [30]:
# an example function for generating dataset.json
# (based on nnunetv2/dataset_conversion/generate_dataset_json.py, plus an `overwrite` flag)
import os
from typing import Tuple
from batchgenerators.utilities.file_and_folder_operations import save_json, join

def generate_dataset_json(overwrite: bool,
                          output_folder: str,
                          channel_names: dict,
                          labels: dict,
                          num_training_cases: int,
                          file_ending: str,
                          regions_class_order: Tuple[int, ...] = None,
                          dataset_name: str = None, reference: str = None, release: str = None, license: str = None,
                          description: str = None,
                          overwrite_image_reader_writer: str = None, **kwargs):
    json_path = join(output_folder, "dataset.json")
    if os.path.exists(json_path):
        print("dataset.json already exists!")
        if not overwrite:
            return  # keep the existing file

    has_regions: bool = any([isinstance(i, (tuple, list)) and len(i) > 1 for i in labels.values()])
    if has_regions:
        assert regions_class_order is not None, "You have defined regions but regions_class_order is not set. " \
                                                "You need that."

    # * channel names need strings as keys: reform the dictionary to "id as string": name
    # * it is the value of "channel_names" in the json
    channel_names = {str(k): v for k, v in channel_names.items()}

    # * reform label values as ints (tuples of ints for regions)
    labels = {name: tuple(int(i) for i in value) if isinstance(value, (tuple, list)) else int(value)
              for name, value in labels.items()}

    dataset_json = {
        'channel_names': channel_names,  # previously called 'modality'
        'labels': labels,
        'numTraining': num_training_cases,
        'file_ending': file_ending,
    }

    # optional fields are only written when given
    if dataset_name is not None:
        dataset_json['name'] = dataset_name
    if reference is not None:
        dataset_json['reference'] = reference
    if release is not None:
        dataset_json['release'] = release
    if license is not None:
        dataset_json['licence'] = license
    if description is not None:
        dataset_json['description'] = description
    if overwrite_image_reader_writer is not None:
        dataset_json['overwrite_image_reader_writer'] = overwrite_image_reader_writer
    if regions_class_order is not None:
        dataset_json['regions_class_order'] = regions_class_order

    dataset_json.update(kwargs)

    save_json(dataset_json, json_path, sort_keys=False)



In [15]:
# copy a file from the original dataset, giving it a new name
def copy_rename(old_dir, old_name, new_dir, new_name, delete_ori=False):
    src = os.path.join(old_dir, old_name)
    dst = os.path.join(new_dir, new_name)
    # copy straight to the target name (copy-then-os.rename fails on Windows if the target exists)
    shutil.copy(src, dst)
    if delete_ori:
        os.remove(src)

In [52]:
# Check whether the image names follow the nnUNet requirement
import re

# nnUNet image names must end in _XXXX.nii.gz, where XXXX is a four-digit channel id
CHANNEL_SUFFIX = re.compile(r"_\d{4}\.nii\.gz$")

def check_format(image_name):
    """Return True if the image name already ends with the _XXXX channel suffix."""
    return CHANNEL_SUFFIX.search(image_name) is not None

def rename_for_channel(dir):
    """Append _0000 (channel 0) to every .nii.gz image that has no channel suffix yet."""
    for file_name in os.listdir(dir):
        if not file_name.endswith(".nii.gz") or check_format(file_name):
            continue
        new_file_name = file_name[:-len(".nii.gz")] + "_0000.nii.gz"
        if os.path.exists(os.path.join(dir, new_file_name)):
            # a renamed copy already exists: remove the duplicate
            os.remove(os.path.join(dir, file_name))
        else:
            os.rename(os.path.join(dir, file_name), os.path.join(dir, new_file_name))


In [60]:

# * create the nnUNet_raw dataset folders
nnunum_raw_data = os.path.join(root_dir, "nnUNet_raw")
dataset_identifer = "Dataset001"
dataset_path = os.path.join(nnunum_raw_data, dataset_identifer)
train_case_path = os.path.join(dataset_path, "imagesTr")
label_case_path = os.path.join(dataset_path, "labelsTr")
test_case_path = os.path.join(dataset_path, "imagesTs")
make_if_dont_exist(dataset_path)
make_if_dont_exist(train_case_path)
make_if_dont_exist(test_case_path)
make_if_dont_exist(label_case_path)

f:\dataset\mindBoggle\nnUNet\nnUNet_raw\Dataset001 exists.
f:\dataset\mindBoggle\nnUNet\nnUNet_raw\Dataset001\imagesTr exists.
f:\dataset\mindBoggle\nnUNet\nnUNet_raw\Dataset001\imagesTs exists.
f:\dataset\mindBoggle\nnUNet\nnUNet_raw\Dataset001\labelsTr created!


In [61]:

# * a case is valid only if it exists in both the input and the ground-truth folder:
# * copy its image to imagesTr and its segmentation to labelsTr
ground_truth_files = os.listdir(ground_truth_dir)
input_files = os.listdir(input_dir)
ground_truth_files = [file_name for file_name in ground_truth_files if file_name.endswith(".nii.gz")]
input_files = [file_name for file_name in input_files if file_name.endswith(".nii.gz")]

for file_name in input_files:
    if file_name in ground_truth_files:
        copy_rename(input_dir, file_name, train_case_path, file_name)
        copy_rename(ground_truth_dir, file_name, label_case_path, file_name)
    else:
        print(f"no ground truth for {file_name}, skipped")

In [66]:
# add the _0000 channel suffix to every training image (labels keep the plain case name)
rename_for_channel(train_case_path)

generate_dataset_json(
    overwrite=True,
    output_folder=dataset_path,
    channel_names={0: "MRI"},
    labels={"background": 0,
            "Cortical gray matter": 1,
            "Cortical White matter": 2,
            "Cerebellum gray": 3,
            "Cerebellum white": 4},
    # count the cases actually copied to labelsTr, not every ground-truth file
    num_training_cases=len([f for f in os.listdir(label_case_path) if f.endswith(".nii.gz")]),
    file_ending=".nii.gz",
    dataset_name=dataset_identifer,
    overwrite_image_reader_writer="NibabelIOWithReorient",
)

dataset.json already exists!


In [18]:

# * folders for the preprocessed data and the training results
os.chdir(my_nnunet_dir)
preprocess_path = os.path.join(my_nnunet_dir, "nnUNet_preprocessed")
result_path = os.path.join(my_nnunet_dir, "nnUNet_results")
make_if_dont_exist(preprocess_path)
make_if_dont_exist(result_path)

f:\dataset\mindBoggle\nnUNet\my_nnunet\nnUNet_preprocessed exists.
f:\dataset\mindBoggle\nnUNet\my_nnunet\nnUNet_results exists.


### Set environment variables

See [here](documentation/set_environment_variables.md).

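The environment variables tell nnUNet where the raw data, the preprocessed data and the trained models are stored.

On Linux and macOS, add these lines to `~/.bashrc` (or your shell's startup file) to set them permanently; typing them into a terminal sets them only for that session: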
```bash
export nnUNet_raw="/media/fabian/nnUNet_raw"
export nnUNet_preprocessed="/media/fabian/nnUNet_preprocessed"
export nnUNet_results="/media/fabian/nnUNet_results"
```
Alternatively, set them from Python for the current notebook session only (shell commands started with `!` inherit them):


In [12]:
os.environ['nnUNet_raw'] = nnunum_raw_data
os.environ['nnUNet_preprocessed'] = preprocess_path
os.environ['nnUNet_results'] = result_path
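A quick sanity check before calling any `nnUNetv2_*` command can save a confusing error later. The helper below is hypothetical (not part of nnUNet); it only verifies that the three variables are set and point to existing directories.

```python
import os

# Hypothetical helper, not an nnUNet API.
REQUIRED_VARS = ("nnUNet_raw", "nnUNet_preprocessed", "nnUNet_results")


def missing_nnunet_paths(environ=None):
    """Return the variables that are unset or point to a non-existent directory."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS
            if not environ.get(name) or not os.path.isdir(environ[name])]


problems = missing_nnunet_paths()
print("all set" if not problems else f"fix these first: {problems}")
```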

In [38]:
# run the CLI from the notebook: the leading "!" passes the line to the shell
!nnUNetv2_plan_and_preprocess -d 1 --verify_dataset_integrity
# entry point:
# * nnUNetv2_plan_and_preprocess = "nnunetv2.experiment_planning.plan_and_preprocess_entrypoints:plan_and_preprocess_entry"
# outputs, written to nnUNet_preprocessed/<dataset folder>:
# * dataset_fingerprint.json: shapes and spacings of the images (among other statistics)
# * nnUNetPlans.json: network topology and preprocessing configurations
# * the preprocessed data

In [None]:
# train: nnUNetv2_train DATASET_NAME_OR_ID UNET_CONFIGURATION FOLD
# basic example: dataset 1, 3d_fullres configuration, fold 0
!nnUNetv2_train 1 3d_fullres 0

#### Training parameters

Entry point: `nnUNetv2_train = "nnunetv2.run.run_training:run_training_entry"`. The trainer itself is defined in `nnunetv2.training.nnUNetTrainer`. Main parameters:
* `DATASET_NAME_OR_ID`: dataset name or numeric id
* `UNET_CONFIGURATION`: requested U-Net configuration (defaults: 2d, 3d_fullres, 3d_lowres, 3d_cascade_fullres)
* `FOLD`: fold of the 5-fold cross-validation (0-4, or `all` to train on all cases)
* `--npz`: save the softmax outputs during the final validation
* `--c`: continue a previous training
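Training the full 5-fold cross-validation means one `nnUNetv2_train` call per fold. The hypothetical helpers below (not part of nnUNet) build those calls from the positional arguments and the `--npz` flag listed above; `--npz` is kept on by default because the saved softmax outputs are what `nnUNetv2_find_best_configuration` uses for ensembling.

```python
import subprocess


# Hypothetical helpers, not nnUNet APIs. Folds 0-4 assume the default 5-fold split.
def train_commands(dataset_id, configuration, folds=range(5), save_npz=True):
    """Return one nnUNetv2_train argument list per fold."""
    commands = []
    for fold in folds:
        cmd = ["nnUNetv2_train", str(dataset_id), configuration, str(fold)]
        if save_npz:
            cmd.append("--npz")  # softmax outputs, used later for ensembling
        commands.append(cmd)
    return commands


def run_commands(commands):
    """Run the commands one after another; stop at the first failing fold."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


for cmd in train_commands(1, "3d_fullres"):
    print(" ".join(cmd))
```

Calling `run_commands(train_commands(1, "3d_fullres"))` would then train the folds sequentially on one GPU.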

#### Process of training

Entry: [run_training.py](nnunetv2/run/run_training.py), where `run_training_entry()` parses the command-line arguments and passes them to `run_training()`.

`run_training()` builds the trainer from these arguments with `get_trainer_from_args` (and sets up DDP when several GPUs are used):
* class `nnunetv2.training.nnUNetTrainer.nnUNetTrainer`: stores the configuration, parameters and hyperparameters, and the plans, which later determine the network topology

*If training is enabled*, `nnUNetTrainer.run_training()` starts with `on_train_start()` (preparation for training):
1. initialize
   * network: `get_network_from_plans` turns the dimension, the plans and the network type into an explicit network (the architectures live in `dynamic_network_architectures`; how to change the network is described in [extending_nnunet.md](documentation/extending_nnunet.md))
   * `nnUNetTrainer.configure_optimizers`: optimizer and learning-rate scheduler (`PolyLRScheduler`, in `training.lr_scheduler`)
   * `_build_loss`: the predefined `DC_and_CE_loss`, plus deep supervision over several output scales (derived from the pooling steps in the plans)
2. unpack the dataset
3. get the dataloaders:
   * data augmentation (rotation and the other transforms) is assembled in `get_training_transforms`, from nnUNet's own transforms or transforms from the batchgenerators library
   * `nnUNetDataset` loads the cases of the train/validation split (the split itself comes from `do_split`)
   * the data loader in `base_data_loader` picks the bounding box of each patch, oversampling patches that contain foreground classes

Each epoch then runs: `on_epoch_start` (starts logging the epoch) => `on_train_epoch_start` => `train_step` for every iteration => `on_train_epoch_end` (collects the step outputs in a list) => `on_validation_epoch_start` => `validation_step` => `on_validation_epoch_end` => `on_epoch_end` (writes the epoch's info to the log).
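The hook order above can be summarized as a framework-free skeleton. `SketchTrainer` is a toy stand-in, not nnUNet code: the hook names follow nnUNet v2, but every body here only records that it ran, whereas the real hooks build the network, step the optimizer, compute metrics and save checkpoints.

```python
class SketchTrainer:
    """Toy stand-in for nnUNetTrainer: every hook only records that it ran."""

    def __init__(self, num_epochs=2, iters_per_epoch=3, val_iters_per_epoch=1):
        self.current_epoch = 0  # would be restored from a checkpoint when resuming (--c)
        self.num_epochs = num_epochs
        self.num_iterations_per_epoch = iters_per_epoch
        self.num_val_iterations_per_epoch = val_iters_per_epoch
        self.log = []

    def on_train_start(self):
        # real trainer: initialize network/optimizer/loss, unpack dataset, build dataloaders
        self.log.append("on_train_start")

    def on_epoch_start(self):
        self.log.append("on_epoch_start")

    def on_train_epoch_start(self):
        self.log.append("on_train_epoch_start")

    def train_step(self, batch):
        # real trainer: forward pass, loss, backward pass, optimizer step
        return {"loss": 0.0}

    def on_train_epoch_end(self, outputs):
        self.log.append(f"on_train_epoch_end({len(outputs)} steps)")

    def on_validation_epoch_start(self):
        self.log.append("on_validation_epoch_start")

    def validation_step(self, batch):
        return {"loss": 0.0}

    def on_validation_epoch_end(self, outputs):
        self.log.append(f"on_validation_epoch_end({len(outputs)} steps)")

    def on_epoch_end(self):
        self.log.append("on_epoch_end")
        self.current_epoch += 1  # lets a resumed run continue from here

    def on_train_end(self):
        self.log.append("on_train_end")

    def run_training(self):
        self.on_train_start()
        for epoch in range(self.current_epoch, self.num_epochs):
            self.on_epoch_start()
            self.on_train_epoch_start()
            train_outputs = [self.train_step(batch=None)
                             for _ in range(self.num_iterations_per_epoch)]
            self.on_train_epoch_end(train_outputs)
            # the real trainer runs validation under torch.no_grad()
            self.on_validation_epoch_start()
            val_outputs = [self.validation_step(batch=None)
                           for _ in range(self.num_val_iterations_per_epoch)]
            self.on_validation_epoch_end(val_outputs)
            self.on_epoch_end()
        self.on_train_end()


trainer = SketchTrainer(num_epochs=1)
trainer.run_training()
print(*trainer.log, sep="\n")
```

Overriding one of these hooks in a subclass is the usual way to change a single step without touching the rest of the loop.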


#### postprocessing and inference

Find the best configuration: [find_best_configuration.py](nnunetv2/evaluation/find_best_configuration.py)

`nnUNetv2_find_best_configuration = "nnunetv2.evaluation.find_best_configuration:find_best_configuration_entry_point"`

It compares all trained models (single configurations and their ensembles, using the cross-validation results) and reports the best one. Parameters:
* required: `dataset_name_or_id`
* optional: `-p`: plans; `-c`: list of configurations (default: all of the configurations listed above, so restrict it to the ones you actually trained); `-f`: folds to use (default: the 5 folds); `-tr`: list of trainers
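Because the default `-c` list assumes every configuration was trained, it usually has to be narrowed. The hypothetical helper below (not part of nnUNet) assembles the call from the flags listed above; the defaults `nnUNetTrainer` and `nnUNetPlans` are the standard trainer and plans names.

```python
# Hypothetical helper, not an nnUNet API: list only the configurations you trained.
def find_best_config_command(dataset_id, configurations,
                             folds=(0, 1, 2, 3, 4),
                             trainers=("nnUNetTrainer",),
                             plans=("nnUNetPlans",)):
    """Return the argument list; pass it to subprocess.run(cmd, check=True) to execute."""
    cmd = ["nnUNetv2_find_best_configuration", str(dataset_id)]
    cmd += ["-c", *configurations]
    cmd += ["-f", *(str(f) for f in folds)]
    cmd += ["-tr", *trainers]
    cmd += ["-p", *plans]
    return cmd


print(" ".join(find_best_config_command(1, ["3d_fullres"])))
```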


#### Extend network

* Quick and dirty: implement a new nnUNetTrainer class and overwrite its `build_network_architecture` function. 
  Make sure your architecture is compatible with deep supervision (if not, use `nnUNetTrainerNoDeepSupervision`
  as basis!) and that it can handle the patch sizes that are thrown at it! Your architecture should NOT apply any 
  nonlinearities at the end (softmax, sigmoid etc). nnU-Net does that!   
* The 'proper' (but difficult, maybe too complex to implement here) way: Build a dynamically configurable architecture such as the `PlainConvUNet` class
  used by default. It needs to have some sort of GPU memory estimation method that can be used to evaluate whether 
  certain patch sizes and 
  topologies fit into a specified GPU memory target. Build a new `ExperimentPlanner` that can configure your new 
  class and communicate with its memory budget estimation. Run `nnUNetv2_plan_and_preprocess` while specifying your 
  custom `ExperimentPlanner` and a custom `plans_name`. Implement a nnUNetTrainer that can use the plans generated by 
  your `ExperimentPlanner` to instantiate the network architecture. Specify your plans and trainer when running `nnUNetv2_train`. 

#### Analyse how nnFormer changes nnUNet

nnFormer is built on nnUNet v1, so reproducing it requires some changes.

Where the network is imported: [here](F:\dataset\mindBoggle\nnFormer\nnformer\training\network_training\nnFormerTrainerV2_CascadeFullRes.py). Every new network structure gets its own trainer class, which inherits from `nnTrainer`. Each file in the `network_training` directory defines a trainer for a different network (or training strategy), including `nnUNetTrainerV2`; use them as examples of what needs to be rewritten.

The network class is imported from [class for network](F:\dataset\mindBoggle\nnFormer\nnformer\network_architecture\neural_network.py): `SegmentationNetwork`. nnFormer rewrites the main structure of the network and only uses some attributes of `SegmentationNetwork`.

To compare the trainers, start with the differences in [run](F:\dataset\mindBoggle\nnFormer\nnformer\run\run_training.py).

The arguments of `run_training` (the same as in the default nnUNet v1):

necessary
* network: i.e. the configuration
* network_trainer: the trainer class, which contains all functions used in training
* task: dataset_name_or_id
* fold

optional

After the variables are assigned:
* `get_default_configuration`: returns the names of the needed files and the name of the trainer class. The trainer is then built from these values together with parameters obtained from other functions, and used for training. The relevant names are `trainer_class`, `trainer.run_training()` and `load_checkpoint`.

This works as long as the trainer configuration is correct and the right packages are imported.

Execution then continues in `trainer.run_training`:
* `do_ds`: whether to use deep supervision
* `initialize_network`: the network is initialized and its weights are loaded. The network (`xxxnnFormer`) inherits from `SegmentationNetwork` in `neural_network`, but does not seem to rely on any important feature of it.

The `xxTrainer` classes inherit through several levels from the original network trainer. This part is completely different in nnUNet v2. In v1, since all these functions are already packaged in the parent classes, most of the training and validation code does not need to be rewritten.

### Your own network using the nnUNet framework

* nnTrainer
  * Usually little needs to change: inherit from the trainer class and override only the functions you want to rewrite. Some functions may be incompatible with the rest of the framework.
  * As long as you inherit from the original class, all functions defined in it can be used; any of them can also be overridden to replace it with your own version.
* Network
  * Some parameters for initializing the network are predefined, set directly from the data or by the user. nnFormer uses them, but they may not all be needed in the final network. As a first step, reproduce what the assigned parameters do (in `nnFormerTrainer`).