# ! Remember to set the runtime type to GPU !

# NVIDIA DALI on Colab

We will use this notebook to execute shell commands.
The first step is to clone the repository.

In [None]:
!git clone https://github.com/eryl/aida-dali-workshop.git

In [None]:
!git -C /content/aida-dali-workshop/ pull   # update the codebase

In [None]:
!pip install monai nibabel > /dev/null

Now check which version of CUDA is installed by running the command below.

In [None]:
!nvcc --version

When we ran this, it reported CUDA 12.2, so we install the DALI build for CUDA 12 (DALI packages target the major CUDA release, either 11.0 or 12.0, written as 110 or 120 in the package name).
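
If you would rather not read the version off by eye, the mapping from CUDA release to package name can be sketched in a few lines of Python. This is only an illustration: the helper name `dali_package_for` is made up for this notebook, and it assumes the `nvcc --version` output contains a `release X.Y` string and that only the 11 and 12 builds mentioned above are relevant.

```python
import re


def dali_package_for(nvcc_output: str) -> str:
    """Map the text printed by `nvcc --version` to a DALI pip package name.

    Hypothetical helper: assumes the output contains "release <major>.<minor>"
    and that only the CUDA 11 and CUDA 12 builds are of interest.
    """
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("could not find a CUDA release in the nvcc output")
    major = int(match.group(1))
    if major not in (11, 12):
        raise ValueError(f"no DALI build assumed for CUDA {major}")
    # DALI packages are named after the major release only: 110 or 120.
    return f"nvidia-dali-cuda{major}0"


# Sample line in the format nvcc prints.
sample = "Cuda compilation tools, release 12.2, V12.2.140"
print(dali_package_for(sample))  # prints "nvidia-dali-cuda120"
```

The returned string is what you would pass to `pip install` in the next cell.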

In [None]:
!pip install nvidia-dali-cuda120

## Running scripts
The code we cloned is in `/content/aida-dali-workshop`. We'll start by making that our working directory.

In [None]:
%cd /content/aida-dali-workshop/

In [None]:
!pwd

## The first training script
We can now run the first training script. It will download the image dataset we'll use (the Oxford-IIIT Pet dataset).

In [None]:
!python examples/train_pet_resnet.py

What we're mostly interested in here is how long a training epoch takes compared to the other data loading methods.

## The DALI version
Now try to run the DALI version and compare the training time.

In [None]:
!python examples/train_pet_resnet_dali.py

Did you notice any difference in the time it took to process the batches? How many iterations per second did each of the two methods achieve?
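
If you want a number for the data loading alone, separate from the model's forward and backward pass, you could time a plain loop over the loader. The sketch below is not part of the workshop scripts; `iterations_per_second` is a hypothetical helper that works on any iterable, so it can be tried on a PyTorch `DataLoader` or a DALI iterator alike.

```python
import time


def iterations_per_second(loader, max_batches=None):
    """Iterate over `loader` and return how many batches per second it yields.

    Hypothetical helper: it only measures data loading, not training.
    """
    start = time.perf_counter()
    count = 0
    for _ in loader:
        count += 1
        if max_batches is not None and count >= max_batches:
            break
    elapsed = time.perf_counter() - start
    # Guard against a zero elapsed time on very short runs.
    return count / elapsed if elapsed > 0 else float("inf")


# Stand-in iterable; replace with your own data loader.
rate = iterations_per_second(range(100_000), max_batches=50_000)
print(f"{rate:.0f} it/s")
```

Keep in mind that GPU work is asynchronous, so for a fair comparison of full training steps the per-epoch times printed by the scripts remain the better measure.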

## Pre-augmenting the data
We've looked at how we can offload the augmentation to the GPU using DALI. As GPUs get more powerful, this will probably become more important for keeping them fully utilized.

Another way to speed up data loading is to perform the augmentation ahead of time. This only makes sense if you have plenty of storage and will use the training dataset to train multiple models. If you plan to run a lot of cross-validation, you will likely see significant speedups.

One downside of this method is that we need to generate different augmentations for each epoch (the core idea of data augmentation is that the exact same image should never occur multiple times in the training data). This means that storage requirements grow with the planned number of epochs.
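
To get a feel for how quickly this adds up, here is a back-of-the-envelope estimate. The numbers are purely hypothetical (they are not measurements of the pets dataset), and the helper `preaugmented_storage_bytes` is made up for this illustration; it assumes every augmented copy takes roughly the same space on disk.

```python
def preaugmented_storage_bytes(num_images, bytes_per_image, num_epochs):
    """Estimate disk usage when one augmented copy is stored per image and epoch."""
    return num_images * bytes_per_image * num_epochs


# Hypothetical figures: 1000 images, about 150 kB each, 10 planned epochs.
total = preaugmented_storage_bytes(num_images=1000, bytes_per_image=150_000, num_epochs=10)
print(f"{total / 1e9:.1f} GB")  # prints "1.5 GB"
```

Doubling the number of planned epochs doubles the estimate, which is the linear growth described above.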

In [None]:
!python examples/train_pet_resnet_preaugmented.py

In [None]:
! du -sh data

# Hands-on

As a hands-on session, we will try to adapt an existing PyTorch script to use DALI. Here you can choose to work on an experiment of your own, or try the 3D U-Net supplied in the examples (see below).

## 3D U-Net
In this repository, there is a script called `examples/unet_training_array.py`, which is taken from the MONAI examples. The challenge is to take the existing training data augmentation pipeline and convert it to a DALI pipeline. While the ResNet training example should serve as a rough sketch, the difficulty here lies in defining data augmentation steps that match those used by the original script.

In [None]:
# First we create a synthetic segmentation dataset
!python examples/unet_create_dataset.py


In [None]:
# Now we can run the training script
!python examples/unet_training_array.py

## Adapting to the DALI pipeline

Now try changing this pipeline to use NVIDIA DALI for the training data loader. You will need to wrap the `ImageDataset` used in the script.
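
One possible starting point, offered as a rough sketch rather than the intended solution, is DALI's `external_source` operator in per-sample mode (`batch=False`), where DALI calls a Python callable once per sample. The names `DatasetSource` and `build_pipeline` are hypothetical, and the sketch assumes the dataset supports `len()` and indexing and returns `(image, label)` pairs as NumPy arrays. The augmentation steps themselves are left out, since matching them is the actual exercise.

```python
class DatasetSource:
    """Per-sample callable that feeds an indexable dataset to DALI.

    Hypothetical wrapper: assumes dataset[i] returns an (image, label) pair
    of NumPy arrays, which is what DALI's external_source can consume.
    """

    def __init__(self, dataset, batch_size):
        self.dataset = dataset
        # Only full batches are served; a trailing partial batch is dropped.
        self.full_iterations = len(dataset) // batch_size

    def __call__(self, sample_info):
        # DALI passes an object describing which sample it wants.
        if sample_info.iteration >= self.full_iterations:
            raise StopIteration  # signals the end of the epoch
        image, label = self.dataset[sample_info.idx_in_epoch]
        return image, label


def build_pipeline(dataset, batch_size=2, num_threads=2, device_id=0):
    """Build a DALI pipeline around the dataset (requires DALI and a GPU)."""
    # Imported lazily so the wrapper above can be tried without DALI installed.
    from nvidia.dali import fn, pipeline_def

    @pipeline_def(batch_size=batch_size, num_threads=num_threads, device_id=device_id)
    def pipe():
        image, label = fn.external_source(
            source=DatasetSource(dataset, batch_size),
            num_outputs=2,
            batch=False,
        )
        # Move both outputs to the GPU; add the augmentation operators here.
        return image.gpu(), label.gpu()

    return pipe()
```

The resulting pipeline could then be wrapped for the training loop in the same way as in the ResNet DALI example. Note that the data is read without shuffling here, so that is something you would still need to add.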