**ML TEMPLATE NOTEBOOK**

Use cases for the 'ml_template' repository.

**Running on google colab**

1. either copy or git clone the ml_template repository to your drive, e.g. `git clone https://github.com/purnelldj/ml_template.git`

2. open this notebook from your drive using colab

3. change runtime type to take advantage of GPU / TPU

4. mount your drive:

In [None]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

Note: if you change any files in your drive, you need to restart the session (see the 'Runtime' menu) and then re-mount the drive.

5. change directory to the ml_template repository on your drive:

In [None]:
%cd /content/drive/MyDrive/colab/ml_template
# ctrl click on path above to open it to the left

6. install dependencies:

In [None]:
!pip install -r reqs_colab.txt

Note: there is some conflict with hydra-core that requires you to restart the runtime session. If you restart the session, you will need to change directory again:

In [None]:
%cd /content/drive/MyDrive/colab/ml_template

7. (optional) log in to wandb. You do not need to log in if logger.mode=offline (default)

In [None]:
import wandb
wandb.login()
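
Because the runtime may be restarted during setup, it can help to confirm that the working directory is still the repository root before running anything. The cell below is a minimal sketch, not part of ml_template: the helper name `missing_repo_files` is hypothetical, and it only checks for the two files this notebook already refers to (`reqs_colab.txt` and `src/traintest.py`).

```python
from pathlib import Path

# Files this notebook expects at the repository root (both are referenced above).
EXPECTED = ("reqs_colab.txt", "src/traintest.py")


def missing_repo_files(root="."):
    """Return the expected files that are not found under `root` (hypothetical helper)."""
    root = Path(root)
    return [name for name in EXPECTED if not (root / name).is_file()]


# Check the current working directory; an empty list means both files were found.
missing = missing_repo_files()
print("working directory looks right" if not missing else f"missing: {missing}")
```

If anything is reported as missing, re-run the `%cd` cell above.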

**EXAMPLE 1: image classification using the EuroSAT dataset**

[The EuroSAT dataset](https://github.com/phelber/eurosat?tab=readme-ov-file) is a collection of 27,000 labelled Sentinel-2 images. The dataset comes either in RGB (3-channel) format or with 13 spectral bands. This example uses a subset of 5,000 images evenly split between the 10 classes (500 images each).

**Step 1: download data on drive**

To create an efficient dataloader, you need to download the dataset as a zip and then unzip it to '/content'. [Here is a link to this issue on stack overflow.](https://stackoverflow.com/questions/59120853/google-colab-is-so-slow-while-reading-images-from-google-drive)

1. download zip:

In [None]:
!gdown 1ci8-w2Y0Z-hZaO-KyS4cFAiHKwfx4MMO

Note: you do not need to re-download the data if it is already in your drive. But you do need to unzip the data each time.

2. now unzip:

In [None]:
!unzip "eurosat_rgb.zip" -d "/content"

3. go to src/conf/main.yaml and set dataset.dir_parent=/content/
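
After unzipping, a quick sanity check is to count the images per class and compare against the 500 per class described above. This is only a sketch: it assumes the archive extracts to one sub-directory per class, which the notebook does not state, and the helper name `images_per_class` and the path in the usage note are hypothetical.

```python
from pathlib import Path


def images_per_class(root, suffixes=(".jpg", ".jpeg", ".png", ".tif")):
    """Count image files in each immediate sub-directory of `root`.

    Assumes one sub-directory per class (an assumption about the archive layout).
    """
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1 for f in class_dir.iterdir() if f.suffix.lower() in suffixes
            )
    return counts
```

For example, `images_per_class("/content/eurosat_rgb")` (adjust the path to wherever the archive actually extracts) should list 10 classes if the layout assumption holds.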

**Step 2: train / test model**

First, let's visualize a sample image:

In [None]:
%run src/traintest.py dataset=eurosat_rgb visualize_data=True

For the EuroSAT dataset, you have three options for models to try: 'cnn', 'vit' and 'resnet'

First, try visualizing the model output with resnet:

In [None]:
%run src/traintest.py dataset=eurosat_rgb model=resnet visualize_modelout=True

now train using default parameters, with logging to wandb:

In [None]:
%run src/traintest.py dataset=eurosat_rgb model=resnet logger.mode=online model.wandb_plots=True

this should result in a training accuracy of 0.889

try using ViT instead:

In [None]:
%run src/traintest.py dataset=eurosat_rgb model=vit trainer.max_epochs=10 model.optimizer.lr=2e-3 logger.mode=online

**EXAMPLE 2: image segmentation using satellite images of water bodies**

[This dataset on kaggle](https://www.kaggle.com/datasets/franciscoescobar/satellite-images-of-water-bodies/data) consists of thousands of Sentinel-2 images of waterbodies and corresponding water masks. I have uploaded a compressed version to my drive.

**Step 1: download data on drive**

1. download compressed directory

In [None]:
!gdown 1JTLSlcxxCANKs_LKZc0Bx5XBta_3sCDb

2. unzip

In [None]:
!unzip "waterbodies.zip" -d "/content"

3. go to src/conf/main.yaml and set dataset.dir_parent=/content/
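
As noted for the EuroSAT example, the zip only needs to be downloaded once but must be unzipped in every new session. The download and unzip steps above can be wrapped in a small helper that extracts the archive when it is already on your drive and tells you when it is not. This is a sketch using Python's standard `zipfile` module in place of `!unzip`; the name `extract_if_present` is hypothetical.

```python
import zipfile
from pathlib import Path


def extract_if_present(zip_path, dest):
    """Extract `zip_path` into `dest`; return False if the archive is missing."""
    zip_path = Path(zip_path)
    if not zip_path.is_file():
        # Nothing to extract: the archive still needs to be downloaded.
        return False
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    return True
```

Calling `extract_if_present("waterbodies.zip", "/content")` returns `False` when the zip is not in the current directory, in which case run the `!gdown` cell first.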

**Step 2: train using UNet**

First plot output from model

In [None]:
%run src/traintest.py dataset=waterbodies model=unet visualize_modelout=True

now train

In [None]:
%run src/traintest.py dataset=waterbodies model=unet trainer.max_epochs=10 logger.mode=online

copy the checkpoint automatically saved to outputs and then evaluate on test set:

In [None]:
%run src/traintest.py dataset=waterbodies model=unet stage=test ckpt_path="path/to/checkpoint.ckpt"

note that you will need to manually rename the checkpoint as it saves with a '=' in the name by default.
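
The rename can also be done in a cell rather than by hand. The sketch below replaces every '=' in the checkpoint file name with '-'; the helper name `sanitize_ckpt_name` and the replacement character are my own choices, not something ml_template defines.

```python
from pathlib import Path


def sanitize_ckpt_name(path, replacement="-"):
    """Rename a checkpoint so its file name contains no '=' and return the new path."""
    path = Path(path)
    new_path = path.with_name(path.name.replace("=", replacement))
    if new_path != path:
        # Only touch the file system when the name actually changes.
        path.rename(new_path)
    return new_path
```

Pass the returned path as `ckpt_path` in the test command above, for example `new_path = sanitize_ckpt_name("path/to/checkpoint.ckpt")`.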