<a href="https://colab.research.google.com/github/itzahs/SSL-for-RS/blob/main/2_TrainModel_SSL4RS.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🚀 **Semi-Supervised Learning for Remote Sensing (SSL4RS) Workshop** 🛰️

## 📂 Section 1 - Get Data & Software: Dataset Download & Augmentation
## 🛠️ Section 2 - Train Model: Implementing FixMatch Algorithm with PyTorch
## 📊 Section 3 - Model Evaluation: Analyzing Accuracy & Computational Cost from Log Files
## 📈 Section 4 - Model Inference: Classification Accuracy and Embeddings Visualization


### 📚 Setting Up the Working Folder & Importing Required Packages


In [None]:
#  Monitor the GPU usage
!nvidia-smi

Fri Oct 20 08:15:14 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   55C    P8    10W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [None]:
from google.colab import drive
# Mount google drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
# Set working folder as default
%cd "/content/drive/MyDrive/SSL4RS"

/content/drive/MyDrive/SSL4RS


In [None]:
!ls

Classification-SemiCLS


### 🔧 **Code Modifications and Library Requirements**

This section provides an overview of essential code modifications and library requirements to ensure the software operates smoothly. Key points include:

#### **Code Adjustments**

To enable seamless operation within a Jupyter notebook environment, specific code adjustments are necessary. These adjustments are detailed below:

1. **Modifications to `./Classification-SemiCLS/dataset/builder.py`**

   - **Ignore `.ipynb_checkpoints`**: In `dataset/builder.py` (lines 82-92), add a check that excludes the `.ipynb_checkpoints` folder, which Jupyter creates automatically, from the dataset root.

2. **Modifications to `./Classification-SemiCLS/train_semi.py`**

   - **Update `labeled_iter.next()`**: In `train_semi.py` (lines 444 & 450), replace `labeled_iter.next()` with `next(labeled_iter)`.

   - **Update `unlabeled_iter.next()`**: In `train_semi.py` (lines 453 & 459), replace `unlabeled_iter.next()` with `next(unlabeled_iter)`.

   - **Update the `Config` import**: In `train_semi.py`, replace `from mmcv import Config` with `from mmengine.config import Config`.
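The iterator change can be illustrated with a minimal sketch. It assumes the training loop restarts a loader once it is exhausted; the helper `next_batch` and the list used as a stand-in for a PyTorch `DataLoader` are hypothetical and not part of the repository.

```python
def next_batch(iterator, loader):
    """Return the next batch, restarting the loader when it is exhausted."""
    try:
        batch = next(iterator)      # built-in next() works on any Python 3 iterator
    except StopIteration:
        iterator = iter(loader)     # start a new pass over the data
        batch = next(iterator)
    return batch, iterator


loader = [[0, 1], [2, 3]]           # stand-in for a DataLoader with two batches
it = iter(loader)
batches = []
for _ in range(5):
    batch, it = next_batch(it, loader)
    batches.append(batch)

print(batches)  # [[0, 1], [2, 3], [0, 1], [2, 3], [0, 1]]
```

Calling `it.next()` instead would raise `AttributeError` on a plain Python 3 iterator, which is why the built-in form is used.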

#### **Library Dependencies**

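To ensure compatibility and proper functionality, the required libraries are installed with `pip` in the installation cell further down. No versions are pinned there, so the installed versions depend on when the notebook is run.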
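One way to record the versions actually in use is to query the installed package metadata. This is a small sketch using the standard library. The helper name `installed_versions` is hypothetical, and the package list simply mirrors the `pip install` cell of this notebook.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


packages = ["mmcv-lite", "torch", "torchvision", "tensorboardX", "tensorboard", "tqdm"]
for name, version in installed_versions(packages).items():
    print(f"{name}: {version or 'not installed'}")
```

Saving this output next to the training logs makes it easier to reproduce a run later.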



In [None]:
# Modifications to ./Classification-SemiCLS/dataset/builder.py
builder_path = "./Classification-SemiCLS/dataset/builder.py"

# Define the content to be added
new_content = """
        import os
        # check if .ipynb_checkpoints is in root, and exclude it
        root = os.path.join(cfg.root, '')
        if os.path.isdir(os.path.join(root, ".ipynb_checkpoints")):
            exclude_dir = os.path.join(root, ".ipynb_checkpoints")
        else:
            exclude_dir = None
"""

# Read the content of the file
with open(builder_path, 'r') as file:
    content = file.read()

# Check if the new content is already present in the file
if new_content.strip() not in content:
    # Find the location to insert the new content after "else:"
    insert_index = content.find("else:")
    if insert_index == -1:
        raise ValueError("Failed to find the insertion point after 'else:'.")

    # Find the end of the line for the "else:" statement
    end_of_line = content.find("\n", insert_index)
    if end_of_line == -1:
        raise ValueError("Failed to find the end of the line for 'else:'.")

    # Insert the new content after the "else:" block and before the next line
    modified_content = content[:end_of_line] + "\n" + new_content + content[end_of_line:]

    # Write the modified content back to the file
    with open(builder_path, 'w') as file:
        file.write(modified_content)

    print("Modifications applied successfully.")
else:
    print("Content is already present in the file.")

Modifications applied successfully.


In [None]:
# Modifications to ./Classification-SemiCLS/train_semi.py
train_semi_path = "./Classification-SemiCLS/train_semi.py"

# Read the content of the file
with open(train_semi_path, 'r') as file:
    lines = file.readlines()

# Modify lines as needed
modified_lines = []
for line in lines:
    if "data_x = labeled_iter.next()" in line:
        # Python 3 iterators have no .next() method; use the built-in next()
        modified_lines.append(line.replace("labeled_iter.next()", "next(labeled_iter)"))
    elif "data_u = unlabeled_iter.next()" in line:
        modified_lines.append(line.replace("unlabeled_iter.next()", "next(unlabeled_iter)"))
    elif "from mmcv import Config" in line:
        modified_lines.append(line.replace("from mmcv import Config", "from mmengine.config import Config"))
    else:
        modified_lines.append(line)

# Write the modified content back to the file
with open(train_semi_path, 'w') as file:
    file.writelines(modified_lines)

print("Modifications applied successfully.")


Modifications applied successfully.
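A quick sanity check after patching is to scan the source for any remaining `.next()` calls. The sketch below runs on an in-memory string. The helper `find_legacy_next_calls` is hypothetical, and in practice the text read from `train_semi.py` would be passed in.

```python
import re

# Matches calls such as `labeled_iter.next()` and captures the iterator name
LEGACY_NEXT = re.compile(r"\b(\w+)\.next\(\)")


def find_legacy_next_calls(source):
    """Return (line_number, iterator_name) pairs for every `.next()` call."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        for match in LEGACY_NEXT.finditer(line):
            hits.append((number, match.group(1)))
    return hits


sample = "data_x = labeled_iter.next()\ndata_u = next(unlabeled_iter)\n"
hits = find_legacy_next_calls(sample)
print(hits)  # [(1, 'labeled_iter')]
```

An empty list means every call has been converted to the built-in `next()` form.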


In [None]:
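%%capture
# Install the required libraries (%%capture must be the first line of the cell)
!pip install mmcv-lite # the package used to be mmcv-full and is now mmcv-lite
!pip install torch torchvision torchaudio
!pip install apex # note: the PyPI package named apex is not NVIDIA Apex; AMP is disabled in the configs (amp use=False)
!pip install tensorboardX
!pip install tensorboard
!pip install tensorrt
!pip install tqdm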

### 🛠️ **Configuration for Training with FixMatch Algorithm**

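The configuration code defines parameters for training a deep learning model on the UC Merced (UCM) land-use dataset using the FixMatch algorithm. Key details include:

- **Model Architecture**: The model is a Wide ResNet (`wideresnet`) with depth 28 and widen factor 2.
- **Hardware**: Training occurs on a single GPU with a labeled batch size of 4 (`batch_size=4`); each step also draws `mu=7` times as many unlabeled images.
- **Labeled Samples**: Only 4 labeled samples per class are used for training (`num_labeled=84` across 21 classes).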
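The numbers in this list follow from the values in the config cell for `fm_ucm.py`. The short calculation below makes the relationship explicit. The count of images per step assumes that every unlabeled image is forwarded twice (one weak and one strong view), as in the FixMatch paper, and this is not verified against the repository code.

```python
# Values copied from the FixMatch config cell (fm_ucm.py)
num_labeled = 84
num_classes = 21
batch_size = 4   # labeled images per step
mu = 7           # unlabeled-to-labeled ratio

labeled_per_class = num_labeled // num_classes
unlabeled_per_step = batch_size * mu
# Assumption: each unlabeled image is forwarded twice (weak + strong view)
images_per_step = batch_size + 2 * unlabeled_per_step

print(labeled_per_class, unlabeled_per_step, images_per_step)  # 4 28 60
```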

The training process comprises three main components:

1. **Train**: This section specifies the algorithm, the number of training steps, and the loss function.
2. **Model**: Details about the architecture of the model to be trained are provided.
3. **Data**: This section covers the loading and preprocessing of the UCM dataset, including the number of labeled samples, batch size, and data augmentation pipeline.

Other configurations encompass options like the learning rate scheduler, exponential moving average (EMA) of model parameters, automatic mixed precision (AMP) optimization, optimizer and logging/checkpointing settings.
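To make the `trainer` settings concrete, here is a framework-free sketch of the FixMatch unlabeled loss. It is written in plain Python so it runs without a GPU, and it is not the repository's implementation. The function names are hypothetical; `threshold` and `T` mirror the config keys, and the total loss would be `loss_x + lambda_u * loss_u`.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [v / temperature for v in logits]
    peak = max(scaled)
    exps = [math.exp(v - peak) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Cross-entropy of one sample against an integer class label."""
    return -math.log(softmax(logits)[target])


def fixmatch_unlabeled_loss(weak_logits, strong_logits, threshold=0.95, T=1.0):
    """Mean masked cross-entropy between strong views and weak pseudo-labels."""
    losses = []
    for weak, strong in zip(weak_logits, strong_logits):
        probs = softmax(weak, T)
        confidence = max(probs)
        pseudo_label = probs.index(confidence)
        mask = 1.0 if confidence >= threshold else 0.0  # keep confident samples only
        losses.append(mask * cross_entropy(strong, pseudo_label))
    return sum(losses) / len(losses)


weak = [[8.0, 0.0, 0.0], [0.5, 0.4, 0.3]]    # first sample confident, second not
strong = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
loss_u = fixmatch_unlabeled_loss(weak, strong)
print(round(loss_u, 4))
```

Only the first sample passes the 0.95 threshold, so the second contributes zero to the mean. This per-sample masking is why `loss_u` uses `reduction="none"` in the config.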

#### Dataset builder and training syntax

For the code to work in a Jupyter notebook environment, two kinds of modification are needed. First, Jupyter creates `.ipynb_checkpoints` folders, which the dataset builder must ignore. Second, the iterator calls in the training script must be updated to Python 3 syntax.
1. In dataset/builder.py (line 15): add `import os`.
2. In dataset/builder.py (lines 82-92): add a check to ignore `.ipynb_checkpoints`.
3. In train_semi.py (lines 444 & 450): replace `labeled_iter.next()` with `next(labeled_iter)`.
4. In train_semi.py (lines 453 & 459): replace `unlabeled_iter.next()` with `next(unlabeled_iter)`.
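The reason for skipping `.ipynb_checkpoints` is that a folder-based dataset treats every sub-folder of the root as a class. The sketch below shows the idea on a temporary directory. The helper `list_class_dirs` and the folder names are illustrative only and do not reproduce the repository's builder.

```python
import os
import tempfile


def list_class_dirs(root):
    """Return sorted sub-folder names, skipping Jupyter's checkpoint folder."""
    return sorted(
        entry.name
        for entry in os.scandir(root)
        if entry.is_dir() and entry.name != ".ipynb_checkpoints"
    )


with tempfile.TemporaryDirectory() as root:
    for name in ("airplane", "beach", ".ipynb_checkpoints"):
        os.mkdir(os.path.join(root, name))
    class_dirs = list_class_dirs(root)

print(class_dirs)  # ['airplane', 'beach']
```

Without the filter, the checkpoint folder would be counted as an extra class and shift the label indices.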


In [None]:
%%writefile ./Classification-SemiCLS/configs/fm_ucm.py
# Config file for UCM - 4 labeled examples per class - FixMatch

""" The Code is under Tencent Youtu Public Rule"""

train = dict(eval_step=5,  # reduced from 1024
             total_steps=5*20,  # reduced from 1024*512
             trainer=dict(type="FixMatch",
                          threshold=0.95,
                          T=1.,
                          lambda_u=1.,
                          loss_x=dict(
                              type="cross_entropy",
                              reduction="mean"),
                          loss_u=dict(
                              type="cross_entropy",
                              reduction="none"),
                          ))
num_classes = 21

model = dict(
     type="wideresnet",
     depth=28,
     widen_factor=2,
     dropout=0,
     num_classes=num_classes,
)

ucm_mean = (0.485, 0.456, 0.406)
ucm_std = (0.229, 0.224, 0.225)

data = dict(
    type="MyDataset",
    num_workers=4,
    num_labeled=84,
    num_classes=num_classes,
    batch_size=4,
    expand_labels=False,
    mu=7,

    root="./data/UCM/Images",
    labeled_names_file="./data/UCM/Images/UCM_train.txt",
    test_names_file="./data/UCM/Images/UCM_test.txt",
    lpipelines=[[
        dict(type="RandomHorizontalFlip"),
        dict(type="RandomResizedCrop", size=224, scale=(0.2, 1.0)),
        dict(type="ToTensor"),
        dict(type="Normalize", mean=ucm_mean, std=ucm_std)
    ]],
    upipelinse=[[
        dict(type="RandomHorizontalFlip"),
        dict(type="Resize", size=256),
        dict(type="CenterCrop", size=224),
        dict(type="ToTensor"),
        dict(type="Normalize", mean=ucm_mean, std=ucm_std)
        ],
        [
        dict(type="RandomHorizontalFlip"),
        dict(type="RandomResizedCrop", size=224, scale=(0.2, 1.0)),
        dict(type="RandAugmentMC", n=2, m=10),
        dict(type="ToTensor"),
        dict(type="Normalize", mean=ucm_mean, std=ucm_std)
    ]],
    vpipeline=[
        dict(type="Resize", size=256),
        dict(type="CenterCrop", size=224),
        dict(type="ToTensor"),
        dict(type="Normalize", mean=ucm_mean, std=ucm_std)
    ])

scheduler = dict(
    type='cosine_schedule_with_warmup',
    num_warmup_steps=0,
    num_training_steps=train['total_steps']
)

ema = dict(use=True, pseudo_with_ema=False, decay=0.999)
# apex AMP optimization level, one of 'O0', 'O1', 'O2', 'O3'.
# See details at https://nvidia.github.io/apex/amp.html
amp = dict(use=False, opt_level="O1")

log = dict(interval=1)
ckpt = dict(interval=1)

# optimizer
optimizer = dict(type='SGD', lr=0.03, momentum=0.9, weight_decay=0.0005, nesterov=True)

Overwriting ./configs/fm_ucm.py
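The `scheduler` entry selects a cosine learning-rate schedule with optional warmup. The sketch below shows how such a schedule scales the base learning rate of 0.03 over the 100 steps configured above. The function name is hypothetical, and the `num_cycles=7/16` default is an assumption borrowed from common FixMatch implementations; the repository's scheduler may differ.

```python
import math


def cosine_with_warmup_factor(step, num_training_steps, num_warmup_steps=0,
                              num_cycles=7.0 / 16.0):
    """Multiplier applied to the base learning rate at a given step."""
    if step < num_warmup_steps:
        # Linear ramp from 0 to 1 during warmup
        return step / max(1, num_warmup_steps)
    progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
    return max(0.0, math.cos(math.pi * num_cycles * progress))


base_lr = 0.03
total_steps = 100
for step in (0, 50, 100):
    lr = base_lr * cosine_with_warmup_factor(step, total_steps)
    print(step, round(lr, 5))
```

With `num_warmup_steps=0`, training starts at the full base learning rate and decays smoothly toward the end of the run.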


### 🚀 **Initiate the Training** 🏋️‍♂️


In [None]:
# Append the repository folder (Classification-SemiCLS) to sys.path,
# the list of directories Python searches when importing modules and packages.
import sys
sys.path.append('/content/drive/MyDrive/SSL4RS/Classification-SemiCLS')

# Set working folder as default
%cd "/content/drive/MyDrive/SSL4RS/Classification-SemiCLS"

/content/drive/MyDrive/SSL4RS/Classification-SemiCLS


In [None]:
#Running UCM with Fixmatch baseline
!python3 ./train_semi.py --cfg /content/drive/MyDrive/SSL4RS/Classification-SemiCLS/configs/fm_ucm.py --gpu-id 0 --out /content/drive/MyDrive/SSL4RS/Classification-SemiCLS/results/fixmatch/fm_ucm --seed 5


2023-10-20 08:18:08.941184: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-10-20 08:18:37,644 - INFO - root -   {'cfg': '/content/drive/MyDrive/SSL4RS/Classification-SemiCLS/configs/fm_ucm.py', 'gpu_id': 0, 'out': '/content/drive/MyDrive/SSL4RS/Classification-SemiCLS/results/fixmatch/fm_ucm', 'pretrained': None, 'resume': '', 'seed': 5, 'use_BN': False, 'fp16': False, 'local_rank': -1, 'no_progress': False, 'other_args': '', 'writer': <torch.utils.tensorboard.writer.SummaryWriter object at 0x7b8c6c9e3be0>, 'amp': False, 'total_steps': 100, 'eval_steps': 5, 'world_size': 1, 'n_gpu': 1, 'device': device(type='cuda', index=0)}
2023-10-20 08:18:46,319 - INFO - models.wideresnet -   Model: WideResNet 28x2 proj Falsex64
2023-10-20 08:18:46,400 - INFO - r

### 🛠️ **Configuration for Training with CoMatch Algorithm**

In [None]:
%%writefile /content/drive/MyDrive/SSL4RS/Classification-SemiCLS/configs/comatch_ucm.py
# Config file for UCM - 4 labeled examples per class - CoMatch

""" The Code is under Tencent Youtu Public Rule"""

train = dict(
    eval_step=1024,
    total_steps=2**10*512,
    trainer=dict(
        type='CoMatch',
        threshold=0.95, #pseudolabel threshold
        queue_batch=5,  #memory buffer
        contrast_threshold=0.8, #similarity matrix
        da_len=32, #distribution alignment
        T=0.2, # temperature
        alpha=0.9,# 1-alpha for memory smoothed pseudo label
        lambda_u=1.0, #unlabeled loss
        lambda_c=1.0, #contrastive loss
        loss_x=dict(type="cross_entropy", reduction="mean"))) #supervised loss

num_classes = 21

model = dict(
     #type='wideresnet', #config for wideresnet purposes
     #depth=28,
     #widen_factor=2, #reducing number of filters for memory
     #dropout=0,
     type="resnet18", #config for resnet purposes
     width=1,
     in_channel=3,
     num_class=num_classes,
     proj=True,
     low_dim=64, # projection head
)

# Obtained from Imagenet
ucm_mean = [0.485, 0.456, 0.406]
ucm_std = [0.229, 0.224, 0.225]

data = dict(
    # Dataset configuration
    type="MyDataset", #customized dataset
    num_workers=4,
    num_labeled=84, #num_labeled/num_classes=labeled samples per class
    num_classes=num_classes,
    batch_size=32, #reducing batch for memory
    expand_labels=False,
    mu=7, #unlabeled to labeled batch ratio

    #input data folder
    root="./data/UCM/Images", #"./Classification-SemiCLS/data/AID",
    labeled_names_file="./data/UCM/Images/UCM_train.txt", #"./Classification-SemiCLS/data/AID/AID_train.txt",
    test_names_file="./data/UCM/Images/UCM_test.txt", #"./Classification-SemiCLS/data/AID/AID_test.txt",

    # labeled data preprocessing
    lpipelines=[[
        dict(type="Resize", size=64),
        dict(type="RandomHorizontalFlip", p=0.5),
        #dict(type="RandomResizedCrop", size=224, scale=(0.2, 1.0)),
        dict(type="RandomResizedCrop", size=60, scale=(0.2, 1.0)),
        dict(type="ToTensor"),
        dict(type="Normalize", mean=ucm_mean, std=ucm_std)
    ]],

    # unlabeled data preprocessing
    upipelinse=[
        # weak augmentation
        [
        dict(type="Resize", size=64),
        dict(type="RandomHorizontalFlip"),
        #dict(type="Resize", size=256),
        #dict(type="CenterCrop", size=224),
        dict(type="CenterCrop", size=60),
        dict(type="ToTensor"),
        dict(type="Normalize", mean=ucm_mean, std=ucm_std)
        ],

        # strong augmentation 1
        [
            dict(type="Resize", size=64),
            dict(type="RandomHorizontalFlip"),
            #dict(type="RandomResizedCrop", size=224, scale=(0.2, 1.0)),
            dict(type="RandomResizedCrop", size=60, scale=(0.2, 1.0)),
            dict(type="RandAugmentMC", n=2, m=10),
            dict(type="ToTensor"),
            dict(type="Normalize", mean=ucm_mean, std=ucm_std)
        ],

        # strong augmentation 2
        [
            dict(type="Resize", size=64),
            #dict(type="RandomResizedCrop", size=224, scale=(0.2, 1.0)),
            dict(type="RandomResizedCrop", size=60, scale=(0.2, 1.0)),
            dict(type="RandomHorizontalFlip"),
            dict(type="RandomApply",
                    transforms=[
                        dict(type="ColorJitter",
                            brightness=0.4,
                            contrast=0.4,
                            saturation=0.4,
                            hue=0.1),
                    ],
                    p=0.8),
            dict(type="RandomGrayscale", p=0.2),
            dict(type="ToTensor"),
            dict(type="Normalize", mean=ucm_mean, std=ucm_std)
        ]],

    # validation data preprocessing
    vpipeline=[
        dict(type="Resize", size=64),
        #dict(type="Resize", size=256),
        #dict(type="CenterCrop", size=224),
        dict(type="CenterCrop", size=60),
        dict(type="ToTensor"),
        dict(type="Normalize", mean=ucm_mean, std=ucm_std)
    ])

scheduler = dict(
    type='cosine_schedule_with_warmup',
    num_warmup_steps=0,
    num_training_steps=train['total_steps']
)

ema = dict(use=True, pseudo_with_ema=False, decay=0.999)
# apex AMP optimization level, one of 'O0', 'O1', 'O2', 'O3'.
# See details at https://nvidia.github.io/apex/amp.html
amp = dict(use=False, opt_level="O1")

log = dict(interval=1)
#log = dict(interval=512)
ckpt = dict(interval=1)

# optimizer
optimizer = dict(type='SGD', lr=0.03, momentum=0.9, weight_decay=0.0005, nesterov=True)


Writing /content/drive/MyDrive/SSL4RS/Classification-SemiCLS/configs/comatch_ucm.py
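Both config files enable `ema` with `decay=0.999`, which keeps a slowly moving average of the model weights for evaluation. The update rule can be sketched in a few lines of plain Python. The helper `ema_update` and the two-element weight lists are illustrative only, not the repository's EMA class.

```python
def ema_update(ema_params, model_params, decay=0.999):
    """Return the EMA weights after one training step."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, model_params)]


ema = [0.0, 1.0]      # averaged weights, starting away from the model for weight 0
model = [1.0, 1.0]    # current model weights, held fixed for this demonstration
for _ in range(1000):
    ema = ema_update(ema, model)

print([round(v, 3) for v in ema])
```

After 1000 steps the first averaged weight has covered roughly 63% of the distance to the model weight, which shows how slowly the average moves at this decay.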


In [None]:
# Running UCM with CoMatch
!python3 ./train_semi.py --cfg /content/drive/MyDrive/SSL4RS/Classification-SemiCLS/configs/comatch_ucm.py --gpu-id 0 --out /content/drive/MyDrive/SSL4RS/Classification-SemiCLS/results/comatch/comatch_ucm --seed 5


2023-10-20 08:20:22.324534: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-10-20 08:20:25,754 - INFO - root -   {'cfg': '/content/drive/MyDrive/SSL4RS/Classification-SemiCLS/configs/comatch_ucm.py', 'gpu_id': 0, 'out': '/content/drive/MyDrive/SSL4RS/Classification-SemiCLS/results/comatch/comatch_ucm', 'pretrained': None, 'resume': '', 'seed': 5, 'use_BN': False, 'fp16': False, 'local_rank': -1, 'no_progress': False, 'other_args': '', 'writer': <torch.utils.tensorboard.writer.SummaryWriter object at 0x7bdb2ab15f30>, 'amp': False, 'total_steps': 524288, 'eval_steps': 1024, 'world_size': 1, 'n_gpu': 1, 'device': device(type='cuda', index=0)}
2023-10-20 08:20:26,005 - INFO - root -   resnet18 Total params: 12.37M
2023-10-20 08:20:26,262 - INFO - root -