# Training with Pretrained ResNet Backbones (W&B Integration)

This notebook demonstrates iSAID instance segmentation training using **pretrained ResNet-50/ResNet-101** backbones with FPN from torchvision, instead of our custom EfficientNet + CBAM backbone.

**Features:**

- Uses torchvision's Mask R-CNN with a COCO-pretrained ResNet-50-FPN backbone, or a manually assembled ResNet-101-FPN backbone with ImageNet weights
- Automatic logging of training/validation losses and metrics
- Learning rate scheduling (OneCycleLR or ReduceLROnPlateau)
- Validation predictions visualization
- Model checkpointing as W&B artifacts
- mAP, mean IoU, and overfitting gap metrics


## 1. Setup


In [1]:
!git clone https://github.com/michaelo-ponteski/isaid-instance-segmentation.git

Cloning into 'isaid-instance-segmentation'...
remote: Enumerating objects: 496, done.
remote: Counting objects: 100% (218/218), done.
remote: Compressing objects: 100% (148/148), done.
remote: Total 496 (delta 130), reused 139 (delta 70), pack-reused 278 (from 1)
Receiving objects: 100% (496/496), 5.50 MiB | 21.24 MiB/s, done.
Resolving deltas: 100% (258/258), done.


In [2]:
%cd isaid-instance-segmentation/
!git pull

/content/isaid-instance-segmentation
Already up to date.


In [3]:
import os
import sys
import gc
import numpy as np
import torch
from pathlib import Path

# Add project root to path (the working directory after the %cd above)
sys.path.insert(0, str(Path.cwd()))

# Set memory optimization for CUDA
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

# Check device
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")
if device == "cuda":
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(
        f"Available memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB"
    )

Using device: cuda
GPU: NVIDIA A100-SXM4-40GB
Available memory: 42.5 GB


In [None]:
!pip install --upgrade wandb

In [4]:
# Install wandb if not available
try:
    import wandb
    print(f"wandb version: {wandb.__version__}") # Must be newest
except ImportError:
    print("Installing wandb...")
    !pip install --upgrade wandb
    import wandb

wandb version: 0.24.0


### Kaggle wandb API setup


In [5]:
# Read the API key from the environment rather than hard-coding it in the notebook.
# Set WANDB_API_KEY beforehand (for example through Kaggle or Colab secrets).
wandb.login(key=os.environ.get("WANDB_API_KEY"))

wandb: [wandb.login()] Using explicit session credentials for https://api.wandb.ai.
wandb: No netrc file found, creating one.
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc
wandb: Currently logged in as: michaelo-ponteski (marek-olnk-put-pozna-) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin


True

In [6]:
import importlib
import datasets.isaid_dataset
import training.transforms
import training.trainer

importlib.reload(datasets.isaid_dataset)
importlib.reload(training.transforms)
importlib.reload(training.trainer)

from datasets.isaid_dataset import iSAIDDataset
from training.transforms import get_transforms
from training.trainer import Trainer, create_datasets
from training.wandb_logger import ISAID_CLASS_LABELS

# Import torchvision's pretrained Mask R-CNN models
from torchvision.models.detection import (
    maskrcnn_resnet50_fpn,
    maskrcnn_resnet50_fpn_v2,
    MaskRCNN_ResNet50_FPN_Weights,
    MaskRCNN_ResNet50_FPN_V2_Weights,
)
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

print("All modules imported successfully!")
print(f"\niSAID Class Labels:")
for idx, name in ISAID_CLASS_LABELS.items():
    print(f"  {idx}: {name}")

All modules imported successfully!

iSAID Class Labels:
  0: background
  1: ship
  2: storage_tank
  3: baseball_diamond
  4: tennis_court
  5: basketball_court
  6: ground_track_field
  7: bridge
  8: large_vehicle
  9: small_vehicle
  10: helicopter
  11: swimming_pool
  12: roundabout
  13: soccer_ball_field
  14: plane
  15: harbor


## 2. Configuration


In [7]:
import kagglehub

# Download latest version
path = kagglehub.dataset_download("michaeloponteski/isaid-patches")

print("Path to dataset files:", path)

Downloading from https://www.kaggle.com/api/v1/datasets/download/michaeloponteski/isaid-patches?dataset_version_number=1...


100%|██████████| 41.1G/41.1G [32:13<00:00, 22.8MB/s]   

Extracting files...





Path to dataset files: /root/.cache/kagglehub/datasets/michaeloponteski/isaid-patches/versions/1


In [8]:
root_dir = path + "/iSAID_patches"

In [9]:
# Choose backbone: "resnet50", "resnet50_v2", or "resnet101"
# Note: torchvision ships COCO-pretrained Mask R-CNN weights for the ResNet-50 variants only;
# the ResNet-101 model is assembled manually from an ImageNet-pretrained backbone
BACKBONE_CHOICE = "resnet50"

# All hyperparameters in one place - this will be logged to W&B
HYPERPARAMETERS = {
    # Dataset
    "data_root": root_dir,
    "num_classes": 16,
    "image_size": 800,
    # Training
    "batch_size": 8,
    "val_batch_size": 8,
    "num_epochs": 20,
    "learning_rate": 0.0003,
    "weight_decay": 0.001,
    "momentum": 0.9,
    # Model Architecture
    "backbone": BACKBONE_CHOICE,
    "pretrained_backbone": True,
    "pretrained_coco": True,  # Use COCO pretrained weights
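    # RPN Anchors (optimized for iSAID). Note: these values are logged to W&B, but
    # create_maskrcnn_resnet() in section 4 does not take them as arguments, so the
    # model keeps torchvision's default anchors unless they are wired in separately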
    "anchor_sizes": ((8, 16), (16, 32), (32, 64), (64, 128), (128, 256)),
    "aspect_ratios": ((0.5, 1.0, 2.0),) * 5,
    # W&B Logging
    "wandb_project": "isaid-resnet50-segmentation",
    "wandb_entity": "marek-olnk-put-pozna-",
    "wandb_log_freq": 20,  # Log every N batches
    "wandb_num_val_images": 4,  # Number of images for validation visualization
    "wandb_conf_threshold": 0.5,  # Confidence threshold for predictions
}

print("Hyperparameters:")
for k, v in HYPERPARAMETERS.items():
    print(f"  {k}: {v}")

Hyperparameters:
  data_root: /root/.cache/kagglehub/datasets/michaeloponteski/isaid-patches/versions/1/iSAID_patches
  num_classes: 16
  image_size: 800
  batch_size: 8
  val_batch_size: 8
  num_epochs: 20
  learning_rate: 0.0003
  weight_decay: 0.001
  momentum: 0.9
  backbone: resnet50
  pretrained_backbone: True
  pretrained_coco: True
  anchor_sizes: ((8, 16), (16, 32), (32, 64), (64, 128), (128, 256))
  aspect_ratios: ((0.5, 1.0, 2.0), (0.5, 1.0, 2.0), (0.5, 1.0, 2.0), (0.5, 1.0, 2.0), (0.5, 1.0, 2.0))
  wandb_project: isaid-resnet50-segmentation
  wandb_entity: marek-olnk-put-pozna-
  wandb_log_freq: 20
  wandb_num_val_images: 4
  wandb_conf_threshold: 0.5
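
To make the anchor settings above concrete, the sketch below expands one pyramid level into its anchor shapes. It is a pure-Python illustration, not code from this repository. It assumes the convention used by torchvision's `AnchorGenerator`, where the aspect ratio is height/width and each anchor keeps an area of `size**2`. The helper name `anchor_shapes` is hypothetical.

```python
import math

# Values copied from HYPERPARAMETERS above
ANCHOR_SIZES = ((8, 16), (16, 32), (32, 64), (64, 128), (128, 256))
ASPECT_RATIOS = ((0.5, 1.0, 2.0),) * 5


def anchor_shapes(sizes, ratios):
    """Return (width, height) pairs for one pyramid level.

    Assumes ratio = height / width and that each anchor keeps area size**2.
    """
    shapes = []
    for ratio in ratios:
        h_scale = math.sqrt(ratio)
        w_scale = 1.0 / h_scale
        for size in sizes:
            shapes.append((size * w_scale, size * h_scale))
    return shapes


# Number of anchors placed at every spatial location, per pyramid level
anchors_per_location = [len(s) * len(r) for s, r in zip(ANCHOR_SIZES, ASPECT_RATIOS)]
print("Anchors per location:", anchors_per_location)

# Shapes on the finest level, which uses sizes (8, 16)
finest_level = anchor_shapes(ANCHOR_SIZES[0], ASPECT_RATIOS[0])
for width, height in finest_level:
    print(f"  {width:6.2f} x {height:6.2f}")
```

With two sizes and three ratios per level, this configuration yields six anchors per location. The stock torchvision Mask R-CNN uses one size per level (three anchors per location), so applying these settings to a COCO-pretrained model would also require an RPN head built for six anchors.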


## 3. Load Data


In [10]:
# Create datasets
train_dataset, val_dataset = create_datasets(
    data_root=HYPERPARAMETERS["data_root"],
    image_size=HYPERPARAMETERS["image_size"],
    subset_fraction=1.0,  # Use full dataset
)

print(f"Train samples: {len(train_dataset)}")
print(f"Val samples: {len(val_dataset)}")

Loading datasets...

DATASET STATISTICS: TRAIN

Image Counts:
   Original images:        28029
   Final images:           26431

Rejected Images (1598 total):
   - Too many boxes (>400): 230
   - Empty image excess:       1368

Box Distribution (final dataset):
   Empty images (0 boxes):  7929 (30.0%)
   Non-empty images:        18502

Box Count Statistics:
   Min:    0
   Max:    400
   Mean:   21.2
   Median: 3.0
   Std:    47.2

   Percentiles:
     25th: 0
     50th: 3
     75th: 18
     90th: 58
     95th: 106
     99th: 262


DATASET STATISTICS: VAL

Image Counts:
   Original images:        9512
   Final images:           8551

Rejected Images (961 total):
   - Too many boxes (>400): 62
   - Empty image excess:       899

Box Distribution (final dataset):
   Empty images (0 boxes):  2565 (30.0%)
   Non-empty images:        5986

Box Count Statistics:
   Min:    0
   Max:    400
   Mean:   22.0
   Median: 3.0
   Std:    48.0

   Percentiles:
     25th: 0
     50th: 3
     75th: 20
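
The rejection counts above are consistent with a rule that caps empty patches at 30% of the final split. The dataset code is not shown in this notebook, so the sketch below is an inferred reconstruction of that arithmetic, and the function name `cap_empty_images` is hypothetical.

```python
def cap_empty_images(num_non_empty, num_empty, max_empty_fraction=0.30):
    """Return (kept_empty, dropped_empty) so that empty images make up
    at most max_empty_fraction of the final dataset."""
    # If E empties are kept alongside N non-empty images, E / (E + N) <= f
    # rearranges to E <= N * f / (1 - f).
    max_keep = int(num_non_empty * max_empty_fraction / (1.0 - max_empty_fraction))
    kept = min(num_empty, max_keep)
    return kept, num_empty - kept


# Train split numbers taken from the statistics above
original, too_many_boxes, non_empty = 28029, 230, 18502
empty_before = original - too_many_boxes - non_empty  # empties before capping

kept, dropped = cap_empty_images(non_empty, empty_before)
print(kept, dropped)  # 7929 1368
print(f"Final size: {non_empty + kept}")  # Final size: 26431
```

The same formula reproduces the validation split: 5986 non-empty images allow 2565 empty ones, leaving 899 rejected as excess.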

## 4. Create Model with Pretrained ResNet Backbone

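We start from torchvision's pretrained Mask R-CNN models and replace the box and mask prediction heads to match our number of classes (16, including background). torchvision ships no COCO checkpoint for ResNet-101, so that variant is assembled from an ImageNet-pretrained backbone and its detection heads are trained from scratch.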


In [11]:
def create_maskrcnn_resnet(num_classes, backbone_type="resnet50", pretrained_coco=True):
    """
    Create Mask R-CNN with pretrained ResNet backbone.

    Args:
        num_classes: Number of classes (including background)
        backbone_type: "resnet50", "resnet50_v2", or "resnet101"
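        pretrained_coco: Whether to load COCO-pretrained weights (resnet50 variants).
            When False, resnet50 variants still use an ImageNet-pretrained backbone.
            For resnet101 this flag toggles ImageNet backbone weights instead.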

    Returns:
        Mask R-CNN model
    """
    if backbone_type == "resnet50":
        # ResNet-50 FPN (original)
        if pretrained_coco:
            weights = MaskRCNN_ResNet50_FPN_Weights.COCO_V1
            model = maskrcnn_resnet50_fpn(weights=weights)
        else:
            model = maskrcnn_resnet50_fpn(
                weights=None, weights_backbone="IMAGENET1K_V1"
            )

    elif backbone_type == "resnet50_v2":
        # ResNet-50 FPN V2 (improved, better performance)
        if pretrained_coco:
            weights = MaskRCNN_ResNet50_FPN_V2_Weights.COCO_V1
            model = maskrcnn_resnet50_fpn_v2(weights=weights)
        else:
            model = maskrcnn_resnet50_fpn_v2(
                weights=None, weights_backbone="IMAGENET1K_V1"
            )

    elif backbone_type == "resnet101":
        # ResNet-101 FPN: torchvision has no ready-made Mask R-CNN for it, so build
        # the backbone with resnet_fpn_backbone and wrap it in MaskRCNN
        from torchvision.models.detection.backbone_utils import resnet_fpn_backbone
        from torchvision.models.detection import MaskRCNN
        from torchvision.models import ResNet101_Weights

        # Create ResNet-101 FPN backbone.
        # No COCO checkpoint exists for this variant, so the pretrained_coco flag
        # only toggles ImageNet weights for the backbone here.
        backbone = resnet_fpn_backbone(
            backbone_name="resnet101",
            weights=ResNet101_Weights.IMAGENET1K_V1 if pretrained_coco else None,
            trainable_layers=5,  # Train all backbone layers
        )

        # Create Mask R-CNN with ResNet-101 backbone
        model = MaskRCNN(
            backbone,
            num_classes=num_classes,
        )
        return model  # Already has correct num_classes

    else:
        raise ValueError(
            f"Unknown backbone type: {backbone_type}. Use 'resnet50', 'resnet50_v2', or 'resnet101'"
        )

    # For resnet50/resnet50_v2: Replace the pre-trained head with a new one for our num_classes
    # Get the number of input features for the classifier
    in_features = model.roi_heads.box_predictor.cls_score.in_features

    # Replace the box predictor
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

    # Replace the mask predictor
    in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
    hidden_layer = 256
    model.roi_heads.mask_predictor = MaskRCNNPredictor(
        in_features_mask, hidden_layer, num_classes
    )

    return model


# Create model
model = create_maskrcnn_resnet(
    num_classes=HYPERPARAMETERS["num_classes"],
    backbone_type=HYPERPARAMETERS["backbone"],
    pretrained_coco=HYPERPARAMETERS["pretrained_coco"],
)

# Print model summary
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Backbone: {HYPERPARAMETERS['backbone']}")
print(f"Total parameters: {total_params:,}")
print(f"Trainable parameters: {trainable_params:,}")
print(f"Model size: {total_params * 4 / 1e6:.1f} MB (FP32)")

Downloading: "https://download.pytorch.org/models/maskrcnn_resnet50_fpn_coco-bf2d0c1e.pth" to /root/.cache/torch/hub/checkpoints/maskrcnn_resnet50_fpn_coco-bf2d0c1e.pth


100%|██████████| 170M/170M [00:00<00:00, 247MB/s] 


Backbone: resnet50
Total parameters: 43,997,743
Trainable parameters: 43,775,343
Model size: 176.0 MB (FP32)


## 5. Create Trainer with W&B Integration


In [12]:
# Create trainer with W&B integration
# The trainer handles all logging automatically!
trainer = Trainer(
    train_dataset=train_dataset,
    val_dataset=val_dataset,
    model=model,
    batch_size=HYPERPARAMETERS["batch_size"],
    val_batch_size=HYPERPARAMETERS["val_batch_size"],
    lr=HYPERPARAMETERS["learning_rate"],
    device=device,
    use_amp=True,
    num_workers=4,
    # W&B configuration
    wandb_project=HYPERPARAMETERS["wandb_project"],
    wandb_entity=HYPERPARAMETERS["wandb_entity"],
    wandb_tags=[
        "maskrcnn",
        HYPERPARAMETERS["backbone"],
        "pretrained",
        "trainer-integrated",
    ],
    wandb_notes=f"Training with {HYPERPARAMETERS['backbone']} backbone (COCO pretrained) + FPN",
    wandb_log_freq=HYPERPARAMETERS["wandb_log_freq"],
    wandb_num_val_images=HYPERPARAMETERS["wandb_num_val_images"],
    wandb_conf_threshold=HYPERPARAMETERS["wandb_conf_threshold"],
    hyperparameters=HYPERPARAMETERS,
)

print(f"\nW&B Run: {trainer.wandb_logger.run.name}")
print(f"URL: {trainer.wandb_logger.run.url}")

Using provided datasets: 26431 train, 8551 val samples
Using provided model
Optimizer parameter groups:
  Base params: 64 tensors, lr=3.00e-04
  RoI params:  20 tensors, lr=7.50e-05 (alpha=0.25)
Device: cuda
AMP enabled: True
Train samples: 26431
Val samples: 8551


W&B run initialized: ethereal-wildflower-1
View at: https://wandb.ai/marek-olnk-put-pozna-/isaid-resnet50-segmentation/runs/vhozj3u4
Selected 4 validation images for visualization
W&B logging enabled: https://wandb.ai/marek-olnk-put-pozna-/isaid-resnet50-segmentation/runs/vhozj3u4

W&B Run: ethereal-wildflower-1
URL: https://wandb.ai/marek-olnk-put-pozna-/isaid-resnet50-segmentation/runs/vhozj3u4
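
The trainer log above shows two optimizer parameter groups, with the RoI parameters trained at a quarter of the base learning rate (alpha=0.25). The trainer's grouping code is not shown here, so the following is only a sketch of how such groups could be built. The function name `build_param_groups` and the `"roi_heads."` name prefix are assumptions.

```python
def build_param_groups(named_params, base_lr, roi_alpha=0.25, roi_prefix="roi_heads."):
    """Split parameters into base and RoI groups with separate learning rates.

    named_params: iterable of (name, parameter) pairs, e.g. model.named_parameters().
    The returned list follows the parameter-group format torch optimizers accept.
    """
    base, roi = [], []
    for name, param in named_params:
        (roi if name.startswith(roi_prefix) else base).append(param)
    return [
        {"params": base, "lr": base_lr},
        {"params": roi, "lr": base_lr * roi_alpha},
    ]


# Stand-in (name, parameter) pairs; with a real model, pass model.named_parameters()
demo_params = [
    ("backbone.body.conv1.weight", "w0"),
    ("rpn.head.conv.weight", "w1"),
    ("roi_heads.box_predictor.cls_score.weight", "w2"),
]

groups = build_param_groups(demo_params, base_lr=3e-4)
for group in groups:
    print(f"{len(group['params'])} tensors, lr={group['lr']:.2e}")
```

A lower learning rate on the RoI heads is one way to keep freshly initialized prediction layers from destabilizing training while the rest of the network adapts.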


## 6. Training

The `Trainer.fit()` method handles everything:

- Training loop with gradient clipping and AMP
- Validation loss computation
- mAP and mean IoU metrics
- W&B logging (losses, gradients, predictions, checkpoints)
- Learning rate scheduling
- Best model saving
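
For the learning rate scheduling step, this notebook's training log reports a ReduceLROnPlateau scheduler that steps on validation mAP. The sketch below re-implements the plateau rule in plain Python for a metric where higher is better. It is an illustration only: in practice `torch.optim.lr_scheduler.ReduceLROnPlateau` with `mode="max"` does this, and the `factor` and `patience` values here are hypothetical rather than the trainer's settings.

```python
class PlateauTracker:
    """Tiny stand-in for reduce-on-plateau logic in 'max' mode."""

    def __init__(self, lr, factor=0.5, patience=1, threshold=1e-4):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.threshold = threshold
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, metric):
        # Relative threshold: the metric must beat the best value by a small margin
        if metric > self.best * (1.0 + self.threshold):
            self.best = metric
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1

        # Cut the learning rate once the metric has stalled for more than `patience` epochs
        if self.bad_epochs > self.patience:
            self.lr *= self.factor
            self.bad_epochs = 0
        return self.lr


tracker = PlateauTracker(lr=3e-4)
val_map_per_epoch = [0.40, 0.42, 0.41, 0.41, 0.43]  # made-up values for illustration
lrs = [tracker.step(m) for m in val_map_per_epoch]
for epoch, (m, lr) in enumerate(zip(val_map_per_epoch, lrs), start=1):
    print(f"epoch {epoch}: val mAP={m:.2f} -> lr={lr:.2e}")
```

In this toy run the learning rate is halved after epoch 4, the second consecutive epoch without improvement over the best mAP of 0.42.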


In [None]:
# Run training!
# All W&B logging happens automatically inside trainer.fit()
history = trainer.fit(
    epochs=HYPERPARAMETERS["num_epochs"],
    save_dir="checkpoints",
    compute_metrics_every=1,  # Compute mAP every epoch
    max_map_samples=200,  # Limit samples for faster mAP computation
)

print("\nTraining complete!")

Using ReduceLROnPlateau scheduler (steps on validation mAP)

Epoch 1/20 | LR: 3.00e-04


Train Epoch 1:   0%|          | 0/3303 [00:00<?, ?it/s]

Validation:   0%|          | 0/1069 [00:00<?, ?it/s]

Computing mAP metrics...

Epoch 1 Results (Time: 1380.3s):
  Losses:
    Train: 0.9477
    Val:   0.7755
  Performance Metrics:
    Train mAP@0.5: 0.4175
    Val mAP@0.5:   0.4154 (primary metric)
    Val Mean IoU:  0.4628
  Training Dynamics:
    Gradient Norm: nan
    Loss Variance: 0.160097
    mAP Gap (train-val): +0.0022
  Detailed Train Losses:
    loss_classifier: 0.2122
    loss_box_reg: 0.1920
    loss_mask: 0.3018
    loss_objectness: 0.1352
    loss_rpn_box_reg: 0.1065
-> New best model saved (by loss)
Model checkpoint logged as artifact: isaid-model-best-val-loss
-> New best val mAP@0.5: 0.4154
Model checkpoint logged as artifact: isaid-model-best-train-map
-> New best train mAP@0.5: 0.4175

Epoch 2/20 | LR: 3.00e-04


Train Epoch 2:   0%|          | 0/3303 [00:00<?, ?it/s]
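
The epoch 1 summary above reports `Gradient Norm: nan`. One plausible cause, not confirmed by this notebook, is that mixed-precision training occasionally produces inf/nan gradients (steps that `GradScaler` skips), and a plain mean over per-step norms then becomes nan. The sketch below shows an average that ignores non-finite values. The helper name `mean_finite` is hypothetical and is not part of the trainer.

```python
import math


def mean_finite(values):
    """Average only the finite entries and report how many were skipped."""
    finite = [v for v in values if math.isfinite(v)]
    skipped = len(values) - len(finite)
    mean = sum(finite) / len(finite) if finite else float("nan")
    return mean, skipped


# Made-up per-step gradient norms, including two overflowed AMP steps
step_norms = [2.0, 4.0, float("inf"), 3.0, float("nan")]

naive_mean = sum(step_norms) / len(step_norms)
robust_mean, skipped = mean_finite(step_norms)

print(f"Naive mean:  {naive_mean}")
print(f"Finite mean: {robust_mean} (skipped {skipped} steps)")
```

Logging the number of skipped steps alongside the mean keeps the metric usable while still exposing how often overflow occurs.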

In [None]:
# Create artifact for the final trained model
artifact = wandb.Artifact(
    name=f"isaid-maskrcnn-{HYPERPARAMETERS['backbone']}-final",
    type="model",
    description=f"Final trained Mask R-CNN ({HYPERPARAMETERS['backbone']}) after {HYPERPARAMETERS['num_epochs']} epochs",
    metadata={
        "backbone": HYPERPARAMETERS["backbone"],
        "num_classes": HYPERPARAMETERS["num_classes"],
        "pretrained_coco": HYPERPARAMETERS["pretrained_coco"],
        "final_train_loss": history["train/loss"][-1],
        "final_val_loss": history["val/loss"][-1],
        "final_val_mAP": history["val/mAP@0.5"][-1],
        "best_val_mAP": max(history["val/mAP@0.5"]),
    }
)

# Add model checkpoint files
artifact.add_file("checkpoints/best.pth", name="best_model.pth")
artifact.add_file("checkpoints/best_map.pth", name="best_map_model.pth")
artifact.add_file("checkpoints/last.pth", name="last_model.pth")

# Log the artifact
trainer.wandb_logger.run.log_artifact(artifact)

print(f"Model artifacts saved to W&B!")
print(f"  - best_model.pth (lowest val loss)")
print(f"  - best_map_model.pth (highest val mAP)")
print(f"  - last_model.pth (final epoch)")

## 7. Visualize Results


In [None]:
# Plot training history
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Loss curves
ax = axes[0, 0]
ax.plot(history["train/loss"], label="Train Loss")
ax.plot(history["val/loss"], label="Val Loss")
ax.set_xlabel("Epoch")
ax.set_ylabel("Loss")
ax.set_title("Training & Validation Loss")
ax.legend()
ax.grid(True, alpha=0.3)

# mAP curves
ax = axes[0, 1]
ax.plot(history["train/mAP@0.5"], label="Train mAP@0.5")
ax.plot(history["val/mAP@0.5"], label="Val mAP@0.5")
ax.set_xlabel("Epoch")
ax.set_ylabel("mAP@0.5")
ax.set_title("mAP Performance")
ax.legend()
ax.grid(True, alpha=0.3)

# Learning rate
ax = axes[1, 0]
ax.plot(history["train/lr"])
ax.set_xlabel("Epoch")
ax.set_ylabel("Learning Rate")
ax.set_title("Learning Rate Schedule")
ax.set_yscale("log")
ax.grid(True, alpha=0.3)

# Gradient norm
ax = axes[1, 1]
ax.plot(history["train/grad_norm"])
ax.set_xlabel("Epoch")
ax.set_ylabel("Gradient Norm")
ax.set_title("Training Gradient Norm")
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 8. Visualize Predictions


In [None]:
# Visualize predictions on validation set
trainer.visualize_predictions(
    num_samples=5,
    score_threshold=0.5,
    mask_alpha=0.4,
)

## 9. Finish W&B Run


In [None]:
# Capture the run URL first, then finish the W&B run
run_url = trainer.wandb_logger.run.url
trainer.finish()

print("\nW&B run completed!")
print(f"View results at: {run_url}")

## 10. Load Model from W&B Artifact (Optional)


In [None]:
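# Example: load the final model from the W&B artifact logged in section 6
# Uncomment to use

# import wandb
# api = wandb.Api()
# # Replace YOUR_ENTITY; the project and artifact names match the ones used in this notebook
# artifact = api.artifact(
#     "YOUR_ENTITY/isaid-resnet50-segmentation/isaid-maskrcnn-resnet50-final:latest"
# )
# artifact_dir = artifact.download()
#
# # Recreate the same architecture; COCO weights are not needed because the checkpoint overwrites them
# model = create_maskrcnn_resnet(num_classes=16, backbone_type="resnet50", pretrained_coco=False)
# # Assumes the file holds a plain state_dict. If the trainer saves a dict with extra fields,
# # index into it first (for example a hypothetical "model_state_dict" key).
# state = torch.load(f"{artifact_dir}/best_model.pth", map_location=device)
# model.load_state_dict(state)
# model.eval()
# print("Model loaded from W&B artifact!")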