# Mamba UAV Detector - Incremental Training Notebook

This notebook implements an **incremental training pipeline** for the Mamba-based UAV detector.

## Storage-Aware Workflow
Due to storage constraints (~10GB available), we train one UAV type at a time:
```
Download UAV-A ‚Üí Train ‚Üí Save checkpoint ‚Üí Delete UAV-A
Download UAV-B ‚Üí Load checkpoint ‚Üí Continue training ‚Üí Delete UAV-B
... repeat for all 12 UAV types ...
```

## Table of Contents
1. [Setup & Imports](#1-setup)
2. [Download Utilities](#2-download)
3. [Configuration](#3-config)
4. [Model Setup](#4-model)
5. [Training Loop (per UAV part)](#5-training)
6. [Cleanup](#6-cleanup)
7. [Evaluation](#7-evaluation)
8. [Export Model](#8-export)

<a id='1-setup'></a>
## 1. Setup & Imports

In [3]:
import os
import sys
import warnings
from pathlib import Path

warnings.filterwarnings('ignore')

print("1") # do not remove prints (–∫–æ—Å—Ç–∏–ª—å)
# Add project root to path
PROJECT_ROOT = Path(os.getcwd()).parent
if str(PROJECT_ROOT) not in sys.path:
    sys.path.insert(0, str(PROJECT_ROOT))

# Core imports
import torch
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint, EarlyStopping

print("2")

# Local imports
from mamba.config import Config
from mamba.trainer import MambaDetectorModule
from mamba.dataset import (
    download_uav_part,
    cleanup_uav_part,
    get_available_uav_types,
    create_dataloaders,
    DATASET_URLS,
)

print("3")

# Device detection
if torch.cuda.is_available():
    DEVICE = 'cuda'
    os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'expandable_segments:True'
    print(f"üöÄ GPU: {torch.cuda.get_device_name(0)}")
elif torch.backends.mps.is_available():
    DEVICE = 'mps'
    print("üçé Using Apple MPS (with LSTM fallback for Mamba)")
else:
    DEVICE = 'cpu'
    print("üíª Using CPU")

print(f"\nüì¶ Available UAV types for download: {list(DATASET_URLS.keys())}")

1
2


ModuleNotFoundError: No module named 'xmltodict'

<a id='2-download'></a>
## 2. Download Utilities

Functions to download, extract, and cleanup UAV parts one at a time.

In [10]:
# Check what's currently downloaded
DATA_ROOT = PROJECT_ROOT / "data" / "MMFW-UAV" / "raw"
available = get_available_uav_types(str(DATA_ROOT))
print(f"üìÅ Currently downloaded UAV types: {available if available else 'None'}")

# Estimate storage per UAV type (~1-2GB each)
print("\nüíæ Estimated storage per UAV type:")
print("   - UAV-A to UAV-F: ~10GB each")
print("   - Total dataset: ~100GB")
print("   - With incremental training, you only need ~10GB at a time")

üìÅ Currently downloaded UAV types: ['A']

üíæ Estimated storage per UAV type:
   - UAV-A to UAV-F: ~10GB each
   - Total dataset: ~100GB
   - With incremental training, you only need ~10GB at a time


<a id='3-config'></a>
## 3. Configuration

Training hyperparameters. Adjust based on your hardware.

In [None]:
# Training configuration
config = Config()

# Data settings
config.data.data_root = str(DATA_ROOT)
config.data.batch_size = 2  # Reduce if OOM
config.data.num_workers = 2  # 0 for debugging
config.data.sequence_length = 8
config.data.stride = 5
config.data.img_size = 640
config.data.sensor_type = "Zoom"  # Zoom, Wide, or Infrared
config.data.view = "Top_Down"  # Top_Down, Horizontal, or Bottom_Up

# Model settings
config.model.mamba_type = "vision"
config.model.backbone = "mobilevit_s"
config.model.d_model = 256
config.model.d_state = 16
config.model.mamba_layers = 4

# Training settings
config.training.max_epochs = 10  # Epochs per UAV part
config.training.lr = 1e-3
config.training.weight_decay = 1e-4

# Checkpoint path
CHECKPOINT_DIR = PROJECT_ROOT / "outputs" / "checkpoints"
CHECKPOINT_DIR.mkdir(parents=True, exist_ok=True)
MOD_ARCH = "vis_mamba_simple_head"
LATEST_CHECKPOINT = CHECKPOINT_DIR / f"{MOD_ARCH}_latest.ckpt"

print(f"‚úÖ Configuration loaded")
print(f"   Checkpoint dir: {CHECKPOINT_DIR}")

‚úÖ Configuration loaded
   Checkpoint dir: /teamspace/studios/this_studio/Hot-Peppers-Company-Computer-Vision/outputs/checkpoints


<a id='4-model'></a>
## 4. Model Setup

Initialize model, optionally loading from checkpoint for continued training.

In [12]:
# Initialize or load model
if LATEST_CHECKPOINT.exists():
    print(f"üìÇ Loading checkpoint: {LATEST_CHECKPOINT}")
    model = MambaDetectorModule.load_from_checkpoint(
        str(LATEST_CHECKPOINT),
        config=config,
    )
    print("‚úÖ Model loaded from checkpoint")
else:
    print("üÜï Creating new model")
    model = MambaDetectorModule(config)
    print("‚úÖ New model created")

# Print model summary
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"\nüìä Model stats:")
print(f"   Total parameters: {total_params:,}")
print(f"   Trainable parameters: {trainable_params:,}")

üÜï Creating new model


‚úÖ New model created

üìä Model stats:
   Total parameters: 6,894,245
   Trainable parameters: 6,894,245


<a id='5-training'></a>
## 5. Training Loop (per UAV part)

**‚ö†Ô∏è Run this cell for each UAV part you want to train on:**
1. Change `CURRENT_UAV` to the UAV type you want to download and train
2. Run the cell - it will download, train, and save checkpoint
3. Optionally run the cleanup cell to delete downloaded data
4. Repeat with the next UAV type

In [13]:
# ===== CHANGE THIS FOR EACH PART =====
CURRENT_UAV = "A"  # Options: "A", "B", "C", "D", "E", "F"
# =====================================

print(f"\n{'='*60}")
print(f"üéØ TRAINING ON UAV-{CURRENT_UAV}")
print(f"{'='*60}\n")

# Step 1: Download this UAV part
print("üì• Step 1: Downloading data...")
try:
    uav_path = download_uav_part(CURRENT_UAV, output_dir=str(DATA_ROOT))
except Exception as e:
    print(f"‚ùå Download failed: {e}")
    raise


üéØ TRAINING ON UAV-A

üì• Step 1: Downloading data...
‚úÖ UAV-A already exists at: /teamspace/studios/this_studio/Hot-Peppers-Company-Computer-Vision/data/MMFW-UAV/raw/Fixed-wing-UAV-A
   Skipping download. Delete folder to re-download.


In [16]:
import gc

print(f"old GPU memory allocated: {torch.cuda.memory_allocated(0)/1024**3:.2f} GB")
print(f"old GPU memory reserved: {torch.cuda.memory_reserved(0)/1024**3:.2f} GB")

# Add before trainer.fit()
torch.cuda.empty_cache()
gc.collect()

# Also check what's using memory
print(f"new GPU memory allocated: {torch.cuda.memory_allocated(0)/1024**3:.2f} GB")
print(f"new GPU memory reserved: {torch.cuda.memory_reserved(0)/1024**3:.2f} GB")
print(f"new GPU memory stats: {torch.cuda.memory_summary(0)}")

old GPU memory allocated: 21.95 GB
old GPU memory reserved: 22.02 GB
new GPU memory allocated: 0.00 GB
new GPU memory reserved: 22.02 GB
|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 2            |        cudaMalloc retries: 2         |
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |      0 B   |  22511 MiB |  68792 MiB |  68792 MiB |
|       from large pool |      0 B   |  22458 MiB |  68641 MiB |  68641 MiB |
|       from small pool |      0 B   |     53 MiB |    150 MiB |    150 MiB |
|---------------------------------------------------------------------------|
| Active memory         |      0 B   |  22511 MiB |  68792 MiB |  68792 MiB |
|       from large pool |      0 B   |  22458 MiB |  68641 MiB |  68641 MiB |
|    

In [17]:
# ===== CHANGE THIS FOR EACH PART =====
EPOCHS_THIS_PART = 31  # Epochs to train on this part
# =====================================

print(f"\n{'='*60}")
print(f"üéØ TRAINING ON UAV-{CURRENT_UAV}")
print(f"{'='*60}\n")

from mamba.dataset import create_dataloaders

# Step 2: Create dataloaders for this UAV part
print(f"\nüìÅ Step 2: Creating dataloaders for UAV-{CURRENT_UAV}...")
train_loader, val_loader, test_loader = create_dataloaders(
    data_root=str(DATA_ROOT),
    batch_size=config.data.batch_size,
    num_workers=config.data.num_workers,
    sequence_length=config.data.sequence_length,
    stride=config.data.stride,
    img_size=config.data.img_size,
    sensor_type=config.data.sensor_type,
    view=config.data.view,
    uav_types=[CURRENT_UAV],  # Only train on this UAV part!
)
print(f"   Train batches: {len(train_loader)}")
print(f"   Val batches: {len(val_loader)}")

# Step 3: Setup trainer
print(f"\nüèãÔ∏è Step 3: Setting up trainer...")
checkpoint_callback = ModelCheckpoint(
    dirpath=str(CHECKPOINT_DIR),
    filename=f"uav-{CURRENT_UAV}-" + "{epoch:02d}-{val_loss:.4f}",
    save_top_k=1,
    monitor="val_loss",
    mode="min",
    save_last=True,
)

early_stopping = EarlyStopping(
    monitor="val_loss",
    patience=5,
    mode="min",
)

trainer = pl.Trainer(
    max_epochs=EPOCHS_THIS_PART,
    accelerator="auto",
    devices=1,
    precision="16-mixed" if DEVICE == "cuda" else 32, # 16-mixed does not support sigmoid, switched to logits
    callbacks=[checkpoint_callback, early_stopping],
    enable_progress_bar=True,
    gradient_clip_val=1.0,
    log_every_n_steps=10,
)

# Step 4: Train!
print(f"\nüöÄ Step 4: Training for {EPOCHS_THIS_PART} epochs...")
trainer.fit(model, train_loader, val_loader)

# Step 5: Save checkpoint for next part
print(f"\nüíæ Step 5: Saving checkpoint...")
trainer.save_checkpoint(str(LATEST_CHECKPOINT))
print(f"‚úÖ Saved to: {LATEST_CHECKPOINT}")

print(f"\n{'='*60}")
print(f"‚úÖ FINISHED TRAINING ON UAV-{CURRENT_UAV}")
print(f"   Best val_loss: {checkpoint_callback.best_model_score:.4f}")
print(f"{'='*60}")

Using 16bit Automatic Mixed Precision (AMP)



üéØ TRAINING ON UAV-A


üìÅ Step 2: Creating dataloaders for UAV-A...
grabbed frames,  1451 0_1_000001.jpg
grabbed frames,  1451 0_1_000001.jpg
grabbed frames,  1451 0_1_000001.jpg
   Train batches: 101
   Val batches: 22

üèãÔ∏è Step 3: Setting up trainer...


GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
üí° Tip: For seamless cloud logging and experiment tracking, try installing [litlogger](https://pypi.org/project/litlogger/) to enable LitLogger, which logs metrics and artifacts automatically to the Lightning Experiments platform.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]



üöÄ Step 4: Training for 31 epochs...



  | Name      | Type             | Params | Mode | FLOPs
--------------------------------------------------------------
0 | model     | MambaUAVDetector | 6.9 M  | eval | 0    
1 | criterion | DetectionLoss    | 0      | eval | 0    
--------------------------------------------------------------
6.9 M     Trainable params
0         Non-trainable params
6.9 M     Total params
27.577    Total estimated model params size (MB)
0         Modules in train mode
470       Modules in eval mode
0         Total Flops


Sanity Checking: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

`weights_only` was not set, defaulting to `False`.



üíæ Step 5: Saving checkpoint...
‚úÖ Saved to: /teamspace/studios/this_studio/Hot-Peppers-Company-Computer-Vision/outputs/checkpoints/van_mamba_simple_head_latest.ckpt

‚úÖ FINISHED TRAINING ON UAV-A
   Best val_loss: inf


<a id='6-cleanup'></a>
## 6. Cleanup (Optional)

Delete downloaded data to free up storage before downloading the next UAV part.

In [None]:
# Delete the UAV part we just trained on to free space
cleanup_uav_part(CURRENT_UAV, data_dir=str(DATA_ROOT))

# Verify deletion
available = get_available_uav_types(str(DATA_ROOT))
print(f"\nüìÅ Currently downloaded UAV types: {available if available else 'None'}")

### üìã Training Progress Tracker

Keep track of which UAV parts you've trained on:

| UAV | Status | Epochs | Notes |
|-----|--------|--------|-------|
| A   | ‚¨ú Not started |  |  |
| B   | ‚¨ú Not started |  |  |
| C   | ‚¨ú Not started |  |  |
| D   | ‚¨ú Not started |  |  |
| E   | ‚¨ú Not started |  |  |
| F   | ‚¨ú Not started |  |  |

After each training run, update the status to ‚úÖ Completed.

<a id='7-evaluation'></a>
## 7. Evaluation

Evaluate the final model after training on all UAV parts.

In [None]:
# Load best model
if LATEST_CHECKPOINT.exists():
    print(f"üìÇ Loading final model from: {LATEST_CHECKPOINT}")
    model = MambaDetectorModule.load_from_checkpoint(
        str(LATEST_CHECKPOINT),
        config=config,
    )
    model.eval()
    print("‚úÖ Model loaded for evaluation")
else:
    print("‚ùå No checkpoint found. Train the model first.")

In [None]:
# Visualize predictions on a sample
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import numpy as np

# Make sure a UAV part is downloaded for testing
available = get_available_uav_types(str(DATA_ROOT))
if not available:
    print("‚ö†Ô∏è No UAV data available. Download a part first.")
else:
    # Create test loader with available data
    _, _, test_loader = create_dataloaders(
        data_root=str(DATA_ROOT),
        batch_size=1,
        num_workers=0,
        sequence_length=config.data.sequence_length,
        stride=config.data.stride,
        img_size=config.data.img_size,
        sensor_type=config.data.sensor_type,
        view=config.data.view,
        uav_types=available,
    )
    
    # Get a sample batch
    images, targets = next(iter(test_loader))
    
    # Run inference
    with torch.no_grad():
        predictions = model(images.to(DEVICE))
    
    # Visualize first frame of sequence
    fig, axes = plt.subplots(1, 3, figsize=(15, 5))
    
    # Denormalize for visualization
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    
    for i, ax in enumerate(axes[:min(3, images.shape[1])]):
        img = images[0, i].cpu().numpy().transpose(1, 2, 0)
        img = (img * std + mean).clip(0, 1)
        
        ax.imshow(img)
        ax.set_title(f"Frame {i+1}")
        
        # Draw predicted bbox
        pred = predictions[0, i].cpu().numpy()
        if pred[4] > 0.5:  # confidence threshold
            x_center, y_center, w, h = pred[:4]
            x_center *= config.data.img_size
            y_center *= config.data.img_size
            w *= config.data.img_size
            h *= config.data.img_size
            
            rect = patches.Rectangle(
                (x_center - w/2, y_center - h/2), w, h,
                linewidth=2, edgecolor='r', facecolor='none'
            )
            ax.add_patch(rect)
        
        ax.axis('off')
    
    plt.tight_layout()
    plt.savefig(str(PROJECT_ROOT / 'outputs' / 'sample_prediction.png'), dpi=150)
    plt.show()
    print(f"\nüì∏ Saved visualization to: outputs/sample_prediction.png")

<a id='8-export'></a>
## 8. Export Model

Export the trained model for deployment.

In [None]:
# Export to TorchScript
EXPORT_DIR = PROJECT_ROOT / "outputs" / "exported"
EXPORT_DIR.mkdir(parents=True, exist_ok=True)

if LATEST_CHECKPOINT.exists():
    model = MambaDetectorModule.load_from_checkpoint(
        str(LATEST_CHECKPOINT),
        config=config,
    )
    model.eval()
    
    # Create dummy input
    dummy_input = torch.randn(
        1, config.data.sequence_length, 3,
        config.data.img_size, config.data.img_size
    )
    
    # Export to TorchScript
    print("üì¶ Exporting to TorchScript...")
    try:
        scripted = torch.jit.trace(model.model, dummy_input)
        torchscript_path = EXPORT_DIR / "mamba_detector.pt"
        scripted.save(str(torchscript_path))
        print(f"‚úÖ Saved TorchScript model to: {torchscript_path}")
    except Exception as e:
        print(f"‚ö†Ô∏è TorchScript export failed: {e}")
        print("   This is expected with dynamic Mamba layers. Use checkpoint instead.")
    
    # Also save as PyTorch checkpoint (more reliable)
    torch_path = EXPORT_DIR / "mamba_detector_final.ckpt"
    torch.save({
        'model_state_dict': model.state_dict(),
        'config': config,
    }, torch_path)
    print(f"‚úÖ Saved PyTorch checkpoint to: {torch_path}")
else:
    print("‚ùå No checkpoint found. Train the model first.")

---

## üéâ Training Complete!

### Summary
- Model trained on all UAV parts using incremental download-train-delete workflow
- Final checkpoint saved to `outputs/checkpoints/latest.ckpt`
- Exported model saved to `outputs/exported/`

### Next Steps
1. **Deploy to Lightning AI**: Upload the exported model for cloud inference
2. **Hyperparameter Tuning**: Run `tune.py` for automated optimization
3. **Multi-GPU Training**: Use `train.py` with DDP for faster training