# 🎬 YOWO Training on Google Colab

**Model**: `yowo_v2_x3d_m_yolo11m_multitask`

| GPU | Batch Size | Accumulation | Effective Batch |
|-----|------------|--------------|-----------------|
| T4 (16GB) | 6 | 4 | 24 |
| V100 | 16 | 2 | 32 |
| A100 (40GB) | 32 | 2 | 64 |
| A100 (80GB) | 64 | 2 | 128 |
| Other (fallback) | 4 | 4 | 16 |

**Training**: AMP (automatic mixed precision) is enabled, for a roughly 1.5-2x speedup.
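
The effective batch size in the table is the per-step batch size multiplied by the number of gradient-accumulation steps. The sketch below only reproduces that arithmetic; `GPU_PRESETS` and `effective_batch` are illustrative names and are not part of the repository.

```python
# A few presets taken from the table above: (batch size, accumulation steps).
# GPU_PRESETS and effective_batch are hypothetical names used for illustration.
GPU_PRESETS = {
    "T4 (16GB)": (6, 4),
    "A100 (40GB)": (32, 2),
    "A100 (80GB)": (64, 2),
}


def effective_batch(batch_size: int, accumulate: int) -> int:
    """Number of samples contributing to each optimizer step."""
    if batch_size < 1 or accumulate < 1:
        raise ValueError("batch_size and accumulate must be positive")
    return batch_size * accumulate


for gpu, (bs, accu) in GPU_PRESETS.items():
    print(f"{gpu}: {bs} x {accu} = {effective_batch(bs, accu)}")
```

A smaller per-step batch with more accumulation steps reaches the same effective batch at lower peak memory, at the cost of more forward/backward passes per optimizer step.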


In [None]:
# Cell 1: Check GPU & Auto-Configure Batch Size
import torch
print("=" * 60)
print("🔍 GPU Detection & Configuration")
print("=" * 60)

if not torch.cuda.is_available():
    raise RuntimeError("❌ No GPU! Go to Runtime > Change runtime type > GPU")

gpu_name = torch.cuda.get_device_name(0)
gpu_memory_gb = torch.cuda.get_device_properties(0).total_memory / 1e9

print(f"✅ GPU: {gpu_name}")
print(f"✅ VRAM: {gpu_memory_gb:.1f} GB")

# Auto-configure batch size based on GPU. These values assume AMP is enabled (--amp in the training cell).
# AMP uses roughly 40% less memory (approximate, model-dependent), so larger batches fit.
if "A100" in gpu_name:
    if gpu_memory_gb > 45:  # A100 80GB
        BATCH_SIZE, ACCUMULATE = 64, 2  # Effective: 128
    else:  # A100 40GB
        BATCH_SIZE, ACCUMULATE = 32, 2  # Effective: 64
elif "T4" in gpu_name:
    BATCH_SIZE, ACCUMULATE = 6, 4  # Effective: 24
elif "V100" in gpu_name:
    BATCH_SIZE, ACCUMULATE = 16, 2  # Effective: 32
else:
    BATCH_SIZE, ACCUMULATE = 4, 4  # Safe default

print(f"\n📦 Auto-configured (with AMP): batch={BATCH_SIZE}, accum={ACCUMULATE}, effective={BATCH_SIZE*ACCUMULATE}")


In [None]:
# Cell 2: Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

import os
TAR_PATH = "/content/drive/MyDrive/yooowo/frames.tar"
if os.path.exists(TAR_PATH):
    size_gb = os.path.getsize(TAR_PATH) / 1e9
    print(f"✅ Found frames.tar ({size_gb:.2f} GB)")
else:
    print(f"❌ frames.tar not found at {TAR_PATH}")


In [None]:
# Cell 3: Clone Repository & Install Dependencies
%cd /content
!rm -rf yowo
!git clone https://github.com/michelsedgh/yowo.git
%cd yowo
!pip install -q torch torchvision opencv-python thop scipy matplotlib numpy imageio pytorchvideo ultralytics tensorboard
print("✅ Repository cloned and dependencies installed!")


In [None]:
# Cell 4: Extract Dataset (5-15 min depending on size)
import os, time

DATA_ROOT = "/content/yowo/data/ActionGenome"
FRAMES_DIR = os.path.join(DATA_ROOT, "frames")
TAR_PATH = "/content/drive/MyDrive/yooowo/frames.tar"

os.makedirs(DATA_ROOT, exist_ok=True)

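if os.path.exists(FRAMES_DIR) and len(os.listdir(FRAMES_DIR)) > 100:
    print(f"✅ Already extracted ({len(os.listdir(FRAMES_DIR))} videos)")
elif not os.path.exists(TAR_PATH):
    # Without this check, tar fails but the cell still reports success.
    print(f"❌ frames.tar not found at {TAR_PATH} (mount Google Drive first)")
else:
    print("📦 Extracting frames.tar to local SSD...")
    start = time.time()
    !tar -xf "{TAR_PATH}" -C "{DATA_ROOT}"
    print(f"✅ Done in {(time.time()-start)/60:.1f} min")
    if os.path.exists(FRAMES_DIR):
        print(f"✅ Extracted {len(os.listdir(FRAMES_DIR))} videos")
    else:
        print(f"⚠️ {FRAMES_DIR} not found after extraction; check the archive layout")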


In [None]:
# Cell 5: Verify Dataset Structure
print("📂 ActionGenome directory:")
!ls -la /content/yowo/data/ActionGenome/

print("\n📂 Annotations (from git repo):")
!ls /content/yowo/data/ActionGenome/annotations/

print("\n📂 Sample video directories:")
!ls /content/yowo/data/ActionGenome/frames/ | head -5

print("\n📊 Total frames:")
!find /content/yowo/data/ActionGenome/frames -name "*.jpg" -o -name "*.png" 2>/dev/null | wc -l

# Quick sanity check
import os
ann_path = "/content/yowo/data/ActionGenome/annotations"
frames_path = "/content/yowo/data/ActionGenome/frames"
if os.path.exists(os.path.join(ann_path, "person_bbox.pkl")) and os.path.exists(frames_path):
    print("\n✅ Dataset ready for training!")
else:
    print("\n⚠️ Missing annotations or frames!")


In [None]:
# Cell 6: NOTE - TensorBoard is NOT implemented in train.py
# Training progress is shown via console output every 10 iterations
# You'll see output like:
# [Epoch: 1/10][Iter: 100/5000][lr: 0.0001][losses: 12.45][loss_conf: 2.10][loss_cls: 8.20][loss_reg: 2.15][time: 0.85]
#
# What to watch:
# - losses: Total loss, should DECREASE from ~10-20 to ~3-5
# - loss_conf: Objectness/confidence loss
# - loss_cls: Classification loss (objects + actions + relations)
# - loss_reg: Bounding box regression loss
# - time: Seconds per 10 iterations

print("📊 Training will output progress to console every 10 iterations")
print("Watch for 'losses' to decrease over time!")


## 🚀 Training Commands

**Understanding Training Output (printed every 10 iterations):**
```
[Epoch: 1/10][Iter: 100/5000][lr: 0.0001][losses: 12.45][loss_conf: 2.10][loss_cls: 8.20][loss_reg: 2.15][time: 0.85]
```

| Metric | Meaning | Good Sign |
|--------|---------|-----------|
| `losses` | Total loss | **DECREASES** from ~15 → ~3-5 |
| `loss_conf` | Confidence/objectness | Decreases |
| `loss_cls` | Classification (obj+action+rel) | Decreases |
| `loss_reg` | Bounding box accuracy | Decreases |
| `time` | Seconds per 10 iterations | T4: ~0.5-1.0s, A100: ~0.1-0.3s |

**Note:** No mAP evaluation during training (evaluator not implemented for charades_ag).
Model checkpoints are saved after each epoch to `/content/yowo/weights/`.
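
Because training progress is only reported on the console, it can help to turn each log line into numbers, for example to plot `losses` after a run. The sketch below parses the sample line shown above. It assumes every field has the `[key: value]` shape of that sample; `parse_log_line` is a hypothetical helper, not something `train.py` provides.

```python
import re

# Matches one "[key: value]" field of the console line shown above.
FIELD_RE = re.compile(r"\[([^:\[\]]+):\s*([^\]]+)\]")


def parse_log_line(line: str) -> dict:
    """Return the fields of one training log line as a dict.

    Values written as "a/b" (Epoch, Iter) become (current, total) int tuples;
    every other value is converted to float.
    """
    fields = {}
    for key, value in FIELD_RE.findall(line):
        key, value = key.strip(), value.strip()
        if "/" in value:
            current, total = value.split("/")
            fields[key] = (int(current), int(total))
        else:
            fields[key] = float(value)
    return fields


sample = ("[Epoch: 1/10][Iter: 100/5000][lr: 0.0001][losses: 12.45]"
          "[loss_conf: 2.10][loss_cls: 8.20][loss_reg: 2.15][time: 0.85]")
parsed = parse_log_line(sample)
print(parsed["Epoch"], parsed["Iter"], parsed["losses"])
```

Collecting `parsed["losses"]` across lines gives a series to check against the "Good Sign" column, which is a workable substitute while TensorBoard logging is unavailable.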


In [None]:
# Cell 7: 🚀 TRAIN! (Main training cell)
# Batch size and accumulation are auto-configured from Cell 1
# AMP (Automatic Mixed Precision) enabled for ~1.5-2x faster training!

# Build command with auto-configured batch size + AMP
cmd = f"""python train.py \
    -d charades_ag \
    -v yowo_v2_x3d_m_yolo11m_multitask \
    --cuda \
    --amp \
    -bs {BATCH_SIZE} \
    -accu {ACCUMULATE} \
    --max_epoch 10 \
    --root /content/yowo/data \
    -K 16 \
    -lr 0.0001 \
    --num_workers 2 \
    --save_folder /content/yowo/weights"""

print(f"🚀 Training with AMP: batch={BATCH_SIZE}, accum={ACCUMULATE}, effective={BATCH_SIZE*ACCUMULATE}")
print(f"📊 Command:\n{cmd}\n")
print("=" * 60)

!{cmd}


In [None]:
# Cell 8: Save Weights to Google Drive (after training)
import shutil, os

DRIVE_SAVE_PATH = "/content/drive/MyDrive/yooowo/weights"
os.makedirs(DRIVE_SAVE_PATH, exist_ok=True)

weights_dir = "/content/yowo/weights/charades_ag/yowo_v2_x3d_m_yolo11m_multitask"
if os.path.exists(weights_dir):
    for w in os.listdir(weights_dir):
        if w.endswith('.pth'):
            shutil.copy2(os.path.join(weights_dir, w), os.path.join(DRIVE_SAVE_PATH, w))
            print(f"✅ Saved {w} to Drive")
else:
    print("⚠️ No weights found yet")


## 🧪 Optional: Quick 1-Epoch Test

Run this first to verify everything works before full training:


In [None]:
# Quick test - 1 epoch (uncomment to run)
# !python train.py -d charades_ag -v yowo_v2_x3d_m_yolo11m_multitask --cuda -bs 4 --max_epoch 1 --root /content/yowo/data -K 16 --num_workers 2


## 📈 Resume Training from Checkpoint


In [None]:
# Resume from checkpoint (uncomment and modify path)
# CHECKPOINT = "/content/yowo/weights/charades_ag/yowo_v2_x3d_m_yolo11m_multitask/yowo_v2_x3d_m_yolo11m_multitask_epoch_5.pth"
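# The auto-configured batch sizes assume AMP, so --amp is passed here as in the main training cell.
# !python train.py -d charades_ag -v yowo_v2_x3d_m_yolo11m_multitask --cuda --amp -bs {BATCH_SIZE} -accu {ACCUMULATE} --max_epoch 20 --root /content/yowo/data -K 16 -r {CHECKPOINT} --eval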


## 🔧 Troubleshooting

| Problem | Solution |
|---------|----------|
| OOM Error | Reduce `BATCH_SIZE` to 2, increase `ACCUMULATE` to 8 |
| Training slow | Increase batch size if GPU memory allows |
| Loss not decreasing | Try lr=0.0005 (higher) or lr=0.00005 (lower) |
| `loss is NAN !!` | Reduce learning rate to 0.00005 |
| Loss stuck high | Verify dataset extracted correctly, check annotations |
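
The OOM row above trades per-step batch size for accumulation steps. One way to apply it repeatedly is to halve the batch and double the accumulation each time memory runs out. This is a minimal sketch; `shrink_for_oom` is a hypothetical helper, and with odd batch sizes the effective batch is only approximately preserved.

```python
from typing import Tuple


def shrink_for_oom(batch_size: int, accumulate: int) -> Tuple[int, int]:
    """Halve the per-step batch and double the accumulation steps.

    The product stays the same for even batch sizes, so the effective
    batch is unchanged while peak GPU memory per step drops.
    """
    if batch_size <= 1:
        raise ValueError("batch size is already 1; cannot shrink further")
    return batch_size // 2, accumulate * 2


# Starting from the fallback configuration (batch 4, accumulation 4):
print(shrink_for_oom(4, 4))
```

From the fallback configuration this yields batch 2 with accumulation 8, the values suggested in the table.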

## 📁 Output Files

After training:
- **Weights**: `/content/yowo/weights/charades_ag/yowo_v2_x3d_m_yolo11m_multitask/`
- **Checkpoints**: `yowo_v2_x3d_m_yolo11m_multitask_epoch_N.pth`

**⚠️ IMPORTANT:** Run the "Save Weights to Google Drive" cell to copy the weights to Google Drive before the runtime disconnects!
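
When resuming, the most recent checkpoint has to be picked by its epoch number, since a plain alphabetical sort places `epoch_9` after `epoch_10`. The sketch below assumes the `..._epoch_N.pth` naming listed above; `latest_checkpoint` is a hypothetical helper, not part of the repository.

```python
import re
from typing import List, Optional

# Captures N from names ending in "_epoch_N.pth" (naming taken from the list above).
EPOCH_RE = re.compile(r"_epoch_(\d+)\.pth$")


def latest_checkpoint(filenames: List[str]) -> Optional[str]:
    """Return the filename with the highest epoch number, or None if none match."""
    best_name, best_epoch = None, -1
    for name in filenames:
        match = EPOCH_RE.search(name)
        if match and int(match.group(1)) > best_epoch:
            best_name, best_epoch = name, int(match.group(1))
    return best_name


names = [
    "yowo_v2_x3d_m_yolo11m_multitask_epoch_2.pth",
    "yowo_v2_x3d_m_yolo11m_multitask_epoch_10.pth",
    "yowo_v2_x3d_m_yolo11m_multitask_epoch_9.pth",
]
print(latest_checkpoint(names))
```

In the notebook, passing `os.listdir(...)` of the weights directory to this function would give the file to use as `CHECKPOINT` in the resume cell.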
