# 🏆 MABe Mouse Behavior Detection - GPU Training Setup

This notebook sets up the complete MABe competition solution with GPU-accelerated training.

**Features:**
- Dual-branch architecture (TCN + Transformer)
- Multi-scale temporal modeling
- Cross-agent attention
- GPU-optimized training
- Complete data pipeline

## 🚀 Quick Start
1. Enable GPU: Runtime → Change runtime type → GPU
2. Run the cells below in order
3. Upload the data, then start training with GPU acceleration

## 📋 Setup Instructions

1. **Enable GPU**: Go to Runtime → Change runtime type → Hardware accelerator → GPU
2. **Upload Data**: Upload MABe data files to `/content/MABe-data/`
3. **Run All Cells**: Execute this notebook step by step
4. **Train Model**: Start training with GPU acceleration


In [None]:
# @title 1. Environment Setup
import os
import sys
from pathlib import Path

print("🐭 Setting up MABe environment...")
print(f"Python: {sys.version}")
print(f"Working directory: {os.getcwd()}")

# Create solution directory
SOLUTION_DIR = "/content/mabe-solution"
os.makedirs(SOLUTION_DIR, exist_ok=True)
os.chdir(SOLUTION_DIR)

print(f"✅ Solution directory: {SOLUTION_DIR}")

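# Clone repository (replace with your GitHub URL)
GITHUB_REPO = "YOUR_USERNAME/MABe-mouse-behavior-detection"  # CHANGE THIS!

if os.path.exists(".git"):
    # Re-running this cell: git refuses to clone into a non-empty directory
    print("ℹ️ Repository already present, skipping clone")
else:
    print(f"📥 Cloning repository: https://github.com/{GITHUB_REPO}")
    !git clone https://github.com/{GITHUB_REPO}.git .

# Report success only if the clone actually produced a repository
if os.path.exists(".git"):
    print("✅ Repository cloned successfully!")
    print(f"Contents: {len(os.listdir('.'))} entries")
else:
    print("❌ Clone failed. Check GITHUB_REPO and try again.")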


In [None]:
# @title 2. Install Dependencies
print("📦 Installing dependencies...")

# Upgrade pip
!pip install --upgrade pip

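# Install PyTorch with CUDA 11.8 wheels
# (Colab usually ships a CUDA-enabled PyTorch already; this line can be skipped if the GPU check below passes)
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118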

# Install core ML dependencies
!pip install pytorch-lightning hydra-core wandb
!pip install pandas numpy scipy scikit-learn
!pip install pyarrow matplotlib seaborn tqdm joblib

# Install additional dependencies for the solution
!pip install torchmetrics

print("✅ Core dependencies installed")

# Verify GPU availability
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB")
else:
    print("❌ No GPU detected! Please enable GPU in Runtime settings.")


In [None]:
# @title 3. Setup Data Paths and Configuration
print("📁 Setting up data paths...")

# Update configuration for Colab environment
import yaml

# Update main config
config_path = "configs/config.yaml"
if os.path.exists(config_path):
    with open(config_path, 'r') as f:
        config = yaml.safe_load(f) or {}  # an empty file loads as None

    # Make sure the 'data' section exists before writing into it
    if not isinstance(config.get('data'), dict):
        config['data'] = {}

    # Update data paths for Colab (you'll need to upload the data)
    config['data_dir'] = "/content/MABe-data"  # Upload your data here
    config['data']['train_csv'] = "/content/MABe-data/train.csv"
    config['data']['train_tracking_dir'] = "/content/MABe-data/train_tracking"
    config['data']['train_annotation_dir'] = "/content/MABe-data/train_annotation"

    with open(config_path, 'w') as f:
        # sort_keys=False keeps the original key order (YAML comments are still lost)
        yaml.dump(config, f, default_flow_style=False, sort_keys=False)

    print("✅ Configuration updated for Colab")
else:
    print("❌ Main config file not found")

# Create data directory
!mkdir -p /content/MABe-data
print("✅ Data directory created: /content/MABe-data")
print("\n📤 Upload Instructions:")
print("1. Go to the Files tab (left sidebar)")
print("2. Upload these files to /content/MABe-data/:")
print("   - train.csv")
print("   - train_tracking/ (folder)")
print("   - train_annotation/ (folder)")
print("   - test.csv")
print("   - test_tracking/ (folder)")
print("\n💡 Alternative: Mount Google Drive and update paths in config")
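The upload instructions above list five entries. A minimal sketch of a layout check follows, assuming the names printed above are exactly the ones the repository expects; `EXPECTED_ENTRIES` and `find_missing_entries` are hypothetical helpers for illustration, not part of the repository.

```python
from pathlib import Path

# Entries this notebook asks you to upload (assumption: names match the repository's expectations)
EXPECTED_ENTRIES = {
    "train.csv": "file",
    "train_tracking": "dir",
    "train_annotation": "dir",
    "test.csv": "file",
    "test_tracking": "dir",
}


def find_missing_entries(data_root, expected=EXPECTED_ENTRIES):
    """Return the names under data_root that are absent or of the wrong kind."""
    root = Path(data_root)
    missing = []
    for name, kind in expected.items():
        path = root / name
        present = path.is_file() if kind == "file" else path.is_dir()
        if not present:
            missing.append(name)
    return missing


missing = find_missing_entries("/content/MABe-data")
print("All expected entries found" if not missing else f"Missing: {missing}")
```

Running this after the upload gives a quick list of what still needs to be copied over.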


In [None]:
# @title 4. Test Implementation
print("🧪 Testing implementation...")

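# Test architecture
print("Testing dual-branch architecture...")
!python test_architecture_simple.py
arch_ok = _exit_code == 0  # IPython stores the exit status of the last `!` command here

# Test data pipeline (without actual data)
# A failing `!` command does not raise a Python exception, so check its exit status instead
print("\nTesting data pipeline concepts...")
!python test_basic.py
data_ok = _exit_code == 0
if not data_ok:
    print("Note: Data tests require actual MABe data files")

if arch_ok and data_ok:
    print("✅ Implementation tests complete")
else:
    print("⚠️ Some tests failed; see the output above")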

# Show model architecture
print("\n🏗️ Model Architecture Summary:")
print("  - Dual-branch: TCN (local) + Transformer (global)")
print("  - Local branch: 4-layer dilated TCN, 33-frame receptive field")
print("  - Global branch: 6-layer transformer, 2048-frame context")
print("  - Cross-agent attention: Multi-mouse interaction modeling")
print("  - Temporal consistency: Loss to reduce prediction flickering")
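The receptive field quoted in the summary above depends on the kernel size and dilation schedule, which live in the repository and are not shown in this notebook. As a rough sketch, the standard formula for stacked stride-1 dilated convolutions can be evaluated directly. The kernel size of 3 and the doubling dilations below are assumptions for illustration, and `tcn_receptive_field` is a hypothetical helper.

```python
def tcn_receptive_field(kernel_size, dilations, convs_per_layer=1):
    """Receptive field (in frames) of stacked stride-1 dilated 1-D convolutions."""
    rf = 1
    for dilation in dilations:
        # Each convolution widens the field by (kernel_size - 1) * dilation frames
        rf += convs_per_layer * (kernel_size - 1) * dilation
    return rf


# Assumed schedule: 4 layers, kernel size 3, dilations doubling from 1
dilations = [2 ** i for i in range(4)]  # [1, 2, 4, 8]
print(tcn_receptive_field(3, dilations))                     # 31
print(tcn_receptive_field(3, dilations, convs_per_layer=2))  # 61
```

Under these assumptions the result is 31 or 61 frames rather than 33, so the repository presumably uses a different kernel size, dilation schedule, or an extra convolution. Plugging in the actual values from `models/dual_branch.py` would give the exact figure.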


In [None]:
# @title 5. Training Configuration
print("⚙️ Setting up training configuration...")

# Create optimized config for Colab
colab_config = """
defaults:
  - data: default
  - model: dual_branch
  - training: phase2
  - _self_  # apply the overrides in this file after the defaults above (Hydra 1.1+)

experiment_name: colab_gpu_training
seed: 42

# GPU-optimized settings
training:
  batch_size: 8  # Smaller for Colab GPU memory
  max_epochs: 20
  val_check_interval: 0.5
  accumulate_grad_batches: 4  # Effective batch size: 8 x 4 = 32
  mixed_precision: true
  gradient_checkpointing: true

# Model optimizations
model:
  global_branch:
    n_layers: 4  # Reduced for Colab
    d_model: 256
  local_branch:
    layers: 3    # Reduced for Colab
    hidden_dim: 512

# Data optimizations
data:
  window_sizes: [256, 512, 1024]
  overlap: 0.5
  positive_sampling_ratio: 0.3
  max_windows_per_video: 30  # Reduced for memory

wandb:
  enabled: false  # Disable for Colab unless you want to use it
"""

with open('configs/colab_config.yaml', 'w') as f:
    f.write(colab_config)

print("✅ Colab-optimized configuration created")
print("📋 Configuration features:")
print("  - GPU-optimized batch sizes and memory usage")
print("  - Mixed precision training for speed")
print("  - Gradient checkpointing for memory efficiency")
print("  - Reduced model size for Colab constraints")
print("\n💡 To train: python train.py --config-path configs --config-name colab_config")
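To get a feel for what `window_sizes` and `overlap` in the config above imply, here is a minimal sketch of sliding-window indexing. It assumes windows are laid out at a fixed stride of `window_size * (1 - overlap)` and that videos shorter than one window are handled elsewhere; the repository's pipeline may differ, and `window_starts` is a hypothetical helper.

```python
def window_starts(n_frames, window_size, overlap):
    """Start indices of fixed-length windows that overlap by the given fraction."""
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    stride = max(1, int(window_size * (1 - overlap)))
    if n_frames < window_size:
        return []  # shorter than one window; padding is left to the real pipeline
    return list(range(0, n_frames - window_size + 1, stride))


n_frames = 4096  # hypothetical video length in frames
for size in (256, 512, 1024):
    starts = window_starts(n_frames, size, overlap=0.5)
    print(f"window={size}: {len(starts)} windows, stride={size // 2}")
```

For this hypothetical 4096-frame video the smallest window size yields 31 windows, slightly above `max_windows_per_video: 30`, so some selection would be needed. How the repository applies that cap (truncation, random sampling, or positive-aware sampling) is not shown here.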


In [None]:
# @title 6. Start Training (Run this after uploading data)
print("🚀 Ready to start training!")

# Check that the training inputs referenced by the config are available
data_dir = "/content/MABe-data"
required_files = ["train.csv", "train_tracking", "train_annotation"]

data_ready = all(os.path.exists(f"{data_dir}/{f}") for f in required_files)

if data_ready:
    print("✅ Data files detected - ready for training!")
    print(f"📁 Data directory: {data_dir}")

    # Show training command
    print("\n🔥 Training Command (run this in a new cell):")
    print("  !python train.py --config-path configs --config-name colab_config")

    print("\n📊 Training Features:")
    print("  - GPU acceleration with mixed precision")
    print("  - Multi-scale windowing (256, 512, 1024 frames)")
    print("  - Dual-branch architecture (TCN + Transformer)")
    print("  - Positive-aware sampling for rare behaviors")
    print("  - Temporal consistency loss")
    print("  - Real-time validation and checkpointing")

else:
    print("❌ Data not ready yet")
    print("Please upload MABe data files first:")
    print("1. train.csv")
    print("2. train_tracking/ folder")
    print("3. train_annotation/ folder")
    print("4. test.csv")
    print("5. test_tracking/ folder")
    print("\n💡 Alternative: Mount Google Drive by running this in a new cell:")
    print("  from google.colab import drive")
    print("  drive.mount('/content/drive')")
    print("  # Then update config paths to use /content/drive/MyDrive/MABe-data/")
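The data check above looks in a single fixed directory. If the data may live either in `/content/MABe-data` or on a mounted Drive, one option is to probe a list of candidate locations. This is only a sketch: `resolve_data_dir` is a hypothetical helper, and the Drive path is the one suggested above, not something the repository defines.

```python
from pathlib import Path


def resolve_data_dir(candidates, marker="train.csv"):
    """Return the first candidate directory that contains `marker`, or None."""
    for candidate in candidates:
        path = Path(candidate)
        if (path / marker).is_file():
            return path
    return None


candidate_dirs = [
    "/content/MABe-data",                # uploaded through the Files tab
    "/content/drive/MyDrive/MABe-data",  # assumed location on a mounted Drive
]
found = resolve_data_dir(candidate_dirs)
print(f"Using data from: {found}" if found else "No data directory found yet")
```

The returned path could then be written into the config in place of the hard-coded `/content/MABe-data`.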


In [None]:
# @title 7. Setup Summary and Next Steps
print("🎉 MABe Solution Setup Complete!")
print("=" * 60)

import importlib.util

# Check all components
components = {
    "Repository": os.path.exists(".git"),
    # True only if the key packages can actually be found
    "Dependencies": all(
        importlib.util.find_spec(m) is not None
        for m in ("torch", "pytorch_lightning", "hydra")
    ),
    "GPU Support": torch.cuda.is_available() if 'torch' in globals() else False,
    "Configuration": os.path.exists("configs/config.yaml"),
    "Models": os.path.exists("models/dual_branch.py"),
    "Data Pipeline": os.path.exists("data/pipeline.py"),
    "Training Script": os.path.exists("train.py")
}

print("📋 Component Status:")
for component, status in components.items():
    status_icon = "✅" if status else "❌"
    print(f"  {status_icon} {component}")

print("\n🚀 Next Steps:")
print("1. Upload MABe data files to /content/MABe-data/")
print("2. Run training: !python train.py --config-path configs --config-name colab_config")
print("3. Monitor training progress with TensorBoard or Weights & Biases")
print("4. Submit predictions to Kaggle leaderboard")

print("\n💡 Tips for Success:")
print("  - Use smaller batch sizes if you run out of GPU memory")
print("  - Enable gradient checkpointing for larger models")
print("  - Monitor validation F1 scores for early stopping")
print("  - Save checkpoints regularly for long training runs")

print("\n🏆 You're ready to train a state-of-the-art MABe model with GPU acceleration!")
print("=" * 60)
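The tip about monitoring validation F1 for early stopping can be illustrated with a small patience-based rule. This is a generic sketch, not the repository's training loop: `EarlyStopper` is a hypothetical class and the F1 values are made up for the example. PyTorch Lightning also provides an `EarlyStopping` callback that serves the same purpose if the training script logs a validation metric.

```python
class EarlyStopper:
    """Signal a stop when the score has not improved for `patience` checks."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.bad_checks = 0

    def step(self, score):
        """Record a new validation score; return True when training should stop."""
        if score > self.best + self.min_delta:
            self.best = score
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience


stopper = EarlyStopper(patience=2)
val_f1_history = [0.41, 0.48, 0.47, 0.48, 0.46]  # made-up values for illustration
for check, f1 in enumerate(val_f1_history, start=1):
    if stopper.step(f1):
        print(f"Stopping at validation check {check}, best F1 = {stopper.best:.2f}")
        break
```

With `val_check_interval: 0.5` validation runs twice per epoch, so a patience of 2 checks corresponds to roughly one epoch without improvement.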
