# 📝 Paper 1: Storm-Graph Transformer (SGT)

**Title:** "Physics-Informed Graph Neural Networks with Transformers for Severe Weather Nowcasting"

**Core Innovation:**
- Hybrid GNN-Transformer-Physics architecture
- Treats storms as discrete graph nodes (not continuous fields)
- Physics-constrained predictions (conservation laws)
- Interpretable attention (which storms matter?)
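
To make the "which storms matter?" idea concrete, here is a minimal NumPy sketch of single-head scaled dot-product attention over storm nodes. It is an illustration only, not the SGT implementation: the function name `storm_attention`, the random node features, and the projection sizes are all assumptions made for this example.

```python
import numpy as np

def storm_attention(node_feats, w_q, w_k):
    """Return a row-stochastic attention matrix over storm nodes.

    node_feats: (N, D) array, one feature vector per storm cell.
    w_q, w_k:   (D, D_k) projections (these would be learned in a real model).
    """
    q = node_feats @ w_q
    k = node_feats @ w_k
    scores = q @ k.T / np.sqrt(k.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # subtract row max for stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)

# Hypothetical data: 5 storm cells with 16 features each
rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 16))
w_q = rng.normal(size=(16, 8))
w_k = rng.normal(size=(16, 8))

attn = storm_attention(feats, w_q, w_k)
# Row i holds the weights storm i places on every storm (including itself)
most_influential = attn.argmax(axis=1)
print(most_influential)
```

Each row of `attn` sums to 1, so row `i` can be read directly as a ranking of which storms influence storm `i` most.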

**Timeline:** Weeks 1-3 (Oct 10-31) for modules, training, and experiments; Week 4 (Oct 31-Nov 7) for paper writing
**Target:** ArXiv + NeurIPS workshop

---

## Notebook Sections:
1. Setup & Data Verification
2. Multimodal Data Loading
3. Architecture Implementation
4. Training Pipeline (Placeholder)
5. Status & Next Steps

---

## 1. Setup & Data Verification

In [None]:
# Mount Google Drive
from google.colab import drive
import os

drive.mount('/content/drive')

DRIVE_ROOT = "/content/drive/MyDrive/SEVIR_Data"
print(f"✓ Drive mounted: {DRIVE_ROOT}")

In [None]:
# Check GPU
!nvidia-smi

import torch
print(f"\nPyTorch: {torch.__version__}")
print(f"CUDA: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"VRAM: {torch.cuda.get_device_properties(0).total_memory/1e9:.1f} GB")

In [None]:
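# Install dependencies
# Note: torch-scatter and torch-sparse may compile from source (slow) if no
# prebuilt wheel matches the installed PyTorch/CUDA version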
!pip install -q torch-geometric torch-scatter torch-sparse
!pip install -q h5py pandas tqdm matplotlib lpips scikit-image

print("✓ Dependencies installed")

In [None]:
# Clone the repo on first run; otherwise pull the latest changes
import os
import sys

REPO_DIR = '/content/stormfusion-sevir'
if not os.path.exists(REPO_DIR):
    !git clone https://github.com/syedhaliz/stormfusion-sevir.git {REPO_DIR}
else:
    !git -C {REPO_DIR} pull

# Make the package importable
if REPO_DIR not in sys.path:
    sys.path.insert(0, REPO_DIR)

print("✓ Repo ready")

### Verify SEVIR Modalities

In [None]:
# Check which SEVIR modalities are present on Drive (2019 files only)
from pathlib import Path

SEVIR_ROOT = f"{DRIVE_ROOT}/data/sevir"
CATALOG_PATH = f"{DRIVE_ROOT}/data/SEVIR_CATALOG.csv"

print("="*70)
print("SEVIR DATA VERIFICATION")
print("="*70)

modalities = {
    'vil': 'VIL (Radar)',
    'ir069': 'GOES-16 C09 (Water Vapor 6.9μm)',
    'ir107': 'GOES-16 C13 (IR Window 10.7μm)',
    'lght': 'GOES-16 GLM (Lightning)'
}

available = []
missing = []

for mod, desc in modalities.items():
    mod_path = Path(SEVIR_ROOT) / mod / "2019"
    if mod_path.exists():
        h5_files = list(mod_path.glob("*.h5"))
        available.append(mod)
        print(f"✅ {mod:8s} - {desc}")
        print(f"   Files: {len(h5_files)} HDF5 files\n")
    else:
        missing.append(mod)
        print(f"❌ {mod:8s} - {desc}")
        print(f"   Missing: {mod_path}\n")

print("="*70)
print(f"Available: {len(available)}/{len(modalities)} modalities")
if available:
    print(f"  {', '.join(available)}")
if missing:
    print(f"\n⚠️  Missing: {', '.join(missing)}")
    print("   Download from: https://sevir.mit.edu/")

print("\n" + "="*70)

## 2. Multimodal Data Loading

In [None]:
# Import multimodal dataset
from stormfusion.data.sevir_multimodal import (
    SEVIRMultiModalDataset,
    build_multimodal_index,
    multimodal_collate_fn
)

print("✓ Multimodal dataset imported")

In [None]:
# Build index (use ALL 541 events)
TRAIN_IDS = f"{DRIVE_ROOT}/data/samples/all_train_ids.txt"
VAL_IDS = f"{DRIVE_ROOT}/data/samples/all_val_ids.txt"

train_index = build_multimodal_index(CATALOG_PATH, TRAIN_IDS, SEVIR_ROOT)
val_index = build_multimodal_index(CATALOG_PATH, VAL_IDS, SEVIR_ROOT)

print(f"\n📊 Dataset:")
print(f"  Train: {len(train_index)} events")
print(f"  Val: {len(val_index)} events")

In [None]:
# Create datasets
train_dataset = SEVIRMultiModalDataset(
    train_index,
    sevir_root=SEVIR_ROOT,
    catalog_path=CATALOG_PATH,
    input_steps=12,
    output_steps=6,
    normalize=True,
    augment=True
)

val_dataset = SEVIRMultiModalDataset(
    val_index,
    sevir_root=SEVIR_ROOT,
    catalog_path=CATALOG_PATH,
    input_steps=12,
    output_steps=6,
    normalize=True,
    augment=False
)

print("✓ Datasets created")

In [None]:
# Test data loading
print("Testing data loading...\n")

inputs, outputs = train_dataset[0]

print("Input shapes:")
for modality, data in inputs.items():
    print(f"  {modality:8s}: {tuple(data.shape)} (T, H, W)")

print("\nOutput shapes:")
for modality, data in outputs.items():
    print(f"  {modality:8s}: {tuple(data.shape)} (T, H, W)")

print("\n✅ Multimodal loading successful!")

In [None]:
# Visualize sample
import matplotlib.pyplot as plt
import numpy as np

fig, axes = plt.subplots(2, 4, figsize=(16, 8))

# Show last input frame for each modality
for i, modality in enumerate(['vil', 'ir069', 'ir107', 'lght']):
    data = inputs[modality][-1].numpy()  # Last timestep
    axes[0, i].imshow(data, cmap='viridis', vmin=data.min(), vmax=data.max())
    axes[0, i].set_title(f'{modality.upper()} (input t=12)')
    axes[0, i].axis('off')

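# Show the first four ground-truth VIL target frames (5 min apart)
for i in range(4):
    data = outputs['vil'][i].numpy()
    # vmin/vmax assume VIL was scaled to [0, 1] by normalize=True
    axes[1, i].imshow(data, cmap='viridis', vmin=0, vmax=1)
    axes[1, i].set_title(f'VIL (target t+{(i+1)*5}min)')
    axes[1, i].axis('off')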

plt.tight_layout()
plt.savefig('/content/multimodal_sample.png', dpi=150, bbox_inches='tight')
plt.show()

print("✓ Visualization saved to /content/multimodal_sample.png")

## 3. Architecture Implementation

**Status:** Building modules progressively

### Module Plan:
1. ✅ Multimodal Encoder (CNN per modality)
2. ⏳ Storm Cell Detector (graph construction)
3. ⏳ GNN Module (storm interactions)
4. ⏳ Transformer Module (spatiotemporal attention)
5. ⏳ Physics Decoder (conservation laws)

**Note:** Modules will be added as they're implemented in the repo
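
As a rough picture of what step 2 (graph construction) could look like, the sketch below thresholds a single VIL frame, groups connected pixels into storm cells, and links each cell to its nearest neighbours. This is not the repo's `StormCellDetector`: the names `detect_cells` and `knn_edges`, the 0.5 threshold, and the k-nearest-neighbour rule are assumptions chosen for illustration.

```python
from collections import deque

import numpy as np

def detect_cells(vil, threshold=0.5):
    """Group 4-connected pixels >= threshold; return (N, 2) centroids as (row, col)."""
    mask = vil >= threshold
    seen = np.zeros_like(mask, dtype=bool)
    h, w = mask.shape
    centroids = []
    for r0, c0 in zip(*np.nonzero(mask)):
        if seen[r0, c0]:
            continue
        # Breadth-first flood fill collects every pixel of one storm cell
        queue, pixels = deque([(r0, c0)]), []
        seen[r0, c0] = True
        while queue:
            r, c = queue.popleft()
            pixels.append((r, c))
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w and mask[rr, cc] and not seen[rr, cc]:
                    seen[rr, cc] = True
                    queue.append((rr, cc))
        centroids.append(np.mean(pixels, axis=0))
    return np.array(centroids, dtype=float).reshape(-1, 2)

def knn_edges(centroids, k=2):
    """Directed edges from each node to its k nearest neighbours, shape (2, E)."""
    n = len(centroids)
    k = min(k, n - 1)
    if k <= 0:
        return np.zeros((2, 0), dtype=int)
    dist = np.linalg.norm(centroids[:, None] - centroids[None, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # exclude self-loops
    nbrs = np.argsort(dist, axis=1)[:, :k]
    src = np.repeat(np.arange(n), k)
    return np.stack([src, nbrs.ravel()])

# Hypothetical 8x8 frame containing three separate storm cells
frame = np.zeros((8, 8))
frame[1:3, 1:3] = 1.0
frame[1:3, 5:7] = 1.0
frame[5:8, 4:7] = 1.0

centroids = detect_cells(frame)
edges = knn_edges(centroids, k=2)
print(centroids.shape, edges.shape)
```

The `(2, E)` edge array follows the same layout as the `edge_index` tensor that torch-geometric expects, so a real detector could hand its output to a GNN layer after converting to a tensor.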

In [None]:
# Placeholder: Will import architecture modules as they're built

# from stormfusion.models.sgt import StormGraphTransformer
# from stormfusion.models.sgt.encoder import MultiModalEncoder
# from stormfusion.models.sgt.detector import StormCellDetector
# from stormfusion.models.sgt.gnn import StormGNN
# from stormfusion.models.sgt.transformer import SpatioTemporalTransformer
# from stormfusion.models.sgt.decoder import PhysicsDecoder

print("⏳ Architecture modules coming soon...")
print("   Check repo for latest updates")

## 4. Training Pipeline (Placeholder)

This section will be activated once the architecture is implemented.

In [None]:
# Training configuration
CONFIG = {
    'model': {
        'hidden_dim': 128,
        'num_gnn_layers': 3,
        'num_tf_layers': 4,
        'num_heads': 8
    },
    'training': {
        'batch_size': 4,
        'lr': 1e-4,
        'epochs': 20,
        'lambda_physics': 0.1,
        'lambda_extreme': 2.0  # Stage 4 insight: weight extreme events
    }
}

import json

print("Training config:")
print(json.dumps(CONFIG, indent=2))
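
One way the `lambda_physics` and `lambda_extreme` weights in the config above might enter a loss is sketched here in NumPy. The exact terms are assumptions, not the planned SGT loss: the "physics" term is a crude conservation proxy that penalizes jumps in domain-mean VIL between consecutive predicted frames, and the "extreme" term upweights pixels whose target exceeds a hypothetical `extreme_thresh`.

```python
import numpy as np

def sgt_loss(pred, target, lambda_physics=0.1, lambda_extreme=2.0,
             extreme_thresh=0.7):
    """Combined loss for (T, H, W) arrays of normalized VIL, with T >= 2."""
    # Weighted MSE: pixels with intense target VIL count (1 + lambda_extreme) times
    weights = 1.0 + lambda_extreme * (target >= extreme_thresh)
    data_term = np.mean(weights * (pred - target) ** 2)

    # Conservation proxy: penalize frame-to-frame changes in domain-mean VIL
    frame_means = pred.mean(axis=(1, 2))
    physics_term = np.mean(np.diff(frame_means) ** 2)

    return data_term + lambda_physics * physics_term

# Hypothetical example: a constant offset of 0.1 on a moderate-intensity target
target = np.full((6, 8, 8), 0.5)
pred = target + 0.1
loss = sgt_loss(pred, target)
print(f"{loss:.4f}")
```

With the same 0.1 offset on a target above `extreme_thresh`, the data term triples, which is the intended effect of `lambda_extreme = 2.0`.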

## 5. Status & Next Steps

In [None]:
print("="*70)
print("PAPER 1 STATUS")
print("="*70)

status = {
    'Architecture Design': '✅ Complete',
    'Multimodal Data Loader': '✅ Complete',
    'Storm Cell Detection': '⏳ In Progress',
    'GNN Module': '⏳ Pending',
    'Transformer Module': '⏳ Pending',
    'Physics Decoder': '⏳ Pending',
    'Training Pipeline': '⏳ Pending',
    'Evaluation': '⏳ Pending'
}

for item, state in status.items():
    print(f"{item:30s} {state}")

print("\n" + "="*70)
print("TIMELINE")
print("="*70)
print("\nWeek 1 (Oct 10-17): Core modules")
print("Week 2 (Oct 17-24): Full training")
print("Week 3 (Oct 24-31): Experiments + baselines")
print("Week 4 (Oct 31-Nov 7): Paper writing")

print("\n✅ Data loading verified - ready to build architecture!")

---

## 📚 References

**Architecture Design:**
- See: `docs/PAPER1_ARCHITECTURE.md` in repo

**Multimodal Data:**
- SEVIR: Veillette et al., NeurIPS 2020
- 4 modalities: VIL, IR069, IR107, GLM
- 541 events (432 train / 109 val)

**Novel Contributions:**
1. GNN-Transformer hybrid for weather nowcasting
2. Physics-informed graph construction
3. Interpretable attention mechanisms
4. Extreme event focus (Stage 4 insights)

---

*This notebook will be updated as modules are implemented.*
*Check git repo for latest code: https://github.com/syedhaliz/stormfusion-sevir*