# Cosmos Predict2 Full Pipeline on Colab

This notebook runs both T5 encoding and Cosmos Predict2 inference on Colab GPUs.

**Important**: Cosmos-Predict2 requires Python 3.10. This notebook assumes the Colab runtime provides Python 3.10 and uses the system Python directly; the first cell checks the version and stops if it differs.

**Requirements:**
- Google Colab with GPU runtime (A100, T4, or V100)
- Python 3.10 (Colab default)

**Note:** Select `Runtime > Change runtime type > GPU` (A100 recommended)

## 1. Verify Python Version

First, let's make sure we're using Python 3.10 (required by Cosmos-Predict2).

In [None]:
import sys
import os

# Check Python version
python_version = sys.version_info
print(f"🐍 Python version: {python_version.major}.{python_version.minor}.{python_version.micro}")

if python_version.major == 3 and python_version.minor == 10:
    print("✅ Python 3.10 detected - compatible with Cosmos-Predict2")
else:
    print(f"❌ Python {python_version.major}.{python_version.minor} detected")
    print("⚠️ Cosmos-Predict2 requires Python 3.10")
    print("Please use a Colab runtime with Python 3.10")
    raise RuntimeError("Incompatible Python version")

## 2. Install uv Package Manager (Optional but Faster)

We'll use `uv` for faster installation, targeting the system Python directly rather than a virtual environment.

In [None]:
# Choose installation method
USE_UV = True  # Set to False to use pip instead

if USE_UV:
    print("📦 Installing uv package manager for faster installation...")
    !curl -LsSf https://astral.sh/uv/install.sh | sh
    
    # Add uv to PATH
    os.environ['PATH'] = f"{os.path.expanduser('~/.local/bin')}:{os.environ['PATH']}"
    
    # Verify uv installation
    !uv --version
    print("✅ uv installed - will use for faster package installation")
else:
    print("Using standard pip for installation")

## 3. Install Cosmos-Predict2 and Dependencies

In [None]:
%%capture install_output

if USE_UV:
    print("Installing with uv (faster)...")
    # Use uv with system Python
    !uv pip install --system --upgrade pip
    !uv pip install --system "cosmos-predict2[cu126]" --extra-index-url https://nvidia-cosmos.github.io/cosmos-dependencies/cu126_torch260/simple
    !uv pip install --system transformers accelerate bitsandbytes
    !uv pip install --system decord einops "imageio[ffmpeg]"
    !uv pip install --system opencv-python-headless pillow
else:
    print("Installing with pip...")
    !pip install --upgrade pip
    !pip install "cosmos-predict2[cu126]" --extra-index-url https://nvidia-cosmos.github.io/cosmos-dependencies/cu126_torch260/simple
    !pip install transformers accelerate bitsandbytes
    !pip install decord einops "imageio[ffmpeg]"
    !pip install opencv-python-headless pillow

print("Installation complete")

In [None]:
# Show installation summary
print("📦 Installation Summary:")
print("="*50)

if USE_UV:
    print("✅ Used uv for faster installation")
else:
    print("✅ Used pip for installation")

# Check for errors (keyword scan of the captured output; a heuristic only)
install_text = install_output.stdout + install_output.stderr
if "error" in install_text.lower() or "failed" in install_text.lower():
    print("⚠️ Warning: possible errors during installation")
    print("Run install_output.show() to inspect the log; you may need to restart the runtime")
else:
    print("✅ No errors found in the installation log")
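
Scanning the captured log for the words "error" or "failed" can produce both false alarms and misses. A more direct check is to ask the package metadata whether each distribution is present. The sketch below uses only the standard library; the helper name `installed_versions` and the `REQUIRED` list are illustrative assumptions based on the install cell, not part of Cosmos-Predict2.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names taken from the install cell above; adjust as needed.
REQUIRED = ["cosmos-predict2", "transformers", "accelerate", "einops", "imageio"]

for name, version in installed_versions(REQUIRED).items():
    status = f"OK ({version})" if version else "MISSING"
    print(f"{name:20s} {status}")
```

Any entry reported as `MISSING` points to the install command that needs to be rerun.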

## 4. Verify Installation and GPU

In [None]:
import torch
from datetime import datetime

# Check GPU
if torch.cuda.is_available():
    gpu_name = torch.cuda.get_device_name(0)
    gpu_memory = torch.cuda.get_device_properties(0).total_memory / 1024**3
    
    print("🖥️ GPU Information:")
    print(f"  Device: {gpu_name}")
    print(f"  Memory: {gpu_memory:.1f} GB")
    print(f"  CUDA Version: {torch.version.cuda}")
    print(f"  PyTorch Version: {torch.__version__}")
    
    # Classify GPU
    if "A100" in gpu_name:
        print("  🚀 High-end GPU detected - can use largest models")
    elif "T4" in gpu_name or "V100" in gpu_name:
        print("  ✅ Mid-range GPU detected - good performance expected")
    else:
        print("  ⚠️ Lower-end GPU - may need smaller models")
else:
    print("❌ No GPU detected! Cosmos-Predict2 requires GPU")
    gpu_memory = 0
    gpu_name = "CPU"

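# Test Cosmos import (the module path may differ between releases;
# check the installed package layout if this import fails)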
try:
    from cosmos_predict2.inference import Video2WorldPipeline
    print("\n✅ Cosmos Predict2 imported successfully")
except ImportError as e:
    print(f"\n❌ Import error: {e}")
    print("Try restarting runtime: Runtime > Restart runtime")

## 5. Mount Google Drive (Recommended)

Save outputs to Drive to prevent data loss if the session disconnects.

In [None]:
# Mount Google Drive
mount_drive = True  # Set to False to skip Drive mounting

if mount_drive:
    from google.colab import drive
    drive.mount('/content/drive')
    
    # Create output directory
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    drive_output_dir = f"/content/drive/MyDrive/cosmos_outputs_{timestamp}"
    os.makedirs(drive_output_dir, exist_ok=True)
    
    print(f"✅ Drive mounted")
    print(f"📁 Output directory: {drive_output_dir}")
    print("💾 Outputs will be auto-saved to Drive")
    
    # Save session info
    with open(f"{drive_output_dir}/session_info.txt", "w") as f:
        f.write(f"Session started: {datetime.now()}\n")
        f.write(f"GPU: {gpu_name} ({gpu_memory:.1f} GB)\n")
        f.write(f"Python: {sys.version}\n")
        f.write(f"Installation method: {'uv' if USE_UV else 'pip'}\n")
else:
    print("⚠️ Drive not mounted - outputs may be lost if session disconnects")
    drive_output_dir = None

## 6. Download Model Checkpoints

In [None]:
from huggingface_hub import snapshot_download

# Auto-select model size based on GPU memory.
# Thresholds sit just below nominal capacity: total_memory reports slightly
# less than the marketed size (a 40 GB A100 shows roughly 39.5 GiB).
# Only the 2B and 14B Video2World checkpoints are assumed to exist, so GPUs
# between those tiers fall back to 2B.
if gpu_memory >= 38:  # A100
    MODEL_SIZE = "14B"
    print("🚀 Selected Cosmos-14B (largest model)")
elif gpu_memory >= 14:  # T4, V100, A10G, 3090
    MODEL_SIZE = "2B"
    print("Selected Cosmos-2B (small model)")
else:
    MODEL_SIZE = "2B"
    print("⚠️ Limited GPU - using smallest model (2B)")

print(f"\n📥 Downloading Cosmos-Predict2-{MODEL_SIZE} checkpoint...")
print("This will take 2-5 minutes...")

# Download checkpoint (interrupted downloads resume automatically;
# the deprecated resume_download argument is no longer needed)
checkpoint_dir = snapshot_download(
    repo_id=f"nvidia/Cosmos-Predict2-{MODEL_SIZE}-Video2World",
    cache_dir="/content/cosmos_checkpoints",
)

print(f"✅ Checkpoint downloaded")

# Find model files
import glob
model_files = glob.glob(f"{checkpoint_dir}/**/*.pt", recursive=True)
if model_files:
    print(f"Found {len(model_files)} model file(s)")
    for f in model_files[:3]:
        print(f"  - {os.path.basename(f)}")

## 7. Initialize T5 Text Encoder

In [None]:
from transformers import T5EncoderModel, T5Tokenizer

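# Auto-select T5 model (thresholds sit just below nominal GPU capacity,
# since a 40 GB A100 reports about 39.5 GiB and a 16 GB T4 about 14.7 GiB).
# Note: hidden sizes differ (t5-11b and t5-3b: 1024, flan-t5-xl: 2048,
# flan-t5-base: 768), so confirm the width the Cosmos pipeline expects.
if gpu_memory >= 38:
    t5_model = "google-t5/t5-11b"
    print("Using T5-11B (best quality)")
elif gpu_memory >= 22:
    t5_model = "google-t5/t5-3b"
    print("Using T5-3B (good quality)")
elif gpu_memory >= 14:
    t5_model = "google/flan-t5-xl"
    print("Using Flan-T5-XL (efficient)")
else:
    t5_model = "google/flan-t5-base"
    print("Using Flan-T5-Base (minimal)")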

print(f"Loading {t5_model}...")

# Load T5 directly in half precision to avoid materializing fp32 weights first
tokenizer = T5Tokenizer.from_pretrained(t5_model)
model = T5EncoderModel.from_pretrained(t5_model, torch_dtype=torch.float16).to("cuda")
model.eval()

print(f"✅ T5 encoder loaded")
print(f"💾 Memory used: {torch.cuda.memory_allocated()/1024**3:.2f} GB")

def encode_text(text):
    """Encode text to embeddings."""
    inputs = tokenizer(text, return_tensors="pt", max_length=77, 
                      padding="max_length", truncation=True).to("cuda")
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state

## 8. Encode Prompts

In [None]:
# Define prompts
prompts = [
    "A robotic arm picks up white paper and places it into a red square target area.",
    "Robot gripper grasps paper and moves it to designated zone.",
    "Automated paper handling: robot transfers sheet to target.",
]

# Encode prompts
print("📝 Encoding prompts...")
encoded_prompts = {}

for i, prompt in enumerate(prompts, 1):
    print(f"  [{i}/{len(prompts)}] {prompt[:50]}...")
    encoded_prompts[prompt] = encode_text(prompt)

print(f"\n✅ Encoded {len(prompts)} prompts")

## 9. Load Cosmos Pipeline

In [None]:
from cosmos_predict2.inference import (
    Video2WorldPipeline,
    get_cosmos_predict2_video2world_pipeline,
)

print(f"🚀 Loading Cosmos-{MODEL_SIZE} pipeline...")

# Get config
config = get_cosmos_predict2_video2world_pipeline(model_size=MODEL_SIZE)

# Find model file
model_path = None
for fps in ['16fps', '10fps']:
    potential = os.path.join(checkpoint_dir, f"model-720p-{fps}.pt")
    if os.path.exists(potential):
        model_path = potential
        break

if not model_path and model_files:
    model_path = model_files[0]

if model_path:
    config['dit_checkpoint_path'] = model_path
    print(f"Using: {os.path.basename(model_path)}")

# Load pipeline
cosmos_pipe = Video2WorldPipeline.from_config(config)
cosmos_pipe = cosmos_pipe.to("cuda")
cosmos_pipe.eval()

print(f"✅ Pipeline loaded")
print(f"💾 Total memory: {torch.cuda.memory_allocated()/1024**3:.2f} GB")
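
The checkpoint lookup above tries two fixed file names and then falls back to the first `.pt` file found earlier. The same logic can be sketched as one helper that searches the download directory recursively. The function name `find_checkpoint` is hypothetical, and the file names are the ones assumed by the cell above, not names confirmed against the model repository.

```python
from pathlib import Path


def find_checkpoint(root, preferred=("model-720p-16fps.pt", "model-720p-10fps.pt")):
    """Return a preferred checkpoint under root, else the first .pt file, else None."""
    root = Path(root)
    # Sort so the fallback choice is deterministic across runs.
    candidates = sorted(root.rglob("*.pt"))
    for name in preferred:
        for path in candidates:
            if path.name == name:
                return path
    return candidates[0] if candidates else None
```

Calling `find_checkpoint(checkpoint_dir)` would return a `Path` or `None`, which makes a missing checkpoint easy to detect before the pipeline is built.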

## 10. Create Input Image

In [None]:
import numpy as np
import cv2
from PIL import Image as PILImage
from IPython.display import display

def create_test_image():
    """Create test image."""
    img = np.zeros((720, 1280, 3), dtype=np.uint8)
    
    # Background
    img[:, :] = [100, 80, 60]
    
    # White paper
    cv2.rectangle(img, (400, 300), (600, 450), (255, 255, 255), -1)
    
    # Red target (img is RGB here; it is converted to BGR only when saved)
    cv2.rectangle(img, (800, 300), (950, 450), (200, 50, 50), -1)
    
    cv2.imwrite("input.jpg", cv2.cvtColor(img, cv2.COLOR_RGB2BGR))
    return "input.jpg"

input_path = create_test_image()
print("✅ Created test input")

img = PILImage.open(input_path)
display(img)

## 11. Generate Video

In [None]:
import torch
from einops import rearrange
import time

# Configure generation (thresholds sit just below nominal GPU capacity,
# since a 40 GB A100 reports about 39.5 GiB)
if gpu_memory >= 38:
    num_frames = 121
elif gpu_memory >= 22:
    num_frames = 61
else:
    num_frames = 16

print(f"Generating {num_frames} frames...")

# Load input
img = PILImage.open(input_path).convert("RGB")  # guarantee 3 channels
frames = np.array(img)[np.newaxis, ...]
frames_tensor = torch.from_numpy(frames).float() / 255.0
frames_tensor = rearrange(frames_tensor, "t h w c -> 1 c t h w").to("cuda")

# Generate
start = time.time()
with torch.no_grad():
    with torch.autocast(device_type="cuda"):  # replaces deprecated torch.cuda.amp.autocast
        output = cosmos_pipe(
            frames_tensor,
            encoded_prompts[prompts[0]],
            num_frames=num_frames,
            fps=16,
            seed=42
        )

print(f"✅ Generated in {time.time()-start:.1f}s")

## 12. Save and Display

In [None]:
import imageio
import base64
from IPython.display import HTML

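# Convert to a uint8 array of shape (T, H, W, C)
if isinstance(output, torch.Tensor):
    # Cast to float32 first: NumPy cannot represent bfloat16 tensors
    video = output.detach().float().cpu().numpy()
else:
    video = np.asarray(output)

if video.ndim == 5:
    video = video[0]  # drop the batch dimension
if video.shape[0] == 3:
    video = np.transpose(video, (1, 2, 3, 0))  # (C, T, H, W) -> (T, H, W, C)
if video.dtype != np.uint8:
    if video.max() <= 1.0:
        video = video * 255  # values assumed to lie in [0, 1]
    video = np.clip(video, 0, 255).astype(np.uint8)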

# Save (the context manager closes the writer even if a frame fails)
output_path = "output.mp4"
with imageio.get_writer(output_path, fps=16) as writer:
    for frame in video:
        writer.append_data(frame)

print(f"✅ Saved: {output_path}")

# Backup to Drive
if drive_output_dir:
    import shutil
    drive_path = f"{drive_output_dir}/output.mp4"
    shutil.copy2(output_path, drive_path)
    print(f"☁️ Backed up to Drive")

# Display
with open(output_path, 'rb') as f:
    video_data = f.read()
encoded = base64.b64encode(video_data).decode('ascii')
display(HTML(f'''
<video width="640" controls autoplay loop>
    <source src="data:video/mp4;base64,{encoded}" type="video/mp4">
</video>
'''))

# Download option
from google.colab import files
if input("Download? (y/n): ").lower() == 'y':
    files.download(output_path)

## Troubleshooting

### Common Issues:

1. **Python Version Error**: 
   - Cosmos-Predict2 requires Python 3.10
   - Google Colab should have Python 3.10 by default
   - If you see Python 3.12, try a different runtime

2. **Import Error**:
   - Restart runtime: `Runtime > Restart runtime`
   - Then run cells again from the beginning

3. **OOM (Out of Memory)**:
   - Reduce `num_frames`
   - Use smaller models
   - Clear GPU cache: `torch.cuda.empty_cache()`

4. **Installation Fails**:
   - Try using pip instead of uv (set `USE_UV = False`)
   - Check your Python version is 3.10
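
For the out-of-memory case, note that `torch.cuda.empty_cache()` only releases memory that no live tensor still references, so it helps to delete large objects (for example `del output`) and run the garbage collector first. A minimal sketch follows; the helper name `free_gpu_memory` is hypothetical, and the guarded import only exists so the snippet also loads on machines without PyTorch.

```python
import gc

try:
    import torch
except ImportError:  # lets the helper be defined on machines without PyTorch
    torch = None


def free_gpu_memory():
    """Release cached GPU memory; return (allocated_gib, reserved_gib) or None without CUDA."""
    gc.collect()  # drop unreachable Python objects that still hold tensors
    if torch is None or not torch.cuda.is_available():
        return None
    torch.cuda.empty_cache()  # hand cached, unused blocks back to the driver
    gib = 1024 ** 3
    return (torch.cuda.memory_allocated() / gib, torch.cuda.memory_reserved() / gib)
```

Calling `free_gpu_memory()` between generations and printing the result shows whether memory was actually released or is still held by a live reference.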

### GPU Memory Guide:

| GPU | Memory | T5 Model | Cosmos Model | Max Frames |
|-----|--------|----------|--------------|------------|
| A100 | 40GB | T5-11B | Cosmos-14B | 121 |
| V100 | 16GB | Flan-T5-XL | Cosmos-2B | 16-30 |
| T4 | 16GB | Flan-T5-XL | Cosmos-2B | 16 |
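
The table can also be expressed as a single lookup, which avoids repeating the memory thresholds in several cells. This is a sketch under stated assumptions: the function name `select_settings` is hypothetical, and the cutoffs of 38 and 14 GiB are margins chosen because GPUs report slightly less than their marketed size, not measured limits.

```python
def select_settings(gpu_memory_gib):
    """Pick T5 encoder, Cosmos model size, and frame count for a GPU memory size in GiB."""
    # Cutoffs sit a little under nominal capacity (assumed margins).
    if gpu_memory_gib >= 38:  # A100 40GB class
        return {"t5_model": "google-t5/t5-11b", "model_size": "14B", "num_frames": 121}
    if gpu_memory_gib >= 14:  # T4 / V100 16GB class
        return {"t5_model": "google/flan-t5-xl", "model_size": "2B", "num_frames": 16}
    # Anything smaller gets the minimal configuration.
    return {"t5_model": "google/flan-t5-base", "model_size": "2B", "num_frames": 16}
```

For example, `select_settings(gpu_memory)["model_size"]` could replace the separate `if`/`elif` chains, keeping the three choices consistent with each other.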

### Tips:
- Always mount Google Drive to save outputs
- Use A100 for best results
- Clear cache between generations
- Restart runtime if imports fail