# Cosmos-Predict2 on Google Colab (Python 3.10 Required)

This notebook installs Cosmos-Predict2 on Google Colab and runs a short test generation.

**⚠️ IMPORTANT**: Cosmos-Predict2 requires Python 3.10, but Colab now defaults to Python 3.12.

## How to Use the Python 3.10 Runtime:

### Option 1: Use Fallback Runtime (Recommended)
1. Go to **Runtime → Change runtime type**
2. Under **Runtime version**, select **Use fallback runtime version**
3. Select **GPU** (A100, V100, or T4)
4. Click **Save**
5. The runtime will restart with Python 3.10

### Option 2: Use Direct URL
Use this URL to create a new notebook with Python 3.10:
```
https://colab.research.google.com/#create=true&kernelspec=python3.10
```

After setting up Python 3.10, run the cells below.

## Step 1: Verify Python 3.10

**Run this first to check your Python version!**

In [None]:
import sys
import os

# Check Python version
python_version = sys.version_info
print("="*60)
print(f"🐍 Current Python version: {python_version.major}.{python_version.minor}.{python_version.micro}")
print("="*60)

if python_version.major == 3 and python_version.minor == 10:
    print("✅ Python 3.10 detected - Compatible with Cosmos-Predict2!")
    print("\n👍 You can proceed with the installation.")
elif python_version.major == 3 and python_version.minor in [11, 12]:
    print(f"\n❌ Python {python_version.major}.{python_version.minor} detected")
    print("\n⚠️ Cosmos-Predict2 requires Python 3.10 due to flash-attn dependencies.")
    print("\n📝 TO FIX THIS:")
    print("   1. Click: Runtime → Change runtime type")
    print("   2. Enable: 'Use fallback runtime version'")
    print("   3. Select: GPU (A100, V100, or T4)")
    print("   4. Click: Save")
    print("   5. The runtime will restart with Python 3.10")
    print("\n   Then run this cell again to verify.")
    print("\n" + "="*60)
    raise RuntimeError(
        "Incompatible Python version. Please switch to Python 3.10 using fallback runtime."
    )
else:
    print(f"❌ Unexpected Python version: {python_version.major}.{python_version.minor}")
    raise RuntimeError("Unsupported Python version")

## Step 2: Check GPU

Verify GPU is available and has sufficient memory.

In [None]:
import torch

print("🖥️ GPU Check:")
print("="*60)

if torch.cuda.is_available():
    gpu_name = torch.cuda.get_device_name(0)
    gpu_memory = torch.cuda.get_device_properties(0).total_memory / 1024**3
    
    print(f"✅ GPU Found: {gpu_name}")
    print(f"   Memory: {gpu_memory:.1f} GB")
    print(f"   CUDA Version: {torch.version.cuda}")
    print(f"   PyTorch Version: {torch.__version__}")
    
    # GPU recommendations
    if "A100" in gpu_name:
        print("\n🚀 Excellent! A100 can run the largest models.")
    elif "V100" in gpu_name:
        print("\n✅ Good! V100 can run medium-sized models.")
    elif "T4" in gpu_name:
        print("\n⚠️ T4 detected - will use smaller models for better performance.")
    else:
        print(f"\n📝 {gpu_name} detected - will auto-configure for best performance.")
else:
    print("❌ No GPU detected!")
    print("\n📝 TO FIX THIS:")
    print("   1. Go to: Runtime → Change runtime type")
    print("   2. Set Hardware accelerator to: GPU")
    print("   3. GPU type: A100, V100, or T4")
    print("   4. Enable: 'Use fallback runtime version' (for Python 3.10)")
    print("   5. Click: Save")
    raise RuntimeError("GPU required for Cosmos-Predict2")

print("\n" + "="*60)

## Step 3: Install Dependencies

Install Cosmos-Predict2 and required packages. We'll use `uv` for faster installation if available.

In [None]:
# Install uv for faster package installation (optional but recommended)
print("📦 Setting up package manager...\n")

import os
import shutil

# Try to install uv. Shell commands run with `!` do not raise Python exceptions
# when they fail, so check for the binary afterwards instead of using try/except.
!curl -LsSf https://astral.sh/uv/install.sh | sh 2>/dev/null
os.environ['PATH'] = f"{os.path.expanduser('~/.local/bin')}:{os.environ['PATH']}"

USE_UV = shutil.which("uv") is not None
if USE_UV:
    !uv --version
    print("\n✅ Will use 'uv' for faster installation (3-10x faster than pip)")
else:
    print("ℹ️ Will use standard pip for installation")

print("\nThis installation will take 2-5 minutes...")

In [None]:
%%capture install_output

# Install Cosmos-Predict2 and dependencies
if USE_UV:
    # Using uv (faster)
    !uv pip install --system --upgrade pip setuptools wheel
    !uv pip install --system torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
    !uv pip install --system "cosmos-predict2[cu126]" --extra-index-url https://nvidia-cosmos.github.io/cosmos-dependencies/cu126_torch260/simple
    !uv pip install --system transformers accelerate bitsandbytes
    !uv pip install --system decord einops "imageio[ffmpeg]" opencv-python-headless pillow
else:
    # Using pip (standard)
    !pip install --upgrade pip setuptools wheel
    !pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
    !pip install "cosmos-predict2[cu126]" --extra-index-url https://nvidia-cosmos.github.io/cosmos-dependencies/cu126_torch260/simple
    !pip install transformers accelerate bitsandbytes
    !pip install decord einops "imageio[ffmpeg]" opencv-python-headless pillow

In [None]:
# Check installation status
print("📦 Installation Status:")
print("="*60)

# Check for errors in installation
install_text = install_output.stdout.lower()
if "error" in install_text or "failed" in install_text:
    print("⚠️ Some packages had installation warnings/errors.")
    print("   This is often okay. Let's test the import...\n")
else:
    print("✅ All packages installed without errors\n")

# Test Cosmos import
try:
    from cosmos_predict2.inference import Video2WorldPipeline
    print("✅ Cosmos-Predict2 imported successfully!")
    print("\n🎉 Installation complete! You're ready to use Cosmos-Predict2.")
except ImportError as e:
    print(f"❌ Import failed: {e}")
    print("\n📝 TO FIX THIS:")
    print("   1. Try: Runtime → Restart runtime")
    print("   2. Run the cells again from the beginning")
    print("   3. Make sure you're using Python 3.10 (fallback runtime)")
    raise

print("\n" + "="*60)

## Step 4: Mount Google Drive (Optional but Recommended)

Save your outputs to Google Drive to prevent data loss.

In [None]:
from datetime import datetime
from google.colab import drive

# Mount Drive
mount_drive = True  # Set to False to skip

if mount_drive:
    drive.mount('/content/drive')
    
    # Create output directory
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    drive_output_dir = f"/content/drive/MyDrive/cosmos_outputs_{timestamp}"
    os.makedirs(drive_output_dir, exist_ok=True)
    
    print(f"✅ Google Drive mounted")
    print(f"📁 Outputs will be saved to: {drive_output_dir}")
    
    # Save session info
    with open(f"{drive_output_dir}/session_info.txt", "w") as f:
        f.write(f"Session started: {datetime.now()}\n")
        f.write(f"Python version: {sys.version}\n")
        f.write(f"GPU: {gpu_name if 'gpu_name' in locals() else 'Unknown'}\n")
        f.write(f"GPU Memory: {gpu_memory if 'gpu_memory' in locals() else 0:.1f} GB\n")
else:
    print("⚠️ Google Drive not mounted - outputs will be lost if session disconnects")
    drive_output_dir = None

## Step 5: Download Model Checkpoint

Download the Cosmos-Predict2 model. Size is auto-selected based on your GPU.

In [None]:
from huggingface_hub import snapshot_download
import glob

# Auto-select model size based on GPU memory
if 'gpu_memory' not in locals():
    gpu_memory = 16  # Default assumption

if gpu_memory >= 40:  # A100
    MODEL_SIZE = "14B"
    print("🚀 Using Cosmos-14B (largest, best quality)")
elif gpu_memory >= 24:  # A10G, 3090
    MODEL_SIZE = "5B"
    print("✅ Using Cosmos-5B (medium, good quality)")
else:  # T4, V100, or less
    MODEL_SIZE = "2B"
    print("📦 Using Cosmos-2B (small, efficient)")

print(f"\nDownloading Cosmos-Predict2-{MODEL_SIZE} checkpoint...")
print("This will take 2-5 minutes...\n")

# Download checkpoint
# Interrupted downloads resume automatically; the deprecated
# `resume_download` argument is no longer needed.
checkpoint_dir = snapshot_download(
    repo_id=f"nvidia/Cosmos-Predict2-{MODEL_SIZE}-Video2World",
    cache_dir="/content/cosmos_checkpoints",
)

print(f"\n✅ Checkpoint downloaded successfully")

# Find model files
# Sort so the same file is picked on every run (glob order is not guaranteed)
model_files = sorted(glob.glob(f"{checkpoint_dir}/**/*.pt", recursive=True))
if model_files:
    print(f"Found {len(model_files)} model file(s)")
    model_path = model_files[0]
    print(f"Will use: {os.path.basename(model_path)}")
else:
    print("⚠️ No .pt files found - will use default path")
    model_path = None

## Step 6: Quick Test - Generate a Video

Let's do a quick test to generate a video and verify everything works.

In [None]:
# Import required libraries
import numpy as np
import torch
from PIL import Image as PILImage
from transformers import T5EncoderModel, T5Tokenizer
from cosmos_predict2.inference import (
    Video2WorldPipeline,
    get_cosmos_predict2_video2world_pipeline,
)
from einops import rearrange
import imageio

print("Loading models...\n")

# 1. Load T5 text encoder
t5_model = "google/flan-t5-base" if gpu_memory < 24 else "google/flan-t5-xl"
print(f"Loading T5 encoder ({t5_model})...")
tokenizer = T5Tokenizer.from_pretrained(t5_model)
text_encoder = T5EncoderModel.from_pretrained(t5_model).half().to("cuda")
text_encoder.eval()
print("✅ T5 loaded\n")

# 2. Load Cosmos pipeline
print(f"Loading Cosmos-{MODEL_SIZE} pipeline...")
config = get_cosmos_predict2_video2world_pipeline(model_size=MODEL_SIZE)
if model_path:
    config['dit_checkpoint_path'] = model_path

cosmos_pipe = Video2WorldPipeline.from_config(config)
cosmos_pipe = cosmos_pipe.to("cuda")
cosmos_pipe.eval()
print("✅ Cosmos pipeline loaded\n")

print(f"💾 GPU memory used: {torch.cuda.memory_allocated()/1024**3:.2f} GB")
print("\nReady to generate!")

In [None]:
# Create a simple test image
print("Creating test input...\n")

# Create a simple colored image
img = np.ones((720, 1280, 3), dtype=np.uint8) * 100
img[200:520, 400:880] = [200, 150, 100]  # Add a colored rectangle
test_image = PILImage.fromarray(img)
test_image.save("test_input.jpg")

# Display the input
from IPython.display import display
display(test_image)
print("\n✅ Test input created")

In [None]:
# Generate a video
print("🎬 Generating video...\n")

# 1. Encode text prompt
prompt = "A robotic arm moves across the table"
inputs = tokenizer(prompt, return_tensors="pt", max_length=77, 
                  padding="max_length", truncation=True).to("cuda")
with torch.no_grad():
    text_embeddings = text_encoder(**inputs).last_hidden_state

# 2. Prepare input image
frames = np.array(test_image)[np.newaxis, ...]
frames_tensor = torch.from_numpy(frames).float() / 255.0
frames_tensor = rearrange(frames_tensor, "t h w c -> 1 c t h w").to("cuda")

# 3. Generate video (fewer frames for quick test)
num_frames = 16  # Quick test with 16 frames
print(f"Generating {num_frames} frames...")

with torch.no_grad():
    with torch.autocast(device_type="cuda"):  # replaces the deprecated torch.cuda.amp.autocast()
        output = cosmos_pipe(
            frames_tensor,
            text_embeddings,
            num_frames=num_frames,
            fps=8,
            seed=42
        )

print("✅ Video generated!\n")

# 4. Save video
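if isinstance(output, torch.Tensor):
    video = output.float().cpu().numpy()  # cast first: autocast may return float16
else:
    video = np.asarray(output)

if video.ndim == 5:
    video = video[0]  # drop the batch dimension
if video.shape[0] == 3:
    video = np.transpose(video, (1, 2, 3, 0))  # (C, T, H, W) -> (T, H, W, C)
if video.dtype != np.uint8:
    # Assumes float frames in [0, 1] or [0, 255]; the writer needs uint8 either way
    if video.max() <= 1.0:
        video = video * 255
    video = np.clip(video, 0, 255).astype(np.uint8)

# Save
output_path = "test_output.mp4"
with imageio.get_writer(output_path, fps=8) as writer:
    for frame in video:
        writer.append_data(frame)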

print(f"✅ Saved: {output_path}")

# Save to Drive if mounted
if drive_output_dir:
    import shutil
    drive_path = f"{drive_output_dir}/test_output.mp4"
    shutil.copy2(output_path, drive_path)
    print(f"☁️ Backed up to Google Drive")

# Display video
import base64
from IPython.display import HTML

with open(output_path, 'rb') as f:
    video_data = f.read()
encoded = base64.b64encode(video_data).decode('ascii')

display(HTML(f'''
<video width="640" controls autoplay loop>
    <source src="data:video/mp4;base64,{encoded}" type="video/mp4">
</video>
'''))

print("\n🎉 Success! Cosmos-Predict2 is working correctly.")

## 🎉 Setup Complete!

Cosmos-Predict2 is now ready to use! You can:

1. **Generate longer videos**: Increase `num_frames` (16, 61, or 121 depending on GPU)
2. **Use different prompts**: Change the text prompt for different motions
3. **Use your own images**: Upload your own input images
4. **Batch process**: Generate multiple videos with different prompts
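
For batch processing, it can help to decide on output file names before any generation starts, so results from different prompts never overwrite each other. The sketch below only plans the file names. `slugify`, `plan_batch`, the `outputs` directory, and the second prompt are hypothetical names and values chosen for illustration, not part of Cosmos-Predict2.

```python
import re
from pathlib import Path


def slugify(prompt: str, max_len: int = 40) -> str:
    """Turn a prompt into a short, filesystem-friendly file stem."""
    slug = re.sub(r"[^a-z0-9]+", "_", prompt.lower()).strip("_")
    return slug[:max_len] or "prompt"


def plan_batch(prompts, out_dir="outputs"):
    """Pair each prompt with a unique output path (index prefix + slug)."""
    out = Path(out_dir)
    return [(p, out / f"{i:03d}_{slugify(p)}.mp4") for i, p in enumerate(prompts)]


prompts = [
    "A robotic arm moves across the table",
    "A robotic arm picks up a cube",  # hypothetical second prompt
]
for prompt, path in plan_batch(prompts):
    print(path, "<-", prompt)
```

Each `(prompt, path)` pair can then be fed to the generation cell from Step 6, with `path` used in place of `test_output.mp4`.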

### Memory Guide:

| GPU | Max Frames | Quality |
|-----|------------|------|
| A100 | 121 | Best |
| V100 | 61 | Good |
| T4 | 16-30 | OK |
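
The table can be encoded as a small lookup so that later cells pick a frame count from the detected GPU name. This is a minimal sketch: `FRAME_BUDGET` and `max_frames_for_gpu` are hypothetical names, the T4 entry uses the conservative end of the 16-30 range, and the fallback of 16 frames for GPUs not in the table is an assumption.

```python
# Frame budgets copied from the Memory Guide table.
# For the T4, the conservative end of the 16-30 range is used.
FRAME_BUDGET = {"A100": 121, "V100": 61, "T4": 16}


def max_frames_for_gpu(gpu_name: str, default: int = 16) -> int:
    """Return the frame budget for the first table entry found in gpu_name."""
    for key, frames in FRAME_BUDGET.items():
        if key in gpu_name:
            return frames
    return default  # assumption: GPUs not in the table get the smallest budget


print(max_frames_for_gpu("NVIDIA A100-SXM4-40GB"))
print(max_frames_for_gpu("Tesla T4"))
```

In the notebook, the argument would be the `gpu_name` string reported in Step 2.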

### Tips:
- Always save to Google Drive to prevent data loss
- Clear GPU cache between generations: `torch.cuda.empty_cache()`
- Use smaller models if you run out of memory
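
Note that `torch.cuda.empty_cache()` only returns cached blocks that no live tensor still references, so it works best after old outputs have been deleted and garbage-collected. A minimal sketch of that sequence follows; `free_gpu_memory` is a hypothetical helper name, and the import guard is there only so the snippet also runs on a machine without PyTorch.

```python
import gc


def free_gpu_memory() -> bool:
    """Collect garbage, then release cached CUDA blocks.

    Returns True if the CUDA cache was cleared, False if PyTorch or a GPU
    is not available.
    """
    gc.collect()  # drop unreachable tensors first so their memory can be freed
    try:
        import torch
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    torch.cuda.empty_cache()
    return True


print("CUDA cache cleared:", free_gpu_memory())
```

Before calling it between generations, delete references to large results first (for example `del output`), otherwise their memory stays allocated.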

### Troubleshooting:
- **Python version error**: Make sure you're using the fallback runtime (Python 3.10)
- **Import errors**: Restart runtime and run cells again
- **Out of memory**: Reduce `num_frames` or restart runtime to clear memory
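
The out-of-memory advice can be automated by trying frame counts from largest to smallest. The sketch below assumes a hypothetical callable `generate(num_frames)` that wraps the pipeline call from Step 6. `generate_with_backoff` and `fake_generate` are illustrative names, and the frame counts are the ones listed in the Memory Guide. It relies on PyTorch reporting CUDA out-of-memory as a `RuntimeError` subclass whose message contains "out of memory".

```python
def generate_with_backoff(generate, frame_options=(121, 61, 16)):
    """Try each frame count in order, stepping down when memory runs out.

    `generate` is a hypothetical callable: it takes num_frames and returns
    the generated video. Returns (num_frames_used, video).
    """
    last_error = None
    for num_frames in frame_options:
        try:
            return num_frames, generate(num_frames)
        except (MemoryError, RuntimeError) as err:
            # Re-raise RuntimeErrors that are not out-of-memory failures
            if isinstance(err, RuntimeError) and "out of memory" not in str(err).lower():
                raise
            last_error = err
    raise RuntimeError("All frame counts ran out of memory") from last_error


def fake_generate(num_frames):
    # Stand-in for the real pipeline: pretends more than 61 frames do not fit
    if num_frames > 61:
        raise RuntimeError("CUDA out of memory (simulated)")
    return f"video with {num_frames} frames"


frames_used, result = generate_with_backoff(fake_generate)
print(frames_used, result)
```

With the real pipeline, clearing the CUDA cache between attempts gives the smaller request a better chance of fitting.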