# Cosmos Predict2 on Google Colab - Complete Solution

This notebook provides a complete solution for running Cosmos-Predict2 on Google Colab, whose default runtime now uses Python 3.12 (with Python 3.11 as the fallback runtime).

**Problem**: 
- Cosmos-Predict2 requires Python 3.10 (flash-attn only has cp310 wheels)
- Colab default runtime: Python 3.12
- Colab fallback runtime: Python 3.11
- Neither works with Cosmos-Predict2!
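
The mismatch comes from how binary wheels are tagged: the filename records the CPython version the wheel was built for, and pip refuses a wheel whose tag does not match the running interpreter. The sketch below illustrates that check in a simplified form (stable-ABI and pure-Python wheels are ignored). The function names and the example filename are hypothetical, chosen only for illustration.

```python
import sys


def cpython_tag(version=sys.version_info):
    """Return the CPython wheel tag for a version tuple, e.g. (3, 10) -> 'cp310'."""
    return f"cp{version[0]}{version[1]}"


def wheel_matches(wheel_filename, version=sys.version_info):
    """Return True if the wheel's Python tag matches the given interpreter version.

    Simplified: stable-ABI (abi3) and pure-Python (py3) wheels are not handled.
    """
    # Wheel filenames end in: -{python tag}-{abi tag}-{platform tag}.whl
    stem = wheel_filename[: -len(".whl")]
    python_tag = stem.split("-")[-3]
    return cpython_tag(version) in python_tag.split(".")


# Hypothetical filename, for illustration only
example = "flash_attn-0.0.0-cp310-cp310-linux_x86_64.whl"
print(wheel_matches(example, (3, 10)))
print(wheel_matches(example, (3, 12)))
```

The first call reports a match for Python 3.10 and the second reports a mismatch for Python 3.12, which is the situation on Colab's default runtime.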

**Solution**: Install Python 3.10 from the deadsnakes PPA, create a virtual environment with it, and install Cosmos-Predict2 there with pip.

**Requirements**:
- Google Colab with GPU (A100, V100, or T4)
- About 5-10 minutes for setup

## Step 1: Check Current Environment

In [None]:
import sys
import os

print("🔍 Current Environment Check:")
print("="*60)

# Check current Python version
current_version = sys.version_info
print(f"Current Python: {current_version.major}.{current_version.minor}.{current_version.micro}")
print(f"Python path: {sys.executable}")

# Check GPU
try:
    import torch
    if torch.cuda.is_available():
        gpu_name = torch.cuda.get_device_name(0)
        gpu_memory = torch.cuda.get_device_properties(0).total_memory / 1024**3
        print(f"\nGPU: {gpu_name} ({gpu_memory:.1f} GB)")
        print(f"CUDA: {torch.version.cuda}")
    else:
        print("\n⚠️ No GPU detected")
        gpu_memory = 0
except ImportError:
    print("\nTorch not yet installed")
    gpu_memory = 0

print("\n" + "="*60)

if (current_version.major, current_version.minor) != (3, 10):
    print(f"\n⚠️ Python {current_version.major}.{current_version.minor} detected.")
    print("We need Python 3.10 for Cosmos-Predict2.")
    print("\n📝 Next step: Install Python 3.10 alongside the system Python.")
else:
    print("\n✅ Python 3.10 already available!")
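
When PyTorch is not importable yet, the GPU can still be inspected through `nvidia-smi`. The following is a minimal sketch, assuming `nvidia-smi` is on the `PATH` of a GPU runtime; the helper names `parse_gpu_line` and `query_gpus` are hypothetical.

```python
import subprocess


def parse_gpu_line(line):
    """Parse one 'name, memory_in_MiB' CSV line into (name, memory_in_GiB)."""
    name, memory_mib = (field.strip() for field in line.rsplit(",", 1))
    return name, int(memory_mib) / 1024


def query_gpus():
    """Return [(name, memory_in_GiB), ...], or [] if nvidia-smi is unavailable."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return [parse_gpu_line(line)
            for line in result.stdout.splitlines() if line.strip()]
```

Calling `query_gpus()` on a CPU-only runtime returns an empty list, which can serve as the same "no GPU" signal as the torch-based check.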

## Step 2: Install Python 3.10

We'll install Python 3.10 using the deadsnakes PPA, which provides Python versions for Ubuntu.

In [None]:
%%bash
# Install Python 3.10 from deadsnakes PPA
set -e  # Stop at the first failing command instead of reporting a false success
echo "📦 Installing Python 3.10..."
echo "This will take about 1-2 minutes..."
echo ""

# Add deadsnakes PPA
apt-get update -qq
apt-get install -qq software-properties-common
add-apt-repository -y ppa:deadsnakes/ppa

# Install Python 3.10 and required packages
apt-get update -qq
apt-get install -qq python3.10 python3.10-venv python3.10-dev python3.10-distutils

# Install pip for Python 3.10
curl -sS https://bootstrap.pypa.io/get-pip.py | python3.10

echo ""
echo "✅ Python 3.10 installation complete!"

# Verify installation
echo ""
echo "Verification:"
python3.10 --version
python3.10 -m pip --version
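
The same verification can be done from the notebook kernel. This is a minimal sketch using only the standard library; `find_interpreter` is a hypothetical helper name.

```python
import shutil
import subprocess


def find_interpreter(name="python3.10"):
    """Return (path, version_string) for an interpreter on PATH, or None if absent."""
    path = shutil.which(name)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Some older interpreters print the version on stderr rather than stdout
    version = (result.stdout or result.stderr).strip()
    return path, version
```

A `None` result means the installation step did not put `python3.10` on the `PATH`, and the remaining steps will fail.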

## Step 3: Create Python 3.10 Virtual Environment

In [None]:
%%bash
# Create a virtual environment with Python 3.10
echo "🔧 Creating Python 3.10 virtual environment..."

# Create venv
python3.10 -m venv /content/cosmos_env

# Activate and verify
source /content/cosmos_env/bin/activate
which python
python --version

echo "✅ Virtual environment created with Python 3.10"
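
Environments created by the `venv` module record their interpreter version in a `pyvenv.cfg` file at the environment root. As a sketch, that file can be parsed to confirm the environment really is based on Python 3.10 without activating it; the helper names below are hypothetical.

```python
from pathlib import Path


def parse_pyvenv_cfg(text):
    """Parse the 'key = value' lines of a pyvenv.cfg file into a dict."""
    config = {}
    for line in text.splitlines():
        key, separator, value = line.partition("=")
        if separator:
            config[key.strip()] = value.strip()
    return config


def venv_python_version(venv_dir):
    """Return the version recorded in <venv_dir>/pyvenv.cfg, or None if missing."""
    text = (Path(venv_dir) / "pyvenv.cfg").read_text()
    return parse_pyvenv_cfg(text).get("version")
```

For the environment created above, `venv_python_version("/content/cosmos_env")` should return a string starting with `3.10`.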

## Step 4: Install Cosmos-Predict2 in Python 3.10 Environment

In [None]:
%%bash
# Install packages using Python 3.10
set -e  # Stop at the first failing command instead of reporting a false success
echo "📦 Installing Cosmos-Predict2 and dependencies..."
echo "This will take 3-5 minutes..."
echo ""

# Use the virtual environment's pip
source /content/cosmos_env/bin/activate

# Upgrade pip
python -m pip install --upgrade pip setuptools wheel

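# Install PyTorch with CUDA support (torch 2.6.0 / CUDA 12.6, matching the
# cu126_torch260 index used for Cosmos-Predict2 below)
python -m pip install torch==2.6.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126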

# Install Cosmos-Predict2
python -m pip install "cosmos-predict2[cu126]" --extra-index-url https://nvidia-cosmos.github.io/cosmos-dependencies/cu126_torch260/simple

# Install additional dependencies
python -m pip install transformers accelerate bitsandbytes
python -m pip install decord einops "imageio[ffmpeg]" opencv-python-headless pillow

echo ""
echo "✅ Installation complete!"
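
After installation, it can be useful to list which package versions actually landed in the environment. The sketch below asks a given interpreter for its installed versions through `importlib.metadata`; the helper name `installed_versions` is hypothetical, and the default interpreter path is the one used in this notebook.

```python
import json
import subprocess

VENV_PYTHON = "/content/cosmos_env/bin/python"

# Script executed by the target interpreter; package names arrive via sys.argv
_QUERY = """
import json, sys
from importlib import metadata
versions = {}
for name in sys.argv[1:]:
    try:
        versions[name] = metadata.version(name)
    except metadata.PackageNotFoundError:
        versions[name] = None
print(json.dumps(versions))
"""


def installed_versions(packages, python=VENV_PYTHON):
    """Return {package: version or None} as seen by the given interpreter."""
    result = subprocess.run(
        [python, "-c", _QUERY, *packages],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)
```

For example, `installed_versions(["torch", "cosmos-predict2", "transformers"])` returns `None` for any package that pip failed to install.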

## Step 5: Configure Jupyter to Use Python 3.10

In [None]:
# Point the notebook's shell commands and subprocesses at the Python 3.10 environment.
# The kernel itself keeps running Colab's default Python: packages compiled for
# Python 3.10 (such as torch) cannot be imported by a 3.11/3.12 kernel, so the
# venv's site-packages directory is deliberately NOT added to sys.path.
import os
import subprocess

venv_path = '/content/cosmos_env'
python_path = f'{venv_path}/bin/python'

# Set environment variables so child processes resolve `python` and `pip` to the venv
os.environ['VIRTUAL_ENV'] = venv_path
if f"{venv_path}/bin" not in os.environ['PATH'].split(os.pathsep):
    os.environ['PATH'] = f"{venv_path}/bin{os.pathsep}{os.environ['PATH']}"

print("🔧 Configuring notebook to use Python 3.10 environment...")
print(f"Virtual env: {venv_path}")
print(f"Interpreter: {python_path}")

# Verify that the venv interpreter can import PyTorch
check = subprocess.run(
    [python_path, '-c',
     'import torch; print(torch.__version__, torch.cuda.is_available())'],
    capture_output=True,
    text=True
)
if check.returncode == 0:
    torch_version, cuda_available = check.stdout.split()[-2:]
    print(f"\n✅ PyTorch version: {torch_version}")
    print(f"CUDA available: {cuda_available}")
else:
    print(f"\n⚠️ Import error:\n{check.stderr}")
    print("Re-run Step 4 and check its output for installation errors.")
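
Compiled extension modules record the interpreter they were built for in their filename, so a module built for Python 3.10 is not loadable by a 3.11 or 3.12 kernel even when its directory is on `sys.path`. The sketch below illustrates that check; the helper names and the example filename are hypothetical.

```python
import re
import sys


def extension_python_tag(filename):
    """Extract the interpreter tag (e.g. 'cpython-310') from an extension filename."""
    match = re.search(r"\.(cpython-\d+)", filename)
    return match.group(1) if match else None


def built_for_running_interpreter(filename):
    """Return True if the extension carries no tag or the running interpreter's tag."""
    tag = extension_python_tag(filename)
    return tag is None or tag == sys.implementation.cache_tag


# Illustrative filename of a module compiled for CPython 3.10
print(extension_python_tag("_C.cpython-310-x86_64-linux-gnu.so"))
```

This is why all Cosmos code in the following steps runs in a separate Python 3.10 process rather than in the notebook kernel.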

## Step 6: Test Cosmos-Predict2 Import

In [None]:
# Use subprocess to run Python 3.10 directly
import subprocess
import json

# Test script
test_script = """
import sys
import json

result = {}
result['python_version'] = f"{sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}"

try:
    import torch
    result['torch_version'] = torch.__version__
    result['cuda_available'] = torch.cuda.is_available()
    if torch.cuda.is_available():
        result['gpu_name'] = torch.cuda.get_device_name(0)
except Exception as e:
    result['torch_error'] = str(e)

try:
    from cosmos_predict2.inference import Video2WorldPipeline
    result['cosmos_import'] = 'success'
except Exception as e:
    result['cosmos_import'] = f'failed: {e}'

print(json.dumps(result))
"""

# Run test in Python 3.10
result = subprocess.run(
    ['/content/cosmos_env/bin/python', '-c', test_script],
    capture_output=True,
    text=True
)

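if result.returncode == 0:
    # The JSON report is the last line of stdout; earlier lines may hold library messages
    test_results = json.loads(result.stdout.strip().splitlines()[-1])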
    
    print("🔍 Python 3.10 Environment Test:")
    print("="*60)
    print(f"Python version: {test_results.get('python_version')}")
    print(f"PyTorch version: {test_results.get('torch_version', 'Not installed')}")
    print(f"CUDA available: {test_results.get('cuda_available', False)}")
    if 'gpu_name' in test_results:
        print(f"GPU: {test_results['gpu_name']}")
    
    cosmos_status = test_results.get('cosmos_import', 'unknown')
    if cosmos_status == 'success':
        print("\n✅ Cosmos-Predict2 imported successfully!")
    else:
        print(f"\n❌ Cosmos import: {cosmos_status}")
else:
    print("❌ Test failed:")
    print(result.stderr)

## Step 7: Create Helper Function to Run Python 3.10 Code

In [None]:
import subprocess
import tempfile
import os

def run_python310(code, return_output=False):
    """
    Run code using Python 3.10 from the virtual environment.
    
    Args:
        code: Python code to execute
        return_output: If True, return the output as string
    
    Returns:
        Output string if return_output=True, else None
    """
    with tempfile.NamedTemporaryFile(mode='w', suffix='.py', delete=False) as f:
        f.write(code)
        temp_file = f.name
    
    try:
        result = subprocess.run(
            ['/content/cosmos_env/bin/python', temp_file],
            capture_output=True,
            text=True
        )
        
        if result.returncode != 0:
            print("❌ Error:")
            print(result.stderr)
            return None
        
        if return_output:
            return result.stdout
        else:
            print(result.stdout)
    finally:
        os.unlink(temp_file)

# Test the helper function
test_code = """
import sys
print(f"Python version: {sys.version}")
print("Hello from Python 3.10!")
"""

print("Testing helper function...\n")
run_python310(test_code)
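
One pitfall when passing code as a string: in a plain triple-quoted string, the notebook's Python interprets escape sequences such as `\n` before the code reaches the child interpreter, which can split an inner string literal across two lines. The sketch below demonstrates the difference between a plain and a raw string; `is_valid_python` is a hypothetical helper.

```python
import ast


def is_valid_python(source):
    """Return True if `source` parses as Python code."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


# Plain string: the outer literal turns \n into a real line break,
# which splits the inner string literal across two lines.
plain = """print("done\n")"""

# Raw string: the backslash survives, so the inner code still contains \n.
raw = r"""print("done\n")"""

print(is_valid_python(plain), is_valid_python(raw))
```

This prints `False True`: scripts passed to `run_python310()` that contain backslash escapes are safest written as raw strings (`r"""..."""`).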

## Step 8: Complete Example - Generate a Video with Cosmos-Predict2

In [None]:
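# Complete script to generate a video using Python 3.10
# Raw string: keeps the "\n" escapes intact for the child interpreter
# (in a plain string they become line breaks and cause a SyntaxError)
cosmos_script = r"""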
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

print("🚀 Starting Cosmos-Predict2 video generation...\n")

# Import libraries
import torch
import numpy as np
from PIL import Image
from transformers import T5EncoderModel, T5Tokenizer
from cosmos_predict2.inference import (
    Video2WorldPipeline,
    get_cosmos_predict2_video2world_pipeline,
)
from einops import rearrange
import imageio
from huggingface_hub import snapshot_download

# Check GPU
if torch.cuda.is_available():
    gpu_name = torch.cuda.get_device_name(0)
    gpu_memory = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"GPU: {gpu_name} ({gpu_memory:.1f} GB)")
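else:
    # The pipeline below is moved to "cuda", so there is no point continuing without a GPU
    raise SystemExit("No GPU detected. Select a GPU runtime (Runtime > Change runtime type) and re-run.")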

# Auto-select model size (Video2World checkpoints are assumed to come in 2B and 14B)
if gpu_memory >= 40:
    MODEL_SIZE = "14B"
else:
    MODEL_SIZE = "2B"

print(f"Using Cosmos-{MODEL_SIZE} model\n")

# Download checkpoint
print("Downloading checkpoint...")
checkpoint_dir = snapshot_download(
    repo_id=f"nvidia/Cosmos-Predict2-{MODEL_SIZE}-Video2World",
    cache_dir="/content/cosmos_checkpoints",  # interrupted downloads resume automatically
)
print("Checkpoint downloaded\n")

# Load T5 encoder
print("Loading T5 encoder...")
t5_model = "google/flan-t5-base"
tokenizer = T5Tokenizer.from_pretrained(t5_model)
text_encoder = T5EncoderModel.from_pretrained(t5_model).half().to("cuda")
text_encoder.eval()
print("T5 loaded\n")

# Load Cosmos pipeline
print("Loading Cosmos pipeline...")
config = get_cosmos_predict2_video2world_pipeline(model_size=MODEL_SIZE)
cosmos_pipe = Video2WorldPipeline.from_config(config)
cosmos_pipe = cosmos_pipe.to("cuda")
cosmos_pipe.eval()
print("Cosmos pipeline loaded\n")

# Create test input
print("Creating test input...")
img = np.ones((720, 1280, 3), dtype=np.uint8) * 100
img[200:520, 400:880] = [200, 150, 100]
Image.fromarray(img).save("/content/test_input.jpg")

# Encode prompt
prompt = "A robotic arm moves across the table"
inputs = tokenizer(prompt, return_tensors="pt", max_length=77,
                  padding="max_length", truncation=True).to("cuda")
with torch.no_grad():
    text_embeddings = text_encoder(**inputs).last_hidden_state

# Prepare input
frames = np.array(Image.open("/content/test_input.jpg"))[np.newaxis, ...]
frames_tensor = torch.from_numpy(frames).float() / 255.0
frames_tensor = rearrange(frames_tensor, "t h w c -> 1 c t h w").to("cuda")

# Generate video
print("Generating video...")
with torch.no_grad():
    with torch.autocast("cuda"):
        output = cosmos_pipe(
            frames_tensor,
            text_embeddings,
            num_frames=8,  # Quick test
            fps=8,
            seed=42
        )

# Save video
if isinstance(output, torch.Tensor):
    video = output.float().cpu().numpy()  # cast first: NumPy has no bfloat16 dtype
else:
    video = output

if video.ndim == 5:
    video = video[0]
if video.shape[0] == 3:
    video = np.transpose(video, (1, 2, 3, 0))
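if video.dtype != np.uint8:
    # Float frames in [0, 1] are scaled up; the result is clipped to the 8-bit range
    if video.max() <= 1.0:
        video = video * 255
    video = np.clip(video, 0, 255).astype(np.uint8)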

# Save
writer = imageio.get_writer("/content/output.mp4", fps=8)
for frame in video:
    writer.append_data(frame)
writer.close()

print("\n✅ Video saved to /content/output.mp4")
print(f"Generated {len(video)} frames")
"""

# Run the script
print("Running Cosmos-Predict2 generation script...")
print("This will take 2-5 minutes for first run (downloading models)\n")
print("="*60)
run_python310(cosmos_script)
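
Because `run_python310()` captures output and prints it only after the script exits, a long generation run shows nothing until it finishes. A streaming variant is sketched below as one possible alternative; `run_python310_streaming` is a hypothetical name, not part of the helper defined in Step 7.

```python
import subprocess
import sys

VENV_PYTHON = "/content/cosmos_env/bin/python"


def run_python310_streaming(code, python=VENV_PYTHON):
    """Run `code` in a child interpreter, echoing each output line as it arrives.

    Returns (exit_code, lines). stderr is merged into stdout to keep ordering.
    """
    process = subprocess.Popen(
        [python, "-u", "-c", code],  # -u: unbuffered output from the child
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    lines = []
    for line in process.stdout:
        sys.stdout.write(line)
        sys.stdout.flush()
        lines.append(line.rstrip("\n"))
    return process.wait(), lines
```

With this variant, progress messages such as the checkpoint download and pipeline loading appear in the cell output while the script is still running.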

## Step 9: Display the Generated Video

In [None]:
# Display the generated video
import os
from IPython.display import HTML, display
import base64

if os.path.exists("/content/output.mp4"):
    with open("/content/output.mp4", "rb") as f:
        video_data = f.read()
    
    encoded = base64.b64encode(video_data).decode('ascii')
    
    display(HTML(f'''
    <video width="640" controls autoplay loop>
        <source src="data:video/mp4;base64,{encoded}" type="video/mp4">
    </video>
    '''))
    
    print("\n🎉 Success! Cosmos-Predict2 is working!")
    print("\nYou can now:")
    print("1. Modify the prompt for different outputs")
    print("2. Increase num_frames for longer videos")
    print("3. Use your own input images")
else:
    print("❌ Video not found. Check the output from the previous cell.")
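
Embedding the video as a base64 data URI makes the page payload larger than the file itself, since every 3 bytes become 4 characters. The sketch below shows the encoding step and the resulting size; both helper names are hypothetical.

```python
import base64


def video_data_uri(data, mime="video/mp4"):
    """Encode raw video bytes as a data URI usable in a <source src=...> tag."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def embedded_size(n_bytes):
    """Length of the base64 text for n_bytes of input (4 chars per 3 bytes, rounded up)."""
    return 4 * ((n_bytes + 2) // 3)
```

For short test clips this overhead is negligible; for long videos it may be preferable to download the file from `/content/` instead of embedding it.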

## 🎉 Complete!

You now have Cosmos-Predict2 running on Google Colab with Python 3.10!

### How This Works:
1. Install Python 3.10 alongside Colab's default Python
2. Create a virtual environment with Python 3.10
3. Install all packages in that environment
4. Use the `run_python310()` helper to execute code in it

### Tips:
- All Cosmos code must run through `run_python310()`
- Save outputs to `/content/` to access from notebook
- Mount Google Drive and copy outputs there: files under `/content/` are deleted when the Colab runtime is recycled
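
Copying outputs to Drive can be sketched as follows. `backup_outputs` is a hypothetical helper, and the destination folder in the usage note is only an example.

```python
import shutil
from pathlib import Path


def backup_outputs(src_dir, dst_dir, pattern="*.mp4"):
    """Copy files matching `pattern` from src_dir to dst_dir; return the copied names."""
    destination = Path(dst_dir)
    destination.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(Path(src_dir).glob(pattern)):
        shutil.copy2(path, destination / path.name)  # copy2 keeps timestamps
        copied.append(path.name)
    return copied
```

In Colab, Drive is mounted first with `from google.colab import drive; drive.mount('/content/drive')`, after which a call such as `backup_outputs("/content", "/content/drive/MyDrive/cosmos_outputs")` copies the generated videos.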

### Troubleshooting:
- If imports fail, re-run Step 4 and check its output for `pip` errors, then confirm that `/content/cosmos_env/bin/python` exists
- For OOM errors, reduce `num_frames` or model size
- The first run downloads models (2-5 minutes)

### Alternative Approach:
If this doesn't work, consider:
1. Using a local environment with Python 3.10
2. Using a cloud service that supports custom environments
3. Building custom wheels for Python 3.11/3.12 (advanced)