# Cosmos-Predict **2.5** on Google Colab with Micromamba (Python 3.10)

This notebook sets up **Cosmos-Predict 2.5** following the [official setup guide](https://github.com/nvidia-cosmos/cosmos-predict2.5/blob/main/docs/setup.md).

## Key Requirements

**Hardware & Driver:**
- **GPU**: NVIDIA Ampere or newer (RTX 30 Series, A100, H100, Blackwell)
- **Driver**: >=570.124.06 (CUDA 12.8.1 compatible)
- **VRAM**: ~32.5 GB for 2B@720p/16fps, ~24 GB for 2B@480p/10fps

**Software:**
- **Python**: 3.10 (required - project specifically targets this version)
- **OS**: Linux x86-64 with glibc>=2.31
- **PyTorch**: CUDA 12.8 compatible version
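
The hardware requirements above can be turned into a quick preflight check. The sketch below is illustrative only: `recommend_variant` is a hypothetical helper, not part of Cosmos-Predict 2.5, and its thresholds are simply the approximate VRAM figures quoted above. It assumes that a compute capability major version of 8 or higher means Ampere or newer.

```python
from typing import Optional

# Approximate VRAM needs quoted in this notebook (GB); not measured values.
VRAM_720P_GB = 32.5
VRAM_480P_GB = 24.0


def recommend_variant(cc_major: int, vram_gb: float) -> Optional[str]:
    """Suggest a 2B variant, or None if the GPU looks unsuitable.

    Hypothetical helper: assumes compute capability major >= 8
    corresponds to Ampere or newer.
    """
    if cc_major < 8:
        return None
    if vram_gb >= VRAM_720P_GB:
        return "720p/16fps"
    if vram_gb >= VRAM_480P_GB:
        return "480p/10fps"
    return None


if __name__ == "__main__":
    # Example inputs only; in the notebook these would come from
    # torch.cuda.get_device_capability and get_device_properties.
    print(recommend_variant(8, 80.0))
    print(recommend_variant(8, 24.0))
    print(recommend_variant(7, 16.0))
```

The VRAM that a GPU reports is usually slightly below its nominal size, so a small tolerance on the thresholds may be needed in practice.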

## Why Micromamba?

- **Python 3.10**: Colab's system Python is 3.10 or newer and can change; Micromamba pins an isolated 3.10 environment
- **No uv**: `uv` doesn't work well with Colab's virtual environment system
- **Fast & Clean**: Conda-compatible package manager for isolated environments
- **CUDA Control**: Easy pinning of CUDA/PyTorch versions

## Setup Time

- **Initial setup**: ~5-10 minutes (environment + dependencies)
- **First inference**: +5-10 minutes (checkpoint auto-download)
- **Subsequent runs**: ~1-2 minutes

## Colab Runtime

- Select **GPU runtime**: Runtime → Change runtime type → GPU
- **Recommended**: A100 (40GB or 80GB)
- **Minimum**: GPU with Ampere architecture and sufficient VRAM

## Step 1: Check Current Environment

In [None]:
import sys, os, platform
print("🔍 Current Colab Environment:\n" + "="*60)
print(f"Python: {sys.version}")
print(f"Executable: {sys.executable}")
print(f"Platform: {platform.platform()}")

# GPU probe
try:
    import torch
    gpu_ok = torch.cuda.is_available()
except Exception:
    gpu_ok = False

if gpu_ok:
    name = torch.cuda.get_device_name(0)
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / 1024**3
    print(f"\nGPU: {name}")
    print(f"VRAM: {vram_gb:.1f} GB")
else:
    print("\n⚠️ No GPU detected: enable one via Runtime → Change runtime type (A100 recommended; T4 and V100 are pre-Ampere)")

print("="*60)
print("\nWe will create a Python 3.10 env with Micromamba and install PyTorch (cu128) + Cosmos-Predict 2.5.")

## Step 2: (Optional) Mount Google Drive for Autosave

In [None]:
from google.colab import drive
from datetime import datetime
import os, sys

MOUNT_DRIVE = True
if MOUNT_DRIVE:
    try:
        drive.mount('/content/drive')
        ts = datetime.now().strftime('%Y%m%d_%H%M%S')
        out_dir = f"/content/drive/MyDrive/cosmos_outputs_{ts}"
        os.makedirs(out_dir, exist_ok=True)
        os.environ['DRIVE_OUTPUT_DIR'] = out_dir
        print(f"✅ Drive mounted | Session dir: {out_dir}")
        with open(f"{out_dir}/session.txt", 'w') as f:
            f.write(f"Session: {datetime.now()}\n")
            f.write(f"Colab Python: {sys.version}\n")
    except Exception as e:
        print(f"⚠️ Could not mount Drive: {e}\nFiles will be local only.")
else:
    print("⏭️ Skipping Drive mount: outputs will be ephemeral.")

## Step 3: Install Micromamba

In [None]:
%%bash
set -euo pipefail
echo "📦 Installing Micromamba..."
curl -Ls https://micro.mamba.pm/api/micromamba/linux-64/latest | tar -xvj bin/micromamba -O > /usr/local/bin/micromamba 2>/dev/null
chmod +x /usr/local/bin/micromamba
export MAMBA_ROOT_PREFIX=/content/micromamba
mkdir -p "$MAMBA_ROOT_PREFIX"
/usr/local/bin/micromamba --version
echo "✅ Micromamba installed"

## Step 4: Create a **Python 3.10** Environment

In [None]:
%%bash
set -euo pipefail
export MAMBA_ROOT_PREFIX=/content/micromamba
echo "🐍 Creating env: cosmos310 (Python 3.10)"
/usr/local/bin/micromamba create -y -n cosmos310 python=3.10 pip -c conda-forge
/usr/local/bin/micromamba run -n cosmos310 python --version
echo "✅ Env created"

## Step 5: Install **PyTorch (CUDA 12.8.1 compatible)** and **Cosmos-Predict 2.5**

Following the [official setup guide](https://github.com/nvidia-cosmos/cosmos-predict2.5/blob/main/docs/setup.md):
- Requires NVIDIA driver >=570.124.06 (compatible with CUDA 12.8.1)
- Python 3.10 (verified in Step 4)
- Ampere+ architecture GPUs (A100, RTX 30 series or newer)
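
The minimum driver version has three components, so comparing only the major number can accept a driver such as 570.86.x that is still too old. The following is a minimal sketch of a component-wise comparison. `parse_version` and `driver_meets_minimum` are hypothetical helper names, and the driver string is assumed to be dot-separated integers as printed by `nvidia-smi --query-gpu=driver_version --format=csv,noheader`.

```python
from typing import Tuple

MIN_DRIVER = "570.124.06"  # minimum stated in this notebook


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '570.124.06' into (570, 124, 6) for component-wise comparison."""
    return tuple(int(part) for part in version.strip().split("."))


def driver_meets_minimum(driver: str, minimum: str = MIN_DRIVER) -> bool:
    """Return True if the driver version is at least the minimum."""
    return parse_version(driver) >= parse_version(minimum)


if __name__ == "__main__":
    # Example version strings only, chosen to exercise each branch.
    for candidate in ("575.51.03", "570.124.06", "570.86.15", "535.104.05"):
        print(candidate, driver_meets_minimum(candidate))
```

Python compares tuples element by element, which is why the parsed form orders versions correctly where a plain string or float comparison would not.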

In [None]:
%%bash
set -euo pipefail
export MAMBA_ROOT_PREFIX=/content/micromamba

echo "Checking NVIDIA driver version..."
nvidia-smi | grep "CUDA Version:" || echo "⚠️ Driver check failed"

echo ""
echo "📦 Installing PyTorch with CUDA 12.8.1 support"
/usr/local/bin/micromamba run -n cosmos310 python - <<'PY'
import sys, subprocess

def pip(args):
    cmd = [sys.executable, '-m', 'pip'] + args
    print('>>>', ' '.join(cmd))
    subprocess.check_call(cmd)

# Upgrade pip first
pip(['install', '--upgrade', 'pip'])

# Install PyTorch with CUDA 12.8 support
# Using cu128 wheels from PyTorch (CUDA 12.8.1 compatible)
pip([
    'install', '--upgrade',
    'torch', 'torchvision', 'torchaudio',
    '--index-url', 'https://download.pytorch.org/whl/cu128'
])

print("✅ PyTorch installation complete")
PY

echo ""
echo "📥 Cloning Cosmos-Predict 2.5 repository"
if [ ! -d "/content/cosmos-predict2.5" ]; then
    git clone https://github.com/nvidia-cosmos/cosmos-predict2.5.git /content/cosmos-predict2.5
    echo "✅ Repository cloned"
else
    echo "ℹ️ Repository already exists"
fi

echo ""
echo "📦 Installing Cosmos-Predict 2.5 and dependencies"
cd /content/cosmos-predict2.5

/usr/local/bin/micromamba run -n cosmos310 python - <<'PY'
import sys, subprocess, os

def pip(args):
    cmd = [sys.executable, '-m', 'pip'] + args
    print('>>>', ' '.join(cmd))
    subprocess.check_call(cmd)

# Install the package in editable mode from the cloned repo
os.chdir('/content/cosmos-predict2.5')

# Install core dependencies first
pip(['install', '--upgrade', 
     'transformers', 'accelerate', 'safetensors', 'huggingface_hub'])

# Install the package (this will install from pyproject.toml)
pip(['install', '-e', '.'])

# Install additional runtime dependencies
pip(['install', 
     'decord', 'einops', 'imageio[ffmpeg]', 
     'opencv-python-headless', 'pillow', 'numpy'])

print("✅ Cosmos-Predict 2.5 installation complete")
PY

echo ""
echo "🔎 Verifying installation"
/usr/local/bin/micromamba run -n cosmos310 python - <<'PY'
import sys  # needed for sys.version below
import torch
print('='*60)
print('Python:', sys.version.split()[0])
print('Torch:', torch.__version__)
print('CUDA:', torch.version.cuda)
print('CUDA available:', torch.cuda.is_available())

if torch.cuda.is_available():
    print('GPU:', torch.cuda.get_device_name(0))
    cc_major, cc_minor = torch.cuda.get_device_capability(0)
    print(f'Compute Capability: {cc_major}.{cc_minor}')
    print('BF16 supported (Ampere+ expected):', cc_major >= 8)
    
    # Check driver
    import subprocess
    try:
        result = subprocess.run(['nvidia-smi', '--query-gpu=driver_version', '--format=csv,noheader'],
                              capture_output=True, text=True)
        driver = result.stdout.strip().splitlines()[0]
        print(f'NVIDIA Driver: {driver}')
        # Compare every version component, not just the major number
        driver_tuple = tuple(int(part) for part in driver.split('.'))
        if driver_tuple >= (570, 124, 6):
            print('✅ Driver version compatible with CUDA 12.8.1')
        else:
            print(f'⚠️ Driver {driver} may not support CUDA 12.8.1 (requires >=570.124.06)')
    except Exception as exc:
        print(f'⚠️ Could not determine driver version: {exc}')

print('='*60)
PY

echo ""
echo "✅ Installation complete!"

## Step 6: Verify Cosmos-Predict 2.5 import

In [None]:
%%bash
export MAMBA_ROOT_PREFIX=/content/micromamba
/usr/local/bin/micromamba run -n cosmos310 python - <<'PY'
try:
    import cosmos_predict2
    from cosmos_predict2 import inference
    print('✅ cosmos_predict2 imported')
    # Show available helpers if present
    print('inference attrs:', [a for a in dir(inference) if 'Video' in a or 'World' in a][:8])
except Exception as e:
    print('❌ Import failed:', e)
PY

## Step 7: Helpers to run code inside the **cosmos310** env

In [None]:
import subprocess, tempfile, os, json, glob, shutil

def run_cosmos_py(code, echo=True):
    """Run a Python source string inside the cosmos310 env; return the CompletedProcess."""
    with tempfile.NamedTemporaryFile('w', suffix='.py', delete=False) as f:
        f.write(code)
        path = f.name
    try:
        env = os.environ.copy()
        env['MAMBA_ROOT_PREFIX'] = '/content/micromamba'
        cmd = ['/usr/local/bin/micromamba', 'run', '-n', 'cosmos310', 'python', path]
        res = subprocess.run(cmd, text=True, capture_output=True, env=env)
        if echo:
            print(res.stdout)
            if res.returncode != 0:
                print(res.stderr)
        return res
    finally:
        # Best-effort cleanup of the temporary script
        try:
            os.unlink(path)
        except OSError:
            pass

def run_cosmos_cmd(cmd):
    """Run a shell command string inside the cosmos310 env; return the CompletedProcess."""
    full = f"export MAMBA_ROOT_PREFIX=/content/micromamba && /usr/local/bin/micromamba run -n cosmos310 {cmd}"
    return subprocess.run(full, shell=True)

## Step 8: **Download Model Checkpoints**

According to the official docs, checkpoints are downloaded automatically during inference and post-training into the Hugging Face cache, whose location is controlled by the `HF_HOME` environment variable.

You can also pre-download using the checkpoint downloader script:

```bash
# For 2B Video2World model (default 720p/16fps)
python -m scripts.download_checkpoints --model_types video2world --model_sizes 2B

# For lower VRAM (24GB GPUs): 480p/10fps variant
python -m scripts.download_checkpoints --model_types video2world --model_sizes 2B --resolution 480 --fps 10
```

If you skip the pre-download, the checkpoints are fetched on the first inference run.
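
To see where the cache lives and how much disk space it already uses, a small helper can resolve `HF_HOME` and total the file sizes beneath it. This is a sketch rather than part of the official tooling: `resolve_hf_home` and `dir_size_gb` are hypothetical names, and the fallback path assumes the usual Hugging Face default of `~/.cache/huggingface` (it ignores `XDG_CACHE_HOME` for simplicity).

```python
import os
from pathlib import Path


def resolve_hf_home() -> Path:
    """Return HF_HOME if set, else the assumed default ~/.cache/huggingface."""
    return Path(os.environ.get("HF_HOME", "~/.cache/huggingface")).expanduser()


def dir_size_gb(root: Path) -> float:
    """Total size of regular files under root in GiB (0.0 if root is missing).

    Symlinks are skipped so that linked cache entries are not counted twice.
    """
    if not root.exists():
        return 0.0
    total = sum(
        p.stat().st_size
        for p in root.rglob("*")
        if p.is_file() and not p.is_symlink()
    )
    return total / 1024**3


if __name__ == "__main__":
    home = resolve_hf_home()
    print(f"Cache location: {home}")
    print(f"Cache size: {dir_size_gb(home):.2f} GiB")
```

Running it before and after the first inference gives a rough idea of how much was downloaded.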

In [None]:
%%bash
set -euo pipefail
export MAMBA_ROOT_PREFIX=/content/micromamba

echo "Setting HF_HOME for checkpoint cache"
export HF_HOME=/content/hf_cache
mkdir -p "$HF_HOME"

echo "ℹ️ Checkpoints will be automatically downloaded during inference"
echo "Cache location: $HF_HOME"
echo ""
echo "Optional: Pre-download checkpoints now"
echo "Running download script for 2B Video2World (720p/16fps)..."

/usr/local/bin/micromamba run -n cosmos310 bash -c "
export HF_HOME=$HF_HOME
cd /content/cosmos-predict2.5
python -m scripts.download_checkpoints --model_types video2world --model_sizes 2B 2>&1 || echo '⚠️ Auto-download will occur during first inference'
"

echo ""
echo "✅ Checkpoint setup complete"

## Step 9: Generate a Test Video (CLI example, BF16)

In [None]:
%%bash
set -euo pipefail
export MAMBA_ROOT_PREFIX=/content/micromamba
export HF_HOME=/content/hf_cache

echo "🖼️ Creating a simple test input image"
python - <<'PY'
from PIL import Image
import numpy as np
# Create 720p image (1280x720)
img = np.ones((720,1280,3), dtype=np.uint8)*96
# Add a colored rectangle
img[220:500, 420:860] = [210,150,100]
Image.fromarray(img).save('/content/input0.jpg')
print('✅ Created /content/input0.jpg')
PY

echo ""
echo "🎬 Running Video2World inference (2B model, BF16)"
echo "Note: First run will auto-download checkpoints (several GB)"
echo ""

/usr/local/bin/micromamba run -n cosmos310 bash -c "
export HF_HOME=$HF_HOME
cd /content/cosmos-predict2.5
python -m examples.video2world \
    --model_size 2B \
    --input_path /content/input0.jpg \
    --num_conditional_frames 1 \
    --prompt 'A robotic arm moves smoothly across the table, picking up objects' \
    --save_path /content/cosmos_output.mp4 \
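    --use_bf16
" || echo "⚠️ CLI inference exited with a non-zero status"
# The '|| echo' above keeps 'set -e' from aborting the cell before the check below

if [ -f /content/cosmos_output.mp4 ]; then
    echo ""
    echo "✅ Video generation successful: /content/cosmos_output.mp4"
else
    echo ""
    echo "⚠️ CLI inference failed. Try the direct pipeline approach below."
fi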

### Plan B: Direct pipeline call with BF16 autocast

In [None]:
gen_py = r'''
import os, sys
os.environ['HF_HOME'] = '/content/hf_cache'
sys.path.insert(0, '/content/cosmos-predict2.5')

import numpy as np
import torch
from PIL import Image
from einops import rearrange
import imageio

print("Loading Cosmos-Predict 2.5 pipeline...")
from cosmos_predict2.inference import Video2WorldPipeline, get_cosmos_predict2_video2world_pipeline

# Get default configuration for 2B model
config = get_cosmos_predict2_video2world_pipeline(model_size='2B')

print("Initializing pipeline on CUDA with BF16...")
pipe = Video2WorldPipeline.from_config(config).to('cuda').eval()

# Prepare input
if not os.path.exists('/content/input0.jpg'):
    print("Creating test input image...")
    img = np.ones((720,1280,3), dtype=np.uint8)*96
    img[220:500, 420:860] = [210,150,100]
    Image.fromarray(img).save('/content/input0.jpg')

print("Loading input image...")
img = Image.open('/content/input0.jpg').convert('RGB')  # force 3 channels even for RGBA or grayscale inputs
frames = np.array(img)[None,...]  # Add time dimension
frames = torch.from_numpy(frames).float() / 255.0
frames = rearrange(frames, 't h w c -> 1 c t h w').to('cuda')

prompt = 'A robotic arm moves smoothly across the table, picking up objects'
num_frames = 16
fps = 16
seed = 42

print(f"Generating {num_frames} frames at {fps} FPS...")
print(f"Prompt: {prompt}")

with torch.inference_mode():
    with torch.autocast('cuda', dtype=torch.bfloat16):
        out = pipe(
            frames, 
            prompt=prompt, 
            num_frames=num_frames, 
            fps=fps, 
            seed=seed
        )

print("Processing output...")
video = out if isinstance(out, np.ndarray) else out.detach().cpu().numpy()

# Handle different output formats
if video.ndim == 5:  # (batch, channels, frames, height, width)
    video = video[0]
if video.shape[0] == 3:  # (channels, frames, height, width)
    video = np.transpose(video, (1,2,3,0))  # -> (frames, height, width, channels)

# Normalize to uint8
if video.max() <= 1.0:
    video = (video * 255).astype(np.uint8)
else:
    video = video.astype(np.uint8)

print(f"Writing video: {video.shape}")
writer = imageio.get_writer('/content/cosmos_output_direct.mp4', fps=fps)
for frame in video:
    writer.append_data(frame)
writer.close()

print('✅ Video saved: /content/cosmos_output_direct.mp4')
'''

run_cosmos_py(gen_py)

## Step 10: Display the Generated Video

In [None]:
import os, base64
from IPython.display import HTML, Image, display

print("📊 Results:")
print("="*60)

# Show input
if os.path.exists('/content/input0.jpg'):
    print('\n📸 Input Image:')
    display(Image('/content/input0.jpg', width=640))

# Find and display output video
video_paths = [
    '/content/cosmos_output.mp4',
    '/content/cosmos_output_direct.mp4',
    '/content/cosmos_output_bf16.mp4'
]

found_videos = [p for p in video_paths if os.path.exists(p)]

if found_videos:
    for video_path in found_videos:
        file_size = os.path.getsize(video_path) / (1024 * 1024)  # MB
        print(f'\n🎥 Generated Video: {os.path.basename(video_path)} ({file_size:.2f} MB)')
        
        # Display video
        with open(video_path, 'rb') as f:
            video_data = f.read()
        encoded = base64.b64encode(video_data).decode('ascii')
        display(HTML(f"""
            <video width='640' height='360' controls autoplay loop>
                <source src='data:video/mp4;base64,{encoded}' type='video/mp4'>
            </video>
        """))
    
    print('\n' + '='*60)
    print('🎉 SUCCESS! Video generation complete.')
    print('='*60)
else:
    print('\n❌ No output video found.')
    print('Check the logs above for errors (OOM, missing checkpoints, driver issues).')
    print('\nTroubleshooting:')
    print('- Ensure GPU has enough VRAM (~32.5 GB for 720p/16fps)')
    print('- Check NVIDIA driver >=570.124.06 for CUDA 12.8.1')
    print('- Try lower resolution: --resolution 480 --fps 10')

## Step 11: Save Results to Google Drive (if mounted)

In [None]:
import os, shutil

out_dir = os.environ.get('DRIVE_OUTPUT_DIR')
if out_dir and os.path.exists(out_dir):
    print("☁️ Backing up to Google Drive...")
    print("="*60)
    
    files_to_backup = [
        '/content/input0.jpg',
        '/content/cosmos_output.mp4',
        '/content/cosmos_output_direct.mp4',
        '/content/cosmos_output_bf16.mp4'
    ]
    
    backed_up = []
    for src_path in files_to_backup:
        if os.path.exists(src_path):
            filename = os.path.basename(src_path)
            dst_path = os.path.join(out_dir, filename)
            shutil.copy2(src_path, dst_path)
            file_size = os.path.getsize(dst_path) / (1024 * 1024)
            backed_up.append(f"  ✅ {filename} ({file_size:.2f} MB)")
    
    if backed_up:
        print('\n'.join(backed_up))
        print("="*60)
        print(f"📁 All files saved to: {out_dir}")
    else:
        print("⚠️ No files found to back up")
else:
    print('ℹ️ Google Drive not mounted - outputs are local only')
    print('Files will be lost when runtime disconnects')

## ✅ Notes, Tips & Troubleshooting

### System Requirements (from official docs)
- **GPU**: NVIDIA Ampere architecture or newer (RTX 30 Series, A100, H100, etc.)
- **Driver**: >=570.124.06 (compatible with CUDA 12.8.1)
- **Python**: 3.10 (required)
- **OS**: Linux x86-64 with glibc>=2.31

### VRAM Requirements
- **2B Model @ 720p/16fps**: ~32.5 GB VRAM (A100 recommended)
- **2B Model @ 480p/10fps**: ~24 GB VRAM (RTX 3090/4090 class)
- Use `--resolution 480 --fps 10` in the download script for lower VRAM

### Common Issues

**Out of Memory (OOM) errors:**
- Use 2B model instead of 14B
- Lower resolution/fps: `--resolution 480 --fps 10`
- Ensure no other GPU processes are running

**CUDA driver version insufficient:**
- Update the NVIDIA driver to 570.124.06 or newer (on Colab the driver comes with the runtime, so this usually means trying a different runtime)
- Check with: `nvidia-smi | grep "CUDA Version:"`

**Import errors:**
- Verify Python 3.10: `python --version`
- Reinstall in cosmos310 environment
- Check all dependencies installed correctly
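
The checks above can be scripted. The sketch below reports whether the interpreter is Python 3.10 and which top-level modules cannot be found. `missing_modules` is a hypothetical helper, and the module names in the example are the import names assumed for the packages installed in Step 5 (for example `cv2` for `opencv-python-headless` and `PIL` for `pillow`).

```python
import sys
from importlib.util import find_spec
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names this interpreter cannot locate."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    print("Python 3.10:", sys.version_info[:2] == (3, 10))
    # Assumed import names for the Step 5 packages.
    wanted = ["torch", "einops", "imageio", "cv2", "PIL", "decord", "cosmos_predict2"]
    print("Missing:", missing_modules(wanted))
```

It only reflects the environment it runs in, so pass it through `run_cosmos_py` to check `cosmos310` rather than Colab's system Python.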

**Checkpoint download issues:**
- Set `HF_HOME` environment variable for custom cache location
- Checkpoints auto-download during first inference
- Requires Hugging Face access for some models

### Performance Tips
- Always use `--use_bf16` flag for BF16 precision (Ampere+ required)
- BF16 is faster and uses less VRAM than FP32
- First run will be slower due to checkpoint downloads
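
As a rough illustration of why BF16 needs less memory, the weights alone take 2 bytes per parameter in BF16 versus 4 in FP32. The sketch below does that arithmetic for a nominal 2 billion parameters. The exact parameter count is an assumption based on the "2B" name, and activations, text encoders, and other buffers are not included, so the result is well below the total VRAM figures quoted earlier.

```python
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Memory for the weights alone in GiB; excludes activations and buffers."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


if __name__ == "__main__":
    params = 2e9  # nominal size implied by "2B"; the exact count is an assumption
    for dtype in ("fp32", "bf16"):
        print(f"{dtype}: {weight_memory_gib(params, dtype):.2f} GiB")
```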

For more help, see [GitHub Issues](https://github.com/nvidia-cosmos/cosmos-predict2.5/issues)