# Cosmos-Predict **2.5** on Google Colab with Micromamba (Python 3.10)

This notebook sets up a **Python 3.10** environment and runs **Cosmos‑Predict 2.5**.

**What changed from 2.0 → 2.5**
- Targets **Ampere/Hopper/Blackwell** GPUs; Volta/Turing are not supported. Use **BF16** for inference.
- 2B 720p/16fps Video2World typically needs ~**32.5 GB VRAM**; use **480p/10fps** on ~24GB cards.
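The VRAM guidance above can be encoded as a small decision helper. This is a minimal sketch: `pick_video2world_variant` is a hypothetical name, the 32.5 GB cutoff comes from the text, and the 22 GiB lower cutoff is an assumption. It is there because a "24 GB" card reports somewhat less than 24 GiB when measured as `total_memory / 1024**3`, as the Step 1 probe does.

```python
# Hypothetical helper; thresholds follow the guidance above, with 22 GiB as an
# assumed floor for "~24 GB class" cards (they report slightly under 24 GiB).
def pick_video2world_variant(vram_gb: float):
    """Return (resolution, fps) for the 2B Video2World model, or None if VRAM is too small."""
    if vram_gb >= 32.5:
        return (720, 16)
    if vram_gb >= 22.0:
        return (480, 10)
    return None
```

Pass it the `vram_gb` value printed in Step 1. A `None` result means neither variant is expected to fit.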

**Why Micromamba?**
- Fast, clean Conda-compatible envs
- Lets us run **Python 3.10** alongside Colab’s default
- Easy to pin CUDA/Torch wheels

**Requirements**
- Colab **GPU** runtime (**A100/H100/H200** recommended). L4 (Ada) may work but is not officially listed; at 24 GB it needs the 480p/10fps variant.
- ~5–10 minutes for initial setup
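GPU support can be checked from the CUDA compute capability, which `torch.cuda.get_device_capability(0)` returns as `(major, minor)`. This is a sketch under stated assumptions: the mapping uses the published capabilities (7.0 Volta, 7.5 Turing, 8.0/8.6 Ampere, 8.9 Ada, 9.0 Hopper), treating every `major >= 10` as Blackwell is a simplification, and `classify_gpu_arch` is a hypothetical name.

```python
def classify_gpu_arch(major: int, minor: int) -> tuple[str, str]:
    """Map a CUDA compute capability to (architecture, support status for 2.5)."""
    if major == 7:
        # 7.0/7.2 are Volta-class, 7.5 is Turing; neither is supported by 2.5
        return ("Turing" if minor >= 5 else "Volta"), "unsupported"
    if major == 8:
        if minor == 9:
            return "Ada", "may work (not officially listed)"
        return "Ampere", "supported"
    if major == 9:
        return "Hopper", "supported"
    if major >= 10:
        # Assumption: treat all newer majors (10.x, 12.x) as Blackwell
        return "Blackwell", "supported"
    return "pre-Volta", "unsupported"
```

Inside a GPU runtime, call it as `classify_gpu_arch(*torch.cuda.get_device_capability(0))`.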

## Step 1: Check Current Environment

In [None]:
import sys, os, platform
print("🔍 Current Colab Environment:\n" + "="*60)
print(f"Python: {sys.version}")
print(f"Executable: {sys.executable}")
print(f"Platform: {platform.platform()}")

# GPU probe
try:
    import torch
    gpu_ok = torch.cuda.is_available()
except Exception:
    gpu_ok = False

if gpu_ok:
    name = torch.cuda.get_device_name(0)
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / 1024**3
    print(f"\nGPU: {name}")
    print(f"VRAM: {vram_gb:.1f} GB")
else:
    print("\n⚠️ No GPU detected. Enable one via Runtime → Change runtime type → Hardware accelerator (A100 or newer recommended; T4/V100 are not supported by 2.5)")

print("="*60)
print("\nWe will create a Python 3.10 env with Micromamba and install Torch (cu126) + Cosmos‑Predict 2.5.")

## Step 2: (Optional) Mount Google Drive for Autosave

In [None]:
from google.colab import drive
from datetime import datetime
import os, sys

MOUNT_DRIVE = True
if MOUNT_DRIVE:
    try:
        drive.mount('/content/drive')
        ts = datetime.now().strftime('%Y%m%d_%H%M%S')
        out_dir = f"/content/drive/MyDrive/cosmos_outputs_{ts}"
        os.makedirs(out_dir, exist_ok=True)
        os.environ['DRIVE_OUTPUT_DIR'] = out_dir
        print(f"✅ Drive mounted | Session dir: {out_dir}")
        with open(f"{out_dir}/session.txt", 'w') as f:
            f.write(f"Session: {datetime.now()}\n")
            f.write(f"Colab Python: {sys.version}\n")
    except Exception as e:
        print(f"⚠️ Could not mount Drive: {e}\nFiles will be local only.")
else:
    print("⏭️ Skipping Drive mount — outputs will be ephemeral.")

## Step 3: Install Micromamba

In [None]:
%%bash
set -euo pipefail
echo "📦 Installing Micromamba..."
curl -Ls https://micro.mamba.pm/api/micromamba/linux-64/latest | tar -xj -O bin/micromamba > /usr/local/bin/micromamba
chmod +x /usr/local/bin/micromamba
export MAMBA_ROOT_PREFIX=/content/micromamba
mkdir -p "$MAMBA_ROOT_PREFIX"
/usr/local/bin/micromamba --version
echo "✅ Micromamba installed"

## Step 4: Create a **Python 3.10** Environment

In [None]:
%%bash
set -euo pipefail
export MAMBA_ROOT_PREFIX=/content/micromamba
echo "🐍 Creating env: cosmos310 (Python 3.10)"
/usr/local/bin/micromamba create -y -n cosmos310 python=3.10 pip -c conda-forge
/usr/local/bin/micromamba run -n cosmos310 python --version
echo "✅ Env created"
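Re-running Step 4 fails if `cosmos310` already exists. This minimal sketch parses the output of `micromamba env list --json` to check first. It assumes that output has an `"envs"` list of absolute env prefixes, as conda's does. `env_exists` is a hypothetical name.

```python
import json
import os

def env_exists(env_list_json: str, name: str) -> bool:
    """True if an env named `name` appears in `micromamba env list --json` output.

    Assumes a conda-style schema: {"envs": ["/prefix", "/prefix/envs/<name>", ...]}.
    """
    envs = json.loads(env_list_json).get("envs", [])
    return any(os.path.basename(os.path.normpath(p)) == name for p in envs)
```

To use it, capture the JSON with `subprocess.run(["/usr/local/bin/micromamba", "env", "list", "--json"], capture_output=True, text=True, env={**os.environ, "MAMBA_ROOT_PREFIX": "/content/micromamba"})`. Then skip `micromamba create` when `env_exists(res.stdout, "cosmos310")` is true.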

## Step 5: Install **PyTorch (cu126)** and **Cosmos‑Predict 2.5**

In [None]:
%%bash
set -euo pipefail
export MAMBA_ROOT_PREFIX=/content/micromamba
echo "📦 Installing PyTorch cu126 (Torch 2.6 line)"
/usr/local/bin/micromamba run -n cosmos310 python - <<'PY'
import sys, subprocess
def pip(args):
    cmd = [sys.executable, '-m', 'pip'] + args
    print('>>>', ' '.join(cmd))
    subprocess.check_call(cmd)

# Torch cu126 wheels + matching torchvision/torchaudio
pip([
  'install', '--upgrade',
  '--index-url', 'https://download.pytorch.org/whl/cu126',
  'torch==2.6.0', 'torchvision==0.21.0', 'torchaudio==2.6.0'
])

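# Faster path: NVIDIA's prebuilt deps (flash-attn, Transformer Engine, NATTEN) from an extra index.
# Set USE_PREBUILT = '0' to skip that index (see the troubleshooting notes at the end).
USE_PREBUILT = '1'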
cosmos_pkg = 'cosmos-predict2[cu126]>=2.5,<2.6'
if USE_PREBUILT == '1':
    pip(['install', cosmos_pkg,
         '--extra-index-url', 'https://nvidia-cosmos.github.io/cosmos-dependencies/cu126_torch260/simple'])
else:
    pip(['install', 'cosmos-predict2>=2.5,<2.6'])

# Quality-of-life deps
pip(['install', 'transformers', 'accelerate', 'decord', 'einops', 'imageio[ffmpeg]', 'opencv-python-headless', 'pillow', 'huggingface_hub'])
print("✅ Installs complete")
PY

echo; echo "🔎 Torch & CUDA check"
/usr/local/bin/micromamba run -n cosmos310 python - <<'PY'
import torch
print('Torch:', torch.__version__)
print('CUDA:', torch.version.cuda)
print('CUDA available:', torch.cuda.is_available())
if torch.cuda.is_available():
    print('GPU:', torch.cuda.get_device_name(0))
    # BF16 capability probe
    cc_major = torch.cuda.get_device_capability(0)[0]
    print('BF16 supported (Ampere+ expected):', cc_major >= 8)
PY
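Wheels from the `cu126` index report versions with a local tag, such as `2.6.0+cu126`. This sketch splits that tag so the CUDA build can be compared with `torch.version.cuda`, and checks the torch/torchvision pairing pinned above. `parse_torch_version`, `torchvision_matches`, and the `TORCHVISION_FOR_TORCH` table are hypothetical names, and the table holds only the pair this notebook pins.

```python
import re

# torch -> torchvision pairs pinned in this notebook; extend if you change the pins.
TORCHVISION_FOR_TORCH = {"2.6.0": "0.21.0"}

def parse_torch_version(version: str):
    """Split '2.6.0+cu126' into ('2.6.0', '12.6'); CUDA part is None without a '+cuXYZ' tag."""
    base, _, local = version.partition("+")
    m = re.fullmatch(r"cu(\d+)(\d)", local)  # last digit is the CUDA minor version
    cuda = f"{m.group(1)}.{m.group(2)}" if m else None
    return base, cuda

def torchvision_matches(torch_version: str, torchvision_version: str) -> bool:
    """True if the installed torchvision is the one pinned for this torch release."""
    t, _ = parse_torch_version(torch_version)
    v, _ = parse_torch_version(torchvision_version)
    return TORCHVISION_FOR_TORCH.get(t) == v
```

Inside the env, `parse_torch_version(torch.__version__)[1] == torch.version.cuda` should hold for a correctly installed cu126 wheel.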

## Step 6: Verify Cosmos‑Predict 2.5 import

In [None]:
%%bash
export MAMBA_ROOT_PREFIX=/content/micromamba
/usr/local/bin/micromamba run -n cosmos310 python - <<'PY'
try:
    import cosmos_predict2
    from cosmos_predict2 import inference
    print('✅ cosmos_predict2 imported')
    # Show available helpers if present
    print('inference attrs:', [a for a in dir(inference) if 'Video' in a or 'World' in a][:8])
except Exception as e:
    print('❌ Import failed:', e)
PY
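Step 6 imports the package, which can be slow or have side effects. A lighter check only asks whether each module can be located, using `importlib.util.find_spec`. This is a sketch, and `missing_modules` is a hypothetical name. It must run inside `cosmos310` to be meaningful, for example via `run_cosmos_py` from Step 7. For dotted names, `find_spec` does import the parent package.

```python
import importlib.util

def missing_modules(names):
    """Return the names in `names` that cannot be located on sys.path."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:  # parent package of a dotted name is absent
            found = False
        if not found:
            missing.append(name)
    return missing
```

A typical call is `missing_modules(["torch", "cosmos_predict2", "einops", "imageio"])`, where an empty list means everything resolved.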

## Step 7: Helpers to run code inside the **cosmos310** env

In [None]:
import os, subprocess, tempfile

def run_cosmos_py(code, echo=True):
    """Run a Python source string inside the cosmos310 env; returns the CompletedProcess."""
    with tempfile.NamedTemporaryFile('w', suffix='.py', delete=False) as f:
        f.write(code)
        path = f.name
    try:
        env = os.environ.copy()
        env['MAMBA_ROOT_PREFIX'] = '/content/micromamba'
        cmd = ['/usr/local/bin/micromamba', 'run', '-n', 'cosmos310', 'python', path]
        # Output is captured, so it appears only after the script exits
        res = subprocess.run(cmd, text=True, capture_output=True, env=env)
        if echo:
            print(res.stdout)
            if res.returncode != 0:
                print(res.stderr)
        return res
    finally:
        try:
            os.unlink(path)
        except OSError:
            pass

def run_cosmos_cmd(cmd):
    """Run a shell command string inside the cosmos310 env."""
    full = f"export MAMBA_ROOT_PREFIX=/content/micromamba && /usr/local/bin/micromamba run -n cosmos310 {cmd}"
    return subprocess.run(full, shell=True)
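`run_cosmos_cmd` builds one shell string, so prompts containing quotes need careful escaping. A list-based variant avoids the shell entirely and can set the working directory. That matters for `python -m scripts...` and `python -m examples...`, which resolve relative to the repo root. This is a minimal sketch, and `micromamba_argv` and `run_in_env` are hypothetical names.

```python
import os
import subprocess

# Paths used throughout this notebook (Steps 3-4).
MICROMAMBA = "/usr/local/bin/micromamba"
MAMBA_ROOT = "/content/micromamba"

def micromamba_argv(env_name, *args):
    """Hypothetical helper: argv for `micromamba run -n <env> <args...>`, no shell involved."""
    return [MICROMAMBA, "run", "-n", env_name, *args]

def run_in_env(*args, env_name="cosmos310", cwd=None):
    """Hypothetical helper: run a command in the env and capture its output.

    `cwd` lets module paths like `scripts.*` or `examples.*` resolve from the repo root.
    """
    env = {**os.environ, "MAMBA_ROOT_PREFIX": MAMBA_ROOT}
    return subprocess.run(micromamba_argv(env_name, *args), env=env, cwd=cwd,
                          text=True, capture_output=True)
```

Here is an example call: `res = run_in_env("python", "-m", "scripts.download_checkpoints", "--model_types", "video2world", "--model_sizes", "2B", cwd="/content/cosmos-predict2.5")`, followed by `print(res.stdout[-2000:])`.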

## Step 8: **Download Model Checkpoints** (2 options)

**Option A (recommended): repo downloader.** Works with multiple resolutions/fps.

```bash
# Run from the repo root (e.g. /content/cosmos-predict2.5) so the `scripts` package is importable
python -m scripts.download_checkpoints --model_types video2world --model_sizes 2B            # 720p/16fps by default
python -m scripts.download_checkpoints --model_types video2world --model_sizes 2B --resolution 480 --fps 10   # 24GB GPUs
```
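When the resolution is chosen programmatically, the two commands above can be assembled from parameters. This sketch uses only the flags that appear in this notebook (`--model_types`, `--model_sizes`, `--resolution`, `--fps`, `--checkpoint_dir`). `download_checkpoints_argv` is a hypothetical name.

```python
def download_checkpoints_argv(resolution=None, fps=None, model_size="2B",
                              checkpoint_dir=None, python="python"):
    """Hypothetical helper: assemble the repo downloader command.

    Run the result from the repo root so `scripts.download_checkpoints` resolves.
    """
    argv = [python, "-m", "scripts.download_checkpoints",
            "--model_types", "video2world", "--model_sizes", model_size]
    if resolution is not None:
        argv += ["--resolution", str(resolution)]
    if fps is not None:
        argv += ["--fps", str(fps)]
    if checkpoint_dir is not None:
        argv += ["--checkpoint_dir", checkpoint_dir]
    return argv
```

For a ~24 GB card, `download_checkpoints_argv(resolution=480, fps=10)` reproduces the second command above.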

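The first cell below runs **Option A** (clone the repo, install it, run the downloader). The second cell is **Option B (fallback)**: a Hugging Face snapshot of Cosmos‑Predict2.5‑2B.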

In [None]:
%%bash
set -euo pipefail
export MAMBA_ROOT_PREFIX=/content/micromamba
echo "📥 Cloning cosmos-predict2.5 repo (for downloader/scripts)"
git clone --depth 1 https://github.com/nvidia-cosmos/cosmos-predict2.5 /content/cosmos-predict2.5 || true

echo "🔧 Installing repo (editable) to expose scripts/*"
/usr/local/bin/micromamba run -n cosmos310 python -m pip install -e /content/cosmos-predict2.5

echo "📦 Downloading 2B Video2World (default 720p/16fps). For 24GB GPUs, rerun with --resolution 480 --fps 10"
cd /content/cosmos-predict2.5  # run from the repo root so the `scripts` package is importable
/usr/local/bin/micromamba run -n cosmos310 python -m scripts.download_checkpoints --model_types video2world --model_sizes 2B --checkpoint_dir /content/cosmos_ckpts || true
echo "✅ Download step attempted (see logs above)."

In [None]:
# Fallback snapshot (if the script didn't fetch). Access requires accepting the model's license on
# Hugging Face and authenticating (e.g. `huggingface-cli login` inside the env, or an HF_TOKEN env var).
fallback_code = r'''
from huggingface_hub import snapshot_download
import os
dst = '/content/cosmos_ckpts_fallback'
os.makedirs(dst, exist_ok=True)
repo_id = 'nvidia/Cosmos-Predict2.5-2B'
# local_dir yields a plain directory tree; recent huggingface_hub versions resume downloads automatically
path = snapshot_download(repo_id=repo_id, local_dir=dst)
print('✅ HF snapshot at:', path)
with open('/content/cosmos_ckpt_path.txt', 'w') as f:
    f.write(path)
'''
run_cosmos_py(fallback_code)
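Plan B later picks the first `*.pt` match, and only falls back to `*.safetensors`. This sketch lists every candidate weight file under a checkpoint root in a deterministic order, which makes the choice visible. Putting `.safetensors` first is a preference of this sketch, not the notebook's order, and `find_checkpoints` is a hypothetical name.

```python
from pathlib import Path

def find_checkpoints(root):
    """List weight files under `root`: .safetensors first, then .pt, each group sorted by path."""
    root = Path(root)
    if not root.is_dir():
        return []
    found = []
    for pattern in ("*.safetensors", "*.pt"):
        found += sorted(str(p) for p in root.rglob(pattern) if p.is_file())
    return found
```

Printing `find_checkpoints('/content/cosmos_ckpts_fallback')` after the snapshot shows which files Plan B could load.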

## Step 9: Generate a Test Video (CLI example, BF16)

In [None]:
%%bash
set -euo pipefail
export MAMBA_ROOT_PREFIX=/content/micromamba
echo "🖼️ Creating a simple input image"
python - <<'PY'
from PIL import Image
import numpy as np
img = np.ones((704,1280,3), dtype=np.uint8)*96
img[220:500, 420:860] = [210,150,100]
Image.fromarray(img).save('/content/input0.jpg')
print('saved /content/input0.jpg')
PY

echo "🎬 Running examples.video2world (2B, 720p/16fps). If you OOM, download 480p/10fps and add --resolution 480 --fps 10."
cd /content/cosmos-predict2.5  # run from the repo root so the `examples` package is importable
/usr/local/bin/micromamba run -n cosmos310 python - <<'PY'
import subprocess, sys
cmd = [sys.executable, '-m', 'examples.video2world',
       '--model_size','2B',
       '--input_path','/content/input0.jpg',
       '--num_conditional_frames','1',
       '--prompt','A robotic arm moves smoothly across the table, picking up objects',
       '--save_path','/content/cosmos_output.mp4',
       '--use_bf16']
print('>>>', ' '.join(cmd))
try:
    subprocess.check_call(cmd)
    print('✅ Video saved: /content/cosmos_output.mp4')
except subprocess.CalledProcessError:
    print('❌ CLI failed (likely a missing checkpoint variant). Try the direct pipeline (Plan B) next.')
PY

### Plan B: Direct pipeline call with BF16 autocast

In [None]:
gen_py = r'''
import os, glob, numpy as np, torch
from PIL import Image
from einops import rearrange
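# These names may differ between releases; if this import fails, mirror the imports in the repo's examples/video2world.py
from cosmos_predict2.inference import Video2WorldPipeline, get_cosmos_predict2_video2world_pipeline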
import imageio

ckpt_dir = '/content/cosmos_ckpts' if os.path.exists('/content/cosmos_ckpts') else None
if not ckpt_dir and os.path.exists('/content/cosmos_ckpts_fallback'):
    ckpt_dir = open('/content/cosmos_ckpt_path.txt').read().strip()
print('Using checkpoint root:', ckpt_dir)

config = get_cosmos_predict2_video2world_pipeline(model_size='2B')
if ckpt_dir:
    cands = glob.glob(os.path.join(ckpt_dir,'**','*.pt'), recursive=True) or \
            glob.glob(os.path.join(ckpt_dir,'**','*.safetensors'), recursive=True)
    if cands:
        config['dit_checkpoint_path'] = cands[0]
        print('Model checkpoint:', os.path.basename(cands[0]))

pipe = Video2WorldPipeline.from_config(config).to('cuda').eval()

# Read/create input
if not os.path.exists('/content/input0.jpg'):
    img = np.ones((704,1280,3), dtype=np.uint8)*96
    Image.fromarray(img).save('/content/input0.jpg')
img = Image.open('/content/input0.jpg').convert('RGB')        # ensure 3 channels
frames = np.array(img)[None,...]                               # (T=1, H, W, C)
frames = torch.from_numpy(frames).float()/255.0                # scale to [0, 1]
frames = rearrange(frames, 't h w c -> 1 c t h w').to('cuda')  # (B, C, T, H, W)

prompt = 'A robotic arm moves smoothly across the table, picking up objects'
num_frames = 16

with torch.inference_mode():
    with torch.autocast('cuda', dtype=torch.bfloat16):
        out = pipe(frames, prompt=prompt, num_frames=num_frames, fps=16, seed=42)

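# bfloat16 tensors cannot be converted to NumPy directly, so upcast to float32 first
video = out if isinstance(out, np.ndarray) else out.detach().float().cpu().numpy()
if video.ndim == 5: video = video[0]                             # drop the batch dim
if video.shape[0] == 3: video = np.transpose(video, (1,2,3,0))   # (C,T,H,W) -> (T,H,W,C)
video = video.astype(np.float32)
if video.min() < 0: video = (video + 1) / 2                      # assumes a [-1, 1] output range
if video.max() <= 1.0: video = video * 255
video = np.clip(video, 0, 255).astype(np.uint8)
with imageio.get_writer('/content/cosmos_output_bf16.mp4', fps=16) as writer:
    for frame in video:
        writer.append_data(frame)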
print('✅ Saved /content/cosmos_output_bf16.mp4')
'''
run_cosmos_py(gen_py)
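The tensor layout in Plan B is easy to get wrong. This NumPy-only sketch shows what `rearrange(frames, 't h w c -> 1 c t h w')` does for a single image, and adds the inverse used when saving frames. It assumes the same [0, 1] float range Plan B uses for its input. `image_to_bcthw` and `bcthw_to_uint8_frames` are hypothetical names.

```python
import numpy as np

def image_to_bcthw(img: np.ndarray) -> np.ndarray:
    """H x W x 3 uint8 image -> float32 array of shape (1, 3, 1, H, W) in [0, 1]."""
    if img.ndim != 3 or img.shape[-1] != 3:
        raise ValueError(f"expected an HxWx3 image, got shape {img.shape}")
    frames = img[None].astype(np.float32) / 255.0     # (T=1, H, W, C)
    return np.transpose(frames, (3, 0, 1, 2))[None]   # (C, T, H, W) -> (1, C, T, H, W)

def bcthw_to_uint8_frames(video: np.ndarray) -> np.ndarray:
    """(1, C, T, H, W) floats in [0, 1] -> (T, H, W, C) uint8 frames for a video writer."""
    frames = np.transpose(video[0], (1, 2, 3, 0))     # (C, T, H, W) -> (T, H, W, C)
    return (np.clip(frames, 0.0, 1.0) * 255).round().astype(np.uint8)
```

The round trip is lossless for uint8 input, so the input image should reappear unchanged as a one-frame video.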

## Step 10: Display the Generated Video

In [None]:
import os, base64
from IPython.display import HTML, Image, display
if os.path.exists('/content/input0.jpg'):
    print('📸 Input image:')
    display(Image('/content/input0.jpg', width=400))
paths = [p for p in ['/content/cosmos_output.mp4','/content/cosmos_output_bf16.mp4'] if os.path.exists(p)]
if paths:
    p = paths[-1]
    print('🎥 Generated video:', p)
    with open(p, 'rb') as fh:
        enc = base64.b64encode(fh.read()).decode('ascii')
    display(HTML(f"<video width='640' height='360' controls autoplay loop><source src='data:video/mp4;base64,{enc}' type='video/mp4'></video>"))
    print('🎉 Success')
else:
    print('❌ No video found. Check logs above for OOM or missing checkpoints.')
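The embedding above can be factored into a function. This version omits the fixed height, so the browser keeps the video's aspect ratio. It is a sketch, and `mp4_video_html` is a hypothetical name. As an alternative, `IPython.display.Video(p, embed=True)` embeds a file without hand-written HTML. Either way, large videos make the saved notebook much bigger.

```python
import base64

def mp4_video_html(data: bytes, width: int = 640) -> str:
    """Return an HTML <video> tag embedding MP4 bytes as a base64 data URI."""
    enc = base64.b64encode(data).decode("ascii")
    return (f"<video width='{width}' controls autoplay loop>"
            f"<source src='data:video/mp4;base64,{enc}' type='video/mp4'></video>")
```

For example, `display(HTML(mp4_video_html(open(p, 'rb').read())))` renders the same player as the cell above.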

## Step 11: Save Results to Google Drive (if mounted)

In [None]:
import os, shutil
out_dir = os.environ.get('DRIVE_OUTPUT_DIR')
if out_dir and os.path.exists(out_dir):
    for p in ['/content/cosmos_output.mp4','/content/cosmos_output_bf16.mp4','/content/input0.jpg']:
        if os.path.exists(p):
            dst = os.path.join(out_dir, os.path.basename(p))
            shutil.copy2(p, dst)
            print('☁️ Saved to Drive:', dst)
else:
    print('ℹ️ Drive not mounted; skipping backup.')
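The backup loop above can also be written as a reusable function that reports what it copied. This is convenient if you generate several variants in one session. It is a minimal sketch, and `backup_files` is a hypothetical name.

```python
import os
import shutil

def backup_files(paths, dest_dir):
    """Copy each existing file in `paths` into `dest_dir`; return the destination paths written."""
    if not dest_dir or not os.path.isdir(dest_dir):
        return []
    written = []
    for p in paths:
        if os.path.isfile(p):
            dst = os.path.join(dest_dir, os.path.basename(p))
            shutil.copy2(p, dst)  # copy2 also preserves timestamps
            written.append(dst)
    return written
```

A typical call is `backup_files(['/content/cosmos_output.mp4', '/content/input0.jpg'], os.environ.get('DRIVE_OUTPUT_DIR'))`, which returns an empty list when Drive was not mounted.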

## ✅ Notes, Tips & Troubleshooting

- **Python version**: Project scripts and docs assume **Python 3.10** (Conda examples even symlink headers under `python3.10`).
- **GPU & precision**: 2.5 officially lists **Ampere/Hopper/Blackwell** and **BF16** for inference; 2B 720p/16fps needs ~**32.5 GB** VRAM.
- **Lower VRAM path**: use the downloader with `--resolution 480 --fps 10` for ~24 GB class cards.
- **If `examples.video2world` fails**: fall back to the direct pipeline cell (Plan B), which sets BF16 autocast.
- **If wheels conflict**: set `USE_PREBUILT='0'` in Step 5 to skip NVIDIA's extra index, and follow NVIDIA's docs for building flash‑attn, Transformer Engine, and NATTEN from source.