# Cosmos Predict2 on Google Colab with Micromamba (Python 3.10)

This notebook uses Micromamba to create a Python 3.10 environment for Cosmos-Predict2.

**Why Micromamba?**
- Fast and lightweight conda alternative
- Can install Python 3.10 alongside any Colab runtime
- Clean environment management
- Works regardless of which Python version (3.11 or 3.12) the Colab runtime ships

**Requirements:**
- Google Colab with GPU (A100, V100, or T4)
- ~10 minutes for initial setup (environment creation, package installs, and checkpoint download)
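
The GPU requirement can be checked before any packages are installed. The sketch below is one way to do that without PyTorch, by calling the `nvidia-smi` CLI and parsing its CSV output. The function names `parse_gpu_list` and `detect_gpus` are illustrative and not part of the notebook.

```python
import subprocess


def parse_gpu_list(smi_output):
    """Parse `name, memory.total` CSV lines into (name, memory_gib) tuples."""
    gpus = []
    for line in smi_output.strip().splitlines():
        if not line.strip():
            continue
        # rsplit keeps any commas inside the GPU name intact
        name, mem_mib = line.rsplit(",", 1)
        gpus.append((name.strip(), int(mem_mib) / 1024))
    return gpus


def detect_gpus():
    """Return [(name, memory_gib), ...], or [] if nvidia-smi is unavailable."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=name,memory.total",
        "--format=csv,noheader,nounits",  # memory is reported in MiB
    ]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return parse_gpu_list(result.stdout)


for name, mem in detect_gpus():
    print(f"{name}: {mem:.1f} GiB")
```

An empty list means no GPU runtime is attached, in which case the GPU should be enabled in the Runtime settings before continuing.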

## Step 1: Check Current Environment

In [None]:
import sys
import os

print("🔍 Current Colab Environment:")
print("="*60)
print(f"Python version: {sys.version}")
print(f"Python path: {sys.executable}")

# Check GPU
try:
    import torch
    if torch.cuda.is_available():
        gpu_name = torch.cuda.get_device_name(0)
        gpu_memory = torch.cuda.get_device_properties(0).total_memory / 1024**3
        print(f"\nGPU: {gpu_name}")
        print(f"Memory: {gpu_memory:.1f} GB")
        print(f"CUDA: {torch.version.cuda}")
    else:
        print("\n⚠️ No GPU detected - Please enable GPU in Runtime settings")
        gpu_memory = 0
        gpu_name = "None"
except ImportError:
    print("\nPyTorch not installed in Colab environment (this is fine)")
    gpu_memory = 16  # Default assumption
    gpu_name = "Unknown"

print("="*60)
print("\n✅ We'll install Python 3.10 with Micromamba for Cosmos-Predict2")

## Step 2: Mount Google Drive

In [None]:
from google.colab import drive
from datetime import datetime
import os
import sys  # needed for sys.version in the session log below

# Mount Google Drive for backup
mount_drive = True  # Set to False only if you don't want backups

if mount_drive:
    try:
        drive.mount('/content/drive')
        
        # Create session directory
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        drive_output_dir = f"/content/drive/MyDrive/cosmos_outputs_{timestamp}"
        os.makedirs(drive_output_dir, exist_ok=True)
        
        print(f"✅ Google Drive mounted successfully")
        print(f"📁 Session directory: {drive_output_dir}")
        print(f"💾 All outputs will be auto-saved here")
        
        # Save session start info
        with open(f"{drive_output_dir}/session.txt", 'w') as f:
            f.write(f"Session started: {datetime.now()}\n")
            f.write(f"Colab Python: {sys.version}\n")
            if 'gpu_name' in locals():
                f.write(f"GPU: {gpu_name}\n")
        
        # Store for later use
        os.environ['DRIVE_OUTPUT_DIR'] = drive_output_dir
        
    except Exception as e:
        print(f"⚠️ Could not mount Drive: {e}")
        print("Your work will only be saved locally (may be lost if session disconnects)")
        drive_output_dir = None
else:
    print("⚠️ Skipping Drive mount - no automatic backups!")
    print("⚠️ Generated files will be LOST if session disconnects!")
    drive_output_dir = None

## Step 3: Install Micromamba

In [None]:
%%bash
echo "📦 Installing Micromamba..."
echo ""

# Download and install micromamba
curl -L https://micro.mamba.pm/api/micromamba/linux-64/latest | \
  tar -xvj bin/micromamba -O > /usr/local/bin/micromamba 2>/dev/null

chmod +x /usr/local/bin/micromamba

# Initialize micromamba
export MAMBA_ROOT_PREFIX=/content/micromamba
mkdir -p $MAMBA_ROOT_PREFIX

# Verify installation
/usr/local/bin/micromamba --version

echo ""
echo "✅ Micromamba installed successfully!"

## Step 4: Create Python 3.10 Environment

In [None]:
%%bash
echo "🐍 Creating Python 3.10 environment..."
echo ""

export MAMBA_ROOT_PREFIX=/content/micromamba

# Create environment with Python 3.10
/usr/local/bin/micromamba create -y -n cosmos310 python=3.10 pip -c conda-forge

# Verify Python version
echo ""
echo "Verifying Python version:"
/usr/local/bin/micromamba run -n cosmos310 python --version

echo ""
echo "✅ Python 3.10 environment created!"

## Step 5: Install Cosmos-Predict2 Dependencies

In [None]:
%%bash
echo "📦 Installing PyTorch and CUDA dependencies..."
echo "This will take 2-3 minutes..."
echo ""

export MAMBA_ROOT_PREFIX=/content/micromamba

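# Install PyTorch with CUDA support (CUDA 12.6 build, matching the cu126 / torch 2.6.0
# wheels that the Cosmos-Predict2 install in the next cell pulls from)
/usr/local/bin/micromamba run -n cosmos310 pip install \
  torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 \
  --index-url https://download.pytorch.org/whl/cu126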

echo ""
echo "✅ PyTorch installed"

# Verify CUDA
echo ""
echo "Checking CUDA availability:"
/usr/local/bin/micromamba run -n cosmos310 python -c \
  "import torch; print(f'CUDA available: {torch.cuda.is_available()}'); print(f'PyTorch version: {torch.__version__}')"

In [None]:
%%bash
echo "📦 Installing Cosmos-Predict2..."
echo "This will take 3-4 minutes..."
echo ""

export MAMBA_ROOT_PREFIX=/content/micromamba

# Install Cosmos-Predict2
/usr/local/bin/micromamba run -n cosmos310 pip install \
  "cosmos-predict2[cu126]" \
  --extra-index-url https://nvidia-cosmos.github.io/cosmos-dependencies/cu126_torch260/simple

echo ""
echo "✅ Cosmos-Predict2 installed"

# Install additional dependencies
echo ""
echo "Installing additional dependencies..."
/usr/local/bin/micromamba run -n cosmos310 pip install \
  transformers accelerate bitsandbytes \
  decord einops "imageio[ffmpeg]" \
  opencv-python-headless pillow

echo ""
echo "✅ All dependencies installed!"

## Step 6: Verify Installation

In [None]:
%%bash
echo "🔍 Verifying installation..."
printf '=%.0s' {1..60}; echo  # bash has no "="*60; print 60 '=' characters
echo ""

export MAMBA_ROOT_PREFIX=/content/micromamba

# Run verification script
/usr/local/bin/micromamba run -n cosmos310 python -c "
import sys
print(f'Python version: {sys.version}')
print(f'Python path: {sys.executable}')
print()

# Check PyTorch
try:
    import torch
    print(f'✅ PyTorch {torch.__version__}')
    print(f'   CUDA available: {torch.cuda.is_available()}')
    if torch.cuda.is_available():
        print(f'   GPU: {torch.cuda.get_device_name(0)}')
except ImportError as e:
    print(f'❌ PyTorch import failed: {e}')

# Check Cosmos-Predict2
try:
    from cosmos_predict2.inference import Video2WorldPipeline
    print('✅ Cosmos-Predict2 imported successfully!')
except ImportError as e:
    print(f'❌ Cosmos-Predict2 import failed: {e}')

# Check other dependencies
try:
    import transformers
    print(f'✅ Transformers {transformers.__version__}')
except ImportError:
    print('❌ Transformers not found')
"

echo ""
printf '=%.0s' {1..60}; echo

## Step 7: Create Helper Functions

In [None]:
import subprocess
import tempfile
import os
import json

def run_cosmos(code, return_output=False, verbose=True):
    """
    Run Python code in the Cosmos Python 3.10 environment.
    
    Args:
        code: Python code string to execute
        return_output: If True, return output as string
        verbose: If True, print output
    
    Returns:
        Output string if return_output=True, else None
    """
    # Write code to temporary file
    with tempfile.NamedTemporaryFile(mode='w', suffix='.py', delete=False) as f:
        f.write(code)
        temp_file = f.name
    
    try:
        # Run with micromamba
        cmd = [
            '/usr/local/bin/micromamba', 'run', '-n', 'cosmos310',
            'python', temp_file
        ]
        
        env = os.environ.copy()
        env['MAMBA_ROOT_PREFIX'] = '/content/micromamba'
        
        result = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            env=env
        )
        
        if result.returncode != 0:
            print(f"❌ Error running code:")
            print(result.stderr)
            return None
        
        if verbose and result.stdout:
            print(result.stdout)
        
        if return_output:
            return result.stdout
            
    finally:
        # Clean up temp file
        if os.path.exists(temp_file):
            os.unlink(temp_file)

def run_cosmos_command(command):
    """
    Run a shell command in the Cosmos environment.
    
    Args:
        command: Shell command to execute
    """
    full_cmd = f"export MAMBA_ROOT_PREFIX=/content/micromamba && /usr/local/bin/micromamba run -n cosmos310 {command}"
    !{full_cmd}

# Test the helper
print("Testing helper function...\n")
test_code = """
import sys
print(f"✅ Running in Python {sys.version_info.major}.{sys.version_info.minor}")
print("Hello from Cosmos environment!")
"""
run_cosmos(test_code)
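
`run_cosmos` captures all output and prints it only after the process exits, so a multi-minute generation shows nothing until it finishes. A possible alternative, sketched below, reads the child's output line by line. `run_streaming` is a hypothetical helper, and the demo uses the current interpreter as a stand-in for the micromamba environment.

```python
import subprocess
import sys


def run_streaming(cmd, env=None):
    """Run cmd, echoing each output line as it arrives; return (returncode, lines)."""
    lines = []
    with subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so progress and errors both show up
        text=True,
        bufsize=1,  # line-buffered on the reading side
        env=env,
    ) as proc:
        for line in proc.stdout:
            print(line, end="")
            lines.append(line.rstrip("\n"))
    # Leaving the with-block waits for the process, so returncode is set here
    return proc.returncode, lines


# Stand-in command: the current interpreter instead of the Cosmos environment
code, out = run_streaming(
    [sys.executable, "-u", "-c", "print('step 1'); print('step 2')"]
)
```

To target the Cosmos environment, the command list would instead start with `/usr/local/bin/micromamba run -n cosmos310 python -u` and `env` would carry `MAMBA_ROOT_PREFIX`. How promptly lines arrive then also depends on how `micromamba run` forwards the child's streams, which this sketch does not verify.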

## Step 8: Download Model Checkpoints

In [None]:
# Download models using the Cosmos environment.
# Raw f-string: keeps "\n" escapes literal so the generated script stays valid Python.
download_code = rf"""
from huggingface_hub import snapshot_download
import torch

# Detect GPU and select model size
if torch.cuda.is_available():
    gpu_memory = torch.cuda.get_device_properties(0).total_memory / 1024**3
    gpu_name = torch.cuda.get_device_name(0)
    print(f"GPU: {{gpu_name}} ({{gpu_memory:.1f}} GB)")
else:
    gpu_memory = 16
    print("No GPU detected, using defaults")

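# Select model size. Only 2B and 14B appear in the GPU memory guide; the cutoff is
# 38 rather than 40 because a 40 GB card reports roughly 39.5 GiB of total memory.
if gpu_memory >= 38:
    MODEL_SIZE = "14B"
else:
    MODEL_SIZE = "2B"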

print(f"\nSelected Cosmos-{{MODEL_SIZE}} model")
print("Downloading checkpoint (this may take 2-5 minutes)...\n")

# Download
# Interrupted downloads resume automatically; the resume_download flag is deprecated
checkpoint_dir = snapshot_download(
    repo_id=f"nvidia/Cosmos-Predict2-{{MODEL_SIZE}}-Video2World",
    cache_dir="/content/cosmos_checkpoints",
)

print(f"\n✅ Checkpoint downloaded to: {{checkpoint_dir}}")

# Save config for later use
with open('/content/cosmos_config.txt', 'w') as f:
    f.write(f"{{MODEL_SIZE}},{{checkpoint_dir}}")
"""

print("📥 Downloading Cosmos model checkpoint...\n")
run_cosmos(download_code)

# Read the config
if os.path.exists('/content/cosmos_config.txt'):
    with open('/content/cosmos_config.txt', 'r') as f:
        MODEL_SIZE, checkpoint_dir = f.read().strip().split(',')
    print(f"\nUsing Cosmos-{MODEL_SIZE} model")
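
The download cell passes `MODEL_SIZE` and the checkpoint path back to the notebook through a comma-separated text file, which would break if the path ever contained a comma. A minimal sketch of the same hand-off using JSON follows. The helper names and the `.json` file name are assumptions, not part of the notebook.

```python
import json
import os
import tempfile


def write_config(path, model_size, checkpoint_dir):
    """Store the values as JSON so commas or spaces in the path are harmless."""
    with open(path, "w") as f:
        json.dump({"model_size": model_size, "checkpoint_dir": checkpoint_dir}, f)


def read_config(path):
    """Return (model_size, checkpoint_dir) from a file written by write_config."""
    with open(path) as f:
        cfg = json.load(f)
    return cfg["model_size"], cfg["checkpoint_dir"]


# Round-trip demo in a temporary directory (hypothetical path containing a comma)
with tempfile.TemporaryDirectory() as tmp:
    cfg_path = os.path.join(tmp, "cosmos_config.json")
    write_config(cfg_path, "2B", "/content/cosmos_checkpoints/run,1")
    model_size, checkpoint_dir = read_config(cfg_path)
    print(model_size, checkpoint_dir)
```

Both sides of the hand-off (the script run inside the environment and the notebook cell) would need to use the same pair of functions.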

## Step 9: Generate a Test Video

In [None]:
# Complete generation script.
# Raw f-string: keeps "\n" escapes literal so the generated script stays valid Python.
generation_script = rf"""
import os
import torch
import numpy as np
from PIL import Image
from transformers import T5EncoderModel, T5Tokenizer
from cosmos_predict2.inference import (
    Video2WorldPipeline,
    get_cosmos_predict2_video2world_pipeline,
)
from einops import rearrange
import imageio

print("🚀 Starting Cosmos-Predict2 generation...\n")

# Read config
with open('/content/cosmos_config.txt', 'r') as f:
    MODEL_SIZE, checkpoint_dir = f.read().strip().split(',')

print(f"Using Cosmos-{{MODEL_SIZE}} model")

# Load T5 text encoder
print("Loading T5 text encoder...")
t5_model = "google/flan-t5-base"  # Small model for testing
tokenizer = T5Tokenizer.from_pretrained(t5_model)
text_encoder = T5EncoderModel.from_pretrained(t5_model).half().to("cuda")
text_encoder.eval()
print("✅ T5 loaded\n")

# Load Cosmos pipeline
print("Loading Cosmos pipeline...")
config = get_cosmos_predict2_video2world_pipeline(model_size=MODEL_SIZE)

# Find model file
import glob
model_files = glob.glob(f"{{checkpoint_dir}}/**/*.pt", recursive=True)
if model_files:
    config['dit_checkpoint_path'] = model_files[0]
    print(f"Using model: {{os.path.basename(model_files[0])}}")

cosmos_pipe = Video2WorldPipeline.from_config(config)
cosmos_pipe = cosmos_pipe.to("cuda")
cosmos_pipe.eval()
print("✅ Cosmos pipeline loaded\n")

# Create test input image
print("Creating test input...")
img = np.ones((720, 1280, 3), dtype=np.uint8) * 100
img[200:520, 400:880, :] = [200, 150, 100]  # Add colored rectangle
img[300:400, 500:700, :] = [50, 100, 200]   # Add another shape
test_image = Image.fromarray(img)
test_image.save('/content/test_input.jpg')
print("✅ Test input created\n")

# Encode text prompt
prompt = "A robotic arm moves smoothly across the table, picking up objects"
print(f"Prompt: {{prompt}}\n")

inputs = tokenizer(prompt, return_tensors="pt", max_length=77,
                  padding="max_length", truncation=True).to("cuda")
with torch.no_grad():
    text_embeddings = text_encoder(**inputs).last_hidden_state

# Prepare input image
frames = np.array(test_image)[np.newaxis, ...]
frames_tensor = torch.from_numpy(frames).float() / 255.0
frames_tensor = rearrange(frames_tensor, "t h w c -> 1 c t h w").to("cuda")

# Generate video
num_frames = 16  # Start with 16 frames for quick test
print(f"Generating {{num_frames}} frames...")

with torch.no_grad():
    with torch.autocast("cuda"):  # replaces the deprecated torch.cuda.amp.autocast()
        output = cosmos_pipe(
            frames_tensor,
            text_embeddings,
            num_frames=num_frames,
            fps=8,
            seed=42
        )

print("✅ Video generated!\n")

# Convert and save video
if isinstance(output, torch.Tensor):
    video = output.cpu().numpy()
else:
    video = output

# Fix dimensions
if video.ndim == 5:
    video = video[0]
if video.shape[0] == 3:
    video = np.transpose(video, (1, 2, 3, 0))
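if video.dtype != np.uint8:
    if video.max() <= 1.0:
        video = video * 255  # scale [0, 1] floats to [0, 255]
    # Always hand uint8 frames to the video writer
    video = np.clip(video, 0, 255).astype(np.uint8)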

# Save video
output_path = '/content/cosmos_output.mp4'
writer = imageio.get_writer(output_path, fps=8)
for frame in video:
    writer.append_data(frame)
writer.close()

print(f"✅ Video saved to: {{output_path}}")
print(f"Generated {{len(video)}} frames at resolution {{video[0].shape[:2]}}")
print(f"\nMemory used: {{torch.cuda.memory_allocated()/1024**3:.2f}} GB")
"""

print("🎬 Generating test video with Cosmos-Predict2...\n")
print("="*60)
run_cosmos(generation_script)
print("="*60)

## Step 10: Display Generated Video

In [None]:
import os
from IPython.display import HTML, Image, display
import base64

# Display input image
if os.path.exists('/content/test_input.jpg'):
    print("📸 Input image:")
    display(Image('/content/test_input.jpg', width=400))
    print()

# Display generated video
if os.path.exists('/content/cosmos_output.mp4'):
    print("🎥 Generated video:")
    
    with open('/content/cosmos_output.mp4', 'rb') as f:
        video_data = f.read()
    
    encoded = base64.b64encode(video_data).decode('ascii')
    
    display(HTML(f'''
    <video width="640" height="360" controls autoplay loop>
        <source src="data:video/mp4;base64,{encoded}" type="video/mp4">
        Your browser does not support the video tag.
    </video>
    '''))
    
    print("\n🎉 Success! Cosmos-Predict2 is working correctly!")
    
    # Download button
    from google.colab import files
    download = input("\nDownload the video? (y/n): ")
    if download.lower() == 'y':
        files.download('/content/cosmos_output.mp4')
else:
    print("❌ Video file not found. Check the output from the previous cell.")

## Step 11: Save Results to Google Drive

Copy the generated video, the input image, and a short metadata file to Google Drive so they survive a session disconnect. Saving is kept separate from the Drive mount in Step 2 so that it never interrupts generation.
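
Since the goal is to prevent data loss, it can be worth confirming that a copy is intact before relying on it. The sketch below is one way to do this with a checksum comparison. `backup_verified` is a hypothetical helper, and the demo uses temporary directories in place of `/content` and the Drive folder.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large videos are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_verified(src, dest_dir):
    """Copy src into dest_dir and confirm the copy matches byte for byte."""
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, os.path.basename(src))
    shutil.copy2(src, dest)
    if sha256_of(src) != sha256_of(dest):
        raise IOError(f"Backup of {src} does not match the original: {dest}")
    return dest


# Demo with temporary directories standing in for /content and the Drive folder
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "cosmos_output.mp4")
    with open(src, "wb") as f:
        f.write(b"not a real video")
    copied = backup_verified(src, os.path.join(tmp, "drive"))
    copied_ok = os.path.exists(copied)
```

On a mounted Drive folder the read-back may be served from a local cache, so a matching checksum shows the copy was written correctly but not that it has finished syncing.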

In [None]:
import os
import shutil
from datetime import datetime

# Check if Drive was mounted earlier
drive_output_dir = os.environ.get('DRIVE_OUTPUT_DIR', None)

if drive_output_dir and os.path.exists(drive_output_dir):
    print(f"📁 Saving to Drive: {drive_output_dir}\n")
    
    saved_files = []
    
    # Save generated video
    if os.path.exists('/content/cosmos_output.mp4'):
        output_name = f"cosmos_output_{datetime.now().strftime('%H%M%S')}.mp4"
        drive_path = os.path.join(drive_output_dir, output_name)
        shutil.copy2('/content/cosmos_output.mp4', drive_path)
        saved_files.append(output_name)
        print(f"✅ Saved video: {output_name}")
    
    # Save input image
    if os.path.exists('/content/test_input.jpg'):
        input_name = f"test_input_{datetime.now().strftime('%H%M%S')}.jpg"
        drive_path = os.path.join(drive_output_dir, input_name)
        shutil.copy2('/content/test_input.jpg', drive_path)
        saved_files.append(input_name)
        print(f"✅ Saved input: {input_name}")
    
    # Save generation metadata
    metadata_file = os.path.join(drive_output_dir, f"generation_{datetime.now().strftime('%H%M%S')}.txt")
    with open(metadata_file, 'w') as f:
        f.write(f"Generation completed: {datetime.now()}\n")
        f.write(f"Model: Cosmos-{MODEL_SIZE if 'MODEL_SIZE' in locals() else 'Unknown'}\n")
        f.write(f"Prompt: A robotic arm moves smoothly across the table, picking up objects\n")
        f.write(f"Frames: 16\n")
        f.write(f"FPS: 8\n")
        f.write(f"\nFiles saved:\n")
        for file in saved_files:
            f.write(f"  - {file}\n")
    
    print(f"✅ Saved metadata: {os.path.basename(metadata_file)}")
    
    print(f"\n💾 All files backed up to Google Drive!")
    print(f"📍 Location: {drive_output_dir}")
    print("\n✨ Your work is safe even if the session disconnects!")
    
elif drive_output_dir:
    print(f"⚠️ Drive directory not found: {drive_output_dir}")
    print("Files may not have been saved to Drive")
else:
    print("⚠️ Google Drive was not mounted earlier")
    print("Files are only available in this session and will be lost if it disconnects!")
    print("\nTo save files now:")
    print("1. Run Step 2 (Mount Google Drive)")
    print("2. Run this cell again")

## 🎉 Complete! 

You now have Cosmos-Predict2 running on Google Colab with Python 3.10!

### How to Use:

1. **Run any Cosmos code** using the helper function:
```python
code = """
# Your Cosmos-Predict2 code here
from cosmos_predict2.inference import Video2WorldPipeline
# ...
"""
run_cosmos(code)
```

2. **Run commands** in the environment:
```python
run_cosmos_command("pip list")  # List installed packages
run_cosmos_command("python script.py")  # Run a script
```

### Tips:
- Save outputs to `/content/` to access from the notebook
- Mount Google Drive to preserve outputs
- Increase `num_frames` for longer videos (16, 61, 121)
- Use different prompts for various motions

### GPU Memory Guide:
| GPU | Memory | Model | Max Frames |
|-----|--------|-------|------------|
| A100 | 40GB | 14B | 121 frames |
| V100 | 16GB | 2B | 30-60 frames |
| T4 | 16GB | 2B | 16-30 frames |
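
The guide above can be expressed as a small lookup. This is a sketch only: `recommend_settings` is a hypothetical function, it returns the upper end of each frame range, it falls back to the T4 row for unrecognised GPUs, and the 38 GiB cutoff is an assumption of this sketch (a 40 GB A100 typically reports slightly under 40 GiB).

```python
def recommend_settings(gpu_memory_gib, gpu_name=""):
    """Return (model_size, max_frames) following the GPU memory guide."""
    if gpu_memory_gib >= 38:  # A100 row
        return "14B", 121
    if "V100" in gpu_name:  # V100 row
        return "2B", 60
    return "2B", 30  # T4 row, also used as the conservative default


print(recommend_settings(14.7, "Tesla T4"))
```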

### Troubleshooting:
- **Import errors**: Check the Step 6 verification output
- **OOM errors**: Reduce `num_frames` or use smaller model
- **No GPU**: Enable GPU in Runtime settings
- **Slow download**: Models are large (2-10GB), be patient
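
The advice for OOM errors (reduce `num_frames`) can be automated with a simple fallback loop. The sketch below assumes a hypothetical `generate_fn(num_frames)` callable that raises `MemoryError` when generation runs out of memory, and uses a stand-in that simulates this.

```python
def generate_with_backoff(generate_fn, frame_options=(121, 61, 16)):
    """Try each frame count in turn; return the first that succeeds, else None."""
    for num_frames in frame_options:
        try:
            generate_fn(num_frames)
            return num_frames
        except MemoryError as err:
            print(f"{num_frames} frames failed ({err}); trying fewer")
    return None


# Stand-in generator that "runs out of memory" above 61 frames
def fake_generate(num_frames):
    if num_frames > 61:
        raise MemoryError("simulated OOM")


used = generate_with_backoff(fake_generate)
print(f"Succeeded with {used} frames")
```

In this notebook, generation runs in a subprocess, so a real wrapper would need to detect the failure itself (for example, by looking for an out-of-memory message in the subprocess's stderr) and raise the exception. `run_cosmos` as written only prints stderr and returns `None`.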

### What's Next:
1. Try different prompts for various robot motions
2. Upload your own input images
3. Generate longer videos with more frames
4. Fine-tune on your own data

In [None]:
# Custom generation with auto-save
import glob
import os
import shutil
from datetime import datetime

def generate_and_save(prompt, num_frames=16, fps=8, save_to_drive=True):
    """
    Generate a video with custom prompt and automatically save to Drive.
    """
    generation_code = f'''
import os
import torch
import numpy as np
from PIL import Image
from transformers import T5EncoderModel, T5Tokenizer
from cosmos_predict2.inference import (
    Video2WorldPipeline,
    get_cosmos_predict2_video2world_pipeline,
)
from einops import rearrange
import imageio
from datetime import datetime

# Read config
with open('/content/cosmos_config.txt', 'r') as f:
    MODEL_SIZE, checkpoint_dir = f.read().strip().split(',')

# Load models (each call runs in a fresh process, so they are reloaded every time)
print("Loading models...")
t5_model = "google/flan-t5-base"
tokenizer = T5Tokenizer.from_pretrained(t5_model)
text_encoder = T5EncoderModel.from_pretrained(t5_model).half().to("cuda")
text_encoder.eval()

config = get_cosmos_predict2_video2world_pipeline(model_size=MODEL_SIZE)
import glob
model_files = glob.glob(f"{{checkpoint_dir}}/**/*.pt", recursive=True)
if model_files:
    config['dit_checkpoint_path'] = model_files[0]

cosmos_pipe = Video2WorldPipeline.from_config(config)
cosmos_pipe = cosmos_pipe.to("cuda")
cosmos_pipe.eval()
print("✅ Models loaded")

# Generate with custom prompt
prompt = {prompt!r}
print(f"\\nPrompt: {{prompt}}")
print(f"Settings: {num_frames} frames at {fps} fps\\n")

# Use existing input or create new one
if os.path.exists('/content/test_input.jpg'):
    test_image = Image.open('/content/test_input.jpg')
else:
    img = np.ones((720, 1280, 3), dtype=np.uint8) * 100
    img[200:520, 400:880, :] = [200, 150, 100]
    test_image = Image.fromarray(img)
    test_image.save('/content/test_input.jpg')

# Encode prompt
inputs = tokenizer(prompt, return_tensors="pt", max_length=77,
                  padding="max_length", truncation=True).to("cuda")
with torch.no_grad():
    text_embeddings = text_encoder(**inputs).last_hidden_state

# Prepare input
frames = np.array(test_image)[np.newaxis, ...]
frames_tensor = torch.from_numpy(frames).float() / 255.0
frames_tensor = rearrange(frames_tensor, "t h w c -> 1 c t h w").to("cuda")

# Generate
print("Generating video...")
with torch.no_grad():
    with torch.autocast("cuda"):  # replaces the deprecated torch.cuda.amp.autocast()
        output = cosmos_pipe(
            frames_tensor,
            text_embeddings,
            num_frames={num_frames},
            fps={fps},
            seed=42
        )

# Convert and save
if isinstance(output, torch.Tensor):
    video = output.cpu().numpy()
else:
    video = output

if video.ndim == 5:
    video = video[0]
if video.shape[0] == 3:
    video = np.transpose(video, (1, 2, 3, 0))
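if video.dtype != np.uint8:
    if video.max() <= 1.0:
        video = video * 255  # scale [0, 1] floats to [0, 255]
    # Always hand uint8 frames to the video writer
    video = np.clip(video, 0, 255).astype(np.uint8)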

# Save with timestamp
timestamp = datetime.now().strftime("%H%M%S")
output_path = f'/content/cosmos_custom_{{timestamp}}.mp4'
writer = imageio.get_writer(output_path, fps={fps})
for frame in video:
    writer.append_data(frame)
writer.close()

print(f"\\n✅ Video saved: {{output_path}}")
print(f"Generated {{len(video)}} frames")

# Save prompt info
with open(f'/content/prompt_{{timestamp}}.txt', 'w') as f:
    f.write(prompt)
'''
    
    # Run generation
    print(f"🎬 Generating video for: \"{prompt[:50]}...\"")
    run_cosmos(generation_code)
    
    # Auto-save to Drive if enabled
    if save_to_drive and os.environ.get('DRIVE_OUTPUT_DIR'):
        drive_dir = os.environ['DRIVE_OUTPUT_DIR']
        timestamp = datetime.now().strftime("%H%M%S")
        
        # Find the latest generated file
        import glob
        latest_video = max(glob.glob('/content/cosmos_custom_*.mp4'), 
                          key=os.path.getctime, default=None)
        
        if latest_video:
            shutil.copy2(latest_video, f"{drive_dir}/custom_{timestamp}.mp4")
            print(f"☁️ Saved to Drive: custom_{timestamp}.mp4")

# Example usage
custom_prompts = [
    "A robotic gripper carefully picks up a delicate glass object",
    "Industrial robot arm performs precise welding operations",
    "Humanoid robot hand manipulates small electronic components",
]

print("Try generating with different prompts:")
print("-" * 60)
for i, p in enumerate(custom_prompts, 1):
    print(f"{i}. {p}")
print("-" * 60)
print("\nExample usage:")
print('generate_and_save("Your custom prompt here", num_frames=16, fps=8)')

## Step 12: Generate More Videos (Optional)

Now that everything is set up, call `generate_and_save` (defined in the cell above) with different prompts or settings to generate more videos.
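
To sweep several prompts and frame counts in one go, a small batch driver is enough. The sketch below takes the generator as a parameter, so it runs here with a stand-in. `run_batch` is a hypothetical helper; in the notebook, `generate_and_save` would be passed as `generate_fn`.

```python
from itertools import product


def run_batch(prompts, frame_counts, generate_fn):
    """Call generate_fn once per (prompt, num_frames) pair and collect failures."""
    failures = []
    for prompt, num_frames in product(prompts, frame_counts):
        try:
            generate_fn(prompt, num_frames=num_frames)
        except Exception as err:  # keep going so one bad job does not stop the batch
            failures.append((prompt, num_frames, str(err)))
    return failures


# Stand-in for generate_and_save that only records its calls
calls = []
failures = run_batch(
    ["A robotic gripper picks up a glass", "A robot arm welds a seam"],
    [16, 61],
    lambda prompt, num_frames: calls.append((prompt, num_frames)),
)
print(f"{len(calls)} jobs run, {len(failures)} failed")
```

Each `generate_and_save` call starts a new process and reloads the models, so a batch takes roughly the single-run time multiplied by the number of jobs.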