# EgoX exo-to-ego (Colab) using local EgoX-main copy

This notebook assumes you copied the EgoX repo into your Drive as `EgoX-main`.
It will copy that folder into Colab, install dependencies, download checkpoints, prepare data, and run inference.

**Prereqs you must provide**:
- A Drive workspace folder (`MyDrive/exo_to_ego` by default, see `WORKDIR` below) containing `EgoX-main/`, the repo you copied locally.
- Your input video at `data/raw/exo.mp4` inside that workspace.
- Depth maps, camera intrinsics, and ego camera extrinsics from EgoPriorRenderer.


In [None]:
from google.colab import drive
drive.mount('/content/drive')

WORKDIR = '/content/drive/MyDrive/exo_to_ego'
REPO_SRC = f'{WORKDIR}/EgoX-main'
REPO_DST = '/content/EgoX'
DATASET_DIR = f'{WORKDIR}/datasets/custom_exo'
TAKE_NAME = 'take_001'
INPUT_VIDEO = f'{WORKDIR}/data/raw/exo.mp4'
OUTPUT_DIR = f'{WORKDIR}/outputs/egox'
CHECKPOINT_DIR = f'{WORKDIR}/checkpoints'

print('WORKDIR:', WORKDIR)
print('REPO_SRC:', REPO_SRC)
print('INPUT_VIDEO:', INPUT_VIDEO)
print('DATASET_DIR:', DATASET_DIR)
print('OUTPUT_DIR:', OUTPUT_DIR)
print('CHECKPOINT_DIR:', CHECKPOINT_DIR)
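
Since the rest of the notebook depends on the prerequisites listed above, it can help to fail early when one is missing. The following is a minimal sketch; `missing_paths` and `require_paths` are hypothetical helper names, not part of EgoX.

```python
from pathlib import Path


def missing_paths(paths):
    """Return the entries of `paths` that do not exist on disk, in input order."""
    return [str(p) for p in paths if not Path(p).exists()]


def require_paths(paths):
    """Raise FileNotFoundError naming every missing path; do nothing if all exist."""
    missing = missing_paths(paths)
    if missing:
        raise FileNotFoundError('Missing prerequisites: ' + ', '.join(missing))
```

Calling `require_paths([REPO_SRC, INPUT_VIDEO])` right after the configuration cell would stop the notebook before any copying or downloading starts.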


In [None]:
# Copy your local EgoX-main into Colab
!rm -rf {REPO_DST}
!cp -R {REPO_SRC} {REPO_DST}
%cd /content/EgoX
!ls -la


## Install dependencies
EgoX expects Python 3.10 and CUDA 12.1. Colab GPU runtimes usually ship with CUDA already, but we install the CUDA 12.1 build of PyTorch explicitly.


In [None]:
# Install PyTorch with CUDA 12.1
!pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

# Install EgoX deps
!pip install -r requirements.txt
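
After installation, a quick check of what the runtime actually provides can save a failed inference run later. This sketch only reports versions and does not enforce them; `environment_report` is a hypothetical helper, and it tolerates PyTorch being absent.

```python
import importlib.util
import sys


def environment_report():
    """Collect the Python version and, if PyTorch is importable, its CUDA details."""
    report = {
        'python': f'{sys.version_info.major}.{sys.version_info.minor}',
        'torch': None,
        'torch_cuda': None,
        'cuda_available': False,
    }
    if importlib.util.find_spec('torch') is not None:
        import torch
        report['torch'] = torch.__version__
        report['torch_cuda'] = torch.version.cuda  # None on CPU-only builds
        report['cuda_available'] = torch.cuda.is_available()
    return report


print(environment_report())
```

If `cuda_available` is `False`, the Colab runtime is probably not set to a GPU type.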


## Download checkpoints
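This pulls the Wan2.1 pretrained model and the EgoX LoRA weights. Both are downloaded into `checkpoints/` under `/content/EgoX`, which is where the inference cell looks for them.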


In [None]:
!mkdir -p {CHECKPOINT_DIR}
!pip install huggingface_hub

# Download from the notebook kernel; relative paths resolve against /content/EgoX (set by %cd above)
from huggingface_hub import snapshot_download

# Wan2.1 pretrained model
snapshot_download(
    repo_id='Wan-AI/Wan2.1-I2V-14B-480P-Diffusers',
    local_dir='./checkpoints/pretrained_model/Wan2.1-I2V-14B-480P-Diffusers'
)

# EgoX LoRA weights (only the .safetensors files)
snapshot_download(
    repo_id='DAVIAN-Robotics/EgoX',
    local_dir='./checkpoints/EgoX',
    allow_patterns='*.safetensors'
)


## Prepare dataset folder
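EgoX expects videos and depth maps in a specific folder structure, and clips at 784x448 resolution with 49 frames. The cell below creates the folder, copies your exo video into it, and resizes and trims it to match.
Note that `scale=784:448` stretches the frame rather than preserving its aspect ratio, so crop or pad first if your source is not 7:4.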


In [None]:
import os
os.makedirs(f'{DATASET_DIR}/videos/{TAKE_NAME}', exist_ok=True)
!cp {INPUT_VIDEO} {DATASET_DIR}/videos/{TAKE_NAME}/exo_raw.mp4

# Resize + trim to 49 frames at 784x448 (30 fps). Adjust as needed.
!ffmpeg -y -i {DATASET_DIR}/videos/{TAKE_NAME}/exo_raw.mp4 -vf 'scale=784:448,fps=30' -frames:v 49 {DATASET_DIR}/videos/{TAKE_NAME}/exo.mp4

!ls -la {DATASET_DIR}/videos/{TAKE_NAME}
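
Because `-frames:v 49` only caps the frame count, a source clip that is too short will produce fewer than 49 frames without any error. One way to confirm the result is to ask `ffprobe` for the stream size and decoded frame count. The sketch below splits the probe from the check so the check can be used on any ffprobe JSON output; `probe_video` and `check_clip` are hypothetical helper names.

```python
import json
import subprocess


def probe_video(path):
    """Run ffprobe and return its JSON description of the first video stream."""
    cmd = [
        'ffprobe', '-v', 'error', '-select_streams', 'v:0', '-count_frames',
        '-show_entries', 'stream=width,height,nb_read_frames',
        '-of', 'json', str(path),
    ]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def check_clip(ffprobe_json, width=784, height=448, frames=49):
    """Return a list of mismatch messages; an empty list means the clip matches."""
    stream = json.loads(ffprobe_json)['streams'][0]
    # ffprobe reports nb_read_frames as a string, so convert it before comparing
    actual = (stream['width'], stream['height'], int(stream['nb_read_frames']))
    expected = (width, height, frames)
    names = ('width', 'height', 'frames')
    return [
        f'{n}: expected {e}, got {a}'
        for n, e, a in zip(names, expected, actual)
        if e != a
    ]
```

In the notebook, `check_clip(probe_video(f'{DATASET_DIR}/videos/{TAKE_NAME}/exo.mp4'))` should return an empty list before you continue.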


## Create meta.json
This initializes a meta file with default camera extrinsics and ego intrinsics.
You will later fill in camera intrinsics and ego camera extrinsics (from EgoPriorRenderer).


In [None]:
!python /content/EgoX/meta_init.py --folder_path {DATASET_DIR} --output_json {DATASET_DIR}/meta.json --overwrite
!sed -n '1,200p' {DATASET_DIR}/meta.json


## Generate captions (prompt)
EgoX uses `caption.py` to generate prompts with GPT-4o. You must set the API base URL and key in `caption.py`.
If you skip this, you can also write a custom prompt manually in a file.


In [None]:
# Edit caption.py to set YOUR_BASE_URL and YOUR_API_KEY
!sed -n '1,60p' /content/EgoX/caption.py


In [None]:
# Run caption generation (requires 49 frame images; follow EgoPriorRenderer or your own extractor)
# This updates meta.json with prompt text
# !python /content/EgoX/caption.py --json_file {DATASET_DIR}/meta.json --output_json {DATASET_DIR}/meta.json --overwrite

# If you want a manual prompt instead, create a prompt file and skip caption.py:
# with open(f'{DATASET_DIR}/prompt.txt','w') as f: f.write('your prompt here')


## Generate depth maps + ego camera extrinsics
EgoX requires depth maps and ego camera extrinsics. Use EgoPriorRenderer for this step.
Follow the official instructions: https://github.com/kdh8156/EgoX-EgoPriorRenderer
After running it, your dataset should include:
- `depth_maps/{TAKE_NAME}/frame_000.npy` ... `frame_048.npy`
- `videos/{TAKE_NAME}/ego_Prior.mp4`
- Updated `meta.json` with camera_intrinsics and ego_extrinsics
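
The file layout listed above can be checked mechanically before moving on. This is a minimal sketch that only tests for file presence (it does not inspect `meta.json` or the array contents); `missing_prior_files` is a hypothetical helper name.

```python
from pathlib import Path


def missing_prior_files(dataset_dir, take_name, num_frames=49):
    """List expected EgoPriorRenderer outputs that are absent, relative to dataset_dir."""
    root = Path(dataset_dir)
    # frame_000.npy ... frame_048.npy for the default 49 frames
    expected = [
        Path('depth_maps') / take_name / f'frame_{i:03d}.npy'
        for i in range(num_frames)
    ]
    expected.append(Path('videos') / take_name / 'ego_Prior.mp4')
    return [p.as_posix() for p in expected if not (root / p).exists()]
```

`missing_prior_files(DATASET_DIR, TAKE_NAME)` returns an empty list once every depth map and the ego prior video are in place.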


## Build list files for inference
EgoX `infer.py` expects 3 list files: prompts, exo video paths, ego prior video paths.


In [None]:
import json
from pathlib import Path

meta_path = Path(f'{DATASET_DIR}/meta.json')
meta = json.loads(meta_path.read_text())
entry = meta['test_datasets'][0]

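# Prefer a manual prompt.txt if present, then the caption in meta.json, then a generic default
prompt_file = Path(f'{DATASET_DIR}/prompt.txt')
prompt_text = prompt_file.read_text() if prompt_file.exists() else (entry.get('prompt') or '')
prompt_text = ' '.join(prompt_text.split())  # keep the prompt on a single line of the list file
if not prompt_text:
    prompt_text = 'A person moves naturally in a room. Generate a plausible egocentric view.'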

prompt_list = f'{DATASET_DIR}/prompt_list.txt'
exo_list = f'{DATASET_DIR}/exo_list.txt'
ego_list = f'{DATASET_DIR}/ego_list.txt'

Path(prompt_list).write_text(prompt_text + '\n')
Path(exo_list).write_text(entry['exo_path'] + '\n')
Path(ego_list).write_text(entry['ego_prior_path'] + '\n')

print('Wrote:', prompt_list)
print('Wrote:', exo_list)
print('Wrote:', ego_list)
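
If you later extend the list files to several takes, the three files presumably need to stay aligned, with line N of each file describing the same take. That pairing is an assumption about how `infer.py` reads them, not something confirmed here. Under that assumption, a small check might look like this (`read_list` and `check_aligned` are hypothetical helper names):

```python
from pathlib import Path


def read_list(path):
    """Return the non-empty, stripped lines of a list file."""
    return [line.strip() for line in Path(path).read_text().splitlines() if line.strip()]


def check_aligned(prompt_list, exo_list, ego_list):
    """Return the shared entry count, or raise ValueError if the lists differ in length."""
    counts = {str(p): len(read_list(p)) for p in (prompt_list, exo_list, ego_list)}
    if len(set(counts.values())) != 1:
        raise ValueError(f'List files are not aligned: {counts}')
    return next(iter(counts.values()))
```

For the single-take setup in this notebook, `check_aligned(prompt_list, exo_list, ego_list)` should return `1`.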


## Run inference
Make sure depth maps and meta.json are complete before running.


In [None]:
!mkdir -p {OUTPUT_DIR}

!python /content/EgoX/infer.py \
  --prompt {DATASET_DIR}/prompt_list.txt \
  --exo_video_path {DATASET_DIR}/exo_list.txt \
  --ego_prior_video_path {DATASET_DIR}/ego_list.txt \
  --meta_data_file {DATASET_DIR}/meta.json \
  --depth_root {DATASET_DIR}/depth_maps \
  --model_path /content/EgoX/checkpoints/pretrained_model/Wan2.1-I2V-14B-480P-Diffusers \
  --lora_path /content/EgoX/checkpoints/EgoX/pytorch_lora_weights.safetensors \
  --lora_rank 256 \
  --out {OUTPUT_DIR} \
  --seed 42 \
  --use_GGA \
  --cos_sim_scaling_factor 3.0

!ls -la {OUTPUT_DIR}
