<a href="https://colab.research.google.com/github/jackdaus/egolifter/blob/colab/1colab-demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Get Data Access Link

Before you can download the ADT data, you need to request the `ADT_download_urls.json` file from Aria. This is an instant process. Follow the [directions in the original EgoLifter GitHub repository](https://github.com/facebookresearch/egolifter?tab=readme-ov-file#download-and-pre-processing).

In [2]:
# IMPORTANT: Make sure you have uploaded your file `ADT_download_urls.json`!
# See original repo for instructions on how to get that.
import os

file_path = "/content/ADT_download_urls.json"

if os.path.exists(file_path):
  print("File found!", file_path)
else:
  print("Error: File ADT_download_urls.json not found!", file_path)
  print("Please upload the ADT_download_urls.json file to the /content directory.")
  print("See the original EgoLifter repository for how to get this download link.")
  raise FileNotFoundError("ADT_download_urls.json not found")

File found! /content/ADT_download_urls.json
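
Beyond checking that the file exists, it can help to confirm that it parses as JSON before starting the long install. A minimal sketch: the helper name `is_valid_json` is hypothetical, and nothing is assumed about the file's internal structure.

```python
import json
from pathlib import Path

def is_valid_json(path):
    """Return True if `path` is an existing file that parses as JSON."""
    p = Path(path)
    if not p.is_file():
        return False
    try:
        json.loads(p.read_text())
    except (json.JSONDecodeError, UnicodeDecodeError):
        return False
    return True
```

For example, `is_valid_json("/content/ADT_download_urls.json")` should return `True` for a complete upload and `False` for a truncated one.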


# Install Dependencies

In [3]:
# Install uv package manager
!pip install uv

# Verify install
!uv --version

Collecting uv
  Downloading uv-0.6.16-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (11 kB)
Downloading uv-0.6.16-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.3 MB)
Installing collected packages: uv
Successfully installed uv-0.6.16
uv 0.6.16


In [4]:
# Clone the repo (jackdaus version has uv support)
!git clone https://github.com/jackdaus/egolifter.git
%cd egolifter
# Check out the colab branch. I'm developing Colab compatibility on that branch.
!git checkout colab

Cloning into 'egolifter'...
remote: Enumerating objects: 245, done.
remote: Counting objects: 100% (228/228), done.
remote: Compressing objects: 100% (174/174), done.
remote: Total 245 (delta 84), reused 182 (delta 51), pack-reused 17 (from 1)
Receiving objects: 100% (245/245), 4.79 MiB | 21.88 MiB/s, done.
Resolving deltas: 100% (85/85), done.
/content/egolifter
Branch 'colab' set up to track remote branch 'colab' from 'origin'.
Switched to a new branch 'colab'


In [5]:
# Install packages. This might take a couple of minutes.
!uv sync

Streaming output truncated to the last 5000 lines.
plotly                12.07 MiB/14.12 MiB
opencv-python         60.06 MiB/60.07 MiB
bitsandbytes          61.69 MiB/72.54 MiB
nvidia-cusolver-cu12  61.46 MiB/122.01 MiB
nvidia-nccl-cu12      61.52 MiB/179.91 MiB
nvidia-cusparse-cu12  61.87 MiB/197.84 MiB
triton                62.54 MiB/199.76 MiB
nvidia-cufft-cu12     61.33 MiB/201.66 MiB
nvidia-cublas-cu12    61.64 MiB/346.60 MiB
open3d                62.37 MiB/426.94 MiB
nvidia-c

In [6]:
# Note that to use the uv virtual environment, we must activate it within
# the cell. (Colab was giving me trouble creating a uv-based kernel...)

# To illustrate, notice that the two commands below report different locations
# for the Python executable in use.
!source .venv/bin/activate; which python

!which python

/content/egolifter/.venv/bin/python
/usr/local/bin/python


In [7]:
# Unfortunately, the workaround is a bit convoluted. The %%bash cell magic runs
# this cell as a bash script, which activates the venv and then runs Python code
# from a heredoc. We should see PyTorch version 2.5.1.
%%bash
source .venv/bin/activate

python - <<'PY'
import torch
import torchvision
print("PyTorch version:", torch.__version__)
print("Torchvision version:", torchvision.__version__)
print("CUDA is available:", torch.cuda.is_available())
PY

PyTorch version: 2.5.1+cu124
Torchvision version: 0.20.1+cu124
CUDA is available: True
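
An alternative to `%%bash` plus a heredoc is to call the venv's interpreter directly from the notebook kernel, which needs no activation step. The sketch below assumes the interpreter lives at `.venv/bin/python` relative to the project directory (the location reported by `which python` earlier); `run_in_venv` is a hypothetical helper, not part of EgoLifter. Unlike activation, this does not modify `PATH`, so it only affects which interpreter and packages the snippet sees.

```python
import subprocess

def run_in_venv(code, python=".venv/bin/python"):
    """Run a Python snippet with the given interpreter and return its stdout."""
    result = subprocess.run(
        [python, "-c", code],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the snippet exits non-zero
    )
    return result.stdout
```

For example, `print(run_in_venv("import torch; print(torch.__version__)"))` would report the PyTorch version installed in the venv.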


In [8]:
# Set environment variables (adapted from setup_env.bash in the original EgoLifter repo)
import os
os.environ.update({
    "EGOLIFTER_PATH":                 "/content/egolifter",
    "GSA_PATH":                       "/content/egolifter/Grounded-Segment-Anything",
    "SAM_CHECKPOINT_PATH":            "/content/egolifter/Grounded-Segment-Anything/sam_vit_h_4b8939.pth",
    "GROUNDING_DINO_CHECKPOINT_PATH": "/content/egolifter/Grounded-Segment-Anything/groundingdino_swint_ogc.pth",
    "SAM_ENCODER_VERSION":            "vit_h",
    "GROUNDING_DINO_CONFIG_PATH":     "/content/egolifter/Grounded-Segment-Anything/GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py",
    "EFFICIENTSAM_PATH":              "/content/egolifter/Grounded-Segment-Anything/EfficientSAM",
    "TAG2TEXT_PATH":                  "/content/egolifter/Grounded-Segment-Anything/Tag2Text",
    "ADT_DATA_ROOT":                  "adt",
    "ADT_PROCESSED_ROOT":             "adt_processed",
    "AM_I_DOCKER":                    "False",
    "BUILD_WITH_CUDA":                "True",
    "TAG2TEXT_CHECKPOINT_PATH":       "/content/egolifter/Grounded-Segment-Anything/Tag2Text/tag2text_swin_14m.pth",
    "RAM_CHECKPOINT_PATH":            "/content/egolifter/Grounded-Segment-Anything/Tag2Text/ram_swin_large_14m.pth",
})
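
Several of these variables point to files that only exist after later cells have run (for example, the SAM and GroundingDINO checkpoints). A small sketch for listing the ones whose targets are still missing: `missing_paths` is a hypothetical helper, and checking only variables whose names end in `_PATH` is an assumption made for illustration.

```python
import os

def missing_paths(env=None, suffix="_PATH"):
    """Return {name: value} for path-like variables whose target does not exist."""
    env = os.environ if env is None else env
    return {
        name: value
        for name, value in env.items()
        if name.endswith(suffix) and not os.path.exists(value)
    }
```

Calling `missing_paths()` after the weight downloads gives a quick view of what is still absent. The Tag2Text and RAM checkpoint files are not downloaded by any cell shown in this notebook, so they would be expected to appear in the result.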

In [9]:
# Set up Grounded-Segment-Anything
!git clone https://github.com/IDEA-Research/Grounded-Segment-Anything.git

Cloning into 'Grounded-Segment-Anything'...
remote: Enumerating objects: 1807, done.
remote: Counting objects: 100% (18/18), done.
remote: Compressing objects: 100% (17/17), done.
remote: Total 1807 (delta 9), reused 1 (delta 1), pack-reused 1789 (from 2)
Receiving objects: 100% (1807/1807), 155.84 MiB | 47.48 MiB/s, done.
Resolving deltas: 100% (830/830), done.


In [10]:
# Install SAM
!uv add Grounded-Segment-Anything/segment_anything

Resolved 247 packages in 4.09s
   Building segment-anything @ file:///content/egolifter/Grounded-Segment-Anything/s
      Built segment-anything @ file:///content/egolifter/Grounded-Segment-Anything/s
Prepared 1 package in 613ms
Installed 1 package in 1ms
 + segment-anything==1.0 (from file:///content/egolifter/Grounded-Segment-Anything/segment_anything)


In [11]:
# Apply a quick-and-dirty fix for GroundingDINO not working in Colab
# (due to an early-2025 upstream dependency change).
# See: https://github.com/IDEA-Research/Grounded-Segment-Anything/issues/550
%cd /content/egolifter/Grounded-Segment-Anything/GroundingDINO/groundingdino/models/GroundingDINO/csrc/MsDeformAttn
!sed -i 's/value.type()/value.scalar_type()/g' ms_deform_attn_cuda.cu
!sed -i 's/value.scalar_type().is_cuda()/value.is_cuda()/g' ms_deform_attn_cuda.cu

/content/egolifter/Grounded-Segment-Anything/GroundingDINO/groundingdino/models/GroundingDINO/csrc/MsDeformAttn
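
The two `sed` commands are order-dependent. The first rewrites every `value.type()` to `value.scalar_type()`, which also turns `value.type().is_cuda()` into `value.scalar_type().is_cuda()`; the second then repairs those device checks. The same pair of substitutions is sketched below in Python for illustration. `patch_ms_deform_attn` is a hypothetical name, the sample line is made up rather than copied from `ms_deform_attn_cuda.cu`, and `str.replace` matches literally whereas `sed` treats `.` as a wildcard.

```python
def patch_ms_deform_attn(source):
    """Apply the two substitutions from the sed commands, in the same order."""
    # Step 1: value.type() -> value.scalar_type() everywhere.
    source = source.replace("value.type()", "value.scalar_type()")
    # Step 2: repair device checks that step 1 rewrote along the way.
    source = source.replace("value.scalar_type().is_cuda()", "value.is_cuda()")
    return source

# Illustrative input only, not a line from the real file.
sample = "CHECK(value.type().is_cuda()); DISPATCH(value.type(), name);"
print(patch_ms_deform_attn(sample))
# prints: CHECK(value.is_cuda()); DISPATCH(value.scalar_type(), name);
```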


In [12]:
# Change back to the project directory
%cd /content/egolifter

/content/egolifter


In [13]:
!uv add Grounded-Segment-Anything/GroundingDINO

Resolved 253 packages in 694ms
   Building groundingdino @ file:///content/egolifter/Grounded-Segment-Anything/Grou
⠙ Preparing packages... (0/6)

In [14]:
# The original EgoLifter install steps also include this package.
!uv add "diffusers[torch]"

Resolved 255 packages in 785ms
⠙ Preparing packages... (0/2)
accelerate 190.91 KiB/346.43 K

In [15]:
# Download model weights
%cd Grounded-Segment-Anything/
!wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
!wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
%cd ..

/content/egolifter/Grounded-Segment-Anything
--2025-04-24 00:24:25--  https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
Resolving dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)... 3.167.152.123, 3.167.152.97, 3.167.152.77, ...
Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|3.167.152.123|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2564550879 (2.4G) [binary/octet-stream]
Saving to: ‘sam_vit_h_4b8939.pth’


2025-04-24 00:24:36 (216 MB/s) - ‘sam_vit_h_4b8939.pth’ saved [2564550879/2564550879]

--2025-04-24 00:24:36--  https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
Resolving github.com (github.com)... 140.82.114.4
Connecting to github.com (github.com)|140.82.114.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/611591640/f221e500-c2fc-4fd3-b84e-8ad92a6923f3?X-Amz
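
A partially downloaded checkpoint tends to fail only much later, when the model is loaded. One cheap guard is to compare the file size on disk with the `Length` that `wget` reports; the sketch below uses the 2564550879 bytes logged above for `sam_vit_h_4b8939.pth`. `has_expected_size` is a hypothetical helper, and a matching size does not prove that the contents are intact.

```python
import os

SAM_CHECKPOINT_BYTES = 2564550879  # "Length" from the wget log above

def has_expected_size(path, expected_bytes):
    """Return True if `path` is a file that is exactly `expected_bytes` long."""
    return os.path.isfile(path) and os.path.getsize(path) == expected_bytes
```

For example, `has_expected_size(os.environ["SAM_CHECKPOINT_PATH"], SAM_CHECKPOINT_BYTES)` should be `True` after a complete download.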

# Download Data

First, upload your `ADT_download_urls.json` file. Then, run the code below.

## Prepare to download data

In [16]:
# Sanity check to see if environment variables make their way into the uv venv
%%bash
source .venv/bin/activate

python - <<'PY'
import os
print(os.environ['ADT_DATA_ROOT'])
PY

adt


In [18]:
%%bash
source .venv/bin/activate

python - <<'PY'

import os
import shutil

# Define data directories
ADT_DATA_ROOT = os.environ['ADT_DATA_ROOT']
ADT_PROCESSED_ROOT = os.environ['ADT_PROCESSED_ROOT']

# Create directories if they don't exist
os.makedirs(ADT_DATA_ROOT, exist_ok=True)
os.makedirs(ADT_PROCESSED_ROOT, exist_ok=True)

# Copy the download URLs JSON file
source_path = "/content/ADT_download_urls.json"  # Update if needed
destination_path = os.path.join(ADT_DATA_ROOT, "ADT_download_urls.json")
shutil.copy(source_path, destination_path)

PY

In [19]:
# Copy the vignette images to the dataset
!cp assets/vignette_imx577.png ${ADT_DATA_ROOT} # Vignette image for the RGB camera
!cp assets/vignette_ov7251.png ${ADT_DATA_ROOT} # Vignette image for the SLAM camera

In [20]:
# Copy the vignette files (Python equivalent of the `cp` cell above)
import os
import shutil

# Define the destination paths within the data directory
vignette_rgb_destination = os.path.join(os.environ['ADT_DATA_ROOT'], "vignette_imx577.png")
vignette_slam_destination = os.path.join(os.environ['ADT_DATA_ROOT'], "vignette_ov7251.png")

# Copy the vignette images
shutil.copy("assets/vignette_imx577.png", vignette_rgb_destination)
shutil.copy("assets/vignette_ov7251.png", vignette_slam_destination)

'adt/vignette_ov7251.png'

## Actually download data
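
The download cell in this section interpolates `{scene_name}` into a shell command. For reference, the same invocation can be assembled as an argument list, which avoids shell quoting issues when scene names come from elsewhere. This is only a sketch: the flags and default values are copied from that cell, while `build_download_cmd` and its parameter names are hypothetical.

```python
def build_download_cmd(scene_name, urls_file="adt/ADT_download_urls.json",
                       output_dir="adt/", data_types=(0, 1, 2, 3, 6, 7)):
    """Build the aria_dataset_downloader argument list used in this notebook."""
    return [
        "uvx", "--from", "projectaria-tools", "aria_dataset_downloader",
        "-c", urls_file,
        "-o", output_dir,
        "-d", *[str(t) for t in data_types],
        "-l", scene_name,
    ]
```

The resulting list can be passed to `subprocess.run(..., check=True)` once per scene name.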

In [21]:
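# Define the scene names in Python lists. For now, we just have one sample scene.
# scene_names_new holds the name passed to the downloader (with the _M1292 suffix);
# scene_names holds the name passed to the processing script below.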
scene_names_new = ["Apartment_release_golden_skeleton_seq100_10s_sample_M1292"]
scene_names     = ["Apartment_release_golden_skeleton_seq100_10s_sample"]

# Loop through the scene names and execute the download command for each
for scene_name in scene_names_new:
    !uvx --from projectaria-tools aria_dataset_downloader \
        -c adt/ADT_download_urls.json \
        -o adt/ \
        -d 0 1 2 3 6 7 \
        -l {scene_name}

⠙ Preparing packages... (0/5)
charset-normalizer  14.90 KiB/140.49 KiB
pillow              0 B/4.39 MiB
numpy               0 B/15.67 MiB

# Process Data (Part 1)

In [22]:
# Process the images to be in a format appropriate for 3dgs
for scene_name in scene_names:
  !uv run python scripts/process_adt_3dgs.py \
    --data_root adt \
    --output_root adt_processed \
    --sequence_name {scene_name}

Processing sequence adt/Apartment_release_golden_skeleton_seq100_10s_sample_M1292
[ProgressLogger][INFO]: 2025-04-24 00:26:18: Opening adt/Apartment_release_golden_skeleton_seq100_10s_sample_M1292/video.vrs...
[MultiRecordFileReader][DEBUG]: Opened file 'adt/Apartment_release_golden_skeleton_seq100_10s_sample_M1292/video.vrs' and assigned to reader #0
[VrsDataProvider][INFO]: streamId 211-1/camera-et activated
[VrsDataProvider][INFO]: streamId 214-1/camera-rgb activated
[VrsDataProvider][INFO]: streamId 247-1/baro0 activated
[VrsDataProvider][INFO]: Timecode stream found: 285-2
[VrsDataProvider][INFO]: streamId 1201-1/camera-slam-left activated
[VrsDataProvider][INFO]: streamId 1201-2/camera-slam-right activated
[VrsDataProvider][INFO]: streamId 1202-

# Train on Vanilla 3DGS Pipeline

This is a first test of training on the vanilla 3DGS pipeline.

In [23]:
# Run the vanilla 3DGS pipeline as a first test before the more complicated steps below.
# It will ask whether you want to log in to wandb to visualize training progress and logs.
# In Colab, we must limit the number of worker threads to 2.
# We don't run this sample by default; set the flag below to True to run it.
run_vanilla_3dgs_sample = False

if run_vanilla_3dgs_sample:
  !uv run python train_lightning.py \
    scene.scene_name=Apartment_release_golden_skeleton_seq100_10s_sample \
    scene.data_root=$ADT_PROCESSED_ROOT \
    exp_name=3dgs \
    output_root=./output/adt \
    wandb.project=egolifter_adt \
    scene.num_workers=2

# Process Data (Part 2)

## Segmentation

In [None]:
# Generate the SAM segmentation results. This takes about 20 to 30 minutes.
!uv run python scripts/generate_gsa_results.py \
  -i adt_processed/Apartment_release_golden_skeleton_seq100_10s_sample \
  --class_set none \
  --sam_variant sam \
  --max_longer_side 512 \
  --no_clip

open_clip_model.safetensors:  59% 2.33G/3.94G [00:09<00:06, 248MB/s]

## Generate evaluation target for query-based segmentation

### generate_2dseg_query

In [None]:
%%bash
source .venv/bin/activate

SCENE_NAME="Apartment_release_golden_skeleton_seq100_10s_sample"

uv run python scripts/generate_2dseg_query.py \
  --data_root $ADT_PROCESSED_ROOT \
  --scene_name $SCENE_NAME

### generate_2dseg_query_sample

In [None]:
%%bash
source .venv/bin/activate

SCENE_NAME="Apartment_release_golden_skeleton_seq100_10s_sample"

uv run python scripts/generate_2dseg_query_sample.py \
  --data_root $ADT_PROCESSED_ROOT \
  --scene_name $SCENE_NAME

### generate_3dbox_query

In [None]:
%%bash
source .venv/bin/activate

SCENE_NAME="Apartment_release_golden_skeleton_seq100_10s_sample"

uv run python scripts/generate_3dbox_query.py \
  --raw_root $ADT_DATA_ROOT \
  --data_root $ADT_PROCESSED_ROOT \
  --scene_name $SCENE_NAME

# EgoLifter (full method)

In [None]:
# EgoLifter (full method). This takes about 5 hours on the current example data.
!uv run python train_lightning.py \
    scene.scene_name="Apartment_release_golden_skeleton_seq100_10s_sample" \
    scene.data_root=$ADT_PROCESSED_ROOT \
    model=unc_2d_unet \
    model.unet_acti=sigmoid \
    model.dim_extra=16 \
    lift.use_contr=True \
    exp_name=egolifter \
    output_root=./output/adt \
    wandb.project=egolifter_adt \
    scene.num_workers=2
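
The training command is a list of `key=value` overrides, so variants (for example, a different `exp_name`) can be generated from a dictionary. A minimal sketch: `build_train_cmd` is a hypothetical helper, all override names and values are copied from the command above, and `adt_processed` is the value this notebook assigns to `ADT_PROCESSED_ROOT`.

```python
def build_train_cmd(overrides, script="train_lightning.py"):
    """Turn a dict of key=value overrides into a `uv run python` argument list."""
    args = [f"{key}={value}" for key, value in overrides.items()]
    return ["uv", "run", "python", script, *args]

egolifter_overrides = {
    "scene.scene_name": "Apartment_release_golden_skeleton_seq100_10s_sample",
    "scene.data_root": "adt_processed",  # ADT_PROCESSED_ROOT in this notebook
    "model": "unc_2d_unet",
    "model.unet_acti": "sigmoid",
    "model.dim_extra": 16,
    "lift.use_contr": True,
    "exp_name": "egolifter",
    "output_root": "./output/adt",
    "wandb.project": "egolifter_adt",
    "scene.num_workers": 2,
}

train_cmd = build_train_cmd(egolifter_overrides)
```

Passing `train_cmd` to `subprocess.run(train_cmd, check=True)` from the project directory would start the same run as the cell above.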