iMED-PE baseline

This repository supports the iMED 2026 challenge subtask on pose estimation, part of EndoVis 2026 at MICCAI 2026 (Strasbourg, France).

[Challenge Website] [Participate] [Parent Challenge Hub]

Example sequences with pose overlays

Minimal baseline for iMED-PE trajectory estimation using ALIKED + LightGlue + essential matrix.

Dataset splits

Sequences are organized under train/ and test/ for convenience (local development, ablations, and reporting numbers on data with public ground truth). You are welcome to train on all released sequences (train + test combined) if that helps your method. The challenge maintains a separate held-out set (hidden_test/) that is not part of either split and is used for final evaluation.

Pose convention

Ground-truth pose.txt stores the trajectory of endoscope2/L relative to endoscope1/L as frame-to-initial transforms:

\[ T_{\mathrm{rel}}(t) = T_0^{-1}\, T(t) \]

The first frame is therefore the identity. This is not single-camera temporal visual odometry (VO) on endoscope2/L alone. In in-vivo use, endoscope1 can itself move (sometimes noticeably, because of the subject's physiological motion), so relative motion estimated from a single camera would couple scope motion into a drifting "world" frame. The task is instead the same-time cross-camera pose between endoscope1/L and endoscope2/L.
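As a rough illustration of the frame-to-initial convention, the sketch below re-expresses a list of poses relative to the first one. It assumes each pose is a 4x4 rigid transform stored as nested lists; the function names (invert_rigid, matmul4, to_frame_to_initial) are hypothetical and are not taken from this repository.

```python
def invert_rigid(T):
    """Invert a 4x4 rigid transform [R | t; 0 0 0 1] given as nested lists."""
    R = [row[:3] for row in T[:3]]
    t = [row[3] for row in T[:3]]
    # For a rigid transform the inverse is [R^T | -R^T t].
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]
    t_inv = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [Rt[i] + [t_inv[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def matmul4(A, B):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def to_frame_to_initial(poses):
    """Apply T_rel(t) = T_0^{-1} T(t) to every pose (assumed convention)."""
    T0_inv = invert_rigid(poses[0])
    return [matmul4(T0_inv, T) for T in poses]


# Two poses sharing a 90-degree rotation about z, offset by 1 unit along x.
T0 = [[0.0, -1.0, 0.0, 1.0], [1.0, 0.0, 0.0, 2.0], [0.0, 0.0, 1.0, 3.0], [0.0, 0.0, 0.0, 1.0]]
T1 = [[0.0, -1.0, 0.0, 2.0], [1.0, 0.0, 0.0, 2.0], [0.0, 0.0, 1.0, 3.0], [0.0, 0.0, 0.0, 1.0]]
relative = to_frame_to_initial([T0, T1])
```

With this convention, relative[0] is always the identity, which matches the first row of pose.txt.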

What it does

Loads sequences from train/ or test/.
Matches endoscope1/L and endoscope2/L at the same frame index.
Estimates cross-camera relative pose per frame (identity at frame 0).
Writes predictions in pose.txt format: frame_idx tx ty tz qx qy qz qw.
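The prediction format above can be sketched as a pair of helpers that write and read one line. This is a minimal sketch, not the repository's writer: the helper names are hypothetical, and the whitespace separator, integer frame index, and six-decimal precision are assumptions.

```python
def format_pose_line(frame_idx, t, q):
    """Format one pose as 'frame_idx tx ty tz qx qy qz qw'.

    t is (tx, ty, tz); q is (qx, qy, qz, qw) with the scalar part last.
    Six decimals is an assumed precision, not a documented requirement.
    """
    values = list(t) + list(q)
    if len(values) != 7:
        raise ValueError("expected 3 translation and 4 quaternion values")
    return f"{int(frame_idx)} " + " ".join(f"{v:.6f}" for v in values)


def parse_pose_line(line):
    """Parse one line back into (frame_idx, (tx, ty, tz), (qx, qy, qz, qw))."""
    parts = line.split()
    if len(parts) != 8:
        raise ValueError(f"expected 8 fields, got {len(parts)}")
    values = [float(p) for p in parts[1:]]
    return int(parts[0]), tuple(values[:3]), tuple(values[3:])


# Frame 0 is the identity: zero translation and quaternion (0, 0, 0, 1).
identity_line = format_pose_line(0, (0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 1.0))
```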

Install

cd <repo>
python -m pip install -r requirements.txt

Run baseline

python scripts/run_baseline.py \
  --data-root <data-root> \
  --split train \
  --output-root <pred-root> \
  --device cuda

Outputs:

<pred-root>/train/<sequence_name>/pose.txt
<pred-root>/test/<sequence_name>/pose.txt

Docker

From repo root (build needs network; run does not):

docker build -t imedpe:dev .
docker run --rm --gpus all --network=none --memory=20g \
  -v <data-root>/train:/input:ro -v /tmp/out:/output imedpe:dev

Participant submission image: see imedpe_submission/README.md.

Evaluate

We use the same metrics scripts as CLiMB for EndoVis consistency. Huge thanks to the CLiMB team!

python scripts/evaluate_ate.py \
  --data-root <data-root> \
  --split train \
  --pred-root <pred-root>

Uses Horn Sim(3) alignment (Endomapper-style) on translations, then reports:

ATE: mean_ate, std_ate, median_ate (mm)
RPE at frame deltas 1, 10, 20, 40: translational (mm) and rotational (deg)
num_matched_poses, registered_pct

Optional JSON export: --json-out results.json
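To make the ATE summary concrete, the sketch below computes the three statistics from per-frame translation errors. It is not the evaluate_ate.py implementation: it assumes the predicted positions have already been aligned to ground truth (the Horn Sim(3) step is omitted), it uses the population standard deviation, and the registered_pct definition (matched poses over ground-truth poses) is an assumption.

```python
import math
import statistics


def ate_stats(gt_xyz, pred_xyz):
    """Summarize per-frame translation errors between matched positions.

    Both inputs are equal-length sequences of (x, y, z) tuples, in mm,
    with pred_xyz assumed to be already aligned to gt_xyz.
    """
    errors = [math.dist(g, p) for g, p in zip(gt_xyz, pred_xyz)]
    return {
        "mean_ate": statistics.fmean(errors),
        "std_ate": statistics.pstdev(errors),  # population std (assumption)
        "median_ate": statistics.median(errors),
    }


def registered_pct(num_matched_poses, num_gt_poses):
    """Percentage of ground-truth poses with a matched prediction (assumed definition)."""
    return 100.0 * num_matched_poses / num_gt_poses


gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
pred = [(0.0, 0.0, 0.0), (1.0, 3.0, 0.0), (2.0, 0.0, 4.0)]
stats = ate_stats(gt, pred)  # per-frame errors are 0, 3, and 4 mm
```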

Example baseline results (train split)

Cross-camera ALIKED + LightGlue + essential matrix baseline on 61 train/ sequences (Horn Sim(3) alignment, same metrics as above):

Metric	Value
Mean ATE	2.18 mm
Mean of per-sequence median ATE	2.06 mm
Mean std ATE (per sequence)	1.13 mm
Registered frames	100%
RPE trans / rot, δ=1	1.17 mm / 6.05°
RPE trans / rot, δ=10	3.13 mm / 7.45°
RPE trans / rot, δ=20	3.07 mm / 7.70°
RPE trans / rot, δ=40	3.40 mm / 7.76°

These numbers are a reference point only; re-run run_baseline.py and evaluate_ate.py on your machine to reproduce.

Expected dataset layout

<data-root>/
  train/<sequence_name>/...
  test/<sequence_name>/...
  hidden_test/<sequence_name>/...   # held-out (not in train/test)

Each sequence directory contains:

pose.txt
K.txt
endoscope1/L/frame_XXXXXX.png
endoscope1/R/frame_XXXXXX.png
endoscope2/L/frame_XXXXXX.png
endoscope2/R/frame_XXXXXX.png
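Before running the baseline, a quick layout check can catch incomplete sequences. The sketch below is one way to do that with pathlib; the helper names are hypothetical, and it only checks that each image directory holds at least one frame_*.png file. Held-out sequences may not ship pose.txt, so a missing pose.txt there is not necessarily an error.

```python
from pathlib import Path

EXPECTED_FILES = ("pose.txt", "K.txt")
EXPECTED_DIRS = ("endoscope1/L", "endoscope1/R", "endoscope2/L", "endoscope2/R")


def missing_entries(sequence_dir):
    """Return the expected files and image directories absent from a sequence."""
    seq = Path(sequence_dir)
    missing = [name for name in EXPECTED_FILES if not (seq / name).is_file()]
    for rel in EXPECTED_DIRS:
        image_dir = seq / rel
        # A directory counts as present only if it holds at least one frame.
        if not image_dir.is_dir() or not any(image_dir.glob("frame_*.png")):
            missing.append(rel)
    return missing


def list_sequences(data_root, split):
    """List sequence directories under <data-root>/<split>, sorted by name."""
    split_dir = Path(data_root) / split
    return sorted(p for p in split_dir.iterdir() if p.is_dir())
```

For example, calling missing_entries on every path returned by list_sequences(data_root, "train") and printing the non-empty results gives a short report of incomplete sequences.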

References

Baseline feature extraction and matching:

@article{Zhao2023ALIKED,
    title = {ALIKED: A Lighter Keypoint and Descriptor Extraction Network via Deformable Transformation},
    url = {https://arxiv.org/pdf/2304.03608.pdf},
    doi = {10.1109/TIM.2023.3271000},
    journal = {IEEE Transactions on Instrumentation & Measurement},
    author = {Zhao, Xiaoming and Wu, Xingming and Chen, Weihai and Chen, Peter C. Y. and Xu, Qingsong and Li, Zhengguo},
    year = {2023},
    volume = {72},
    pages = {1-16},
}

@inproceedings{lindenberger2023lightglue,
  author    = {Philipp Lindenberger and
               Paul-Edouard Sarlin and
               Marc Pollefeys},
  title     = {{LightGlue: Local Feature Matching at Light Speed}},
  booktitle = {ICCV},
  year      = {2023}
}
