imed-challenge/imedpe

iMED-PE baseline

This repository supports the iMED 2026 challenge subtask on pose estimation, part of EndoVis 2026 at MICCAI 2026 (Strasbourg, France).

[Challenge Website] [Participate] [Parent Challenge Hub]

Figure: example sequences with pose overlays. Panels: session_002 (pig intestine, zoom in); session_004 (scene 2, circular); session_007 (scene 3, zoom in).

Minimal baseline for iMED-PE trajectory estimation using ALIKED + LightGlue + essential matrix.

Dataset splits

Sequences are organized under train/ and test/ for convenience (local development, ablations, and reporting numbers on data with public ground truth). You are welcome to train on all released sequences (train + test combined) if that helps your method. Final evaluation uses a separate held-out set (hidden_test/) that is not part of either split.

Pose convention

Ground-truth pose.txt stores the trajectory of endoscope2/L relative to endoscope1/L as frame-to-initial transforms:

\[ T_{\mathrm{rel}}(t) = T_0^{-1} \, T(t) \]

The first frame is the identity. This is not single-camera temporal visual odometry (VO) on endoscope2/L alone: in in-vivo use, endoscope1 can itself move (sometimes significantly, because of the subject's physiological motion), so relative motion estimated from one camera mixes the scope's own motion with that of a drifting "world" frame. The task is instead same-time, cross-camera pose between endoscope1/L and endoscope2/L.
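A minimal sketch of the frame-to-initial convention above. It assumes each T(t) is a 4x4 homogeneous matrix giving the pose of endoscope2/L in the endoscope1/L frame at frame t, and that T_0 = T(0). The function names are illustrative and are not part of this repository.

```python
import numpy as np


def invert_rigid(T):
    """Invert a 4x4 rigid transform using R^T instead of a general matrix inverse."""
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv


def to_frame_to_initial(poses):
    """Map per-frame cross-camera poses T(t) to T_rel(t) = T(0)^{-1} T(t)."""
    T0_inv = invert_rigid(poses[0])
    return [T0_inv @ T for T in poses]
```

By construction the first returned transform is the identity, which matches the statement about frame 0.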

What it does

  • Loads sequences from train/ or test/.
  • Matches endoscope1/L and endoscope2/L at the same frame index.
  • Estimates cross-camera relative pose per frame (identity at frame 0).
  • Writes predictions in pose.txt format: frame_idx tx ty tz qx qy qz qw.

Install

cd <repo>
python -m pip install -r requirements.txt

Run baseline

python scripts/run_baseline.py \
  --data-root <data-root> \
  --split train \
  --output-root <pred-root> \
  --device cuda

Outputs:

  • <pred-root>/train/<sequence_name>/pose.txt
  • <pred-root>/test/<sequence_name>/pose.txt
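For writing or checking these files, here is a small sketch of the line format frame_idx tx ty tz qx qy qz qw. It assumes whitespace-separated fields, an integer frame index, six decimal places, and a Hamilton-convention unit quaternion; none of these details is confirmed by this README, and the helper names are hypothetical.

```python
import math


def rotation_to_quaternion(R):
    """Convert a 3x3 rotation matrix to (qx, qy, qz, qw), Hamilton convention."""
    trace = R[0][0] + R[1][1] + R[2][2]
    if trace > 0.0:
        s = 2.0 * math.sqrt(trace + 1.0)  # s = 4 * qw
        return ((R[2][1] - R[1][2]) / s, (R[0][2] - R[2][0]) / s,
                (R[1][0] - R[0][1]) / s, 0.25 * s)
    if R[0][0] > R[1][1] and R[0][0] > R[2][2]:
        s = 2.0 * math.sqrt(1.0 + R[0][0] - R[1][1] - R[2][2])  # s = 4 * qx
        return (0.25 * s, (R[0][1] + R[1][0]) / s,
                (R[0][2] + R[2][0]) / s, (R[2][1] - R[1][2]) / s)
    if R[1][1] > R[2][2]:
        s = 2.0 * math.sqrt(1.0 + R[1][1] - R[0][0] - R[2][2])  # s = 4 * qy
        return ((R[0][1] + R[1][0]) / s, 0.25 * s,
                (R[1][2] + R[2][1]) / s, (R[0][2] - R[2][0]) / s)
    s = 2.0 * math.sqrt(1.0 + R[2][2] - R[0][0] - R[1][1])  # s = 4 * qz
    return ((R[0][2] + R[2][0]) / s, (R[1][2] + R[2][1]) / s,
            0.25 * s, (R[1][0] - R[0][1]) / s)


def format_pose_line(frame_idx, t, q):
    """Format one line as: frame_idx tx ty tz qx qy qz qw (quaternion normalized)."""
    norm = math.sqrt(sum(c * c for c in q))
    if norm == 0.0:
        raise ValueError("quaternion must be non-zero")
    values = [float(v) for v in t] + [c / norm for c in q]
    return f"{int(frame_idx)} " + " ".join(f"{v:.6f}" for v in values)


def parse_pose_line(line):
    """Parse one line back into (frame_idx, (tx, ty, tz), (qx, qy, qz, qw))."""
    parts = line.split()
    if len(parts) != 8:
        raise ValueError(f"expected 8 fields, got {len(parts)}")
    values = [float(p) for p in parts[1:]]
    return int(parts[0]), tuple(values[:3]), tuple(values[3:])
```

Comparing a few parsed ground-truth lines against your own output is a quick way to confirm the quaternion order and sign convention before running the evaluation.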

Docker

From repo root (build needs network; run does not):

docker build -t imedpe:dev .
docker run --rm --gpus all --network=none --memory=20g \
  -v <data-root>/train:/input:ro -v /tmp/out:/output imedpe:dev

Participant submission image: see imedpe_submission/README.md.

Evaluate

We use the same metric scripts as CLiMB for consistency across EndoVis. Huge thanks to the CLiMB team!

python scripts/evaluate_ate.py \
  --data-root <data-root> \
  --split train \
  --pred-root <pred-root>

The script aligns predicted translations to the ground truth with Horn's Sim(3) method (Endomapper-style), then reports:

  • ATE: mean_ate, std_ate, median_ate (mm)
  • RPE at frame deltas 1, 10, 20, 40: translational (mm) and rotational (deg)
  • num_matched_poses, registered_pct

Optional JSON export: --json-out results.json
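To make the alignment and ATE statistics concrete, here is a sketch of a least-squares similarity alignment followed by the error summary. It uses the SVD-based (Umeyama) closed form rather than Horn's quaternion formulation; both solve the same least-squares problem. It is not scripts/evaluate_ate.py, which may differ in details such as how poses are matched, and it assumes the two (N, 3) translation arrays are already matched frame by frame.

```python
import numpy as np


def align_sim3(pred, gt):
    """Return (scale, R, t) minimizing sum ||scale * R @ pred_i + t - gt_i||^2."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    P, G = pred - mu_p, gt - mu_g
    cov = G.T @ P / len(pred)
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0  # guard against a reflection
    R = U @ S @ Vt
    scale = np.trace(np.diag(D) @ S) / P.var(axis=0).sum()
    t = mu_g - scale * R @ mu_p
    return scale, R, t


def ate_stats(pred, gt):
    """Align pred to gt, then summarize per-frame translation error (input units)."""
    scale, R, t = align_sim3(pred, gt)
    aligned = scale * np.asarray(pred, dtype=float) @ R.T + t
    err = np.linalg.norm(aligned - np.asarray(gt, dtype=float), axis=1)
    return {
        "mean_ate": float(err.mean()),
        "std_ate": float(err.std()),
        "median_ate": float(np.median(err)),
    }
```

Because the alignment fits a scale factor, a prediction with the right trajectory shape but the wrong scale still scores well on ATE.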

Example baseline results (train split)

Cross-camera ALIKED + LightGlue + essential matrix baseline on 61 train/ sequences (Horn Sim(3) alignment, same metrics as above):

| Metric | Value |
| --- | --- |
| Mean ATE | 2.18 mm |
| Mean of per-sequence median ATE | 2.06 mm |
| Mean std ATE (per sequence) | 1.13 mm |
| Registered frames | 100% |
| RPE trans / rot, δ=1 | 1.17 mm / 6.05° |
| RPE trans / rot, δ=10 | 3.13 mm / 7.45° |
| RPE trans / rot, δ=20 | 3.07 mm / 7.70° |
| RPE trans / rot, δ=40 | 3.40 mm / 7.76° |

These numbers are a reference point only; re-run run_baseline.py and evaluate_ate.py on your machine to reproduce.

Expected dataset layout

<data-root>/
  train/<sequence_name>/...
  test/<sequence_name>/...
  hidden_test/<sequence_name>/...   # held-out (not in train/test)

Each sequence directory contains:

pose.txt
K.txt
endoscope1/L/frame_XXXXXX.png
endoscope1/R/frame_XXXXXX.png
endoscope2/L/frame_XXXXXX.png
endoscope2/R/frame_XXXXXX.png
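A short sketch of walking this layout with the standard library. It assumes that frames with the same index share a file name across endoscope1/L and endoscope2/L; the helper names are hypothetical and not part of the repository.

```python
from pathlib import Path


def list_sequences(data_root, split):
    """Return the sorted sequence directories under <data-root>/<split>/."""
    root = Path(data_root) / split
    return sorted(p for p in root.iterdir() if p.is_dir())


def paired_left_frames(sequence_dir):
    """Return sorted (endoscope1/L, endoscope2/L) path pairs that share a file name."""
    seq = Path(sequence_dir)
    cam1 = {p.name: p for p in (seq / "endoscope1" / "L").glob("frame_*.png")}
    cam2 = {p.name: p for p in (seq / "endoscope2" / "L").glob("frame_*.png")}
    # Keep only frames present in both cameras, ordered by file name.
    return [(cam1[name], cam2[name]) for name in sorted(cam1.keys() & cam2.keys())]
```

Frames missing from either camera are skipped rather than raising an error, so a count of returned pairs is a quick completeness check on a sequence.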

References

Baseline feature extraction and matching:

@article{Zhao2023ALIKED,
    title = {ALIKED: A Lighter Keypoint and Descriptor Extraction Network via Deformable Transformation},
    url = {https://arxiv.org/pdf/2304.03608.pdf},
    doi = {10.1109/TIM.2023.3271000},
    journal = {IEEE Transactions on Instrumentation & Measurement},
    author = {Zhao, Xiaoming and Wu, Xingming and Chen, Weihai and Chen, Peter C. Y. and Xu, Qingsong and Li, Zhengguo},
    year = {2023},
    volume = {72},
    pages = {1-16},
}
@inproceedings{lindenberger2023lightglue,
  author    = {Philipp Lindenberger and
               Paul-Edouard Sarlin and
               Marc Pollefeys},
  title     = {{LightGlue: Local Feature Matching at Light Speed}},
  booktitle = {ICCV},
  year      = {2023}
}
