This repository supports the iMED 2026 challenge subtask on pose estimation, part of EndoVis 2026 at MICCAI 2026 (Strasbourg, France).
Example sequences with pose overlays
Minimal baseline for iMED-PE trajectory estimation using ALIKED + LightGlue + essential matrix.
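The essential-matrix step turns matched keypoints between `endoscope1/L` and `endoscope2/L` into a rotation and a translation direction. The sketch below covers only the last part of that step: the standard SVD decomposition of a known essential matrix `E = [t]_x R` into its four (R, t) candidates. It is not the repository's code. It omits two things that library routines such as OpenCV's `cv2.recoverPose` handle: estimating E robustly from matches, and picking the candidate that places triangulated points in front of both cameras. `skew` and `decompose_essential` are hypothetical names.

```python
import numpy as np


def skew(v):
    """Return the 3x3 matrix [v]_x such that skew(v) @ w == np.cross(v, w)."""
    x, y, z = v
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])


def decompose_essential(E):
    """Return the four (R, t) candidates encoded by an essential matrix E = [t]_x R.

    The translation comes back with unit norm: its scale is not observable
    from two views, and its sign is ambiguous (hence the +t / -t pairs).
    """
    U, _, Vt = np.linalg.svd(np.asarray(E, dtype=float))
    # E is only defined up to sign, so flipping U or Vt is allowed; doing so
    # guarantees that the rotations built below have determinant +1.
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    R1 = U @ W @ Vt
    R2 = U @ W.T @ Vt
    t = U[:, 2]  # left null vector of E, i.e. the translation direction
    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]
```

Because the translation is unit-norm per frame, any metric scale has to come from elsewhere. The Sim(3) alignment used in evaluation absorbs a single global scale factor.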
Sequences are organized under `train/` and `test/` for convenience: local development, ablations, and reporting numbers on data with public ground truth. You are welcome to train on all released sequences (`train/` and `test/` combined) if that helps your method. Final evaluation uses a separate held-out set (`hidden_test/`) that is part of neither split.
Ground-truth `pose.txt` stores the trajectory of `endoscope2/L` relative to `endoscope1/L` as frame-to-initial transforms:

$$
T_{\mathrm{rel}}(t) = T_0^{-1}\, T(t)
$$
Here $T(t)$ is the pose of `endoscope2/L` relative to `endoscope1/L` at frame $t$ and $T_0 = T(0)$, so the first frame is the identity. This is not single-camera temporal visual odometry (VO) on `endoscope2/L` alone. In vivo, `endoscope1` can move, sometimes substantially, because of the subject's physiological motion. Motion estimated from one camera therefore mixes the movements of both scopes into a drifting “world” frame. The task is same-time, cross-camera pose between `endoscope1/L` and `endoscope2/L`.
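The normalization in the equation above is one matrix product per frame. This is useful, for example, if a method produces cross-camera poses that are not yet expressed relative to frame 0. The NumPy sketch below assumes poses are available as 4x4 homogeneous matrices. `to_frame_to_initial` is a hypothetical helper, not part of the repository.

```python
import numpy as np


def to_frame_to_initial(poses):
    """Map 4x4 poses T(t) to frame-to-initial form T_rel(t) = T_0^{-1} T(t).

    poses: array-like of shape (N, 4, 4); poses[0] is T_0.
    """
    poses = np.asarray(poses, dtype=float)
    T0_inv = np.linalg.inv(poses[0])
    # matmul broadcasts the single (4, 4) inverse over all N poses.
    return T0_inv @ poses
```

Pre-multiplying every $T(t)$ by the same fixed transform $G$ leaves the output unchanged, since $(G T_0)^{-1} G T(t) = T_0^{-1} T(t)$.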
The baseline:

- Loads sequences from `train/` or `test/`.
- Matches `endoscope1/L` and `endoscope2/L` at the same frame index.
- Estimates the cross-camera relative pose per frame (identity at frame 0).
- Writes predictions in `pose.txt` format: `frame_idx tx ty tz qx qy qz qw`.
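Each prediction line stores a pose as a translation followed by a unit quaternion in scalar-last order (`qx qy qz qw`). The NumPy sketch below writes and parses such lines. It assumes whitespace-separated fields, no header, an integer frame index, and 9 decimal places; none of these is specified above. The function names are hypothetical. The rotation-to-quaternion conversion is the standard branch-on-largest-component method, not code taken from the repository.

```python
import numpy as np


def rotmat_to_quat_xyzw(R):
    """Convert a 3x3 rotation matrix to a unit quaternion ordered (qx, qy, qz, qw)."""
    m = np.asarray(R, dtype=float)
    tr = np.trace(m)
    # Branch on the largest of (w, x, y, z) to avoid dividing by a small number.
    if tr > 0.0:
        s = 2.0 * np.sqrt(tr + 1.0)  # s = 4 * qw
        q = [(m[2, 1] - m[1, 2]) / s, (m[0, 2] - m[2, 0]) / s, (m[1, 0] - m[0, 1]) / s, 0.25 * s]
    elif m[0, 0] > m[1, 1] and m[0, 0] > m[2, 2]:
        s = 2.0 * np.sqrt(1.0 + m[0, 0] - m[1, 1] - m[2, 2])  # s = 4 * qx
        q = [0.25 * s, (m[0, 1] + m[1, 0]) / s, (m[0, 2] + m[2, 0]) / s, (m[2, 1] - m[1, 2]) / s]
    elif m[1, 1] > m[2, 2]:
        s = 2.0 * np.sqrt(1.0 + m[1, 1] - m[0, 0] - m[2, 2])  # s = 4 * qy
        q = [(m[0, 1] + m[1, 0]) / s, 0.25 * s, (m[1, 2] + m[2, 1]) / s, (m[0, 2] - m[2, 0]) / s]
    else:
        s = 2.0 * np.sqrt(1.0 + m[2, 2] - m[0, 0] - m[1, 1])  # s = 4 * qz
        q = [(m[0, 2] + m[2, 0]) / s, (m[1, 2] + m[2, 1]) / s, 0.25 * s, (m[1, 0] - m[0, 1]) / s]
    q = np.array(q)
    return q / np.linalg.norm(q)


def quat_xyzw_to_rotmat(q):
    """Inverse of rotmat_to_quat_xyzw (the quaternion is normalized first)."""
    x, y, z, w = np.asarray(q, dtype=float) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ])


def format_pose_line(frame_idx, T):
    """One pose.txt line: frame_idx tx ty tz qx qy qz qw."""
    T = np.asarray(T, dtype=float)
    values = list(T[:3, 3]) + list(rotmat_to_quat_xyzw(T[:3, :3]))
    return " ".join([str(frame_idx)] + [f"{v:.9f}" for v in values])


def parse_pose_line(line):
    """Parse one pose.txt line back into (frame_idx, 4x4 pose)."""
    fields = line.split()
    tx, ty, tz, qx, qy, qz, qw = (float(v) for v in fields[1:8])
    T = np.eye(4)
    T[:3, :3] = quat_xyzw_to_rotmat([qx, qy, qz, qw])
    T[:3, 3] = [tx, ty, tz]
    return int(fields[0]), T
```

Since q and -q encode the same rotation, compare pose files through the parsed rotation matrices rather than the raw quaternion fields.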
```bash
cd <repo>
python -m pip install -r requirements.txt
python scripts/run_baseline.py \
  --data-root <data-root> \
  --split train \
  --output-root <pred-root> \
  --device cuda
```

Outputs:

```
<pred-root>/train/<sequence_name>/pose.txt
<pred-root>/test/<sequence_name>/pose.txt
```
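Before evaluating, it can help to confirm that a `pose.txt` was written for every sequence in the split, following the output layout listed above. The stdlib sketch below does this. `missing_predictions` is a hypothetical helper. It only checks that the files exist, not what they contain.

```python
from pathlib import Path


def missing_predictions(data_root, pred_root, split):
    """Names of sequences in <data-root>/<split>/ that have no predicted pose.txt."""
    seq_dirs = sorted(p for p in (Path(data_root) / split).iterdir() if p.is_dir())
    return [
        p.name
        for p in seq_dirs
        if not (Path(pred_root) / split / p.name / "pose.txt").is_file()
    ]
```

For example, `missing_predictions("/path/to/data", "/path/to/preds", "train")` returns an empty list once every training sequence has a prediction file.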
From the repo root (the build needs network access; the run does not):

```bash
docker build -t imedpe:dev .
docker run --rm --gpus all --network=none --memory=20g \
  -v <data-root>/train:/input:ro -v /tmp/out:/output imedpe:dev
```

Participant submission image: see `imedpe_submission/README.md`.
We use the same metric scripts as CLiMB for consistency across EndoVis. Huge thanks to the CLiMB team!
```bash
python scripts/evaluate_ate.py \
  --data-root <data-root> \
  --split train \
  --pred-root <pred-root>
```

Uses Horn Sim(3) alignment (Endomapper-style) on translations, then reports:
- ATE: `mean_ate`, `std_ate`, `median_ate` (mm)
- RPE at frame deltas 1, 10, 20, 40: translational (mm) and rotational (deg)
- `num_matched_poses`, `registered_pct`

Optional JSON export: `--json-out results.json`
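The ATE numbers come from a least-squares similarity alignment of the estimated translations onto the ground truth, followed by per-frame position errors. The NumPy sketch below uses Umeyama's SVD solution for the Sim(3) fit. It solves the same kind of least-squares problem as Horn's closed-form method, but it is not necessarily what `evaluate_ate.py` does internally. It also makes two simplifying assumptions:

- The two trajectories are already matched row by row. The real script reports `num_matched_poses`, so it handles association itself.
- The standard deviation is the population one.

`align_sim3` and `ate_stats` are hypothetical names.

```python
import numpy as np


def align_sim3(gt, est):
    """Similarity (s, R, t) minimizing sum_i ||gt_i - (s R est_i + t)||^2 (Umeyama).

    gt, est: (N, 3) translations matched row by row.
    """
    gt = np.asarray(gt, dtype=float)
    est = np.asarray(est, dtype=float)
    mu_g, mu_e = gt.mean(axis=0), est.mean(axis=0)
    g, e = gt - mu_g, est - mu_e
    cov = g.T @ e / len(gt)  # 3x3 cross-covariance between the centered sets
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0  # flip the weakest axis so R is a rotation, not a reflection
    R = U @ S @ Vt
    var_e = (e ** 2).sum() / len(est)  # mean squared distance of est from its centroid
    s = np.trace(np.diag(D) @ S) / var_e
    t = mu_g - s * (R @ mu_e)
    return s, R, t


def ate_stats(gt, est):
    """ATE statistics after Sim(3)-aligning est onto gt (same units as gt)."""
    gt = np.asarray(gt, dtype=float)
    est = np.asarray(est, dtype=float)
    s, R, t = align_sim3(gt, est)
    aligned = s * est @ R.T + t  # applies s R x + t to every row x
    err = np.linalg.norm(gt - aligned, axis=1)
    return {
        "mean_ate": float(err.mean()),
        "std_ate": float(err.std()),  # population std (ddof=0); an assumption
        "median_ate": float(np.median(err)),
    }
```

Fitting a scale as well as a rotation and translation is what allows trajectories that are known only up to scale to be compared in millimetres.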
Cross-camera ALIKED + LightGlue + essential matrix baseline on the 61 `train/` sequences (Horn Sim(3) alignment, same metrics as above):
| Metric | Value |
|---|---|
| Mean ATE | 2.18 mm |
| Mean of per-sequence median ATE | 2.06 mm |
| Mean of per-sequence std ATE | 1.13 mm |
| Registered frames | 100% |
| RPE trans / rot, δ=1 | 1.17 mm / 6.05° |
| RPE trans / rot, δ=10 | 3.13 mm / 7.45° |
| RPE trans / rot, δ=20 | 3.07 mm / 7.70° |
| RPE trans / rot, δ=40 | 3.40 mm / 7.76° |
These numbers are a reference point only; re-run `run_baseline.py` and `evaluate_ate.py` on your machine to reproduce them.
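The RPE rows measure drift over a fixed frame gap δ. For each frame i, the estimated motion from i to i+δ is compared with the ground-truth motion over the same gap. One common way to compute this is sketched below in NumPy: take the residual transform between the two relative motions and report its translation norm and rotation angle. This is an illustration, not the CLiMB or `evaluate_ate.py` code. In particular, the sketch does not know whether the official script applies the Sim(3) scale to the estimate before computing RPE, so it exposes that choice as the `scale` argument. `rpe` is a hypothetical name.

```python
import numpy as np


def rpe(gt_poses, est_poses, delta, scale=1.0):
    """Mean translational and rotational (deg) relative pose error at one frame delta.

    gt_poses, est_poses: (N, 4, 4) arrays already matched frame by frame.
    scale: factor applied to the estimated translations (e.g. from a Sim(3) fit).
    """
    gt = np.asarray(gt_poses, dtype=float)
    est = np.array(est_poses, dtype=float)  # copy, so scaling leaves the input intact
    est[:, :3, 3] *= scale
    if len(gt) <= delta:
        raise ValueError("need more than `delta` poses")
    trans_err, rot_err = [], []
    for i in range(len(gt) - delta):
        d_gt = np.linalg.inv(gt[i]) @ gt[i + delta]     # true motion over delta frames
        d_est = np.linalg.inv(est[i]) @ est[i + delta]  # estimated motion over delta frames
        err = np.linalg.inv(d_gt) @ d_est               # residual motion
        trans_err.append(np.linalg.norm(err[:3, 3]))
        # Rotation angle from the trace; clip guards against rounding outside [-1, 1].
        cos_a = np.clip((np.trace(err[:3, :3]) - 1.0) / 2.0, -1.0, 1.0)
        rot_err.append(np.degrees(np.arccos(cos_a)))
    return float(np.mean(trans_err)), float(np.mean(rot_err))
```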
Data layout:

```
<data-root>/
  train/<sequence_name>/...
  test/<sequence_name>/...
  hidden_test/<sequence_name>/...   # held-out (not in train/test)
```
Each sequence directory contains:

```
pose.txt
K.txt
endoscope1/L/frame_XXXXXX.png
endoscope1/R/frame_XXXXXX.png
endoscope2/L/frame_XXXXXX.png
endoscope2/R/frame_XXXXXX.png
```
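Because the task pairs the two left cameras at the same frame index, a loader mostly needs to match file names across `endoscope1/L/` and `endoscope2/L/`. The stdlib sketch below does this. It assumes six-digit `frame_XXXXXX.png` names as in the layout above, and that frames missing from either scope should simply be skipped. `paired_left_frames` is a hypothetical helper, not the repository's loader. It does not read `K.txt`, whose format is not documented here.

```python
from pathlib import Path


def paired_left_frames(seq_dir):
    """Return (frame_idx, endoscope1/L path, endoscope2/L path) for shared frames."""
    seq_dir = Path(seq_dir)
    left1 = {p.name: p for p in (seq_dir / "endoscope1" / "L").glob("frame_*.png")}
    left2 = {p.name: p for p in (seq_dir / "endoscope2" / "L").glob("frame_*.png")}
    # Identical file names mean identical frame indices; keep frames seen by both scopes.
    shared = sorted(left1.keys() & left2.keys())
    return [(int(name[len("frame_"):-len(".png")]), left1[name], left2[name]) for name in shared]
```

Sorting by file name gives frame order only because the indices are zero-padded to a fixed width.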
Baseline feature extraction and matching:

```bibtex
@article{Zhao2023ALIKED,
  title = {ALIKED: A Lighter Keypoint and Descriptor Extraction Network via Deformable Transformation},
  url = {https://arxiv.org/pdf/2304.03608.pdf},
  doi = {10.1109/TIM.2023.3271000},
  journal = {IEEE Transactions on Instrumentation & Measurement},
  author = {Zhao, Xiaoming and Wu, Xingming and Chen, Weihai and Chen, Peter C. Y. and Xu, Qingsong and Li, Zhengguo},
  year = {2023},
  volume = {72},
  pages = {1-16},
}

@inproceedings{lindenberger2023lightglue,
  author = {Philipp Lindenberger and
            Paul-Edouard Sarlin and
            Marc Pollefeys},
  title = {{LightGlue: Local Feature Matching at Light Speed}},
  booktitle = {ICCV},
  year = {2023}
}
```

