A clean, tested reimplementation of the Matchability Error (
```python
from matchability import matchability_error

res = matchability_error(left, right_gt, right_pred)  # paths | PIL | numpy | torch all accepted
print(res.error_pct, res.tp, res.fp, res.fn)  # DeDoDe v2 auto-downloads on first call
```

Using a robust matcher (DeDoDe v2), we detect one fixed set of keypoints in the left image and ask which
of them have an epipolar-consistent match in the GT right view (
- 🟢 TP: correct, matchable detail preserved in both views.
- 🟠 FP (hallucination): detail matchable in the prediction but not in the GT (invented geometry).
- 🔴 FN (omission): detail matchable in the GT but lost in the prediction (over-smoothing / blur).
A lower error means the synthesized view keeps consistent, matchable texture along the correct epipolar
geometry. The full operational definition (including choices the paper leaves implicit) is in
docs/metric.md.
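
The three categories can be sketched as a comparison of two boolean masks over the same left-image keypoints. This is a minimal illustration, not the package's implementation: `Counts`, `classify` and `error_pct` are hypothetical names, and the error formula in particular is an assumption (the operational definition lives in docs/metric.md).

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int
    fp: int
    fn: int


def classify(matchable_gt, matchable_pred):
    """Count TP/FP/FN over one fixed keypoint set.

    Each argument holds one boolean per left-image keypoint: True if that
    keypoint has an epipolar-consistent match in the corresponding right view.
    """
    if len(matchable_gt) != len(matchable_pred):
        raise ValueError("both masks must cover the same keypoints")
    pairs = list(zip(matchable_gt, matchable_pred))
    tp = sum(1 for g, p in pairs if g and p)        # preserved in both views
    fp = sum(1 for g, p in pairs if p and not g)    # hallucination
    fn = sum(1 for g, p in pairs if g and not p)    # omission
    return Counts(tp, fp, fn)


def error_pct(c):
    # ASSUMPTION: error = mismatched keypoints / keypoints matchable in either view.
    total = c.tp + c.fp + c.fn
    return 0.0 if total == 0 else 100.0 * (c.fp + c.fn) / total


counts = classify([True, True, False, True, False],
                  [True, False, True, True, False])
print(counts, error_pct(counts))
```

Keypoints that are matchable in neither view contribute to no category, which is why the last element of the example masks leaves the counts unchanged.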
```bash
pip install -e .          # torch + kornia come as core deps; DeDoDe v2 works out of the box
pip install -e ".[viz]"   # + matplotlib, for sensitivity plots
pip install -e ".[dev]"   # + pytest + ruff
```

The DeDoDe v2 checkpoint is auto-downloaded and cached on first use, and the device is auto-selected (mps > cuda > cpu); Apple-Silicon MPS is first-class.
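
The selection order can be pictured as a simple priority check. The helper below is a hypothetical sketch of that ordering only, not the package's code; it takes plain booleans so it runs without PyTorch installed.

```python
def pick_device(mps_available, cuda_available):
    """Return the first available backend in the order mps > cuda > cpu."""
    if mps_available:
        return "mps"
    if cuda_available:
        return "cuda"
    return "cpu"


# With PyTorch installed, the two flags could come from
# torch.backends.mps.is_available() and torch.cuda.is_available().
print(pick_device(mps_available=False, cuda_available=True))
```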
```python
from matchability import Matchability

metric = Matchability()  # default DeDoDe v2; loads the model once
res = metric(left, right_gt, right_pred)
print(f"E_match = {res.error_pct:.1f}% (TP={res.tp}, FP={res.fp}, FN={res.fn})")

# tune knobs / pick a device explicitly:
metric = Matchability(tau=2.0, n_keypoints=5000, working_resolution=768, device="mps")
```

```bash
matchability left.png right_gt.png right_pred.png
matchability left.png right_gt.png right_pred.png --backend classical --viz overlay.png
```

| Backend | Use | Notes |
|---|---|---|
| `DeDoDeV2Matcher` (default) | faithful metric | kornia DeDoDe v2 (L-C4-v2 + G-upright), MPS/CUDA/CPU |
| `ClassicalMatcher` | fast / weight-free | SIFT + mutual-NN; used in CI |
| `MockMatcher` | unit tests | deterministic, programmable |

The metric core is matcher-agnostic: pass any `Matcher` to `Matchability(matcher=...)`.
We swept 11 distortions across their severity ranges on 5 real Apple Vision Pro spatial video
(MV-HEVC format) stereo pairs. For each pair, the GT right view was distorted to simulate a
predicted right view (R_pred = distort(R_gt)), and E_match was computed with DeDoDe v2
(768 px, 5000 keypoints, τ = 2 px) alongside SSIM and PSNR for comparison. This reproduces the
sensitivity analysis from Appendix D.1 of the Elastic3D paper.
Each stereo pair is a single frame extracted from a 2200×2200 AVP video (left and right eyes stored as separate views in one MV-HEVC file). Metrics are averaged over the 5 videos.
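
For reference, the PSNR baseline can be sketched with the standard formula, 10·log10(MAX² / MSE). The function below is an illustrative stand-in, not the study's code. The 100 dB ceiling is an assumption, chosen so that identical images report a finite value.

```python
import math


def psnr(reference, distorted, max_value=255.0, cap_db=100.0):
    """PSNR in dB between two equally sized flat pixel sequences."""
    if len(reference) != len(distorted) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return cap_db  # ASSUMPTION: identical images are reported at a fixed cap
    return min(cap_db, 10.0 * math.log10(max_value ** 2 / mse))


def mean(values):
    """Average a per-pair metric over all stereo pairs."""
    return sum(values) / len(values)


per_pair = [psnr([100, 100], [110, 90]), psnr([0, 255], [0, 255])]
print(per_pair, mean(per_pair))
```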
Crimson = E_match, steel-blue = 1−SSIM, sea-green = 1−PSNR (normalised per-distortion). Top row: insensitive distortions, where E_match stays flat while SSIM/PSNR degrade (pixel-level change without stereo-fidelity loss). Remaining rows: sensitive distortions, where E_match rises sharply.
- Texture distortions (blur, noise, JPEG): `E_match` rises sharply as keypoints are destroyed.
- Geometric distortions (horizontal shift, disparity scale): `E_match` stays flat because DeDoDe is translation-invariant. SSIM/PSNR degrade while stereo fidelity is intact.
- Epipolar violations (vertical shift): `E_match` rises because matches are filtered by the epipolar constraint (|Δy| > τ = 2 px), while SSIM/PSNR barely change for small shifts.
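
The epipolar filter in the last bullet can be sketched as follows, assuming a rectified stereo pair in which corresponding points share the same image row. `epipolar_consistent` and the tuple layout of a match are hypothetical, chosen only for illustration.

```python
def epipolar_consistent(matches, tau=2.0):
    """Keep matches whose vertical offset |dy| is at most tau pixels.

    Each match is ((x_left, y_left), (x_right, y_right)).
    """
    return [m for m in matches if abs(m[1][1] - m[0][1]) <= tau]


matches = [
    ((10.0, 20.0), (4.0, 20.5)),   # |dy| = 0.5, kept
    ((30.0, 40.0), (25.0, 46.0)),  # |dy| = 6.0, rejected
    ((50.0, 60.0), (50.0, 58.0)),  # |dy| = 2.0, kept (not greater than tau)
]
kept = epipolar_consistent(matches)

# Shifting every right-view point down by 6 px pushes all three past tau.
shifted = [(left, (right[0], right[1] + 6.0)) for left, right in matches]
print(len(kept), len(epipolar_consistent(shifted)))
```

A purely horizontal offset leaves |Δy| untouched, so this filter does not reject matches on that basis.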
Insensitive (flat), where E_match should stay low despite pixel-level degradation:

| Distortion | Family | What it tests |
|---|---|---|
| `disparity_scale` | geometric | Horizontal stretch (wrong stereo strength); only geometry, not texture |
| `horizontal_shift` | geometric | Pure horizontal disparity offset; DeDoDe is translation-invariant |

Sensitive (rises), where E_match should rise with severity:

| Distortion | Family | What it tests |
|---|---|---|
| `contrast_fade` | texture | Low-contrast regions become ambiguous to match |
| `downscale_upscale` | texture | Bicubic downsample + upsample loses high-frequency texture |
| `elastic_warp` | geometric | Smooth spatial warp disrupts both descriptor and epipolar geometry |
| `gaussian_blur` | texture | Over-smoothing destroys keypoint texture |
| `gaussian_noise` | texture | Additive per-pixel noise disrupts descriptors |
| `jpeg` | texture | Compression artefacts bleed into descriptors |
| `occlusion_patch` | structural | Black patch simulates disocclusion; error scales with occluded fraction |
| `vertical_shift` | geometric | Breaks epipolar consistency |

Matches are drawn as coloured lines on a left ∥ right composite. 🟢 TP: preserved in both GT and pred; 🟠 FP: in pred but not GT (hallucination); 🔴 FN: in GT but absent in pred (omission, line points to expected location).
Insensitive (mostly 🟢, E_match stays low):

| Overlay | Severity | E_match |
|---|---|---|
| Disparity scale | ×1.2 | 22.4% |
| Horizontal shift | 32 px | 6.7% |

Sensitive (increasing 🔴🟠 with severity):

| Overlay | Severity | E_match |
|---|---|---|
| Contrast fade | ×0.2 | 53.5% |
| Downscale+upscale | ×0.25 | 63.6% |
| Elastic warp | a=8 | 82.8% |
| Gaussian blur | σ=8 | 99.7% |
| Gaussian noise | σ=40 | 70.7% |
| JPEG | q=5 | 70.2% |
| Occlusion patch | 50% | 70.6% |
| Vertical shift | 6 px | 99.6% |

Full numbers (DeDoDe v2, 5 pairs, 768 px, τ = 2 px):

| Distortion | Expected | E_match min → max | SSIM min → max | PSNR (dB) min → max |
|---|---|---|---|---|
| insensitive (flat) | | | | |
| disparity_scale | flat | 0.0% → 81.5% | 1.00 → 0.57 | 100 → 16.8 |
| horizontal_shift | flat | 0.0% → 44.2% | 1.00 → 0.54 | 100 → 15.8 |
| sensitive (rises) | | | | |
| contrast_fade | rises | 0.0% → 96.1% | 1.00 → 0.67 | 100 → 14.7 |
| downscale_upscale | rises | 0.0% → 99.1% | 1.00 → 0.73 | 100 → 26.7 |
| elastic_warp | rises | 0.0% → 96.8% | 1.00 → 0.58 | 100 → 19.3 |
| gaussian_blur | rises | 0.0% → 100.0% | 1.00 → 0.70 | 100 → 23.1 |
| gaussian_noise | rises | 0.0% → 98.2% | 1.00 → 0.05 | 100 → 11.3 |
| jpeg | rises | 9.5% → 92.5% | 0.99 → 0.73 | 46 → 25.7 |
| occlusion_patch | rises | 0.1% → 88.7% | 1.00 → 0.67 | 75 → 14.2 |
| vertical_shift | rises | 0.0% → 99.9% | 1.00 → 0.60 | 100 → 21.6 |

To reproduce the study from scratch:

```bash
python experiments/scripts/extract_frames.py --input-dir data/raw --output-dir data/frames
python experiments/scripts/run_sensitivity.py --backend dedode --working-resolution 768
python experiments/scripts/generate_overlays.py  # regenerate overlays only (fast)
```

To regenerate only the plots from an existing CSV:

```bash
python experiments/scripts/plot_results.py
```

A separate experiment measures how VAE encode-decode cycles degrade the Matchability metric. Seven publicly available VAEs are compared across two architecture families:

| VAE | HuggingFace repo | Family | E_match |
|---|---|---|---|
| TAESD | `madebyollin/taesd` | AutoencoderTiny (SD1) | 50.1% |
| TAESDXL | `madebyollin/taesdxl` | AutoencoderTiny (SDXL) | 49.8% |
| TAESD3 | `madebyollin/taesd3` | AutoencoderTiny (SD3) | 36.2% |
| TAEF1 | `madebyollin/taef1` | AutoencoderTiny (FLUX) | 37.5% |
| SD-VAE-MSE | `stabilityai/sd-vae-ft-mse` | AutoencoderKL (SD1) | 41.9% |
| SD-VAE-EMA | `stabilityai/sd-vae-ft-ema` | AutoencoderKL (SD1) | 41.8% |
| SDXL-VAE | `madebyollin/sdxl-vae-fp16-fix` | AutoencoderKL (SDXL) | 38.7% |

All VAEs introduce substantial E_match degradation. Notably, the newer-generation tiny AEs (TAESD3, TAEF1) score roughly 12 to 14 pp lower (better) than their SD1/SDXL predecessors (TAESD, TAESDXL) despite similar parameter counts. E_match exposes the reconstruction quality improvements that pixel metrics like SSIM miss.
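
The gap between the two generations of tiny autoencoders can be checked directly from the table. The snippet below only redoes that arithmetic on the reported E_match values; the variable names are illustrative.

```python
# E_match (%) for the four AutoencoderTiny models, copied from the table above.
e_match = {"TAESD": 50.1, "TAESDXL": 49.8, "TAESD3": 36.2, "TAEF1": 37.5}

older = ("TAESD", "TAESDXL")
newer = ("TAESD3", "TAEF1")

# Percentage-point gap for every older/newer pairing.
gaps = {(o, n): round(e_match[o] - e_match[n], 1) for o in older for n in newer}

for (o, n), gap in gaps.items():
    print(f"{o} vs {n}: {gap} pp")
print("range:", min(gaps.values()), "to", max(gaps.values()), "pp")
```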
```bash
pip install diffusers accelerate
python experiments/scripts/run_vae_experiment.py --backend dedode
```

```bash
pip install -e ".[dev,viz]"
pytest -m "not dedode"   # fast unit + property tests (no model weights)
pytest -m dedode         # slow: real DeDoDe v2 (downloads weights once)
ruff check .
```

CI runs the fast suite on every push/PR. An opt-in job exercises the real DeDoDe backend on a
weekly schedule. Commits follow Conventional Commits and
versioning is automated with release-please; see
CONTRIBUTING.md.

```bibtex
@inproceedings{metzger2026elastic3d,
  title = {Elastic3D: Controllable Stereo Video Conversion with Guided Latent Decoding},
  author = {Metzger, Nando and Truong, Prune and Bhat, Goutam and Schindler, Konrad and Tombari, Federico},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2026},
}
```










