nandometzger/Matchability

Matchability

CI License: MIT Python 3.11+

A clean, tested reimplementation of the Matchability Error ($\mathcal{E}_{\text{Match}}$), the stereoscopic-fidelity metric from Elastic3D: Controllable Stereo Video Conversion with Guided Latent Decoding (Metzger et al., CVPR 2026). It measures whether a predicted right view preserves the same matchable, epipolar-consistent texture as the ground-truth right view, a proxy for the binocular rivalry that makes synthesized stereo uncomfortable to watch.

from matchability import matchability_error

res = matchability_error(left, right_gt, right_pred)   # paths | PIL | numpy | torch all accepted
print(res.error_pct, res.tp, res.fp, res.fn)           # DeDoDe v2 auto-downloads on first call

What it measures

Using a robust matcher (DeDoDe v2), we detect one fixed set of keypoints in the left image and ask which of them have an epipolar-consistent match in the GT right view ($M_{gt}$) versus the predicted right view ($M_{pred}$). The error is the complement of their Jaccard index:

$$\mathcal{E}_{\text{Match}} = 1 - \frac{|M_{gt}\cap M_{pred}|}{|M_{gt}\cup M_{pred}|} = \frac{N_{FP}+N_{FN}}{N_{TP}+N_{FP}+N_{FN}}$$

  • 🟢 TP: correct, matchable detail preserved in both views.
  • 🟠 FP (hallucination): detail matchable in the prediction but not in the GT (invented geometry).
  • 🔴 FN (omission): detail matchable in the GT but lost in the prediction (over-smoothing / blur).

A lower error means the synthesized view keeps consistent, matchable texture along the correct epipolar geometry. The full operational definition (including choices the paper leaves implicit) is in docs/metric.md.
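
As a worked illustration of the formula above, the following minimal sketch computes the error from two sets of left-keypoint indices. It is not the package's implementation: the function name `jaccard_error` and the convention of returning 0 for an empty union are assumptions made for this example.

```python
def jaccard_error(m_gt: set[int], m_pred: set[int]) -> dict:
    """Matchability-style error from two sets of left-keypoint indices.

    m_gt:   keypoints with an epipolar-consistent match in the GT right view
    m_pred: keypoints with an epipolar-consistent match in the predicted view
    """
    tp = len(m_gt & m_pred)   # matchable in both views
    fp = len(m_pred - m_gt)   # matchable only in the prediction (hallucination)
    fn = len(m_gt - m_pred)   # matchable only in the GT (omission)
    union = tp + fp + fn
    # Assumption of this sketch: an empty union counts as zero error.
    error = (fp + fn) / union if union else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "error_pct": 100.0 * error}


result = jaccard_error({0, 1, 2, 3, 4, 5}, {2, 3, 4, 5, 6})
print(result)
```

Here TP=4, FP=1 and FN=2, so the error is 3/7, about 42.9%.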

Install

pip install -e .          # torch + kornia come as core deps; DeDoDe v2 works out of the box
pip install -e ".[viz]"   # + matplotlib, for sensitivity plots
pip install -e ".[dev]"   # + pytest + ruff

The DeDoDe v2 checkpoint is auto-downloaded and cached on first use, and the device is auto-selected (mps > cuda > cpu); Apple-Silicon MPS is first-class.
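
The stated preference order can be sketched with the standard PyTorch availability checks. This is an illustration, not the package's code: `select_device` is a hypothetical name, and falling back to CPU when PyTorch cannot be imported is an assumption of this sketch.

```python
def select_device(preferred=None):
    """Return a device string following the mps > cuda > cpu order."""
    if preferred is not None:
        return preferred              # an explicit choice always wins
    try:
        import torch
    except ImportError:
        return "cpu"                  # assumption: no torch means CPU only
    if torch.backends.mps.is_available():
        return "mps"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


print(select_device())        # depends on the machine
print(select_device("cuda"))  # explicit override
```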

Usage

Python

from matchability import Matchability

metric = Matchability()                          # default DeDoDe v2; loads the model once
res = metric(left, right_gt, right_pred)
print(f"E_match = {res.error_pct:.1f}%  (TP={res.tp}, FP={res.fp}, FN={res.fn})")

# tune knobs / pick a device explicitly:
metric = Matchability(tau=2.0, n_keypoints=5000, working_resolution=768, device="mps")

Command line

matchability left.png right_gt.png right_pred.png
matchability left.png right_gt.png right_pred.png --backend classical --viz overlay.png

Backends

| Backend | Use | Notes |
|---|---|---|
| DeDoDeV2Matcher (default) | faithful metric | kornia DeDoDe v2 (L-C4-v2 + G-upright), MPS/CUDA/CPU |
| ClassicalMatcher | fast / weight-free | SIFT + mutual-NN; used in CI |
| MockMatcher | unit tests | deterministic, programmable |

The metric core is matcher-agnostic: pass any Matcher to Matchability(matcher=...).


Empirical sensitivity study

Setup

We swept 11 distortions across their severity ranges on 5 real Apple Vision Pro spatial video (MV-HEVC format) stereo pairs. For each pair, the GT right view was distorted to simulate a predicted right view (R_pred = distort(R_gt)), and E_match was computed with DeDoDe v2 (768 px, 5000 keypoints, τ=2 px) alongside SSIM and PSNR for comparison. This reproduces the sensitivity analysis from Appendix D.1 of the Elastic3D paper.

Each stereo pair is a single frame extracted from a 2200×2200 AVP video (left and right eyes stored as separate views in one MV-HEVC file). Metrics are averaged over the 5 videos.
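
The protocol above amounts to a loop over severities and pairs. The sketch below shows that structure only: `sweep`, the `distort` callable and the stub metric in the usage lines are hypothetical stand-ins. The only behaviour taken from this README is that the metric is called as `metric(left, right_gt, right_pred)` and returns an object with an `error_pct` attribute.

```python
from statistics import mean
from types import SimpleNamespace


def sweep(metric, pairs, distort, severities):
    """Mean error_pct per severity, with R_pred = distort(R_gt, severity)."""
    curve = {}
    for severity in severities:
        errors = []
        for left, right_gt in pairs:
            right_pred = distort(right_gt, severity)
            errors.append(metric(left, right_gt, right_pred).error_pct)
        curve[severity] = mean(errors)   # average over the stereo pairs
    return curve


# Stand-ins so the sketch runs without model weights: "images" are plain
# numbers and the stub reports the GT/pred difference as a percentage.
def stub_metric(left, right_gt, right_pred):
    return SimpleNamespace(error_pct=min(100.0, abs(right_gt - right_pred)))


pairs = [(0.0, 10.0), (0.0, 20.0)]
curve = sweep(stub_metric, pairs, lambda img, s: img + s, severities=[0, 5, 50])
print(curve)
```

With the real package, `metric` would be a `Matchability()` instance and `distort` one of the distortions from the catalogue.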

Results

Comparison

Crimson = E_match, steel-blue = 1−SSIM, sea-green = 1−PSNR (normalised per distortion). Top row: insensitive distortions, where E_match stays flat while SSIM/PSNR degrade (pixel-level change without stereo-fidelity loss). Remaining rows: sensitive distortions, where E_match rises sharply.

Why E_match is different from SSIM/PSNR

  • Texture distortions (blur, noise, JPEG) → E_match rises sharply as keypoints are destroyed.
  • Geometric distortions (horizontal shift, disparity scale) → E_match is expected to stay flat because DeDoDe is translation-invariant; SSIM/PSNR degrade even though stereo fidelity is intact. At the extreme end of the sweep E_match does rise (see the full numbers below).
  • Epipolar violations (vertical shift) → E_match rises because matches with |Δy| > τ = 2 px are rejected by the epipolar constraint, while SSIM/PSNR barely change for small shifts.
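
The epipolar filter in the last bullet can be sketched as follows. This is a minimal illustration assuming a rectified pair, where epipolar lines are horizontal and a valid match must stay on nearly the same row. The function name and the plain-tuple match format are hypothetical, and the package may apply additional checks.

```python
Point = tuple[float, float]  # (x, y) in pixels


def epipolar_consistent(matches: list[tuple[Point, Point]], tau: float = 2.0) -> list[int]:
    """Return indices of matches whose vertical offset is at most tau pixels."""
    kept = []
    for i, ((_, y_left), (_, y_right)) in enumerate(matches):
        if abs(y_right - y_left) <= tau:   # matches with |dy| > tau are rejected
            kept.append(i)
    return kept


matches = [
    ((100.0, 50.0), (90.0, 50.5)),     # dy = 0.5 px: kept
    ((200.0, 80.0), (185.0, 86.0)),    # dy = 6 px: rejected
    ((300.0, 120.0), (310.0, 121.9)),  # dy = 1.9 px: kept
]
print(epipolar_consistent(matches))  # [0, 2]
```

A pure horizontal offset changes only x, so it passes this test untouched, whereas a 6 px vertical shift pushes every match past τ = 2 px.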

Distortion catalogue

Insensitive (flat). E_match should stay low despite pixel-level degradation:

| Distortion | Family | What it tests |
|---|---|---|
| disparity_scale | geometric | Horizontal stretch (wrong stereo strength): only geometry, not texture |
| horizontal_shift | geometric | Pure horizontal disparity offset; DeDoDe is translation-invariant |

Sensitive (rises). E_match should rise with severity:

| Distortion | Family | What it tests |
|---|---|---|
| contrast_fade | texture | Low-contrast regions become ambiguous to match |
| downscale_upscale | texture | Bicubic downsample + upsample loses high-frequency texture |
| elastic_warp | geometric | Smooth spatial warp disrupts both descriptors and epipolar geometry |
| gaussian_blur | texture | Over-smoothing destroys keypoint texture |
| gaussian_noise | texture | Grainy per-pixel noise disrupts descriptors |
| jpeg | texture | Compression artefacts bleed into descriptors |
| occlusion_patch | structural | Black patch simulates disocclusion; error ∝ occluded fraction |
| vertical_shift | geometric | Breaks epipolar consistency ( |

Match overlays (video 0001)

Matches are drawn as coloured lines on a left ∥ right composite: 🟢 TP, preserved in both GT and pred | 🟠 FP, in pred but not GT (hallucination) | 🔴 FN, in GT but absent in pred (omission; the line points to the expected location)

Insensitive (mostly 🟢, E_match stays low):

  • Disparity scale (×1.2, 22.4%)
  • Horizontal shift (32px, 6.7%)

Sensitive (increasing 🔴🟠 with severity):

  • Contrast fade (×0.2, 53.5%)
  • Downscale+upscale (×0.25, 63.6%)
  • Elastic warp (a=8, 82.8%)
  • Gaussian blur (σ=8, 99.7%)
  • Gaussian noise (σ=40, 70.7%)
  • JPEG (q=5, 70.2%)
  • Occlusion patch (50%, 70.6%)
  • Vertical shift (6px, 99.6%)

Full numbers (DeDoDe v2, 5 pairs, 768 px, τ=2 px):

| Distortion | Expected | E_match min → max | SSIM min → max | PSNR (dB) min → max |
|---|---|---|---|---|
| insensitive (flat) | | | | |
| disparity_scale | flat | 0.0% → 81.5% | 1.00 → 0.57 | 100 → 16.8 |
| horizontal_shift | flat | 0.0% → 44.2% | 1.00 → 0.54 | 100 → 15.8 |
| sensitive (rises) | | | | |
| contrast_fade | rises | 0.0% → 96.1% | 1.00 → 0.67 | 100 → 14.7 |
| downscale_upscale | rises | 0.0% → 99.1% | 1.00 → 0.73 | 100 → 26.7 |
| elastic_warp | rises | 0.0% → 96.8% | 1.00 → 0.58 | 100 → 19.3 |
| gaussian_blur | rises | 0.0% → 100.0% | 1.00 → 0.70 | 100 → 23.1 |
| gaussian_noise | rises | 0.0% → 98.2% | 1.00 → 0.05 | 100 → 11.3 |
| jpeg | rises | 9.5% → 92.5% | 0.99 → 0.73 | 46 → 25.7 |
| occlusion_patch | rises | 0.1% → 88.7% | 1.00 → 0.67 | 75 → 14.2 |
| vertical_shift | rises | 0.0% → 99.9% | 1.00 → 0.60 | 100 → 21.6 |

To reproduce the study from scratch:

python experiments/scripts/extract_frames.py --input-dir data/raw --output-dir data/frames
python experiments/scripts/run_sensitivity.py --backend dedode --working-resolution 768
python experiments/scripts/generate_overlays.py   # regenerate overlays only (fast)

To regenerate only the plots from an existing CSV:

python experiments/scripts/plot_results.py

VAE roundtrip experiment

A separate experiment measures how VAE encode-decode cycles degrade the Matchability metric. Seven publicly available VAEs are compared across two architecture families:

VAE comparison

| VAE | HuggingFace repo | Family | E_match |
|---|---|---|---|
| TAESD | madebyollin/taesd | AutoencoderTiny (SD1) | 50.1% |
| TAESDXL | madebyollin/taesdxl | AutoencoderTiny (SDXL) | 49.8% |
| TAESD3 | madebyollin/taesd3 | AutoencoderTiny (SD3) | 36.2% |
| TAEF1 | madebyollin/taef1 | AutoencoderTiny (FLUX) | 37.5% |
| SD-VAE-MSE | stabilityai/sd-vae-ft-mse | AutoencoderKL (SD1) | 41.9% |
| SD-VAE-EMA | stabilityai/sd-vae-ft-ema | AutoencoderKL (SD1) | 41.8% |
| SDXL-VAE | madebyollin/sdxl-vae-fp16-fix | AutoencoderKL (SDXL) | 38.7% |

All VAEs introduce substantial E_match degradation. Notably, the newer-generation tiny AEs (TAESD3, TAEF1) score roughly 12 to 14 pp lower (better) than their SD1/SDXL predecessors (TAESD, TAESDXL) despite similar parameter counts. E_match exposes reconstruction-quality improvements that pixel metrics like SSIM miss.
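
The gap between tiny-AE generations can be checked directly from the table values. A small arithmetic sketch follows; the grouping into "older" and "newer" mirrors the sentence above, and the variable names are illustrative.

```python
e_match = {
    "TAESD": 50.1, "TAESDXL": 49.8,   # SD1 / SDXL tiny AEs
    "TAESD3": 36.2, "TAEF1": 37.5,    # SD3 / FLUX tiny AEs
}

older = (e_match["TAESD"] + e_match["TAESDXL"]) / 2
newer = (e_match["TAESD3"] + e_match["TAEF1"]) / 2
gap_pp = round(older - newer, 1)      # difference of the two group means
print(f"older mean {older:.2f}%, newer mean {newer:.2f}%, gap {gap_pp} pp")
```

The group means differ by about 13 pp; the individual differences are 13.9 pp (TAESD vs TAESD3) and 12.3 pp (TAESDXL vs TAEF1).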

pip install diffusers accelerate
python experiments/scripts/run_vae_experiment.py --backend dedode

Development

pip install -e ".[dev,viz]"
pytest -m "not dedode"     # fast unit + property tests (no model weights)
pytest -m dedode           # slow: real DeDoDe v2 (downloads weights once)
ruff check .

CI runs the fast suite on every push/PR. An opt-in job exercises the real DeDoDe backend on a weekly schedule. Commits follow Conventional Commits and versioning is automated with release-please; see CONTRIBUTING.md.

Citation

@inproceedings{metzger2026elastic3d,
  title     = {Elastic3D: Controllable Stereo Video Conversion with Guided Latent Decoding},
  author    = {Metzger, Nando and Truong, Prune and Bhat, Goutam and Schindler, Konrad and Tombari, Federico},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026},
}

License

MIT

About

Unofficial reimplementation of the Matchability metric from the Elastic3D paper
