Déjà View: Looping Transformers for Multi-View 3D Reconstruction

NVIDIA, University of Modena and Reggio Emilia, University of Toronto, ETH Zurich

Alessandro Burzio*, Tobias Fischer*, Sven Elflein, Qunjie Zhou, Riccardo de Lutio, Jiawei Ren, Jiahui Huang, Shengyu Huang, Marc Pollefeys, Laura Leal-Taixé, Zan Gojcic+, Haithem Turki+

Overview

DéjàView (DVLT) is a recurrent transformer for multi-view 3D reconstruction. It loops a shared block of frame/global attention with discrete depth indexing, producing per-pixel rays, depth, confidence, and camera poses from an unordered set of images. Once the model is trained, the number of refinement steps K becomes an inference-time compute knob: DVLT matches or outperforms substantially larger feed-forward baselines while using a fraction of their parameters.
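The "compute knob" idea can be made concrete with a toy, framework-free sketch. This is not DVLT's implementation. It contrasts one weight-tied block applied K times with a stack of distinct blocks, as in the decoupled ablation. The hypothetical `HalfwayBlock` stands in for a transformer block purely for illustration. It has a single parameter and moves a scalar estimate halfway toward it on each call.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HalfwayBlock:
    """Toy stand-in for a transformer block (hypothetical, illustration only)."""
    target: float

    def __call__(self, x: float) -> float:
        # One "refinement step": move the estimate halfway toward the target.
        return x + 0.5 * (self.target - x)

    def num_params(self) -> int:
        return 1


def run_looped(x: float, block: HalfwayBlock, k: int) -> float:
    """Weight-tied recurrence: the SAME block runs k times, so k is picked at inference."""
    for _ in range(k):
        x = block(x)
    return x


def run_decoupled(x: float, blocks: List[HalfwayBlock]) -> float:
    """Decoupled stack: one distinct block per step, so depth is fixed by len(blocks)."""
    for block in blocks:
        x = block(x)
    return x


if __name__ == "__main__":
    shared = HalfwayBlock(target=1.0)
    for k in (1, 2, 4, 8):
        # More steps refine the estimate; the parameter count stays at 1.
        print(k, run_looped(0.0, shared, k), shared.num_params())

    stack = [HalfwayBlock(target=1.0) for _ in range(16)]
    # Sixteen fixed steps, at 16x the parameters of the shared block.
    print(run_decoupled(0.0, stack), sum(b.num_params() for b in stack))
```

The trade-off this toy illustrates is parameter count. The looped model's parameters do not depend on K. A decoupled stack with a fixed 16 steps carries a separate set of weights for every step.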

This repository contains:

The DVLT model + four configurable ablations (vanilla, decoupled blocks, no s_out token, no depth-scaling).
Evaluation wrappers for five baselines: VGGT, VGGT-Omega, Depth-Anything-3, MapAnything, and Pi3. Each wrapper imports its upstream package, which must be installed separately (see docs/INSTALL.md).
A training stack built on accelerate + Hydra, with optional W&B logging.
A Stage-2 fine-tune recipe for the depth-conv head.
Rerun-based visualization tools.

Release status

Quickstart

Install

See docs/INSTALL.md. The short version:

conda create -n dvlt python=3.12 && conda activate dvlt
conda install pytorch=2.5.1 torchvision pytorch-cuda=12.4 -c pytorch -c nvidia -c conda-forge
pip install -e ".[all]"

Quick setup

Quick example script:

import torch
from accelerate import Accelerator

from dvlt.model.dvlt.model import DVLT
from dvlt.util.preprocess import load_sequence, preprocess_images

checkpoint_path = "nvidia/dvlt"  # local dir, HTTPS URL, or HF Hub repo id
# load_sequence accepts a directory, a single video, or an explicit list of files.
input_path = "path/to/scene_dir"
# Or: input_path = "path/to/clip.mp4"
# Or: from glob import glob; input_path = sorted(glob("path/to/scene_dir/*.png"))

accelerator = Accelerator(mixed_precision="bf16")

model = DVLT(img_size=504)
model.load_pretrained(checkpoint_path, strict=True)
model.setup_test(accelerator)

_, frames = load_sequence(input_path)
batch = preprocess_images(frames, img_size=504, patch_size=14, device=accelerator.device)

with torch.no_grad(), accelerator.autocast():
    predictions = model.predict(batch, accelerator)

cameras = predictions["cameras"][0]            # Cameras object with shape [S]
extrinsics_c2w = cameras.camera_to_worlds       # (S, 3, 4) — OpenCV convention [R | t]
intrinsics = cameras.get_intrinsics_matrices()  # (S, 3, 3)

depths = predictions["depths"][0]              # (S, H, W)
world_points = predictions["world_points"][0]  # (S, H, W, 3)
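As a sanity check on these conventions, one pixel can be lifted to world space from `depths`, `intrinsics`, and `extrinsics_c2w`. This minimal pure-Python sketch makes two assumptions:

- the depth value is z-depth along the camera's optical axis, not distance along the ray;
- the intrinsics use the usual OpenCV pinhole layout.

This README confirms neither, so compare the result against `predictions["world_points"]` before relying on it. The helper name `unproject_pixel` is hypothetical.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def unproject_pixel(u: float, v: float, depth: float, K: Matrix, c2w: Matrix) -> List[float]:
    """Lift pixel (u, v) with z-depth `depth` (assumed) into world coordinates.

    K   : 3x3 intrinsics [[fx, s, cx], [0, fy, cy], [0, 0, 1]] (assumed layout).
    c2w : 3x4 camera-to-world [R | t], OpenCV convention (x right, y down, z forward).
    """
    fx, s, cx = K[0]
    fy, cy = K[1][1], K[1][2]
    # Invert the pinhole projection u = (fx*x + s*y)/z + cx, v = fy*y/z + cy.
    y_cam = (v - cy) / fy * depth
    x_cam = ((u - cx) * depth - s * y_cam) / fx
    p_cam = (x_cam, y_cam, depth)
    # Rigid transform into the world frame: p_world = R @ p_cam + t.
    return [
        sum(c2w[r][c] * p_cam[c] for c in range(3)) + c2w[r][3]
        for r in range(3)
    ]
```

To spot-check frame `i`, pass `intrinsics[i].tolist()`, `extrinsics_c2w[i].tolist()`, and `depths[i, v, u].item()`. This assumes these are torch tensors, as the shape comments suggest. For whole images the same math is normally vectorized.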

Train

# Single-GPU
python -m dvlt.scripts.train --config-name dvlt-large data=scannetpp

# Multi-GPU (4 GPUs)
accelerate launch --num-processes 4 -m dvlt.scripts.train --config-name dvlt-large data=scannetpp

# Resume
python -m dvlt.scripts.train \
    --config-dir=outputs/<run> \
    --config-name=config.yaml \
    trainer.resume_from_checkpoint=latest

Evaluate

benchmark_lite (DTU, ETH3D, 7Scenes) is a convenience benchmark over the datasets that don't require heavy preprocessing; the full benchmark adds ScanNet++ and NuScenes.

python -m dvlt.scripts.test --config-name dvlt data=benchmark
# multi-GPU: accelerate launch --num-processes <N> -m dvlt.scripts.test --config-name dvlt data=benchmark
python -m dvlt.scripts.test --config-name dvlt data=benchmark_lite
# multi-GPU: accelerate launch --num-processes <N> -m dvlt.scripts.test --config-name dvlt data=benchmark_lite

DVLT reference results on the full benchmark:

Dataset	Pose AUC@3°	Pose AUC@30°	Depth inlier@3%	Depth AbsRel
DTU	0.8319	0.9880	0.9706	0.0093
ETH3D	0.6604	0.9536	0.7717	0.0267
7Scenes	0.1393	0.8172	0.7437	0.0349
ScanNet++	0.7941	0.9803	0.9239	0.0167
NuScenes	0.4340	0.8534	0.5853	0.0673
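The sketch below shows common definitions of two of these metrics. AbsRel is the mean absolute relative depth error. Pose AUC is given in a discretized form used by several multi-view reconstruction evaluations. It averages, over integer thresholds 1..T degrees, the fraction of image pairs whose error falls below that threshold. Each pair's error is assumed to be the larger of its rotation error and its translation-direction angular error.

This README does not spell out DVLT's evaluator, so treat these definitions as assumptions. In particular, the sketch omits any scale or shift alignment applied to predicted depth before AbsRel. The function names are illustrative.

```python
from typing import Sequence


def abs_rel(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Mean of |pred - gt| / gt over pixels with valid (positive) ground truth."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth depths")
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def pose_auc(pair_errors_deg: Sequence[float], max_threshold: int) -> float:
    """Discretized area under the recall-vs-threshold curve, normalized to [0, 1].

    pair_errors_deg: per image pair, max(rotation error, translation-direction
    error) in degrees (assumed). Recall at threshold t is the fraction of pairs
    with error < t; it is averaged over t = 1..max_threshold.
    """
    if not pair_errors_deg:
        raise ValueError("no pose pairs")
    n = len(pair_errors_deg)
    recalls = [
        sum(1 for e in pair_errors_deg if e < t) / n
        for t in range(1, max_threshold + 1)
    ]
    return sum(recalls) / max_threshold
```

Under this definition AUC@3 is far stricter than AUC@30. That would explain why the AUC@3 column varies so much more across datasets, from 0.1393 on 7Scenes to 0.8319 on DTU.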

Interactive demo

A browser UI for uploading images or a video and exploring the predicted 3D point cloud, depth maps, and camera trajectory. The dropdown switches between DVLT and the baseline wrappers (VGGT, VGGT-Omega, DA3, Pi3, MapAnything). Each baseline requires its upstream package to be installed (see docs/INSTALL.md).

# Launch on http://localhost:7860 (DVLT preselected)
python -m dvlt.scripts.gradio_demo

The same script also has a headless offline mode that skips Gradio and writes one .glb and one .rrd file per (sequence, model) pair under demo_outputs/<sequence_name>/.

--input accepts a directory of images, a single image, or a video file (mp4/mov/gif/...), and may be repeated to process several sequences in one run. --models takes a comma-separated list of config names from the curated registry, or all to run every registered model.

# Run DVLT on two sequences (one image dir, one video)
python -m dvlt.scripts.gradio_demo --offline \
    --input /path/to/scene_dir \
    --input /path/to/clip.mp4 \
    --models dvlt

# Run every registered model on one sequence
python -m dvlt.scripts.gradio_demo --offline --input /path/to/scene_dir --models all
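To script the offline mode, for example over many scene folders, you can assemble the documented flags in Python and call the module with `subprocess`. In this minimal sketch, `build_offline_command` and `run_offline` are hypothetical helpers. They use only the flags shown above: `--offline`, a repeated `--input`, and a comma-separated `--models`.

```python
import subprocess
import sys
from typing import List, Sequence


def build_offline_command(inputs: Sequence[str], models: Sequence[str]) -> List[str]:
    """Assemble argv for the headless demo: one --input per sequence, models comma-joined."""
    if not inputs:
        raise ValueError("at least one --input is required")
    cmd = [sys.executable, "-m", "dvlt.scripts.gradio_demo", "--offline"]
    for path in inputs:
        cmd += ["--input", path]  # --input may be repeated
    cmd += ["--models", ",".join(models)]  # e.g. "dvlt" or "all"
    return cmd


def run_offline(inputs: Sequence[str], models: Sequence[str]) -> None:
    """Run the demo in a subprocess (requires the dvlt package to be installed)."""
    subprocess.run(build_offline_command(inputs, models), check=True)


if __name__ == "__main__":
    # Only print the command here; call run_offline(...) to actually execute it.
    print(" ".join(build_offline_command(["/path/to/scene_dir", "/path/to/clip.mp4"], ["dvlt"])))
```

Building the argv as a list rather than a shell string avoids quoting problems with paths that contain spaces.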

Configuration

DVLT uses Hydra for configuration. Top-level experiment configs live in src/dvlt/config/experiments/:

Config	Description
`dvlt-large`	Stage-1 recipe (large model, full training schedule, linear depth head).
`dvlt-large-ablation`	Vanilla ablation parent — toggle decoupled blocks, no-`s_out`, no-depthscale via overrides.
`dvlt-large-ablation-decoupled`	Fully decoupled blocks (`recurrence_mode=none`, no looping): a distinct block per step, fixed 16 steps.
`dvlt-large-depthconv-stage2`	Stage-2 depth-conv head fine-tune (matches the released checkpoint and the model's default `depth_head_type="conv"`).
`dvlt`	Inference-only alias for the released stage-2 checkpoint.
`vggt`, `vggt_omega`, `da3-{base,large,giant}`, `pi3`, `pi3x`, `mapanything`	Eval-only baseline wrappers. Each requires its upstream package to be installed (see docs/INSTALL.md).

User configuration (data paths)

Per-user settings (most importantly, the dataset root) live in src/dvlt/config/experiments/user/. Copy default.yaml to local.yaml, edit data_root, and select it via user=local:

python -m dvlt.scripts.train --config-name dvlt-large data=scannetpp user=local

user.data_root can also be set inline on the command line (user.data_root=/path/to/data) or via the DVLT_DATA_ROOT environment variable.
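Overrides such as user.data_root=... or trainer.resume_from_checkpoint=latest address nested config keys with dots. The toy sketch below illustrates that mapping on a plain dict. It is not Hydra's override parser, which also handles typed values, lists, and `+`/`~` prefixes. Unlike Hydra, it silently creates keys that do not exist yet; Hydra normally rejects those unless they are prefixed with `+`. `apply_override` and the example config are hypothetical.

```python
from typing import Any, Dict


def apply_override(cfg: Dict[str, Any], override: str) -> None:
    """Set a nested key from an 'a.b.c=value' string, in place (values stay strings)."""
    key, sep, value = override.partition("=")  # split at the first '=' only
    if not sep or not key:
        raise ValueError(f"expected key=value, got {override!r}")
    *parents, leaf = key.split(".")
    node = cfg
    for name in parents:
        node = node.setdefault(name, {})  # create intermediate groups as needed
        if not isinstance(node, dict):
            raise TypeError(f"{name!r} is not a config group")
    node[leaf] = value


if __name__ == "__main__":
    cfg: Dict[str, Any] = {"user": {"data_root": "/default"}}
    apply_override(cfg, "user.data_root=/mnt/datasets")
    print(cfg)
```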

Selecting datasets

Pick a single curated dataset config:

python -m dvlt.scripts.train --config-name dvlt-large data=scannetpp
python -m dvlt.scripts.train --config-name dvlt-large data=mixed_all

Tab completion

For scripts using the @cli decorator (train, test, visualize):

eval "$(python -m dvlt.scripts.train -sc install)"
# later, to remove:
eval "$(python -m dvlt.scripts.train -sc uninstall)"

Documentation

docs/INSTALL.md — environment setup + baseline installs
docs/data/DATA.md — data pipeline overview + how to add a new dataset parser
docs/CONTRIB.md — dev setup, code style, tests
docs/TESTING.md — full test-runner documentation

Acknowledgments

We are also grateful to several other open-source repositories that we drew inspiration from or built upon during the development of our pipeline:

Citation

If you find this work useful, please cite:

@article{burzio2026dejaview,
  title   = {D\'ej\`a View: Looping Transformers for Multi-View 3D Reconstruction},
  author  = {Burzio, Alessandro and Fischer, Tobias and Elflein, Sven and Zhou, Qunjie and de Lutio, Riccardo and Ren, Jiawei and Huang, Jiahui and Huang, Shengyu and Pollefeys, Marc and Leal-Taix{\'e}, Laura and Gojcic, Zan and Turki, Haithem},
  journal = {arXiv preprint arXiv:2605.30215},
  year    = {2026}
}

License + attribution

The DVLT code is released under the Apache License, Version 2.0 — see LICENSE. The model weights (the nvidia/dvlt checkpoint) are released under the NVIDIA License — non-commercial, research-and-evaluation use only; see LICENSES/NVIDIA-LICENSE.txt.

Portions of the codebase are adapted from third-party open-source projects (DINOv2, PyTorch3D, MoGe, AnyCalib, MultiNeRF, Depth-Anything-3, VGGT). Each adapted file carries the upstream copyright + license notice in its header; see THIRD_PARTY_LICENSES.md for the full attribution map and full upstream license texts. The VGGT-derived files are distributed under the VGGT License; see LICENSES/VGGT-LICENSE.txt.

The baseline evaluation wrappers in src/dvlt/model/{vggt,vggt_omega,da3,mapanything,pi3}/ import (do not vendor) their respective upstream packages, each of which is governed by its own license — see THIRD_PARTY_LICENSES.md §"Upstream packages used for evaluation".
