NeurIPS, 2025
Shaowei Liu* · David Yifan Yao* · Saurabh Gupta† · Shenlong Wang†
This repository contains the PyTorch implementation for the paper VisualSync: Multi-Camera Synchronization via Cross-View Object Motion, NeurIPS 2025. In this paper, we propose a generic multi-video synchronization framework.
Before you begin, you will need an OpenAI API key for the dynamic object identification step.
- Find your API key at platform.openai.com/api-keys.
- Set it as an environment variable by creating a `.env` file in the root of this project:

  ```bash
  echo "OPENAI_API_KEY=sk-your_api_key_here" > .env
  ```

  (Be sure to replace `sk-your_api_key_here` with your actual key.)
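Before launching the pipeline, it can help to check that the `.env` file parses and the key is picked up. The minimal loader below is an illustrative sketch only; the repo's scripts may read the key differently (e.g., via `python-dotenv` or shell sourcing), and the helper name `load_env` is ours:

```python
import os

def load_env(path=".env"):
    """Minimal .env parser: put KEY=VALUE lines into os.environ.
    Illustrative sketch -- the repo may load the key differently."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blanks and comments; split on the first '=' only.
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ[key.strip()] = value.strip()

# Usage: call load_env(), then sanity-check the key before running scripts:
# if not os.environ.get("OPENAI_API_KEY", "").startswith("sk-"):
#     raise SystemExit("OPENAI_API_KEY not set -- create .env first.")
```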
All scripts are located in the ./scripts/ directory and should be called from the project's root directory.
This script sets up the Conda environment and installs all necessary dependencies.
Note: This installation is tested to work with CUDA 12.4.
```bash
./scripts/install.sh
```

This script downloads the pre-trained model weights required for SAM and DEVA.

```bash
./scripts/download_weights.sh
```

Your dataset must be organized in the following structure. The main data directory (e.g., `DATA_DIR`) can be named anything, but the subdirectories must follow this format:
```
DATA_DIR/
├── scene1_cam1/
│   └── rgb/
│       ├── <img_name>1.jpg
│       ├── <img_name>2.jpg
│       └── ...
├── scene1_cam2/
│   └── rgb/
│       └── ...
├── scene1_3/
│   └── rgb/
│       └── ...
└── scene2_1/
    └── rgb/
        └── ...
```
- Scene Grouping: The name of each video directory must have its scene name before the first underscore (e.g., `scene1_cam1` and `scene1_3` are grouped as `scene1`). This is critical for the VGGT and segmentation scripts.
- Image Directory: All video frames (images) must be stored in a subdirectory named `rgb`.
- Static Cameras: If a video directory name contains "cam" (e.g., `scene1_cam1`), it is treated as a static camera. For these videos, only the first image will be used for pose prediction.
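The naming rules above can be sketched as a small validator. This is a hypothetical helper for checking your own data layout, not the repo's actual grouping code, and the name `group_videos` is ours:

```python
import os
from collections import defaultdict

def group_videos(data_dir):
    """Group video directories by scene name (the text before the first
    underscore) and flag static cameras (directory names containing "cam").
    Illustrative sketch -- the repo's actual grouping logic may differ."""
    scenes = defaultdict(list)
    for name in sorted(os.listdir(data_dir)):
        # Every video must keep its frames under an rgb/ subdirectory.
        if not os.path.isdir(os.path.join(data_dir, name, "rgb")):
            continue
        scene = name.split("_", 1)[0]
        scenes[scene].append({"video": name, "static": "cam" in name})
    return dict(scenes)
```

For the example tree above, this would group `scene1_cam1`, `scene1_cam2`, and `scene1_3` under `scene1`, marking only the first two as static.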
Once your data is formatted correctly, run the main preprocessing script. This will call the other scripts to generate dynamic object masks and pose estimations for your dataset.
```bash
./scripts/preprocess.sh
```

```bibtex
@inproceedings{liu2025visualsync,
  title={VisualSync: Multi-Camera Synchronization via Cross-View Object Motion},
  author={Liu, Shaowei and Yao, David Yifan and Gupta, Saurabh and Wang, Shenlong},
  booktitle={NeurIPS},
  year={2025}
}
```

- Uni4D for dynamic object segmentation.
- VGGT for camera pose estimation.
- CoTracker3 for video tracking.
- MASt3R for cross-view correspondence.