NeurIPS, 2025
Shaowei Liu* · David Yifan Yao* · Saurabh Gupta† · Shenlong Wang†
This repository contains the PyTorch implementation for the paper VisualSync: Multi-Camera Synchronization via Cross-View Object Motion, NeurIPS 2025. In this paper, we propose a generic multi-video synchronization framework.
Before you begin, you will need an OpenAI API key for the dynamic object identification step.
- Find your API key at platform.openai.com/api-keys.
- Set it as an environment variable by creating a `.env` file in the root of this project:

  ```bash
  echo "OPENAI_API_KEY=sk-your_api_key_here" > .env
  ```

  (Be sure to replace `sk-your_api_key_here` with your actual key.)
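Before launching the pipeline, it can help to check that the `.env` file parses and the key is picked up. The minimal loader below is an illustrative sketch only; the repo's scripts may read the key differently (e.g., via `python-dotenv` or shell sourcing), and the helper name `load_env` is ours:

```python
import os

def load_env(path=".env"):
    """Minimal .env parser: put KEY=VALUE lines into os.environ.
    Illustrative sketch -- the repo may load the key differently."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blanks and comments; split on the first '=' only.
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ[key.strip()] = value.strip()

# Usage: call load_env(), then sanity-check the key before running scripts:
# if not os.environ.get("OPENAI_API_KEY", "").startswith("sk-"):
#     raise SystemExit("OPENAI_API_KEY not set -- create .env first.")
```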
All scripts are located in the ./scripts/ directory and should be called from the project's root directory.
This script sets up the Conda environment and installs all necessary dependencies.
Note: This installation is tested to work with CUDA 12.4.
```bash
./scripts/install.sh
```

This script downloads the pre-trained model weights required for SAM and DEVA.

```bash
./scripts/download_weights.sh
```

Your dataset must be organized in the following structure. The main data directory (e.g., `DATA_DIR`) can be named anything, but the subdirectories must follow this format:
```
DATA_DIR/
├── scene1_cam1/
│   └── rgb/
│       ├── <img_name>1.jpg
│       ├── <img_name>2.jpg
│       └── ...
├── scene1_cam2/
│   └── rgb/
│       └── ...
├── scene1_3/
│   └── rgb/
│       └── ...
└── scene2_1/
    └── rgb/
        └── ...
```
- Scene Grouping: The name of each video directory must have its scene name before the first underscore (e.g., `scene1_cam1` and `scene1_3` are grouped as `scene1`). This is critical for the VGGT and segmentation scripts.
- Image Directory: All video frames (images) must be stored in a subdirectory named `rgb`.
- Static Cameras: If a video directory name contains "cam" (e.g., `scene1_cam1`), it is treated as a static camera. For these videos, only the first image will be used for pose prediction.
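The naming rules above can be sketched as a small validator. This is a hypothetical helper for checking your own data layout, not the repo's actual grouping code, and the name `group_videos` is ours:

```python
import os
from collections import defaultdict

def group_videos(data_dir):
    """Group video directories by scene name (the text before the first
    underscore) and flag static cameras (directory names containing "cam").
    Illustrative sketch -- the repo's actual grouping logic may differ."""
    scenes = defaultdict(list)
    for name in sorted(os.listdir(data_dir)):
        # Every video must keep its frames under an rgb/ subdirectory.
        if not os.path.isdir(os.path.join(data_dir, name, "rgb")):
            continue
        scene = name.split("_", 1)[0]
        scenes[scene].append({"video": name, "static": "cam" in name})
    return dict(scenes)
```

For the example tree above, this would group `scene1_cam1`, `scene1_cam2`, and `scene1_3` under `scene1`, marking only the first two as static.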
Once your data is formatted correctly, run the main preprocessing script. This will call the other scripts to generate dynamic object masks and pose estimations for your dataset.
```bash
./scripts/preprocess.sh
```

```bibtex
@inproceedings{liu2025visualsync,
  title={VisualSync: Multi-Camera Synchronization via Cross-View Object Motion},
  author={Liu, Shaowei and Yao, David Yifan and Gupta, Saurabh and Wang, Shenlong},
  booktitle={NeurIPS},
  year={2025}
}
```

- Uni4D for dynamic object segmentation.
- VGGT for camera pose estimation.
- CoTracker3 for video tracking.
- MASt3R for cross-view correspondence.