VisualSync: Multi-Camera Synchronization via Cross-View Object Motion

NeurIPS, 2025
Shaowei Liu* · David Yifan Yao* · Saurabh Gupta† · Shenlong Wang†

Paper PDF · Project Page


This repository contains the PyTorch implementation of the paper VisualSync: Multi-Camera Synchronization via Cross-View Object Motion (NeurIPS 2025). In this paper, we propose a generic multi-video synchronization framework.

1. Prerequisites

Before you begin, you will need an OpenAI API key for the dynamic object identification step.

  1. Find your API key at platform.openai.com/api-keys.

  2. Set it as an environment variable by creating a .env file in the root of this project:

    echo "OPENAI_API_KEY=sk-your_api_key_here" > .env

    (Be sure to replace sk-your_api_key_here with your actual key.)

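To confirm the key is actually visible to Python before running the pipeline, a quick check like the one below can be used. This is a hypothetical helper written for this README (not part of the repository), and it assumes the python-dotenv package; the repository's own code may load the key differently.

    # check_key.py -- hypothetical sanity check, not part of this repository
    import os

    from dotenv import load_dotenv  # pip install python-dotenv

    load_dotenv()  # reads OPENAI_API_KEY from .env in the current directory
    key = os.environ.get("OPENAI_API_KEY")
    if key and key.startswith("sk-"):
        print("OPENAI_API_KEY loaded")
    else:
        raise SystemExit("OPENAI_API_KEY is missing or malformed")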

2. Installation

All scripts are located in the ./scripts/ directory and should be called from the project's root directory.

Step 1: Install Environment

This script sets up the Conda environment and installs all necessary dependencies.

Note: This installation is tested to work with CUDA 12.4.

./scripts/install.sh
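
After installation, you can verify that PyTorch sees the GPU with a generic check (not specific to this repository):

    import torch

    print(torch.__version__)          # installed PyTorch version
    print(torch.version.cuda)         # CUDA version PyTorch was built against
    print(torch.cuda.is_available())  # True if a GPU is visible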

Step 2: Download Model Weights

This script downloads the pre-trained model weights required for SAM and DEVA.

./scripts/download_weights.sh

3. Data Preparation & Usage

Step 1: Dataset Structure

Your dataset must be organized in the following structure. The main data directory (e.g., DATA_DIR) can be named anything, but the subdirectories must follow this format:

DATA_DIR/
├── scene1_cam1/
│   └── rgb/
│       ├── <img_name>1.jpg
│       ├── <img_name>2.jpg
│       └── ...
├── scene1_cam2/
│   └── rgb/
│       └── ...
├── scene1_3/
│   └── rgb/
│       └── ...
└── scene2_1/
    └── rgb/
        └── ...

Important Formatting Rules:

  • Scene Grouping: The portion of each video directory name before the first underscore is the scene name (e.g., scene1_cam1 and scene1_3 are both grouped under scene1). This grouping is critical for the VGGT and segmentation scripts.
  • Image Directory: All video frames (images) must be stored in a subdirectory named rgb.
  • Static Cameras: If a video directory name contains "cam" (e.g., scene1_cam1), it is treated as a static camera. For these videos, only the first image will be used for pose prediction.
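
To sanity-check a dataset against these rules before preprocessing, a small script along the following lines can help. This is a hypothetical checker written for this README (not part of the repository); it only mirrors the grouping, rgb-subdirectory, and static-camera conventions described above.

    # validate_dataset.py -- hypothetical layout checker, not part of this repository
    import sys
    from collections import defaultdict
    from pathlib import Path

    data_dir = Path(sys.argv[1])  # e.g., DATA_DIR
    scenes = defaultdict(list)

    for video_dir in sorted(p for p in data_dir.iterdir() if p.is_dir()):
        rgb = video_dir / "rgb"
        if not rgb.is_dir():
            raise SystemExit(f"{video_dir.name}: missing required 'rgb' subdirectory")
        frames = sorted(rgb.glob("*.jpg"))
        if not frames:
            raise SystemExit(f"{video_dir.name}: no .jpg frames found in rgb/")
        scene = video_dir.name.split("_", 1)[0]  # scene name = text before first underscore
        static = "cam" in video_dir.name         # "cam" in the name marks a static camera
        scenes[scene].append((video_dir.name, len(frames), static))

    for scene, videos in scenes.items():
        print(f"{scene}: {len(videos)} video(s)")
        for name, n, static in videos:
            print(f"  {name}: {n} frames{' (static camera)' if static else ''}")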

Step 2: Run Preprocessing

Once your data is formatted correctly, run the main preprocessing script. It invokes the segmentation and pose-estimation scripts to generate dynamic object masks and camera pose estimates for your dataset.

./scripts/preprocess.sh

Citation

@inproceedings{liu2025visualsync,
  title={VisualSync: Multi-Camera Synchronization via Cross-View Object Motion},
  author={Liu, Shaowei and Yao, David Yifan and Gupta, Saurabh and Wang, Shenlong},
  booktitle={NeurIPS},
  year={2025}
}

Acknowledgement

  • Uni4D for dynamic object segmentation.
  • VGGT for camera pose estimation.
  • CoTracker3 for video tracking.
  • MASt3R for cross-view correspondence.
