
Sound Generation of Dynamic Objects

Given a video clip of a falling object, the goal of this task is to generate the corresponding sound based on the visual appearance and motion of the object. The generated sound must match the object’s intrinsic properties (e.g., material type) and temporally align with the object’s movement in the given video. This task is related to prior work on sound generation from in-the-wild videos, but here we focus more on predicting soundtracks that closely match the object dynamics.

Usage

Data Preparation

The dataset used to train the baseline models can be downloaded from here.

Training & Evaluation

Start the training process, then test the best model on the test set after training:

python main.py --batch_size 32 --weight_decay 1e-2 --lr 1e-3 \
               --model RegNet --exp RegNet \
               --config_location ./configs/regnet_aux_4.yml

Evaluate the best saved RegNet model:

python main.py --batch_size 32 --weight_decay 1e-2 --lr 1e-3 \
               --model RegNet --exp RegNet \
               --config_location ./configs/regnet_aux_4.yml \
               --eval
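The flags in the commands above suggest a command-line interface along these lines. This is a hypothetical sketch, not the actual `main.py`; the defaults and help strings are assumptions inferred from the example invocations:

```python
import argparse

def build_parser():
    # Mirrors the flags used in the example commands above.
    # Defaults are illustrative, taken from the README's invocations.
    parser = argparse.ArgumentParser(
        description="Sound generation of dynamic objects")
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--weight_decay", type=float, default=1e-2)
    parser.add_argument("--lr", type=float, default=1e-3)
    parser.add_argument("--model", type=str, default="RegNet")
    parser.add_argument("--exp", type=str, default="RegNet",
                        help="experiment name for logs/checkpoints")
    parser.add_argument("--config_location", type=str,
                        default="./configs/regnet_aux_4.yml")
    parser.add_argument("--eval", action="store_true",
                        help="skip training; evaluate the best model")
    return parser

# Parse a sample command line (the same flags as the eval command above).
args = build_parser().parse_args(
    ["--batch_size", "32", "--model", "RegNet", "--eval"])
```

With `--eval` absent, `args.eval` is `False` and training runs first; with it present, only evaluation runs.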

Add your own model

To train and test your own model on the ObjectFolder Sound Generation of Dynamic Objects Benchmark, you only need to modify a few files under models. Follow these steps:

  1. Create new model directory

    mkdir models/my_model
  2. Design new model

    cd models/my_model
    touch my_model.py
  3. Build the new model and its optimizer

    Add the following code into models/build.py:

    elif args.model == 'my_model':
        from models.my_model import my_model
        model = my_model.my_model(args)
        optimizer = optim.AdamW(model.parameters(), lr=args.lr, weight_decay=args.weight_decay)
  4. Add the new model into the pipeline

    Once the new model is built, it can be trained and evaluated similarly:

    python main.py --batch_size 32 --weight_decay 1e-2 --lr 1e-3 \
                   --model my_model --exp my_model \
                   --config_location ./configs/my_model.yml

Results on ObjectFolder Sound Generation of Dynamic Objects Benchmark

In our experiments, we select 500 objects with reasonable scales and generate 10 videos for each object. The 10 videos per object are split into train/val/test sets with an 8/1/1 ratio.

Results on ObjectFolder

| Method | STFT  | Envelope | CDPAM     |
| ------ | ----- | -------- | --------- |
| RegNet | 0.010 | 0.036    | 0.0000565 |
| MCR    | 0.034 | 0.042    | 0.0000592 |
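Roughly, the three metrics compare spectral content (STFT), temporal alignment of impacts (Envelope), and perceptual similarity (CDPAM, a learned metric not reproduced here). Below is an illustrative sketch of the first two as mean absolute differences; the exact definitions in the benchmark code may differ (window sizes, normalization, and distance type are assumptions):

```python
import numpy as np

def stft_mag(x, n_fft=256, hop=128):
    # Magnitude spectrogram via a Hann-windowed FFT (simplified STFT).
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win
              for i in range(0, len(x) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1))

def stft_distance(a, b):
    # Mean absolute difference between magnitude spectrograms.
    return float(np.mean(np.abs(stft_mag(a) - stft_mag(b))))

def envelope_distance(a, b, win=512):
    # Compare amplitude envelopes (moving average of |signal|):
    # low values mean impacts occur at matching times with matching energy.
    k = np.ones(win) / win
    env_a = np.convolve(np.abs(a), k, mode="same")
    env_b = np.convolve(np.abs(b), k, mode="same")
    return float(np.mean(np.abs(env_a - env_b)))

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)  # stand-in for a generated waveform
```

Identical waveforms score 0 on both metrics, and lower is better in the table above.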
