The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion

This is the official implementation of the paper "The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion" (ICCV 2023).

[Paper] [Project Page]

Usage

This code is based on Stable Diffusion; for more details, visit link. You also need to download the Stable Diffusion checkpoints.

gh repo clone ku-vai/TPoS

Train Audio Encoder

The script for training the Audio Encoder is audio_encoder/train.py. Training requires two datasets: Landscape and VGGSound. This code is based on Sound-guided Semantic Image Manipulation (CVPR 2022).

Run the following commands to start training.

cd audio_encoder
python train_audio_encoder.py

Alternatively, download our pretrained weights from the following link: link. Place the downloaded weights in pretrained_models.
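Before running inference, it can help to confirm that the weights actually landed in pretrained_models. The following is a minimal sketch, not part of the repository: the helper name find_weights and the list of file extensions are assumptions, since the checkpoint filenames are not stated here, and the directory path is taken to be relative to where you run the script.

```python
from pathlib import Path

# Assumption: checkpoints use one of these common PyTorch file extensions.
WEIGHT_SUFFIXES = {".pth", ".pt", ".ckpt"}


def find_weights(directory="pretrained_models"):
    """Return the sorted checkpoint files found directly inside `directory`."""
    root = Path(directory)
    if not root.is_dir():
        raise FileNotFoundError(f"expected a directory at {root}")
    # Keep only regular files whose extension looks like a checkpoint.
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix in WEIGHT_SUFFIXES
    )


if __name__ == "__main__":
    try:
        weights = find_weights()
    except FileNotFoundError as err:
        print(f"Missing weights directory: {err}")
    else:
        for path in weights:
            print(path)
        if not weights:
            print("No checkpoint files found in pretrained_models")
```

If the list comes back empty, check that the download finished and that the files were not left inside a nested archive folder.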

Video Generation with Sound

To test your model on an image dataset, run bash inference.sh. You can change the audio input and the text prompt.
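If you prefer to launch inference from Python, for example inside a larger experiment script, a thin wrapper around the shell script is enough. This is a sketch under assumptions: the function names inference_command and run_inference are hypothetical, the script is assumed to sit at the top of the cloned repository, and no command-line flags are passed because the options of inference.sh are not documented here.

```python
import subprocess
from pathlib import Path


def inference_command(script="inference.sh"):
    """Build the same command the README gives: bash inference.sh."""
    return ["bash", script]


def run_inference(repo_dir=".", script="inference.sh"):
    """Run the inference script from `repo_dir`, failing early if it is missing."""
    script_path = Path(repo_dir) / script
    if not script_path.is_file():
        raise FileNotFoundError(f"{script_path} not found; run from the TPoS checkout")
    # check=True raises CalledProcessError if the script exits with a non-zero status.
    return subprocess.run(inference_command(script), cwd=repo_dir, check=True)


if __name__ == "__main__":
    try:
        run_inference()
    except FileNotFoundError as err:
        print(err)
```

Changing the audio or text prompt still means editing inference.sh itself; the wrapper only starts the script and surfaces a failure as an exception.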

Citation

@inproceedings{jeong2023power,
  title={The power of sound (tpos): Audio reactive video generation with stable diffusion},
  author={Jeong, Yujin and Ryoo, Wonjeong and Lee, Seunghyun and Seo, Dabin and Byeon, Wonmin and Kim, Sangpil and Kim, Jinkyu},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={7822--7832},
  year={2023}
}
