This is the official implementation of the paper: "The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion (ICCV 2023)".
[Paper] [Project Page]
The following code is based on Stable Diffusion, so you can visit the link for more details. You also need to download the checkpoints for Stable Diffusion.
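As a quick sanity check that the checkpoint downloaded correctly, a minimal sketch like the one below can load it with PyTorch. The path `models/ldm/stable-diffusion-v1/sd-v1-4.ckpt` follows the conventional Stable Diffusion layout and is an assumption, not a path taken from this repository.

```python
import torch

# Hypothetical path following the conventional Stable Diffusion layout;
# adjust it to wherever you actually placed the downloaded checkpoint.
CKPT_PATH = "models/ldm/stable-diffusion-v1/sd-v1-4.ckpt"

# weights_only=False because the checkpoint is a pickled Lightning-style
# file, not a bare tensor dict (newer PyTorch defaults to weights-only).
state = torch.load(CKPT_PATH, map_location="cpu", weights_only=False)
weights = state.get("state_dict", state)
print(f"Loaded {len(weights)} tensors from {CKPT_PATH}")
```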
```bash
gh repo clone ku-vai/TPoS
```
You can find `audio_encoder/train.py` to train the Audio Encoder. Training requires two datasets (Landscape and VGGSound). This code is based on Sound-guided Semantic Image Manipulation (CVPR 2022).
You can use the following commands for training.

```bash
cd audio_encoder
python train_audio_encoder.py
```
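For intuition, below is a minimal, hypothetical sketch of the kind of audio-to-CLIP contrastive objective used in Sound-guided Semantic Image Manipulation: the audio encoder is trained so that its embedding of an audio clip aligns with the paired CLIP embedding. The `AudioEncoder` architecture, batch shapes, and temperature are all illustrative assumptions, not this repository's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AudioEncoder(nn.Module):
    """Hypothetical encoder: mel spectrogram -> CLIP-sized embedding (512-d)."""
    def __init__(self, n_mels=128, embed_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )

    def forward(self, mel):  # mel: (B, 1, n_mels, time)
        return F.normalize(self.net(mel), dim=-1)

def contrastive_loss(audio_emb, clip_emb, temperature=0.07):
    # Symmetric InfoNCE: matched audio/CLIP pairs are positives,
    # every other pair in the batch is a negative.
    logits = audio_emb @ clip_emb.t() / temperature
    targets = torch.arange(len(logits), device=logits.device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Toy usage with random tensors standing in for a real batch.
encoder = AudioEncoder()
mel = torch.randn(8, 1, 128, 431)                    # batch of mel spectrograms
clip_emb = F.normalize(torch.randn(8, 512), dim=-1)  # paired CLIP embeddings
loss = contrastive_loss(encoder(mel), clip_emb)
loss.backward()
```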
Alternatively, you can simply download our pretrained weights from the following link: link. Place the downloaded weights in `pretrained_models`.
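Assuming the download is a standard PyTorch state dict, loading it could look like the sketch below; the file name `audio_encoder.pth` and the `AudioEncoder` class (from the sketch above) are illustrative assumptions.

```python
import torch

# Hypothetical file name inside pretrained_models/; match it to whatever
# the actual downloaded file is called.
state_dict = torch.load("pretrained_models/audio_encoder.pth", map_location="cpu")

encoder = AudioEncoder()           # the illustrative encoder sketched above
encoder.load_state_dict(state_dict)
encoder.eval()                     # inference mode: disables dropout, etc.
```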
When you want to test your model with an image dataset, you can easily run it with `bash inference.sh`. You can change the audio and the text prompt.
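For intuition about what audio-reactive conditioning means here, the following is a rough, hypothetical sketch using the Hugging Face diffusers API: it blends a text embedding with a stand-in audio embedding before generating each frame. This is not this repository's inference path (that is `bash inference.sh`); the model id, blend weight, and random audio embedding are all illustrative assumptions.

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

prompt = "a thunderstorm over the ocean"
tokens = pipe.tokenizer(prompt, padding="max_length",
                        max_length=pipe.tokenizer.model_max_length,
                        return_tensors="pt").to("cuda")
text_emb = pipe.text_encoder(tokens.input_ids)[0]

frames = []
for t in range(4):
    # Stand-in for a per-frame audio embedding from the Audio Encoder.
    audio_emb = torch.randn_like(text_emb)
    # Simple linear blend; the actual method modulates conditioning per frame.
    cond = text_emb + 0.1 * (t / 3) * audio_emb
    frames.append(pipe(prompt_embeds=cond, num_inference_steps=25).images[0])
```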
```bibtex
@inproceedings{jeong2023power,
  title={The power of sound (tpos): Audio reactive video generation with stable diffusion},
  author={Jeong, Yujin and Ryoo, Wonjeong and Lee, Seunghyun and Seo, Dabin and Byeon, Wonmin and Kim, Sangpil and Kim, Jinkyu},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={7822--7832},
  year={2023}
}
```