GongyeLiu/StyleCrafter

StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter

                 

GongyeLiu, Menghan Xia*, Yong Zhang, Haoxin Chen, Jinbo Xing,
Xintao Wang, Yujiu Yang*, Ying Shan


(* corresponding authors)

From Tsinghua University and Tencent AI Lab.

🔆 Introduction

TL;DR: We propose StyleCrafter, a generic method that enhances pre-trained T2V models with style control, supporting Style-Guided Text-to-Image Generation and Style-Guided Text-to-Video Generation.

1. ⭐⭐ Style-Guided Text-to-Video Generation.

Style-guided text-to-video results. Resolution: 320 x 512; Frames: 16. (Compressed)

2. Style-Guided Text-to-Image Generation.

Style-guided text-to-image results. Resolution: 512 x 512. (Compressed)

📝 Changelog

  • [2023.12.08]: 🔥🔥 Released the Hugging Face online demo.
  • [2023.12.05]: 🔥🔥 Released the code and checkpoint.
  • [2023.11.30]: 🔥🔥 Released the project page.

⏳ TODO

  • Remove the video watermark (present because the model was trained on WebVid10M).

🧰 Models

Model | Resolution | Checkpoint
StyleCrafter | 320x512 | Hugging Face

On a single NVIDIA A100 (40G) GPU, generating a 512×512 image takes approximately 5 seconds, and generating a 320×512 video with 16 frames takes approximately 85 seconds. Inference requires a GPU with at least 16 GB of memory.
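The memory requirement above can be checked before running inference. The following is a minimal sketch, not part of the repository: it assumes `nvidia-smi` is on the PATH, and the helper names and the 10% tolerance are illustrative assumptions (cards marketed as 16G usually report slightly less than 16384 MiB).

```python
import shutil
import subprocess
from typing import List

REQUIRED_GIB = 16  # from the README: at least 16G of GPU memory


def parse_memory_mib(smi_output: str) -> List[int]:
    """Parse `nvidia-smi` output with one total-memory value (MiB) per line."""
    values = []
    for line in smi_output.splitlines():
        text = line.strip()
        if text.isdigit():  # skip blank lines and entries such as "[N/A]"
            values.append(int(text))
    return values


def gpus_meeting_requirement(totals_mib: List[int],
                             required_gib: int = REQUIRED_GIB,
                             tolerance: float = 0.1) -> List[int]:
    """Return indices of GPUs whose memory is within `tolerance` of the requirement.

    The tolerance is an assumption for illustration, not a value from the README.
    """
    threshold = required_gib * 1024 * (1 - tolerance)
    return [i for i, mib in enumerate(totals_mib) if mib >= threshold]


def query_gpu_memory_mib() -> List[int]:
    """Ask nvidia-smi for total memory per GPU; return [] if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []
    return parse_memory_mib(result.stdout)


if __name__ == "__main__":
    totals = query_gpu_memory_mib()
    print("GPU memory (MiB):", totals)
    print("GPUs that likely meet the requirement:", gpus_meeting_requirement(totals))
```

If the second list is empty, inference will probably run out of memory on this machine.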

⚙️ Setup

conda create -n stylecrafter python=3.8.5
conda activate stylecrafter
pip install -r requirements.txt
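After setup, a quick sanity check can confirm that the active interpreter matches the version used to create the environment and that `requirements.txt` is present. This is a minimal sketch under those assumptions; the function names are hypothetical and not part of the repository.

```python
import sys
from pathlib import Path
from typing import List, Sequence, Tuple

EXPECTED_PYTHON: Tuple[int, int] = (3, 8)  # the README creates the env with python=3.8.5


def python_matches(version_info: Sequence[int] = sys.version_info,
                   expected: Tuple[int, int] = EXPECTED_PYTHON) -> bool:
    """Compare only major.minor; the patch level (3.8.5) is not enforced here."""
    return tuple(version_info[:2]) == expected


def setup_problems(repo_root: str = ".",
                   version_info: Sequence[int] = sys.version_info) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    if not python_matches(version_info):
        found = ".".join(str(part) for part in version_info[:3])
        problems.append(f"expected Python 3.8.x, found {found}")
    if not (Path(repo_root) / "requirements.txt").is_file():
        problems.append(f"requirements.txt not found under {repo_root!r}")
    return problems


if __name__ == "__main__":
    for problem in setup_problems():
        print("warning:", problem)
```

Run it from the repository root inside the activated `stylecrafter` environment; no output means both checks passed.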

💫 Inference

  1. Download all checkpoints according to the instructions.
  2. Run the following commands in a terminal.
# style-guided text-to-image generation
sh scripts/run_infer_image.sh

# style-guided text-to-video generation
sh scripts/run_infer_video.sh
  3. (Optional) Run inference on your own data according to the instructions.
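The two inference scripts can also be launched from Python, for example when scripting several runs. The sketch below only wraps the shell commands listed above. The `run_inference` and `build_command` helpers and the `dry_run` flag are hypothetical conveniences, not part of the repository, and the sketch assumes the checkpoints are already downloaded.

```python
import subprocess
from typing import List

# Script paths are the ones named in the README.
SCRIPTS = {
    "image": "scripts/run_infer_image.sh",
    "video": "scripts/run_infer_video.sh",
}


def build_command(mode: str) -> List[str]:
    """Return the shell command for the requested mode ("image" or "video")."""
    if mode not in SCRIPTS:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(SCRIPTS)}")
    return ["sh", SCRIPTS[mode]]


def run_inference(mode: str, repo_root: str = ".", dry_run: bool = False) -> List[str]:
    """Run the script from the repository root; with dry_run=True, only return the command."""
    command = build_command(mode)
    if not dry_run:
        # check=True raises CalledProcessError if the script exits with a non-zero status.
        subprocess.run(command, cwd=repo_root, check=True)
    return command
```

For example, `run_inference("video", dry_run=True)` returns the command without executing it, which is useful for checking paths before a long run.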

👨‍👩‍👧‍👦 Crafter Family

VideoCrafter1: Framework for high-quality text-to-video generation.

ScaleCrafter: Tuning-free method for high-resolution image/video generation.

TaleCrafter: An interactive story visualization tool that supports multiple characters.

LongerCrafter: Tuning-free method for longer high-quality video generation.

DynamiCrafter: Animates open-domain still images into high-quality videos.

📢 Disclaimer

We develop this repository for RESEARCH purposes, so it can only be used for personal/research/non-commercial purposes.


🙏 Acknowledgements

We would like to thank AK (@_akhaliq) for help setting up the online demo.

📭 Contact

If you have any comments or questions, feel free to contact lgy22@mails.tsinghua.edu.cn.
