InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions

arXiv · Website · Discord · HuggingFace · YouTube Video

🔆 Introduction

InteractiveVideo is a user-centric framework for interactive video generation. It enables comprehensive editing through users' intuitive manipulations and achieves high-quality regional content control as well as precise motion control. Its main features are as follows:

1. Personalize A Video

"Purple Flowers." "Purple Flowers, bee" "the purple flowers are shaking, a bee is flying"
"1 Cat." "1 Cat, butterfly" "the small yellow butterfly is flying to the cat's face"

2. Fine-grained Video Editing

"flowers." "flowers." "windy, the flowers are shaking in the wind"
"1 Man." "1 Man, rose." "1 Man, smiling."

3. Powerful Motion Control

InteractiveVideo can perform precise motion control.

"1 man, dark light " "the man is turning his body" "the man is turning his body"
"1 beautiful girl with long black hair, and a flower on her head, clouds" " the girl is turning gradually" " the girl is turning gradually"

4. Characters Dressing up

InteractiveVideo cooperates smoothly with LoRAs and DreamBooth checkpoints, so many potential uses of this framework remain under-explored; a minimal sketch of loading a LoRA with diffusers follows the example below.

"Yae Miko" (Genshin Impact) "Dressing Up " "Dressing Up"

⚙️ Quick Start

1. Install Environment via Anaconda

# create a conda environment
conda create -n ivideo python=3.10
conda activate ivideo

# install requirements
pip install -r requirements.txt

2. Prepare Checkpoints

You can simply use the following script to download checkpoints

python scripts/download_models.py

Downloading everything takes a long time; you can also selectively download checkpoints by modifying "scripts/download_models.py" and "scripts/*.json". Please make sure that at least one checkpoint remains for each JSON file. All checkpoints are listed as follows, and an example of fetching a single checkpoint by hand is given after the tables.

  1. Checkpoints for enjoying image-to-image generation

| Models | Types | Version | Checkpoints |
| --- | --- | --- | --- |
| StableDiffusion | - | v1.5 | Huggingface |
| StableDiffusion | - | turbo | Huggingface |
| KoHaKu | Animation | v2.1 | Huggingface |
| LCM-LoRA-StableDiffusion | - | v1.5 | Huggingface |
| LCM-LoRA-StableDiffusion | - | xl | Huggingface |

  2. Checkpoints for enjoying image-to-video generation

| Models | Types | Version | Checkpoints |
| --- | --- | --- | --- |
| StableDiffusion | - | v1.5 | Huggingface |
| PIA (UNet) | - | - | Huggingface |
| Dreambooth | MagicMixRealistic | v5 | Civitai |
| Dreambooth | RCNZCartoon3d | v10 | Civitai |
| Dreambooth | RealisticVision | - | Huggingface |

  3. Checkpoints for enjoying dragging images

| Models | Types | Resolution | Checkpoints |
| --- | --- | --- | --- |
| StyleGAN-2 | Lions | 512 x 512 | Google Storage |
| StyleGAN-2 | Dogs | 1024 x 1024 | Google Storage |
| StyleGAN-2 | Horses | 256 x 256 | Google Storage |
| StyleGAN-2 | Elephants | 512 x 512 | Google Storage |
| StyleGAN-2 | Face (FFHQ) | 512 x 512 | NGC |
| StyleGAN-2 | Cat Face (AFHQ) | 512 x 512 | NGC |
| StyleGAN-2 | Car | 512 x 512 | CloudFront |
| StyleGAN-2 | Cat | 512 x 512 | CloudFront |
| StyleGAN-2 | Landmark (LHQ) | 256 x 256 | Google Drive |
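
If you only need one of the Huggingface checkpoints above, the following is a minimal sketch of fetching it directly with the huggingface_hub package instead of the bundled script; the repo id and target directory are assumptions and should be adjusted to the checkpoint you actually want.

# Hypothetical selective download; repo_id and local_dir are assumptions.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="runwayml/stable-diffusion-v1-5",
    local_dir="checkpoints/diffusion_body/stable-diffusion-v1-5",
)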

You can also train and use your own customized models. Put your model into the "checkpoints" folder, which is organized as follows

InteractiveVideo  # project
|----checkpoints
|----|----drag  # Drag
|----|----|----stylegan2_elephants_512_pytorch.pkl
|----|----i2i  # Image-2-Image
|----|----|----lora
|----|----|----|----lcm-lora-sdv1-5.safetensors
|----|----i2v  # Image-to-Video
|----|----|----unet
|----|----|----|----pia.ckpt
|----|----|----dreambooth
|----|----|----|----realisticVisionV51_v51VAE.safetensors
|----|----diffusion_body
|----|----|----stable-diffusion-v1-5
|----|----|----kohahu-v2-1
|----|----|----sd-turbo
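
As a sanity check before launching the demo, you can verify that the folder matches this layout; the sketch below is hypothetical (not part of the repo), and the subfolder names are simply the examples shown in the tree.

# Hypothetical layout check for the "checkpoints" folder; run from the project root.
from pathlib import Path

expected = [
    "checkpoints/drag",                                  # StyleGAN-2 .pkl files for dragging
    "checkpoints/i2i/lora",                              # LCM-LoRA weights for image-to-image
    "checkpoints/i2v/unet",                              # PIA UNet for image-to-video
    "checkpoints/i2v/dreambooth",                        # DreamBooth .safetensors files
    "checkpoints/diffusion_body/stable-diffusion-v1-5",  # diffusion backbone
]

for rel in expected:
    status = "ok" if Path(rel).exists() else "MISSING"
    print(f"{status:>7}  {rel}")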

💫 Usage

1. Local demo

To run a local demo, use the following command (recommended)

  python demo/main.py

You can also run our web demo locally with

  python demo/main_gradio.py

In the following, we provide some instructions for a quick start.

2. Image-to-Image Generation

Input image-to-image text prompts and click the "Confirm Text" button. Generation runs in real time.

3. Image-to-Video Generation

Input image-to-video text prompts and click the "Confirm Text" button. Then click the "Generate Video" button and wait a few seconds.

If the generated video is not satisfactory, you can refine it with multimodal instructions. For example, draw butterflies on the image to tell the model where they should appear.

4. Drag Image

You can also drag images. First, choose a suitable checkpoint in the "Drag Image" tab and click the "Drag Mode On" button; preparation takes a few minutes. Then draw masks, add points, and click the "start" button. Once the result is satisfactory, click the "stop" button.

😉 Citation

If the code and paper help your research, please cite:

@article{zhang2024interactivevideo,
      title={InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions}, 
      author={Zhang, Yiyuan and Kang, Yuhao and Zhang, Zhixin and Ding, Xiaohan and Zhao, Sanyuan and Yue, Xiangyu},
      year={2024},
      eprint={2402.03040},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

🤗 Acknowledgements

Our codebase builds on Stable Diffusion, StreamDiffusion, DragGAN, PTI, and PIA. Thanks to the authors for sharing their awesome codebases!

📢 Disclaimer

We developed this repository for RESEARCH purposes, so it may only be used for personal, research, and other non-commercial purposes.