
Is it possible to run text2video-zero with less than 15GB GPU? #2898

Closed
camenduru opened this issue Mar 30, 2023 · 28 comments
Labels
bug Something isn't working

Comments

@camenduru
Contributor

Describe the bug

Hi 👋 everybody, I have a question: is it possible to use xformers with text2video-zero?

When I add self.pipe.enable_xformers_memory_efficient_attention(), I get a temporal consistency problem.

[side-by-side images: with xformers vs. without xformers (the latter needs more than 20GB VRAM)]

The code is here: https://gitlab.com/camenduru/text2-video-zero/-/blob/dev/model.py

Reproduction

🧬

Logs

No response

System Info

🥔

@camenduru camenduru added the bug Something isn't working label Mar 30, 2023
@patrickvonplaten
Contributor

That's very interesting! Did you train text2video-zero? Or does it just use pretrained checkpoints?

The other possibility to reduce memory would be to make use of attention slicing: https://huggingface.co/docs/diffusers/api/diffusion_pipeline#diffusers.DiffusionPipeline.enable_attention_slicing
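For reference, here is a minimal sketch of the two memory-saving switches discussed in this thread, on a plain diffusers pipeline (this is not the demo's actual model.py; the model id and setup are purely illustrative):

```python
# Minimal sketch, not the demo's model.py; model id is illustrative.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Attention slicing: compute attention in chunks, trading a bit of speed
# for a much lower peak memory.
pipe.enable_attention_slicing()

# xFormers memory-efficient attention; this is the call that triggers the
# consistency problem reported above, because it swaps out the attention processors.
# pipe.enable_xformers_memory_efficient_attention()
```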

@camenduru
Contributor Author

Oops, maybe this is not text2video-zero itself, but it is part of the Pose Conditional demo: https://huggingface.co/spaces/PAIR/Text2Video-Zero

Using only self.pipe.enable_attention_slicing() also doesn't work 😭

[screenshot of the inconsistent result]

model: https://huggingface.co/plasmo/woolitize-768sd1-5/tree/main
controlnet model: https://huggingface.co/lllyasviel/sd-controlnet-openpose/tree/main

this is the colab you can test https://github.com/camenduru/text2video-zero-colab

How can I reduce the size of the model?

https://huggingface.co/webui/ControlNet-modules-safetensors/tree/main

https://huggingface.co/lllyasviel/sd-controlnet-openpose/tree/main

@camenduru
Contributor Author

Hi @sayakpaul 👋 perhaps the issue is in the diffusers core. Can you explain why this is happening, or whether it's expected behavior?

@sayakpaul
Member

[comparison videos: with xformers vs. without xformers]

Does this exist with the latest TextToVideoZeroPipeline?

The above was generated with the latest TextToVideoZeroPipeline. Here's my Colab.

Cc: @19and99

@camenduru
Contributor Author

Yep, same problem. Maybe switch the with-xformers and without-xformers videos?

@sayakpaul
Member

sayakpaul commented Apr 11, 2023

I didn't get you.

#2898 (comment) already has with and without xformers.

@camenduru
Contributor Author

The code uses ControlNet pose for consistency between frames: https://gitlab.com/camenduru/text2-video-zero/-/blob/dev/model.py#L150


@sayakpaul
Member

Could you try it out from here?

@camenduru
Contributor Author

woohoo 🥳 now it is working with enable_attention_slicing() thanks ❤


@camenduru camenduru reopened this Apr 12, 2023
@camenduru
Contributor Author

Unfortunately, this has not been solved. It just looks like it has been solved 😭 There is still a temporal consistency problem, even with the latest code.

@camenduru
Contributor Author

enable_attention_slicing() and enable_xformers_memory_efficient_attention() are causing the problem.

@camenduru
Contributor Author

camenduru commented Apr 12, 2023

Without enable_attention_slicing() or enable_xformers_memory_efficient_attention(), the results look like the official examples here:
https://github.com/Picsart-AI-Research/Text2Video-Zero#text-to-video-1

@sayakpaul
Member

sayakpaul commented Apr 12, 2023

I am still not sure why you think it's not working.

Please provide detailed examples showing why you think it's not working.

I will also let one of the authors @19and99 comment on this.

@camenduru
Contributor Author

camenduru commented Apr 12, 2023

Please fork https://github.com/Picsart-AI-Research/Text2Video-Zero, add self.pipe.enable_attention_slicing() or self.pipe.enable_xformers_memory_efficient_attention() here: https://github.com/Picsart-AI-Research/Text2Video-Zero/blob/44b1639baed624800cb9dce43a29e2549024bc76/model.py#L261-L288

and test with a free Colab T4 and a Colab Pro A100:

%cd /content
!git clone https://github.com/your_username/Text2Video-Zero
!pip install -q gradio==3.23.0 decord==0.6.0 diffusers==0.14.0 accelerate==0.17.0 safetensors==0.2.7 einops==0.6.0 transformers==4.26.0
!pip install -q torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 torchtext==0.14.1 torchdata==0.5.1 --extra-index-url https://download.pytorch.org/whl/cu116 -U
!pip install -q xformers==0.0.16 triton==2.0.0 -U
!pip install -q kornia==0.6 tomesd basicsr==1.4.2 timm==0.6.12
%cd /content/Text2Video-Zero
!python app.py --public_access

@sayakpaul
Member

Unfortunately, we can only debug stuff from diffusers and not from external libraries.

Could you replicate the issues you're facing with the text-to-video zero pipeline from diffusers as in #2898 (comment)?

@camenduru
Contributor Author

camenduru commented Apr 12, 2023

You have already replicated the problem at #2898 (comment)

@camenduru
Contributor Author

We are getting a slideshow with xformers 😐 #2898 (comment)

@sayakpaul
Member

I will let @19and99 comment further.

@19and99
Contributor

19and99 commented Apr 12, 2023

Thanks for the report @camenduru, @sayakpaul, @patrickvonplaten.
We are using a custom attention processor, while pipe.enable_xformers_memory_efficient_attention() resets all the attention processors to XFormersAttnProcessor, causing the consistency issue.
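To illustrate the point, a rough sketch of what happens (the import path and expected processor names are taken from diffusers around 0.15 and may differ between versions; the workaround at the end is an untested assumption, not the authors' fix):

```python
# Sketch: the pipeline installs a cross-frame attention processor on the UNet,
# and enable_xformers_memory_efficient_attention() overwrites it.
import torch
from diffusers import TextToVideoZeroPipeline
from diffusers.pipelines.text_to_video_synthesis.pipeline_text_to_video_zero import (
    CrossFrameAttnProcessor,
)

pipe = TextToVideoZeroPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

print({type(p).__name__ for p in pipe.unet.attn_processors.values()})
# expect something like {'CrossFrameAttnProcessor'}

pipe.enable_xformers_memory_efficient_attention()
print({type(p).__name__ for p in pipe.unet.attn_processors.values()})
# now {'XFormersAttnProcessor'} -- the cross-frame processor is gone

# Untested workaround: re-install the cross-frame processor afterwards. This
# should restore consistency but gives up the xFormers memory savings, so a
# proper fix would be an xFormers-based cross-frame attention processor.
pipe.unet.set_attn_processor(CrossFrameAttnProcessor(batch_size=2))
```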

@evancasey

evancasey commented Apr 12, 2023

Sorry if I'm missing something here, but isn't a lot of the consistency machinery not enabled when you use the ControlNet text-to-video (which this implementation does)?

The TextToVideoZeroPipeline does the latent warping and cross frame attention, but the controlnet text-to-video is only doing cross frame attention.

So it would not be expected to have full consistency compared to the vanilla TextToVideoZeroPipeline example.

EDIT: nevermind, it looks like cross frame attention is all you need, wow!
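For context, a sketch of how the pose-conditioned variant wires in cross-frame attention, following the pattern described in the diffusers text2video-zero docs (model ids and the import path here are illustrative assumptions):

```python
# Illustrative sketch: a regular ControlNet pipeline whose attention processors
# are replaced with cross-frame attention on both the UNet and the ControlNet.
# There is no latent warping in this variant.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.pipelines.text_to_video_synthesis.pipeline_text_to_video_zero import (
    CrossFrameAttnProcessor,
)

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# Cross-frame attention (attending to the first frame's keys/values) is what
# keeps the frames consistent here.
pipe.unet.set_attn_processor(CrossFrameAttnProcessor(batch_size=2))
pipe.controlnet.set_attn_processor(CrossFrameAttnProcessor(batch_size=2))
```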

@camenduru
Contributor Author

Thanks @19and99 ❤ Is it possible to reduce the model size? #2898 (comment)

@sayakpaul
Member

We are using a custom attention processor, while pipe.enable_xformers_memory_efficient_attention() resets all the attention processors to XFormersAttnProcessor, causing the consistency issue.

Thanks for this hint, @19and99!

is it possible to reduce the model size?

@camenduru, I think for that we need to implement a corresponding xFormers attention processor. Ccing @patrickvonplaten in case he has any other suggestions.

@camenduru
Contributor Author

Hi @sayakpaul 👋 if someone has time, it would be super cool 🔥 to implement an xFormers attention processor or an attention-slicing attention processor.

is it possible to reduce the model size?

Is convert_original_controlnet_to_diffusers.py converting to fp32 or fp16? Is something like --half or pipe.to(torch_dtype=torch.float16) possible?

And what is this?

I tried this, but I don't understand the --original_config_file option 😐 please help:

!pip install -q git+https://github.com/huggingface/diffusers transformers omegaconf
!git clone https://github.com/huggingface/diffusers
!wget https://huggingface.co/lllyasviel/ControlNet/resolve/main/models/control_sd15_openpose.pth
!wget https://huggingface.co/lllyasviel/sd-controlnet-openpose/raw/main/config.json ?????
!python /content/diffusers/scripts/convert_original_controlnet_to_diffusers.py --checkpoint_path /content/control_sd15_openpose.pth --dump_path /content/cnet --original_config_file /content/config.json

@sayakpaul
Member

Regarding the conversion script:

It's used to convert the original ControlNet parameters to the format we use in diffusers. For the config file, @williamberman can provide more details.

Reducing the size of diffusion_pytorch_model.bin should be possible, pinging @patrickvonplaten for that.

Also, since the questions you're asking are not related to xFormers, we'd appreciate it if you could open a separate thread.

@camenduru camenduru changed the title Is it possible to use xformers with text2video-zero Is it possible to run text2video-zero with less than 15GB GPU? Apr 13, 2023
@camenduru
Contributor Author

In this thread, everything is connected to each other. In my opinion, we should continue here.

@sayakpaul
Member

The easiest options are attention slicing or an xFormers attention processor. These need to be implemented.

The second option would be to run the pipeline in half precision, which can be done by setting the pipeline's torch_dtype argument to torch.float16. For running on a 15GB card, one will likely need to combine half precision with attention slicing and / or xFormers attention processing.
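A minimal sketch of the half-precision option (this is only a sketch; whether it fits in 15GB also depends on resolution and video length):

```python
# Sketch only: half precision via torch_dtype. Peak memory still depends on
# resolution and video_length, so this alone may not be enough for 15GB.
import torch
from diffusers import TextToVideoZeroPipeline

pipe = TextToVideoZeroPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

result = pipe(prompt="a panda dancing in Antarctica", video_length=8)
frames = result.images  # list of frames that can be written out as a video
```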

@camenduru
Contributor Author

I converted it to diffusion_pytorch_model.bin 🥳 Is it possible to convert it to fp16 and then save, like controlnet.to(torch_dtype=torch.float16)?

[screenshot of the converted model files]
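If it helps, a sketch of one way to do that with diffusers (untested assumption; the fp16 output folder name is made up, and "/content/cnet" matches the --dump_path used above):

```python
# Untested sketch: load the converted ControlNet folder in fp16 and save it again.
import torch
from diffusers import ControlNetModel

controlnet = ControlNetModel.from_pretrained("/content/cnet", torch_dtype=torch.float16)
controlnet.save_pretrained("/content/cnet-fp16")
# diffusion_pytorch_model.bin in the new folder should be roughly half the size.
```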
