Is it possible to run text2video-zero with less than 15 GB of GPU memory? #2898
Comments
That's very interesting! Did you train text2video-zero, or does it just use pretrained checkpoints? Another possibility to reduce memory would be attention slicing: https://huggingface.co/docs/diffusers/api/diffusion_pipeline#diffusers.DiffusionPipeline.enable_attention_slicing
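For reference, attention slicing is a one-line toggle on any diffusers pipeline. A minimal sketch, assuming a standard Stable Diffusion checkpoint (the model ID and prompt are only examples):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Compute attention in sequential slices instead of one large batch;
# trades some speed for a sizeable reduction in peak VRAM.
pipe.enable_attention_slicing()

image = pipe("an astronaut riding a horse").images[0]
```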
Oops, maybe this is not only with this model: https://huggingface.co/plasmo/woolitize-768sd1-5/tree/main. This is the colab where you can test it: https://github.com/camenduru/text2video-zero-colab. How can I reduce the size of the model? https://huggingface.co/webui/ControlNet-modules-safetensors/tree/main https://huggingface.co/lllyasviel/sd-controlnet-openpose/tree/main
Hi @sayakpaul 👋 Perhaps the issue is in the diffusers core. Can you explain why this is happening, or whether it's expected behavior?
Does this exist with the latest `main`? The above was generated with the latest `main`. Cc: @19and99
Yep, same problem. Maybe switch?
Didn't get you. #2898 (comment) already has results with and without xformers.
The code uses ControlNet pose for consistency between frames: https://gitlab.com/camenduru/text2-video-zero/-/blob/dev/model.py#L150
Could you try it out from here?
Unfortunately, this has not been solved. It just looks like it has been solved 😭 There is still a temporal consistency problem, even with the latest code.
[GIF comparisons: output with and without xformers]
I am still not sure why you think it's not working. Please provide detailed examples showing why you think that's the case. I will also let one of the authors, @19and99, comment on this.
Please fork https://github.com/Picsart-AI-Research/Text2Video-Zero, add the xformers line, and test with a free Colab T4 and a Colab Pro A100:

```bash
%cd /content
!git clone https://github.com/your_username/Text2Video-Zero
!pip install -q gradio==3.23.0 decord==0.6.0 diffusers==0.14.0 accelerate==0.17.0 safetensors==0.2.7 einops==0.6.0 transformers==4.26.0
!pip install -q torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 torchtext==0.14.1 torchdata==0.5.1 --extra-index-url https://download.pytorch.org/whl/cu116 -U
!pip install -q xformers==0.0.16 triton==2.0.0 -U
!pip install -q kornia==0.6 tomesd basicsr==1.4.2 timm==0.6.12
%cd /content/Text2Video-Zero
!python app.py --public_access
```
Unfortunately, we can only debug stuff from `diffusers`. Could you replicate the issues you're facing with the text-to-video zero pipeline from `diffusers`?
You have already replicated the problem at #2898 (comment)
We are getting a slideshow with xformers 😐 #2898 (comment)
I will let @19and99 comment further.
Thanks for the report @camenduru, @sayakpaul, @patrickvonplaten.
Sorry if I'm missing something here, but isn't a lot of the consistency stuff not enabled when you use the ControlNet text-to-video (which this implementation does)?

EDIT: never mind, it looks like cross-frame attention is all you need, wow!
Thanks @19and99 ❤ Is it possible to reduce the model size? #2898 (comment)
Thanks for this hint, @19and99!
@camenduru, I think for that we need to implement a corresponding xFormers attention processor. Cc'ing @patrickvonplaten in case he has any other suggestions.
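For a concrete picture, here is a rough sketch of what such a cross-frame xFormers processor could look like. It is not the implementation that later shipped in diffusers; the class name, the `batch_size` parameter, and the frame-remapping details are illustrative, and it assumes the attention-module interface (`to_q`, `to_k`, `to_v`, `head_to_batch_dim`, `batch_to_head_dim`, `to_out`) that diffusers used around v0.14:

```python
import torch
import xformers.ops


class CrossFrameXFormersAttnProcessor:
    """Sketch: cross-frame self-attention on top of xformers'
    memory-efficient attention. Every frame attends to the keys/values
    of the *first* frame, which is the trick that gives Text2Video-Zero
    its temporal consistency."""

    def __init__(self, batch_size: int = 2):
        # batch_size = number of prompt sequences (e.g. 2 with
        # classifier-free guidance); each owns video_length frames.
        self.batch_size = batch_size

    def __call__(self, attn, hidden_states, encoder_hidden_states=None, attention_mask=None):
        is_self_attention = encoder_hidden_states is None
        query = attn.to_q(hidden_states)

        context = hidden_states if is_self_attention else encoder_hidden_states
        key = attn.to_k(context)
        value = attn.to_v(context)

        if is_self_attention:
            # Replace each frame's keys/values with those of frame 0.
            video_length = key.size(0) // self.batch_size
            seq_len, dim = key.shape[1], key.shape[2]
            first_frame = torch.zeros(video_length, dtype=torch.long)
            key = key.view(self.batch_size, video_length, seq_len, dim)
            key = key[:, first_frame].reshape(-1, seq_len, dim)
            value = value.view(self.batch_size, video_length, seq_len, dim)
            value = value[:, first_frame].reshape(-1, seq_len, dim)

        query = attn.head_to_batch_dim(query).contiguous()
        key = attn.head_to_batch_dim(key).contiguous()
        value = attn.head_to_batch_dim(value).contiguous()

        hidden_states = xformers.ops.memory_efficient_attention(
            query, key, value, attn_bias=attention_mask
        )
        hidden_states = attn.batch_to_head_dim(hidden_states)

        # Output projection + dropout, as in diffusers' attention modules.
        hidden_states = attn.to_out[0](hidden_states)
        hidden_states = attn.to_out[1](hidden_states)
        return hidden_states
```

It would then be installed with something like `unet.set_attn_processor(CrossFrameXFormersAttnProcessor(batch_size=2))`, mirroring how the pipeline sets up its default cross-frame processor.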
Hi @sayakpaul 👋 If someone has time, yes, it would be super cool 🔥 to implement. Is it possible to reduce the model size? And what is this `convert_original_controlnet_to_diffusers.py` script?
I tried this, but I don't understand which config file to use:

```bash
!pip install -q git+https://github.com/huggingface/diffusers transformers omegaconf
!git clone https://github.com/huggingface/diffusers
!wget https://huggingface.co/lllyasviel/ControlNet/resolve/main/models/control_sd15_openpose.pth
!wget https://huggingface.co/lllyasviel/sd-controlnet-openpose/raw/main/config.json  # ?????
!python /content/diffusers/scripts/convert_original_controlnet_to_diffusers.py --checkpoint_path /content/control_sd15_openpose.pth --dump_path /content/cnet --original_config_file /content/config.json
```
Regarding `convert_original_controlnet_to_diffusers.py`: it's used to convert the original ControlNet parameters to the format we have in `diffusers`. Reducing the size of the model is a separate question. Also, since the questions you're asking are not related to xFormers, we'd appreciate it if you could open a separate thread.
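For context, the converted weights in the `--dump_path` directory load like any other diffusers-format ControlNet. A minimal sketch using standard diffusers APIs, with paths following the command above (the base checkpoint is just an example):

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Load the converted weights from the --dump_path directory above.
controlnet = ControlNetModel.from_pretrained("/content/cnet", torch_dtype=torch.float16)

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")
```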
In this thread, everything is connected. In my opinion, we should continue here.
The easiest options are attention slicing or an xformers attention processor; these need to be implemented for this pipeline. A second option would be to run the pipeline in half precision, which can be done by specifying `torch_dtype=torch.float16` when loading it.
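A minimal sketch of the half-precision route, assuming the `TextToVideoZeroPipeline` from a recent diffusers release (the base checkpoint and prompt are only examples):

```python
import torch
from diffusers import TextToVideoZeroPipeline

# float16 weights roughly halve peak GPU memory versus float32.
pipe = TextToVideoZeroPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

frames = pipe(prompt="a panda is playing guitar on times square").images
```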
Update: solved.
Describe the bug

Hi 👋 everybody, I have a question: is it possible to use `xformers` with `text2video-zero`? When I add `self.pipe.enable_xformers_memory_efficient_attention()`, I get a temporal consistency problem. The code is here: https://gitlab.com/camenduru/text2-video-zero/-/blob/dev/model.py
Reproduction
🧬
Logs
No response
System Info
🥔