[Pipeline] Add TextToVideoZeroSDXLPipeline #4695
Conversation
@vahramtadevosyan thanks for this PR. Could you also provide some comparative results (SD vs. SDXL) for us to see?

@sayakpaul please find a few comparative results of text2video-zero for SD-v1.5 (left) and SDXL (right) here. Please note that the SDXL results were generated at 1024x1024 resolution and downsampled to 512x512 to match the shape of the SD-based results for concatenation. If you need the results at the original resolution as well, please let me know.

@vahramtadevosyan let's fix the CI issues, and then let me know when the PR is ready for review.

The documentation is not available anymore as the PR was closed or merged.
There should not be any changes to the alt diffusion scripts. Could you elaborate on why that is the case?

@sayakpaul I made no explicit changes there. I suppose the changes to these scripts were done automatically after running

Then maybe there's something wrong in the "# Copied from ..." statement you used in the pipeline implementation?
```python
prompt = "A panda dancing in Antarctica"
result = pipe(prompt=prompt, generator=generator).images

first_frame_slice = result[0, -3:, -3:, -1]
```
How were these outputs generated? In which setup?
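For context, slices like `first_frame_slice` are usually compared against reference values recorded on a specific setup, which is why the question about the generation environment matters. A hypothetical sketch of that slow-test pattern (the array here is a random stand-in for `pipe(...).images`, and the expected values are placeholders, not real reference outputs):

```python
import numpy as np

# Stand-in for the pipeline output: (frames, height, width, channels).
result = np.random.RandomState(0).rand(8, 64, 64, 3)

# Slice a 3x3 corner of the last channel of the first frame.
first_frame_slice = result[0, -3:, -3:, -1]

# In a real test these are hard-coded numbers captured on a reference GPU.
expected_slice = np.array(first_frame_slice)

max_diff = np.abs(first_frame_slice.flatten() - expected_slice.flatten()).max()
assert max_diff < 1e-2
```

Because the reference numbers are hardware- and precision-dependent, such tests typically only run in the slow/nightly suites.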
```python
    return warped


def create_motion_field(motion_field_strength_x, motion_field_strength_y, frame_ids, device, dtype):
```
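For reference, the SD text-to-video zero implementation builds a constant translation field whose displacement grows linearly with the frame index. A simplified sketch of that idea (the 512x512 flow resolution and the exact broadcasting are assumptions, not necessarily what this PR ships):

```python
import torch

def create_motion_field(motion_field_strength_x, motion_field_strength_y, frame_ids, device, dtype):
    # One 2-channel (x/y) flow map per frame; displacement scales with frame_id.
    seq_length = len(frame_ids)
    reference_flow = torch.zeros((seq_length, 2, 512, 512), device=device, dtype=dtype)
    for idx, frame_id in enumerate(frame_ids):
        reference_flow[idx, 0, :, :] = motion_field_strength_x * frame_id
        reference_flow[idx, 1, :, :] = motion_field_strength_y * frame_id
    return reference_flow

flow = create_motion_field(12, 12, frame_ids=[0, 1, 2], device="cpu", dtype=torch.float32)
```

The resulting flow is then used to warp the latents of each frame toward the first frame's content, which is what produces temporally consistent motion.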
If the utility functions are copied from the SD text-to-video zero implementation, then let's add a "# Copied from ..." statement.
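The convention is a comment placed directly above the duplicated function, which `make fix-copies` then uses to keep copies in sync across pipelines. A hypothetical illustration (the source path in the comment and the helper itself are illustrative, not the exact statement for this PR):

```python
import torch

# Copied from diffusers.pipelines.text_to_video_synthesis.pipeline_text_to_video_zero.coords_grid
def coords_grid(batch, ht, wd, device):
    # Build a (batch, 2, ht, wd) grid of (x, y) pixel coordinates.
    coords = torch.meshgrid(
        torch.arange(ht, device=device), torch.arange(wd, device=device), indexing="ij"
    )
    coords = torch.stack(coords[::-1], dim=0).float()  # channel 0 = x, channel 1 = y
    return coords[None].repeat(batch, 1, 1, 1)

grid = coords_grid(2, 4, 5, "cpu")
```

If the copy drifts from its source, `make fix-copies` rewrites it to match, so the statement doubles as documentation and a consistency check.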
Looking tight! Thanks for the contribution. While we fix the CI issue, let's also add some docs and some fast tests to the PR.
Also, let's make sure to leverage the "# Copied from ..." statement as much as possible.
@sayakpaul sure, thank you! Will make the requested updates and let you know.

Cool, the quality is not looking bad actually!
@sayakpaul I've pushed all the necessary updates requested.

@DN6 done, thanks for the help! All checks pass now.
```python
    pass


@nightly
```
Suggested change:
```diff
-@nightly
+@slow
```
To begin with, let's maybe try to run the tests in slow mode to capture bugs.
```python
model_id = "stabilityai/stable-diffusion-xl-base-1.0"
pipe = TextToVideoZeroSDXLPipeline.from_pretrained(
    model_id, torch_dtype=torch.float16, variant="fp16", use_safetensors=True
```
Does this work out of the box?
Hmm good catch. @vahramtadevosyan Can you confirm this works out of the box with SDXL base?
@DN6 @patrickvonplaten can you explain what you mean by "working out of the box"?
Ok for me to add like this @DN6 wdyt?
@patrickvonplaten I'm good to merge once @vahramtadevosyan can confirm the pipeline test works with the SDXL base model.
```python
from diffusers.models import AutoencoderKL, UNet2DConditionModel
from diffusers.pipelines.stable_diffusion_xl import StableDiffusionXLPipeline
from diffusers.schedulers import KarrasDiffusionSchedulers
from diffusers.utils import BaseOutput
```
Caught an issue here which was causing the test to fail on a GPU machine.

Suggested change:
```diff
 from diffusers.utils import BaseOutput
+from diffusers.utils.torch_utils import randn_tensor
```
```python
        x_t1:
            Forward process applied to x_t0 from time t0 to t1.
    """
    eps = torch.randn(x_t0.size(), generator=generator, dtype=x_t0.dtype, device=x_t0.device)
```
Caught this issue which makes tests fail on GPU.

Suggested change:
```diff
-eps = torch.randn(x_t0.size(), generator=generator, dtype=x_t0.dtype, device=x_t0.device)
+eps = randn_tensor(x_t0.size(), generator=generator, dtype=x_t0.dtype, device=x_t0.device)
```
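The failure mode here is that `torch.randn` requires the generator and the target device to match, so a CPU `torch.Generator` combined with CUDA latents errors out. A simplified sketch of what the diffusers helper does to avoid this (this is an assumption about its behavior, and `randn_tensor_sketch` is a hypothetical name, not the library function):

```python
import torch

def randn_tensor_sketch(shape, generator=None, dtype=None, device=None):
    # Sample on the generator's own device so a CPU generator keeps working
    # (and stays reproducible), then move the noise to the target device.
    sample_device = generator.device if generator is not None else torch.device("cpu")
    eps = torch.randn(shape, generator=generator, dtype=dtype, device=sample_device)
    return eps.to(device) if device is not None else eps

g1 = torch.Generator().manual_seed(0)
g2 = torch.Generator().manual_seed(0)
a = randn_tensor_sketch((2, 3), generator=g1, dtype=torch.float32)
b = randn_tensor_sketch((2, 3), generator=g2, dtype=torch.float32)
```

Sampling on the generator's device also keeps seeded results identical across CPU and GPU runs, which matters for reproducible tests.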
```python
t0=timesteps[-t0 - 1].item(),
t1=timesteps[-t1 - 1].item(),
```
Some schedulers will return float timesteps. Cast to int to ensure that the indexing in the forward loop works.

Suggested change:
```diff
-t0=timesteps[-t0 - 1].item(),
-t1=timesteps[-t1 - 1].item(),
+t0=timesteps[-t0 - 1].to(torch.long),
+t1=timesteps[-t1 - 1].to(torch.long),
```
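A minimal sketch of why the cast is needed: float tensors cannot be used to index into per-timestep buffers, while a `torch.long` cast makes the lookup valid (the linspace schedule below is an illustrative stand-in for a real scheduler's timesteps):

```python
import torch

timesteps = torch.linspace(999.0, 0.0, steps=10)  # float timesteps, as some schedulers return
alphas = torch.rand(1000)                         # per-timestep buffer indexed by integer t

t0, t1 = 2, 4
t0_idx = timesteps[-t0 - 1].to(torch.long)
t1_idx = timesteps[-t1 - 1].to(torch.long)

a0 = alphas[t0_idx]  # works; indexing alphas with the float tensor would raise an IndexError
```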
```python
pipe = self.pipeline_class.from_pretrained(
    model_id, torch_dtype=torch.float16, variant="fp16", use_safetensors=True
).to("cuda")
```
To avoid OOM errors in the slow tests.

Suggested change:
```diff
 pipe = self.pipeline_class.from_pretrained(
     model_id, torch_dtype=torch.float16, variant="fp16", use_safetensors=True
-).to("cuda")
+)
+pipe.enable_model_cpu_offload()
+pipe.enable_vae_slicing()
```
@vahramtadevosyan I ran your PR on a GPU and came across a few issues. Left some comments. @patrickvonplaten I also verified that the pipeline works with the SDXL base checkpoint.
Hey @DN6, after making the changes you mentioned above, some fast tests are failing, but everything seems OK to me locally. Can you please take a look?

The failing tests seem to be very much related to the PR - could you try to fix them, @vahramtadevosyan?

@patrickvonplaten the issues are fixed now.
This reverts commit d63a498.
* integrated sdxl for the text2video-zero pipeline
* make fix-copies
* fixed CI issues
* make fix-copies
* added docs and `copied from` statements
* added fast tests
* made a small change in docs
* quality+style check fix
* updated docs. added controlnet inference with sdxl
* added device compatibility for fast tests
* fixed docstrings
* changing vae upcasting
* remove torch.empty_cache to speed up inference

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* made fast tests to run on dummy models only, fixed copied from statements
* fixed testing utils imports
* Added bullet points for SDXL support
* fixed formatting & quality
* Update tests/pipelines/text_to_video/test_text_to_video_zero_sdxl.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update tests/pipelines/text_to_video/test_text_to_video_zero_sdxl.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* fixed minor error for merging
* fixed updates of sdxl
* made fast tests inherit from `PipelineTesterMixin` and run in 3-4secs on CPU
* make style && make quality
* reimplemented fast tests w/o default attn processor
* make style & make quality
* make fix-copies
* make fix-copies
* fixed docs
* make style & make quality & make fix-copies
* bug fix in cross attention
* make style && make quality
* make fix-copies
* fix gpu issues
* make fix-copies
* updated pipeline signature

---------

Co-authored-by: Vahram <vahram.tadevosyan@lambda-loginnode02.cm.cluster>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
This pull request adds `TextToVideoZeroSDXLPipeline` to the diffusers library.

Materials

Sample Code for Inference
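The inference snippet was not preserved in this copy of the PR description; the following sketch reconstructs it from the code fragments in the review above (the checkpoint name, argument names, and the CPU-offload call all appear in the thread; the CUDA guard is an addition so the script degrades gracefully without a GPU):

```python
import torch

model_id = "stabilityai/stable-diffusion-xl-base-1.0"
prompt = "A panda dancing in Antarctica"

if torch.cuda.is_available():
    from diffusers import TextToVideoZeroSDXLPipeline

    pipe = TextToVideoZeroSDXLPipeline.from_pretrained(
        model_id, torch_dtype=torch.float16, variant="fp16", use_safetensors=True
    )
    # Memory savers recommended in the review to avoid OOM errors.
    pipe.enable_model_cpu_offload()
    pipe.enable_vae_slicing()

    generator = torch.Generator(device="cpu").manual_seed(0)
    result = pipe(prompt=prompt, generator=generator).images  # list/array of frames
```

Note that SDXL generates at 1024x1024 by default, so expect higher memory usage than the SD-v1.5 text2video-zero pipeline.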