Conversation

vahramtadevosyan (Contributor)

This pull request adds `TextToVideoZeroSDXLPipeline` to the diffusers library.

Materials

Sample Code for Inference

import torch
from diffusers import TextToVideoZeroSDXLPipeline

model_id = "stabilityai/stable-diffusion-xl-base-1.0"
pipe = TextToVideoZeroSDXLPipeline.from_pretrained(
    model_id, torch_dtype=torch.float16, variant="fp16", use_safetensors=True
).to("cuda")

prompt = "A panda dancing in Antarctica"
result = pipe(prompt=prompt).images
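The pipeline returns frames as floats in [0, 1]; to save them as a video, convert them to uint8 images first. A minimal sketch of that post-processing, using dummy frames in place of the real pipeline output (the commented-out export step assumes imageio and imageio-ffmpeg are installed):

```python
import numpy as np

# Stand-in for `result = pipe(prompt=prompt).images`: dummy frames with
# float values in [0, 1], shaped (num_frames, height, width, channels).
result = np.random.rand(8, 64, 64, 3)

# Convert float frames in [0, 1] to uint8 images.
frames = [(frame * 255).astype("uint8") for frame in result]

# To export a video, one option (assuming imageio + imageio-ffmpeg installed):
# import imageio
# imageio.mimsave("video.mp4", frames, fps=4)
```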

@vahramtadevosyan (Contributor, Author)

@sayakpaul

@sayakpaul (Member)

@vahramtadevosyan thanks for this PR. Could you also provide some comparative results (SD vs. SDXL) for us to see?

@vahramtadevosyan (Contributor, Author)

@sayakpaul please find a few comparative results of text2video-zero for SD-v1.5 (left) and SDXL (right) here. Please note that the SDXL results were generated at 1024x1024 resolution and downsampled to 512x512 to match the shape of the SD-based results for concatenation. If you need the results at the original resolution as well, please let me know.
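The downsample-and-concatenate step described above can be sketched as follows, with hypothetical frames and naive strided downsampling standing in for proper resampling:

```python
import numpy as np

# Hypothetical frames: SD-v1.5 output at 512x512, SDXL output at 1024x1024.
sd_frame = np.random.rand(512, 512, 3)
sdxl_frame = np.random.rand(1024, 1024, 3)

# Naive 2x downsampling by striding; a real comparison would use proper
# resampling (e.g. PIL's Image.resize with antialiasing).
sdxl_small = sdxl_frame[::2, ::2]

# Concatenate SD (left) and SDXL (right) into one side-by-side frame.
side_by_side = np.hstack([sd_frame, sdxl_small])
```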

@sayakpaul (Member)

@vahramtadevosyan let's fix the CI issues, and then let me know when the PR is ready for review.

@sayakpaul (Member)

Let me also upload some of the GIFs you provided for reference:

0_A_bear_dancing_on_times_square

0_A_bonded_mating_pair_of_wild_lions_in_la

0_A_ghost_dancing_in_a_haunted_mansion

0_A_horse_galloping_on_a_street

@HuggingFaceDocBuilderDev commented Aug 25, 2023

The documentation is not available anymore as the PR was closed or merged.

@sayakpaul (Member)

There should not be any changes to the alt diffusion scripts. Could you elaborate on why that is the case?

@vahramtadevosyan (Contributor, Author)

@sayakpaul I made no explicit changes there. I suppose the changes to these scripts were made automatically after running `make style`, `make quality`, or `make fix-copies`, as the difference seems to be in formatting only.

@sayakpaul (Member)

Then maybe there's something wrong in the "# Copied from ..." statement you used in the pipeline implementation?

prompt = "A panda dancing in Antarctica"
result = pipe(prompt=prompt, generator=generator).images

first_frame_slice = result[0, -3:, -3:, -1]
Member

How were these outputs generated? In which setup?
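The slice in the diff above is a common pattern in diffusers slow tests: it takes a small corner of the first output frame as a cheap fingerprint to compare against a reference. A sketch with a dummy output array (the expected values themselves depend on the exact setup, hence the reviewer's question):

```python
import numpy as np

# Stand-in for the pipeline output: (num_frames, height, width, channels).
result = np.random.rand(2, 64, 64, 3)

# Last 3 rows, last 3 columns, last channel of the first frame:
# a 3x3 fingerprint of the output.
first_frame_slice = result[0, -3:, -3:, -1]

# In the actual test this is compared against a reference slice captured
# on a known setup, e.g.:
# expected_slice = np.array([...])  # setup-dependent values, elided here
# assert np.abs(first_frame_slice.flatten() - expected_slice).max() < 1e-2
```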

return warped


def create_motion_field(motion_field_strength_x, motion_field_strength_y, frame_ids, device, dtype):
Member

If the utility functions are copied from the SD text2video-zero implementation, then let's add a "# Copied from ..." statement.
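For context, the SD text2video-zero helper builds a constant translation field per frame, scaled by the frame index. A simplified numpy sketch of that idea (not the exact diffusers implementation, which operates on torch tensors):

```python
import numpy as np

def create_motion_field(strength_x, strength_y, frame_ids, h=8, w=8):
    # One 2-channel flow map per frame: a constant (dx, dy) displacement
    # proportional to the frame index, shaped (num_frames, 2, h, w).
    flow = np.zeros((len(frame_ids), 2, h, w))
    for i, frame_id in enumerate(frame_ids):
        flow[i, 0] = strength_x * frame_id  # x displacement
        flow[i, 1] = strength_y * frame_id  # y displacement
    return flow

field = create_motion_field(12, 12, frame_ids=[0, 1, 2, 3], h=4, w=4)
```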

@sayakpaul (Member) left a comment

Looking tight! Thanks for the contribution. While we fix the CI issues, let's also add some docs and some fast tests to the PR.

Also, let's make sure to leverage the "# Copied from ..." statement as much as possible.

@vahramtadevosyan (Contributor, Author)

@sayakpaul sure, thank you! Will make the requested updates and will let you know.

@patrickvonplaten (Contributor)

Cool, the quality is actually not looking bad!

@vahramtadevosyan (Contributor, Author) commented Aug 29, 2023

@sayakpaul I've pushed all of the requested updates.

@vahramtadevosyan (Contributor, Author)

@DN6 done, thanks for the help! All checks pass now.

pass


@nightly
Contributor

Suggested change
@nightly
@slow

To begin with, let's maybe try to run the tests in slow mode to capture bugs.
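Markers like @slow and @nightly in diffusers' test utilities gate tests behind an opt-in environment variable. A simplified sketch of the mechanism (the RUN_SLOW variable name follows diffusers' convention; the implementation details here are illustrative, not the library's exact code):

```python
import os
import unittest

def slow(test_case):
    # Skip the decorated test/class unless slow tests are explicitly enabled.
    return unittest.skipUnless(os.getenv("RUN_SLOW") == "1", "slow test")(test_case)

@slow
class TextToVideoZeroSDXLSlowTests(unittest.TestCase):
    def test_inference(self):
        pass
```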


model_id = "stabilityai/stable-diffusion-xl-base-1.0"
pipe = TextToVideoZeroSDXLPipeline.from_pretrained(
model_id, torch_dtype=torch.float16, variant="fp16", use_safetensors=True
Contributor

Does this work out of the box?

Collaborator

Hmm good catch. @vahramtadevosyan Can you confirm this works out of the box with SDXL base?

Contributor (Author)

@DN6 @patrickvonplaten can you explain what you mean by "working out of the box"?

@patrickvonplaten (Contributor) left a comment

Ok for me to add it like this. @DN6 wdyt?

@DN6 (Collaborator) commented Nov 14, 2023

@patrickvonplaten I'm good to merge once @vahramtadevosyan confirms that the pipeline test works with the SDXL base model.

from diffusers.models import AutoencoderKL, UNet2DConditionModel
from diffusers.pipelines.stable_diffusion_xl import StableDiffusionXLPipeline
from diffusers.schedulers import KarrasDiffusionSchedulers
from diffusers.utils import BaseOutput
Collaborator

Caught an issue here which was causing the test to fail on a GPU machine

Suggested change
from diffusers.utils import BaseOutput
from diffusers.utils import BaseOutput
from diffusers.utils.torch_utils import randn_tensor

x_t1:
Forward process applied to x_t0 from time t0 to t1.
"""
eps = torch.randn(x_t0.size(), generator=generator, dtype=x_t0.dtype, device=x_t0.device)
Collaborator

Caught this issue which makes tests fail on GPU.

Suggested change
eps = torch.randn(x_t0.size(), generator=generator, dtype=x_t0.dtype, device=x_t0.device)
eps = randn_tensor(x_t0.size(), generator=generator, dtype=x_t0.dtype, device=x_t0.device)
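The reason this fix matters: torch.randn requires the generator's device to be compatible with the sampling device, so a CPU generator passed to a pipeline running on CUDA fails. diffusers' randn_tensor draws the noise on the generator's device and then moves it to the target device. A CPU-only sketch of that pattern (the helper name is illustrative, not diffusers' API; requires torch):

```python
import torch

def device_safe_randn(shape, generator=None, device=None, dtype=None):
    # Draw the noise on the generator's own device (CPU by default),
    # then move it to the target device, mirroring the idea behind
    # diffusers' `randn_tensor`.
    gen_device = generator.device if generator is not None else torch.device("cpu")
    noise = torch.randn(shape, generator=generator, device=gen_device, dtype=dtype)
    return noise.to(device) if device is not None else noise

gen = torch.Generator().manual_seed(0)
eps = device_safe_randn((2, 3), generator=gen, dtype=torch.float32)
```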

Comment on lines 785 to 786
t0=timesteps[-t0 - 1].item(),
t1=timesteps[-t1 - 1].item(),
Collaborator

Some schedulers will return float timesteps. Cast to int to ensure that the indexing in the forward loop works.

Suggested change
t0=timesteps[-t0 - 1].item(),
t1=timesteps[-t1 - 1].item(),
t0=timesteps[-t0 - 1].to(torch.long),
t1=timesteps[-t1 - 1].to(torch.long),
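A minimal illustration of the reviewer's point, independent of any particular scheduler: a float cannot be used as a sequence index, so float timesteps must be cast to an integer type before the indexing in the forward loop:

```python
# Some schedulers return float timesteps; using one directly as an index fails.
frames = ["f0", "f1", "f2", "f3"]
t = 2.0  # float timestep

try:
    frames[t]  # TypeError: list indices must be integers, not float
    raise AssertionError("float indexing unexpectedly succeeded")
except TypeError:
    pass

assert frames[int(t)] == "f2"  # casting to an integer makes the lookup valid
```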

Comment on lines 385 to 387
pipe = self.pipeline_class.from_pretrained(
model_id, torch_dtype=torch.float16, variant="fp16", use_safetensors=True
).to("cuda")
Collaborator

To avoid OOM errors in the slow tests

Suggested change
pipe = self.pipeline_class.from_pretrained(
model_id, torch_dtype=torch.float16, variant="fp16", use_safetensors=True
).to("cuda")
pipe = self.pipeline_class.from_pretrained(
model_id, torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipe.enable_model_cpu_offload()
pipe.enable_vae_slicing()

@DN6 (Collaborator) commented Nov 15, 2023

@vahramtadevosyan I ran your PR on a GPU and came across a few issues. Left some comments.

@patrickvonplaten I also verified that the pipeline works with the SDXL base checkpoint.

@vahramtadevosyan (Contributor, Author)

Hey @DN6, after making the changes you mentioned above, some fast tests are failing, but everything looks fine for me locally. Can you please take a look?

@patrickvonplaten (Contributor)

The failing tests seem to be very much related to this PR; could you try to fix them, @vahramtadevosyan?

@vahramtadevosyan (Contributor, Author)

@patrickvonplaten the issues are fixed now.

@patrickvonplaten patrickvonplaten merged commit d63a498 into huggingface:main Nov 29, 2023
patrickvonplaten added a commit that referenced this pull request Dec 18, 2023
yoonseokjin pushed a commit to yoonseokjin/diffusers that referenced this pull request Dec 25, 2023
* integrated sdxl for the text2video-zero pipeline

* make fix-copies

* fixed CI issues

* make fix-copies

* added docs and `copied from` statements

* added fast tests

* made a small change in docs

* quality+style check fix

* updated docs. added controlnet inference with sdxl

* added device compatibility for fast tests

* fixed docstrings

* changing vae upcasting

* remove torch.empty_cache to speed up inference

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* made fast tests to run on dummy models only, fixed copied from statements

* fixed testing utils imports

* Added bullet points for SDXL support

* fixed formatting & quality

* Update tests/pipelines/text_to_video/test_text_to_video_zero_sdxl.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update tests/pipelines/text_to_video/test_text_to_video_zero_sdxl.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* fixed minor error for merging

* fixed updates of sdxl

* made fast tests inherit from `PipelineTesterMixin` and run in 3-4secs on CPU

* make style && make quality

* reimplemented fast tests w/o default attn processor

* make style & make quality

* make fix-copies

* make fix-copies

* fixed docs

* make style & make quality & make fix-copies

* bug fix in cross attention

* make style && make quality

* make fix-copies

* fix gpu issues

* make fix-copies

* updated pipeline signature

---------

Co-authored-by: Vahram <vahram.tadevosyan@lambda-loginnode02.cm.cluster>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
AmericanPresidentJimmyCarter pushed a commit to AmericanPresidentJimmyCarter/diffusers that referenced this pull request Apr 26, 2024
6 participants