[Pipeline] Add TextToVideoZeroSDXLPipeline #4695
Conversation
@vahramtadevosyan thanks for this PR. Could you also provide some comparative results (SD vs. SDXL) for us to see?

@sayakpaul please find a few comparative results of text2video-zero for SD-v1.5 (left) and SDXL (right) here. Please note that the SDXL results were generated at 1024x1024 resolution and downsampled to 512x512 to match the shape of the SD-based results for concatenation. If you need the results at the original resolution as well, please let me know.

@vahramtadevosyan let's fix the CI issues, and then let me know when the PR is ready for review.

The documentation is not available anymore as the PR was closed or merged.
There should not be any changes to the alt diffusion scripts. Could you elaborate on why that is the case?

@sayakpaul I made no explicit changes there. I suppose the changes to these scripts were done automatically after running

Then maybe there's something wrong in the "# Copied from ..." statement you used in the pipeline implementation?
```python
prompt = "A panda dancing in Antarctica"
result = pipe(prompt=prompt, generator=generator).images

first_frame_slice = result[0, -3:, -3:, -1]
```
How were these outputs generated? In which setup?
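For context, slices like `first_frame_slice` are usually compared against reference values recorded on a specific setup, which is why the question about the generation environment matters. A hypothetical sketch of that slow-test pattern (the array here is a random stand-in for `pipe(...).images`, and the expected values are placeholders, not real reference outputs):

```python
import numpy as np

# Stand-in for the pipeline output: (frames, height, width, channels).
result = np.random.RandomState(0).rand(8, 64, 64, 3)

# Slice a 3x3 corner of the last channel of the first frame.
first_frame_slice = result[0, -3:, -3:, -1]

# In a real test these are hard-coded numbers captured on a reference GPU.
expected_slice = np.array(first_frame_slice)

max_diff = np.abs(first_frame_slice.flatten() - expected_slice.flatten()).max()
assert max_diff < 1e-2
```

Because the reference numbers are hardware- and precision-dependent, such tests typically only run in the slow/nightly suites.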
```python
    return warped


def create_motion_field(motion_field_strength_x, motion_field_strength_y, frame_ids, device, dtype):
```
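For reference, the SD text-to-video zero implementation builds a constant translation field whose displacement grows linearly with the frame index. A simplified sketch of that idea (the 512x512 flow resolution and the exact broadcasting are assumptions, not necessarily what this PR ships):

```python
import torch

def create_motion_field(motion_field_strength_x, motion_field_strength_y, frame_ids, device, dtype):
    # One 2-channel (x/y) flow map per frame; displacement scales with frame_id.
    seq_length = len(frame_ids)
    reference_flow = torch.zeros((seq_length, 2, 512, 512), device=device, dtype=dtype)
    for idx, frame_id in enumerate(frame_ids):
        reference_flow[idx, 0, :, :] = motion_field_strength_x * frame_id
        reference_flow[idx, 1, :, :] = motion_field_strength_y * frame_id
    return reference_flow

flow = create_motion_field(12, 12, frame_ids=[0, 1, 2], device="cpu", dtype=torch.float32)
```

The resulting flow is then used to warp the latents of each frame toward the first frame's content, which is what produces temporally consistent motion.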
If the utility functions are copied from the SD text-to-video zero implementation, then let's add a "# Copied from ..." statement.
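The convention is a comment placed directly above the duplicated function, which `make fix-copies` then uses to keep copies in sync across pipelines. A hypothetical illustration (the source path in the comment and the helper itself are illustrative, not the exact statement for this PR):

```python
import torch

# Copied from diffusers.pipelines.text_to_video_synthesis.pipeline_text_to_video_zero.coords_grid
def coords_grid(batch, ht, wd, device):
    # Build a (batch, 2, ht, wd) grid of (x, y) pixel coordinates.
    coords = torch.meshgrid(
        torch.arange(ht, device=device), torch.arange(wd, device=device), indexing="ij"
    )
    coords = torch.stack(coords[::-1], dim=0).float()  # channel 0 = x, channel 1 = y
    return coords[None].repeat(batch, 1, 1, 1)

grid = coords_grid(2, 4, 5, "cpu")
```

If the copy drifts from its source, `make fix-copies` rewrites it to match, so the statement doubles as documentation and a consistency check.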
Looking tight! Thanks for the contribution. While we fix the CI issue, let's also add some docs and some fast tests to the PR.
Also, let's make sure to leverage the "# Copied from ..." statement as much as possible.
@sayakpaul sure, thank you! Will make the requested updates and let you know.

Cool, the quality is not looking bad actually!
@sayakpaul I've pushed all the necessary updates requested.

@DN6 done, thanks for the help! All checks pass now.
```python
    pass


@nightly
```
Suggested change:
```diff
-@nightly
+@slow
```
To begin with, let's maybe try to run the tests in slow mode to capture bugs.
```python
model_id = "stabilityai/stable-diffusion-xl-base-1.0"
pipe = TextToVideoZeroSDXLPipeline.from_pretrained(
    model_id, torch_dtype=torch.float16, variant="fp16", use_safetensors=True
```
Does this work out of the box?
Hmm good catch. @vahramtadevosyan Can you confirm this works out of the box with SDXL base?
@DN6 @patrickvonplaten can you explain what you mean by "working out of the box"?
Ok for me to add like this @DN6 wdyt?
@patrickvonplaten I'm good to merge once @vahramtadevosyan can confirm the pipeline test works with the SDXL base model.
```python
from diffusers.models import AutoencoderKL, UNet2DConditionModel
from diffusers.pipelines.stable_diffusion_xl import StableDiffusionXLPipeline
from diffusers.schedulers import KarrasDiffusionSchedulers
from diffusers.utils import BaseOutput
```
Caught an issue here which was causing the test to fail on a GPU machine.

Suggested change:
```diff
 from diffusers.utils import BaseOutput
+from diffusers.utils.torch_utils import randn_tensor
```
```python
        x_t1:
            Forward process applied to x_t0 from time t0 to t1.
    """
    eps = torch.randn(x_t0.size(), generator=generator, dtype=x_t0.dtype, device=x_t0.device)
```
Caught this issue which makes tests fail on GPU.

Suggested change:
```diff
-eps = torch.randn(x_t0.size(), generator=generator, dtype=x_t0.dtype, device=x_t0.device)
+eps = randn_tensor(x_t0.size(), generator=generator, dtype=x_t0.dtype, device=x_t0.device)
```
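The failure mode here is that `torch.randn` requires the generator and the target device to match, so a CPU `torch.Generator` combined with CUDA latents errors out. A simplified sketch of what the diffusers helper does to avoid this (this is an assumption about its behavior, and `randn_tensor_sketch` is a hypothetical name, not the library function):

```python
import torch

def randn_tensor_sketch(shape, generator=None, dtype=None, device=None):
    # Sample on the generator's own device so a CPU generator keeps working
    # (and stays reproducible), then move the noise to the target device.
    sample_device = generator.device if generator is not None else torch.device("cpu")
    eps = torch.randn(shape, generator=generator, dtype=dtype, device=sample_device)
    return eps.to(device) if device is not None else eps

g1 = torch.Generator().manual_seed(0)
g2 = torch.Generator().manual_seed(0)
a = randn_tensor_sketch((2, 3), generator=g1, dtype=torch.float32)
b = randn_tensor_sketch((2, 3), generator=g2, dtype=torch.float32)
```

Sampling on the generator's device also keeps seeded results identical across CPU and GPU runs, which matters for reproducible tests.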
```python
t0=timesteps[-t0 - 1].item(),
t1=timesteps[-t1 - 1].item(),
```
Some schedulers will return float timesteps. Cast to int to ensure that the indexing in the forward loop works.

Suggested change:
```diff
-t0=timesteps[-t0 - 1].item(),
-t1=timesteps[-t1 - 1].item(),
+t0=timesteps[-t0 - 1].to(torch.long),
+t1=timesteps[-t1 - 1].to(torch.long),
```
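A minimal sketch of why the cast is needed: float tensors cannot be used to index into per-timestep buffers, while a `torch.long` cast makes the lookup valid (the linspace schedule below is an illustrative stand-in for a real scheduler's timesteps):

```python
import torch

timesteps = torch.linspace(999.0, 0.0, steps=10)  # float timesteps, as some schedulers return
alphas = torch.rand(1000)                         # per-timestep buffer indexed by integer t

t0, t1 = 2, 4
t0_idx = timesteps[-t0 - 1].to(torch.long)
t1_idx = timesteps[-t1 - 1].to(torch.long)

a0 = alphas[t0_idx]  # works; indexing alphas with the float tensor would raise an IndexError
```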
```python
pipe = self.pipeline_class.from_pretrained(
    model_id, torch_dtype=torch.float16, variant="fp16", use_safetensors=True
).to("cuda")
```
To avoid OOM errors in the slow tests.

Suggested change:
```diff
 pipe = self.pipeline_class.from_pretrained(
     model_id, torch_dtype=torch.float16, variant="fp16", use_safetensors=True
-).to("cuda")
+)
+pipe.enable_model_cpu_offload()
+pipe.enable_vae_slicing()
```
@vahramtadevosyan I ran your PR on a GPU and came across a few issues. Left some comments. @patrickvonplaten I also verified that the pipeline works with the SDXL base checkpoint.
Hey @DN6, after making the changes you mentioned above, some fast tests are failing, but everything seems OK to me locally. Can you please take a look?

The failing tests seem to be very much related to the PR - could you try to fix them, @vahramtadevosyan?

@patrickvonplaten the issues are fixed now.
This reverts commit d63a498.
* integrated sdxl for the text2video-zero pipeline
* make fix-copies
* fixed CI issues
* make fix-copies
* added docs and `copied from` statements
* added fast tests
* made a small change in docs
* quality+style check fix
* updated docs. added controlnet inference with sdxl
* added device compatibility for fast tests
* fixed docstrings
* changing vae upcasting
* remove torch.empty_cache to speed up inference

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* made fast tests to run on dummy models only, fixed copied from statements
* fixed testing utils imports
* Added bullet points for SDXL support
* fixed formatting & quality
* Update tests/pipelines/text_to_video/test_text_to_video_zero_sdxl.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update tests/pipelines/text_to_video/test_text_to_video_zero_sdxl.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* fixed minor error for merging
* fixed updates of sdxl
* made fast tests inherit from `PipelineTesterMixin` and run in 3-4secs on CPU
* make style && make quality
* reimplemented fast tests w/o default attn processor
* make style & make quality
* make fix-copies
* make fix-copies
* fixed docs
* make style & make quality & make fix-copies
* bug fix in cross attention
* make style && make quality
* make fix-copies
* fix gpu issues
* make fix-copies
* updated pipeline signature

---------

Co-authored-by: Vahram <vahram.tadevosyan@lambda-loginnode02.cm.cluster>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
This pull request adds `TextToVideoZeroSDXLPipeline` to the diffusers library.

Materials

Sample Code for Inference
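The inference snippet was not preserved in this copy of the PR description; the following sketch reconstructs it from the code fragments in the review above (the checkpoint name, argument names, and the CPU-offload call all appear in the thread; the CUDA guard is an addition so the script degrades gracefully without a GPU):

```python
import torch

model_id = "stabilityai/stable-diffusion-xl-base-1.0"
prompt = "A panda dancing in Antarctica"

if torch.cuda.is_available():
    from diffusers import TextToVideoZeroSDXLPipeline

    pipe = TextToVideoZeroSDXLPipeline.from_pretrained(
        model_id, torch_dtype=torch.float16, variant="fp16", use_safetensors=True
    )
    # Memory savers recommended in the review to avoid OOM errors.
    pipe.enable_model_cpu_offload()
    pipe.enable_vae_slicing()

    generator = torch.Generator(device="cpu").manual_seed(0)
    result = pipe(prompt=prompt, generator=generator).images  # list/array of frames
```

Note that SDXL generates at 1024x1024 by default, so expect higher memory usage than the SD-v1.5 text2video-zero pipeline.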