[@cene555][Kandinsky 3.0] Add Kandinsky 3.0 #5913

patrickvonplaten · 2023-11-23T13:25:24Z

What does this PR do?

@yiyixuxu given the limited time, I've made sure that the public API and that the weights are correctly named. I've left a lot of TODOs in the code that should be completed next week ideally.

HuggingFaceDocBuilderDev · 2023-11-23T13:42:57Z

The documentation is not available anymore as the PR was closed or merged.

src/diffusers/models/attention_processor.py

patrickvonplaten · 2023-11-24T16:22:08Z

This PR superseeds #5899 . All credit goes to the original author @cene555 . The PR was created because I couldn't adapt #5899 and given that the model has already been published, I wanted to make sure things were merged quickly. Hopefully that was ok 🙏

patrickvonplaten · 2023-11-24T16:41:39Z

TODOs for next week (cc @yiyixuxu):

Treat all the TODO statemens in the code
Add better docs
Publish on discord etc...
Add tests for img2img
Rename the pipeline and model files
Clean up all the unet blocks and get rid of hard to read code
...

BenjaminBossan

Thanks for moving quickly on this. I don't have the appropriate context knowledge to judge the implementation and how well it conforms to existing implementations, so my comments are more about details.

BenjaminBossan · 2023-11-24T17:10:34Z

src/diffusers/pipelines/kandinsky3/kandinsky3_pipeline.py

+    new_height = height // scale_factor**2
+    if height % scale_factor**2 != 0:


Division result and remainder can be obtained in a single operation by using divmod.

BenjaminBossan · 2023-11-24T17:14:19Z

src/diffusers/pipelines/kandinsky3/kandinsky3_pipeline.py

+
+    def process_embeds(self, embeddings, attention_mask, cut_context):
+        if cut_context:
+            embeddings[attention_mask == 0] = torch.zeros_like(embeddings[attention_mask == 0])


Could be made more efficient with torch.where.

BenjaminBossan · 2023-11-24T17:15:49Z

src/diffusers/pipelines/kandinsky3/kandinsky3_pipeline.py

+        Encodes the prompt into text encoder hidden states.
+
+        Args:
+             prompt (`str` or `List[str]`, *optional*):


One space too many

BenjaminBossan · 2023-11-24T17:16:05Z

src/diffusers/pipelines/kandinsky3/kandinsky3_pipeline.py

+        Args:
+             prompt (`str` or `List[str]`, *optional*):
+                prompt to be encoded
+            device: (`torch.device`, *optional*):


The order here doesn't correspond to the order of the arguments.

BenjaminBossan · 2023-11-24T17:16:32Z

src/diffusers/pipelines/kandinsky3/kandinsky3_pipeline.py

+        if prompt is not None and negative_prompt is not None:
+            if type(prompt) is not type(negative_prompt):
+                raise TypeError(
+                    f"`negative_prompt` should be the same type to `prompt`, but got {type(negative_prompt)} !="


"type to" => "type as", remove "="

BenjaminBossan · 2023-11-24T17:20:40Z

src/diffusers/pipelines/kandinsky3/kandinsky3_pipeline.py

+            num_inference_steps (`int`, *optional*, defaults to 50):
+                The number of denoising steps. More denoising steps usually lead to a higher quality image at the
+                expense of slower inference.
+            timesteps (`List[int]`, *optional*):


Argument doesn't exist.

BenjaminBossan · 2023-11-24T17:21:39Z

src/diffusers/pipelines/kandinsky3/kandinsky3_pipeline.py

+                The height in pixels of the generated image.
+            width (`int`, *optional*, defaults to self.unet.config.sample_size):
+                The width in pixels of the generated image.
+            eta (`float`, *optional*, defaults to 0.0):


Argument doesn't exist.

BenjaminBossan · 2023-11-24T17:22:41Z

src/diffusers/pipelines/kandinsky3/kandinsky3_pipeline.py

+            callback_steps (`int`, *optional*, defaults to 1):
+                The frequency at which the `callback` function will be called. If not specified, the callback will be
+                called at every step.
+            clean_caption (`bool`, *optional*, defaults to `True`):


This and next argument don't exist, but latents is missing

BenjaminBossan · 2023-11-24T17:23:59Z

src/diffusers/pipelines/kandinsky3/kandinsky3img2img_pipeline.py

+        Args:
+             prompt (`str` or `List[str]`, *optional*):
+                prompt to be encoded
+            device: (`torch.device`, *optional*):


Again, not the same order as arguments.

BenjaminBossan · 2023-11-24T17:24:51Z

src/diffusers/pipelines/kandinsky3/kandinsky3img2img_pipeline.py

+            image = [image]
+        if not all(isinstance(i, (PIL.Image.Image, torch.Tensor)) for i in image):
+            raise ValueError(
+                f"Input is in incorrect format: {[type(i) for i in image]}. Currently, we only support  PIL image and pytorch tensor"


There is a double space. [type(i) for i in image] could be a set instead to avoid potentially very long error message.

sayakpaul · 2023-11-27T05:37:26Z

scripts/convert_kandinsky3_unet.py

@@ -0,0 +1,98 @@
+#!/usr/bin/env python3
+import argparse
+import fnmatch


This one's new!

sayakpaul · 2023-11-27T05:39:45Z

src/diffusers/models/unet_kandi3.py

@@ -0,0 +1,589 @@
+import math


The naming should (read must) be changed: unet_2d_model_for_kandinsky3.py.

sayakpaul · 2023-11-27T05:44:08Z

src/diffusers/models/unet_kandi3.py

@@ -0,0 +1,589 @@
+import math
+from dataclasses import dataclass


Todo: missing licensing.

sayakpaul · 2023-11-27T05:44:49Z

tests/convert_kandinsky3_unet.py

+#!/usr/bin/env python3
+import argparse
+import fnmatch
+
+from safetensors.torch import load_file
+
+from diffusers import Kandinsky3UNet


Should be removed from here?

sayakpaul · 2023-11-27T05:49:08Z

src/diffusers/pipelines/kandinsky3/kandinsky3_pipeline.py

+            attention_mask = attention_mask[:, :max_seq_length]
+        return embeddings, attention_mask
+
+    @torch.no_grad()


Why are we introducing this decorator here? Apart from IF, no other pipeline does it.

sayakpaul · 2023-11-27T05:49:45Z

src/diffusers/pipelines/kandinsky3/kandinsky3_pipeline.py

+            negative_attention_mask = None
+        return prompt_embeds, negative_prompt_embeds, attention_mask, negative_attention_mask
+
+    def prepare_latents(self, shape, dtype, device, generator, latents, scheduler):


sayakpaul · 2023-11-27T05:50:57Z

src/diffusers/pipelines/kandinsky3/kandinsky3_pipeline.py

+    return new_height * scale_factor, new_width * scale_factor
+
+
+class Kandinsky3Pipeline(DiffusionPipeline, LoraLoaderMixin):


TODO: documentation.

sayakpaul · 2023-11-27T05:51:39Z

src/diffusers/pipelines/kandinsky3/kandinsky3_pipeline.py

+                `self.processor` in
+                [diffusers.models.attention_processor](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py).
+        """
+        cut_context = True


Should be an arg and be defaulted to True.

sayakpaul · 2023-11-27T05:51:59Z

src/diffusers/pipelines/kandinsky3/kandinsky3_pipeline.py

+        latents = latents * scheduler.init_noise_sigma
+        return latents
+
+    def check_inputs(


sayakpaul · 2023-11-27T05:52:43Z

src/diffusers/pipelines/kandinsky3/kandinsky3_pipeline.py

+            if output_type not in ["pt", "np", "pil"]:
+                raise ValueError(
+                    f"Only the output types `pt`, `pil` and `np` are supported not output_type={output_type}"
+                )
+
+            if output_type in ["np", "pil"]:
+                image = image * 0.5 + 0.5
+                image = image.clamp(0, 1)
+                image = image.cpu().permute(0, 2, 3, 1).float().numpy()
+
+            if output_type == "pil":
+                image = self.numpy_to_pil(image)
+
+            if not return_dict:
+                return (image,)


It might make sense to use the image processor module here. Why are we not using it?

* finalize * finalize * finalize * add slow test * add slow test * add slow test * Fix more * add slow test * fix more * fix more * fix more * fix more * fix more * fix more * fix more * fix more * fix more * Better * Fix more * Fix more * add slow test * Add auto pipelines * add slow test * Add all * add slow test * add slow test * add slow test * add slow test * add slow test * Apply suggestions from code review * add slow test * add slow test

patrickvonplaten added 3 commits November 23, 2023 13:25

finalize

3e2ff83

finalize

69702a6

finalize

233970e

patrickvonplaten changed the title ~~finalize~~ [WIP][Kadinsky 3.0] Nov 23, 2023

patrickvonplaten added 25 commits November 23, 2023 13:56

add slow test

ed509c8

add slow test

b75ca86

add slow test

2c2dac8

Fix more

368e70e

add slow test

446a9d4

fix more

b89a387

fix more

97f621e

fix more

fe0a4ed

fix more

92711d5

fix more

22dfa36

fix more

5781f73

fix more

20c78cf

fix more

efe9a7e

fix more

fb37208

Better

37684a9

Fix more

f16f1c3

Fix more

15c6e85

add slow test

33febd4

Add auto pipelines

d810bb8

add slow test

fa33cce

Add all

3fef9ea

add slow test

18a542c

add slow test

f561034

add slow test

c3417ac

add slow test

b8c0c13

add slow test

52ec729

patrickvonplaten commented Nov 24, 2023

View reviewed changes

src/diffusers/models/attention_processor.py Outdated Show resolved Hide resolved

patrickvonplaten added 2 commits November 24, 2023 16:59

Apply suggestions from code review

6a6fd2a

add slow test

b1ca9ac

patrickvonplaten changed the title ~~[WIP][Kadinsky 3.0]~~ [Kandinsky 3.0] Add Kandinsky 3.0 Nov 24, 2023

add slow test

0e276a4

patrickvonplaten changed the title ~~[Kandinsky 3.0] Add Kandinsky 3.0~~ [@ Contributor cene555][Kandinsky 3.0] Add Kandinsky 3.0 Nov 24, 2023

patrickvonplaten changed the title ~~[@ Contributor cene555][Kandinsky 3.0] Add Kandinsky 3.0~~ [@cene555][Kandinsky 3.0] Add Kandinsky 3.0 Nov 24, 2023

patrickvonplaten merged commit b978334 into main Nov 24, 2023

BenjaminBossan reviewed Nov 24, 2023

View reviewed changes

shauray8 mentioned this pull request Nov 25, 2023

Kandinsky 3.0 #5926

Closed

2 tasks

yiyixuxu mentioned this pull request Nov 26, 2023

[doc] add a doc page for kandinsky 3.0! #5936

Closed

sayakpaul mentioned this pull request Nov 27, 2023

Kandinsky3 #5899

Closed

6 tasks

sayakpaul reviewed Nov 27, 2023

View reviewed changes

scripts/convert_kandinsky3_unet.py

@@ -0,0 +1,98 @@

#!/usr/bin/env python3

import argparse

import fnmatch

Copy link

Member

sayakpaul Nov 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one's new!

sayakpaul reviewed Nov 27, 2023

View reviewed changes

src/diffusers/models/unet_kandi3.py

@@ -0,0 +1,589 @@

import math

from dataclasses import dataclass

Copy link

Member

sayakpaul Nov 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Todo: missing licensing.

sayakpaul reviewed Nov 27, 2023

View reviewed changes

yiyixuxu mentioned this pull request Nov 27, 2023

[Kandinsky 3.0] Follow-up TODOs #5944

Merged

6 tasks

kashif deleted the kandinsky_30 branch December 5, 2023 08:59

		new_height = height // scale_factor**2
		if height % scale_factor**2 != 0:

		@@ -0,0 +1,589 @@
		import math
		from dataclasses import dataclass

		return new_height * scale_factor, new_width * scale_factor


		class Kandinsky3Pipeline(DiffusionPipeline, LoraLoaderMixin):

[@cene555][Kandinsky 3.0] Add Kandinsky 3.0 #5913

[@cene555][Kandinsky 3.0] Add Kandinsky 3.0 #5913

Uh oh!

Conversation

patrickvonplaten commented Nov 23, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Uh oh!

HuggingFaceDocBuilderDev commented Nov 23, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

patrickvonplaten commented Nov 24, 2023

Uh oh!

patrickvonplaten commented Nov 24, 2023

Uh oh!

BenjaminBossan left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

patrickvonplaten commented Nov 23, 2023 •

edited

Loading

HuggingFaceDocBuilderDev commented Nov 23, 2023 •

edited

Loading