docs/source/en/using-diffusers/kandinsky.md
The Kandinsky models are a series of multilingual text-to-image generation models. The Kandinsky 2.0 model uses two multilingual text encoders and concatenates those results for the UNet.
[Kandinsky 2.1](../api/pipelines/kandinsky) changes the architecture to include an image prior model ([`CLIP`](https://huggingface.co/docs/transformers/model_doc/clip)) to generate a mapping between text and image embeddings. The mapping provides better text-image alignment and it is used with the text embeddings during training, leading to higher quality results. Finally, Kandinsky 2.1 uses a [Modulating Quantized Vectors (MoVQ)](https://huggingface.co/papers/2209.09002) decoder - which adds a spatial conditional normalization layer to increase photorealism - to decode the latents into images.
[Kandinsky 2.2](../api/pipelines/kandinsky_v22) improves on the previous model by replacing the image encoder of the image prior model with a larger CLIP-ViT-G model to improve quality. The image prior model was also retrained on images with different resolutions and aspect ratios to generate higher-resolution images and different image sizes.
This guide will show you how to use the Kandinsky models for text-to-image, image-to-image, inpainting, interpolation, and more.
Before you begin, make sure you have the following libraries installed:
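```py
# a typical setup for this guide (assumed library list); uncomment to install in Colab
#!pip install -q diffusers transformers accelerate
```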
<Tip warning={true}>
Kandinsky 2.1 and 2.2 usage is very similar! The only difference is Kandinsky 2.2 doesn't accept `prompt` as an input when decoding the latents. Instead, Kandinsky 2.2 only accepts `image_embeds` during decoding.
</Tip>
## Text-to-image
To use the Kandinsky models for any task, you always start by setting up the prior pipeline to encode the prompt and generate the image embeddings. The prior pipeline also generates `negative_image_embeds` that correspond to the negative prompt `""`. For better results, you can pass an actual `negative_prompt` to the prior pipeline, but this'll increase the effective batch size of the prior pipeline by 2x.
<hfoptions id="kandinsky-text-to-image">
<hfoption id="Kandinsky 2.1">
```py
import torch
from diffusers import KandinskyPriorPipeline, KandinskyPipeline

# load the prior and decoder pipelines (the standard Kandinsky 2.1 checkpoints)
prior_pipeline = KandinskyPriorPipeline.from_pretrained("kandinsky-community/kandinsky-2-1-prior", torch_dtype=torch.float16).to("cuda")
pipeline = KandinskyPipeline.from_pretrained("kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16).to("cuda")
```
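With the pipelines loaded, here is a minimal sketch of the two-stage flow described above; the prompt and parameter values are illustrative, not prescriptive:

```py
prompt = "portrait of a young woman in a field of flowers, 4k, high quality"  # illustrative prompt
# the prior encodes the prompt into image embeddings (and embeddings for the "" negative prompt)
image_embeds, negative_image_embeds = prior_pipeline(prompt, guidance_scale=1.0).to_tuple()
# the decoder turns the embeddings into an image
image = pipeline(prompt, image_embeds=image_embeds, negative_image_embeds=negative_image_embeds, height=768, width=768).images[0]
image
```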
🤗 Diffusers also provides an end-to-end API with [`KandinskyCombinedPipeline`], meaning you don't have to separately load the prior and text-to-image pipeline. The combined pipeline automatically loads both [`KandinskyPriorPipeline`] and [`KandinskyPipeline`]. You can still set different values for the prior pipeline with the `prior_guidance_scale` and `prior_num_inference_steps` parameters if you want.
</hfoption>
<hfoption id="Kandinsky 2.2">
```py
import torch
from diffusers import KandinskyV22PriorPipeline, KandinskyV22Pipeline

# load the prior and decoder pipelines (the standard Kandinsky 2.2 checkpoints)
prior_pipeline = KandinskyV22PriorPipeline.from_pretrained("kandinsky-community/kandinsky-2-2-prior", torch_dtype=torch.float16).to("cuda")
pipeline = KandinskyV22Pipeline.from_pretrained("kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16).to("cuda")
```
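The same two-stage sketch for Kandinsky 2.2 (again with an illustrative prompt); as the tip above notes, only `image_embeds` is passed when decoding:

```py
prompt = "portrait of a young woman in a field of flowers, 4k, high quality"  # illustrative prompt
image_embeds, negative_image_embeds = prior_pipeline(prompt, guidance_scale=1.0).to_tuple()
# Kandinsky 2.2 decodes from the image embeddings alone, without the prompt
image = pipeline(image_embeds=image_embeds, negative_image_embeds=negative_image_embeds, height=768, width=768).images[0]
image
```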
</hfoption>
</hfoptions>

🤗 Diffusers also provides an end-to-end API with the [`KandinskyCombinedPipeline`] and [`KandinskyV22CombinedPipeline`], meaning you don't have to separately load the prior and text-to-image pipeline. The combined pipeline automatically loads both the prior model and the decoder. You can still set different values for the prior pipeline with the `prior_guidance_scale` and `prior_num_inference_steps` parameters if you want.
Use the [`AutoPipelineForText2Image`] to automatically call the combined pipelines under the hood:
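For example, a sketch with the Kandinsky 2.2 decoder checkpoint (loading the 2.1 checkpoint instead should resolve to [`KandinskyCombinedPipeline`]); the prompt and guidance values are illustrative:

```py
import torch
from diffusers import AutoPipelineForText2Image

# resolves to KandinskyV22CombinedPipeline for this checkpoint
pipeline = AutoPipelineForText2Image.from_pretrained("kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16).to("cuda")

prompt = "portrait of a young woman in a field of flowers, 4k, high quality"  # illustrative prompt
image = pipeline(prompt=prompt, prior_guidance_scale=1.0, guidance_scale=4.0, height=768, width=768).images[0]
image
```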