charchit7
Contributor

What does this PR do?

Fixes #5936

Before submitting

Who can review?

@patrickvonplaten @yiyixuxu


The description from it's Github page:

*Kandinsky 3.0 is an open-source text-to-image diffusion model built upon the Kandinsky2-x model family. In comparison to its predecessors, Kandinsky 3.0 incorporates more data and specifically related to Russian culture, which allows to generate pictures related to Russin culture. Furthermore, enhancements have been made to the text understanding and visual quality of the model, achieved by increasing the size of the text encoder and Diffusion U-Net models, respectively.*
Member

Suggested change
- *Kandinsky 3.0 is an open-source text-to-image diffusion model built upon the Kandinsky2-x model family. In comparison to its predecessors, Kandinsky 3.0 incorporates more data and specifically related to Russian culture, which allows to generate pictures related to Russin culture. Furthermore, enhancements have been made to the text understanding and visual quality of the model, achieved by increasing the size of the text encoder and Diffusion U-Net models, respectively.*
+ *Kandinsky 3.0 is an open-source text-to-image diffusion model built upon the Kandinsky2-x model family. In comparison to its predecessors, Kandinsky 3.0 incorporates more data and specifically related to Russian culture, which allows to generate pictures related to Russian culture. Furthermore, enhancements have been made to the text understanding and visual quality of the model, achieved by increasing the size of the text encoder and Diffusion U-Net models, respectively.*

@patrickvonplaten
Contributor

cc @yiyixuxu

Collaborator

@yiyixuxu yiyixuxu left a comment

Left one comment. Thanks for adding this!

@@ -9,7 +9,25 @@ specific language governing permissions and limitations under the License.

# Kandinsky 3

- TODO
+ Kandinsky 3 is created by [Arkhipkin Vladimir](https://github.com/oriBetelgeuse), [Igor Pavlov](https://github.com/boomb0om), [Andrei Filatov](https://github.com/anvilarth), [Zein Shaheen](https://github.com/zeinsh).
Collaborator

Let's add the entire list of authors from https://github.com/ai-forever/Kandinsky-3#authors

Contributor Author

Ahh, I missed it. @yiyixuxu Thank you for pointing it out :)


The description from it's Github page:

*Kandinsky 3.0 is an open-source text-to-image diffusion model built upon the Kandinsky2-x model family. In comparison to its predecessors, Kandinsky 3.0 incorporates more data and specifically related to Russian culture, which allows to generate pictures related to Russin culture. Furthermore, enhancements have been made to the text understanding and visual quality of the model, achieved by increasing the size of the text encoder and Diffusion U-Net models, respectively.*
Collaborator

Let's instead talk a little bit more about the architecture here. I think mostly:

  1. a very powerful text encoder: FLAN-UL2
  2. a U-Net built from special blocks that is twice as deep but keeps the same parameter count
  3. MoVQ decoder (same as Kandinsky 2)


The description from it's Github page:

*Kandinsky 3.0 is an open-source text-to-image diffusion model built upon the Kandinsky2-x model family. In comparison to its predecessors, Kandinsky 3.0 incorporates more data and specifically related to Russian culture, which allows to generate pictures related to Russin culture. Furthermore, enhancements have been made to the text understanding and visual quality of the model, achieved by increasing the size of the text encoder and Diffusion U-Net models, respectively.*
Collaborator

Suggested change
- *Kandinsky 3.0 is an open-source text-to-image diffusion model built upon the Kandinsky2-x model family. In comparison to its predecessors, Kandinsky 3.0 incorporates more data and specifically related to Russian culture, which allows to generate pictures related to Russin culture. Furthermore, enhancements have been made to the text understanding and visual quality of the model, achieved by increasing the size of the text encoder and Diffusion U-Net models, respectively.*
+ *Kandinsky 3.0 is an open-source text-to-image diffusion model built upon the Kandinsky2-x model family. In comparison to its predecessors, enhancements have been made to the text understanding and visual quality of the model, achieved by increasing the size of the text encoder and Diffusion U-Net models, respectively.*

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Nov 28, 2023

The documentation is not available anymore as the PR was closed or merged.

@charchit7 charchit7 requested a review from yiyixuxu November 28, 2023 16:33
@yiyixuxu yiyixuxu mentioned this pull request Nov 29, 2023
6 tasks
@patrickvonplaten
Contributor

Great job @charchit7

@patrickvonplaten patrickvonplaten merged commit 6031ecb into huggingface:main Nov 29, 2023
@charchit7
Contributor Author

Thanks @patrickvonplaten! First PR merged to diffusers! More to go.

@charchit7 charchit7 deleted the kd3_doc branch November 30, 2023 20:40
AmericanPresidentJimmyCarter pushed a commit to AmericanPresidentJimmyCarter/diffusers that referenced this pull request Apr 26, 2024
* added en doc for Kandinsky3.0

* required changes

* Update docs/source/en/api/pipelines/kandinsky3.md

* Update docs/source/en/api/pipelines/kandinsky3.md

* Update docs/source/en/api/pipelines/kandinsky3.md

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

Successfully merging this pull request may close these issues.

[doc] add a doc page for kandinsky 3.0!