added doc for Kandinsky3.0 #5937
The description from it's Github page:

*Kandinsky 3.0 is an open-source text-to-image diffusion model built upon the Kandinsky2-x model family. In comparison to its predecessors, Kandinsky 3.0 incorporates more data and specifically related to Russian culture, which allows to generate pictures related to Russin culture. Furthermore, enhancements have been made to the text understanding and visual quality of the model, achieved by increasing the size of the text encoder and Diffusion U-Net models, respectively.*
Suggested change:
- *Kandinsky 3.0 is an open-source text-to-image diffusion model built upon the Kandinsky2-x model family. In comparison to its predecessors, Kandinsky 3.0 incorporates more data and specifically related to Russian culture, which allows to generate pictures related to Russin culture. Furthermore, enhancements have been made to the text understanding and visual quality of the model, achieved by increasing the size of the text encoder and Diffusion U-Net models, respectively.*
+ *Kandinsky 3.0 is an open-source text-to-image diffusion model built upon the Kandinsky2-x model family. In comparison to its predecessors, Kandinsky 3.0 incorporates more data and specifically related to Russian culture, which allows to generate pictures related to Russian culture. Furthermore, enhancements have been made to the text understanding and visual quality of the model, achieved by increasing the size of the text encoder and Diffusion U-Net models, respectively.*
cc @yiyixuxu
left one comment. Thanks for adding this!
@@ -9,7 +9,25 @@ specific language governing permissions and limitations under the License.

# Kandinsky 3

TODO
Kandinsky 3 is created by [Arkhipkin Vladimir](https://github.com/oriBetelgeuse), [Igor Pavlov](https://github.com/boomb0om), [Andrei Filatov](https://github.com/anvilarth), [Zein Shaheen](https://github.com/zeinsh).
Let's add the entire list of authors here https://github.com/ai-forever/Kandinsky-3#authors
Ahh, I missed it. @yiyixuxu Thank you for pointing it out :)
The description from it's Github page:

*Kandinsky 3.0 is an open-source text-to-image diffusion model built upon the Kandinsky2-x model family. In comparison to its predecessors, Kandinsky 3.0 incorporates more data and specifically related to Russian culture, which allows to generate pictures related to Russin culture. Furthermore, enhancements have been made to the text understanding and visual quality of the model, achieved by increasing the size of the text encoder and Diffusion U-Net models, respectively.*
Let's instead talk a little bit more about the architecture here. I think mostly:
- a very powerful text encoder: FLAN-UL2
- special U-Net blocks that are twice as deep while keeping the same parameter count
- MoVQ decoder (same as Kandinsky 2)
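The three components named in the list above can be checked against a loaded pipeline. The following is a minimal sketch, assuming the `diffusers` `AutoPipelineForText2Image` loader and the `kandinsky-community/kandinsky-3` checkpoint. The attribute names `text_encoder`, `unet`, and `movq`, and the helpers `load_kandinsky3` and `describe_components`, are assumptions made for illustration and are not stated in this PR.

```python
# Roles taken from the review comment; the attribute names (dict keys) are
# an assumption about how the loaded pipeline exposes its components.
EXPECTED_COMPONENTS = {
    "text_encoder": "FLAN-UL2 text encoder",
    "unet": "U-Net denoiser",
    "movq": "MoVQ image decoder",
}


def load_kandinsky3(model_id="kandinsky-community/kandinsky-3"):
    """Load the pipeline in half precision (downloads several GB of weights)."""
    # Imported lazily so the helper below works without torch/diffusers installed.
    import torch
    from diffusers import AutoPipelineForText2Image

    return AutoPipelineForText2Image.from_pretrained(
        model_id, variant="fp16", torch_dtype=torch.float16
    )


def describe_components(pipe):
    """Return one summary line per expected component of `pipe`."""
    lines = []
    for attr, role in EXPECTED_COMPONENTS.items():
        component = getattr(pipe, attr, None)
        # Report "missing" instead of raising if the attribute is absent.
        name = type(component).__name__ if component is not None else "missing"
        lines.append(f"{attr}: {name} ({role})")
    return lines
```

Calling `describe_components(load_kandinsky3())` would print the concrete class behind each role, which is one way to confirm that the documentation's architecture description matches what the pipeline actually loads.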
The description from it's Github page:

*Kandinsky 3.0 is an open-source text-to-image diffusion model built upon the Kandinsky2-x model family. In comparison to its predecessors, Kandinsky 3.0 incorporates more data and specifically related to Russian culture, which allows to generate pictures related to Russin culture. Furthermore, enhancements have been made to the text understanding and visual quality of the model, achieved by increasing the size of the text encoder and Diffusion U-Net models, respectively.*
Suggested change:
- *Kandinsky 3.0 is an open-source text-to-image diffusion model built upon the Kandinsky2-x model family. In comparison to its predecessors, Kandinsky 3.0 incorporates more data and specifically related to Russian culture, which allows to generate pictures related to Russin culture. Furthermore, enhancements have been made to the text understanding and visual quality of the model, achieved by increasing the size of the text encoder and Diffusion U-Net models, respectively.*
+ *Kandinsky 3.0 is an open-source text-to-image diffusion model built upon the Kandinsky2-x model family. In comparison to its predecessors, enhancements have been made to the text understanding and visual quality of the model, achieved by increasing the size of the text encoder and Diffusion U-Net models, respectively.*
Great job @charchit7
Thanks @patrickvonplaten! First PR merged to diffusers, more to go.
* added en doc for Kandinsky3.0
* required changes
* Update docs/source/en/api/pipelines/kandinsky3.md
* Update docs/source/en/api/pipelines/kandinsky3.md
* Update docs/source/en/api/pipelines/kandinsky3.md

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
What does this PR do?
Fixes #5936
Who can review?
@patrickvonplaten @yiyixuxu