We very often run into issues like #4392 where people don't know how to use diffusers correctly or are unaware of all the existing features.
I propose writing three very extensive guides, or possibly even whole subsections, about the main important tasks:
- Text2Image
- Image2Image
- Inpainting
and how to chain those together.
That will replace the following guides:
- https://huggingface.co/docs/diffusers/using-diffusers/conditional_image_generation
- https://huggingface.co/docs/diffusers/using-diffusers/img2img
- https://huggingface.co/docs/diffusers/using-diffusers/inpaint
Each guide should be introduced with an easy example (how to use it) and then go deeper into more advanced use cases. This means:
- Text-to-image
- We explain how a very simple example works and show how different models generate different results. We could showcase the following models here:
- SD 1.5
- SDXL
- Kandinsky 2.2
- ControlNet
Then we go a bit deeper into `height` & `width` to show the user how the output sizes can be changed.
Talk about `guidance_scale` and `generator`.
Talk about how extra conditionings can be added via ControlNet. Also link to the prompt weighting and optimization docs.
We then make a transition to "modifying existing images", linking to the next "Img2Img" and "Inpaint" sections.
=> All examples here can use the `AutoPipelineForText2Image`.
- Image-to-Image
- We show a simple example of how it works, showcasing:
- SD 1.5
- SDXL
- Kandinsky 2.2
We explain what the input image can/should look like.
We then go a bit deeper into the `strength` parameter (super important parameter!). We explain how the width & height are determined by the image itself.
We then explain how img2img can be chained right after text-to-image - keeping everything in latent space.
We explain how img2img can be used to make upscaled images sharper.
We then explain how multiple img2img models can be chained together for just a few steps each (e.g. it's totally reasonable to use multiple differently fine-tuned SD checkpoints for image translation).
We show how Kandinsky & Stable Diffusion can be mixed.
We explain how to use controlnet for img2img.
=> All examples here can use the `AutoPipelineForImage2Image`.
- Inpainting
- We show a simple example of how it works, showcasing:
- SD 1.5 inpainting
- Kandinsky 2.2 inpainting
We explain what the input & mask images can/should look like.
We then go a bit deeper into the `strength` parameter again. We explain how the width & height are determined by the image itself.
We then explain how inpainting can be chained right after text-to-image or image-to-image - keeping everything in latent space and without reloading the whole model.
We explain how img2img and inpainting can be super similar (cc @yiyixuxu - we chatted about this yesterday)
We then explain how multiple inpainting models can be chained together for just a few steps each (e.g. it's totally reasonable to use multiple differently fine-tuned SD checkpoints for image translation).
We show how Kandinsky & Stable Diffusion can be mixed.
We explain how to use ControlNet for inpainting.
=> All examples here can use the `AutoPipelineForInpainting`.
I think it's worth making this a really in-detail / easy-to-understand guide, making sure it works in Colab, and also thinking about creating some video content for it.
Thoughts? @pcuenca @williamberman @sayakpaul @yiyixuxu @DN6 @stevhliu @patil-suraj
If you like the idea, maybe @stevhliu and I could look more into this