
[Docs] Extensive / Improved guides about "Text2Image", "Image2Image", "Inpainting" #4758

@patrickvonplaten

Description

We very often run into issues like #4392, where people don't know how to use diffusers correctly or are unaware of all the existing features.

I propose to write three very extensive guides, or possibly even whole subsections, about the main important tasks:

  • Text2Image
  • Image2Image
  • Inpainting

and how to chain those together.

That will replace the following guides:

Each guide should be introduced with an easy example (how to use it) and then go deeper into more advanced use cases. This means:

  1. Text-to-image.

    • We explain how a very simple example works and show how different models generate different results. We could showcase the following models here:
      • SD 1.5
      • SDXL
      • Kandinsky 2.2
      • ControlNet

    Then we go a bit deeper into "height" & "width" to show the user how the output size can be changed.
    Talk about guidance_scale and generator.
    Talk about how extra conditionings can be added via ControlNet.

    Also link to the prompt weighting and optimization docs.
    We then make a transition to "Modifying existing images", linking to the next "Img2Img" and "Inpaint" sections.

    => All examples here can use the AutoPipelineForText2Image.

  2. Image-to-Image

    • We show a simple example on how it works, showcasing:

      • SD 1.5
      • SDXL
      • Kandinsky 2.2

      We explain what the input image can/should look like.
      We then go a bit deeper into the "strength" parameter (a super important parameter!). We explain how the width & height are determined by the image itself.
      We then explain how img2img can be chained right after text-to-image - keeping everything in latent space.
      We explain how img2img can be used to make upscaled images sharper.
      We then explain how multiple img2img models can be chained together for just a few steps (e.g. it's totally reasonable to use multiple differently fine-tuned SD checkpoints for image translation).
      We show how Kandinsky & Stable Diffusion can be mixed.
      We explain how to use ControlNet for img2img.

    => All examples here can use the AutoPipelineForImage2Image.

  3. Inpainting

    • We show a simple example on how it works, showcasing:
      • SD 1.5 inpainting
      • Kandinsky 2.2 inpainting

    We explain what the input & mask images can/should look like.
    We then go a bit deeper into the "strength" parameter again. We explain how the width & height are determined by the image itself.
    We then explain how inpainting can be chained right after text-to-image or image-to-image - keeping everything in latent space and without reloading the whole model.
    We explain how img2img and inpainting can be super similar (cc @yiyixuxu - we chatted about this yesterday).
    We then explain how multiple inpainting models can be chained together for just a few steps (e.g. it's totally reasonable to use multiple differently fine-tuned SD checkpoints for image translation).
    We show how Kandinsky & Stable Diffusion can be mixed.
    We explain how to use ControlNet for inpainting.

    => All examples here can use the AutoPipelineForInpainting.

I think it's worth making this a really detailed, easy-to-understand guide, making sure it works in Colab, and also thinking about creating some video content for it.

Thoughts? @pcuenca @williamberman @sayakpaul @yiyixuxu @DN6 @stevhliu @patil-suraj

If you like the idea, maybe @stevhliu and I could look more into this.

Labels: stale (issues that haven't received updates)