
[Docs] Extensive / Improved guides about "Text2Image", "Image2Image", "Inpainting" #4758

@patrickvonplaten

Description

We very often run into issues like #4392, where people don't know how to use diffusers correctly or are unaware of all the existing features.

I propose to write three very extensive guides, or possibly even whole subsections, about the main important tasks:

  • Text2Image
  • Image2Image
  • Inpainting

and how to chain those together.

That will replace the following guides:

Each guide should be introduced with an easy example (how to use it) and then go deeper into more advanced use cases. This means:

  1. Text-to-image.

    • We explain how a very simple example works and show how different models generate different results. We could showcase the following models here:
      • SD 1.5
      • SDXL
      • Kandinsky 2.2
      • ControlNet

    Then we go a bit deeper into "height" & "width" to show the user how the output size can be changed.
    Talk about guidance_scale and generator.
    Talk about how extra conditionings can be added via ControlNet.

    Also link to the prompt weighting and optimization docs.
    We then make a transition to "Modifying existing images", linking to the next "Img2Img" and "Inpaint" sections.

    => All examples here can use the AutoPipelineForText2Image.

  2. Image-to-Image

    • We show a simple example on how it works, showcasing:

      • SD 1.5
      • SDXL
      • Kandinsky 2.2

      We explain what the input image can/should look like.
      We then go a bit deeper into the "strength" parameter (a super important parameter!). We explain how the width & height are determined by the image itself.
      We then explain how img2img can be chained right after text-to-image - keeping everything in latent space.
      We explain how img2img can be used to make upscaled images sharper.
      We then explain how multiple img2img models can be chained together for just a few steps (e.g. it's totally reasonable to use multiple differently fine-tuned SD checkpoints for image translation).
      We show how Kandinsky & Stable Diffusion can be mixed.
      We explain how to use ControlNet for img2img.

    => All examples here can use the AutoPipelineForImage2Image.

  3. Inpainting

    • We show a simple example on how it works, showcasing:
      • SD 1.5 inpainting
      • Kandinsky 2.2 inpainting

    We explain what the input & mask images can/should look like.
    We then go a bit deeper into the "strength" parameter again. We explain how the width & height are determined by the image itself.
    We then explain how inpainting can be chained right after text-to-image or image-to-image - keeping everything in latent space and without reloading the whole model.
    We explain how img2img and inpainting can be super similar (cc @yiyixuxu - we chatted about this yesterday).
    We then explain how multiple inpainting models can be chained together for just a few steps (e.g. it's totally reasonable to use multiple differently fine-tuned SD checkpoints for image translation).
    We show how Kandinsky & Stable Diffusion can be mixed.
    We explain how to use ControlNet for inpainting.

    => All examples here can use the AutoPipelineForInpainting.

I think it's worth making this a really detailed, easy-to-understand guide, making sure it works in Colab, and also thinking about creating some video content for it.

Thoughts? @pcuenca @williamberman @sayakpaul @yiyixuxu @DN6 @stevhliu @patil-suraj

If you like the idea, maybe @stevhliu and I could look more into this.

Labels: stale (issues that haven't received updates)