Commit d74561d

[SDXL] Improve docs (#4196)
* Improve docs
* Correct docs
* Add better example inpaint
* make style
* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fix

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
1 parent a0422ed commit d74561d

4 files changed, +97 -49 lines changed

src/diffusers/pipelines/controlnet/pipeline_controlnet_sd_xl.py

Lines changed: 11 additions & 3 deletions
@@ -754,11 +754,19 @@ def __call__(
             control_guidance_end (`float` or `List[float]`, *optional*, defaults to 1.0):
                 The percentage of total steps at which the controlnet stops applying.
             original_size (`Tuple[int]`, *optional*, defaults to (1024, 1024)):
-                TODO
+                If `original_size` is not the same as `target_size` the image will appear to be down- or upsampled.
+                `original_size` defaults to `(width, height)` if not specified. Part of SDXL's micro-conditioning as
+                explained in section 2.2 of
+                [https://huggingface.co/papers/2307.01952](https://huggingface.co/papers/2307.01952).
             crops_coords_top_left (`Tuple[int]`, *optional*, defaults to (0, 0)):
-                TODO
+                `crops_coords_top_left` can be used to generate an image that appears to be "cropped" from the position
+                `crops_coords_top_left` downwards. Favorable, well-centered images are usually achieved by setting
+                `crops_coords_top_left` to (0, 0). Part of SDXL's micro-conditioning as explained in section 2.2 of
+                [https://huggingface.co/papers/2307.01952](https://huggingface.co/papers/2307.01952).
             target_size (`Tuple[int]`, *optional*, defaults to (1024, 1024)):
-                TODO
+                For most cases, `target_size` should be set to the desired height and width of the generated image. If
+                not specified it will default to `(width, height)`. Part of SDXL's micro-conditioning as explained in
+                section 2.2 of [https://huggingface.co/papers/2307.01952](https://huggingface.co/papers/2307.01952).
         Examples:

         Returns:

src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py

Lines changed: 11 additions & 3 deletions
@@ -660,11 +660,19 @@ def __call__(
                 [Common Diffusion Noise Schedules and Sample Steps are Flawed](https://arxiv.org/pdf/2305.08891.pdf).
                 Guidance rescale factor should fix overexposure when using zero terminal SNR.
             original_size (`Tuple[int]`, *optional*, defaults to (1024, 1024)):
-                TODO
+                If `original_size` is not the same as `target_size` the image will appear to be down- or upsampled.
+                `original_size` defaults to `(width, height)` if not specified. Part of SDXL's micro-conditioning as
+                explained in section 2.2 of
+                [https://huggingface.co/papers/2307.01952](https://huggingface.co/papers/2307.01952).
             crops_coords_top_left (`Tuple[int]`, *optional*, defaults to (0, 0)):
-                TODO
+                `crops_coords_top_left` can be used to generate an image that appears to be "cropped" from the position
+                `crops_coords_top_left` downwards. Favorable, well-centered images are usually achieved by setting
+                `crops_coords_top_left` to (0, 0). Part of SDXL's micro-conditioning as explained in section 2.2 of
+                [https://huggingface.co/papers/2307.01952](https://huggingface.co/papers/2307.01952).
             target_size (`Tuple[int]`, *optional*, defaults to (1024, 1024)):
-                TODO
+                For most cases, `target_size` should be set to the desired height and width of the generated image. If
+                not specified it will default to `(width, height)`. Part of SDXL's micro-conditioning as explained in
+                section 2.2 of [https://huggingface.co/papers/2307.01952](https://huggingface.co/papers/2307.01952).

         Examples:

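
The three size parameters documented in this hunk travel together as SDXL's micro-conditioning. A minimal sketch, assuming the pipeline packs them into one flat list of six integers that is passed to the UNet as additional conditioning; `build_add_time_ids` is a hypothetical helper written for illustration, not a function exported by diffusers:

```python
from typing import List, Tuple


def build_add_time_ids(
    original_size: Tuple[int, int],
    crops_coords_top_left: Tuple[int, int],
    target_size: Tuple[int, int],
) -> List[int]:
    """Hypothetical helper: concatenate the three micro-conditioning tuples."""
    # Tuple concatenation keeps the order: original size, crop offset, target size.
    return list(original_size + crops_coords_top_left + target_size)


# Default case: the conditioning describes an uncropped, native-resolution image.
ids_default = build_add_time_ids((1024, 1024), (0, 0), (1024, 1024))

# original_size smaller than target_size: per the docstring above, the result
# should look like an upsampled low-resolution image.
ids_lowres = build_add_time_ids((256, 256), (0, 0), (1024, 1024))

print(ids_default)
print(ids_lowres)
```

Only the conditioning values differ between the two calls; the requested output resolution is the same, which is why the docstring says the image merely "appears" down- or upsampled.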

src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl_img2img.py

Lines changed: 18 additions & 8 deletions
@@ -780,24 +780,34 @@ def __call__(
                 [Common Diffusion Noise Schedules and Sample Steps are Flawed](https://arxiv.org/pdf/2305.08891.pdf).
                 Guidance rescale factor should fix overexposure when using zero terminal SNR.
             original_size (`Tuple[int]`, *optional*, defaults to (1024, 1024)):
-                TODO
+                If `original_size` is not the same as `target_size` the image will appear to be down- or upsampled.
+                `original_size` defaults to `(width, height)` if not specified. Part of SDXL's micro-conditioning as
+                explained in section 2.2 of
+                [https://huggingface.co/papers/2307.01952](https://huggingface.co/papers/2307.01952).
             crops_coords_top_left (`Tuple[int]`, *optional*, defaults to (0, 0)):
-                TODO
+                `crops_coords_top_left` can be used to generate an image that appears to be "cropped" from the position
+                `crops_coords_top_left` downwards. Favorable, well-centered images are usually achieved by setting
+                `crops_coords_top_left` to (0, 0). Part of SDXL's micro-conditioning as explained in section 2.2 of
+                [https://huggingface.co/papers/2307.01952](https://huggingface.co/papers/2307.01952).
             target_size (`Tuple[int]`, *optional*, defaults to (1024, 1024)):
-                TODO
+                For most cases, `target_size` should be set to the desired height and width of the generated image. If
+                not specified it will default to `(width, height)`. Part of SDXL's micro-conditioning as explained in
+                section 2.2 of [https://huggingface.co/papers/2307.01952](https://huggingface.co/papers/2307.01952).
             aesthetic_score (`float`, *optional*, defaults to 6.0):
-                TODO
+                Used to simulate an aesthetic score of the generated image by influencing the positive text condition.
+                Part of SDXL's micro-conditioning as explained in section 2.2 of
+                [https://huggingface.co/papers/2307.01952](https://huggingface.co/papers/2307.01952).
             negative_aesthetic_score (`float`, *optional*, defaults to 2.5):
-                TDOO
+                Part of SDXL's micro-conditioning as explained in section 2.2 of
+                [https://huggingface.co/papers/2307.01952](https://huggingface.co/papers/2307.01952). Can be used to
+                simulate an aesthetic score of the generated image by influencing the negative text condition.

         Examples:

         Returns:
             [`~pipelines.stable_diffusion.StableDiffusionXLPipelineOutput`] or `tuple`:
             [`~pipelines.stable_diffusion.StableDiffusionXLPipelineOutput`] if `return_dict` is True, otherwise a
-            `tuple. When returning a tuple, the first element is a list with the generated images, and the second
-            element is a list of `bool`s denoting whether the corresponding generated image likely represents
-            "not-safe-for-work" (nsfw) content, according to the `safety_checker`.
+            `tuple. When returning a tuple, the first element is a list with the generated images.
         """
         # 1. Check inputs. Raise error if not correct
         self.check_inputs(
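
The two aesthetic-score parameters documented in this file are described as acting on the positive and the negative text condition respectively. A minimal sketch of what that could look like, assuming a refiner-style configuration in which the score takes the place of the target size in the conditioning vector; `build_aesthetic_time_ids` is a hypothetical name and the exact layout inside diffusers is an assumption here:

```python
from typing import List, Tuple


def build_aesthetic_time_ids(
    original_size: Tuple[int, int],
    crops_coords_top_left: Tuple[int, int],
    aesthetic_score: float = 6.0,
    negative_aesthetic_score: float = 2.5,
) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: build one conditioning vector per guidance branch."""
    # Assumption: both branches share size and crop values and differ only in the score.
    positive = list(original_size + crops_coords_top_left + (aesthetic_score,))
    negative = list(original_size + crops_coords_top_left + (negative_aesthetic_score,))
    return positive, negative


positive_ids, negative_ids = build_aesthetic_time_ids((1024, 1024), (0, 0))
print(positive_ids)
print(negative_ids)
```

With classifier-free guidance the prediction is pushed away from the negative branch, so a low `negative_aesthetic_score` and a high `aesthetic_score` would both steer toward images the model associates with higher scores.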

src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl_inpaint.py

Lines changed: 57 additions & 35 deletions
@@ -30,7 +30,7 @@
     XFormersAttnProcessor,
 )
 from ...schedulers import KarrasDiffusionSchedulers
-from ...utils import is_accelerate_available, is_accelerate_version, logging, randn_tensor
+from ...utils import is_accelerate_available, is_accelerate_version, logging, randn_tensor, replace_example_docstring
 from ..pipeline_utils import DiffusionPipeline
 from . import StableDiffusionXLPipelineOutput
 from .watermark import StableDiffusionXLWatermarker
@@ -39,6 +39,35 @@
 logger = logging.get_logger(__name__)  # pylint: disable=invalid-name


+EXAMPLE_DOC_STRING = """
+    Examples:
+        ```py
+        >>> import torch
+        >>> from diffusers import StableDiffusionXLInpaintPipeline
+        >>> from diffusers.utils import load_image
+
+        >>> pipe = StableDiffusionXLInpaintPipeline.from_pretrained(
+        ...     "stabilityai/stable-diffusion-xl-base-0.9",
+        ...     torch_dtype=torch.float16,
+        ...     variant="fp16",
+        ...     use_safetensors=True,
+        ... )
+        >>> pipe.to("cuda")
+
+        >>> img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
+        >>> mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"
+
+        >>> init_image = load_image(img_url).convert("RGB")
+        >>> mask_image = load_image(mask_url).convert("RGB")
+
+        >>> prompt = "A majestic tiger sitting on a bench"
+        >>> image = pipe(
+        ...     prompt=prompt, image=init_image, mask_image=mask_image, num_inference_steps=50, strength=0.80
+        ... ).images[0]
+        ```
+"""
+
+
 # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.rescale_noise_cfg
 def rescale_noise_cfg(noise_cfg, noise_pred_text, guidance_rescale=0.0):
     """
@@ -810,6 +839,7 @@ def upcast_vae(self):
         self.vae.decoder.mid_block.to(dtype)

     @torch.no_grad()
+    @replace_example_docstring(EXAMPLE_DOC_STRING)
     def __call__(
         self,
         prompt: Union[str, List[str]] = None,
@@ -948,43 +978,35 @@ def __call__(
                 A kwargs dictionary that if specified is passed along to the `AttentionProcessor` as defined under
                 `self.processor` in
                 [diffusers.cross_attention](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/cross_attention.py).
-        Examples:
+            original_size (`Tuple[int]`, *optional*, defaults to (1024, 1024)):
+                If `original_size` is not the same as `target_size` the image will appear to be down- or upsampled.
+                `original_size` defaults to `(width, height)` if not specified. Part of SDXL's micro-conditioning as
+                explained in section 2.2 of
+                [https://huggingface.co/papers/2307.01952](https://huggingface.co/papers/2307.01952).
+            crops_coords_top_left (`Tuple[int]`, *optional*, defaults to (0, 0)):
+                `crops_coords_top_left` can be used to generate an image that appears to be "cropped" from the position
+                `crops_coords_top_left` downwards. Favorable, well-centered images are usually achieved by setting
+                `crops_coords_top_left` to (0, 0). Part of SDXL's micro-conditioning as explained in section 2.2 of
+                [https://huggingface.co/papers/2307.01952](https://huggingface.co/papers/2307.01952).
+            target_size (`Tuple[int]`, *optional*, defaults to (1024, 1024)):
+                For most cases, `target_size` should be set to the desired height and width of the generated image. If
+                not specified it will default to `(width, height)`. Part of SDXL's micro-conditioning as explained in
+                section 2.2 of [https://huggingface.co/papers/2307.01952](https://huggingface.co/papers/2307.01952).
+            aesthetic_score (`float`, *optional*, defaults to 6.0):
+                Used to simulate an aesthetic score of the generated image by influencing the positive text condition.
+                Part of SDXL's micro-conditioning as explained in section 2.2 of
+                [https://huggingface.co/papers/2307.01952](https://huggingface.co/papers/2307.01952).
+            negative_aesthetic_score (`float`, *optional*, defaults to 2.5):
+                Part of SDXL's micro-conditioning as explained in section 2.2 of
+                [https://huggingface.co/papers/2307.01952](https://huggingface.co/papers/2307.01952). Can be used to
+                simulate an aesthetic score of the generated image by influencing the negative text condition.

-        ```py
-        >>> import PIL
-        >>> import requests
-        >>> import torch
-        >>> from io import BytesIO
-
-        >>> from diffusers import StableDiffusionInpaintPipeline
-
-
-        >>> def download_image(url):
-        ...     response = requests.get(url)
-        ...     return PIL.Image.open(BytesIO(response.content)).convert("RGB")
-
-
-        >>> img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
-        >>> mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"
-
-        >>> init_image = download_image(img_url).resize((512, 512))
-        >>> mask_image = download_image(mask_url).resize((512, 512))
-
-        >>> pipe = StableDiffusionInpaintPipeline.from_pretrained(
-        ...     "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
-        ... )
-        >>> pipe = pipe.to("cuda")
-
-        >>> prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
-        >>> image = pipe(prompt=prompt, image=init_image, mask_image=mask_image).images[0]
-        ```
+        Examples:

         Returns:
-            [`~pipelines.stable_diffusion.StableDiffusionPipelineOutput`] or `tuple`:
-            [`~pipelines.stable_diffusion.StableDiffusionPipelineOutput`] if `return_dict` is True, otherwise a `tuple.
-            When returning a tuple, the first element is a list with the generated images, and the second element is a
-            list of `bool`s denoting whether the corresponding generated image likely represents "not-safe-for-work"
-            (nsfw) content, according to the `safety_checker`.
+            [`~pipelines.stable_diffusion.StableDiffusionXLPipelineOutput`] or `tuple`:
+            [`~pipelines.stable_diffusion.StableDiffusionXLPipelineOutput`] if `return_dict` is True, otherwise a
+            `tuple. `tuple. When returning a tuple, the first element is a list with the generated images.
         """
         # 0. Default height and width to unet
         height = height or self.unet.config.sample_size * self.vae_scale_factor
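
This file moves the inline example out of the `__call__` docstring and leaves a bare `Examples:` line behind, which the new `@replace_example_docstring(EXAMPLE_DOC_STRING)` decorator is presumably meant to fill in. A minimal sketch of that placeholder-substitution pattern, assuming the decorator looks for a line containing only `Examples:`; `replace_example_placeholder` is a hypothetical stand-in written for illustration, not the diffusers implementation:

```python
import re


def replace_example_placeholder(example_docstring: str):
    """Hypothetical stand-in for the decorator imported in this diff."""

    def decorator(fn):
        lines = (fn.__doc__ or "").split("\n")
        for i, line in enumerate(lines):
            # Assumption: the placeholder is a line holding only "Example:" or "Examples:".
            if re.match(r"^\s*Examples?:\s*$", line):
                lines[i] = example_docstring
                break
        else:
            # No placeholder found, so the example would be silently dropped.
            raise ValueError(f"{fn.__name__} needs an empty 'Examples:' line in its docstring")
        fn.__doc__ = "\n".join(lines)
        return fn

    return decorator


EXAMPLE = """
    Examples:
        >>> print("hello")
"""


@replace_example_placeholder(EXAMPLE)
def call():
    """Run the pipeline.

    Examples:

    Returns:
        A list of images.
    """


print(call.__doc__)
```

Keeping the example in a module-level constant means it can be edited or reused without touching the long argument list in the docstring itself.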
