From 5792c15da0b19a9facc9aa385959af6702d0bd29 Mon Sep 17 00:00:00 2001 From: Steven Liu Date: Wed, 28 Feb 2024 15:13:00 -0800 Subject: [PATCH 1/3] tips --- docs/source/en/_toctree.yml | 2 + docs/source/en/using-diffusers/tips.md | 167 +++++++++++++++++++++++++ 2 files changed, 169 insertions(+) create mode 100644 docs/source/en/using-diffusers/tips.md diff --git a/docs/source/en/_toctree.yml b/docs/source/en/_toctree.yml index ba94de59219c..4995fcce7414 100644 --- a/docs/source/en/_toctree.yml +++ b/docs/source/en/_toctree.yml @@ -74,6 +74,8 @@ title: Prompt weighting - local: using-diffusers/freeu title: Improve generation quality with FreeU + - local: using-diffusers/tips + title: Community tips and tricks title: Techniques - sections: - local: using-diffusers/pipeline_overview diff --git a/docs/source/en/using-diffusers/tips.md b/docs/source/en/using-diffusers/tips.md new file mode 100644 index 000000000000..2b5907198945 --- /dev/null +++ b/docs/source/en/using-diffusers/tips.md @@ -0,0 +1,167 @@ + + +# Community tips and tricks + +Diffusers owes much of its success to its community of users and contributors. ❤️ This guide is a collection of tips and tricks for using Diffusers shared by community members. It includes helpful advice such as how to customize and implement specific features through callbacks and how to generate high-quality images. + +If you have a tip or trick you'd like to share, we'd love to [hear from you](https://github.com/huggingface/diffusers/issues/new/choose)! + +## Callback to display image after each generation step + +> [!TIP] +> This tip was contributed by [asomoza](https://github.com/asomoza). + +Display an image after each generation step by using a [callback](../using-diffusers/callback) to access and manipulate the latents after each step and convert them into an image. + +1. Use the function below to convert the SDXL latents (4 channels) to RGB tensors (3 channels) as explained in the [Explaining the SDXL latent space](https://huggingface.co/blog/TimothyAlexisVass/explaining-the-sdxl-latent-space) blog post: + +```py +def latents_to_rgb(latents): + weights = ( + (60, -60, 25, -70), + (60, -5, 15, -50), + (60, 10, -5, -35) + ) + + weights_tensor = torch.t(torch.tensor(weights, dtype=latents.dtype).to(latents.device)) + biases_tensor = torch.tensor((150, 140, 130), dtype=latents.dtype).to(latents.device) + rgb_tensor = torch.einsum("...lxy,lr -> ...rxy", latents, weights_tensor) + biases_tensor.unsqueeze(-1).unsqueeze(-1) + image_array = rgb_tensor.clamp(0, 255)[0].byte().cpu().numpy() + image_array = image_array.transpose(1, 2, 0) + + return Image.fromarray(image_array) +``` + +2. Create a function to decode and save the latents into an image. + +```py +def decode_tensors(pipe, step, timestep, callback_kwargs): + latents = callback_kwargs["latents"] + + image = latents_to_rgb(latents) + image.save(f"{step}.png") + + return callback_kwargs +``` + +3. Pass the `decode_tensors` function to the `callback_on_step_end` parameter to decode the tensors after each step. You need to also specify what you want to modify in the `callback_on_step_end_tensor_inputs` parameter, which in this case are the latents. + +```py +from diffusers import AutoPipelineForText2Image +import torch +from PIL import Image + +pipeline = AutoPipelineForText2Image.from_pretrained( + "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True +).to("cuda") + +image = pipe( + prompt = "A croissant shaped like a cute bear." 
+ negative_prompt = "Deformed, ugly, bad anatomy" + callback_on_step_end=decode_tensors, + callback_on_step_end_tensor_inputs=["latents"], +).images[0] +``` + +> [!TIP] +> The latent space is compressed to 128x128 so the images are also 128x128 which is useful for a quick preview. + +
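+If the 128x128 preview is too small to inspect, you could upscale it before saving. The variant below is only a sketch (the `decode_tensors_upscaled` name, the 4x factor, and the nearest-neighbor resize are arbitrary choices, not part of the snippet above); it assumes the same `latents_to_rgb` helper and the `callback_on_step_end_tensor_inputs=["latents"]` setup shown above.
+
+```py
+def decode_tensors_upscaled(pipe, step, timestep, callback_kwargs):
+    # decode the current latents with the latents_to_rgb helper defined above
+    image = latents_to_rgb(callback_kwargs["latents"])
+    # nearest-neighbor resize keeps the 128x128 preview sharp and fast at 4x its original size
+    image = image.resize((image.width * 4, image.height * 4), Image.NEAREST)
+    image.save(f"{step}.png")
+    return callback_kwargs
+```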
+<!-- image grid: decoded latent previews at step 0, step 19, step 29, step 39, and step 49 -->
+ +## High quality anime images + +> [!TIP] +> This tip was contributed by [asomoza](https://github.com/asomoza). + +Generating high-quality anime images is a popular application of diffusion models. To achieve this in Diffusers: + +1. Choose a good anime model like [Counterfeit](https://hf.co/gsdf/Counterfeit-V3.0) and pair it with negative prompt embeddings such as EasyNegative to further improve the quality of the generated images. + +```py +from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler +import torch + +pipeline = StableDiffusionPipeline.from_single_file( + "https://huggingface.co/gsdf/Counterfeit-V3.0/blob/main/Counterfeit-V3.0_fix_fp16.safetensors", + torch_dtype=torch.float16, +) +pipeline.load_textual_inversion( + "embed/EasyNegative", + weight_name="EasyNegative.safetensors", + token="EasyNegative" +) +``` + +2. This is optional, but if there is a specific style (typically a LoRA adapter) you want to apply to the images, download the weights and use the [`load_lora_weights`] method to add it to the pipeline. This example uses the [Dungeon Meshi Marcille Character Lora](https://civitai.com/models/106199/dungeon-meshi-marcille-character-lora). + +```py +!wget https://civitai.com/api/download/models/114049 -O marcille.safetensors +pipeline.load_lora_weights('.', weight_name="marcille.safetensors") +``` + +3. Load a scheduler and set `use_karras_sigmas=True` to use the DPM++ 2M Karras scheduler (take a look at this [scheduler table](../api/schedulers/overview.) to find the A1111 equivalent scheduler in Diffusers). + +```py +pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config) +pipeline.scheduler.config.use_karras_sigmas=True +pipeline.to('cuda') +``` + +4. Create your prompt and negative prompts, and remember to use the trigger words for this specific LoRA adapter (`dmarci`) and embeddings (`EasyNegative`). It is also important to set the: + + - `lora_scale` parameter to control how to scale the output with the LoRA weights. + - `clip_skip` parameter to specify the layers of the CLIP model to use. This parameter is especially important for anime checkpoints because it controls how closely aligned the text prompt and image are. A higher `clip_skip` value produces more abstract images. + +```py +generator = torch.Generator("cpu").manual_seed(0) + +prompt = "dmarci, masterpiece, best quality, 1girl, solo, marcillessa, red choker, detailed and beautiful eyes, (cowboy shot:1.2), HAPPY, walking, jumping,(Turtleneck_sweater:1.4), (Leather_skirt:1.3)" +negative_prompt = "EasyNegative, (worst quality, low quality, bad quality, normal quality:2), logo, text, blurry, low quality, bad anatomy, lowres, normal quality, monochrome, grayscale, worstquality, signature, watermark, cropped, bad proportions, out of focus, username, bad body, long body, (fat:1.2), long neck, deformed, mutated, mutation, ugly, disfigured, poorly drawn face, skin blemishes, skin spots, acnes, missing limb, malformed limbs, floating limbs, disconnected limbs, extra limb, extra arms, mutated hands, poorly drawn hands, malformed hands, mutated hands and fingers, bad hands, missing fingers, fused fingers, too many fingers, extra legs, bad feet, backlighting" + +lora_scale = 1.0 +images = pipeline(prompt, width=768, height=768, negative_prompt=negative_prompt, num_inference_steps=20, cross_attention_kwargs={"scale": lora_scale}, generator=generator, num_images_per_prompt=4, clip_skip=2, guidance_scale=7).images[0] +images +``` + +
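+Because the call above passes `num_images_per_prompt=4`, the pipeline returns four images even though only the first one is kept. A small variation (the `output` variable and the file names here are only illustrative) saves every result for comparison:
+
+```py
+output = pipeline(
+    prompt,
+    width=768,
+    height=768,
+    negative_prompt=negative_prompt,
+    num_inference_steps=20,
+    cross_attention_kwargs={"scale": lora_scale},
+    generator=generator,
+    num_images_per_prompt=4,
+    clip_skip=2,
+    guidance_scale=7,
+)
+# save all four generated images instead of keeping only the first one
+for idx, image in enumerate(output.images):
+    image.save(f"marcille_{idx}.png")
+```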
+<!-- example anime image generated with the settings above -->
+ +## Increase image details with negative noise + +> [!TIP] +> This tip was contributed by [asomoza](https://github.com/asomoza). + +Negative noise can increase the level of details in the generated image because it allows the model more "creative freedom". You can pass a noisy image created from the original image or a noise algorithms to the model. \ No newline at end of file From 1e93fb40d5697aecb66f06f16e1ad31f1ce6436a Mon Sep 17 00:00:00 2001 From: Steven Liu Date: Thu, 29 Feb 2024 13:10:42 -0800 Subject: [PATCH 2/3] feedback --- docs/source/en/using-diffusers/tips.md | 33 ++++++++++---------------- 1 file changed, 13 insertions(+), 20 deletions(-) diff --git a/docs/source/en/using-diffusers/tips.md b/docs/source/en/using-diffusers/tips.md index 2b5907198945..2dd837a58360 100644 --- a/docs/source/en/using-diffusers/tips.md +++ b/docs/source/en/using-diffusers/tips.md @@ -23,7 +23,10 @@ If you have a tip or trick you'd like to share, we'd love to [hear from you](htt Display an image after each generation step by using a [callback](../using-diffusers/callback) to access and manipulate the latents after each step and convert them into an image. -1. Use the function below to convert the SDXL latents (4 channels) to RGB tensors (3 channels) as explained in the [Explaining the SDXL latent space](https://huggingface.co/blog/TimothyAlexisVass/explaining-the-sdxl-latent-space) blog post: +1. Use the function below to convert the SDXL latents (4 channels) to RGB tensors (3 channels) as explained in the [Explaining the SDXL latent space](https://huggingface.co/blog/TimothyAlexisVass/explaining-the-sdxl-latent-space) blog post. + +> [!TIP] +> The latent space is compressed to 128x128 so the images are also 128x128 which is useful for a quick preview. ```py def latents_to_rgb(latents): @@ -54,7 +57,7 @@ def decode_tensors(pipe, step, timestep, callback_kwargs): return callback_kwargs ``` -3. Pass the `decode_tensors` function to the `callback_on_step_end` parameter to decode the tensors after each step. You need to also specify what you want to modify in the `callback_on_step_end_tensor_inputs` parameter, which in this case are the latents. +3. Pass the `decode_tensors` function to the `callback_on_step_end` parameter to decode the tensors after each step. You also need to specify what you want to modify in the `callback_on_step_end_tensor_inputs` parameter, which in this case are the latents. ```py from diffusers import AutoPipelineForText2Image @@ -73,9 +76,6 @@ image = pipe( ).images[0] ``` -> [!TIP] -> The latent space is compressed to 128x128 so the images are also 128x128 which is useful for a quick preview. -
@@ -105,9 +105,9 @@ image = pipe( > [!TIP] > This tip was contributed by [asomoza](https://github.com/asomoza). -Generating high-quality anime images is a popular application of diffusion models. To achieve this in Diffusers: +Generating high-quality anime images is a very popular application of diffusion models. To achieve this in Diffusers: -1. Choose a good anime model like [Counterfeit](https://hf.co/gsdf/Counterfeit-V3.0) and pair it with negative prompt embeddings such as EasyNegative to further improve the quality of the generated images. +1. Choose a good anime model like [Counterfeit](https://hf.co/gsdf/Counterfeit-V3.0) and pair it with negative prompt embeddings such as [EasyNegative](https://huggingface.co/embed/EasyNegative) to further improve the quality of the generated images. ```py from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler @@ -124,7 +124,7 @@ pipeline.load_textual_inversion( ) ``` -2. This is optional, but if there is a specific style (typically a LoRA adapter) you want to apply to the images, download the weights and use the [`load_lora_weights`] method to add it to the pipeline. This example uses the [Dungeon Meshi Marcille Character Lora](https://civitai.com/models/106199/dungeon-meshi-marcille-character-lora). +2. Download the weights (typically a LoRA adapter) of a specific style to apply to the image and use the [`load_lora_weights`] method to add it to the pipeline. This example uses the [Dungeon Meshi Marcille Character Lora](https://civitai.com/models/106199/dungeon-meshi-marcille-character-lora). ```py !wget https://civitai.com/api/download/models/114049 -O marcille.safetensors @@ -136,19 +136,19 @@ pipeline.load_lora_weights('.', weight_name="marcille.safetensors") ```py pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config) pipeline.scheduler.config.use_karras_sigmas=True -pipeline.to('cuda') +pipeline.to("cuda") ``` 4. Create your prompt and negative prompts, and remember to use the trigger words for this specific LoRA adapter (`dmarci`) and embeddings (`EasyNegative`). It is also important to set the: - - `lora_scale` parameter to control how to scale the output with the LoRA weights. + - `lora_scale` parameter to control how much to scale the output with the LoRA weights by. - `clip_skip` parameter to specify the layers of the CLIP model to use. This parameter is especially important for anime checkpoints because it controls how closely aligned the text prompt and image are. A higher `clip_skip` value produces more abstract images. 
```py -generator = torch.Generator("cpu").manual_seed(0) +generator = torch.Generator("cpu").manual_seed(77) -prompt = "dmarci, masterpiece, best quality, 1girl, solo, marcillessa, red choker, detailed and beautiful eyes, (cowboy shot:1.2), HAPPY, walking, jumping,(Turtleneck_sweater:1.4), (Leather_skirt:1.3)" -negative_prompt = "EasyNegative, (worst quality, low quality, bad quality, normal quality:2), logo, text, blurry, low quality, bad anatomy, lowres, normal quality, monochrome, grayscale, worstquality, signature, watermark, cropped, bad proportions, out of focus, username, bad body, long body, (fat:1.2), long neck, deformed, mutated, mutation, ugly, disfigured, poorly drawn face, skin blemishes, skin spots, acnes, missing limb, malformed limbs, floating limbs, disconnected limbs, extra limb, extra arms, mutated hands, poorly drawn hands, malformed hands, mutated hands and fingers, bad hands, missing fingers, fused fingers, too many fingers, extra legs, bad feet, backlighting" +prompt = "dmarci, masterpiece, best quality, 1girl, solo, marcillessa, red choker, detailed and beautiful eyes, cowboy shot, HAPPY, walking, jumping, turtleneck sweater, leather skirt" +negative_prompt = "EasyNegative, worst quality, low quality, bad quality, blurry, bad anatomy, lowres, monochrome, grayscale, bad proportions, out of focus, bad body, deformed, mutated, mutation, ugly, disfigured, poorly drawn face, skin spots, malformed limbs, extra arms, mutated hands and fingers, fused fingers, bad fingers, too many fingers" lora_scale = 1.0 images = pipeline(prompt, width=768, height=768, negative_prompt=negative_prompt, num_inference_steps=20, cross_attention_kwargs={"scale": lora_scale}, generator=generator, num_images_per_prompt=4, clip_skip=2, guidance_scale=7).images[0] @@ -158,10 +158,3 @@ images
- -## Increase image details with negative noise - -> [!TIP] -> This tip was contributed by [asomoza](https://github.com/asomoza). - -Negative noise can increase the level of details in the generated image because it allows the model more "creative freedom". You can pass a noisy image created from the original image or a noise algorithms to the model. \ No newline at end of file From f589b0d3d18da40a1c75c83fe8e401cb4c49a3f1 Mon Sep 17 00:00:00 2001 From: Steven Liu Date: Thu, 7 Mar 2024 14:09:23 -0800 Subject: [PATCH 3/3] callback only --- docs/source/en/_toctree.yml | 2 - docs/source/en/using-diffusers/callback.md | 143 ++++++++++++++---- docs/source/en/using-diffusers/tips.md | 160 --------------------- 3 files changed, 112 insertions(+), 193 deletions(-) delete mode 100644 docs/source/en/using-diffusers/tips.md diff --git a/docs/source/en/_toctree.yml b/docs/source/en/_toctree.yml index 4995fcce7414..ba94de59219c 100644 --- a/docs/source/en/_toctree.yml +++ b/docs/source/en/_toctree.yml @@ -74,8 +74,6 @@ title: Prompt weighting - local: using-diffusers/freeu title: Improve generation quality with FreeU - - local: using-diffusers/tips - title: Community tips and tricks title: Techniques - sections: - local: using-diffusers/pipeline_overview diff --git a/docs/source/en/using-diffusers/callback.md b/docs/source/en/using-diffusers/callback.md index 9059930251f1..296245c3abe2 100644 --- a/docs/source/en/using-diffusers/callback.md +++ b/docs/source/en/using-diffusers/callback.md @@ -12,13 +12,18 @@ specific language governing permissions and limitations under the License. # Pipeline callbacks -The denoising loop of a pipeline can be modified with custom defined functions using the `callback_on_step_end` parameter. This can be really useful for *dynamically* adjusting certain pipeline attributes, or modifying tensor variables. The flexibility of callbacks opens up some interesting use-cases such as changing the prompt embeddings at each timestep, assigning different weights to the prompt embeddings, and editing the guidance scale. +The denoising loop of a pipeline can be modified with custom defined functions using the `callback_on_step_end` parameter. The callback function is executed at the end of each step, and modifies the pipeline attributes and variables for the next step. This is really useful for *dynamically* adjusting certain pipeline attributes or modifying tensor variables. This versatility allows for interesting use-cases such as changing the prompt embeddings at each timestep, assigning different weights to the prompt embeddings, and editing the guidance scale. With callbacks, you can implement new features without modifying the underlying code! -This guide will show you how to use the `callback_on_step_end` parameter to disable classifier-free guidance (CFG) after 40% of the inference steps to save compute with minimal cost to performance. +> [!TIP] +> 🤗 Diffusers currently only supports `callback_on_step_end`, but feel free to open a [feature request](https://github.com/huggingface/diffusers/issues/new/choose) if you have a cool use-case and require a callback function with a different execution point! -The callback function should have the following arguments: +This guide will demonstrate how callbacks work by a few features you can implement with them. -* `pipe` (or the pipeline instance) provides access to useful properties such as `num_timesteps` and `guidance_scale`. You can modify these properties by updating the underlying attributes. 
For this example, you'll disable CFG by setting `pipe._guidance_scale=0.0`. +## Dynamic classifier-free guidance + +Dynamic classifier-free guidance (CFG) is a feature that allows you to disable CFG after a certain number of inference steps which can help you save compute with minimal cost to performance. The callback function for this should have the following arguments: + +* `pipeline` (or the pipeline instance) provides access to important properties such as `num_timesteps` and `guidance_scale`. You can modify these properties by updating the underlying attributes. For this example, you'll disable CFG by setting `pipeline._guidance_scale=0.0`. * `step_index` and `timestep` tell you where you are in the denoising loop. Use `step_index` to turn off CFG after reaching 40% of `num_timesteps`. * `callback_kwargs` is a dict that contains tensor variables you can modify during the denoising loop. It only includes variables specified in the `callback_on_step_end_tensor_inputs` argument, which is passed to the pipeline's `__call__` method. Different pipelines may use different sets of variables, so please check a pipeline's `_callback_tensor_inputs` attribute for the list of variables you can modify. Some common variables include `latents` and `prompt_embeds`. For this function, change the batch size of `prompt_embeds` after setting `guidance_scale=0.0` in order for it to work properly. @@ -27,12 +32,12 @@ Your callback function should look something like this: ```python def callback_dynamic_cfg(pipe, step_index, timestep, callback_kwargs): # adjust the batch_size of prompt_embeds according to guidance_scale - if step_index == int(pipe.num_timesteps * 0.4): + if step_index == int(pipeline.num_timesteps * 0.4): prompt_embeds = callback_kwargs["prompt_embeds"] prompt_embeds = prompt_embeds.chunk(2)[-1] # update guidance_scale and prompt_embeds - pipe._guidance_scale = 0.0 + pipeline._guidance_scale = 0.0 callback_kwargs["prompt_embeds"] = prompt_embeds return callback_kwargs ``` @@ -43,58 +48,134 @@ Now, you can pass the callback function to the `callback_on_step_end` parameter import torch from diffusers import StableDiffusionPipeline -pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16) -pipe = pipe.to("cuda") +pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16) +pipeline = pipeline.to("cuda") prompt = "a photo of an astronaut riding a horse on mars" generator = torch.Generator(device="cuda").manual_seed(1) -out = pipe(prompt, generator=generator, callback_on_step_end=callback_dynamic_cfg, callback_on_step_end_tensor_inputs=['prompt_embeds']) +out = pipeline( + prompt, + generator=generator, + callback_on_step_end=callback_dynamic_cfg, + callback_on_step_end_tensor_inputs=['prompt_embeds'] +) out.images[0].save("out_custom_cfg.png") ``` -The callback function is executed at the end of each denoising step, and modifies the pipeline attributes and tensor variables for the next denoising step. - -With callbacks, you can implement features such as dynamic CFG without having to modify the underlying code at all! - - - -🤗 Diffusers currently only supports `callback_on_step_end`, but feel free to open a [feature request](https://github.com/huggingface/diffusers/issues/new/choose) if you have a cool use-case and require a callback function with a different execution point! 
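+A related pattern you could try is tapering the guidance scale instead of switching it off at a fixed step. The callback below is only a sketch (the `callback_decay_cfg` name and the 7.5 starting value are assumptions, not part of this guide); it keeps `_guidance_scale` at or above 1.0 so CFG stays enabled and `prompt_embeds` can be left untouched:
+
+```python
+def callback_decay_cfg(pipe, step_index, timestep, callback_kwargs):
+    # linearly decay the guidance scale from 7.5 towards 1.0 over the denoising loop
+    progress = step_index / pipe.num_timesteps
+    pipe._guidance_scale = max(1.0, 7.5 * (1.0 - progress))
+    return callback_kwargs
+```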
- - - ## Interrupt the diffusion process -Interrupting the diffusion process is particularly useful when building UIs that work with Diffusers because it allows users to stop the generation process if they're unhappy with the intermediate results. You can incorporate this into your pipeline with a callback. +> [!TIP] +> The interruption callback is supported for text-to-image, image-to-image, and inpainting for the [StableDiffusionPipeline](../api/pipelines/stable_diffusion/overview) and [StableDiffusionXLPipeline](../api/pipelines/stable_diffusion/stable_diffusion_xl). - +Stopping the diffusion process early is useful when building UIs that work with Diffusers because it allows users to stop the generation process if they're unhappy with the intermediate results. You can incorporate this into your pipeline with a callback. -The interruption callback is supported for text-to-image, image-to-image, and inpainting for the [StableDiffusionPipeline](../api/pipelines/stable_diffusion/overview) and [StableDiffusionXLPipeline](../api/pipelines/stable_diffusion/stable_diffusion_xl). - - - -This callback function should take the following arguments: `pipe`, `i`, `t`, and `callback_kwargs` (this must be returned). Set the pipeline's `_interrupt` attribute to `True` to stop the diffusion process after a certain number of steps. You are also free to implement your own custom stopping logic inside the callback. +This callback function should take the following arguments: `pipeline`, `i`, `t`, and `callback_kwargs` (this must be returned). Set the pipeline's `_interrupt` attribute to `True` to stop the diffusion process after a certain number of steps. You are also free to implement your own custom stopping logic inside the callback. In this example, the diffusion process is stopped after 10 steps even though `num_inference_steps` is set to 50. ```python from diffusers import StableDiffusionPipeline -pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5") -pipe.enable_model_cpu_offload() +pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5") +pipeline.enable_model_cpu_offload() num_inference_steps = 50 -def interrupt_callback(pipe, i, t, callback_kwargs): +def interrupt_callback(pipeline, i, t, callback_kwargs): stop_idx = 10 if i == stop_idx: - pipe._interrupt = True + pipeline._interrupt = True return callback_kwargs -pipe( +pipeline( "A photo of a cat", num_inference_steps=num_inference_steps, callback_on_step_end=interrupt_callback, ) ``` + +## Display image after each generation step + +> [!TIP] +> This tip was contributed by [asomoza](https://github.com/asomoza). + +Display an image after each generation step by accessing and converting the latents after each step into an image. The latent space is compressed to 128x128, so the images are also 128x128 which is useful for a quick preview. + +1. Use the function below to convert the SDXL latents (4 channels) to RGB tensors (3 channels) as explained in the [Explaining the SDXL latent space](https://huggingface.co/blog/TimothyAlexisVass/explaining-the-sdxl-latent-space) blog post. 
+
+```py
+def latents_to_rgb(latents):
+    # fixed linear approximation that maps the 4 SDXL latent channels to 3 RGB channels
+    weights = (
+        (60, -60, 25, -70),
+        (60, -5, 15, -50),
+        (60, 10, -5, -35)
+    )
+
+    weights_tensor = torch.t(torch.tensor(weights, dtype=latents.dtype).to(latents.device))
+    biases_tensor = torch.tensor((150, 140, 130), dtype=latents.dtype).to(latents.device)
+    rgb_tensor = torch.einsum("...lxy,lr -> ...rxy", latents, weights_tensor) + biases_tensor.unsqueeze(-1).unsqueeze(-1)
+    image_array = rgb_tensor.clamp(0, 255)[0].byte().cpu().numpy()
+    image_array = image_array.transpose(1, 2, 0)
+
+    return Image.fromarray(image_array)
+```
+
+2. Create a function to decode and save the latents into an image.
+
+```py
+def decode_tensors(pipe, step, timestep, callback_kwargs):
+    latents = callback_kwargs["latents"]
+
+    image = latents_to_rgb(latents)
+    image.save(f"{step}.png")
+
+    return callback_kwargs
+```
+
+3. Pass the `decode_tensors` function to the `callback_on_step_end` parameter to decode the tensors after each step. You also need to specify what you want to modify in the `callback_on_step_end_tensor_inputs` parameter, which in this case are the latents.
+
+```py
+from diffusers import AutoPipelineForText2Image
+import torch
+from PIL import Image
+
+pipeline = AutoPipelineForText2Image.from_pretrained(
+    "stabilityai/stable-diffusion-xl-base-1.0",
+    torch_dtype=torch.float16,
+    variant="fp16",
+    use_safetensors=True
+).to("cuda")
+
+image = pipeline(
+    prompt="A croissant shaped like a cute bear.",
+    negative_prompt="Deformed, ugly, bad anatomy",
+    callback_on_step_end=decode_tensors,
+    callback_on_step_end_tensor_inputs=["latents"],
+).images[0]
+```
+
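+Saving an image on every step adds some overhead. If that matters, a lighter variant (the `decode_tensors_every_n` name and the interval of 5 are arbitrary choices, not part of this guide) only decodes a preview every few steps:
+
+```py
+def decode_tensors_every_n(pipe, step, timestep, callback_kwargs, n=5):
+    # only decode and save a preview every `n` steps to reduce I/O during generation
+    if step % n == 0:
+        image = latents_to_rgb(callback_kwargs["latents"])
+        image.save(f"{step}.png")
+    return callback_kwargs
+```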
+<!-- image grid: decoded latent previews at step 0, step 19, step 29, step 39, and step 49 -->
diff --git a/docs/source/en/using-diffusers/tips.md b/docs/source/en/using-diffusers/tips.md deleted file mode 100644 index 2dd837a58360..000000000000 --- a/docs/source/en/using-diffusers/tips.md +++ /dev/null @@ -1,160 +0,0 @@ - - -# Community tips and tricks - -Diffusers owes much of its success to its community of users and contributors. ❤️ This guide is a collection of tips and tricks for using Diffusers shared by community members. It includes helpful advice such as how to customize and implement specific features through callbacks and how to generate high-quality images. - -If you have a tip or trick you'd like to share, we'd love to [hear from you](https://github.com/huggingface/diffusers/issues/new/choose)! - -## Callback to display image after each generation step - -> [!TIP] -> This tip was contributed by [asomoza](https://github.com/asomoza). - -Display an image after each generation step by using a [callback](../using-diffusers/callback) to access and manipulate the latents after each step and convert them into an image. - -1. Use the function below to convert the SDXL latents (4 channels) to RGB tensors (3 channels) as explained in the [Explaining the SDXL latent space](https://huggingface.co/blog/TimothyAlexisVass/explaining-the-sdxl-latent-space) blog post. - -> [!TIP] -> The latent space is compressed to 128x128 so the images are also 128x128 which is useful for a quick preview. - -```py -def latents_to_rgb(latents): - weights = ( - (60, -60, 25, -70), - (60, -5, 15, -50), - (60, 10, -5, -35) - ) - - weights_tensor = torch.t(torch.tensor(weights, dtype=latents.dtype).to(latents.device)) - biases_tensor = torch.tensor((150, 140, 130), dtype=latents.dtype).to(latents.device) - rgb_tensor = torch.einsum("...lxy,lr -> ...rxy", latents, weights_tensor) + biases_tensor.unsqueeze(-1).unsqueeze(-1) - image_array = rgb_tensor.clamp(0, 255)[0].byte().cpu().numpy() - image_array = image_array.transpose(1, 2, 0) - - return Image.fromarray(image_array) -``` - -2. Create a function to decode and save the latents into an image. - -```py -def decode_tensors(pipe, step, timestep, callback_kwargs): - latents = callback_kwargs["latents"] - - image = latents_to_rgb(latents) - image.save(f"{step}.png") - - return callback_kwargs -``` - -3. Pass the `decode_tensors` function to the `callback_on_step_end` parameter to decode the tensors after each step. You also need to specify what you want to modify in the `callback_on_step_end_tensor_inputs` parameter, which in this case are the latents. - -```py -from diffusers import AutoPipelineForText2Image -import torch -from PIL import Image - -pipeline = AutoPipelineForText2Image.from_pretrained( - "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True -).to("cuda") - -image = pipe( - prompt = "A croissant shaped like a cute bear." - negative_prompt = "Deformed, ugly, bad anatomy" - callback_on_step_end=decode_tensors, - callback_on_step_end_tensor_inputs=["latents"], -).images[0] -``` - -
-<!-- image grid: decoded latent previews at step 0, step 19, step 29, step 39, and step 49 -->
- -## High quality anime images - -> [!TIP] -> This tip was contributed by [asomoza](https://github.com/asomoza). - -Generating high-quality anime images is a very popular application of diffusion models. To achieve this in Diffusers: - -1. Choose a good anime model like [Counterfeit](https://hf.co/gsdf/Counterfeit-V3.0) and pair it with negative prompt embeddings such as [EasyNegative](https://huggingface.co/embed/EasyNegative) to further improve the quality of the generated images. - -```py -from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler -import torch - -pipeline = StableDiffusionPipeline.from_single_file( - "https://huggingface.co/gsdf/Counterfeit-V3.0/blob/main/Counterfeit-V3.0_fix_fp16.safetensors", - torch_dtype=torch.float16, -) -pipeline.load_textual_inversion( - "embed/EasyNegative", - weight_name="EasyNegative.safetensors", - token="EasyNegative" -) -``` - -2. Download the weights (typically a LoRA adapter) of a specific style to apply to the image and use the [`load_lora_weights`] method to add it to the pipeline. This example uses the [Dungeon Meshi Marcille Character Lora](https://civitai.com/models/106199/dungeon-meshi-marcille-character-lora). - -```py -!wget https://civitai.com/api/download/models/114049 -O marcille.safetensors -pipeline.load_lora_weights('.', weight_name="marcille.safetensors") -``` - -3. Load a scheduler and set `use_karras_sigmas=True` to use the DPM++ 2M Karras scheduler (take a look at this [scheduler table](../api/schedulers/overview.) to find the A1111 equivalent scheduler in Diffusers). - -```py -pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config) -pipeline.scheduler.config.use_karras_sigmas=True -pipeline.to("cuda") -``` - -4. Create your prompt and negative prompts, and remember to use the trigger words for this specific LoRA adapter (`dmarci`) and embeddings (`EasyNegative`). It is also important to set the: - - - `lora_scale` parameter to control how much to scale the output with the LoRA weights by. - - `clip_skip` parameter to specify the layers of the CLIP model to use. This parameter is especially important for anime checkpoints because it controls how closely aligned the text prompt and image are. A higher `clip_skip` value produces more abstract images. - -```py -generator = torch.Generator("cpu").manual_seed(77) - -prompt = "dmarci, masterpiece, best quality, 1girl, solo, marcillessa, red choker, detailed and beautiful eyes, cowboy shot, HAPPY, walking, jumping, turtleneck sweater, leather skirt" -negative_prompt = "EasyNegative, worst quality, low quality, bad quality, blurry, bad anatomy, lowres, monochrome, grayscale, bad proportions, out of focus, bad body, deformed, mutated, mutation, ugly, disfigured, poorly drawn face, skin spots, malformed limbs, extra arms, mutated hands and fingers, fused fingers, bad fingers, too many fingers" - -lora_scale = 1.0 -images = pipeline(prompt, width=768, height=768, negative_prompt=negative_prompt, num_inference_steps=20, cross_attention_kwargs={"scale": lora_scale}, generator=generator, num_images_per_prompt=4, clip_skip=2, guidance_scale=7).images[0] -images -``` - -