diff --git a/docs/source/en/api/attnprocessor.md b/docs/source/en/api/attnprocessor.md
index 3c0ee0563f07..ab89d4d260f0 100644
--- a/docs/source/en/api/attnprocessor.md
+++ b/docs/source/en/api/attnprocessor.md
@@ -41,12 +41,6 @@ An attention processor is a class for applying different types of attention mech
 ## FusedAttnProcessor2_0
 [[autodoc]] models.attention_processor.FusedAttnProcessor2_0
 
-## LoRAAttnProcessor
-[[autodoc]] models.attention_processor.LoRAAttnProcessor
-
-## LoRAAttnProcessor2_0
-[[autodoc]] models.attention_processor.LoRAAttnProcessor2_0
-
 ## LoRAAttnAddedKVProcessor
 [[autodoc]] models.attention_processor.LoRAAttnAddedKVProcessor
 
diff --git a/docs/source/en/optimization/fp16.md b/docs/source/en/optimization/fp16.md
index 72e881e4f0c5..7a2cf934985c 100644
--- a/docs/source/en/optimization/fp16.md
+++ b/docs/source/en/optimization/fp16.md
@@ -66,3 +66,9 @@ image = pipe(prompt).images[0]
 Don't use [`torch.autocast`](https://pytorch.org/docs/stable/amp.html#torch.autocast) in any of the pipelines as it can lead to black images and is always slower than pure float16 precision.
 
 </Tip>
+
+## Distilled model
+
+You could also use a distilled Stable Diffusion model and autoencoder to speed up inference. During distillation, many of the UNet's residual and attention blocks are shed to reduce the model size. The distilled model is faster and uses less memory while generating images of comparable quality to the full Stable Diffusion model.
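+
+For example, here's a minimal sketch of swapping in a distilled model (it assumes the [`nota-ai/bk-sdm-small`](https://hf.co/nota-ai/bk-sdm-small) checkpoint; any distilled variant works the same way):
+
+```py
+from diffusers import StableDiffusionPipeline
+import torch
+
+# nota-ai/bk-sdm-small is a distilled Stable Diffusion checkpoint with a smaller UNet
+distilled = StableDiffusionPipeline.from_pretrained(
+    "nota-ai/bk-sdm-small", torch_dtype=torch.float16, use_safetensors=True
+).to("cuda")
+image = distilled("a golden vase with different flowers").images[0]
+```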
+
+Learn more in the [Distilled Stable Diffusion inference](../using-diffusers/distilled_sd) guide!
diff --git a/docs/source/en/optimization/torch2.0.md b/docs/source/en/optimization/torch2.0.md
index 9c6febb06f07..2475bb525ddd 100644
--- a/docs/source/en/optimization/torch2.0.md
+++ b/docs/source/en/optimization/torch2.0.md
@@ -75,6 +75,9 @@ Compilation requires some time to complete, so it is best suited for situations
 
 For more information and different options about `torch.compile`, refer to the [`torch_compile`](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html) tutorial.
 
+> [!TIP]
+> Learn more about other ways PyTorch 2.0 can help optimize your model in the [Accelerate inference of text-to-image diffusion models](../tutorials/fast_diffusion) tutorial.
+
 ## Benchmark
 
 We conducted a comprehensive benchmark with PyTorch 2.0's efficient attention implementation and `torch.compile` across different GPUs and batch sizes for five of our most used pipelines. The code is benchmarked on 🤗 Diffusers v0.17.0.dev0 to optimize `torch.compile` usage (see [here](https://github.com/huggingface/diffusers/pull/3313) for more details).
diff --git a/docs/source/en/training/lora.md b/docs/source/en/training/lora.md
index b4f9e14bbcb6..9e87c224f428 100644
--- a/docs/source/en/training/lora.md
+++ b/docs/source/en/training/lora.md
@@ -113,36 +113,50 @@ The dataset preprocessing code and training loop are found in the [`main()`](htt
 
 As with the script parameters, a walkthrough of the training script is provided in the [Text-to-image](text2image#training-script) training guide. Instead, this guide takes a look at the LoRA relevant parts of the script.
 
-The script begins by adding the [new LoRA weights](https://github.com/huggingface/diffusers/blob/dd9a5caf61f04d11c0fa9f3947b69ab0010c9a0f/examples/text_to_image/train_text_to_image_lora.py#L447) to the attention layers. This involves correctly configuring the weight size for each block in the UNet. You'll see the `rank` parameter is used to create the [`~models.attention_processor.LoRAAttnProcessor`]:
+<hfoptions id="lora">
+<hfoption id="UNet">
+
+Diffusers uses [`~peft.LoraConfig`] from the [PEFT](https://hf.co/docs/peft) library to set up the parameters of the LoRA adapter, such as the rank, alpha, and which modules to insert the LoRA weights into. The adapter is added to the UNet, and only the LoRA layers are filtered for optimization in `lora_layers`.
+
+```py
+unet_lora_config = LoraConfig(
+    r=args.rank,
+    lora_alpha=args.rank,
+    init_lora_weights="gaussian",
+    target_modules=["to_k", "to_q", "to_v", "to_out.0"],
+)
+
+unet.add_adapter(unet_lora_config)
+lora_layers = filter(lambda p: p.requires_grad, unet.parameters())
+```
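+
+To confirm that only the adapter weights will be trained, you can count the parameters that remain trainable after `add_adapter` — a minimal sketch (the exact numbers depend on the UNet and the chosen rank):
+
+```py
+# Illustrative sanity check, not part of the training script: after add_adapter,
+# only the injected LoRA matrices should still have requires_grad=True
+trainable = sum(p.numel() for p in unet.parameters() if p.requires_grad)
+total = sum(p.numel() for p in unet.parameters())
+print(f"trainable params: {trainable:,} / {total:,} ({100 * trainable / total:.2f}%)")
+```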
+
+</hfoption>
+<hfoption id="text encoder">
+
+Diffusers also supports finetuning the text encoder with LoRA from the [PEFT](https://hf.co/docs/peft) library when necessary, such as when finetuning Stable Diffusion XL (SDXL). The [`~peft.LoraConfig`] is used to configure the parameters of the LoRA adapter, which is then added to the text encoder, and only the LoRA layers are filtered for training.
 
 ```py
-lora_attn_procs = {}
-for name in unet.attn_processors.keys():
-    cross_attention_dim = None if name.endswith("attn1.processor") else unet.config.cross_attention_dim
-    if name.startswith("mid_block"):
-        hidden_size = unet.config.block_out_channels[-1]
-    elif name.startswith("up_blocks"):
-        block_id = int(name[len("up_blocks.")])
-        hidden_size = list(reversed(unet.config.block_out_channels))[block_id]
-    elif name.startswith("down_blocks"):
-        block_id = int(name[len("down_blocks.")])
-        hidden_size = unet.config.block_out_channels[block_id]
-
-    lora_attn_procs[name] = LoRAAttnProcessor(
-        hidden_size=hidden_size,
-        cross_attention_dim=cross_attention_dim,
-        rank=args.rank,
-    )
-
-unet.set_attn_processor(lora_attn_procs)
-lora_layers = AttnProcsLayers(unet.attn_processors)
+text_lora_config = LoraConfig(
+    r=args.rank,
+    lora_alpha=args.rank,
+    init_lora_weights="gaussian",
+    target_modules=["q_proj", "k_proj", "v_proj", "out_proj"],
+)
+
+text_encoder_one.add_adapter(text_lora_config)
+text_encoder_two.add_adapter(text_lora_config)
+text_lora_parameters_one = list(filter(lambda p: p.requires_grad, text_encoder_one.parameters()))
+text_lora_parameters_two = list(filter(lambda p: p.requires_grad, text_encoder_two.parameters()))
 ```
 
-The [optimizer](https://github.com/huggingface/diffusers/blob/dd9a5caf61f04d11c0fa9f3947b69ab0010c9a0f/examples/text_to_image/train_text_to_image_lora.py#L519) is initialized with the `lora_layers` because these are the only weights that'll be optimized:
+
+</hfoption>
+</hfoptions>
+
+The [optimizer](https://github.com/huggingface/diffusers/blob/e4b8f173b97731686e290b2eb98e7f5df2b1b322/examples/text_to_image/train_text_to_image_lora.py#L529) is initialized with the `lora_layers` because these are the only weights that'll be optimized:
 
 ```py
 optimizer = optimizer_cls(
-    lora_layers.parameters(),
+    lora_layers,
     lr=args.learning_rate,
     betas=(args.adam_beta1, args.adam_beta2),
     weight_decay=args.adam_weight_decay,
diff --git a/docs/source/en/using-diffusers/stable_diffusion_jax_how_to.md b/docs/source/en/using-diffusers/stable_diffusion_jax_how_to.md
index 2de2af96df46..5b2c68853d14 100644
--- a/docs/source/en/using-diffusers/stable_diffusion_jax_how_to.md
+++ b/docs/source/en/using-diffusers/stable_diffusion_jax_how_to.md
@@ -217,3 +217,9 @@ Check your image dimensions to see if they're correct:
 images.shape
 # (8, 1, 512, 512, 3)
 ```
+
+## Resources
+
+To learn more about how JAX works with Stable Diffusion, you may be interested in reading:
+
+* [Accelerating Stable Diffusion XL Inference with JAX on Cloud TPU v5e](https://hf.co/blog/sdxl_jax)
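+
+And as a quick reference, here's a minimal sketch of the parallel generation workflow from this guide (it assumes an 8-device TPU host, matching the shapes above):
+
+```python
+import jax
+import jax.numpy as jnp
+from flax.jax_utils import replicate
+from flax.training.common_utils import shard
+from diffusers import FlaxStableDiffusionPipeline
+
+# Load the bf16 weights once, replicate them to every device, and shard the prompts
+pipeline, params = FlaxStableDiffusionPipeline.from_pretrained(
+    "CompVis/stable-diffusion-v1-4", revision="bf16", dtype=jnp.bfloat16
+)
+prompts = ["a photo of an astronaut riding a horse"] * jax.device_count()
+prompt_ids = shard(pipeline.prepare_inputs(prompts))  # one shard per device
+params = replicate(params)
+rng = jax.random.split(jax.random.PRNGKey(0), jax.device_count())
+images = pipeline(prompt_ids, params, rng, jit=True).images  # (8, 1, 512, 512, 3)
+```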