From d38a3d12416e4f533a9499a46a655b62ad38bdf1 Mon Sep 17 00:00:00 2001 From: Steven Liu Date: Wed, 15 Nov 2023 14:53:51 -0800 Subject: [PATCH 1/7] first draft --- docs/source/en/_toctree.yml | 12 +++++- docs/source/en/api/loaders/lora.md | 32 ++++++++++++++++ docs/source/en/api/loaders/single_file.md | 37 +++++++++++++++++++ .../en/api/loaders/textual_inversion.md | 27 ++++++++++++++ docs/source/en/api/loaders/unet.md | 27 ++++++++++++++ 5 files changed, 133 insertions(+), 2 deletions(-) create mode 100644 docs/source/en/api/loaders/lora.md create mode 100644 docs/source/en/api/loaders/single_file.md create mode 100644 docs/source/en/api/loaders/textual_inversion.md create mode 100644 docs/source/en/api/loaders/unet.md diff --git a/docs/source/en/_toctree.yml b/docs/source/en/_toctree.yml index 150464b09795..d2583121418e 100644 --- a/docs/source/en/_toctree.yml +++ b/docs/source/en/_toctree.yml @@ -186,13 +186,21 @@ - sections: - local: api/configuration title: Configuration - - local: api/loaders - title: Loaders - local: api/logging title: Logging - local: api/outputs title: Outputs title: Main Classes + - sections: + - local: api/loaders/lora + title: LoRA + - local: api/loaders/single_file + title: Single files + - local: api/loaders/textual_inversion + title: Textual Inversion + - local: api/loaders/unet + title: UNet + title: Loaders - sections: - local: api/models/overview title: Overview diff --git a/docs/source/en/api/loaders/lora.md b/docs/source/en/api/loaders/lora.md new file mode 100644 index 000000000000..0739e3b477c3 --- /dev/null +++ b/docs/source/en/api/loaders/lora.md @@ -0,0 +1,32 @@ + + +# LoRA + +LoRA is a fast and lightweight training method that inserts and trains a significantly smaller number of parameters instead of all the model parameters. This produces a smaller file (~100 MBs) and makes it easier to quickly train a model to learn a new concept. LoRA weights are typically loaded into the UNet, text encoder or both. 
There are two classes for loading LoRA weights: + +- [`LoraLoaderMixin`] provides functions for loading and unloading, fusing and unfusing, enabling and disabling, and more functions for managing LoRA weights. This class can be used with any model. +- [`StableDiffusionXLLoraLoaderMixin`] is a Stable Diffusion XL (SDXL) version of the [`LoraLoaderMixin`] class for loading and saving LoRA weights. It can only be used with the SDXL model. + + + +To learn more about how to load LoRA weights, see the [LoRA](../../using-diffusers/loading_adapters#lora) loading guide. + + + +## LoraLoaderMixin + +[[autodoc]] loaders.lora.LoraLoaderMixin + +## StableDiffusionXLLoraLoaderMixin + +[[autodoc]] loaders.lora.StableDiffusionXLLoraLoaderMixin \ No newline at end of file diff --git a/docs/source/en/api/loaders/single_file.md b/docs/source/en/api/loaders/single_file.md new file mode 100644 index 000000000000..6a07229bb665 --- /dev/null +++ b/docs/source/en/api/loaders/single_file.md @@ -0,0 +1,37 @@ + + +# Single files + +Diffusers supports loading pretrained pipeline (or model) weights stored in a single file, such as a `ckpt` or `safetensors` file. These single file types are typically produced from community-trained models. There are three classes for loading single file weights: + +- [`FromSingleFileMixin`] supports loading pretrained pipeline weights stored in a single file, which can either be a `ckpt` or `safetensors` file. +- [`FromOriginalVAEMixin`] supports loading pretrained [`AutoencoderKL`] weights stored in a single file, which can either be a `ckpt` or `safetensors` file. +- [`FromOriginalControlnetMixin`] supports loading pretrained [`ControlNet`] weights stored in a single file, which can either be a `ckpt` or `safetensors` file. + + + +To learn more about how to load single file weights, see the [Load different Stable Diffusion formats](../../using-diffusers/other-formats) loading guide. 
+ + + +## FromSingleFileMixin + +[[autodoc]] loaders.single_file.FromSingleFileMixin + +## FromOriginalVAEMixin + +[[autodoc]] loaders.single_file.FromOriginalVAEMixin + +## FromOriginalControlnetMixin + +[[autodoc]] loaders.single_file.FromOriginalControlnetMixin \ No newline at end of file diff --git a/docs/source/en/api/loaders/textual_inversion.md b/docs/source/en/api/loaders/textual_inversion.md new file mode 100644 index 000000000000..28d38ddb5bf2 --- /dev/null +++ b/docs/source/en/api/loaders/textual_inversion.md @@ -0,0 +1,27 @@ + + +# Textual Inversion + +Textual Inversion is a training method for personalizing models by learning new text embeddings from a few example images. The file produced from training is extremely small (a few KBs) and the new embeddings can be loaded into the text encoder. + +[`TextualInversionLoaderMixin`] provides a function for loading Textual Inversion embeddings from Diffusers and Automatic1111 into the text encoder and loading a special token to activate the embeddings. + + + +To learn more about how to load Textual Inversion embeddings, see the [Textual Inversion](../../using-diffusers/loading_adapters#textual-inversion) loading guide. + + + +## TextualInversionLoaderMixin + +[[autodoc]] loaders.textual_inversion.TextualInversionLoaderMixin \ No newline at end of file diff --git a/docs/source/en/api/loaders/unet.md b/docs/source/en/api/loaders/unet.md new file mode 100644 index 000000000000..ffa8f13e372d --- /dev/null +++ b/docs/source/en/api/loaders/unet.md @@ -0,0 +1,27 @@ + + +# UNet + +Some training methods - like LoRA and Custom Diffusion - target the UNet's attention processor layers. Instead of training all of a model's parameters, only a subset of the parameters are trained, which is faster and more efficient. This class is useful if you're *only* loading weights into a UNet. 
If you need to load weights into the text encoder or a text encoder and UNet, try using the [`~loaders.LoraLoaderMixin.load_lora_weights`] function instead. + +The [`UNet2DConditionLoadersMixin`] class provides functions for loading and saving weights, fusing and unfusing LoRAs, disabling and enabling LoRAs, and setting and deleting adapters. + + + +To learn more about how to load LoRA weights, see the [LoRA](../../using-diffusers/loading_adapters#lora) loading guide. + + + +## UNet2DConditionLoadersMixin + +[[autodoc]] loaders.unet.UNet2DConditionLoadersMixin \ No newline at end of file From bec9afb8899766b49c1fb29cd36468feccd7b932 Mon Sep 17 00:00:00 2001 From: Steven Liu Date: Wed, 15 Nov 2023 14:59:39 -0800 Subject: [PATCH 2/7] remove old loader doc --- docs/source/en/api/loaders.md | 49 ----------------------------------- 1 file changed, 49 deletions(-) delete mode 100644 docs/source/en/api/loaders.md diff --git a/docs/source/en/api/loaders.md b/docs/source/en/api/loaders.md deleted file mode 100644 index d81b0eb1abcb..000000000000 --- a/docs/source/en/api/loaders.md +++ /dev/null @@ -1,49 +0,0 @@ - - -# Loaders - -Adapters (textual inversion, LoRA, hypernetworks) allow you to modify a diffusion model to generate images in a specific style without training or finetuning the entire model. The adapter weights are very portable because they're typically only a tiny fraction of the pretrained model weights. 🤗 Diffusers provides an easy-to-use `LoaderMixin` API to load adapter weights. - - - -🧪 The `LoaderMixin`s are highly experimental and prone to future changes. To use private or [gated](https://huggingface.co/docs/hub/models-gated#gated-models) models, log-in with `huggingface-cli login`. 
- - - -## UNet2DConditionLoadersMixin - -[[autodoc]] loaders.UNet2DConditionLoadersMixin - -## TextualInversionLoaderMixin - -[[autodoc]] loaders.TextualInversionLoaderMixin - -## StableDiffusionXLLoraLoaderMixin - -[[autodoc]] loaders.StableDiffusionXLLoraLoaderMixin - -## LoraLoaderMixin - -[[autodoc]] loaders.LoraLoaderMixin - -## FromSingleFileMixin - -[[autodoc]] loaders.FromSingleFileMixin - -## FromOriginalControlnetMixin - -[[autodoc]] loaders.FromOriginalControlnetMixin - -## FromOriginalVAEMixin - -[[autodoc]] loaders.FromOriginalVAEMixin From 9c1e67e3fcb68dbdcbd6ceeba704f507fabf9e12 Mon Sep 17 00:00:00 2001 From: Steven Liu Date: Wed, 15 Nov 2023 18:20:25 -0800 Subject: [PATCH 3/7] start adding lora code examples --- src/diffusers/loaders/lora.py | 174 ++++++++++++++++++++++++++++------ 1 file changed, 143 insertions(+), 31 deletions(-) diff --git a/src/diffusers/loaders/lora.py b/src/diffusers/loaders/lora.py index 532a59f3b9bd..450ca0241205 100644 --- a/src/diffusers/loaders/lora.py +++ b/src/diffusers/loaders/lora.py @@ -69,7 +69,7 @@ class LoraLoaderMixin: r""" Load LoRA layers into [`UNet2DConditionModel`] and - [`CLIPTextModel`](https://huggingface.co/docs/transformers/model_doc/clip#transformers.CLIPTextModel). + [`~transformers.CLIPTextModel`]. """ text_encoder_name = TEXT_ENCODER_NAME unet_name = UNET_NAME @@ -137,13 +137,11 @@ def lora_state_dict( **kwargs, ): r""" - Return state dict for lora weights and the network alphas. + Return state dict and network alphas of the LoRA weights. - We support loading A1111 formatted LoRA checkpoints in a limited capacity. - - This function is experimental and might change in the future. + A1111 formatted LoRA checkpoints are supported in a limited capacity. This function is experimental and might change in the future. @@ -191,6 +189,15 @@ def lora_state_dict( guarantee the timeliness or safety of the source, and you should refer to the mirror site for more information. 
+ Example: + + ```py + from diffusers import DiffusionPipeline + import torch + + pipeline = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda") + pipeline.lora_state_dict("nerijs/pixel-art-xl") + ``` """ # Load the main state dict first which has the LoRA layers for either of # UNet and text encoder or both. @@ -883,11 +890,11 @@ def save_lora_weights( safe_serialization: bool = True, ): r""" - Save the LoRA parameters corresponding to the UNet and text encoder. + Save the UNet and text encoder LoRA parameters. Arguments: save_directory (`str` or `os.PathLike`): - Directory to save LoRA parameters to. Will be created if it doesn't exist. + Directory to save LoRA parameters to (will be created if it doesn't exist). unet_lora_layers (`Dict[str, torch.nn.Module]` or `Dict[str, torch.Tensor]`): State dict of the LoRA layers corresponding to the `unet`. text_encoder_lora_layers (`Dict[str, torch.nn.Module]` or `Dict[str, torch.Tensor]`): @@ -898,11 +905,23 @@ def save_lora_weights( need to call this function on all processes. In this case, set `is_main_process=True` only on the main process to avoid race conditions. save_function (`Callable`): - The function to use to save the state dictionary. Useful during distributed training when you need to + The function to use to save the state dict. Useful during distributed training when you need to replace `torch.save` with another method. Can be configured with the environment variable `DIFFUSERS_SAVE_MODE`. safe_serialization (`bool`, *optional*, defaults to `True`): - Whether to save the model using `safetensors` or the traditional PyTorch way with `pickle`. + Whether to save the model using `safetensors` or with `pickle`. 
+ + Example: + + ```py + from diffusers import DiffusionPipeline + import torch + + pipeline = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda") + pipeline.load_lora_weights("nerijs/pixel-art-xl", weight_name="pixel-art-xl.safetensors", adapter_name="pixel") + pipeline.fuse_lora(lora_scale=0.7) + pipeline.save_lora_weights("your-username/model") + ``` """ # Create a flat dictionary. state_dict = {} @@ -1138,14 +1157,17 @@ def _convert_kohya_lora_to_diffusers(cls, state_dict): def unload_lora_weights(self): """ - Unloads the LoRA parameters. + Unload the LoRA parameters from a pipeline. Examples: - ```python - >>> # Assuming `pipeline` is already loaded with the LoRA parameters. - >>> pipeline.unload_lora_weights() - >>> ... + ```py + from diffusers import DiffusionPipeline + import torch + + pipeline = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda") + pipeline.load_lora_weights("nerijs/pixel-art-xl", weight_name="pixel-art-xl.safetensors", adapter_name="pixel") + pipeline.unload_lora_weights() ``` """ if not USE_PEFT_BACKEND: @@ -1174,7 +1196,7 @@ def fuse_lora( safe_fusing: bool = False, ): r""" - Fuses the LoRA parameters into the original parameters of the corresponding blocks. + Fuse the LoRA parameters with the original parameters in their corresponding blocks. @@ -1188,9 +1210,20 @@ def fuse_lora( Whether to fuse the text encoder LoRA parameters. If the text encoder wasn't monkey-patched with the LoRA parameters then it won't have any effect. lora_scale (`float`, defaults to 1.0): - Controls how much to influence the outputs with the LoRA parameters. + Controls LoRA influence on the outputs. safe_fusing (`bool`, defaults to `False`): - Whether to check fused weights for NaN values before fusing and if values are NaN not fusing them. 
+ Whether to check fused weights for `NaN` values before fusing and if values are `NaN`, then don't fuse them. + + Example: + + ```py + from diffusers import DiffusionPipeline + import torch + + pipeline = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda") + pipeline.load_lora_weights("nerijs/pixel-art-xl", weight_name="pixel-art-xl.safetensors", adapter_name="pixel") + pipeline.fuse_lora(lora_scale=0.7) + ``` """ if fuse_unet or fuse_text_encoder: self.num_fused_loras += 1 @@ -1239,8 +1272,8 @@ def fuse_text_encoder_lora(text_encoder, lora_scale=1.0, safe_fusing=False): def unfuse_lora(self, unfuse_unet: bool = True, unfuse_text_encoder: bool = True): r""" - Reverses the effect of - [`pipe.fuse_lora()`](https://huggingface.co/docs/diffusers/main/en/api/loaders#diffusers.loaders.LoraLoaderMixin.fuse_lora). + Unfuse the LoRA parameters from the original parameters in their corresponding blocks. + @@ -1253,6 +1286,18 @@ def unfuse_lora(self, unfuse_unet: bool = True, unfuse_text_encoder: bool = True unfuse_text_encoder (`bool`, defaults to `True`): Whether to unfuse the text encoder LoRA parameters. If the text encoder wasn't monkey-patched with the LoRA parameters then it won't have any effect. + + Example: + + ```py + from diffusers import DiffusionPipeline + import torch + + pipeline = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda") + pipeline.load_lora_weights("nerijs/pixel-art-xl", weight_name="pixel-art-xl.safetensors", adapter_name="pixel") + pipeline.fuse_lora(lora_scale=0.7) + pipeline.unfuse_lora() + ``` """ if unfuse_unet: if not USE_PEFT_BACKEND: @@ -1304,16 +1349,26 @@ def set_adapters_for_text_encoder( text_encoder_weights: List[float] = None, ): """ - Sets the adapter layers for the text encoder. + Only activate an adapter for the text encoder. 
Args: adapter_names (`List[str]` or `str`): - The names of the adapters to use. + The adapter to activate. text_encoder (`torch.nn.Module`, *optional*): - The text encoder module to set the adapter layers for. If `None`, it will try to get the `text_encoder` + The text encoder module to activate the adapter layers for. If `None`, it will try to get the `text_encoder` attribute. text_encoder_weights (`List[float]`, *optional*): The weights to use for the text encoder. If `None`, the weights are set to `1.0` for all the adapters. + + Example: + + ```py + from diffusers import DiffusionPipeline + import torch + + pipeline = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda") + pipeline.load_lora_weights("nerijs/pixel-art-xl", weight_name="pixel-art-xl.safetensors", adapter_name="pixel") + pipeline.set_adapters_for_text_encoder("pixel") """ if not USE_PEFT_BACKEND: raise ValueError("PEFT backend is required for this method.") @@ -1341,12 +1396,23 @@ def process_weights(adapter_names, weights): def disable_lora_for_text_encoder(self, text_encoder: Optional["PreTrainedModel"] = None): """ - Disables the LoRA layers for the text encoder. + Disable the text encoder's LoRA layers. Args: text_encoder (`torch.nn.Module`, *optional*): The text encoder module to disable the LoRA layers for. If `None`, it will try to get the `text_encoder` attribute. 
+ + Example: + + ```py + from diffusers import DiffusionPipeline + import torch + + pipeline = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda") + pipeline.load_lora_weights("nerijs/pixel-art-xl", weight_name="pixel-art-xl.safetensors", adapter_name="pixel") + pipeline.disable_lora_for_text_encoder() + ``` """ if not USE_PEFT_BACKEND: raise ValueError("PEFT backend is required for this method.") @@ -1358,12 +1424,23 @@ def disable_lora_for_text_encoder(self, text_encoder: Optional["PreTrainedModel" def enable_lora_for_text_encoder(self, text_encoder: Optional["PreTrainedModel"] = None): """ - Enables the LoRA layers for the text encoder. + Enable the text encoder's LoRA layers. Args: text_encoder (`torch.nn.Module`, *optional*): The text encoder module to enable the LoRA layers for. If `None`, it will try to get the `text_encoder` attribute. + + Example: + + ```py + from diffusers import DiffusionPipeline + import torch + + pipeline = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda") + pipeline.load_lora_weights("nerijs/pixel-art-xl", weight_name="pixel-art-xl.safetensors", adapter_name="pixel") + pipeline.enable_lora_for_text_encoder() + ``` """ if not USE_PEFT_BACKEND: raise ValueError("PEFT backend is required for this method.") @@ -1414,10 +1491,22 @@ def enable_lora(self): def delete_adapters(self, adapter_names: Union[List[str], str]): """ + Delete an adapter's LoRA layers from the UNet and text encoder(s). + Args: - Deletes the LoRA layers of `adapter_name` for the unet and text-encoder(s). adapter_names (`Union[List[str], str]`): - The names of the adapter to delete. Can be a single string or a list of strings + The name or names (a single string or a list of strings) of the adapters to delete. 
+ + Example: + + ```py + from diffusers import DiffusionPipeline + import torch + + pipeline = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda") + pipeline.load_lora_weights("nerijs/pixel-art-xl", weight_name="pixel-art-xl.safetensors", adapter_name="pixel") + pipeline.delete_adapters("pixel") + ``` """ if not USE_PEFT_BACKEND: raise ValueError("PEFT backend is required for this method.") @@ -1437,7 +1526,7 @@ def delete_adapters(self, adapter_names: Union[List[str], str]): def get_active_adapters(self) -> List[str]: """ - Gets the list of the current active adapters. + Get a list of currently active adapters. Example: @@ -1469,7 +1558,19 @@ def get_active_adapters(self) -> List[str]: def get_list_adapters(self) -> Dict[str, List[str]]: """ - Gets the current list of all available adapters in the pipeline. + Get a list of all currently available adapters for each component in the pipeline. + + Example: + + ```py + from diffusers import DiffusionPipeline + + pipeline = DiffusionPipeline.from_pretrained( + "stabilityai/stable-diffusion-xl-base-1.0", + ).to("cuda") + pipeline.load_lora_weights("CiroN2022/toy-face", weight_name="toy_face_sdxl.safetensors", adapter_name="toy") + pipeline.get_list_adapters() + ``` """ if not USE_PEFT_BACKEND: raise ValueError( @@ -1491,14 +1592,25 @@ def get_list_adapters(self) -> Dict[str, List[str]]: def set_lora_device(self, adapter_names: List[str], device: Union[torch.device, str, int]) -> None: """ - Moves the LoRAs listed in `adapter_names` to a target device. Useful for offloading the LoRA to the CPU in case + Move a LoRA to a target device. Useful for offloading a LoRA to the CPU in case you want to load multiple adapters and free some GPU memory. Args: adapter_names (`List[str]`): - List of adapters to send device to. + List of adapters to send to device. device (`Union[torch.device, str, int]`): - Device to send the adapters to. 
Can be either a torch device, a str or an integer. + Device (can be a `torch.device`, `str` or `int`) to place adapters on. + + Example: + + ```py + from diffusers import DiffusionPipeline + + pipeline = DiffusionPipeline.from_pretrained( + "stabilityai/stable-diffusion-xl-base-1.0", + ).to("cuda") + pipeline.load_lora_weights("nerijs/pixel-art-xl", weight_name="pixel-art-xl.safetensors", adapter_name="pixel") + pipeline.set_lora_device(["pixel"], device="cuda") """ if not USE_PEFT_BACKEND: raise ValueError("PEFT backend is required for this method.") From 41bb936374f5dd4049909b98fb960fdf56563536 Mon Sep 17 00:00:00 2001 From: Steven Liu Date: Thu, 16 Nov 2023 10:50:07 -0800 Subject: [PATCH 4/7] finish --- src/diffusers/loaders/lora.py | 162 +++++++++++++-------- src/diffusers/loaders/single_file.py | 19 ++- src/diffusers/loaders/textual_inversion.py | 8 +- src/diffusers/loaders/unet.py | 97 ++++++++++-- 4 files changed, 207 insertions(+), 79 deletions(-) diff --git a/src/diffusers/loaders/lora.py b/src/diffusers/loaders/lora.py index 450ca0241205..1968bdfb8f02 100644 --- a/src/diffusers/loaders/lora.py +++ b/src/diffusers/loaders/lora.py @@ -68,8 +68,7 @@ class LoraLoaderMixin: r""" - Load LoRA layers into [`UNet2DConditionModel`] and - [`~transformers.CLIPTextModel`]. + Load LoRA layers into [`UNet2DConditionModel`] and [`~transformers.CLIPTextModel`]. """ text_encoder_name = TEXT_ENCODER_NAME unet_name = UNET_NAME @@ -94,12 +93,28 @@ def load_lora_weights( Parameters: pretrained_model_name_or_path_or_dict (`str` or `os.PathLike` or `dict`): - See [`~loaders.LoraLoaderMixin.lora_state_dict`]. + A string (model id of a pretrained model hosted on the Hub), a path to a directory containing the model + weights saved with [`ModelMixin.save_pretrained`], or a [torch state + dict](https://pytorch.org/tutorials/beginner/saving_loading_models.html#what-is-a-state-dict). kwargs (`dict`, *optional*): See [`~loaders.LoraLoaderMixin.lora_state_dict`]. 
adapter_name (`str`, *optional*): - Adapter name to be used for referencing the loaded adapter model. If not specified, it will use - `default_{i}` where i is the total number of adapters being loaded. + Name for referencing the loaded adapter model. If not specified, it will use `default_{i}` where `i` is + the total number of adapters being loaded. + + Example: + + ```py + from diffusers import DiffusionPipeline + import torch + + pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to( + "cuda" + ) + pipeline.load_lora_weights( + "Yntec/pineappleAnimeMix", weight_name="pineappleAnimeMix_pineapple10.1.safetensors", adapter_name="anime" + ) + ``` """ # First, ensure that the checkpoint is a compatible one and can be successfully loaded. state_dict, network_alphas = self.lora_state_dict(pretrained_model_name_or_path_or_dict, **kwargs) @@ -141,7 +156,8 @@ def lora_state_dict( - A1111 formatted LoRA checkpoints are supported in a limited capacity. This function is experimental and might change in the future. + A1111 formatted LoRA checkpoints are supported in a limited capacity. This function is experimental and might + change in the future. @@ -195,7 +211,9 @@ def lora_state_dict( from diffusers import DiffusionPipeline import torch - pipeline = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda") + pipeline = DiffusionPipeline.from_pretrained( + "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16 + ).to("cuda") pipeline.lora_state_dict("nerijs/pixel-art-xl") ``` """ @@ -474,25 +492,25 @@ def load_lora_into_unet( cls, state_dict, network_alphas, unet, low_cpu_mem_usage=None, adapter_name=None, _pipeline=None ): """ - This will load the LoRA layers specified in `state_dict` into `unet`. + Load LoRA layers specified in `state_dict` into `unet`. 
Parameters: state_dict (`dict`): - A standard state dict containing the lora layer parameters. The keys can either be indexed directly - into the unet or prefixed with an additional `unet` which can be used to distinguish between text - encoder lora layers. + A standard state dict containing the LoRA layer parameters. The keys can either be indexed directly + into the `unet` or prefixed with an additional `unet`, which can be used to distinguish them from text + encoder LoRA layers. network_alphas (`Dict[str, float]`): See `LoRALinearLayer` for more details. unet (`UNet2DConditionModel`): The UNet model to load the LoRA layers into. low_cpu_mem_usage (`bool`, *optional*, defaults to `True` if torch version >= 1.9.0 else `False`): - Speed up model loading only loading the pretrained weights and not initializing the weights. This also - tries to not use more than 1x model size in CPU memory (including peak memory) while loading the model. - Only supported for PyTorch >= 1.9.0. If you are using an older version of PyTorch, setting this - argument to `True` will raise an error. + Only load and not initialize the pretrained weights. This can speed up model loading and also tries to + not use more than 1x model size in CPU memory (including peak memory) while loading the model. Only + supported for PyTorch >= 1.9.0. If you are using an older version of PyTorch, setting this argument to + `True` will raise an error. adapter_name (`str`, *optional*): - Adapter name to be used for referencing the loaded adapter model. If not specified, it will use - `default_{i}` where i is the total number of adapters being loaded. + Name for referencing the loaded adapter model. If not specified, it will use `default_{i}` where `i` is + the total number of adapters being loaded. 
""" low_cpu_mem_usage = low_cpu_mem_usage if low_cpu_mem_usage is not None else _LOW_CPU_MEM_USAGE_DEFAULT # If the serialization format is new (introduced in https://github.com/huggingface/diffusers/pull/2918), @@ -586,12 +604,12 @@ def load_lora_into_text_encoder( _pipeline=None, ): """ - This will load the LoRA layers specified in `state_dict` into `text_encoder` + Load LoRA layers specified in `state_dict` into `text_encoder`. Parameters: state_dict (`dict`): - A standard state dict containing the lora layer parameters. The key should be prefixed with an - additional `text_encoder` to distinguish between unet lora layers. + A standard state dict containing the LoRA layer parameters. The key should be prefixed with an + additional `text_encoder` to distinguish between UNet LoRA layers. network_alphas (`Dict[str, float]`): See `LoRALinearLayer` for more details. text_encoder (`CLIPTextModel`): @@ -599,13 +617,12 @@ def load_lora_into_text_encoder( prefix (`str`): Expected prefix of the `text_encoder` in the `state_dict`. lora_scale (`float`): - How much to scale the output of the lora linear layer before it is added with the output of the regular - lora layer. + Scale of `LoRALinearLayer`'s output before it is added with the output of the regular LoRA layer. low_cpu_mem_usage (`bool`, *optional*, defaults to `True` if torch version >= 1.9.0 else `False`): - Speed up model loading only loading the pretrained weights and not initializing the weights. This also - tries to not use more than 1x model size in CPU memory (including peak memory) while loading the model. - Only supported for PyTorch >= 1.9.0. If you are using an older version of PyTorch, setting this - argument to `True` will raise an error. + Only load and not initialize the pretrained weights. This can speedup model loading and also tries to + not use more than 1x model size in CPU memory (including peak memory) while loading the model. Only + supported for PyTorch >= 1.9.0. 
If you are using an older version of PyTorch, setting this argument to + `True` will raise an error. adapter_name (`str`, *optional*): Adapter name to be used for referencing the loaded adapter model. If not specified, it will use `default_{i}` where i is the total number of adapters being loaded. @@ -905,23 +922,11 @@ def save_lora_weights( need to call this function on all processes. In this case, set `is_main_process=True` only on the main process to avoid race conditions. save_function (`Callable`): - The function to use to save the state dict. Useful during distributed training when you need to - replace `torch.save` with another method. Can be configured with the environment variable + The function to use to save the state dict. Useful during distributed training when you need to replace + `torch.save` with another method. Can be configured with the environment variable `DIFFUSERS_SAVE_MODE`. safe_serialization (`bool`, *optional*, defaults to `True`): Whether to save the model using `safetensors` or with `pickle`. - - Example: - - ```py - from diffusers import DiffusionPipeline - import torch - - pipeline = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda") - pipeline.load_lora_weights("nerijs/pixel-art-xl", weight_name="pixel-art-xl.safetensors", adapter_name="pixel") - pipeline.fuse_lora(lora_scale=0.7) - pipeline.save_lora_weights("your-username/model") - ``` """ # Create a flat dictionary. 
state_dict = {} @@ -1165,7 +1170,9 @@ def unload_lora_weights(self): from diffusers import DiffusionPipeline import torch - pipeline = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda") + pipeline = DiffusionPipeline.from_pretrained( + "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16 + ).to("cuda") pipeline.load_lora_weights("nerijs/pixel-art-xl", weight_name="pixel-art-xl.safetensors", adapter_name="pixel") pipeline.unload_lora_weights() ``` @@ -1212,7 +1219,8 @@ def fuse_lora( lora_scale (`float`, defaults to 1.0): Controls LoRA influence on the outputs. safe_fusing (`bool`, defaults to `False`): - Whether to check fused weights for `NaN` values before fusing and if values are `NaN`, then don't fuse them. + Whether to check fused weights for `NaN` values before fusing and if values are `NaN`, then don't fuse + them. Example: @@ -1220,7 +1228,9 @@ def fuse_lora( from diffusers import DiffusionPipeline import torch - pipeline = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda") + pipeline = DiffusionPipeline.from_pretrained( + "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16 + ).to("cuda") pipeline.load_lora_weights("nerijs/pixel-art-xl", weight_name="pixel-art-xl.safetensors", adapter_name="pixel") pipeline.fuse_lora(lora_scale=0.7) ``` @@ -1274,7 +1284,6 @@ def unfuse_lora(self, unfuse_unet: bool = True, unfuse_text_encoder: bool = True r""" Unfuse the LoRA parameters from the original parameters in their corresponding blocks. - This is an experimental API. 
@@ -1293,7 +1302,9 @@ def unfuse_lora(self, unfuse_unet: bool = True, unfuse_text_encoder: bool = True from diffusers import DiffusionPipeline import torch - pipeline = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda") + pipeline = DiffusionPipeline.from_pretrained( + "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16 + ).to("cuda") pipeline.load_lora_weights("nerijs/pixel-art-xl", weight_name="pixel-art-xl.safetensors", adapter_name="pixel") pipeline.fuse_lora(lora_scale=0.7) pipeline.unfuse_lora() @@ -1349,14 +1360,14 @@ def set_adapters_for_text_encoder( text_encoder_weights: List[float] = None, ): """ - Only activate an adapter for the text encoder. + Set the currently active adapter for use in the text encoder. Args: adapter_names (`List[str]` or `str`): The adapter to activate. text_encoder (`torch.nn.Module`, *optional*): - The text encoder module to activate the adapter layers for. If `None`, it will try to get the `text_encoder` - attribute. + The text encoder module to activate the adapter layers for. If `None`, it will try to get the + `text_encoder` attribute. text_encoder_weights (`List[float]`, *optional*): The weights to use for the text encoder. If `None`, the weights are set to `1.0` for all the adapters. 
@@ -1366,9 +1377,15 @@ def set_adapters_for_text_encoder( from diffusers import DiffusionPipeline import torch - pipeline = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda") + pipeline = DiffusionPipeline.from_pretrained( + "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16 + ).to("cuda") pipeline.load_lora_weights("nerijs/pixel-art-xl", weight_name="pixel-art-xl.safetensors", adapter_name="pixel") + pipeline.load_lora_weights( + "jbilcke-hf/sdxl-cinematic-1", weight_name="pytorch_lora_weights.safetensors", adapter_name="cinematic" + ) pipeline.set_adapters_for_text_encoder("pixel") + ``` """ if not USE_PEFT_BACKEND: raise ValueError("PEFT backend is required for this method.") @@ -1409,7 +1426,9 @@ def disable_lora_for_text_encoder(self, text_encoder: Optional["PreTrainedModel" from diffusers import DiffusionPipeline import torch - pipeline = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda") + pipeline = DiffusionPipeline.from_pretrained( + "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16 + ).to("cuda") pipeline.load_lora_weights("nerijs/pixel-art-xl", weight_name="pixel-art-xl.safetensors", adapter_name="pixel") pipeline.disable_lora_for_text_encoder() ``` @@ -1437,7 +1456,9 @@ def enable_lora_for_text_encoder(self, text_encoder: Optional["PreTrainedModel"] from diffusers import DiffusionPipeline import torch - pipeline = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda") + pipeline = DiffusionPipeline.from_pretrained( + "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16 + ).to("cuda") pipeline.load_lora_weights("nerijs/pixel-art-xl", weight_name="pixel-art-xl.safetensors", adapter_name="pixel") pipeline.enable_lora_for_text_encoder() ``` @@ -1503,7 +1524,9 @@ def delete_adapters(self, adapter_names: 
Union[List[str], str]): from diffusers import DiffusionPipeline import torch - pipeline = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda") + pipeline = DiffusionPipeline.from_pretrained( + "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16 + ).to("cuda") pipeline.load_lora_weights("nerijs/pixel-art-xl", weight_name="pixel-art-xl.safetensors", adapter_name="pixel") pipeline.delete_adapters("pixel") ``` @@ -1568,7 +1591,10 @@ def get_list_adapters(self) -> Dict[str, List[str]]: pipeline = DiffusionPipeline.from_pretrained( "stabilityai/stable-diffusion-xl-base-1.0", ).to("cuda") - pipeline.load_lora_weights("CiroN2022/toy-face", weight_name="toy_face_sdxl.safetensors", adapter_name="toy") + pipeline.load_lora_weights( + "jbilcke-hf/sdxl-cinematic-1", weight_name="pytorch_lora_weights.safetensors", adapter_name="cinematic" + ) + pipeline.load_lora_weights("nerijs/pixel-art-xl", weight_name="pixel-art-xl.safetensors", adapter_name="pixel") pipeline.get_list_adapters() ``` """ @@ -1592,8 +1618,8 @@ def get_list_adapters(self) -> Dict[str, List[str]]: def set_lora_device(self, adapter_names: List[str], device: Union[torch.device, str, int]) -> None: """ - Move a LoRA to a target device. Useful for offloading a LoRA to the CPU in case - you want to load multiple adapters and free some GPU memory. + Move a LoRA to a target device. Useful for offloading a LoRA to the CPU in case you want to load multiple + adapters and free some GPU memory. 
Args: adapter_names (`List[str]`): @@ -1605,12 +1631,14 @@ def set_lora_device(self, adapter_names: List[str], device: Union[torch.device, ```py from diffusers import DiffusionPipeline + import torch pipeline = DiffusionPipeline.from_pretrained( "stabilityai/stable-diffusion-xl-base-1.0", ).to("cuda") pipeline.load_lora_weights("nerijs/pixel-art-xl", weight_name="pixel-art-xl.safetensors", adapter_name="pixel") pipeline.set_lora_device(["pixel"], device="cuda") + ``` """ if not USE_PEFT_BACKEND: raise ValueError("PEFT backend is required for this method.") @@ -1642,7 +1670,7 @@ def set_lora_device(self, adapter_names: List[str], device: Union[torch.device, class StableDiffusionXLLoraLoaderMixin(LoraLoaderMixin): - """This class overrides `LoraLoaderMixin` with LoRA loading/saving code that's specific to SDXL""" + """This class overrides [`LoraLoaderMixin`] with LoRA loading/saving code that's specific to SDXL.""" # Overrride to properly handle the loading and unloading of the additional text encoder. def load_lora_weights( @@ -1667,12 +1695,26 @@ def load_lora_weights( Parameters: pretrained_model_name_or_path_or_dict (`str` or `os.PathLike` or `dict`): - See [`~loaders.LoraLoaderMixin.lora_state_dict`]. - adapter_name (`str`, *optional*): - Adapter name to be used for referencing the loaded adapter model. If not specified, it will use - `default_{i}` where i is the total number of adapters being loaded. + A string (model id of a pretrained model hosted on the Hub), a path to a directory containing the model + weights saved with [`ModelMixin.save_pretrained`], or a [torch state + dict](https://pytorch.org/tutorials/beginner/saving_loading_models.html#what-is-a-state-dict). kwargs (`dict`, *optional*): See [`~loaders.LoraLoaderMixin.lora_state_dict`]. + adapter_name (`str`, *optional*): + Name for referencing the loaded adapter model. If not specified, it will use `default_{i}` where `i` is + the total number of adapters being loaded. 
+ + Example: + + ```py + from diffusers import StableDiffusionXLPipeline + import torch + + pipeline = StableDiffusionXLPipeline.from_pretrained( + "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16 + ).to("cuda") + pipeline.load_lora_weights("nerijs/pixel-art-xl", weight_name="pixel-art-xl.safetensors", adapter_name="pixel") + ``` """ # We could have accessed the unet config from `lora_state_dict()` too. We pass # it here explicitly to be able to tell that it's coming from an SDXL diff --git a/src/diffusers/loaders/single_file.py b/src/diffusers/loaders/single_file.py index 8a4f1a0541fd..696be5fd3a65 100644 --- a/src/diffusers/loaders/single_file.py +++ b/src/diffusers/loaders/single_file.py @@ -288,12 +288,15 @@ def from_single_file(cls, pretrained_model_link_or_path, **kwargs): class FromOriginalVAEMixin: + """ + Load pretrained ControlNet weights saved in the `.ckpt` or `.safetensors` format into an [`AutoencoderKL`]. + """ + @classmethod def from_single_file(cls, pretrained_model_link_or_path, **kwargs): r""" - Instantiate a [`AutoencoderKL`] from pretrained controlnet weights saved in the original `.ckpt` or - `.safetensors` format. The pipeline is format. The pipeline is set in evaluation mode (`model.eval()`) by - default. + Instantiate a [`AutoencoderKL`] from pretrained ControlNet weights saved in the original `.ckpt` or + `.safetensors` format. The pipeline is set in evaluation mode (`model.eval()`) by default. Parameters: pretrained_model_link_or_path (`str` or `os.PathLike`, *optional*): @@ -348,8 +351,8 @@ def from_single_file(cls, pretrained_model_link_or_path, **kwargs): - Make sure to pass both `image_size` and `scaling_factor` to `from_single_file()` if you want to load - a VAE that does accompany a stable diffusion model of v2 or higher or SDXL. + Make sure to pass both `image_size` and `scaling_factor` to `from_single_file()` if you're loading + a VAE from SDXL or a Stable Diffusion v2 model or higher. 
@@ -482,10 +485,14 @@ def from_single_file(cls, pretrained_model_link_or_path, **kwargs): class FromOriginalControlnetMixin: + """ + Load pretrained ControlNet weights saved in the `.ckpt` or `.safetensors` format into a [`ControlNetModel`]. + """ + @classmethod def from_single_file(cls, pretrained_model_link_or_path, **kwargs): r""" - Instantiate a [`ControlNetModel`] from pretrained controlnet weights saved in the original `.ckpt` or + Instantiate a [`ControlNetModel`] from pretrained ControlNet weights saved in the original `.ckpt` or `.safetensors` format. The pipeline is set in evaluation mode (`model.eval()`) by default. Parameters: diff --git a/src/diffusers/loaders/textual_inversion.py b/src/diffusers/loaders/textual_inversion.py index 4890810d49a6..e36f03437a45 100644 --- a/src/diffusers/loaders/textual_inversion.py +++ b/src/diffusers/loaders/textual_inversion.py @@ -116,7 +116,7 @@ def load_textual_inversion_state_dicts(pretrained_model_name_or_paths, **kwargs) class TextualInversionLoaderMixin: r""" - Load textual inversion tokens and embeddings to the tokenizer and text encoder. + Load Textual Inversion tokens and embeddings to the tokenizer and text encoder. """ def maybe_convert_prompt(self, prompt: Union[str, List[str]], tokenizer: "PreTrainedTokenizer"): # noqa: F821 @@ -276,7 +276,7 @@ def load_textual_inversion( **kwargs, ): r""" - Load textual inversion embeddings into the text encoder of [`StableDiffusionPipeline`] (both 🤗 Diffusers and + Load Textual Inversion embeddings into the text encoder of [`StableDiffusionPipeline`] (both 🤗 Diffusers and Automatic1111 formats are supported).
Parameters: @@ -335,7 +335,7 @@ def load_textual_inversion( Example: - To load a textual inversion embedding vector in 🤗 Diffusers format: + To load a Textual Inversion embedding vector in 🤗 Diffusers format: ```py from diffusers import StableDiffusionPipeline @@ -352,7 +352,7 @@ def load_textual_inversion( image.save("cat-backpack.png") ``` - To load a textual inversion embedding vector in Automatic1111 format, make sure to download the vector first + To load a Textual Inversion embedding vector in Automatic1111 format, make sure to download the vector first (for example from [civitAI](https://civitai.com/models/3036?modelVersionId=9857)) and then load the vector locally: diff --git a/src/diffusers/loaders/unet.py b/src/diffusers/loaders/unet.py index 3f63e73d9cec..0d759e6cc349 100644 --- a/src/diffusers/loaders/unet.py +++ b/src/diffusers/loaders/unet.py @@ -53,6 +53,10 @@ class UNet2DConditionLoadersMixin: + """ + Load LoRA layers into a [`UNet2DConditionModel`]. + """ + text_encoder_name = TEXT_ENCODER_NAME unet_name = UNET_NAME @@ -107,6 +111,19 @@ def load_attn_procs(self, pretrained_model_name_or_path_or_dict: Union[str, Dict guarantee the timeliness or safety of the source, and you should refer to the mirror site for more information.
+ Example: + + ```py + from diffusers import AutoPipelineForText2Image + import torch + + pipeline = AutoPipelineForText2Image.from_pretrained( + "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16 + ).to("cuda") + pipeline.unet.load_attn_procs( + "jbilcke-hf/sdxl-cinematic-1", weight_name="pytorch_lora_weights.safetensors", adapter_name="cinematic" + ) + ``` """ from ..models.attention_processor import CustomDiffusionAttnProcessor from ..models.lora import LoRACompatibleConv, LoRACompatibleLinear, LoRAConv2dLayer, LoRALinearLayer @@ -393,12 +410,12 @@ def save_attn_procs( **kwargs, ): r""" - Save an attention processor to a directory so that it can be reloaded using the + Save attention processor layers to a directory so that it can be reloaded with the [`~loaders.UNet2DConditionLoadersMixin.load_attn_procs`] method. Arguments: save_directory (`str` or `os.PathLike`): - Directory to save an attention processor to. Will be created if it doesn't exist. + Directory to save an attention processor to (will be created if it doesn't exist). is_main_process (`bool`, *optional*, defaults to `True`): Whether the process calling this is the main process or not. Useful during distributed training and you need to call this function on all processes. In this case, set `is_main_process=True` only on the main @@ -408,7 +425,7 @@ def save_attn_procs( replace `torch.save` with another method. Can be configured with the environment variable `DIFFUSERS_SAVE_MODE`. safe_serialization (`bool`, *optional*, defaults to `True`): - Whether to save the model using `safetensors` or the traditional PyTorch way with `pickle`. + Whether to save the model using `safetensors` or with `pickle`. """ from ..models.attention_processor import ( CustomDiffusionAttnProcessor, @@ -507,14 +524,30 @@ def set_adapters( weights: Optional[Union[List[float], float]] = None, ): """ - Sets the adapter layers for the unet. + Set the currently active adapters for use in the UNet. 
Args: adapter_names (`List[str]` or `str`): The names of the adapters to use. - weights (`Union[List[float], float]`, *optional*): + adapter_weights (`Union[List[float], float]`, *optional*): The adapter(s) weights to use with the UNet. If `None`, the weights are set to `1.0` for all the adapters. + + Example: + + ```py + from diffusers import AutoPipelineForText2Image + import torch + + pipeline = AutoPipelineForText2Image.from_pretrained( + "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16 + ).to("cuda") + pipeline.load_lora_weights( + "jbilcke-hf/sdxl-cinematic-1", weight_name="pytorch_lora_weights.safetensors", adapter_name="cinematic" + ) + pipeline.load_lora_weights("nerijs/pixel-art-xl", weight_name="pixel-art-xl.safetensors", adapter_name="pixel") + pipeline.set_adapters(["cinematic", "pixel"], adapter_weights=[0.5, 0.5]) + ``` """ if not USE_PEFT_BACKEND: raise ValueError("PEFT backend is required for `set_adapters()`.") @@ -535,7 +568,22 @@ def set_adapters( def disable_lora(self): """ - Disables the active LoRA layers for the unet. + Disable the UNet's active LoRA layers. + + Example: + + ```py + from diffusers import AutoPipelineForText2Image + import torch + + pipeline = AutoPipelineForText2Image.from_pretrained( + "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16 + ).to("cuda") + pipeline.load_lora_weights( + "jbilcke-hf/sdxl-cinematic-1", weight_name="pytorch_lora_weights.safetensors", adapter_name="cinematic" + ) + pipeline.disable_lora() + ``` """ if not USE_PEFT_BACKEND: raise ValueError("PEFT backend is required for this method.") @@ -543,7 +591,22 @@ def disable_lora(self): def enable_lora(self): """ - Enables the active LoRA layers for the unet. + Enable the UNet's active LoRA layers. 
+ + Example: + + ```py + from diffusers import AutoPipelineForText2Image + import torch + + pipeline = AutoPipelineForText2Image.from_pretrained( + "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16 + ).to("cuda") + pipeline.load_lora_weights( + "jbilcke-hf/sdxl-cinematic-1", weight_name="pytorch_lora_weights.safetensors", adapter_name="cinematic" + ) + pipeline.enable_lora() + ``` """ if not USE_PEFT_BACKEND: raise ValueError("PEFT backend is required for this method.") @@ -551,10 +614,26 @@ def enable_lora(self): def delete_adapters(self, adapter_names: Union[List[str], str]): """ + Delete an adapter's LoRA layers from the UNet. + Args: - Deletes the LoRA layers of `adapter_name` for the unet. adapter_names (`Union[List[str], str]`): - The names of the adapter to delete. Can be a single string or a list of strings + The names (single string or list of strings) of the adapters to delete. + + Example: + + ```py + from diffusers import AutoPipelineForText2Image + import torch + + pipeline = AutoPipelineForText2Image.from_pretrained( + "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16 + ).to("cuda") + pipeline.load_lora_weights( + "jbilcke-hf/sdxl-cinematic-1", weight_name="pytorch_lora_weights.safetensors", adapter_name="cinematic" + ) + pipeline.delete_adapters("cinematic") + ``` """ if not USE_PEFT_BACKEND: raise ValueError("PEFT backend is required for this method.") From 314f892746007686e1592ad3e8e0cd1c4e650fec Mon Sep 17 00:00:00 2001 From: Steven Liu Date: Thu, 16 Nov 2023 16:31:20 -0800 Subject: [PATCH 5/7] add link to loralinearlayer --- src/diffusers/loaders/lora.py | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/src/diffusers/loaders/lora.py b/src/diffusers/loaders/lora.py index 1968bdfb8f02..bbeb2a6b84c4 100644 --- a/src/diffusers/loaders/lora.py +++ b/src/diffusers/loaders/lora.py @@ -500,7 +500,9 @@ def load_lora_into_unet( into the `unet` or prefixed with an additional `unet`, which can
be used to distinguish between text encoder LoRA layers. network_alphas (`Dict[str, float]`): - See `LoRALinearLayer` for more details. + See + [`LoRALinearLayer`](https://github.com/huggingface/diffusers/blob/c697f524761abd2314c030221a3ad2f7791eab4e/src/diffusers/models/lora.py#L182) + for more details. unet (`UNet2DConditionModel`): The UNet model to load the LoRA layers into. low_cpu_mem_usage (`bool`, *optional*, defaults to `True` if torch version >= 1.9.0 else `False`): @@ -611,7 +613,9 @@ def load_lora_into_text_encoder( A standard state dict containing the LoRA layer parameters. The key should be prefixed with an additional `text_encoder` to distinguish between UNet LoRA layers. network_alphas (`Dict[str, float]`): - See `LoRALinearLayer` for more details. + See + [`LoRALinearLayer`](https://github.com/huggingface/diffusers/blob/c697f524761abd2314c030221a3ad2f7791eab4e/src/diffusers/models/lora.py#L182) + for more details. text_encoder (`CLIPTextModel`): The text encoder model to load the LoRA layers into. prefix (`str`): From 0cd100664beed41b13b8c404bdb78dbdfe231997 Mon Sep 17 00:00:00 2001 From: Steven Liu Date: Fri, 17 Nov 2023 12:50:10 -0800 Subject: [PATCH 6/7] feedback --- docs/source/en/api/loaders/lora.md | 2 +- docs/source/en/api/loaders/unet.md | 2 +- src/diffusers/loaders/lora.py | 49 +++++++++++++++--------------- src/diffusers/loaders/unet.py | 14 +++++++++ 4 files changed, 40 insertions(+), 27 deletions(-) diff --git a/docs/source/en/api/loaders/lora.md b/docs/source/en/api/loaders/lora.md index 0739e3b477c3..05ff11afc5d4 100644 --- a/docs/source/en/api/loaders/lora.md +++ b/docs/source/en/api/loaders/lora.md @@ -15,7 +15,7 @@ specific language governing permissions and limitations under the License. LoRA is a fast and lightweight training method that inserts and trains a significantly smaller number of parameters instead of all the model parameters. 
This produces a smaller file (~100 MBs) and makes it easier to quickly train a model to learn a new concept. LoRA weights are typically loaded into the UNet, text encoder or both. There are two classes for loading LoRA weights: - [`LoraLoaderMixin`] provides functions for loading and unloading, fusing and unfusing, enabling and disabling, and more functions for managing LoRA weights. This class can be used with any model. -- [`StableDiffusionXLLoraLoaderMixin`] is a Stable Diffusion (SDXL) version of the [`LoraLoaderMixin`] class for loading and saving LoRA weights. It can only be used with the SDXL model. +- [`StableDiffusionXLLoraLoaderMixin`] is a [Stable Diffusion XL (SDXL)](../../api/pipelines/stable_diffusion/stable_diffusion_xl) version of the [`LoraLoaderMixin`] class for loading and saving LoRA weights. It can only be used with the SDXL model. diff --git a/docs/source/en/api/loaders/unet.md b/docs/source/en/api/loaders/unet.md index ffa8f13e372d..df896a065eb3 100644 --- a/docs/source/en/api/loaders/unet.md +++ b/docs/source/en/api/loaders/unet.md @@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License. # UNet -Some training methods - like LoRA and Custom Diffusion - target the UNet's attention processor layers. Instead of training all of a model's parameters, only a subset of the parameters are trained, which is faster and more efficient. This class is useful if you're *only* loading weights into a UNet. If you need to load weights into the text encoder or a text encoder and UNet, try using the [`~loaders.LoraLoaderMixin.load_lora_weights`] function instead. +Some training methods - like LoRA and Custom Diffusion - typically target the UNet's attention layers, but these training methods can also target other non-attention layers. Instead of training all of a model's parameters, only a subset of the parameters are trained, which is faster and more efficient. This class is useful if you're *only* loading weights into a UNet.
If you need to load weights into the text encoder or a text encoder and UNet, try using the [`~loaders.LoraLoaderMixin.load_lora_weights`] function instead. The [`UNet2DConditionLoadersMixin`] class provides functions for loading and saving weights, fusing and unfusing LoRAs, disabling and enabling LoRAs, and setting and deleting adapters. diff --git a/src/diffusers/loaders/lora.py b/src/diffusers/loaders/lora.py index bbeb2a6b84c4..d260da9c1298 100644 --- a/src/diffusers/loaders/lora.py +++ b/src/diffusers/loaders/lora.py @@ -94,13 +94,13 @@ def load_lora_weights( Parameters: pretrained_model_name_or_path_or_dict (`str` or `os.PathLike` or `dict`): A string (model id of a pretrained model hosted on the Hub), a path to a directory containing the model - weights saved with [`ModelMixin.save_pretrained`], or a [torch state + weights, or a [torch state dict](https://pytorch.org/tutorials/beginner/saving_loading_models.html#what-is-a-state-dict). kwargs (`dict`, *optional*): See [`~loaders.LoraLoaderMixin.lora_state_dict`]. adapter_name (`str`, *optional*): Name for referencing the loaded adapter model. If not specified, it will use `default_{i}` where `i` is - the total number of adapters being loaded. + the total number of adapters being loaded. Must have PEFT installed to use. Example: @@ -154,21 +154,13 @@ def lora_state_dict( r""" Return state dict and network alphas of the LoRA weights. - - - A1111 formatted LoRA checkpoints are supported in a limited capacity. This function is experimental and might - change in the future. - - - Parameters: pretrained_model_name_or_path_or_dict (`str` or `os.PathLike` or `dict`): Can be either: - A string, the *model id* (for example `google/ddpm-celebahq-256`) of a pretrained model hosted on the Hub. - - A path to a *directory* (for example `./my_model_directory`) containing the model weights saved - with [`ModelMixin.save_pretrained`]. 
+ - A path to a *directory* (for example `./my_model_directory`) containing the model weights. - A [torch state dict](https://pytorch.org/tutorials/beginner/saving_loading_models.html#what-is-a-state-dict). @@ -204,18 +196,6 @@ def lora_state_dict( Mirror source to resolve accessibility issues if you're downloading a model in China. We do not guarantee the timeliness or safety of the source, and you should refer to the mirror site for more information. - - Example: - - ```py - from diffusers import DiffusionPipeline - import torch - - pipeline = DiffusionPipeline.from_pretrained( - "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16 - ).to("cuda") - pipeline.lora_state_dict("nerijs/pixel-art-xl") - ``` """ # Load the main state dict first which has the LoRA layers for either of # UNet and text encoder or both. @@ -931,6 +911,25 @@ def save_lora_weights( `DIFFUSERS_SAVE_MODE`. safe_serialization (`bool`, *optional*, defaults to `True`): Whether to save the model using `safetensors` or with `pickle`. + + Example: + + ```py + from diffusers import StableDiffusionXLPipeline + from peft.utils import get_peft_model_state_dict + import torch + + pipeline = StableDiffusionXLPipeline.from_pretrained( + "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16 + ).to("cuda") + pipeline.load_lora_weights("nerijs/pixel-art-xl", weight_name="pixel-art-xl.safetensors", adapter_name="pixel") + pipeline.fuse_lora() + + # get and save unet state dict + unet_state_dict = get_peft_model_state_dict(pipeline.unet, adapter_name="pixel") + pipeline.save_lora_weights("fused-model", unet_lora_layers=unet_state_dict) + pipeline.load_lora_weights("fused-model", weight_name="pytorch_lora_weights.safetensors") + ``` """ # Create a flat dictionary. 
state_dict = {} @@ -1700,13 +1699,13 @@ def load_lora_weights( Parameters: pretrained_model_name_or_path_or_dict (`str` or `os.PathLike` or `dict`): A string (model id of a pretrained model hosted on the Hub), a path to a directory containing the model - weights saved with [`ModelMixin.save_pretrained`], or a [torch state + weights, or a [torch state dict](https://pytorch.org/tutorials/beginner/saving_loading_models.html#what-is-a-state-dict). kwargs (`dict`, *optional*): See [`~loaders.LoraLoaderMixin.lora_state_dict`]. adapter_name (`str`, *optional*): Name for referencing the loaded adapter model. If not specified, it will use `default_{i}` where `i` is - the total number of adapters being loaded. + the total number of adapters being loaded. Must have PEFT installed to use. Example: diff --git a/src/diffusers/loaders/unet.py b/src/diffusers/loaders/unet.py index 0d759e6cc349..9555ac9e7d8b 100644 --- a/src/diffusers/loaders/unet.py +++ b/src/diffusers/loaders/unet.py @@ -426,6 +426,20 @@ def save_attn_procs( `DIFFUSERS_SAVE_MODE`. safe_serialization (`bool`, *optional*, defaults to `True`): Whether to save the model using `safetensors` or with `pickle`. 
+ + Example: + + ```py + import torch + from diffusers import DiffusionPipeline + + pipeline = DiffusionPipeline.from_pretrained( + "CompVis/stable-diffusion-v1-4", + torch_dtype=torch.float16, + ).to("cuda") + pipeline.unet.load_attn_procs("path-to-save-model", weight_name="pytorch_custom_diffusion_weights.bin") + pipeline.unet.save_attn_procs("path-to-save-model", weight_name="pytorch_custom_diffusion_weights.bin") + ``` """ from ..models.attention_processor import ( CustomDiffusionAttnProcessor, From 9769cf12ef70cadce1fc4272c371d3bab38b1233 Mon Sep 17 00:00:00 2001 From: Steven Liu Date: Fri, 17 Nov 2023 13:25:02 -0800 Subject: [PATCH 7/7] fix --- docs/source/en/api/loaders/single_file.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/en/api/loaders/single_file.md b/docs/source/en/api/loaders/single_file.md index 6a07229bb665..52e44606455b 100644 --- a/docs/source/en/api/loaders/single_file.md +++ b/docs/source/en/api/loaders/single_file.md @@ -16,7 +16,7 @@ Diffusers supports loading pretrained pipeline (or model) weights stored in a si - [`FromSingleFileMixin`] supports loading pretrained pipeline weights stored in a single file, which can either be a `ckpt` or `safetensors` file. - [`FromOriginalVAEMixin`] supports loading a pretrained [`AutoencoderKL`] from pretrained ControlNet weights stored in a single file, which can either be a `ckpt` or `safetensors` file. -- [`FromOriginalControlnetMixin`] supports loading pretrained [`ControlNet`] weights stored in a single file, which can either be a `ckpt` or `safetensors` file. +- [`FromOriginalControlnetMixin`] supports loading pretrained ControlNet weights stored in a single file, which can either be a `ckpt` or `safetensors` file.
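
Note on the `set_adapters` docstrings in this series: they state that `adapter_names` may be a single string or a list of strings, and that weights left as `None` are set to `1.0` for all adapters. The sketch below illustrates that convention in plain Python. `normalize_adapters` is a hypothetical helper written only for this note; it is not a Diffusers or PEFT function, and broadcasting a single float to every adapter is an assumption based on the documented `Union[List[float], float]` type, so the library's real behavior may differ.

```python
from typing import List, Optional, Tuple, Union


def normalize_adapters(
    adapter_names: Union[List[str], str],
    weights: Optional[Union[List[float], float]] = None,
) -> Tuple[List[str], List[float]]:
    """Hypothetical helper (not a Diffusers API): pair adapter names with weights."""
    # A single adapter name is treated as a one-element list.
    names = [adapter_names] if isinstance(adapter_names, str) else list(adapter_names)

    if weights is None:
        # Documented default: every adapter gets a weight of 1.0.
        resolved = [1.0] * len(names)
    elif isinstance(weights, (int, float)):
        # Assumption: a single number applies to every adapter.
        resolved = [float(weights)] * len(names)
    else:
        resolved = [float(w) for w in weights]

    if len(resolved) != len(names):
        raise ValueError(
            f"Got {len(names)} adapter name(s) but {len(resolved)} weight(s)."
        )
    return names, resolved


if __name__ == "__main__":
    print(normalize_adapters("pixel"))
    print(normalize_adapters(["cinematic", "pixel"], [0.5, 0.5]))
```

Under these assumptions, a call such as `pipeline.set_adapters(["cinematic", "pixel"], adapter_weights=[0.5, 0.5])` from the docstring example corresponds to the second call above, with one weight per adapter name.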