## **2. Load Pipeline**
> Original Source: https://huggingface.co/docs/diffusers/v0.33.1/en/using-diffusers/loading

```
> Load Pipelines
> Load Community Pipelines and Components
> Load Schedulers and Models
> Model Files and Layout
> Load Adapter
```

In [3]:
import os
import numpy as np
import torch
from tqdm.auto import tqdm
import matplotlib.pyplot as plt
from PIL import Image
import gc

from diffusers import DiffusionPipeline
from diffusers import StableDiffusionPipeline, StableDiffusionSAGPipeline
from diffusers import HunyuanVideoPipeline
from diffusers import StableDiffusionXLPipeline, HeunDiscreteScheduler, AutoencoderKL
from diffusers import AnimateDiffPipeline, MotionAdapter, DDIMScheduler
from diffusers import LMSDiscreteScheduler, EulerDiscreteScheduler, EulerAncestralDiscreteScheduler, DPMSolverMultistepScheduler
from diffusers import FlaxStableDiffusionPipeline, FlaxDPMSolverMultistepScheduler
from diffusers import UNet2DConditionModel, UNet2DModel, StableDiffusionControlNetPipeline, ControlNetModel
from diffusers import AutoPipelineForText2Image

from diffusers.utils import load_image
from accelerate.utils import compute_module_sizes
from diffusers.utils import export_to_gif
from transformers import CLIPVisionModelWithProjection

import jax
from flax.jax_utils import replicate
from flax.training.common_utils import shard
from huggingface_hub import export_folder_as_dduf

----

### **Load Pipelines**
- Diffusion systems consist of multiple components like parameterized models and schedulers that interact in complex ways.
  - That is why we designed the `DiffusionPipeline` to wrap the complexity of the entire diffusion system into an easy-to-use API.
  - At the same time, the `DiffusionPipeline` is entirely customizable so you can modify each component to build a diffusion system for your use case.


#### Load a pipeline
- There are two ways to load a pipeline for a task:
1. Load the generic `DiffusionPipeline` class and allow it to automatically detect the correct pipeline class from the checkpoint.
2. Load a specific pipeline class for a specific task.

- The `DiffusionPipeline` class is a simple and generic way to load the latest trending diffusion model from the Hub.
  - It uses the `from_pretrained()` method to automatically detect the correct pipeline class for a task from the checkpoint, downloads and caches all the required configuration and weight files, and returns a pipeline ready for inference.

In [5]:
# Generic Pipeline
pipeline = DiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", use_safetensors=True)

# Specific Pipeline
pipeline = StableDiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", use_safetensors=True)

Loading pipeline components...:   0%|          | 0/7 [00:00<?, ?it/s]

Loading pipeline components...:   0%|          | 0/7 [00:00<?, ?it/s]

#### [**👉👉 Diffusers Pipeline Memory Calculator**](https://huggingface.co/docs/diffusers/v0.33.1/en/using-diffusers/loading?pipelines=generic+pipeline#load-a-pipeline)

#### Specifying Component-Specific Data Types
- You can customize the data types for individual sub-models by passing a dictionary to the `torch_dtype` parameter.
- This allows you to load different components of a pipeline in different floating point precisions.
  - If a component is not explicitly specified in the dictionary and no `default` is provided, it is loaded with `torch.float32`.
  - For example, to load the transformer with `torch.bfloat16` and all other components with `torch.float16`, pass a dictionary mapping like the one in the next cell.

In [None]:
pipe = HunyuanVideoPipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo",
    torch_dtype={"transformer": torch.bfloat16, "default": torch.float16},
)
print(pipe.transformer.dtype, pipe.vae.dtype)  # (torch.bfloat16, torch.float16)
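
- The lookup rule for a `torch_dtype` dictionary can be sketched in plain Python. This is a toy model of the behaviour described above, not Diffusers' implementation: `resolve_dtype` is a hypothetical helper, and dtypes are written as strings so the snippet runs without PyTorch.

```python
def resolve_dtype(component, torch_dtype, fallback="float32"):
    """Return the dtype a component would be loaded in (toy model)."""
    if not isinstance(torch_dtype, dict):
        # A single dtype applies to every component.
        return torch_dtype
    if component in torch_dtype:
        # An explicit entry for the component wins.
        return torch_dtype[component]
    # Otherwise use the "default" entry, or float32 if none was given.
    return torch_dtype.get("default", fallback)


mapping = {"transformer": "bfloat16", "default": "float16"}
resolved = {name: resolve_dtype(name, mapping) for name in ("transformer", "vae", "text_encoder")}
print(resolved)

# No "default" key: unspecified components fall back to float32.
no_default = resolve_dtype("vae", {"transformer": "bfloat16"})
print(no_default)
```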

#### Local pipeline
- To load a pipeline locally, use git-lfs to manually download a checkpoint to your local disk.

```
git-lfs install
git clone https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5
```

- This creates a local folder, `./stable-diffusion-v1-5`, on your disk and you should pass its path to `from_pretrained()`.
  - The `from_pretrained()` method won’t download files from the Hub when it detects a local path, but this also means it **won’t download and cache the latest changes to a checkpoint**.

In [None]:
stable_diffusion = DiffusionPipeline.from_pretrained("./stable-diffusion-v1-5", use_safetensors=True)

#### Customize a pipeline
- Customize the default `stabilityai/stable-diffusion-xl-base-1.0` checkpoint with:
  - The `HeunDiscreteScheduler` to generate higher quality images at the expense of slower generation speed.
    - You must pass the `subfolder="scheduler"` parameter in `from_pretrained()` to load the scheduler configuration from the correct subfolder of the pipeline repository.
  - A more stable VAE that runs in fp16.

In [None]:
scheduler = HeunDiscreteScheduler.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", subfolder="scheduler")
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16, use_safetensors=True)

In [None]:
# Pass the new scheduler and VAE to the StableDiffusionXLPipeline
pipeline = StableDiffusionXLPipeline.from_pretrained(
  "stabilityai/stable-diffusion-xl-base-1.0",
  scheduler=scheduler,
  vae=vae,
  torch_dtype=torch.float16,
  variant="fp16",
  use_safetensors=True
).to("cuda")

#### Reuse a pipeline
- When you load multiple pipelines that share the same model components, it makes sense to reuse the shared components instead of reloading everything into memory again, especially if your hardware is memory-constrained.
  - You generated an image with the `StableDiffusionPipeline` but you want to improve its quality with the `StableDiffusionSAGPipeline`.
    - **Both of these pipelines share the same pretrained model**, so it’d be a waste of memory to load the same model twice.
  - You want to add a model component, like a `MotionAdapter`, to `AnimateDiffPipeline` which was instantiated from an existing `StableDiffusionPipeline`.
    - **Both pipelines share the same pretrained model**, so it’d be a waste of memory to load an entirely new pipeline again.
   
- With the `DiffusionPipeline.from_pipe()` API, you can switch between multiple pipelines to take advantage of their different features without increasing memory-usage.

In [11]:
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/load_neg_embed.png")

pipe_sd = DiffusionPipeline.from_pretrained("SG161222/Realistic_Vision_V6.0_B1_noVAE", torch_dtype=torch.float16)
pipe_sd.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
pipe_sd.set_ip_adapter_scale(0.6)
pipe_sd.to("cuda")

generator = torch.Generator(device="cpu").manual_seed(33)
out_sd = pipe_sd(
    prompt="bear eats pizza",
    negative_prompt="wrong white balance, dark, sketches,worst quality,low quality",
    ip_adapter_image=image,
    num_inference_steps=50,
    generator=generator,
).images[0]
out_sd

- For reference, you can check how much memory this process consumed.

In [None]:
def bytes_to_giga_bytes(num_bytes):
    # Convert a byte count to gibibytes (1024**3 bytes)
    return num_bytes / 1024 / 1024 / 1024

print(f"Max memory allocated: {bytes_to_giga_bytes(torch.cuda.max_memory_allocated())} GB")
# Max memory allocated: 4.406213283538818 GB

- Reuse the same pipeline components from `StableDiffusionPipeline` in `StableDiffusionSAGPipeline` with the `from_pipe()` method.
- Some pipeline methods may not function properly on new pipelines created with `from_pipe()`.
  - The `enable_model_cpu_offload()` method installs hooks on the model components based on a unique offloading sequence for each pipeline.
  - If the models are executed in a different order in the new pipeline, the CPU offloading may not work correctly.
  - To ensure everything works as expected, we recommend **re-applying a pipeline method on a new pipeline created with `from_pipe()`**.

In [None]:
pipe_sag = StableDiffusionSAGPipeline.from_pipe(
    pipe_sd
)

generator = torch.Generator(device="cpu").manual_seed(33)
out_sag = pipe_sag(
    prompt="bear eats pizza",
    negative_prompt="wrong white balance, dark, sketches,worst quality,low quality",
    ip_adapter_image=image,
    num_inference_steps=50,
    generator=generator,
    guidance_scale=1.0,
    sag_scale=0.75
).images[0]
out_sag

In [None]:
print(f"Max memory allocated: {bytes_to_giga_bytes(torch.cuda.max_memory_allocated())} GB")

#### Modify `from_pipe` components
- Pipelines loaded with `from_pipe()` can be customized with different model components or methods.
  - Whenever you modify the state of the model components, it affects all the other pipelines that share the same components.
  - If you call `unload_ip_adapter()` on the `StableDiffusionSAGPipeline`, you won’t be able to use IP-Adapter with the `StableDiffusionPipeline` because it’s been removed from their shared components.

In [13]:
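# Unloading the IP-Adapter through the SAG pipeline also removes it from pipe_sd,
# so the pipe_sd call below, which passes ip_adapter_image, is expected to fail.
pipe_sag.unload_ip_adapter()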

generator = torch.Generator(device="cpu").manual_seed(33)
out_sd = pipe_sd(
    prompt="bear eats pizza",
    negative_prompt="wrong white balance, dark, sketches,worst quality,low quality",
    ip_adapter_image=image,
    num_inference_steps=50,
    generator=generator,
).images[0]
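
- Why a change made through one pipeline shows up in the other can be illustrated with a toy example. `ToyPipeline` and `ToyUNet` are hypothetical stand-ins, not Diffusers classes; the sketch only assumes that `from_pipe()` hands the same component objects to the new pipeline instead of copying them.

```python
class ToyUNet:
    """Hypothetical component that tracks whether an adapter is attached."""

    def __init__(self):
        self.ip_adapter_loaded = True


class ToyPipeline:
    """Hypothetical pipeline: it only holds references to its components."""

    def __init__(self, components):
        self.components = components

    @classmethod
    def from_pipe(cls, other):
        # New dict, but the values are the very same component objects.
        return cls(dict(other.components))


pipe_a = ToyPipeline({"unet": ToyUNet()})
pipe_b = ToyPipeline.from_pipe(pipe_a)

# Mutating the shared component through pipe_b is visible through pipe_a.
pipe_b.components["unet"].ip_adapter_loaded = False
print(pipe_a.components["unet"] is pipe_b.components["unet"])
print(pipe_a.components["unet"].ip_adapter_loaded)
```

- Because no component is copied, memory use stays flat, but any stateful change (loading or unloading an adapter, swapping a scheduler config in place) is shared.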

#### Safety checker

- Stable Diffusion models can **generate harmful content**, so Diffusers implements a safety checker for them.
  - The safety checker screens the generated output against known hardcoded `not-safe-for-work (NSFW)` content.
  - If for whatever reason you’d like to disable the safety checker, pass `safety_checker=None` to the `from_pretrained()` method.

In [None]:
pipeline = DiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", safety_checker=None, use_safetensors=True)

#### Checkpoint variants
- A checkpoint variant is usually a checkpoint whose weights are:
  - Stored in a different floating point type, such as `torch.float16`, because it only requires half the bandwidth and storage to download.
    - You can’t use this variant if you’re continuing training or using a CPU.
  - Non-exponential moving average (non-EMA) weights, which shouldn’t be used for inference.
    - You should use this variant to continue finetuning a model.
- When the checkpoints have identical model structures, but they were trained on different datasets and with a different training setup, they should be stored in separate repositories.
- Otherwise, a variant is identical to the original checkpoint: it has exactly the same serialization format (like safetensors) and model structure, and its weights have identical tensor shapes.
- `torch_dtype` specifies the floating point precision of the loaded checkpoint.
  - If you want to save bandwidth by loading an fp16 variant, you should set `variant="fp16"` and `torch_dtype=torch.float16` to convert the weights to fp16.
  - If you only set `torch_dtype=torch.float16`, the default fp32 weights are downloaded first and then converted to fp16.

- `variant` specifies which files should be loaded from the repository.
  - If you want to load a non-EMA variant of a UNet from `stable-diffusion-v1-5/stable-diffusion-v1-5`, set `variant="non_ema"` to download the `non_ema` file.

In [None]:
# fp16
pipeline = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", variant="fp16", torch_dtype=torch.float16, use_safetensors=True
)

# non-EMA
pipeline = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", variant="non_ema", use_safetensors=True
)
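
- The relationship between a `variant` and the file that gets picked can be sketched as a naming rule. `weight_filename` is a hypothetical helper that only mirrors the file names visible in the repository (for example `diffusion_pytorch_model.fp16.safetensors`); it is not how Diffusers resolves files internally.

```python
def weight_filename(base, variant=None, extension="safetensors"):
    """Build a weight file name, inserting the variant before the extension."""
    parts = [base]
    if variant is not None:
        parts.append(variant)
    parts.append(extension)
    return ".".join(parts)


default_name = weight_filename("diffusion_pytorch_model")
fp16_name = weight_filename("diffusion_pytorch_model", variant="fp16")
non_ema_name = weight_filename("diffusion_pytorch_model", variant="non_ema")

for name in (default_name, fp16_name, non_ema_name):
    print(name)
```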

#### DiffusionPipeline
- As a class method, `DiffusionPipeline.from_pretrained()` is responsible for two things:
  - Download the latest version of the folder structure required for inference and cache it.
    - If the latest folder structure is available in the local cache, `DiffusionPipeline.from_pretrained()` reuses the cache and won’t redownload the files.
  - Load the cached weights into the correct pipeline class - retrieved from the `model_index.json` file - and return an instance of it.
- The pipelines’ underlying folder structure corresponds directly with their class instances.
  - For example, the `StableDiffusionPipeline` corresponds to the folder structure in `stable-diffusion-v1-5/stable-diffusion-v1-5`.

| Object Name| Description |
|--------------|-------------------------------------|
| feature_extractor | a CLIPImageProcessor from Transformers. |
| safety_checker | a component for screening against harmful content. |
| scheduler  | an instance of PNDMScheduler. |
| text_encoder | a CLIPTextModel from Transformers. |
| tokenizer | a CLIPTokenizer from Transformers. |
| unet |  an instance of UNet2DConditionModel. |
| vae |  an instance of AutoencoderKL. |


- **Components in the repository**
```
.
├── feature_extractor
│   └── preprocessor_config.json
├── model_index.json
├── safety_checker
│   ├── config.json
│   ├── model.fp16.safetensors
│   ├── model.safetensors
│   ├── pytorch_model.bin
│   └── pytorch_model.fp16.bin
├── scheduler
│   └── scheduler_config.json
├── text_encoder
│   ├── config.json
│   ├── model.fp16.safetensors
│   ├── model.safetensors
│   ├── pytorch_model.bin
│   └── pytorch_model.fp16.bin
├── tokenizer
│   ├── merges.txt
│   ├── special_tokens_map.json
│   ├── tokenizer_config.json
│   └── vocab.json
├── unet
│   ├── config.json
│   ├── diffusion_pytorch_model.bin
│   ├── diffusion_pytorch_model.fp16.bin
│   ├── diffusion_pytorch_model.fp16.safetensors
│   ├── diffusion_pytorch_model.non_ema.bin
│   ├── diffusion_pytorch_model.non_ema.safetensors
│   └── diffusion_pytorch_model.safetensors
└── vae
    ├── config.json
    ├── diffusion_pytorch_model.bin
    ├── diffusion_pytorch_model.fp16.bin
    ├── diffusion_pytorch_model.fp16.safetensors
    └── diffusion_pytorch_model.safetensors
```

In [14]:
repo_id = "stable-diffusion-v1-5/stable-diffusion-v1-5"
pipeline = DiffusionPipeline.from_pretrained(repo_id, use_safetensors=True)
print(pipeline)

Loading pipeline components...:   0%|          | 0/7 [00:00<?, ?it/s]

StableDiffusionPipeline {
  "_class_name": "StableDiffusionPipeline",
  "_diffusers_version": "0.33.1",
  "_name_or_path": "stable-diffusion-v1-5/stable-diffusion-v1-5",
  "feature_extractor": [
    "transformers",
    "CLIPImageProcessor"
  ],
  "image_encoder": [
    null,
    null
  ],
  "requires_safety_checker": true,
  "safety_checker": [
    "stable_diffusion",
    "StableDiffusionSafetyChecker"
  ],
  "scheduler": [
    "diffusers",
    "PNDMScheduler"
  ],
  "text_encoder": [
    "transformers",
    "CLIPTextModel"
  ],
  "tokenizer": [
    "transformers",
    "CLIPTokenizer"
  ],
  "unet": [
    "diffusers",
    "UNet2DConditionModel"
  ],
  "vae": [
    "diffusers",
    "AutoencoderKL"
  ]
}
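
- The printed structure is the same information stored in `model_index.json`. As a minimal sketch, assuming a trimmed copy of that JSON, the pipeline class and the `(library, class)` pair of each component can be read with the standard library alone. The filtering rule below is an illustration, not the loader Diffusers uses.

```python
import json

# Trimmed, hand-written excerpt in the shape of a model_index.json file.
model_index = json.loads("""
{
  "_class_name": "StableDiffusionPipeline",
  "image_encoder": [null, null],
  "requires_safety_checker": true,
  "scheduler": ["diffusers", "PNDMScheduler"],
  "unet": ["diffusers", "UNet2DConditionModel"],
  "vae": ["diffusers", "AutoencoderKL"]
}
""")

pipeline_class = model_index["_class_name"]

# Keep only entries of the form [library, class_name] that are actually set.
components = {
    name: tuple(value)
    for name, value in model_index.items()
    if isinstance(value, list) and len(value) == 2 and value[0] is not None
}

print(pipeline_class)
for name, (library, class_name) in components.items():
    print(f"{name}: {library}.{class_name}")
```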



---------

### **Load Community Pipelines and Components**
#### Load from a local file
- Community pipelines can also be loaded from a local file if you pass a file path instead.
  - The path to the passed directory must contain a pipeline.py file that contains the pipeline class.

In [None]:
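# Assumption: this CLIP checkpoint is illustrative; substitute the one your pipeline expects
from transformers import CLIPImageProcessor, CLIPModel
clip_checkpoint = "laion/CLIP-ViT-B-32-laion2B-s34B-b79K"
feature_extractor = CLIPImageProcessor.from_pretrained(clip_checkpoint)
clip_model = CLIPModel.from_pretrained(clip_checkpoint, torch_dtype=torch.float16)

pipeline = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    custom_pipeline="./path/to/pipeline_directory/",
    clip_model=clip_model,
    feature_extractor=feature_extractor,
    use_safetensors=True,
)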

#### Load from a specific version
- By default, community pipelines are loaded from the latest stable version of Diffusers.
  - To load a community pipeline from another version, use the `custom_revision` parameter.

In [None]:
pipeline = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    custom_pipeline="clip_guided_stable_diffusion",
    custom_revision="main",
    clip_model=clip_model,
    feature_extractor=feature_extractor,
    use_safetensors=True,
)

#### Load with from_pipe
- Community pipelines can also be loaded with the `from_pipe()` method which allows you to load and reuse multiple pipelines without any additional memory overhead (learn more in the Reuse a pipeline guide).
  - The memory requirement is determined by the largest single pipeline loaded.
 
- Load a community pipeline that supports long prompts with weighting from a Stable Diffusion pipeline.

In [None]:
pipe_sd = DiffusionPipeline.from_pretrained("emilianJR/CyberRealistic_V3", torch_dtype=torch.float16)
pipe_sd.to("cuda")
# load long prompt weighting pipeline
pipe_lpw = DiffusionPipeline.from_pipe(
    pipe_sd,
    custom_pipeline="lpw_stable_diffusion",
).to("cuda")

prompt = "cat, hiding in the leaves, ((rain)), zazie rainyday, beautiful eyes, macro shot, colorful details, natural lighting, amazing composition, subsurface scattering, amazing textures, filmic, soft light, ultra-detailed eyes, intricate details, detailed texture, light source contrast, dramatic shadows, cinematic light, depth of field, film grain, noise, dark background, hyperrealistic dslr film still, dim volumetric cinematic lighting"
neg_prompt = "(deformed iris, deformed pupils, semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime, mutated hands and fingers:1.4), (deformed, distorted, disfigured:1.3), poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, disconnected limbs, mutation, mutated, ugly, disgusting, amputation"
generator = torch.Generator(device="cpu").manual_seed(20)
out_lpw = pipe_lpw(
    prompt,
    negative_prompt=neg_prompt,
    width=512,
    height=512,
    max_embeddings_multiples=3,
    num_inference_steps=50,
    generator=generator,
).images[0]
out_lpw

### **Load Schedulers and Models**
- Diffusion pipelines are a collection of interchangeable schedulers and models that can be mixed and matched to tailor a pipeline to a specific use case.
  - The scheduler encapsulates the entire denoising process such as the number of denoising steps and the algorithm for finding the denoised sample.
  - Schedulers are not parameterized or trained, so they don’t take much memory.
  - The model is usually only concerned with the forward pass of going from a noisy input to a less noisy sample.

In [15]:
pipeline = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16, use_safetensors=True
).to("cuda")

Loading pipeline components...:   0%|          | 0/7 [00:00<?, ?it/s]

In [16]:
pipeline.scheduler

PNDMScheduler {
  "_class_name": "PNDMScheduler",
  "_diffusers_version": "0.33.1",
  "beta_end": 0.012,
  "beta_schedule": "scaled_linear",
  "beta_start": 0.00085,
  "clip_sample": false,
  "num_train_timesteps": 1000,
  "prediction_type": "epsilon",
  "set_alpha_to_one": false,
  "skip_prk_steps": true,
  "steps_offset": 1,
  "timestep_spacing": "leading",
  "trained_betas": null
}

#### Load a scheduler
- Schedulers are defined by a configuration file that can be used by a variety of schedulers.
  - Load a scheduler with the `SchedulerMixin.from_pretrained()` method, and specify the `subfolder` parameter to load the configuration file from the correct subfolder of the pipeline repository.

In [20]:
ddim = DDIMScheduler.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", subfolder="scheduler")

- Then you can pass the newly loaded scheduler to the pipeline.

In [None]:
pipeline = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", scheduler=ddim, torch_dtype=torch.float16, use_safetensors=True
).to("cuda")

#### Compare schedulers
- Schedulers have their own unique strengths and weaknesses, making it difficult to quantitatively compare which scheduler works best for a pipeline.
  - You typically have to make a trade-off between denoising speed and denoising quality.
  - We recommend trying out different schedulers to find one that works best for your use case.
  - Check the `pipeline.scheduler.compatibles` attribute to see which schedulers are compatible with a pipeline.

In [30]:
pipeline = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16, use_safetensors=True
).to("cuda")

prompt = "A photograph of an astronaut riding a horse on Mars, high resolution, high definition."
generator = torch.Generator(device="cuda").manual_seed(8)

- To change the pipeline’s scheduler, use the `from_config()` method of a different scheduler class and pass it the current `pipeline.scheduler.config`.

In [31]:
# Each line swaps in a different compatible scheduler. Keep only the one you want to test:
# the last assignment is the scheduler used for the generation below.
pipeline.scheduler = LMSDiscreteScheduler.from_config(pipeline.scheduler.config)
pipeline.scheduler = EulerDiscreteScheduler.from_config(pipeline.scheduler.config)
pipeline.scheduler = EulerAncestralDiscreteScheduler.from_config(pipeline.scheduler.config)
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)
image = pipeline(prompt, generator=generator).images[0]
image
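
- The idea behind reusing one configuration across scheduler classes can be sketched with toy classes. `ToyPNDM` and `ToyEuler` are hypothetical, and the filtering by constructor signature is an assumption about the general approach, not a copy of the real `from_config()`: each class keeps the keys it understands, ignores the rest, and accepts keyword overrides.

```python
import inspect


class ToyPNDM:
    def __init__(self, num_train_timesteps=1000, beta_start=0.0001, skip_prk_steps=False):
        self.config = {
            "num_train_timesteps": num_train_timesteps,
            "beta_start": beta_start,
            "skip_prk_steps": skip_prk_steps,
        }


class ToyEuler:
    def __init__(self, num_train_timesteps=1000, beta_start=0.0001, timestep_spacing="linspace"):
        self.config = {
            "num_train_timesteps": num_train_timesteps,
            "beta_start": beta_start,
            "timestep_spacing": timestep_spacing,
        }

    @classmethod
    def from_config(cls, config, **overrides):
        # Keep only the keys this class's constructor accepts.
        accepted = set(inspect.signature(cls.__init__).parameters) - {"self"}
        kwargs = {key: value for key, value in config.items() if key in accepted}
        # Explicit keyword arguments take precedence over the stored config.
        kwargs.update(overrides)
        return cls(**kwargs)


old_scheduler = ToyPNDM(beta_start=0.00085, skip_prk_steps=True)
new_scheduler = ToyEuler.from_config(old_scheduler.config, timestep_spacing="trailing")
print(new_scheduler.config)
```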

#### Flax schedulers
- To compare Flax schedulers, you need to additionally load the scheduler state into the model parameters.
- Change the default scheduler in `FlaxStableDiffusionPipeline` to use the super fast `FlaxDPMSolverMultistepScheduler`.

In [41]:
scheduler, scheduler_state = FlaxDPMSolverMultistepScheduler.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    subfolder="scheduler"
)
pipeline, params = FlaxStableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    scheduler=scheduler,
    variant="bf16",
    dtype=jax.numpy.bfloat16,
)
params["scheduler"] = scheduler_state

- Take advantage of Flax’s compatibility with TPUs to generate a number of images in parallel.
  - You’ll need to make a copy of the model parameters for each available device and then split the inputs across them to generate your desired number of images.

In [None]:
# Generate 1 image per parallel device (8 on TPUv2-8 or TPUv3-8)
prompt = "A photograph of an astronaut riding a horse on Mars, high resolution, high definition."
num_samples = jax.device_count()
prompt_ids = pipeline.prepare_inputs([prompt] * num_samples)

prng_seed = jax.random.PRNGKey(0)
num_inference_steps = 25

# shard inputs and rng
params = replicate(params)
prng_seed = jax.random.split(prng_seed, jax.device_count())
prompt_ids = shard(prompt_ids)

images = pipeline(prompt_ids, params, prng_seed, num_inference_steps, jit=True).images
images = pipeline.numpy_to_pil(np.asarray(images.reshape((num_samples,) + images.shape[-3:])))
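
- The "copy the parameters, split the inputs" step can be pictured with plain Python lists. `toy_replicate` and `toy_shard` are hypothetical helpers that only mimic the shape of the operation; the real `replicate` and `shard` utilities work on JAX arrays, and the device count here is an assumed value rather than `jax.device_count()`.

```python
import copy


def toy_replicate(params, num_devices):
    """Give every device its own copy of the parameters."""
    return [copy.deepcopy(params) for _ in range(num_devices)]


def toy_shard(batch, num_devices):
    """Split a batch into equally sized per-device chunks."""
    if len(batch) % num_devices != 0:
        raise ValueError("batch size must be divisible by the number of devices")
    per_device = len(batch) // num_devices
    return [batch[i * per_device:(i + 1) * per_device] for i in range(num_devices)]


num_devices = 4  # assumed device count for illustration
prompts = [f"prompt {i}" for i in range(8)]

sharded = toy_shard(prompts, num_devices)
replicated = toy_replicate({"scale": 1.0}, num_devices)

print(len(sharded), [len(chunk) for chunk in sharded])
print(len(replicated))
```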

#### Models
- Models are loaded with the `ModelMixin.from_pretrained()` method, which downloads and caches the latest version of the model weights and configurations.
  - If the latest files are available in the local cache, `from_pretrained()` reuses files in the cache instead of re-downloading them.

- Models can be loaded from a subfolder with the subfolder argument.
  - The model weights for `stable-diffusion-v1-5/stable-diffusion-v1-5` are stored in the unet subfolder.

In [None]:
unet = UNet2DConditionModel.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", subfolder="unet", use_safetensors=True)

# Directly loaded from a repository
unet = UNet2DModel.from_pretrained("google/ddpm-cifar10-32", use_safetensors=True)

- To load and save model variants, specify the variant argument in `ModelMixin.from_pretrained()` and `ModelMixin.save_pretrained()`.

In [None]:
unet = UNet2DConditionModel.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", subfolder="unet", variant="non_ema", use_safetensors=True
)
unet.save_pretrained("./local-unet", variant="non_ema")

--------------
### **Model Files and Layout**
- Diffusion models are saved in various file types and organized in different layouts.
  - Diffusers stores model weights as safetensors files in Diffusers-multifolder layout and it also supports loading files (like safetensors and ckpt files) from a single-file layout which is commonly used in the diffusion ecosystem.
 
#### Files
- PyTorch model weights are typically saved with Python’s pickle utility as ckpt or bin files.
  - However, pickle is not secure and pickled files may contain malicious code that can be executed.
  - This vulnerability is a serious concern given the popularity of model sharing.
- To address this security issue, the Safetensors library was developed as a secure alternative to pickle, which saves models as safetensors files.
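
- The pickle risk is easy to reproduce with a harmless payload. In this sketch the hypothetical `Payload` class uses `__reduce__` to tell pickle to call `eval` on a string when the data is loaded; a real attack would place an arbitrary command there instead of simple arithmetic.

```python
import pickle


class Payload:
    def __reduce__(self):
        # pickle records "call eval('6 * 7')" and runs it at load time.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Loading does not rebuild a Payload object; it executes the embedded call.
loaded = pickle.loads(blob)
print(type(loaded).__name__, loaded)
```

- Nothing in the byte string looks like a model weight, yet loading it runs code. This is why unpickling files from untrusted sources is unsafe.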

#### SafeTensors
- Safetensors is a safe and fast file format for securely storing and loading tensors.
  - Safetensors restricts the header size to limit certain types of attacks, supports lazy loading (useful for distributed setups), and has generally faster loading speeds.
- Safetensors stores weights in a safetensors file.
  - Diffusers loads safetensors files by default if they’re available and the Safetensors library is installed.
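
- A safetensors file is, at its core, a length-prefixed JSON header followed by raw tensor bytes, so reading it never executes code. The sketch below writes and reads a toy file in that shape using only the standard library. It follows the published layout (8-byte little-endian header length, JSON header, data) but omits details such as header padding, and the helper names are hypothetical.

```python
import json
import struct


def write_toy_safetensors(tensors):
    """Serialize {name: list of floats} as float32 data in a safetensors-style layout."""
    header, payload = {}, b""
    for name, values in tensors.items():
        data = struct.pack(f"<{len(values)}f", *values)
        header[name] = {
            "dtype": "F32",
            "shape": [len(values)],
            # Byte range of this tensor inside the data section.
            "data_offsets": [len(payload), len(payload) + len(data)],
        }
        payload += data
    header_bytes = json.dumps(header).encode("utf-8")
    return struct.pack("<Q", len(header_bytes)) + header_bytes + payload


def read_header(blob):
    """Read only the JSON header; the tensor bytes stay untouched (lazy loading)."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    return json.loads(blob[8:8 + header_len])


blob = write_toy_safetensors({"weight": [1.0, 2.0], "bias": [0.5]})
header = read_header(blob)
print(header)
```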
 
- There are two ways safetensors files can be organized:
  - **Diffusers-multifolder layout**: there may be several separate safetensors files, one for each pipeline component (text encoder, UNet, VAE), organized in subfolders (check out the stable-diffusion-v1-5/stable-diffusion-v1-5 repository as an example)
  - **single-file layout**: all the model weights may be saved in a single file (check out the WarriorMama777/OrangeMixs repository as an example)

In [None]:
# Multi-folder
pipeline = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    use_safetensors=True
)

# Single-file
pipeline = StableDiffusionPipeline.from_single_file(
    "https://huggingface.co/WarriorMama777/OrangeMixs/blob/main/Models/AbyssOrangeMix/AbyssOrangeMix.safetensors"
)

#### LoRA files
- LoRA is a lightweight adapter that is fast and easy to train, which makes LoRAs especially popular for generating images in a certain way or style.
  - These adapters are commonly stored in a safetensors file, and are widely popular on model sharing platforms like civitai.
  - LoRAs are loaded into a base model with the `load_lora_weights()` method.

In [None]:
# base model
pipeline = StableDiffusionXLPipeline.from_pretrained(
    "Lykon/dreamshaper-xl-1-0", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# download LoRA weights
!wget https://civitai.com/api/download/models/168776 -O blueprintify.safetensors

# load LoRA weights
pipeline.load_lora_weights(".", weight_name="blueprintify.safetensors")
prompt = "bl3uprint, a highly detailed blueprint of the empire state building, explaining how to build all parts, many txt, blueprint grid backdrop"
negative_prompt = "lowres, cropped, worst quality, low quality, normal quality, artifacts, signature, watermark, username, blurry, more than one bridge, bad architecture"

image = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    generator=torch.manual_seed(0),
).images[0]
image

#### ckpt
- Pickled files may be unsafe because they can be exploited to execute malicious code.
  - It is recommended to use safetensors files instead where possible, or convert the weights to safetensors files.
- PyTorch’s torch.save function uses Python’s pickle utility to serialize and save models.
  - These files are saved as a ckpt file and they contain the entire model’s weights.

- Use the `from_single_file()` method to directly load a ckpt file.

In [None]:
pipeline = StableDiffusionPipeline.from_single_file(
    "https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5/blob/main/v1-5-pruned.ckpt"
)

#### Storage layout: Diffusers-multifolder
- There are two ways model files are organized, either in a Diffusers-multifolder layout or in a single-file layout.
  - The Diffusers-multifolder layout is the default, and each component file (text encoder, UNet, VAE) is stored in a separate subfolder.
  - Diffusers also supports loading models from a single-file layout where all the components are bundled together.

In [None]:
pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
).to("cuda")

- **Benefits**
  - Faster to load each component file individually or in parallel.
  - Reduced memory usage because you only load the components you need.
    - Models like SDXL Turbo, SDXL Lightning, and Hyper-SD have the same components except for the UNet.
    - You can reuse their shared components with the `from_pipe()` method without consuming any additional memory (take a look at the Reuse a pipeline guide) and only load the UNet.
    - This way, you don’t need to download redundant components and unnecessarily use more memory.
  - Reduced storage requirements because if a component, such as the SDXL VAE, is shared across multiple models, you only need to download and store a single copy of it instead of downloading and storing it multiple times.
    - For 10 SDXL models, this can save ~3.5GB of storage. The storage savings are even greater for newer models like PixArt Sigma, where the text encoder alone is ~19GB!
  - Flexibility to replace a component in the model with a newer or better version.
  - More visibility and information about a model’s components, which are stored in a config.json file in each component subfolder.

In [None]:
# download one model
sdxl_pipeline = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
).to("cuda")

# switch UNet for another model
unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/sdxl-turbo",
    subfolder="unet",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True
)
# reuse all the same components in new model except for the UNet
turbo_pipeline = StableDiffusionXLPipeline.from_pipe(
    sdxl_pipeline, unet=unet,
).to("cuda")
turbo_pipeline.scheduler = EulerDiscreteScheduler.from_config(
    turbo_pipeline.scheduler.config,
    timestep_spacing="trailing"
)
image = turbo_pipeline(
    "an astronaut riding a unicorn on mars",
    num_inference_steps=1,
    guidance_scale=0.0,
).images[0]
image

In [None]:
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16, use_safetensors=True)
pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    vae=vae,
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
).to("cuda")

#### Storage layout: Single-file
- The single-file layout stores all the model weights in a single file.
  - All the model components (text encoder, UNet, VAE) weights are kept together instead of separately in subfolders.
  - This can be a safetensors or ckpt file.
- To load from a single-file layout, use the `from_single_file()` method.
- **Benefits**
  - Easy compatibility with diffusion interfaces such as ComfyUI or Automatic1111 which commonly use a single-file layout.
  - Easier to manage (download and share) a single file.

In [None]:
pipeline = StableDiffusionXLPipeline.from_single_file(
    "https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/sd_xl_base_1.0.safetensors",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
).to("cuda")

#### Storage layout: DDUF
- DDUF (DDUF Diffusion Unified Format) is a file format designed to make storing, distributing, and using diffusion models much easier.
  - Built on the ZIP file format, DDUF offers a standardized, efficient, and flexible way to package all parts of a diffusion model into a single, easy-to-manage file.
  - It provides a balance between Diffusers multi-folder format and the widely popular single-file format.
- Pass a checkpoint to the `dduf_file` parameter to load it in `DiffusionPipeline`.

In [None]:
pipe = DiffusionPipeline.from_pretrained(
    "DDUF/FLUX.1-dev-DDUF", dduf_file="FLUX.1-dev.dduf", torch_dtype=torch.bfloat16
).to("cuda")
image = pipe(
    "photo a cat holding a sign that says Diffusers", num_inference_steps=50, guidance_scale=3.5
).images[0]
image.save("cat.png")
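
- Since DDUF is built on ZIP, the idea of packing a multi-folder layout into one archive can be illustrated with the standard `zipfile` module. This toy archive is not a valid `.dduf` file: the file contents are placeholders, and storing entries uncompressed (`ZIP_STORED`) is an assumption made for this sketch.

```python
import io
import zipfile

# Build a small in-memory archive that mirrors a multi-folder layout.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w", compression=zipfile.ZIP_STORED) as archive:
    archive.writestr("model_index.json", '{"_class_name": "ToyPipeline"}')
    archive.writestr("vae/config.json", "{}")
    archive.writestr("vae/diffusion_pytorch_model.safetensors", b"\x00" * 16)

# Reopen the archive and list what it contains.
with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()
    stored = all(info.compress_type == zipfile.ZIP_STORED for info in archive.infolist())

print(names)
print(stored)
```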

- To save a pipeline as a .dduf checkpoint, use the `export_folder_as_dduf` utility, which takes care of all the necessary file-level validations.
  - Packaging and loading quantized checkpoints in the DDUF format is supported as long as they respect the multi-folder structure.

In [None]:
pipe = DiffusionPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)

save_folder = "flux-dev"
pipe.save_pretrained(save_folder)
export_folder_as_dduf("flux-dev.dduf", folder_path=save_folder)

#### Convert layout and files
- Diffusers provides many scripts and methods to convert storage layouts and file formats to enable broader support across the diffusion ecosystem.
  - Take a look at the [diffusers/scripts](https://github.com/huggingface/diffusers/tree/main/scripts) collection to find a script that fits your conversion needs.
  - Scripts with "`to_diffusers`" appended to their name convert a model to the Diffusers-multifolder layout. Each script has its own set of arguments for configuring the conversion, so make sure you check which arguments are available.
- For example, to convert a Stable Diffusion XL model stored in Diffusers-multifolder layout to a single-file layout, run the [convert_diffusers_to_original_sdxl.py](https://github.com/huggingface/diffusers/blob/main/scripts/convert_diffusers_to_original_sdxl.py) script.
  - Provide the path to the model to convert and the path to save the converted model to.
  - You can optionally specify whether to save the model as a safetensors file and whether to save it in half-precision.

```bash
python convert_diffusers_to_original_sdxl.py --model_path path/to/model/to/convert --checkpoint_path path/to/save/model/to --use_safetensors
```

- You can also save a model to Diffusers-multifolder layout with the `save_pretrained()` method.
  - This creates the directory if it doesn’t already exist, and it saves the weights as safetensors files by default.

In [None]:
pipeline = StableDiffusionXLPipeline.from_single_file(
    "https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/sd_xl_base_1.0.safetensors",
)
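# save_pretrained() requires an output directory; this path is a placeholder
pipeline.save_pretrained("path/to/save/model/to")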

#### Single-file layout usage
- Pass the file path of the pipeline or model to the `from_single_file()` method to load it.

In [None]:
from diffusers import StableCascadeUNet  # not among the imports at the top of this notebook

# Pipeline
ckpt_path = "https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/sd_xl_base_1.0_0.9vae.safetensors"
pipeline = StableDiffusionXLPipeline.from_single_file(ckpt_path)

# Model
ckpt_path = "https://huggingface.co/stabilityai/stable-cascade/blob/main/stage_b_lite.safetensors"
model = StableCascadeUNet.from_single_file(ckpt_path)

- Customize components in the pipeline by passing them directly to the `from_single_file()` method.
  - You can use a different scheduler in a pipeline.

In [None]:
ckpt_path = "https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/sd_xl_base_1.0_0.9vae.safetensors"
scheduler = DDIMScheduler()
pipeline = StableDiffusionXLPipeline.from_single_file(ckpt_path, scheduler=scheduler)

- Or you could use a ControlNet model in the pipeline.

In [None]:
ckpt_path = "https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5/blob/main/v1-5-pruned-emaonly.safetensors"
controlnet = ControlNetModel.from_pretrained("lllyasviel/control_v11p_sd15_canny")
pipeline = StableDiffusionControlNetPipeline.from_single_file(ckpt_path, controlnet=controlnet)

### **Load Adapter**
- There are several training techniques for personalizing diffusion models to generate images of a specific subject or images in certain styles.
  - Each of these training methods produces a different type of adapter.
  - Some of the adapters generate an entirely new model, while other adapters only modify a smaller set of embeddings or weights.
  - This means the loading process for each adapter is also different.

#### DreamBooth
- DreamBooth finetunes an entire diffusion model on just a few images of a subject to generate images of that subject in new styles and settings.
  - This method works by using a special word in the prompt that the model learns to associate with the subject image.
  - Of all the training methods, DreamBooth produces the largest file size (usually a few GBs) because it is a full checkpoint model.

- Load the `herge_style` checkpoint, which is trained on just 10 images drawn by Hergé, to generate images in that style.
  - For it to work, you need to include the special word `herge_style` in your prompt to trigger the checkpoint:

In [50]:
pipeline = AutoPipelineForText2Image.from_pretrained("sd-dreambooth-library/herge-style", torch_dtype=torch.float16).to("cuda")
prompt = "A cute herge_style brown bear eating a slice of pizza, stunning color scheme, masterpiece, illustration"
image = pipeline(prompt).images[0]
image

#### Textual inversion
- Textual inversion is very similar to DreamBooth and it can also personalize a diffusion model to generate certain concepts (styles, objects) from just a few images.
  - This method works by training and finding new embeddings that represent the images you provide with a special word in the prompt.
  - As a result, the diffusion model weights stay the same and the training process produces a relatively tiny (a few KBs) file.
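
- To see why the resulting file is so small, the toy sketch below adds one learned embedding row for a new token while leaving everything else untouched. The vocabulary, the `<my-style>` token, and the `add_learned_token` helper are hypothetical; the embedding width of 768 is an assumption matching the text encoder used by Stable Diffusion v1.5.

```python
# Toy text-encoder vocabulary and embedding table (hypothetical values).
EMBED_DIM = 768  # assumed hidden size of the Stable Diffusion v1.5 text encoder

vocab = {"a": 0, "bear": 1, "pizza": 2}
embeddings = [[0.0] * EMBED_DIM for _ in vocab]


def add_learned_token(token, vector):
    """Register a new token and append its learned embedding as a new row."""
    if token in vocab:
        raise ValueError(f"token {token!r} already exists")
    if len(vector) != EMBED_DIM:
        raise ValueError("embedding has the wrong dimension")
    vocab[token] = len(embeddings)
    embeddings.append(list(vector))


rows_before = len(embeddings)
add_learned_token("<my-style>", [0.01] * EMBED_DIM)

# Only one row was added; the existing rows (and all model weights) are unchanged.
new_rows = len(embeddings) - rows_before
size_in_bytes = new_rows * EMBED_DIM * 4  # float32 uses 4 bytes per value
print(size_in_bytes)  # 3072 bytes, i.e. about 3 KB
```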

In [None]:
pipeline = AutoPipelineForText2Image.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", 
                                                     torch_dtype=torch.float16).to("cuda")

- Load the textual inversion embeddings with the `load_textual_inversion()` method and generate some images.
  - Load the `sd-concepts-library/gta5-artwork` embeddings and include the special word `<gta5-artwork>` in your prompt to trigger them:

In [None]:
pipeline.load_textual_inversion("sd-concepts-library/gta5-artwork")
prompt = "A cute brown bear eating a slice of pizza, stunning color scheme, masterpiece, illustration, <gta5-artwork> style"
image = pipeline(prompt).images[0]
image

- Textual inversion can also be trained on undesirable things, such as blurry images or extra fingers on a hand, to create negative embeddings that discourage a model from generating them.
  - This can be an easy way to quickly improve your prompt.
  - You’ll also load the embeddings with `load_textual_inversion()`, but this time, you’ll need two more parameters:
    - `weight_name`: specifies the weight file to load if the file was saved in the Diffusers format with a specific name or if the file is stored in the A1111 format
    - `token`: specifies the special word to use in the prompt to trigger the embeddings
- Load the `sayakpaul/EasyNegative-test` embeddings and use the token to generate an image with the negative embeddings:

In [None]:
pipeline.load_textual_inversion(
    "sayakpaul/EasyNegative-test", weight_name="EasyNegative.safetensors", token="EasyNegative"
)
prompt = "A cute brown bear eating a slice of pizza, stunning color scheme, masterpiece, illustration, EasyNegative"
negative_prompt = "EasyNegative"

image = pipeline(prompt, negative_prompt=negative_prompt, num_inference_steps=50).images[0]
image

#### LoRA
- Low-Rank Adaptation (LoRA) is a popular training technique because it is fast and generates smaller file sizes (a couple hundred MBs).
  - LoRA can train a model to learn new styles from just a few images.
  - It works by inserting new weights into the diffusion model and then only the new weights are trained instead of the entire model.
  - This makes LoRAs faster to train and easier to store.
  - Use the `load_lora_weights()` method to load the `ostris/super-cereal-sdxl-lora` weights and specify the weights filename from the repository:

In [None]:
pipeline = AutoPipelineForText2Image.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", 
                                                     torch_dtype=torch.float16).to("cuda")
pipeline.load_lora_weights("ostris/super-cereal-sdxl-lora", weight_name="cereal_box_sdxl_v1.safetensors")
prompt = "bears, pizza bites"
image = pipeline(prompt).images[0]
image
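
- A back-of-the-envelope sketch of why training only the inserted weights keeps LoRA files small: a rank-`r` update to a `d_out x d_in` layer is stored as two thin matrices rather than one full matrix. The layer width of 1280 and rank of 8 below are hypothetical values chosen for illustration, and the helper names are not Diffusers API.

```python
def full_params(d_in, d_out):
    """Number of parameters in a full weight matrix of shape (d_out, d_in)."""
    return d_in * d_out


def lora_params(d_in, d_out, rank):
    """Parameters in the two low-rank factors: A is (rank, d_in), B is (d_out, rank)."""
    return rank * d_in + d_out * rank


d_in = d_out = 1280  # hypothetical layer width
rank = 8             # hypothetical LoRA rank

full = full_params(d_in, d_out)
lora = lora_params(d_in, d_out, rank)
ratio = lora / full

print(full)   # 1638400
print(lora)   # 20480
print(ratio)  # 0.0125
```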

- The `load_lora_weights()` method loads LoRA weights into both the UNet and text encoder.
  - It is the preferred way to load LoRAs because it can handle cases where:
    - the LoRA weights don’t have separate identifiers for the UNet and text encoder
    - the LoRA weights have separate identifiers for the UNet and text encoder
  - To directly load (and save) a LoRA adapter at the model level, use `load_lora_adapter()` (from `PeftAdapterMixin`), which builds and prepares the necessary model configuration for the adapter.
  - Like `load_lora_weights()`, `load_lora_adapter()` can load LoRAs for both the UNet and text encoder.
  - Use the `weight_name` parameter to specify the weight file and the `prefix` parameter to filter for the appropriate state dicts (`"unet"` in this case) to load.

In [None]:
pipeline = AutoPipelineForText2Image.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", 
                                                     torch_dtype=torch.float16).to("cuda")
pipeline.unet.load_lora_adapter("jbilcke-hf/sdxl-cinematic-1", weight_name="pytorch_lora_weights.safetensors", prefix="unet")

# use cnmt in the prompt to trigger the LoRA
prompt = "A cute cnmt eating a slice of pizza, stunning color scheme, masterpiece, illustration"
image = pipeline(prompt).images[0]
image

- Save an adapter with `save_lora_adapter()` (from `PeftAdapterMixin`).
  - To unload the LoRA weights, use the `unload_lora_weights()` method to discard the LoRA weights and restore the model to its original weights:

In [None]:
pipeline.unload_lora_weights()

#### Adjust LoRA weight scale
- For both `load_lora_weights()` and `load_attn_procs()`, you can adjust how much of the LoRA weights to use by passing `cross_attention_kwargs={"scale": 0.5}` when calling the pipeline.
  - A value of 0 is the same as only using the base model weights, and a value of 1 is equivalent to using the fully finetuned LoRA.
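
- The effect of the scale can be sketched on a single toy weight matrix. This assumes the usual LoRA formulation, in which the effective weight is the base weight plus a scaled low-rank product `B @ A`; real implementations may apply additional rank-dependent factors. The matrices and the `apply_lora` helper are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(weight, lora_b, lora_a, scale):
    """Return weight + scale * (lora_b @ lora_a) without modifying the inputs."""
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


weight = [[1.0, 0.0], [0.0, 1.0]]  # toy base weight (2 x 2)
lora_b = [[1.0], [2.0]]            # toy LoRA B factor (2 x 1)
lora_a = [[0.5, -0.5]]             # toy LoRA A factor (1 x 2)

base_only = apply_lora(weight, lora_b, lora_a, scale=0.0)
half = apply_lora(weight, lora_b, lora_a, scale=0.5)
fully_applied = apply_lora(weight, lora_b, lora_a, scale=1.0)

print(base_only)      # [[1.0, 0.0], [0.0, 1.0]]
print(half)           # [[1.25, -0.25], [0.5, 0.5]]
print(fully_applied)  # [[1.5, -0.5], [1.0, 0.0]]
```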

- For more granular control over the amount of LoRA weights used per layer, use `set_adapters()` and pass a dictionary specifying how much to scale the weights in each layer.

In [None]:
pipe = ... # create pipeline
pipe.load_lora_weights(..., adapter_name="my_adapter")
scales = {
    "text_encoder": 0.5,
    "text_encoder_2": 0.5,  # only usable if pipe has a 2nd text encoder
    "unet": {
        "down": 0.9,  # all transformers in the down-part will use scale 0.9
        # "mid"  # in this example "mid" is not given, therefore all transformers in the mid part will use the default scale 1.0
        "up": {
            "block_0": 0.6,  # all 3 transformers in the 0th block in the up-part will use scale 0.6
            "block_1": [0.4, 0.8, 1.0],  # the 3 transformers in the 1st block in the up-part will use scales 0.4, 0.8 and 1.0 respectively
        }
    }
}
pipe.set_adapters("my_adapter", scales)

#### Hotswapping LoRA adapters
- A common use case when serving multiple adapters is to load one adapter first, generate images, load another adapter, generate more images, load another adapter, etc.
  - This workflow normally requires calling `load_lora_weights()`, `set_adapters()`, and possibly `delete_adapters()` to save memory.
  - Moreover, if the model is compiled using `torch.compile`, performing these steps requires recompilation, which takes time.

- To better support this common workflow, you can **hotswap** a LoRA adapter, to avoid accumulating memory and in some cases, recompilation.
  - It requires an adapter to already be loaded, and the new adapter weights are swapped in-place for the existing adapter.

- Pass `hotswap=True` when loading a LoRA adapter to enable this feature.
  - It is important to indicate the name of the existing adapter to be swapped (`default_0` is the default adapter name).
  - If you loaded the first adapter with a different name, use that name instead.

In [None]:
pipeline = ...  # create pipeline
# load adapter 1 as normal
pipeline.load_lora_weights(file_name_adapter_1)
# generate some images with adapter 1
...
# now hot swap the 2nd adapter
pipeline.load_lora_weights(file_name_adapter_2, hotswap=True, adapter_name="default_0")
# generate images with adapter 2

- For compiled models, it is often necessary to call `enable_lora_hotswap()` to avoid recompilation (though not always, for example when the second adapter targets identical LoRA ranks and scales).
  - Call `enable_lora_hotswap()` before loading the first adapter, and call `torch.compile` after loading the first adapter.
 
- The `target_rank=max_rank` argument is important for setting the maximum rank among all LoRA adapters that will be loaded.
  - If you have one adapter with rank 8 and another with rank 16, pass `target_rank=16`.
  - You should use a higher value if in doubt. By default, this value is 128.
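
- One way to picture `target_rank` is as a common shape that every adapter is padded up to. The sketch below zero-pads a toy rank-1 adapter to a larger rank and checks that the product `B @ A` is unchanged. It is an illustration of the idea under that assumption, not PEFT's actual implementation, and `pad_to_rank` and the adapter names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def pad_to_rank(lora_a, lora_b, target_rank):
    """Zero-pad A (rank x d_in) with rows and B (d_out x rank) with columns."""
    rank = len(lora_a)
    if target_rank < rank:
        raise ValueError("target_rank must be at least the adapter's own rank")
    extra = target_rank - rank
    d_in = len(lora_a[0])
    padded_a = [list(row) for row in lora_a] + [[0.0] * d_in for _ in range(extra)]
    padded_b = [list(row) + [0.0] * extra for row in lora_b]
    return padded_a, padded_b


# Pick the largest rank among all adapters that will be loaded.
adapter_ranks = {"adapter_1": 8, "adapter_2": 16}  # hypothetical adapters
max_rank = max(adapter_ranks.values())
print(max_rank)  # 16

# Toy rank-1 adapter padded to rank 4: the low-rank product is unchanged.
lora_a = [[1.0, 2.0, 3.0]]  # 1 x 3
lora_b = [[0.5], [-1.0]]    # 2 x 1
padded_a, padded_b = pad_to_rank(lora_a, lora_b, target_rank=4)
print(matmul(padded_b, padded_a) == matmul(lora_b, lora_a))  # True
```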

- There can be situations where recompilation is unavoidable. For example, if the hotswapped adapter targets more layers than the initial adapter, then recompilation is triggered. 
  - Try to load the adapter that targets the most layers first.
  - Refer to the PEFT docs on hotswapping for more details about the limitations of this feature.

In [None]:
pipeline = ...  # create pipeline
max_rank = ...  # highest rank among all LoRA adapters you plan to load
# call this extra method before loading the first adapter
pipeline.enable_lora_hotswap(target_rank=max_rank)
# now load adapter 1
pipeline.load_lora_weights(file_name_adapter_1)
# now compile the unet of the pipeline
pipeline.unet = torch.compile(pipeline.unet, ...)
# generate some images with adapter 1
...
# now hot swap adapter 2
pipeline.load_lora_weights(file_name_adapter_2, hotswap=True, adapter_name="default_0")
# generate images with adapter 2

#### Kohya and TheLastBen
- Other popular LoRA trainers from the community include those by Kohya and TheLastBen.
  - These trainers create different LoRA checkpoints than those trained by Diffusers, but they can still be loaded in the same way.
 
- Kohya
```
!wget https://civitai.com/api/download/models/168776 -O blueprintify-sd-xl-10.safetensors
```

In [None]:
# Kohya
pipeline = AutoPipelineForText2Image.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")
pipeline.load_lora_weights("path/to/weights", weight_name="blueprintify-sd-xl-10.safetensors")

# TheLastBen
pipeline = AutoPipelineForText2Image.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")
pipeline.load_lora_weights("TheLastBen/William_Eggleston_Style_SDXL", weight_name="wegg.safetensors")

# use by william eggleston in the prompt to trigger the LoRA
prompt = "a house by william eggleston, sunrays, beautiful, sunlight, sunrays, beautiful"
image = pipeline(prompt=prompt).images[0]
image

#### IP-Adapter
- IP-Adapter is a lightweight adapter that enables image prompting for any diffusion model.
  - This adapter works by **decoupling the cross-attention layers** of the image and text features.
  - All the other model components are frozen and only the embedded image features in the UNet are trained.
  - As a result, IP-Adapter files are typically only ~100MBs.
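
- A minimal sketch of decoupled cross-attention, assuming the formulation in which the same queries attend to text features and image features separately and the two results are summed with an image scale. The toy vectors, the `scale` parameter name, and the helper functions are hypothetical; real implementations also apply learned projections to produce the keys and values.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over plain Python lists."""
    dim = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        out.append(
            [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
        )
    return out


def decoupled_cross_attention(queries, text_kv, image_kv, scale=1.0):
    """Attend to text and image features separately, then add the results."""
    text_out = attention(queries, *text_kv)
    image_out = attention(queries, *image_kv)
    return [
        [t + scale * i for t, i in zip(t_row, i_row)]
        for t_row, i_row in zip(text_out, image_out)
    ]


queries = [[1.0, 0.0], [0.0, 1.0]]
text_kv = ([[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]])  # (keys, values)
image_kv = ([[0.5, 0.5]], [[10.0, 20.0]])                        # one image token

text_only = attention(queries, *text_kv)
no_image = decoupled_cross_attention(queries, text_kv, image_kv, scale=0.0)
with_image = decoupled_cross_attention(queries, text_kv, image_kv, scale=0.5)

print(no_image == text_only)  # True: a scale of 0 ignores the image prompt
```

- Keeping the image branch separate is what allows the base model to stay frozen: only the image-side weights need to be trained and stored.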

- Load the IP-Adapter weights and add it to the pipeline with the `load_ip_adapter()` method.

In [None]:
pipeline = AutoPipelineForText2Image.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/load_neg_embed.png")
generator = torch.Generator(device="cpu").manual_seed(33)
images = pipeline(
    prompt='best quality, high quality, wearing sunglasses',
    ip_adapter_image=image,
    negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality",
    num_inference_steps=50,
    generator=generator,
).images[0]
images

#### IP-Adapter Plus
- IP-Adapter relies on an image encoder to generate image features.
  - If the IP-Adapter repository contains an `image_encoder` subfolder, the image encoder is automatically loaded and registered to the pipeline.
  - Otherwise, you’ll need to explicitly load the image encoder with a `CLIPVisionModelWithProjection` model and pass it to the pipeline.

- This is the case for IP-Adapter Plus checkpoints which use the ViT-H image encoder.

In [None]:
image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    "h94/IP-Adapter",
    subfolder="models/image_encoder",
    torch_dtype=torch.float16
)

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    image_encoder=image_encoder,
    torch_dtype=torch.float16
).to("cuda")

pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter-plus_sdxl_vit-h.safetensors")