In [None]:
!pip install -qU diffusers accelerate transformers huggingface_hub

In [None]:
from huggingface_hub import notebook_login
notebook_login()

# Load pipelines

## Load a pipeline

### Generic pipeline

The `from_pretrained()` method of the `DiffusionPipeline` class automatically detects the correct pipeline class for a task from the checkpoint, downloads and caches all the required configuration and weight files, and returns a pipeline ready for inference.

In [None]:
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    'stable-diffusion-v1-5/stable-diffusion-v1-5',
    use_safetensors=True,
)

The same checkpoint can also be used for an image-to-image task.

The `DiffusionPipeline` class can handle any task as long as we provide the appropriate inputs. For an image-to-image task, that means passing an initial image to the pipeline.

In [None]:
# for image-to-image task
# we keep the same pipeline instance
from diffusers.utils import load_image

init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-init.png")
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"

image = pipeline(
    prompt,
    image=init_image,
).images[0]

### Specific pipeline

Checkpoints can also be loaded by their specific pipeline class if we already know it.

In [None]:
from diffusers import StableDiffusionPipeline

pipeline = StableDiffusionPipeline.from_pretrained(
    'stable-diffusion-v1-5/stable-diffusion-v1-5',
    use_safetensors=True,
)

The same checkpoint may also be used for another task like image-to-image. To differentiate what task we want to use the checkpoint for, we have to use the corresponding task-specific pipeline class. For example, to use the same checkpoint for image-to-image, we need to use the `StableDiffusionImg2ImgPipeline` class:

In [None]:
from diffusers import StableDiffusionImg2ImgPipeline

pipeline = StableDiffusionImg2ImgPipeline.from_pretrained(
    'stable-diffusion-v1-5/stable-diffusion-v1-5',
    use_safetensors=True,
)

### Local pipeline

We can also manually download a checkpoint to our local disk and load a pipeline locally:

In [None]:
from diffusers import DiffusionPipeline

stable_diffusion = DiffusionPipeline.from_pretrained(
    './stable-diffusion-v1-5',
    use_safetensors=True,
)

This assumes that the checkpoint files are in the folder `./stable-diffusion-v1-5`. The `from_pretrained()` method will not download files from the Hub when it detects a local path.
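The local-versus-Hub decision can be pictured with a small sketch. It assumes the check amounts to testing whether the argument is an existing directory; `resolve_source` is a hypothetical helper written for illustration and is not part of Diffusers.

```python
import os
import tempfile


def resolve_source(name_or_path: str) -> str:
    """Return 'local' if the argument is an existing directory, else 'hub'.

    Hypothetical helper: it only mimics the decision described above and
    does not download or load anything.
    """
    return "local" if os.path.isdir(name_or_path) else "hub"


# A temporary directory stands in for a manually downloaded checkpoint folder.
with tempfile.TemporaryDirectory() as checkpoint_dir:
    local_result = resolve_source(checkpoint_dir)

# A string that is not an existing directory is treated as a Hub repository id.
hub_result = resolve_source("some-org/some-model-that-is-not-a-folder")

print(local_result, hub_result)
```

In practice this means a typo in a local path may be interpreted as a repository id, so it is worth checking that the folder exists before loading.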

## Customize a pipeline

We can customize a pipeline by loading different components into it. Then we can
* change to a scheduler with faster generation speed or higher generation quality depending on our needs
* change a default pipeline component to a newer and better performing one

For example, we can customize the default `stabilityai/stable-diffusion-xl-base-1.0` checkpoint with:
* The `HeunDiscreteScheduler` to generate higher quality images at the expense of slower generation speed. We must pass the `subfolder="scheduler"` parameter in `from_pretrained()` to load the scheduler configuration from the correct subfolder of the pipeline repository.
* A more stable VAE that runs in `fp16`.

In [None]:
from diffusers import StableDiffusionXLPipeline, HeunDiscreteScheduler, AutoencoderKL
import torch

scheduler = HeunDiscreteScheduler.from_pretrained(
    'stabilityai/stable-diffusion-xl-base-1.0',
    subfolder='scheduler',
)

vae = AutoencoderKL.from_pretrained(
    'madebyollin/sdxl-vae-fp16-fix',
    torch_dtype=torch.float16,
    use_safetensors=True,
)

Now we can pass the new scheduler and VAE to the `StableDiffusionXLPipeline`:

In [None]:
pipeline = StableDiffusionXLPipeline.from_pretrained(
    'stabilityai/stable-diffusion-xl-base-1.0',
    scheduler=scheduler,
    vae=vae,
    torch_dtype=torch.float16,
    variant='fp16',
    use_safetensors=True,
).to('cuda')

## Reuse a pipeline

When we load multiple pipelines that share the same model checkpoints, it makes sense to reuse the shared components instead of reloading everything into memory again, especially if our hardware is memory-constrained.

With the `DiffusionPipeline.from_pipe()` method, we can switch between multiple pipelines to take advantage of their different features without increasing memory usage. It is similar to turning a feature on and off in our pipeline.

We will start with a `StableDiffusionPipeline` and then reuse the loaded model components to create a `StableDiffusionSAGPipeline` to increase generation quality. We will use the `StableDiffusionPipeline` with an `IP-Adapter` to generate a bear eating pizza.

In [None]:
from diffusers import DiffusionPipeline, StableDiffusionSAGPipeline
import torch
import gc
from diffusers.utils import load_image
from accelerate.utils import compute_module_sizes

In [None]:
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/load_neg_embed.png")

pipe_sd = DiffusionPipeline.from_pretrained(
    'SG161222/Realistic_Vision_V6.0_B1_noVAE',
    torch_dtype=torch.float16,
)
# load IP-Adapter
pipe_sd.load_ip_adapter(
    'h94/IP-Adapter',
    subfolder='models',
    weight_name='ip-adapter_sd15.bin',
)
pipe_sd.set_ip_adapter_scale(0.6)
pipe_sd.to('cuda')

generator = torch.Generator(device='cpu').manual_seed(111)

In [None]:
out_sd = pipe_sd(
    prompt='bear eating pizza',
    negative_prompt="wrong white balance, dark, sketches, worst quality, low quality",
    ip_adapter_image=image,
    num_inference_steps=50,
    generator=generator,
).images[0]
out_sd

We can check how much memory this process consumed:

In [None]:
def bytes_to_giga_bytes(num_bytes):
    # convert a byte count to gibibytes (1024**3 bytes); avoids shadowing the built-in `bytes`
    return num_bytes / 1024 / 1024 / 1024

print(f"Max memory allocated: {bytes_to_giga_bytes(torch.cuda.max_memory_allocated())} GB.")

Now we will reuse the same pipeline components from `StableDiffusionPipeline` in the `StableDiffusionSAGPipeline` with the `from_pipe()` method:

In [None]:
pipe_sag = StableDiffusionSAGPipeline.from_pipe(
    pipe_sd,
)

generator = torch.Generator(device='cpu').manual_seed(111)

In [None]:
out_sag = pipe_sag(
    prompt='bear eating pizza',
    negative_prompt='wrong white balance, dark, sketches, worst quality, low quality',
    ip_adapter_image=image,
    num_inference_steps=50,
    generator=generator,
    guidance_scale=1.0,
    sag_scale=0.75,
).images[0]
out_sag

In [None]:
print(f"Max memory allocated: {bytes_to_giga_bytes(torch.cuda.max_memory_allocated())} GB.")

We can see that the memory usage remains the same as before because the `StableDiffusionPipeline` and the `StableDiffusionSAGPipeline` are sharing the same pipeline components.

We can animate the image with the `AnimateDiffPipeline` and also add a `MotionAdapter` module to the pipeline. For the `AnimateDiffPipeline`, we need to unload the IP-Adapter first and reload it *after* we have created our new pipeline (this only applies to the `AnimateDiffPipeline`).

In [None]:
from diffusers import AnimateDiffPipeline, MotionAdapter, DDIMScheduler
from diffusers.utils import export_to_gif

pipe_sag.unload_ip_adapter()
adapter = MotionAdapter.from_pretrained(
    'guoyww/animatediff-motion-adapter-v1-5-2',
    torch_dtype=torch.float16,
)

In [None]:
pipe_animate = AnimateDiffPipeline.from_pipe(
    pipe_sd,
    motion_adapter=adapter,
)
pipe_animate.scheduler = DDIMScheduler.from_config(
    pipe_animate.scheduler.config,
    beta_schedule='linear'
)

In [None]:
# load IP-Adapter and LoRA weights again
pipe_animate.load_ip_adapter(
    'h94/IP-Adapter',
    subfolder='models',
    weight_name='ip-adapter_sd15.bin',
)
pipe_animate.load_lora_weights(
    'guoyww/animatediff-motion-lora-zoom-out',
    adapter_name='zoom-out',
)
pipe_animate.to('cuda')

In [None]:
generator = torch.Generator(device='cpu').manual_seed(111)

pipe_animate.set_adapters('zoom-out', adapter_weights=0.75)
out = pipe_animate(
    prompt='bear eating pizza',
    num_frames=16,
    num_inference_steps=50,
    ip_adapter_image=image,
    generator=generator,
).frames[0]

export_to_gif(out, 'out_animate.gif')

In [None]:
print(f"Max memory allocated: {bytes_to_giga_bytes(torch.cuda.max_memory_allocated())} GB.")

### Modify `from_pipe` components

Pipelines loaded with `from_pipe()` can be customized with different model components or methods. However, whenever we modify the **state** of a shared model component, the change affects every other pipeline that shares that component.

For example, if we call `unload_ip_adapter()` on the `StableDiffusionSAGPipeline`, we will not be able to use IP-Adapter with the `StableDiffusionPipeline` because the IP-Adapter has been removed from their shared components.

In [None]:
# unload the IP-Adapter from the SAG pipeline (this also removes it from the shared components)
pipe_sag.unload_ip_adapter()

In [None]:
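# the IP-Adapter was removed from the shared components above,
# so this call with `ip_adapter_image` is expected to fail
generator = torch.Generator(device='cpu').manual_seed(111)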

out_sd = pipe_sd(
    prompt='bear eating pizza',
    negative_prompt='wrong white balance, dark, sketches, worst quality, low quality',
    ip_adapter_image=image,
    num_inference_steps=50,
    generator=generator,
).images[0]
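
The shared-state effect can be reproduced with a toy model in plain Python. This is only a sketch of the idea: `ToyUNet` and `ToyPipeline` are hypothetical stand-ins, and the real `from_pipe()` implementation is not shown here.

```python
class ToyUNet:
    """Stand-in for a model component that holds mutable state."""

    def __init__(self):
        self.ip_adapter_loaded = True


class ToyPipeline:
    """Stand-in pipeline that stores references to its components."""

    def __init__(self, **components):
        self.components = components

    @classmethod
    def from_pipe(cls, other):
        # Reuse the same component objects instead of copying them.
        return cls(**other.components)

    def unload_ip_adapter(self):
        # Mutates the shared component in place.
        self.components["unet"].ip_adapter_loaded = False


pipe_a = ToyPipeline(unet=ToyUNet())
pipe_b = ToyPipeline.from_pipe(pipe_a)

# Both pipelines point at the very same component object.
same_object = pipe_a.components["unet"] is pipe_b.components["unet"]

# Changing state through one pipeline is visible through the other.
pipe_b.unload_ip_adapter()
still_loaded_in_a = pipe_a.components["unet"].ip_adapter_loaded

print(same_object, still_loaded_in_a)
```

Because the components are shared by reference rather than copied, memory usage stays flat, but any in-place change is visible everywhere.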

## Safety checker

The safety checker screens generated outputs for not-safe-for-work (NSFW) content. To disable the safety checker, pass `safety_checker=None`:

In [None]:
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    'stable-diffusion-v1-5/stable-diffusion-v1-5',
    safety_checker=None,
    use_safetensors=True,
)

## Checkpoint variants

A **checkpoint variant** is a checkpoint whose weights are
* stored in a different floating point type, such as `torch.float16`, which requires only half the bandwidth and storage to download. We cannot use this variant if we are continuing training or using a CPU.
* non-exponential moving average (non-EMA) weights, which should not be used for inference. We should use this variant to continue fine-tuning a model.

When the checkpoints have identical model structures, but they were trained on different datasets and with a different training setup, they should be stored in separate repositories.

NOTE:
* `torch_dtype` specifies the floating point precision of the loaded checkpoint. If we want to save bandwidth by loading a fp16 variant, we should set both `variant="fp16"` and `torch_dtype=torch.float16` so that the weights are *loaded in fp16*. If we set only `variant="fp16"`, the `fp16` weights are converted to the default fp32 precision. If we set only `torch_dtype=torch.float16`, the default fp32 weights are downloaded first and then converted to fp16.
* `variant` specifies which files should be loaded from the repository. If we want to load a non-EMA variant of a UNet from `stable-diffusion-v1-5/stable-diffusion-v1-5`, set `variant="non_ema"` to download the `non_ema` file.
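
The interaction between `variant` and `torch_dtype` described in the note can be summarized as a small decision function. This is a sketch under the assumption that the default checkpoint files are stored in fp32; `describe_loading` is a hypothetical helper that works on plain strings and loads nothing.

```python
from typing import Optional, Tuple


def describe_loading(variant: Optional[str], torch_dtype: Optional[str]) -> Tuple[str, str]:
    """Return (precision of the downloaded files, precision after loading).

    Hypothetical helper encoding the rules in the note above.
    """
    # `variant` decides which files are fetched from the repository.
    downloaded = "fp16" if variant == "fp16" else "fp32"
    # `torch_dtype` decides the precision of the weights once loaded.
    loaded = "fp16" if torch_dtype == "float16" else "fp32"
    return downloaded, loaded


cases = {
    "variant + torch_dtype": describe_loading("fp16", "float16"),
    "variant only": describe_loading("fp16", None),
    "torch_dtype only": describe_loading(None, "float16"),
    "neither": describe_loading(None, None),
}

for name, (downloaded, loaded) in cases.items():
    print(f"{name}: download {downloaded}, load as {loaded}")
```

Only the first case saves both bandwidth and memory, which is why the two parameters are usually set together.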

In [None]:
from diffusers import DiffusionPipeline
import torch

# load a fp16 model
pipeline = DiffusionPipeline.from_pretrained(
    'stable-diffusion-v1-5/stable-diffusion-v1-5',
    variant='fp16',
    torch_dtype=torch.float16,
    use_safetensors=True,
)

In [None]:
# load a non-ema model
pipeline = DiffusionPipeline.from_pretrained(
    'stable-diffusion-v1-5/stable-diffusion-v1-5',
    variant='non_ema',
    use_safetensors=True,
)

Use the `variant` parameter in the `DiffusionPipeline.save_pretrained()` method to save a checkpoint as a different floating point type or as a non-EMA variant.

We should save a variant to the same folder as the original checkpoint, so we have the option of loading both from the same folder.

In [None]:
from diffusers import DiffusionPipeline

# save a fp16 model
pipeline.save_pretrained(
    'stable-diffusion-v1-5/stable-diffusion-v1-5',
    variant='fp16',
)

In [None]:
# save a non-ema model
pipeline.save_pretrained(
    'stable-diffusion-v1-5/stable-diffusion-v1-5',
    variant='non_ema',
)
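
Variants can live side by side in one folder because each one gets its own file name. The sketch below assumes the naming pattern in which the variant tag is inserted before the file extension; `variant_filename` is a hypothetical helper for illustration, not a Diffusers function.

```python
from typing import Optional


def variant_filename(weights_name: str, variant: Optional[str] = None) -> str:
    """Insert the variant tag before the file extension.

    Hypothetical helper illustrating the naming pattern assumed here.
    """
    if variant is None:
        return weights_name
    stem, extension = weights_name.rsplit(".", 1)
    return f"{stem}.{variant}.{extension}"


base = "diffusion_pytorch_model.safetensors"

# The default weights and both variants end up with distinct names.
folder_listing = [variant_filename(base, v) for v in (None, "fp16", "non_ema")]
print(folder_listing)
```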

# DiffusionPipeline explained

The `DiffusionPipeline.from_pretrained()` method will
* download the latest version of the folder structure required for inference and cache it. If the latest folder structure is available in the local cache, `DiffusionPipeline.from_pretrained()` reuses the cache and will not redownload the files.
* load the cached weights into the correct pipeline class, retrieved from the `model_index.json` file, and return an instance of it.

The pipelines' underlying folder structure corresponds directly with their class instance.

In [None]:
from diffusers import DiffusionPipeline

repo_id = 'stable-diffusion-v1-5/stable-diffusion-v1-5'
pipeline = DiffusionPipeline.from_pretrained(repo_id, use_safetensors=True)

In [None]:
print(pipeline)

The `StableDiffusionPipeline` instance consists of
* `"feature_extractor"`: a `CLIPImageProcessor` from HuggingFace Transformers,
* `"safety_checker"`: a component for screening against harmful content,
* `"scheduler"`: an instance of `PNDMScheduler`,
* `"text_encoder"`: a `CLIPTextModel` from HuggingFace Transformers,
* `"tokenizer"`: a `CLIPTokenizer` from HuggingFace Transformers,
* `"unet"`: an instance of `UNet2DConditionModel`,
* `"vae"`: an instance of `AutoencoderKL`.

We can access each of the components of the pipeline as an attribute to view its configuration:

In [None]:
pipeline.tokenizer

Every pipeline expects a `model_index.json` file that tells the `DiffusionPipeline`:
* which pipeline class to load from `_class_name`,
* which version of Diffusers was used to create the model in `_diffusers_version`,
* what components from which library are stored in the subfolders (`name` corresponds to the component and subfolder name, `library` corresponds to the name of the library to load the class from, and `class` corresponds to the class name)
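
To make the file layout concrete, the sketch below parses an abridged, illustrative `model_index.json` with the standard `json` module. The JSON text is an assumption written for this example (the version string in particular is a placeholder), and it assumes each component entry is a `[library, class]` pair.

```python
import json

# Abridged, illustrative model_index.json; the values are assumptions.
model_index_text = """
{
  "_class_name": "StableDiffusionPipeline",
  "_diffusers_version": "0.6.0",
  "scheduler": ["diffusers", "PNDMScheduler"],
  "text_encoder": ["transformers", "CLIPTextModel"],
  "tokenizer": ["transformers", "CLIPTokenizer"],
  "unet": ["diffusers", "UNet2DConditionModel"],
  "vae": ["diffusers", "AutoencoderKL"]
}
"""

model_index = json.loads(model_index_text)

# Keys starting with an underscore carry metadata about the pipeline itself.
pipeline_class = model_index["_class_name"]
created_with = model_index["_diffusers_version"]

# Every other key names a component (and subfolder) as a [library, class] pair.
components = {
    name: tuple(value)
    for name, value in model_index.items()
    if not name.startswith("_")
}

print(pipeline_class, created_with)
for name, (library, class_name) in components.items():
    print(f"{name}: {library}.{class_name}")
```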