# Latent Consistency Model

Latent Consistency Models (LCMs) enable fast, high-quality image generation by directly predicting the reverse diffusion process in the latent space rather than the pixel space.

LCMs try to predict the noiseless image directly from the noisy image, in contrast to typical diffusion models, which remove noise from the noisy image iteratively. By avoiding the long iterative sampling process, LCMs can generate high-quality images in 2-4 steps instead of 20-30 steps.
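
As a rough intuition for that difference, here is a toy 1-D sketch that contrasts many small denoising steps with a single direct jump. It is not how an LCM is implemented: the Gaussian "data" distribution, the constants, and every function name are assumptions chosen so that the direct jump has a closed form, whereas a real LCM learns that jump with a neural network in latent space.

```python
import math

# Toy setup (all values are illustrative assumptions): clean data follows a
# 1-D Gaussian N(MU, S^2), and a sample at noise level t follows N(MU, S^2 + t^2).
MU, S = 2.0, 0.5
T_MAX, T_MIN = 80.0, 0.002


def velocity(x, t):
    """Drift dx/dt of the probability-flow ODE for this toy Gaussian case."""
    return t * (x - MU) / (S * S + t * t)


def iterative_denoise(x, steps):
    """Walk from noise level T_MAX down to T_MIN with many small Euler steps."""
    ratio = (T_MIN / T_MAX) ** (1.0 / steps)  # geometric spacing of noise levels
    t = T_MAX
    for _ in range(steps):
        t_next = t * ratio
        x = x + (t_next - t) * velocity(x, t)
        t = t_next
    return x


def consistency_jump(x, t):
    """Map a noisy sample at level t straight to level T_MIN in one evaluation."""
    scale = math.sqrt((S * S + T_MIN * T_MIN) / (S * S + t * t))
    return MU + (x - MU) * scale


x_noisy = MU + 60.0  # an arbitrary noisy starting point at level T_MAX
many_steps = iterative_denoise(x_noisy, steps=1000)
one_step = consistency_jump(x_noisy, T_MAX)
print(f"1000 Euler steps: {many_steps:.4f}")
print(f"single jump:      {one_step:.4f}")
```

Both routes land on nearly the same point, but the second needs one function evaluation instead of a thousand, which is the trade that makes few-step sampling possible.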

LCMs are distilled from pretrained models, which requires ~32 hours of A100 compute. To speed this up, LCM-LoRAs train a LoRA adapter, which has far fewer parameters to train than the full model. Once trained, the LCM-LoRA can be plugged into a diffusion model.
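
To get a feel for why a LoRA adapter is so much cheaper to train, the sketch below counts trainable parameters for a single linear layer. The layer size and rank are hypothetical values picked for illustration, not the configuration of any particular LCM-LoRA.

```python
def full_params(d_out, d_in):
    """Trainable weights when the whole d_out x d_in matrix is fine-tuned."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Trainable weights for a LoRA update B @ A, with B: d_out x rank and A: rank x d_in."""
    return rank * (d_out + d_in)


# Hypothetical layer size and rank, chosen only for illustration.
d_out, d_in, rank = 1280, 1280, 64

full = full_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.1%}")
```

The saving grows as the rank shrinks relative to the layer dimensions, since the adapter cost scales with `rank * (d_out + d_in)` rather than `d_out * d_in`.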

## Text-to-image

##### LCM

To use LCMs, we need to load the LCM checkpoint for our supported model into `UNet2DConditionModel` and replace the scheduler with the `LCMScheduler`.

Notes on LCM:
* A standard pipeline doubles the batch size for classifier-free guidance. LCM instead applies guidance through guidance embeddings, so it does not need to double the batch size, which leads to faster inference. The downside is that negative prompts do not work with LCM because they do not have any effect on the denoising process.
* The ideal range for `guidance_scale` is between 3 and 13 because that is what the UNet was trained with. However, disabling guidance by setting `guidance_scale` to 1.0 is also effective in most cases.
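
The effect of the step count and of skipping the doubled batch can be put into numbers with a small counting sketch. The helper name is hypothetical, and 25 steps is just one value from the 20-30 range mentioned earlier.

```python
def unet_samples_per_image(num_inference_steps, doubles_batch):
    """Count how many samples pass through the UNet to produce one image."""
    samples_per_step = 2 if doubles_batch else 1
    return num_inference_steps * samples_per_step


# A typical diffusion run: 25 steps, batch doubled for classifier-free guidance.
standard = unet_samples_per_image(25, doubles_batch=True)
# An LCM run: 4 steps, guidance comes from an embedding so no doubling is needed.
lcm = unet_samples_per_image(4, doubles_batch=False)
print(f"standard: {standard} UNet samples, LCM: {lcm} UNet samples")
```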

In [None]:
from diffusers import StableDiffusionXLPipeline, UNet2DConditionModel, LCMScheduler
import torch

unet = UNet2DConditionModel.from_pretrained(
    'latent-consistency/lcm-sdxl',
    torch_dtype=torch.float16,
    variant='fp16'
)

pipe = StableDiffusionXLPipeline.from_pretrained(
    'stabilityai/stable-diffusion-xl-base-1.0',
    unet=unet,
    torch_dtype=torch.float16,
    variant='fp16'
).to('cuda')
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

In [None]:
prompt = "Self-portrait oil painting, a beautiful cyborg with golden hair, 8k"
generator = torch.manual_seed(111)

image = pipe(
    prompt,
    num_inference_steps=4, # LCM needs fewer steps
    guidance_scale=8.0,
    generator=generator,
).images[0]
image

In [None]:
image = pipe(
    prompt,
    num_inference_steps=4, # LCM needs fewer steps
    guidance_scale=1.0, # disable guidance_scale
    generator=generator,
).images[0]
image

##### LCM-LoRA

To use LCM-LoRAs, we need to replace the scheduler with the `LCMScheduler` and load the LCM-LoRA weights with the `load_lora_weights` method.

Notes on LCM-LoRA:
* Batch size is doubled inside the pipeline for classifier-free guidance (CFG).
* Guidance should be used with LCM-LoRAs, but they are very sensitive to high `guidance_scale` values, which can lead to artifacts in the generated image. The best values are between 1 and 2.
* Replacing `stabilityai/stable-diffusion-xl-base-1.0` with any finetuned model should be fine.
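
For reference, the usual classifier-free guidance combination can be sketched in a few lines of plain Python. This is the textbook formula applied to small lists of numbers, not the pipeline's actual code. The function name and the sample values are made up for illustration.

```python
def classifier_free_guidance(noise_uncond, noise_cond, guidance_scale):
    """Blend unconditional and conditional noise predictions element-wise."""
    return [
        u + guidance_scale * (c - u)
        for u, c in zip(noise_uncond, noise_cond)
    ]


# Made-up noise predictions for a three-element "latent".
uncond = [0.0, 1.0, -0.5]
cond = [0.5, 0.25, 1.0]

no_guidance = classifier_free_guidance(uncond, cond, guidance_scale=1.0)
guided = classifier_free_guidance(uncond, cond, guidance_scale=2.0)
print(no_guidance)
print(guided)
```

With a scale of 1.0 the result is exactly the conditional prediction, and larger scales push the result further away from the unconditional one, which is consistent with high values producing artifacts.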

In [None]:
from diffusers import DiffusionPipeline, LCMScheduler
import torch

pipe = DiffusionPipeline.from_pretrained(
    'stabilityai/stable-diffusion-xl-base-1.0',
    torch_dtype=torch.float16,
    variant='fp16'
).to('cuda')
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
# load LCM-LoRA
pipe.load_lora_weights('latent-consistency/lcm-lora-sdxl')

In [None]:
prompt = "Self-portrait oil painting, a beautiful cyborg with golden hair, 8k"
generator = torch.manual_seed(111)

image = pipe(
    prompt,
    num_inference_steps=4,
    guidance_scale=1.0,
    generator=generator,
).images[0]
image

In [None]:
image = pipe(
    prompt,
    num_inference_steps=4,
    guidance_scale=2.0,
    generator=generator,
).images[0]
image

## Image-to-image

##### LCM

To use LCMs for image-to-image, we need to load the LCM checkpoint for our supported model into `UNet2DConditionModel` and replace the scheduler with the `LCMScheduler`.

In [None]:
from diffusers import AutoPipelineForImage2Image, UNet2DConditionModel, LCMScheduler
from diffusers.utils import load_image, make_image_grid
import torch

unet = UNet2DConditionModel.from_pretrained(
    'SimianLuo/LCM_Dreamshaper_v7',
    subfolder='unet',
    torch_dtype=torch.float16
)

pipe = AutoPipelineForImage2Image.from_pretrained(
    'Lykon/dreamshaper-7',
    unet=unet, # LCM
    torch_dtype=torch.float16,
    variant='fp16',
).to('cuda')
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

In [None]:
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-init.png")
prompt = "Astronauts in a jungle, cold color palette, muted colors, detailed, 8k"
generator = torch.manual_seed(0)

image = pipe(
    prompt,
    image=init_image,
    num_inference_steps=4,
    guidance_scale=7.5,
    strength=0.5,
    generator=generator
).images[0]
make_image_grid([init_image, image], rows=1, cols=2)
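
The call above combines `num_inference_steps=4` with `strength=0.5`. The sketch below estimates how many denoising steps actually run in that case, assuming the pipeline truncates the schedule the way the standard Stable Diffusion image-to-image pipeline does. The helper name is hypothetical and the formula is an assumption about the library's behavior, so check it against the installed version.

```python
def effective_steps(num_inference_steps, strength):
    """Estimate the denoising steps that run for a given strength (assumed formula)."""
    return min(int(num_inference_steps * strength), num_inference_steps)


for steps, strength in [(4, 0.5), (4, 0.6), (4, 1.0)]:
    print(f"steps={steps}, strength={strength} -> {effective_steps(steps, strength)} denoising steps")
```

With so few steps to begin with, a low `strength` can leave only one or two denoising steps, so it may be worth raising `num_inference_steps` slightly when lowering `strength`.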

##### LCM-LoRA

To use LCM-LoRAs for image-to-image, we need to replace the scheduler with the `LCMScheduler` and load the LCM-LoRA weights with the `load_lora_weights` method.

In [None]:
from diffusers import AutoPipelineForImage2Image, LCMScheduler
from diffusers.utils import load_image, make_image_grid
import torch

pipe = AutoPipelineForImage2Image.from_pretrained(
    'Lykon/dreamshaper-7',
    torch_dtype=torch.float16,
    variant='fp16'
).to('cuda')
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
# load LCM-LoRA
pipe.load_lora_weights('latent-consistency/lcm-lora-sdv1-5')

In [None]:
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-init.png")
prompt = "Astronauts in a jungle, cold color palette, muted colors, detailed, 8k"
generator = torch.manual_seed(0)

image = pipe(
    prompt,
    image=init_image,
    num_inference_steps=4,
    guidance_scale=1,
    strength=0.6,
    generator=generator
).images[0]
make_image_grid([init_image, image], rows=1, cols=2)

## Inpainting

To use LCM-LoRAs for inpainting, we need to replace the scheduler with the `LCMScheduler` and load the LCM-LoRA weights with the `load_lora_weights` method.

In [None]:
from diffusers import AutoPipelineForInpainting, LCMScheduler
from diffusers.utils import load_image, make_image_grid
import torch

pipe = AutoPipelineForInpainting.from_pretrained(
    'runwayml/stable-diffusion-inpainting',
    torch_dtype=torch.float16,
    variant='fp16'
).to('cuda')

pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
# load LCM-LoRA
pipe.load_lora_weights('latent-consistency/lcm-lora-sdv1-5')

In [None]:
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
generator = torch.manual_seed(111)

image = pipe(
    prompt,
    image=init_image,
    mask_image=mask_image,
    generator=generator,
    num_inference_steps=4,
    guidance_scale=4,
).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)

## Adapters

LCMs are compatible with adapters like LoRA, ControlNet, T2I-Adapter, and AnimateDiff. We can bring the speed of LCM to these adapters to generate images in a certain style or condition the model on another input.

### LoRA

##### LCM

* Load the LCM checkpoint into `UNet2DConditionModel` and replace the scheduler with the `LCMScheduler`
* Load the style LoRA weights with the `load_lora_weights` method

In [None]:
from diffusers import StableDiffusionXLPipeline, UNet2DConditionModel, LCMScheduler
import torch

unet = UNet2DConditionModel.from_pretrained(
    'latent-consistency/lcm-sdxl',
    torch_dtype=torch.float16,
    variant='fp16'
)

pipe = StableDiffusionXLPipeline.from_pretrained(
    'stabilityai/stable-diffusion-xl-base-1.0',
    unet=unet,
    torch_dtype=torch.float16,
    variant='fp16'
).to('cuda')
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

pipe.load_lora_weights(
    'TheLastBen/Papercut_SDXL',
    weight_name='papercut.safetensors',
    adapter_name='papercut'
)

In [None]:
prompt = "papercut, a cute fox"
generator = torch.manual_seed(0)

image = pipe(
    prompt,
    num_inference_steps=4,
    generator=generator,
    guidance_scale=8.0
).images[0]
image

##### LCM-LoRA

* Replace the scheduler with the `LCMScheduler`
* Load the LCM-LoRA weights and the style LoRA weights with the `load_lora_weights` method, giving each an `adapter_name`
* Combine both LoRA adapters with `set_adapters`, assigning a weight to each
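
Conceptually, combining two LoRA adapters with per-adapter weights amounts to adding two scaled low-rank updates to the same base weight. The sketch below shows that arithmetic on tiny hand-written matrices. It is a simplified picture rather than the library's implementation: it ignores the alpha/rank scaling that LoRA normally applies, and all names and values are made up for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def merge_adapters(base, adapters, weights):
    """Return base + sum(w * B @ A) over all (B, A) adapter pairs."""
    merged = [row[:] for row in base]
    for (b_mat, a_mat), w in zip(adapters, weights):
        delta = matmul(b_mat, a_mat)
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                merged[i][j] += w * value
    return merged


base = [[1.0, 0.0], [0.0, 1.0]]                # stand-in for a frozen base weight
lcm_adapter = ([[1.0], [0.0]], [[0.0, 1.0]])   # rank-1 update (B, A)
style_adapter = ([[0.0], [1.0]], [[1.0, 0.0]]) # another rank-1 update (B, A)

merged = merge_adapters(base, [lcm_adapter, style_adapter], weights=[1.0, 0.8])
print(merged)
```

Lowering a weight shrinks that adapter's contribution without touching the other one, which is why the style adapter can be dialed down independently of the LCM adapter.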

In [None]:
from diffusers import DiffusionPipeline, LCMScheduler
import torch

pipe = DiffusionPipeline.from_pretrained(
    'stabilityai/stable-diffusion-xl-base-1.0',
    torch_dtype=torch.float16,
    variant='fp16'
).to('cuda')
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

pipe.load_lora_weights(
    'latent-consistency/lcm-lora-sdxl',
    adapter_name='lcm'
)
pipe.load_lora_weights(
    'TheLastBen/Papercut_SDXL',
    weight_name='papercut.safetensors',
    adapter_name='papercut'
)

pipe.set_adapters(
    ['lcm', 'papercut'],
    adapter_weights=[1.0, 0.8]
)

In [None]:
prompt = "papercut, a cute fox"
generator = torch.manual_seed(0)
image = pipe(
    prompt,
    num_inference_steps=4,
    guidance_scale=1, # note the scale here
    generator=generator
).images[0]
image

### ControlNet

##### LCM

In [None]:
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, LCMScheduler
from diffusers.utils import load_image, make_image_grid
import torch
import cv2
import numpy as np
from PIL import Image

controlnet = ControlNetModel.from_pretrained(
    'lllyasviel/sd-controlnet-canny',
    torch_dtype=torch.float16,
)

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "SimianLuo/LCM_Dreamshaper_v7",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to('cuda')
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

In [None]:
original_image = load_image(
    "https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png"
).resize((512, 512))

image = np.array(original_image)

low_threshold = 100
high_threshold = 200

image = cv2.Canny(image, low_threshold, high_threshold)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
canny_image = Image.fromarray(image)

In [None]:
prompt = 'the mona lisa'
generator = torch.manual_seed(111)

image = pipe(
    prompt,
    image=canny_image,
    num_inference_steps=4,
    generator=generator,
    guidance_scale=7.5
).images[0]
make_image_grid([original_image, canny_image, image], rows=1, cols=3)

##### LCM-LoRA

In [None]:
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, LCMScheduler
from diffusers.utils import load_image, make_image_grid
import torch
import cv2
import numpy as np
from PIL import Image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny",
    torch_dtype=torch.float16
)

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", # this is regular sd15
    controlnet=controlnet,
    torch_dtype=torch.float16,
    safety_checker=None,
    variant="fp16"
).to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

pipe.load_lora_weights('latent-consistency/lcm-lora-sdv1-5')

In [None]:
original_image = load_image(
    "https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png"
).resize((512, 512))

image = np.array(original_image)

low_threshold = 100
high_threshold = 200

image = cv2.Canny(image, low_threshold, high_threshold)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
canny_image = Image.fromarray(image)

In [None]:
prompt = 'the mona lisa'
generator = torch.manual_seed(111)

image = pipe(
    prompt,
    image=canny_image,
    num_inference_steps=4,
    guidance_scale=1.5, # note the scale here
    controlnet_conditioning_scale=0.8,
    cross_attention_kwargs={'scale': 1},
    generator=generator
).images[0]
make_image_grid([original_image, canny_image, image], rows=1, cols=3)

### T2I-Adapter

##### LCM

In [None]:
from diffusers import StableDiffusionXLAdapterPipeline, UNet2DConditionModel, T2IAdapter, LCMScheduler
from diffusers.utils import load_image, make_image_grid
import torch
import cv2
import numpy as np
from PIL import Image

In [None]:
adapter = T2IAdapter.from_pretrained(
    'TencentARC/t2i-adapter-canny-sdxl-1.0',
    torch_dtype=torch.float16,
    variant='fp16'
).to('cuda')

unet = UNet2DConditionModel.from_pretrained(
    'latent-consistency/lcm-sdxl',
    torch_dtype=torch.float16,
    variant='fp16'
)

pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    'stabilityai/stable-diffusion-xl-base-1.0',
    unet=unet,
    adapter=adapter,
    torch_dtype=torch.float16,
    variant='fp16'
).to('cuda')
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

In [None]:
original_image = load_image(
    "https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png"
).resize((384, 384))

image = np.array(original_image)

low_threshold = 100
high_threshold = 200

image = cv2.Canny(image, low_threshold, high_threshold)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
canny_image = Image.fromarray(image).resize((1024, 1024))

In [None]:
prompt = "the mona lisa, 4k picture, high quality"
negative_prompt = "extra digit, fewer digits, cropped, worst quality, low quality, glitch, deformed, mutated, ugly, disfigured"

generator = torch.manual_seed(111)

image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    image=canny_image,
    num_inference_steps=4,
    guidance_scale=5,
    adapter_conditioning_scale=0.8,
    adapter_conditioning_factor=1,
    generator=generator
).images[0]
make_image_grid([original_image, canny_image, image], rows=1, cols=3)

##### LCM-LoRA

In [None]:
from diffusers import StableDiffusionXLAdapterPipeline, T2IAdapter, LCMScheduler
from diffusers.utils import load_image, make_image_grid
import torch
import cv2
import numpy as np
from PIL import Image

In [None]:
adapter = T2IAdapter.from_pretrained(
    'TencentARC/t2i-adapter-canny-sdxl-1.0',
    torch_dtype=torch.float16,
    variant='fp16'
).to('cuda')

pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    'stabilityai/stable-diffusion-xl-base-1.0',
    adapter=adapter,
    torch_dtype=torch.float16,
    variant='fp16'
).to('cuda')
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

pipe.load_lora_weights('latent-consistency/lcm-lora-sdxl')

In [None]:
original_image = load_image(
    "https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png"
).resize((384, 384))

image = np.array(original_image)

low_threshold = 100
high_threshold = 200

image = cv2.Canny(image, low_threshold, high_threshold)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
canny_image = Image.fromarray(image).resize((1024, 1024))

In [None]:
prompt = "the mona lisa, 4k picture, high quality"
negative_prompt = "extra digit, fewer digits, cropped, worst quality, low quality, glitch, deformed, mutated, ugly, disfigured"

generator = torch.manual_seed(111)

image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    image=canny_image,
    num_inference_steps=4,
    guidance_scale=1.5,
    adapter_conditioning_scale=0.8,
    adapter_conditioning_factor=1,
    generator=generator
).images[0]
make_image_grid([original_image, canny_image, image], rows=1, cols=3)

### AnimateDiff

In [None]:
from diffusers import MotionAdapter, AnimateDiffPipeline, LCMScheduler
from diffusers.utils import export_to_gif
import torch

adapter = MotionAdapter.from_pretrained(
    'guoyww/animatediff-motion-adapter-v1-5'
)

pipe = AnimateDiffPipeline.from_pretrained(
    'frankjoshua/toonyou_beta6',
    motion_adapter=adapter,
).to('cuda')
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

# load LCM-LoRA
pipe.load_lora_weights(
    'latent-consistency/lcm-lora-sdv1-5',
    adapter_name='lcm'
)

# load animatediff lora
pipe.load_lora_weights(
    'guoyww/animatediff-motion-lora-zoom-in',
    weight_name='diffusion_pytorch_model.safetensors',
    adapter_name='motion-lora'
)

pipe.set_adapters(
    ['lcm', 'motion-lora'],
    adapter_weights=[0.55, 1.2]
)

In [None]:
prompt = "best quality, masterpiece, 1girl, looking at viewer, blurry background, upper body, contemporary, dress"
generator = torch.manual_seed(111)

frames = pipe(
    prompt,
    num_inference_steps=5,
    guidance_scale=1.25,
    cross_attention_kwargs={'scale': 1},
    num_frames=24,
    generator=generator,
).frames[0]
export_to_gif(frames, 'animate.gif')