Perturbed-Attention Guidance

[Project](https://cvlab-kaist.github.io/Perturbed-Attention-Guidance/) / [arXiv](https://arxiv.org/abs/2403.17377) / [GitHub](https://github.com/cvlab-kaist/Perturbed-Attention-Guidance)

This implementation is based on [Diffusers](https://huggingface.co/docs/diffusers/index). `StableDiffusionPAGPipeline` is a modified `StableDiffusionPipeline` that supports Perturbed-Attention Guidance (PAG). The script was contributed by [Hyoungwon Cho](https://github.com/HyoungwonCho) and the notebook by [Parag Ekbote](https://github.com/ParagEkbote).

PAG parameters:

`pag_scale` : guidance scale of PAG (e.g. 5.0); a value of 0.0 disables PAG

`pag_applied_layers_index` : indices of the layers to which the attention perturbation is applied (e.g. ['m0'])
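As a rough illustration of what `pag_scale` controls: the PAG paper combines the model's usual noise prediction with a second prediction made while the self-attention maps in the selected layers are perturbed, and pushes the result away from the perturbed one. The sketch below shows that combination on plain numbers. `pag_combine` is a hypothetical helper name used only here, not part of the pipeline's API, and the pipeline is assumed to apply the same arithmetic to latent tensors inside its denoising loop.

```python
def pag_combine(eps, eps_perturbed, pag_scale):
    """Extrapolate away from the perturbed prediction.

    eps            -- noise prediction from the unmodified model
    eps_perturbed  -- noise prediction with perturbed self-attention
    pag_scale      -- guidance strength; 0.0 returns eps unchanged

    Works on floats or on any array type that supports arithmetic
    (an assumption made for illustration).
    """
    return eps + pag_scale * (eps - eps_perturbed)


print(pag_combine(1.0, 0.5, 0.0))  # 1.0 (PAG disabled)
print(pag_combine(1.0, 0.5, 5.0))  # 3.5
```

With `pag_scale=0.0` the perturbed prediction has no effect, which is why the notebook uses that setting for its baseline images.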

In [1]:
pip install diffusers transformers accelerate torch


In [3]:
import os
import torch

from accelerate.utils import set_seed

from diffusers import StableDiffusionPipeline
from diffusers.utils import make_image_grid
from diffusers.utils.torch_utils import randn_tensor

pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    custom_pipeline="hyoungwoncho/sd_perturbed_attention_guidance",
    torch_dtype=torch.float16
)

device = "cuda"
pipe = pipe.to(device)

pag_scale = 5.0
pag_applied_layers_index = ['m0']

batch_size = 4
seed = 10

# Output directory, e.g. ./results/pag5.0/
base_dir = "./results/"
grid_dir = base_dir + "pag" + str(pag_scale) + "/"

os.makedirs(grid_dir, exist_ok=True)

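# Seed the global RNGs; with generator=None, randn_tensor draws from them.
set_seed(seed)

# Both runs start from the same latents so the images are directly comparable.
latent_input = randn_tensor(shape=(batch_size, 4, 64, 64), generator=None, device=device, dtype=torch.float16)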

# Baseline: unconditional sampling (empty prompt) with PAG disabled.
output_baseline = pipe(
    "",
    width=512,
    height=512,
    num_inference_steps=50,
    guidance_scale=0.0,
    pag_scale=0.0,
    pag_applied_layers_index=pag_applied_layers_index,
    num_images_per_prompt=batch_size,
    latents=latent_input
).images

# Same setup with PAG enabled; reuse pag_scale so it matches the output directory name.
output_pag = pipe(
    "",
    width=512,
    height=512,
    num_inference_steps=50,
    guidance_scale=0.0,
    pag_scale=pag_scale,
    pag_applied_layers_index=pag_applied_layers_index,
    num_images_per_prompt=batch_size,
    latents=latent_input
).images

grid_image = make_image_grid(output_baseline + output_pag, rows=2, cols=batch_size)
grid_image.save(grid_dir + "sample.png")
