v0.25.0: aMUSEd, faster SDXL, interruptible pipelines

@sayakpaul released this 27 Dec 13:49 · 614 commits to main since this release

aMUSEd

[Image: collage of aMUSEd sample generations]

aMUSEd is a lightweight text-to-image model based on the MUSE architecture. aMUSEd is particularly useful in applications that require a lightweight and fast model, such as generating many images quickly at once. aMUSEd is currently a research release.

aMUSEd is a VQVAE token-based transformer that can generate an image in fewer forward passes than many diffusion models. In contrast with MUSE, it uses the smaller text encoder CLIP-L/14 instead of T5-XXL. Thanks to its small parameter count and a generation process that needs only a few forward passes, aMUSEd can generate many images quickly. This benefit is particularly noticeable at larger batch sizes.

Text-to-image generation

import torch
from diffusers import AmusedPipeline

pipe = AmusedPipeline.from_pretrained(
    "amused/amused-512", variant="fp16", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

prompt = "cowboy"
image = pipe(prompt, generator=torch.manual_seed(8)).images[0]
image.save("text2image_512.png")

Image-to-image generation

import torch
from diffusers import AmusedImg2ImgPipeline
from diffusers.utils import load_image

pipe = AmusedImg2ImgPipeline.from_pretrained(
    "amused/amused-512", variant="fp16", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

prompt = "apple watercolor"
input_image = (
    load_image(
        "https://huggingface.co/amused/amused-512/resolve/main/assets/image2image_256_orig.png"
    )
    .resize((512, 512))
    .convert("RGB")
)

image = pipe(prompt, input_image, strength=0.7, generator=torch.manual_seed(3)).images[0]
image.save("image2image_512.png")

Inpainting

import torch
from diffusers import AmusedInpaintPipeline
from diffusers.utils import load_image

pipe = AmusedInpaintPipeline.from_pretrained(
    "amused/amused-512", variant="fp16", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

prompt = "a man with glasses"
input_image = (
    load_image(
        "https://huggingface.co/amused/amused-512/resolve/main/assets/inpainting_256_orig.png"
    )
    .resize((512, 512))
    .convert("RGB")
)
mask = (
    load_image(
        "https://huggingface.co/amused/amused-512/resolve/main/assets/inpainting_256_mask.png"
    )
    .resize((512, 512))
    .convert("L")
)

image = pipe(prompt, input_image, mask, generator=torch.manual_seed(3)).images[0]
image.save("inpainting_512.png")

📜 Docs: https://huggingface.co/docs/diffusers/main/en/api/pipelines/amused

🛠️ Models:

Faster SDXL

We're excited to present an array of optimization techniques that can be used to reduce the inference latency of text-to-image diffusion models. All of these can be done in native PyTorch without requiring additional C++ code.

[Chart: SDXL inference latency across the optimization techniques, batch size 1, 30 steps]

These techniques are not specific to Stable Diffusion XL (SDXL) and can be used to improve other text-to-image diffusion models too. Starting from default fp32 precision, we can achieve a 3x speed improvement by applying different PyTorch optimization techniques. We encourage you to check out the detailed docs provided below.

Note: Compared to the default way most people use Diffusers, which is fp16 + SDPA, applying all the optimizations explained in the blog post below yields a 30% speed-up.
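
As a rough illustration, here is a minimal sketch of the half-precision + torch.compile portion of the recipe (the model ID and compile flags are illustrative; on PyTorch 2.x, SDPA is already the default attention backend in Diffusers, so it needs no extra call):

import torch
from diffusers import StableDiffusionXLPipeline

# Load SDXL in half precision (assumes a CUDA GPU).
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
).to("cuda")

# channels_last tends to help the convolution-heavy UNet and VAE.
pipe.unet.to(memory_format=torch.channels_last)
pipe.vae.to(memory_format=torch.channels_last)

# Compile the two hottest components; the first call pays the
# compilation cost, subsequent calls run the optimized kernels.
pipe.unet = torch.compile(pipe.unet, mode="max-autotune", fullgraph=True)
pipe.vae.decode = torch.compile(pipe.vae.decode, mode="max-autotune", fullgraph=True)

image = pipe("a photo of an astronaut riding a horse", num_inference_steps=30).images[0]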

📜 Docs: https://huggingface.co/docs/diffusers/main/en/tutorials/fast_diffusion
🌸 PyTorch blog post: https://pytorch.org/blog/accelerating-generative-ai-3/

Interruptible pipelines

Interrupting the diffusion process is particularly useful when building UIs that work with Diffusers because it allows users to stop the generation process if they're unhappy with the intermediate results. You can incorporate this into your pipeline with a callback.

The callback function takes the following arguments: pipe, i, t, and callback_kwargs (which the callback must return). Set the pipeline's _interrupt attribute to True to stop the diffusion process after a certain number of steps. You are also free to implement your own custom stopping logic inside the callback.

In this example, the diffusion process is stopped after 10 steps even though num_inference_steps is set to 50.

from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe.enable_model_cpu_offload()
num_inference_steps = 50

def interrupt_callback(pipe, i, t, callback_kwargs):
    stop_idx = 10
    if i == stop_idx:
        pipe._interrupt = True

    return callback_kwargs

pipe(
    "A photo of a cat",
    num_inference_steps=num_inference_steps,
    callback_on_step_end=interrupt_callback,
)

📜 Docs: https://huggingface.co/docs/diffusers/main/en/using-diffusers/callback

peft in our LoRA training examples

We incorporated peft in all the officially supported training examples concerning LoRA. This greatly simplifies the code and improves readability. LoRA training has never been easier, thanks to peft!
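
For a flavor of what the integration looks like, here is a minimal sketch of the core pattern the scripts now follow (the rank and target modules are illustrative): freeze the base model and inject a trainable LoraConfig via add_adapter.

import torch
from diffusers import UNet2DConditionModel
from peft import LoraConfig

# Freeze the base UNet; only the injected LoRA layers will be trained.
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)
unet.requires_grad_(False)

# Inject low-rank adapters into the attention projections.
unet.add_adapter(
    LoraConfig(
        r=4,
        lora_alpha=4,
        init_lora_weights="gaussian",
        target_modules=["to_k", "to_q", "to_v", "to_out.0"],
    )
)

# Only the LoRA parameters require gradients at this point.
lora_params = [p for p in unet.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(lora_params, lr=1e-4)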

More memory-friendly version of LCM LoRA SDXL training

We incorporated best practices from peft to make LCM LoRA training for SDXL more memory-friendly. As such, you don't have to initialize two UNets (teacher and student) anymore. This version also integrates with the datasets library for quick experimentation. Check out this section for more details.
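
The idea, roughly: the frozen base UNet acts as the teacher, and the same UNet with its LoRA adapter enabled acts as the student, so toggling the adapter replaces the second copy of the weights. A minimal sketch of that pattern (not the full distillation script; hyperparameters are illustrative):

import torch
from diffusers import UNet2DConditionModel
from peft import LoraConfig

# A single UNet plays both roles: the frozen base weights are the
# teacher, and the injected LoRA adapter turns it into the student.
unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", subfolder="unet"
)
unet.requires_grad_(False)
unet.add_adapter(
    LoraConfig(r=64, lora_alpha=64, target_modules=["to_k", "to_q", "to_v", "to_out.0"])
)

def teacher_pred(unet, *args, **kwargs):
    # Disable the adapter to recover the frozen teacher, predict, then
    # re-enable it, instead of holding a second UNet in memory.
    unet.disable_adapters()
    with torch.no_grad():
        out = unet(*args, **kwargs).sample
    unet.enable_adapters()
    return out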

All commits

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @hako-mikan
    • [Community Pipeline] Regional Prompting Pipeline (#6015)
    • [Fix] Fix Regional Prompting Pipeline (#6188)
  • @TonyLianLong
    • LLMGroundedDiffusionPipeline: inherit from DiffusionPipeline and fix peft (#6023)
  • @okotaku
    • [Feature] Support IP-Adapter Plus (#5915)
  • @RuoyiDu
    • [Community Pipeline] DemoFusion: Democratising High-Resolution Image Generation With No $$$ (#6022)
  • @UmerHA
    • Add ControlNet-XS support (#5827)
  • @a-r-r-o-w
    • [Community] AnimateDiff + Controlnet Pipeline (#5928)
    • IP adapter support for most pipelines (#5900)
    • Add missing subclass docs, Fix broken example in SD_safe (#6116)
    • Support img2img and inpaint in lpw-xl (#6114)
  • @Monohydroxides
    • [Community] Add SDE Drag pipeline (#6105)
  • @dg845
    • Clean Up Comments in LCM(-LoRA) Distillation Scripts. (#6145)
    • Change LCM-LoRA README Script Example Learning Rates to 1e-4 (#6304)
    • Add rescale_betas_zero_snr Argument to DDPMScheduler (#6305)
    • Fix LCM distillation bug when creating the guidance scale embeddings using multiple GPUs. (#6279)
  • @markkua
    • [Community Pipeline] Add Marigold Monocular Depth Estimation (#6249)