v0.4.0 Better, faster, stronger!
🚗 Faster
We have thoroughly profiled our codebase and applied a number of incremental improvements that, when combined, provide a speed improvement of almost 3x.
On top of that, we now default to using the `float16` format. It's much faster than `float32` and, according to our tests, produces images with no discernible difference in quality. This beats the use of `autocast`, so the resulting code is cleaner!
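To give a feel for the trade-off, here is a small standard-library sketch. It is not the profiling or the image comparison mentioned above; it only shows that a half-precision value takes half the bytes of a single-precision one, and that rounding a normalized pixel value in [0, 1] to half precision introduces an error well below half of an 8-bit pixel step. The helper names (`to_float16`, `HALF_PIXEL_STEP`) are made up for this illustration.

```python
import struct


def to_float16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Storage cost per value: half precision needs half the bytes.
BYTES_FP16 = struct.calcsize("<e")
BYTES_FP32 = struct.calcsize("<f")

# 1001 evenly spaced values in [0, 1], the range of a normalized pixel.
samples = [i / 1000 for i in range(1001)]
max_err_fp16 = max(abs(to_float16(v) - v) for v in samples)

# Half of one 8-bit step: errors below this rarely change the final pixel.
HALF_PIXEL_STEP = 1 / 510

print(f"bytes per value: fp16={BYTES_FP16}, fp32={BYTES_FP32}")
print(f"max fp16 rounding error on [0, 1]: {max_err_fp16:.6f}")
print(f"half of an 8-bit pixel step:       {HALF_PIXEL_STEP:.6f}")
```

Real models accumulate rounding over many layers, so this is intuition rather than proof; the quality statement above rests on tests with actual pipelines.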
🔑 `use_auth_token` no more
The recently released version of `huggingface-hub` automatically uses your access token if you are logged in, so you don't need to put it everywhere in your code. All you need to do is authenticate once using `huggingface-cli login` in your terminal and you're all set.
- pipe = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", use_auth_token=True)
+ pipe = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
We bumped the `huggingface-hub` version to `0.10.0` in our dependencies to achieve this.
🎈 More flexible APIs
- Schedulers now use a common, simpler, unified API design. This has allowed us to remove many conditionals and special cases in the rest of the code, including the pipelines. This is very important for us and for the users of 🧨 diffusers: we all gain clarity and a solid abstraction for schedulers. See the description in #719 for more details.
Please update any custom Stable Diffusion pipelines accordingly:
- if isinstance(self.scheduler, LMSDiscreteScheduler):
-     latents = latents * self.scheduler.sigmas[0]
+ latents = latents * self.scheduler.init_noise_sigma

- if isinstance(self.scheduler, LMSDiscreteScheduler):
-     sigma = self.scheduler.sigmas[i]
-     latent_model_input = latent_model_input / ((sigma**2 + 1) ** 0.5)
+ latent_model_input = self.scheduler.scale_model_input(latent_model_input, t)

- if isinstance(self.scheduler, LMSDiscreteScheduler):
-     latents = self.scheduler.step(noise_pred, i, latents, **extra_step_kwargs).prev_sample
- else:
-     latents = self.scheduler.step(noise_pred, t, latents, **extra_step_kwargs).prev_sample
+ latents = self.scheduler.step(noise_pred, t, latents, **extra_step_kwargs).prev_sample
- Pipeline callbacks. As a community project (h/t @jamestiotio!), `diffusers` pipelines can now invoke a callback function during generation, providing the latents at each step of the process. This makes it easier to perform tasks such as visualization, inspection, explainability and others the community may invent.
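The pattern in the diff above, together with the new callbacks, can be sketched in plain Python. The two scheduler classes below are hypothetical toy stand-ins, not the real `LMSDiscreteScheduler` or any other diffusers class; they only mimic the three members used in the diff (`init_noise_sigma`, `scale_model_input` and `step`). The callback signature `(step, timestep, latents)` and the `callback_steps` argument are assumptions of this sketch, so check the pipeline docstrings for the exact parameters.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Latents = List[float]


@dataclass
class StepOutput:
    prev_sample: Latents


class ToySigmaScheduler:
    """Toy sigma-based scheduler (hypothetical, for illustration only)."""

    def __init__(self, timesteps: List[int], sigmas: List[float]):
        self.timesteps = timesteps
        self.sigmas = sigmas
        self.init_noise_sigma = max(sigmas)

    def _sigma(self, timestep: int) -> float:
        # The scheduler maps a timestep to its sigma; callers never index sigmas.
        return self.sigmas[self.timesteps.index(timestep)]

    def scale_model_input(self, sample: Latents, timestep: int) -> Latents:
        sigma = self._sigma(timestep)
        return [x / ((sigma**2 + 1) ** 0.5) for x in sample]

    def step(self, model_output: Latents, timestep: int, sample: Latents) -> StepOutput:
        # Made-up update rule; a real scheduler does something more involved.
        sigma = self._sigma(timestep)
        return StepOutput([x - 0.1 * sigma * e for x, e in zip(sample, model_output)])


class ToyUnitScheduler:
    """Toy scheduler that needs no input scaling (hypothetical)."""

    def __init__(self, timesteps: List[int]):
        self.timesteps = timesteps
        self.init_noise_sigma = 1.0

    def scale_model_input(self, sample: Latents, timestep: int) -> Latents:
        return sample

    def step(self, model_output: Latents, timestep: int, sample: Latents) -> StepOutput:
        return StepOutput([x - 0.1 * e for x, e in zip(sample, model_output)])


def denoise(
    scheduler,
    latents: Latents,
    model: Callable[[Latents, int], Latents],
    callback: Optional[Callable[[int, int, Latents], None]] = None,
    callback_steps: int = 1,
) -> Latents:
    """One loop for every scheduler: no isinstance checks, always pass t."""
    latents = [x * scheduler.init_noise_sigma for x in latents]
    for i, t in enumerate(scheduler.timesteps):
        model_input = scheduler.scale_model_input(latents, t)
        noise_pred = model(model_input, t)
        latents = scheduler.step(noise_pred, t, latents).prev_sample
        if callback is not None and i % callback_steps == 0:
            callback(i, t, latents)
    return latents


def toy_model(sample: Latents, timestep: int) -> Latents:
    return sample  # pretend the predicted noise equals the input


history = []


def record(step: int, timestep: int, latents: Latents) -> None:
    history.append((step, timestep, list(latents)))


timesteps = [999, 500, 1]
out_sigma = denoise(ToySigmaScheduler(timesteps, [10.0, 5.0, 1.0]), [1.0, -1.0], toy_model, callback=record)
out_unit = denoise(ToyUnitScheduler(timesteps), [1.0, -1.0], toy_model)
print([(s, t) for s, t, _ in history])  # [(0, 999), (1, 500), (2, 1)]
```

Because both toy schedulers expose the same members, `denoise` does not need to know which one it received, which is the point of the unified design.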
🛠️ More tasks
Building on top of the previous foundations, this release incorporates several new tasks that have been adapted from research papers or community projects. These include:
- Textual inversion. Makes it possible to quickly train a new concept or style and incorporate it into the vocabulary of Stable Diffusion. Hundreds of people have already created theirs, and they can be shared and combined together. See the training Colab to get started.
- Dreambooth. Similar goal to textual inversion, but instead of creating a new item in the vocabulary it fine-tunes the model to make it learn a new concept. Training Colab.
- Negative prompts. Another community effort led by @shirayu. The Stable Diffusion pipeline can now receive both a positive prompt (the one you want to create), and a negative prompt (something you want to drive the model away from). This opens up a lot of creative possibilities!
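The arithmetic behind negative prompts can be sketched with toy numbers. This assumes the negative prompt takes the place of the empty "unconditional" prompt in classifier-free guidance; that is an assumption of this sketch, not something stated above. `guided_noise` and the two-element "noise predictions" are made up for illustration.

```python
from typing import List


def guided_noise(pred_cond: List[float], pred_uncond: List[float], guidance_scale: float) -> List[float]:
    """Classifier-free guidance: start at the unconditional prediction and
    move along the direction that points towards the conditional one."""
    return [u + guidance_scale * (c - u) for c, u in zip(pred_cond, pred_uncond)]


# Toy noise predictions for three different text conditionings.
pred_positive = [1.0, 0.0]  # the prompt you want
pred_empty = [0.0, 0.0]     # empty prompt (no negative prompt given)
pred_negative = [0.0, 1.0]  # the prompt you want to avoid

plain = guided_noise(pred_positive, pred_empty, guidance_scale=7.5)
steered = guided_noise(pred_positive, pred_negative, guidance_scale=7.5)

print(plain)    # [7.5, 0.0]
print(steered)  # [7.5, -6.5]
```

In the second result the component that the negative prediction favoured is pushed strongly in the opposite direction, which is the "drive the model away" effect described above.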
🏃♀️ Under the hood changes to support better fine-tuning
Gradient checkpointing and 8-bit optimizers have been successfully applied to achieve Dreambooth fine-tuning in a Colab notebook! These updates will make it easier for `diffusers` to support general-purpose fine-tuning (coming soon!).
⚠️ Experimental: community pipelines
This is big, but it's still an experimental feature that may change in the future.
We are constantly amazed at the amount of imagination and creativity in the `diffusers` community, so we've made it easy to create custom pipelines and share them with others. You can write your own pipeline code, store it on the 🤗 Hub, on GitHub or in your local filesystem, and `StableDiffusionPipeline.from_pretrained` will be able to load and run it. Read more in the documentation.
We can't wait to see what new tasks the community creates!
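As a rough illustration of the local-filesystem case, the sketch below writes a tiny pipeline class to a file and loads it dynamically with `importlib`. It does not use any diffusers API and makes no claim about how diffusers loads community pipelines internally; `ShoutingPipeline` and `load_pipeline_class` are hypothetical names, and the "pipeline" just upper-cases its prompt.

```python
import importlib.util
import tempfile
from pathlib import Path

# Source code of a made-up custom pipeline, as it could live in its own file.
PIPELINE_SOURCE = '''
class ShoutingPipeline:
    """Hypothetical custom pipeline: returns the prompt in upper case."""

    def __call__(self, prompt):
        return prompt.upper()
'''


def load_pipeline_class(path: Path, class_name: str):
    """Import a Python file by path and return one class defined in it."""
    spec = importlib.util.spec_from_file_location(path.stem, str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, class_name)


with tempfile.TemporaryDirectory() as tmp:
    pipeline_file = Path(tmp) / "my_pipeline.py"
    pipeline_file.write_text(PIPELINE_SOURCE)
    pipeline_cls = load_pipeline_class(pipeline_file, "ShoutingPipeline")
    result = pipeline_cls()("an astronaut riding a horse")

print(result)  # AN ASTRONAUT RIDING A HORSE
```

For the real feature, follow the documentation linked above rather than this toy loader.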
💪 Quality of life fixes
Bug fixes, improved documentation and better tests are all important to ensure `diffusers` is a high-quality codebase, and we always spend a lot of effort working on them. Several first-time contributors have helped here, and we are very grateful for their efforts!
🙏 Significant community contributions
The following people have made significant contributions to the library over the last release:
- @Victarry – Add training example for DreamBooth (#554)
- @jamestiotio – Add callback parameters for Stable Diffusion pipelines (#521)
- @jachiam – Allow resolutions that are not multiples of 64 (#505)
- @johnowhitaker – Adding pred_original_sample to SchedulerOutput for some samplers (#614).
- @keturn – Interesting discussions and insights on many topics.
✏️ Change list
- [Docs] Correct links by @patrickvonplaten in #432
- [Black] Update black by @patrickvonplaten in #433
- use torch.matmul instead of einsum in attnetion. by @patil-suraj in #445
- Renamed variables from single letter to better naming by @daspartho in #449
- Docs: fix installation typo by @daspartho in #453
- fix table formatting for stable diffusion pipeline doc (add blank line) by @natolambert in #471
- update expected results of slow tests by @kashif in #268
- [Flax] Make room for more frameworks by @patrickvonplaten in #494
- Fix `disable_attention_slicing` in pipelines by @pcuenca in #498
- Rename test_scheduler_outputs_equivalence in model tests. by @pcuenca in #451
- Scheduler docs update by @natolambert in #464
- Fix scheduler inference steps error with power of 3 by @natolambert in #466
- initial flax pndm schedular by @kashif in #492
- Fix vae tests for cpu and gpu by @kashif in #480
- [Docs] Add subfolder docs by @patrickvonplaten in #500
- docs: bocken doc links for relative links by @jjmachan in #504
- Removing `.float()` (`autocast` in fp16 will discard this (I think)). by @Narsil in #495
- Fix MPS scheduler indexing when using `mps` by @pcuenca in #450
- [CrossAttention] add different method for sliced attention by @patil-suraj in #446
- Implement `FlaxModelMixin` by @mishig25 in #493
- Karras VE, DDIM and DDPM flax schedulers by @kashif in #508
- [UNet2DConditionModel, UNet2DModel] pass norm_num_groups to all the blocks by @patil-suraj in #442
- Add `init_weights` method to `FlaxMixin` by @mishig25 in #513
- UNet Flax with FlaxModelMixin by @pcuenca in #502
- Stable diffusion text2img conversion script. by @patil-suraj in #154
- [CI] Add stalebot by @anton-l in #481
- Fix is_onnx_available by @SkyTNT in #440
- [Tests] Test attention.py by @sidthekidder in #368
- Finally fix the image-based SD tests by @anton-l in #509
- Remove the usage of numpy in up/down sample_2d by @ydshieh in #503
- Fix typos and add Typo check GitHub Action by @shirayu in #483
- Quick fix for the img2img tests by @anton-l in #530
- [Tests] Fix spatial transformer tests on GPU by @anton-l in #531
- [StableDiffusionInpaintPipeline] accept tensors for init and mask image by @patil-suraj in #439
- adding more typehints to DDIM scheduler by @vishnu-anirudh in #456
- Revert "adding more typehints to DDIM scheduler" by @patrickvonplaten in #533
- Add LMSDiscreteSchedulerTest by @sidthekidder in #467
- [Download] Smart downloading by @patrickvonplaten in #512
- [Hub] Update hub version by @patrickvonplaten in #538
- Unify offset configuration in DDIM and PNDM schedulers by @jonatanklosko in #479
- [Configuration] Better logging by @patrickvonplaten in #545
- `make fixup` support by @younesbelkada in #546
- FlaxUNet2DConditionOutput @flax.struct.dataclass by @mishig25 in #550
- [Flax] fix Flax scheduler by @kashif in #564
- JAX/Flax safety checker by @pcuenca in #558
- Flax: ignore dtype for configuration by @pcuenca in #565
- Remove check_tf_utils to avoid an unnecessary TF import for now by @anton-l in #566
- Fix `_upsample_2d` by @ydshieh in #535
- [Flax] Add Vae for Stable Diffusion by @patrickvonplaten in #555
- [Flax] Solve problem with VAE by @patrickvonplaten in #574
- [Tests] Upload custom test artifacts by @anton-l in #572
- [Tests] Mark the ncsnpp model tests as slow by @anton-l in #575
- [examples/community] add CLIPGuidedStableDiffusion by @patil-suraj in #561
- Fix `CrossAttention._sliced_attention` by @ydshieh in #563
- Fix typos by @shirayu in #568
- Add `from_pt` argument in `.from_pretrained` by @younesbelkada in #527
- [FlaxAutoencoderKL] rename weights to align with PT by @patil-suraj in #584
- Fix BaseOutput initialization from dict by @anton-l in #570
- Add the K-LMS scheduler to the inpainting pipeline + tests by @anton-l in #587
- [flax safety checker] Use `FlaxPreTrainedModel` for saving/loading by @patil-suraj in #591
- FlaxDiffusionPipeline & FlaxStableDiffusionPipeline by @mishig25 in #559
- [Flax] Fix unet and ddim scheduler by @patrickvonplaten in #594
- Fix params replication when using the dummy checker by @pcuenca in #602
- Allow dtype to be specified in Flax pipeline by @pcuenca in #600
- Fix flax from_pretrained pytorch weight check by @mishig25 in #603
- Mv weights name consts to diffusers.utils by @mishig25 in #605
- Replace `dropout_prob` by `dropout` in `vae` by @younesbelkada in #595
- Add smoke tests for the training examples by @anton-l in #585
- Add torchvision to training deps by @anton-l in #607
- Return Flax scheduler state by @pcuenca in #601
- [ONNX] Collate the external weights, speed up loading from the hub by @anton-l in #610
- docs: fix `Berkeley` ref by @ryanrussell in #611
- Handle the PIL.Image.Resampling deprecation by @anton-l in #588
- Make flax from_pretrained work with local subfolder by @mishig25 in #608
- [flax] 'dtype' should not be part of self._internal_dict by @mishig25 in #609
- [UNet2DConditionModel] add gradient checkpointing by @patil-suraj in #461
- docs: fix `stochastic_karras_ve` ref by @ryanrussell in #618
- Adding pred_original_sample to SchedulerOutput for some samplers by @johnowhitaker in #614
- docs: `.md` readability fixups by @ryanrussell in #619
- Flax documentation by @younesbelkada in #589
- fix docs: change sample to images by @AbdullahAlfaraj in #613
- refactor: pipelines readability improvements by @ryanrussell in #622
- Allow passing session_options for ORT backend by @cloudhan in #620
- Fix breaking error: "ort is not defined" by @pcuenca in #626
- docs: `src/diffusers` readability improvements by @ryanrussell in #629
- Fix formula for noise levels in Karras scheduler and tests by @sgrigory in #627
- [CI] Fix onnxruntime installation order by @anton-l in #633
- Warning for too long prompts in DiffusionPipelines (Resolve #447) by @shirayu in #472
- Fix docs link to train_unconditional.py by @AbdullahAlfaraj in #642
- Remove deprecated `torch_device` kwarg by @pcuenca in #623
- refactor: `custom_init_isort` readability fixups by @ryanrussell in #631
- Remove inappropriate docstrings in LMS docstrings. by @pcuenca in #634
- Flax pipeline pndm by @pcuenca in #583
- Fix `SpatialTransformer` by @ydshieh in #578
- Add training example for DreamBooth. by @Victarry in #554
- [Pytorch] Pytorch only schedulers by @kashif in #534
- [examples/dreambooth] don't pass tensor_format to scheduler. by @patil-suraj in #649
- [dreambooth] update install section by @patil-suraj in #650
- [DDIM, DDPM] fix add_noise by @patil-suraj in #648
- [Pytorch] add dep. warning for pytorch schedulers by @kashif in #651
- [CLIPGuidedStableDiffusion] remove set_format from pipeline by @patil-suraj in #653
- Fix onnx tensor format by @anton-l in #654
- Fix `main`: stable diffusion pipelines cannot be loaded by @pcuenca in #655
- Fix the LMS pytorch regression by @anton-l in #664
- Added script to save during textual inversion training. Issue 524 by @isamu-isozaki in #645
- [CLIPGuidedStableDiffusion] take the correct text embeddings by @patil-suraj in #667
- Update index.mdx by @tmabraham in #670
- [examples] update transfomers version by @patil-suraj in #665
- [gradient checkpointing] lower tolerance for test by @patil-suraj in #652
- Flax `from_pretrained`: clean up `mismatched_keys`. by @pcuenca in #630
- `trained_betas` ignored in some schedulers by @vishnu-anirudh in #635
- Renamed x -> hidden_states in resnet.py by @daspartho in #676
- Optimize Stable Diffusion by @NouamaneTazi in #371
- Allow resolutions that are not multiples of 64 by @jachiam in #505
- refactor: update ldm-bert `config.json` url closes #675 by @ryanrussell in #680
- [docs] fix table in fp16.mdx by @NouamaneTazi in #683
- Fix slow tests by @NouamaneTazi in #689
- Fix BibText citation by @osanseviero in #693
- Add callback parameters for Stable Diffusion pipelines by @jamestiotio in #521
- [dreambooth] fix applying clip_grad_norm_ by @patil-suraj in #686
- Flax: add shape argument to `set_timesteps` by @pcuenca in #690
- Fix type annotations on StableDiffusionPipeline.call by @tasercake in #682
- Fix import with Flax but without PyTorch by @pcuenca in #688
- [Support PyTorch 1.8] Remove inference mode by @patrickvonplaten in #707
- [CI] Speed up slow tests by @anton-l in #708
- [Utils] Add deprecate function and move testing_utils under utils by @patrickvonplaten in #659
- Checkpoint conversion script from Diffusers => Stable Diffusion (CompVis) by @jachiam in #701
- [Docs] fix docstring for issue #709 by @kashif in #710
- Update schedulers README.md by @tmabraham in #694
- add accelerate to load models with smaller memory footprint by @piEsposito in #361
- Fix typos by @shirayu in #718
- Add an argument "negative_prompt" by @shirayu in #549
- Fix import if PyTorch is not installed by @pcuenca in #715
- Remove comments no longer appropriate by @pcuenca in #716
- [train_unconditional] fix applying clip_grad_norm_ by @patil-suraj in #721
- renamed x to meaningful variable in resnet.py by @i-am-epic in #677
- [Tests] Add accelerate to testing by @patrickvonplaten in #729
- [dreambooth] Using already created `Path` in dataset by @DrInfiniteExplorer in #681
- Include CLIPTextModel parameters in conversion by @kanewallmann in #695
- Avoid negative strides for tensors by @shirayu in #717
- [Pytorch] pytorch only timesteps by @kashif in #724
- [Scheduler design] The pragmatic approach by @anton-l in #719
- Removing `autocast` for `35-25% speedup`. (`autocast` considered harmful). by @Narsil in #511
- No more use_auth_token=True by @patrickvonplaten in #733
- remove use_auth_token from remaining places by @patil-suraj in #737
- Replace messages that have empty backquotes by @pcuenca in #738
- [Docs] Advertise fp16 instead of autocast by @patrickvonplaten in #740
- remove use_auth_token from for TI test by @patil-suraj in #747
- allow multiple generations per prompt by @patil-suraj in #741
- Add back-compatibility to LMS timesteps by @anton-l in #750
- update the clip guided PR according to the new API by @patil-suraj in #751
- Raise an error when moving an fp16 pipeline to CPU by @anton-l in #749
- Better steps deprecation for LMS by @anton-l in #753