Merged
51 commits
57b8406
Add new text encoder
patrickvonplaten Jun 23, 2023
39b0b97
add transformers depth
patrickvonplaten Jun 23, 2023
50df26c
More
patrickvonplaten Jun 23, 2023
4309a2c
Correct conversion script
patrickvonplaten Jun 23, 2023
51ab97a
Fix more
patrickvonplaten Jun 23, 2023
dd48802
Fix more
patrickvonplaten Jun 23, 2023
7b76780
Correct more
patrickvonplaten Jun 25, 2023
e0a0e36
correct text encoder
patrickvonplaten Jun 25, 2023
277bc9d
Finish all
patrickvonplaten Jun 25, 2023
62a151d
proof that in works in run local xl
patrickvonplaten Jun 25, 2023
ea4cf25
clean up
patrickvonplaten Jun 25, 2023
48d203e
Get refiner to work
patrickvonplaten Jun 25, 2023
4216826
Add red castle
patrickvonplaten Jun 26, 2023
13107bb
Fix batch size
patrickvonplaten Jun 26, 2023
cb23c61
Improve pipelines more
patrickvonplaten Jun 27, 2023
0f1d17c
Finish text2image tests
patrickvonplaten Jun 27, 2023
7850ef3
Add img2img test
patrickvonplaten Jun 27, 2023
a0621fd
Fix more
patrickvonplaten Jun 27, 2023
60cea8e
fix import
patrickvonplaten Jun 27, 2023
6217c36
Merge branch 'sd_xl' of https://github.com/huggingface/diffusers into…
patrickvonplaten Jun 27, 2023
fb7ee3a
Fix embeddings for classic models (#3888)
pcuenca Jun 28, 2023
62df284
Allow multiple prompts to be passed to the refiner (#3895)
pcuenca Jun 28, 2023
3e386df
Merge branch 'main' into sd_xl
patrickvonplaten Jun 30, 2023
364b71d
finish more
patrickvonplaten Jun 30, 2023
1f67cdb
Merge branch 'sd_xl' of https://github.com/huggingface/diffusers into…
patrickvonplaten Jun 30, 2023
2f7bf37
Apply suggestions from code review
patrickvonplaten Jun 30, 2023
2367c1b
add watermarker
patrickvonplaten Jun 30, 2023
558ef96
Model offload (#3889)
pcuenca Jun 30, 2023
ccede46
correct
patrickvonplaten Jun 30, 2023
09ba1b2
fix
patrickvonplaten Jun 30, 2023
008852a
fix
patrickvonplaten Jun 30, 2023
dbc5fe4
clean print
patrickvonplaten Jun 30, 2023
045fc0d
Update install warning for `invisible-watermark`
pcuenca Jul 3, 2023
c7884c5
merge main and resolve conflicts.
sayakpaul Jul 6, 2023
e21e83b
add: missing docstrings.
sayakpaul Jul 6, 2023
9b918eb
fix and simplify the usage example in img2img.
sayakpaul Jul 6, 2023
491bc9f
fix setup for watermarking.
sayakpaul Jul 6, 2023
7525786
Revert "fix setup for watermarking."
sayakpaul Jul 6, 2023
fd2af23
fix: watermarking setup.
sayakpaul Jul 6, 2023
75381ed
fix: op.
sayakpaul Jul 6, 2023
cefee41
run make fix-copies.
sayakpaul Jul 6, 2023
9bc7eab
make sure tests pass
patrickvonplaten Jul 6, 2023
a97ce2e
Merge branch 'main' into sd_xl
patrickvonplaten Jul 6, 2023
75f26d6
improve convert
patrickvonplaten Jul 6, 2023
e6a1381
make tests pass
patrickvonplaten Jul 6, 2023
6002919
Merge branch 'sd_xl' of https://github.com/huggingface/diffusers into…
patrickvonplaten Jul 6, 2023
d9296c5
make tests pass
patrickvonplaten Jul 6, 2023
55ebe05
better error message
patrickvonplaten Jul 6, 2023
46f515d
fiinsh
patrickvonplaten Jul 6, 2023
6ad5005
finish
patrickvonplaten Jul 6, 2023
bc35818
Fix final test
patrickvonplaten Jul 6, 2023
21 changes: 14 additions & 7 deletions .github/workflows/build_documentation.yml
Original file line number Diff line number Diff line change
Expand Up @@ -9,13 +9,20 @@ on:
- v*-patch

jobs:
-build:
-uses: huggingface/doc-builder/.github/workflows/build_main_documentation.yml@main
-with:
-commit_sha: ${{ github.sha }}
-package: diffusers
-notebook_folder: diffusers_doc
-languages: en ko zh
+build:
+steps:
+- name: Install dependencies
+run: |
+apt-get update && apt-get install libsndfile1-dev libgl1 -y
+
+- name: Build doc
+uses: huggingface/doc-builder/.github/workflows/build_main_documentation.yml@main
+with:
+commit_sha: ${{ github.sha }}
+package: diffusers
+notebook_folder: diffusers_doc
+languages: en ko zh
+
secrets:
token: ${{ secrets.HUGGINGFACE_PUSH }}
hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}
18 changes: 12 additions & 6 deletions .github/workflows/build_pr_documentation.yml
Expand Up @@ -9,9 +9,15 @@ concurrency:

jobs:
build:
-uses: huggingface/doc-builder/.github/workflows/build_pr_documentation.yml@main
-with:
-commit_sha: ${{ github.event.pull_request.head.sha }}
-pr_number: ${{ github.event.number }}
-package: diffusers
-languages: en ko
+steps:
+- name: Install dependencies
+run: |
+apt-get update && apt-get install libsndfile1-dev libgl1 -y
+
+- name: Build doc
+uses: huggingface/doc-builder/.github/workflows/build_pr_documentation.yml@main
+with:
+commit_sha: ${{ github.event.pull_request.head.sha }}
+pr_number: ${{ github.event.number }}
+package: diffusers
+languages: en ko zh
2 changes: 1 addition & 1 deletion .github/workflows/pr_tests.yml
Expand Up @@ -62,7 +62,7 @@ jobs:

- name: Install dependencies
run: |
-apt-get update && apt-get install libsndfile1-dev -y
+apt-get update && apt-get install libsndfile1-dev libgl1 -y
python -m pip install -e .[quality,test]

- name: Environment
Expand Down
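The diff does not say why `libgl1` is added next to `libsndfile1-dev`. A plausible reading, stated here as an assumption, is that `invisible-watermark` pulls in OpenCV, whose import fails on Linux images that lack `libGL.so.1`. A minimal sketch of a pre-flight check using only the standard library follows; `has_shared_library` is a hypothetical helper, not part of diffusers.

```python
import ctypes.util


def has_shared_library(name: str) -> bool:
    """Return True if the system's library search can locate lib<name>."""
    return ctypes.util.find_library(name) is not None


if __name__ == "__main__":
    # "GL" corresponds to libGL, which the libgl1 apt package provides
    # on Debian/Ubuntu images (assumption about the target image).
    if has_shared_library("GL"):
        print("libGL found")
    else:
        print("libGL missing; on Debian/Ubuntu try: apt-get install libgl1")
```

Running a check like this early in a CI job would surface a missing system library before a much later import error does.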
4 changes: 3 additions & 1 deletion docker/diffusers-pytorch-cpu/Dockerfile
Expand Up @@ -14,6 +14,7 @@ RUN apt update && \
libsndfile1-dev \
python3.8 \
python3-pip \
libgl1 \
python3.8-venv && \
rm -rf /var/lib/apt/lists

Expand All @@ -27,6 +28,7 @@ RUN python3 -m pip install --no-cache-dir --upgrade pip && \
torch \
torchvision \
torchaudio \
invisible_watermark \
--extra-index-url https://download.pytorch.org/whl/cpu && \
python3 -m pip install --no-cache-dir \
accelerate \
Expand All @@ -40,4 +42,4 @@ RUN python3 -m pip install --no-cache-dir --upgrade pip && \
tensorboard \
transformers

-CMD ["/bin/bash"]
+CMD ["/bin/bash"]
4 changes: 3 additions & 1 deletion docker/diffusers-pytorch-cuda/Dockerfile
Expand Up @@ -12,6 +12,7 @@ RUN apt update && \
curl \
ca-certificates \
libsndfile1-dev \
libgl1 \
python3.8 \
python3-pip \
python3.8-venv && \
Expand All @@ -26,7 +27,8 @@ RUN python3 -m pip install --no-cache-dir --upgrade pip && \
python3 -m pip install --no-cache-dir \
torch \
torchvision \
-torchaudio && \
+torchaudio \
+invisible_watermark && \
python3 -m pip install --no-cache-dir \
accelerate \
datasets \
Expand Down
@@ -0,0 +1,42 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Stable Diffusion XL

Stable Diffusion 2 is a text-to-image _latent diffusion_ model built upon the work of [Stable Diffusion 1](https://stability.ai/blog/stable-diffusion-public-release).
The project to train Stable Diffusion 2 was led by Robin Rombach and Katherine Crowson from [Stability AI](https://stability.ai/) and [LAION](https://laion.ai/).

*The Stable Diffusion 2.0 release includes robust text-to-image models trained using a brand new text encoder (OpenCLIP), developed by LAION with support from Stability AI, which greatly improves the quality of the generated images compared to earlier V1 releases. The text-to-image models in this release can generate images with default resolutions of both 512x512 pixels and 768x768 pixels.
These models are trained on an aesthetic subset of the [LAION-5B dataset](https://laion.ai/blog/laion-5b/) created by the DeepFloyd team at Stability AI, which is then further filtered to remove adult content using [LAION’s NSFW filter](https://openreview.net/forum?id=M3Y74vmsMcY).*

For more details about how Stable Diffusion 2 works and how it differs from Stable Diffusion 1, please refer to the official [launch announcement post](https://stability.ai/blog/stable-diffusion-v2-release).

## Tips

### Available checkpoints:

- *Text-to-Image (1024x1024 resolution)*: [stabilityai/stable-diffusion-xl-base-0.9](https://huggingface.co/stabilityai/stable-diffusion-xl-base-0.9) with [`StableDiffusionXLPipeline`]
- *Image-to-Image / Refiner (1024x1024 resolution)*: [stabilityai/stable-diffusion-xl-refiner-0.9](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-0.9) with [`StableDiffusionXLImg2ImgPipeline`]

TODO

## StableDiffusionXLPipeline

[[autodoc]] StableDiffusionXLPipeline
- all
- __call__

## StableDiffusionXLImg2ImgPipeline

[[autodoc]] StableDiffusionXLImg2ImgPipeline
- all
- __call__
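
A minimal usage sketch for the text-to-image checkpoint listed above, offered as an illustration rather than the documented recipe. It assumes a CUDA GPU, that `torch`, `transformers`, and `invisible-watermark` are installed, and that you have access to the `stabilityai/stable-diffusion-xl-base-0.9` weights. `generate_image` is a hypothetical helper name.

```python
def generate_image(prompt: str, model_id: str = "stabilityai/stable-diffusion-xl-base-0.9"):
    """Load the base checkpoint and return one PIL image for `prompt`."""
    # Imported lazily so that defining this helper does not require the
    # heavy dependencies to be installed.
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    pipe.to("cuda")  # assumption: a CUDA device is available
    return pipe(prompt=prompt).images[0]


if __name__ == "__main__":
    image = generate_image("a red castle on a hill")
    image.save("castle.png")
```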
8 changes: 8 additions & 0 deletions scripts/convert_original_stable_diffusion_to_diffusers.py
Expand Up @@ -126,6 +126,13 @@
"--controlnet", action="store_true", default=None, help="Set flag if this is a controlnet checkpoint."
)
parser.add_argument("--half", action="store_true", help="Save weights in half precision.")
parser.add_argument(
"--vae_path",
type=str,
default=None,
required=False,
help="Path or Hub id of an already converted VAE, so that it is not converted again.",
)
args = parser.parse_args()

pipe = download_from_original_stable_diffusion_ckpt(
Expand All @@ -144,6 +151,7 @@
stable_unclip_prior=args.stable_unclip_prior,
clip_stats_path=args.clip_stats_path,
controlnet=args.controlnet,
vae_path=args.vae_path,
)

if args.half:
Expand Down
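To illustrate how the new optional `--vae_path` flag behaves, here is a self-contained sketch of the argument parsing only. The `describe` helper and its return strings are hypothetical. The assumption that an unset `--vae_path` means "convert the VAE from the checkpoint" is inferred from the help text, not shown in this hunk.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Reproduce just the two flags visible in the hunk above."""
    parser = argparse.ArgumentParser(description="Sketch of the conversion CLI flags")
    parser.add_argument("--half", action="store_true", help="Save weights in half precision.")
    parser.add_argument(
        "--vae_path",
        type=str,
        default=None,
        required=False,
        help="Path or Hub id of an already converted VAE.",
    )
    return parser


def describe(argv):
    """Return a human-readable summary of what the flags request."""
    args = build_parser().parse_args(argv)
    if args.vae_path is None:
        # Assumed default behaviour: no pre-converted VAE was supplied.
        return "convert VAE from checkpoint"
    return f"reuse converted VAE from {args.vae_path}"


if __name__ == "__main__":
    print(describe([]))
    print(describe(["--vae_path", "some/converted-vae"]))
```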
2 changes: 2 additions & 0 deletions setup.py
Expand Up @@ -89,6 +89,7 @@
"huggingface-hub>=0.13.2",
"requests-mock==1.10.0",
"importlib_metadata",
"invisible-watermark",
"isort>=5.5.4",
"jax>=0.2.8,!=0.3.2",
"jaxlib>=0.1.65",
Expand Down Expand Up @@ -193,6 +194,7 @@ def run(self):
"compel",
"datasets",
"Jinja2",
"invisible-watermark",
"k-diffusion",
"librosa",
"omegaconf",
Expand Down
9 changes: 9 additions & 0 deletions src/diffusers/__init__.py
Expand Up @@ -5,6 +5,7 @@
OptionalDependencyNotAvailable,
is_flax_available,
is_inflect_available,
is_invisible_watermark_available,
is_k_diffusion_available,
is_k_diffusion_version,
is_librosa_available,
Expand Down Expand Up @@ -179,6 +180,14 @@
VQDiffusionPipeline,
)

try:
if not (is_torch_available() and is_transformers_available() and is_invisible_watermark_available()):
raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
from .utils.dummy_torch_and_transformers_and_invisible_watermark_objects import * # noqa F403
else:
from .pipelines import StableDiffusionXLImg2ImgPipeline, StableDiffusionXLPipeline

try:
if not (is_torch_available() and is_transformers_available() and is_k_diffusion_available()):
raise OptionalDependencyNotAvailable()
Expand Down
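The `try` / `except` / `else` guard above exports the real pipelines only when every optional dependency is importable, and falls back to dummy objects otherwise. A standalone sketch of that pattern follows, using `importlib.util.find_spec` as the availability check. The exception class here is a stand-in, and `is_available` and `resolve_export` are hypothetical names, not diffusers functions.

```python
import importlib.util


class OptionalDependencyNotAvailable(Exception):
    """Stand-in for the exception of the same name used in the diff."""


def is_available(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


def resolve_export(required_modules):
    """Return "real" when all modules are present, otherwise "dummy"."""
    try:
        if not all(is_available(name) for name in required_modules):
            raise OptionalDependencyNotAvailable()
    except OptionalDependencyNotAvailable:
        # In the diff, this branch imports placeholder objects instead.
        return "dummy"
    else:
        # In the diff, this branch imports the actual pipeline classes.
        return "real"


if __name__ == "__main__":
    # The invisible-watermark distribution is imported as `imwatermark`.
    print(resolve_export(["torch", "transformers", "imwatermark"]))
```

Raising and catching the exception in the same statement looks redundant, but it keeps every optional-dependency block in the file structurally identical.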
1 change: 1 addition & 0 deletions src/diffusers/dependency_versions_table.py
Expand Up @@ -13,6 +13,7 @@
"huggingface-hub": "huggingface-hub>=0.13.2",
"requests-mock": "requests-mock==1.10.0",
"importlib_metadata": "importlib_metadata",
"invisible-watermark": "invisible-watermark",
"isort": "isort>=5.5.4",
"jax": "jax>=0.2.8,!=0.3.2",
"jaxlib": "jaxlib>=0.1.65",
Expand Down
2 changes: 2 additions & 0 deletions src/diffusers/models/attention_processor.py
Expand Up @@ -1118,7 +1118,9 @@ def __call__(
value = attn.to_v(encoder_hidden_states)

head_dim = inner_dim // attn.heads

query = query.view(batch_size, -1, attn.heads, head_dim).transpose(1, 2)

key = key.view(batch_size, -1, attn.heads, head_dim).transpose(1, 2)
value = value.view(batch_size, -1, attn.heads, head_dim).transpose(1, 2)

Expand Down
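The `view(...).transpose(1, 2)` calls above split the last dimension into `attn.heads` contiguous chunks of `head_dim` and move the head axis in front of the sequence axis. As a dependency-free illustration for a single batch element, the same index mapping can be written with nested lists. `split_heads` is a hypothetical helper, and the real code operates on tensors.

```python
def split_heads(x, heads):
    """Rearrange [seq_len][inner_dim] nested lists into [heads][seq_len][head_dim]."""
    inner_dim = len(x[0])
    if inner_dim % heads != 0:
        raise ValueError("inner_dim must be divisible by heads")
    head_dim = inner_dim // heads
    # Head h owns columns h*head_dim .. (h+1)*head_dim of every sequence row.
    return [
        [row[h * head_dim:(h + 1) * head_dim] for row in x]
        for h in range(heads)
    ]


if __name__ == "__main__":
    tokens = [[0, 1, 2, 3], [4, 5, 6, 7]]  # seq_len=2, inner_dim=4
    print(split_heads(tokens, heads=2))
```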
13 changes: 10 additions & 3 deletions src/diffusers/models/unet_2d_blocks.py
Expand Up @@ -38,6 +38,7 @@ def get_down_block(
add_downsample,
resnet_eps,
resnet_act_fn,
transformer_layers_per_block=1,
num_attention_heads=None,
resnet_groups=None,
cross_attention_dim=None,
Expand Down Expand Up @@ -111,6 +112,7 @@ def get_down_block(
raise ValueError("cross_attention_dim must be specified for CrossAttnDownBlock2D")
return CrossAttnDownBlock2D(
num_layers=num_layers,
transformer_layers_per_block=transformer_layers_per_block,
in_channels=in_channels,
out_channels=out_channels,
temb_channels=temb_channels,
Expand Down Expand Up @@ -232,6 +234,7 @@ def get_up_block(
add_upsample,
resnet_eps,
resnet_act_fn,
transformer_layers_per_block=1,
num_attention_heads=None,
resnet_groups=None,
cross_attention_dim=None,
Expand Down Expand Up @@ -287,6 +290,7 @@ def get_up_block(
raise ValueError("cross_attention_dim must be specified for CrossAttnUpBlock2D")
return CrossAttnUpBlock2D(
num_layers=num_layers,
transformer_layers_per_block=transformer_layers_per_block,
in_channels=in_channels,
out_channels=out_channels,
prev_output_channel=prev_output_channel,
Expand Down Expand Up @@ -517,6 +521,7 @@ def __init__(
temb_channels: int,
dropout: float = 0.0,
num_layers: int = 1,
transformer_layers_per_block: int = 1,
resnet_eps: float = 1e-6,
resnet_time_scale_shift: str = "default",
resnet_act_fn: str = "swish",
Expand Down Expand Up @@ -559,7 +564,7 @@ def __init__(
num_attention_heads,
in_channels // num_attention_heads,
in_channels=in_channels,
-num_layers=1,
+num_layers=transformer_layers_per_block,
cross_attention_dim=cross_attention_dim,
norm_num_groups=resnet_groups,
use_linear_projection=use_linear_projection,
Expand Down Expand Up @@ -862,6 +867,7 @@ def __init__(
temb_channels: int,
dropout: float = 0.0,
num_layers: int = 1,
transformer_layers_per_block: int = 1,
resnet_eps: float = 1e-6,
resnet_time_scale_shift: str = "default",
resnet_act_fn: str = "swish",
Expand Down Expand Up @@ -906,7 +912,7 @@ def __init__(
num_attention_heads,
out_channels // num_attention_heads,
in_channels=out_channels,
-num_layers=1,
+num_layers=transformer_layers_per_block,
cross_attention_dim=cross_attention_dim,
norm_num_groups=resnet_groups,
use_linear_projection=use_linear_projection,
Expand Down Expand Up @@ -1995,6 +2001,7 @@ def __init__(
temb_channels: int,
dropout: float = 0.0,
num_layers: int = 1,
transformer_layers_per_block: int = 1,
resnet_eps: float = 1e-6,
resnet_time_scale_shift: str = "default",
resnet_act_fn: str = "swish",
Expand Down Expand Up @@ -2040,7 +2047,7 @@ def __init__(
num_attention_heads,
out_channels // num_attention_heads,
in_channels=out_channels,
-num_layers=1,
+num_layers=transformer_layers_per_block,
cross_attention_dim=cross_attention_dim,
norm_num_groups=resnet_groups,
use_linear_projection=use_linear_projection,
Expand Down
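The hunks above thread a new `transformer_layers_per_block` argument from `get_down_block` / `get_up_block` into the attention modules, replacing the hard-coded `num_layers=1`. The sketch below models that plumbing with plain dataclasses and no torch. The `*Stub` classes are hypothetical, and the one-attention-module-per-`num_layers` loop is an assumption about code outside the visible hunks.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TransformerStub:
    """Stands in for the attention module; only records its depth."""
    num_layers: int


@dataclass
class CrossAttnBlockStub:
    num_layers: int = 1
    transformer_layers_per_block: int = 1
    attentions: List[TransformerStub] = field(init=False)

    def __post_init__(self):
        # Assumed structure: one attention module per resnet layer, each
        # now receiving the configurable depth instead of a fixed 1.
        self.attentions = [
            TransformerStub(num_layers=self.transformer_layers_per_block)
            for _ in range(self.num_layers)
        ]


def total_transformer_layers(block: CrossAttnBlockStub) -> int:
    """Sum the transformer depth across all attention modules in a block."""
    return sum(t.num_layers for t in block.attentions)


if __name__ == "__main__":
    print(total_transformer_layers(CrossAttnBlockStub(num_layers=2)))
    print(total_transformer_layers(CrossAttnBlockStub(num_layers=2, transformer_layers_per_block=10)))
```

Because the new argument defaults to 1, existing configurations keep the previous depth unless they opt in.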