You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
[Docs] Fix typos and update files at API's Main Classes, Models, and Schedulers pages (#5720)
* Fix typos, update, add Copyright info, and trim trailing whitespaces
* Update docs/source/en/api/loaders.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/api/models/autoencoder_tiny.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/api/models/autoencoder_tiny.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Copy file name to clipboardExpand all lines: docs/source/en/api/image_processor.md
+3-3Lines changed: 3 additions & 3 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -12,9 +12,9 @@ specific language governing permissions and limitations under the License.
12
12
13
13
# VAE Image Processor
14
14
15
-
The [`VaeImageProcessor`] provides a unified API for [`StableDiffusionPipeline`]'s to prepare image inputs for VAE encoding and post-processing outputs once they're decoded. This includes transformations such as resizing, normalization, and conversion between PIL Image, PyTorch, and NumPy arrays.
15
+
The [`VaeImageProcessor`] provides a unified API for [`StableDiffusionPipeline`]s to prepare image inputs for VAE encoding and post-processing outputs once they're decoded. This includes transformations such as resizing, normalization, and conversion between PIL Image, PyTorch, and NumPy arrays.
16
16
17
-
All pipelines with [`VaeImageProcessor`]accepts PIL Image, PyTorch tensor, or NumPy arrays as image inputs and returns outputs based on the `output_type` argument by the user. You can pass encoded image latents directly to the pipeline and return latents from the pipeline as a specific output with the `output_type` argument (for example `output_type="pt"`). This allows you to take the generated latents from one pipeline and pass it to another pipeline as input without leaving the latent space. It also makes it much easier to use multiple pipelines together by passing PyTorch tensors directly between different pipelines.
17
+
All pipelines with [`VaeImageProcessor`]accept PIL Image, PyTorch tensor, or NumPy arrays as image inputs and return outputs based on the `output_type` argument by the user. You can pass encoded image latents directly to the pipeline and return latents from the pipeline as a specific output with the `output_type` argument (for example `output_type="latent"`). This allows you to take the generated latents from one pipeline and pass it to another pipeline as input without leaving the latent space. It also makes it much easier to use multiple pipelines together by passing PyTorch tensors directly between different pipelines.
18
18
19
19
## VaeImageProcessor
20
20
@@ -24,4 +24,4 @@ All pipelines with [`VaeImageProcessor`] accepts PIL Image, PyTorch tensor, or N
24
24
25
25
The [`VaeImageProcessorLDM3D`] accepts RGB and depth inputs and returns RGB and depth outputs.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
2
+
3
+
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4
+
the License. You may obtain a copy of the License at
5
+
6
+
http://www.apache.org/licenses/LICENSE-2.0
7
+
8
+
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9
+
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10
+
specific language governing permissions and limitations under the License.
11
+
-->
12
+
1
13
# Overview
2
14
3
15
The APIs in this section are more experimental and prone to breaking changes. Most of them are used internally for development, but they may also be useful to you if you're interested in building a diffusion model with some custom parts or if you're interested in some of our helper utilities for working with 🤗 Diffusers.
Copy file name to clipboardExpand all lines: docs/source/en/api/loaders.md
+2-2Lines changed: 2 additions & 2 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -12,11 +12,11 @@ specific language governing permissions and limitations under the License.
12
12
13
13
# Loaders
14
14
15
-
Adapters (textual inversion, LoRA, hypernetworks) allow you to modify a diffusion model to generate images in a specific style without training or finetuning the entire model. The adapter weights are typically only a tiny fraction of the pretrained model's which making them very portable. 🤗 Diffusers provides an easy-to-use `LoaderMixin` API to load adapter weights.
15
+
Adapters (textual inversion, LoRA, hypernetworks) allow you to modify a diffusion model to generate images in a specific style without training or finetuning the entire model. The adapter weights are very portable because they're typically only a tiny fraction of the pretrained model weights. 🤗 Diffusers provides an easy-to-use `LoaderMixin` API to load adapter weights.
16
16
17
17
<Tipwarning={true}>
18
18
19
-
🧪 The `LoaderMixins` are highly experimental and prone to future changes. To use private or [gated](https://huggingface.co/docs/hub/models-gated#gated-models) models, log-in with `huggingface-cli login`.
19
+
🧪 The `LoaderMixin`s are highly experimental and prone to future changes. To use private or [gated](https://huggingface.co/docs/hub/models-gated#gated-models) models, log-in with `huggingface-cli login`.
Copy file name to clipboardExpand all lines: docs/source/en/api/models/asymmetricautoencoderkl.md
+19-14Lines changed: 19 additions & 14 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -1,3 +1,15 @@
1
+
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
2
+
3
+
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4
+
the License. You may obtain a copy of the License at
5
+
6
+
http://www.apache.org/licenses/LICENSE-2.0
7
+
8
+
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9
+
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10
+
specific language governing permissions and limitations under the License.
11
+
-->
12
+
1
13
# AsymmetricAutoencoderKL
2
14
3
15
Improved larger variational autoencoder (VAE) model with KL loss for inpainting task: [Designing a Better Asymmetric VQGAN for StableDiffusion](https://arxiv.org/abs/2306.04632) by Zixin Zhu, Xuelu Feng, Dongdong Chen, Jianmin Bao, Le Wang, Yinpeng Chen, Lu Yuan, Gang Hua.
@@ -6,7 +18,7 @@ The abstract from the paper is:
6
18
7
19
*StableDiffusion is a revolutionary text-to-image generator that is causing a stir in the world of image generation and editing. Unlike traditional methods that learn a diffusion model in pixel space, StableDiffusion learns a diffusion model in the latent space via a VQGAN, ensuring both efficiency and quality. It not only supports image generation tasks, but also enables image editing for real images, such as image inpainting and local editing. However, we have observed that the vanilla VQGAN used in StableDiffusion leads to significant information loss, causing distortion artifacts even in non-edited image regions. To this end, we propose a new asymmetric VQGAN with two simple designs. Firstly, in addition to the input from the encoder, the decoder contains a conditional branch that incorporates information from task-specific priors, such as the unmasked image region in inpainting. Secondly, the decoder is much heavier than the encoder, allowing for more detailed recovery while only slightly increasing the total inference cost. The training cost of our asymmetric VQGAN is cheap, and we only need to retrain a new asymmetric decoder while keeping the vanilla VQGAN encoder and StableDiffusion unchanged. Our asymmetric VQGAN can be widely used in StableDiffusion-based inpainting and local editing methods. Extensive experiments demonstrate that it can significantly improve the inpainting and editing performance, while maintaining the original text-to-image capability. The code is available at https://github.com/buxiangzhiren/Asymmetric_VQGAN*
8
20
9
-
Evaluation results can be found in section 4.1 of the original paper.
21
+
Evaluation results can be found in section 4.1 of the original paper.
10
22
11
23
## Available checkpoints
12
24
@@ -16,30 +28,23 @@ Evaluation results can be found in section 4.1 of the original paper.
16
28
## Example Usage
17
29
18
30
```python
19
-
from io import BytesIO
20
-
fromPILimport Image
21
-
import requests
22
31
from diffusers import AsymmetricAutoencoderKL, StableDiffusionInpaintPipeline
32
+
from diffusers.utils import load_image, make_image_grid
Copy file name to clipboardExpand all lines: docs/source/en/api/models/autoencoder_tiny.md
+16-4Lines changed: 16 additions & 4 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -1,6 +1,18 @@
1
+
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
2
+
3
+
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4
+
the License. You may obtain a copy of the License at
5
+
6
+
http://www.apache.org/licenses/LICENSE-2.0
7
+
8
+
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9
+
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10
+
specific language governing permissions and limitations under the License.
11
+
-->
12
+
1
13
# Tiny AutoEncoder
2
14
3
-
Tiny AutoEncoder for Stable Diffusion (TAESD) was introduced in [madebyollin/taesd](https://github.com/madebyollin/taesd) by Ollin Boer Bohan. It is a tiny distilled version of Stable Diffusion's VAE that can quickly decode the latents in a [`StableDiffusionPipeline`] or [`StableDiffusionXLPipeline`] almost instantly.
15
+
Tiny AutoEncoder for Stable Diffusion (TAESD) was introduced in [madebyollin/taesd](https://github.com/madebyollin/taesd) by Ollin Boer Bohan. It is a tiny distilled version of Stable Diffusion's VAE that can quickly decode the latents in a [`StableDiffusionPipeline`] or [`StableDiffusionXLPipeline`] almost instantly.
4
16
5
17
To use with Stable Diffusion v-2.1:
6
18
@@ -16,7 +28,7 @@ pipe = pipe.to("cuda")
16
28
17
29
prompt ="slice of delicious New York-style berry cheesecake"
Copy file name to clipboardExpand all lines: docs/source/en/api/models/autoencoderkl.md
+13-1Lines changed: 13 additions & 1 deletion
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -1,3 +1,15 @@
1
+
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
2
+
3
+
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4
+
the License. You may obtain a copy of the License at
5
+
6
+
http://www.apache.org/licenses/LICENSE-2.0
7
+
8
+
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9
+
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10
+
specific language governing permissions and limitations under the License.
11
+
-->
12
+
1
13
# AutoencoderKL
2
14
3
15
The variational autoencoder (VAE) model with KL loss was introduced in [Auto-Encoding Variational Bayes](https://arxiv.org/abs/1312.6114v11) by Diederik P. Kingma and Max Welling. The model is used in 🤗 Diffusers to encode images into latents and to decode latent representations into images.
@@ -14,7 +26,7 @@ from the original format using [`FromOriginalVAEMixin.from_single_file`] as foll
14
26
```py
15
27
from diffusers import AutoencoderKL
16
28
17
-
url ="https://huggingface.co/stabilityai/sd-vae-ft-mse-original/blob/main/vae-ft-mse-840000-ema-pruned.safetensors"# can also be local file
29
+
url ="https://huggingface.co/stabilityai/sd-vae-ft-mse-original/blob/main/vae-ft-mse-840000-ema-pruned.safetensors"# can also be a local file
Copy file name to clipboardExpand all lines: docs/source/en/api/models/controlnet.md
+14-2Lines changed: 14 additions & 2 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -1,10 +1,22 @@
1
+
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
2
+
3
+
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4
+
the License. You may obtain a copy of the License at
5
+
6
+
http://www.apache.org/licenses/LICENSE-2.0
7
+
8
+
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9
+
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10
+
specific language governing permissions and limitations under the License.
11
+
-->
12
+
1
13
# ControlNet
2
14
3
-
The ControlNet model was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang and Maneesh Agrawala. It provides a greater degree of control over text-to-image generation by conditioning the model on additional inputs such as edge maps, depth maps, segmentation maps, and keypoints for pose detection.
15
+
The ControlNet model was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, Maneesh Agrawala. It provides a greater degree of control over text-to-image generation by conditioning the model on additional inputs such as edge maps, depth maps, segmentation maps, and keypoints for pose detection.
4
16
5
17
The abstract from the paper is:
6
18
7
-
*We present a neural network structure, ControlNet, to control pretrained large diffusion models to support additional input conditions. The ControlNet learns task-specific conditions in an end-to-end way, and the learning is robust even when the training dataset is small (< 50k). Moreover, training a ControlNet is as fast as fine-tuning a diffusion model, and the model can be trained on a personal devices. Alternatively, if powerful computation clusters are available, the model can scale to large amounts (millions to billions) of data. We report that large diffusion models like Stable Diffusion can be augmented with ControlNets to enable conditional inputs like edge maps, segmentation maps, keypoints, etc. This may enrich the methods to control large diffusion models and further facilitate related applications.*
19
+
*We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls. The neural architecture is connected with "zero convolutions" (zero-initialized convolution layers) that progressively grow the parameters from zero and ensure that no harmful noise could affect the finetuning. We test various conditioning controls, eg, edges, depth, segmentation, human pose, etc, with Stable Diffusion, using single or multiple conditions, with or without prompts. We show that the training of ControlNets is robust with small (<50k) and large (>1m) datasets. Extensive results show that ControlNet may facilitate wider applications to control image diffusion models.*
0 commit comments