
Commit 768d662

Author: yiyixuxu
Commit message: Merge branch 'main' into fix-scheduler-index
2 parents: 3a9d18e + 1328aeb

File tree

147 files changed: +11350 −1135 lines

Lines changed: 34 additions & 0 deletions

@@ -0,0 +1,34 @@
+name: Run Flax dependency tests
+
+on:
+  pull_request:
+    branches:
+      - main
+  push:
+    branches:
+      - main
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
+  cancel-in-progress: true
+
+jobs:
+  check_flax_dependencies:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v3
+      - name: Set up Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: "3.8"
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install -e .
+          pip install "jax[cpu]>=0.2.16,!=0.3.2"
+          pip install "flax>=0.4.1"
+          pip install "jaxlib>=0.1.65"
+          pip install pytest
+      - name: Check for soft dependencies
+        run: |
+          pytest tests/others/test_dependencies.py
.github/workflows/pr_tests.yml

Lines changed: 2 additions & 2 deletions

@@ -72,7 +72,7 @@ jobs:
   run: |
     apt-get update && apt-get install libsndfile1-dev libgl1 -y
     python -m pip install -e .[quality,test]
-    python -m pip install git+https://github.com/huggingface/accelerate.git
+    python -m pip install accelerate

 - name: Environment
   run: |
@@ -115,7 +115,7 @@ jobs:
   run: |
     python -m pytest -n 2 --max-worker-restart=0 --dist=loadfile \
       --make-reports=tests_${{ matrix.config.report }} \
-      examples/test_examples.py
+      examples/test_examples.py

 - name: Failure short reports
   if: ${{ failure() }}
Lines changed: 32 additions & 0 deletions

@@ -0,0 +1,32 @@
+name: Run Torch dependency tests
+
+on:
+  pull_request:
+    branches:
+      - main
+  push:
+    branches:
+      - main
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
+  cancel-in-progress: true
+
+jobs:
+  check_torch_dependencies:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v3
+      - name: Set up Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: "3.8"
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install -e .
+          pip install torch torchvision torchaudio
+          pip install pytest
+      - name: Check for soft dependencies
+        run: |
+          pytest tests/others/test_dependencies.py
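The `concurrency.group` expression used in both new workflows, `${{ github.workflow }}-${{ github.head_ref || github.run_id }}`, groups pull-request runs by branch (so a new push cancels the in-flight run) and falls back to the unique run id on pushes, where `head_ref` is empty. A tiny Python model of that `||` fallback, purely illustrative:

```python
# Sketch of GitHub Actions' `||` operator in the concurrency group above:
# it returns the right operand when the left operand is empty/falsy.
def concurrency_group(workflow: str, head_ref: str, run_id: str) -> str:
    # `head_ref or run_id` mirrors `github.head_ref || github.run_id`
    return f"{workflow}-{head_ref or run_id}"

# Pull request: grouped by branch, so a newer push cancels the older run
print(concurrency_group("Run Torch dependency tests", "my-feature", "987"))
# Push to main: head_ref is empty, so every run gets its own group
print(concurrency_group("Run Torch dependency tests", "", "987"))
```

With `cancel-in-progress: true`, two runs that land in the same group cannot execute concurrently; the older one is cancelled.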

docs/README.md

Lines changed: 3 additions & 3 deletions

@@ -16,7 +16,7 @@ limitations under the License.

 # Generating the documentation

-To generate the documentation, you first have to build it. Several packages are necessary to build the doc,
+To generate the documentation, you first have to build it. Several packages are necessary to build the doc,
 you can install them with the following command, at the root of the code repository:

 ```bash
@@ -142,7 +142,7 @@ This will include every public method of the pipeline that is documented, as wel
   - __call__
   - enable_attention_slicing
   - disable_attention_slicing
-  - enable_xformers_memory_efficient_attention
+  - enable_xformers_memory_efficient_attention
   - disable_xformers_memory_efficient_attention
 ```

@@ -154,7 +154,7 @@ Values that should be put in `code` should either be surrounded by backticks: \`
 and objects like True, None, or any strings should usually be put in `code`.

 When mentioning a class, function, or method, it is recommended to use our syntax for internal links so that our tool
-adds a link to its documentation with this syntax: \[\`XXXClass\`\] or \[\`function\`\]. This requires the class or
+adds a link to its documentation with this syntax: \[\`XXXClass\`\] or \[\`function\`\]. This requires the class or
 function to be in the main package.

 If you want to create a link to some internal class or function, you need to

docs/TRANSLATING.md

Lines changed: 1 addition & 1 deletion

@@ -38,7 +38,7 @@ Here, `LANG-ID` should be one of the ISO 639-1 or ISO 639-2 language codes -- se

 The fun part comes - translating the text!

-The first thing we recommend is translating the part of the `_toctree.yml` file that corresponds to your doc chapter. This file is used to render the table of contents on the website.
+The first thing we recommend is translating the part of the `_toctree.yml` file that corresponds to your doc chapter. This file is used to render the table of contents on the website.

 > 🙋 If the `_toctree.yml` file doesn't yet exist for your language, you can create one by copy-pasting from the English version and deleting the sections unrelated to your chapter. Just make sure it exists in the `docs/source/LANG-ID/` directory!

docs/source/en/_toctree.yml

Lines changed: 9 additions & 1 deletion

@@ -72,6 +72,8 @@
   title: Overview
 - local: using-diffusers/sdxl
   title: Stable Diffusion XL
+- local: using-diffusers/lcm
+  title: Latent Consistency Models
 - local: using-diffusers/kandinsky
   title: Kandinsky
 - local: using-diffusers/controlnet
@@ -133,7 +135,7 @@
 - local: optimization/memory
   title: Reduce memory usage
 - local: optimization/torch2.0
-  title: Torch 2.0
+  title: PyTorch 2.0
 - local: optimization/xformers
   title: xFormers
 - local: optimization/tome
@@ -200,6 +202,8 @@
   title: AsymmetricAutoencoderKL
 - local: api/models/autoencoder_tiny
   title: Tiny AutoEncoder
+- local: api/models/consistency_decoder_vae
+  title: ConsistencyDecoderVAE
 - local: api/models/transformer2d
   title: Transformer2D
 - local: api/models/transformer_temporal
@@ -268,6 +272,8 @@
   title: Parallel Sampling of Diffusion Models
 - local: api/pipelines/pix2pix_zero
   title: Pix2Pix Zero
+- local: api/pipelines/pixart
+  title: PixArt
 - local: api/pipelines/pndm
   title: PNDM
 - local: api/pipelines/repaint
@@ -342,6 +348,8 @@
   title: Overview
 - local: api/schedulers/cm_stochastic_iterative
   title: CMStochasticIterativeScheduler
+- local: api/schedulers/consistency_decoder
+  title: ConsistencyDecoderScheduler
 - local: api/schedulers/ddim_inverse
   title: DDIMInverseScheduler
 - local: api/schedulers/ddim
Lines changed: 18 additions & 0 deletions

@@ -0,0 +1,18 @@
+# Consistency Decoder
+
+Consistency Decoder can be used to decode the latents from the denoising UNet in the [`StableDiffusionPipeline`]. This decoder was introduced in the [DALL-E 3 technical report](https://openai.com/dall-e-3).
+
+The original codebase can be found at [openai/consistencydecoder](https://github.com/openai/consistencydecoder).
+
+<Tip warning={true}>
+
+Inference is only supported for 2 iterations as of now.
+
+</Tip>
+
+The pipeline could not have been contributed without the help of [madebyollin](https://github.com/madebyollin) and [mrsteyk](https://github.com/mrsteyk) from [this issue](https://github.com/openai/consistencydecoder/issues/1).
+
+## ConsistencyDecoderVAE
+[[autodoc]] ConsistencyDecoderVAE
+  - all
+  - decode
Lines changed: 36 additions & 0 deletions

@@ -0,0 +1,36 @@
+<!--Copyright 2023 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# PixArt
+
+![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/pixart/header_collage.png)
+
+[PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis](https://huggingface.co/papers/2310.00426) is by Junsong Chen, Jincheng Yu, Chongjian Ge, Lewei Yao, Enze Xie, Yue Wu, Zhongdao Wang, James Kwok, Ping Luo, Huchuan Lu, and Zhenguo Li.
+
+The abstract from the paper is:
+
+*The most advanced text-to-image (T2I) models require significant training costs (e.g., millions of GPU hours), seriously hindering the fundamental innovation for the AIGC community while increasing CO2 emissions. This paper introduces PIXART-α, a Transformer-based T2I diffusion model whose image generation quality is competitive with state-of-the-art image generators (e.g., Imagen, SDXL, and even Midjourney), reaching near-commercial application standards. Additionally, it supports high-resolution image synthesis up to 1024px resolution with low training cost, as shown in Figure 1 and 2. To achieve this goal, three core designs are proposed: (1) Training strategy decomposition: We devise three distinct training steps that separately optimize pixel dependency, text-image alignment, and image aesthetic quality; (2) Efficient T2I Transformer: We incorporate cross-attention modules into Diffusion Transformer (DiT) to inject text conditions and streamline the computation-intensive class-condition branch; (3) High-informative data: We emphasize the significance of concept density in text-image pairs and leverage a large Vision-Language model to auto-label dense pseudo-captions to assist text-image alignment learning. As a result, PIXART-α's training speed markedly surpasses existing large-scale T2I models, e.g., PIXART-α only takes 10.8% of Stable Diffusion v1.5's training time (675 vs. 6,250 A100 GPU days), saving nearly $300,000 ($26,000 vs. $320,000) and reducing 90% CO2 emissions. Moreover, compared with a larger SOTA model, RAPHAEL, our training cost is merely 1%. Extensive experiments demonstrate that PIXART-α excels in image quality, artistry, and semantic control. We hope PIXART-α will provide new insights to the AIGC community and startups to accelerate building their own high-quality yet low-cost generative models from scratch.*
+
+You can find the original codebase at [PixArt-alpha/PixArt-alpha](https://github.com/PixArt-alpha/PixArt-alpha) and all the available checkpoints at [PixArt-alpha](https://huggingface.co/PixArt-alpha).
+
+Some notes about this pipeline:
+
+* It uses a Transformer backbone (instead of a UNet) for denoising. As such it has a similar architecture to [DiT](./dit.md).
+* It was trained using text conditions computed from T5. This aspect makes the pipeline better at following complex text prompts with intricate details.
+* It is good at producing high-resolution images at different aspect ratios. To get the best results, the authors recommend some size brackets which can be found [here](https://github.com/PixArt-alpha/PixArt-alpha/blob/08fbbd281ec96866109bdd2cdb75f2f58fb17610/diffusion/data/datasets/utils.py).
+* It rivals the quality of state-of-the-art text-to-image generation systems (as of this writing) such as Stable Diffusion XL, Imagen, and DALL-E 2, while being more efficient.
+
+## PixArtAlphaPipeline
+
+[[autodoc]] PixArtAlphaPipeline
+  - all
+  - __call__
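The size-bracket recommendation in the notes above can be sketched in a few lines: given a requested width and height, pick the bracket whose aspect ratio is closest. The bracket values below are hypothetical stand-ins for illustration only; the authors' actual table lives in the linked `utils.py`.

```python
# Hypothetical illustration of the PixArt size-bracket idea; the real bracket
# table is in the PixArt-alpha repo (diffusion/data/datasets/utils.py).
SIZE_BRACKETS = {
    0.5: (704, 1408),   # tall
    1.0: (1024, 1024),  # square
    2.0: (1408, 704),   # wide
}

def nearest_bracket(width: int, height: int) -> tuple[int, int]:
    """Return the bracket whose aspect ratio best matches width/height."""
    ratio = width / height
    best_ratio = min(SIZE_BRACKETS, key=lambda r: abs(r - ratio))
    return SIZE_BRACKETS[best_ratio]

print(nearest_bracket(1000, 1000))  # (1024, 1024): ratio 1.0 is an exact bracket
print(nearest_bracket(600, 1100))   # (704, 1408): ratio ~0.55 snaps to 0.5
```

Snapping to a bracket the model was trained on tends to give better results than requesting an arbitrary resolution directly.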
Lines changed: 9 additions & 0 deletions

@@ -0,0 +1,9 @@
+# ConsistencyDecoderScheduler
+
+This scheduler is a part of the [`ConsistencyDecoderPipeline`] and was introduced in [DALL-E 3](https://openai.com/dall-e-3).
+
+The original codebase can be found at [openai/consistency_models](https://github.com/openai/consistency_models).
+
+
+## ConsistencyDecoderScheduler
+[[autodoc]] schedulers.scheduling_consistency_decoder.ConsistencyDecoderScheduler

docs/source/en/conceptual/ethical_guidelines.md

Lines changed: 3 additions & 3 deletions

@@ -14,7 +14,7 @@ specific language governing permissions and limitations under the License.

 ## Preamble

-[Diffusers](https://huggingface.co/docs/diffusers/index) provides pre-trained diffusion models and serves as a modular toolbox for inference and training.
+[Diffusers](https://huggingface.co/docs/diffusers/index) provides pre-trained diffusion models and serves as a modular toolbox for inference and training.

 Given its real case applications in the world and potential negative impacts on society, we think it is important to provide the project with ethical guidelines to guide the development, users’ contributions, and usage of the Diffusers library.

@@ -46,7 +46,7 @@ The following ethical guidelines apply generally, but we will primarily implemen

 ## Examples of implementations: Safety features and Mechanisms

-The team works daily to make the technical and non-technical tools available to deal with the potential ethical and social risks associated with diffusion technology. Moreover, the community's input is invaluable in ensuring these features' implementation and raising awareness with us.
+The team works daily to make the technical and non-technical tools available to deal with the potential ethical and social risks associated with diffusion technology. Moreover, the community's input is invaluable in ensuring these features' implementation and raising awareness with us.

 - [**Community tab**](https://huggingface.co/docs/hub/repositories-pull-requests-discussions): it enables the community to discuss and better collaborate on a project.

@@ -60,4 +60,4 @@ The team works daily to make the technical and non-technical tools available to

 - **Staged released on the Hub**: in particularly sensitive situations, access to some repositories should be restricted. This staged release is an intermediary step that allows the repository’s authors to have more control over its use.

-- **Licensing**: [OpenRAILs](https://huggingface.co/blog/open_rail), a new type of licensing, allow us to ensure free access while having a set of restrictions that ensure more responsible use.
+- **Licensing**: [OpenRAILs](https://huggingface.co/blog/open_rail), a new type of licensing, allow us to ensure free access while having a set of restrictions that ensure more responsible use.
