
Commit c8a019d

Merge branch 'main' into tests/fix-xformers-tests

2 parents a360039 + c291617


50 files changed: +3034 -591 lines


.github/workflows/nightly_tests.yml

Lines changed: 116 additions & 61 deletions
@@ -116,6 +116,7 @@ jobs:
       run:
         shell: bash
     strategy:
+      fail-fast: false
       max-parallel: 2
       matrix:
         module: [models, schedulers, lora, others, single_file, examples]
@@ -290,64 +291,118 @@ jobs:
         pip install slack_sdk tabulate
         python utils/log_reports.py >> $GITHUB_STEP_SUMMARY
 
-  run_nightly_tests_apple_m1:
-    name: Nightly PyTorch MPS tests on MacOS
-    runs-on: [ self-hosted, apple-m1 ]
-    if: github.event_name == 'schedule'
-
-    steps:
-      - name: Checkout diffusers
-        uses: actions/checkout@v3
-        with:
-          fetch-depth: 2
-
-      - name: Clean checkout
-        shell: arch -arch arm64 bash {0}
-        run: |
-          git clean -fxd
-
-      - name: Setup miniconda
-        uses: ./.github/actions/setup-miniconda
-        with:
-          python-version: 3.9
-
-      - name: Install dependencies
-        shell: arch -arch arm64 bash {0}
-        run: |
-          ${CONDA_RUN} python -m pip install --upgrade pip uv
-          ${CONDA_RUN} python -m uv pip install -e [quality,test]
-          ${CONDA_RUN} python -m uv pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cpu
-          ${CONDA_RUN} python -m uv pip install accelerate@git+https://github.com/huggingface/accelerate
-          ${CONDA_RUN} python -m uv pip install pytest-reportlog
-
-      - name: Environment
-        shell: arch -arch arm64 bash {0}
-        run: |
-          ${CONDA_RUN} python utils/print_env.py
-
-      - name: Run nightly PyTorch tests on M1 (MPS)
-        shell: arch -arch arm64 bash {0}
-        env:
-          HF_HOME: /System/Volumes/Data/mnt/cache
-          HF_TOKEN: ${{ secrets.HF_TOKEN }}
-        run: |
-          ${CONDA_RUN} python -m pytest -n 1 -s -v --make-reports=tests_torch_mps \
-            --report-log=tests_torch_mps.log \
-            tests/
-
-      - name: Failure short reports
-        if: ${{ failure() }}
-        run: cat reports/tests_torch_mps_failures_short.txt
-
-      - name: Test suite reports artifacts
-        if: ${{ always() }}
-        uses: actions/upload-artifact@v2
-        with:
-          name: torch_mps_test_reports
-          path: reports
-
-      - name: Generate Report and Notify Channel
-        if: always()
-        run: |
-          pip install slack_sdk tabulate
-          python utils/log_reports.py >> $GITHUB_STEP_SUMMARY
+  # M1 runner currently not well supported
+  # TODO: (Dhruv) add these back when we setup better testing for Apple Silicon
+  # run_nightly_tests_apple_m1:
+  #   name: Nightly PyTorch MPS tests on MacOS
+  #   runs-on: [ self-hosted, apple-m1 ]
+  #   if: github.event_name == 'schedule'
+  #
+  #   steps:
+  #     - name: Checkout diffusers
+  #       uses: actions/checkout@v3
+  #       with:
+  #         fetch-depth: 2
+  #
+  #     - name: Clean checkout
+  #       shell: arch -arch arm64 bash {0}
+  #       run: |
+  #         git clean -fxd
+  #     - name: Setup miniconda
+  #       uses: ./.github/actions/setup-miniconda
+  #       with:
+  #         python-version: 3.9
+  #
+  #     - name: Install dependencies
+  #       shell: arch -arch arm64 bash {0}
+  #       run: |
+  #         ${CONDA_RUN} python -m pip install --upgrade pip uv
+  #         ${CONDA_RUN} python -m uv pip install -e [quality,test]
+  #         ${CONDA_RUN} python -m uv pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cpu
+  #         ${CONDA_RUN} python -m uv pip install accelerate@git+https://github.com/huggingface/accelerate
+  #         ${CONDA_RUN} python -m uv pip install pytest-reportlog
+  #     - name: Environment
+  #       shell: arch -arch arm64 bash {0}
+  #       run: |
+  #         ${CONDA_RUN} python utils/print_env.py
+  #     - name: Run nightly PyTorch tests on M1 (MPS)
+  #       shell: arch -arch arm64 bash {0}
+  #       env:
+  #         HF_HOME: /System/Volumes/Data/mnt/cache
+  #         HF_TOKEN: ${{ secrets.HF_TOKEN }}
+  #       run: |
+  #         ${CONDA_RUN} python -m pytest -n 1 -s -v --make-reports=tests_torch_mps \
+  #           --report-log=tests_torch_mps.log \
+  #           tests/
+  #     - name: Failure short reports
+  #       if: ${{ failure() }}
+  #       run: cat reports/tests_torch_mps_failures_short.txt
+  #
+  #     - name: Test suite reports artifacts
+  #       if: ${{ always() }}
+  #       uses: actions/upload-artifact@v2
+  #       with:
+  #         name: torch_mps_test_reports
+  #         path: reports
+  #
+  #     - name: Generate Report and Notify Channel
+  #       if: always()
+  #       run: |
+  #         pip install slack_sdk tabulate
+  #         python utils/log_reports.py >> $GITHUB_STEP_SUMMARY run_nightly_tests_apple_m1:
+  #   name: Nightly PyTorch MPS tests on MacOS
+  #   runs-on: [ self-hosted, apple-m1 ]
+  #   if: github.event_name == 'schedule'
+  #
+  #   steps:
+  #     - name: Checkout diffusers
+  #       uses: actions/checkout@v3
+  #       with:
+  #         fetch-depth: 2
+  #
+  #     - name: Clean checkout
+  #       shell: arch -arch arm64 bash {0}
+  #       run: |
+  #         git clean -fxd
+  #     - name: Setup miniconda
+  #       uses: ./.github/actions/setup-miniconda
+  #       with:
+  #         python-version: 3.9
+  #
+  #     - name: Install dependencies
+  #       shell: arch -arch arm64 bash {0}
+  #       run: |
+  #         ${CONDA_RUN} python -m pip install --upgrade pip uv
+  #         ${CONDA_RUN} python -m uv pip install -e [quality,test]
+  #         ${CONDA_RUN} python -m uv pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cpu
+  #         ${CONDA_RUN} python -m uv pip install accelerate@git+https://github.com/huggingface/accelerate
+  #         ${CONDA_RUN} python -m uv pip install pytest-reportlog
+  #     - name: Environment
+  #       shell: arch -arch arm64 bash {0}
+  #       run: |
+  #         ${CONDA_RUN} python utils/print_env.py
+  #     - name: Run nightly PyTorch tests on M1 (MPS)
+  #       shell: arch -arch arm64 bash {0}
+  #       env:
+  #         HF_HOME: /System/Volumes/Data/mnt/cache
+  #         HF_TOKEN: ${{ secrets.HF_TOKEN }}
+  #       run: |
+  #         ${CONDA_RUN} python -m pytest -n 1 -s -v --make-reports=tests_torch_mps \
+  #           --report-log=tests_torch_mps.log \
+  #           tests/
+  #     - name: Failure short reports
+  #       if: ${{ failure() }}
+  #       run: cat reports/tests_torch_mps_failures_short.txt
+  #
+  #     - name: Test suite reports artifacts
+  #       if: ${{ always() }}
+  #       uses: actions/upload-artifact@v2
+  #       with:
+  #         name: torch_mps_test_reports
+  #         path: reports
+  #
+  #     - name: Generate Report and Notify Channel
+  #       if: always()
+  #       run: |
+  #         pip install slack_sdk tabulate
+  #         python utils/log_reports.py >> $GITHUB_STEP_SUMMARY

.github/workflows/push_tests.yml

Lines changed: 2 additions & 0 deletions
@@ -112,6 +112,8 @@ jobs:
       run:
         shell: bash
     strategy:
+      fail-fast: false
+      max-parallel: 2
       matrix:
         module: [models, schedulers, lora, others, single_file]
     steps:

docs/source/en/api/pipelines/controlnet_sd3.md

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ The abstract from the paper is:
 
 *We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls. The neural architecture is connected with "zero convolutions" (zero-initialized convolution layers) that progressively grow the parameters from zero and ensure that no harmful noise could affect the finetuning. We test various conditioning controls, eg, edges, depth, segmentation, human pose, etc, with Stable Diffusion, using single or multiple conditions, with or without prompts. We show that the training of ControlNets is robust with small (<50k) and large (>1m) datasets. Extensive results show that ControlNet may facilitate wider applications to control image diffusion models.*
 
-This controlnet code is mainly implemented by [The InstantX Team](https://huggingface.co/InstantX). The inpainting-related code was developed by [The Alimama Creative Team](https://huggingface.co/alimama-creative). You can find pre-trained checkpoints for SD3-ControlNet in the table below: 
+This controlnet code is mainly implemented by [The InstantX Team](https://huggingface.co/InstantX). The inpainting-related code was developed by [The Alimama Creative Team](https://huggingface.co/alimama-creative). You can find pre-trained checkpoints for SD3-ControlNet in the table below:
 
 
 | ControlNet type | Developer | Link |
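The "zero convolutions" named in the abstract are what make this attachment scheme safe to fine-tune: each control branch feeds into the frozen backbone through a convolution whose weights and bias start at zero, so training begins from an exact no-op. A minimal PyTorch sketch of the idea (illustrative only, not code from this commit; the `zero_conv` helper name is ours):

```python
import torch
import torch.nn as nn

def zero_conv(channels: int) -> nn.Conv2d:
    """A 1x1 convolution initialized to all zeros (a "zero convolution")."""
    conv = nn.Conv2d(channels, channels, kernel_size=1)
    nn.init.zeros_(conv.weight)
    nn.init.zeros_(conv.bias)
    return conv

# At initialization the branch contributes nothing, so adding its output to a
# pretrained feature map leaves the backbone's behavior exactly unchanged.
features = torch.randn(1, 320, 64, 64)
assert torch.all(zero_conv(320)(features) == 0)
```

Gradients still flow through the zero-initialized weights, so the control branch can grow away from zero during fine-tuning, which is the "progressively grow the parameters from zero" behavior the abstract describes.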

docs/source/en/api/pipelines/kolors.md

Lines changed: 2 additions & 2 deletions
@@ -14,7 +14,7 @@ specific language governing permissions and limitations under the License.
 
 ![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/kolors/kolors_header_collage.png)
 
-Kolors is a large-scale text-to-image generation model based on latent diffusion, developed by [the Kuaishou Kolors team](kwai-kolors@kuaishou.com). Trained on billions of text-image pairs, Kolors exhibits significant advantages over both open-source and closed-source models in visual quality, complex semantic accuracy, and text rendering for both Chinese and English characters. Furthermore, Kolors supports both Chinese and English inputs, demonstrating strong performance in understanding and generating Chinese-specific content. For more details, please refer to this [technical report](https://github.com/Kwai-Kolors/Kolors/blob/master/imgs/Kolors_paper.pdf).
+Kolors is a large-scale text-to-image generation model based on latent diffusion, developed by [the Kuaishou Kolors team](https://github.com/Kwai-Kolors/Kolors). Trained on billions of text-image pairs, Kolors exhibits significant advantages over both open-source and closed-source models in visual quality, complex semantic accuracy, and text rendering for both Chinese and English characters. Furthermore, Kolors supports both Chinese and English inputs, demonstrating strong performance in understanding and generating Chinese-specific content. For more details, please refer to this [technical report](https://github.com/Kwai-Kolors/Kolors/blob/master/imgs/Kolors_paper.pdf).
 
 The abstract from the technical report is:
 
@@ -74,7 +74,7 @@ image_encoder = CLIPVisionModelWithProjection.from_pretrained(
 
 pipe = KolorsPipeline.from_pretrained(
     "Kwai-Kolors/Kolors-diffusers", image_encoder=image_encoder, torch_dtype=torch.float16, variant="fp16"
-).to("cuda")
+)
 pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config, use_karras_sigmas=True)
 
 pipe.load_ip_adapter(
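For context, the hunk above comes from the doc's IP-Adapter example: the eager `.to("cuda")` is dropped, presumably because the example relies on `enable_model_cpu_offload()`, which manages device placement itself. A sketch of how the surrounding example plausibly reads after this change (the image-encoder checkpoint and the `load_ip_adapter` arguments below are assumptions; the truncated call is not shown in this diff):

```python
import torch
from transformers import CLIPVisionModelWithProjection
from diffusers import DPMSolverMultistepScheduler, KolorsPipeline

# Assumed image-encoder checkpoint; only the lines shown in the hunk are verbatim.
image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    "Kwai-Kolors/Kolors-IP-Adapter-Plus", subfolder="image_encoder", torch_dtype=torch.float16
)

pipe = KolorsPipeline.from_pretrained(
    "Kwai-Kolors/Kolors-diffusers", image_encoder=image_encoder, torch_dtype=torch.float16, variant="fp16"
)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config, use_karras_sigmas=True)

pipe.load_ip_adapter(
    "Kwai-Kolors/Kolors-IP-Adapter-Plus",               # assumed adapter repo
    subfolder="",
    weight_name="ip_adapter_plus_general.safetensors",  # assumed weight file
    image_encoder_folder=None,  # encoder already passed to the pipeline above
)
# Offloading moves modules to the GPU on demand, which is why the example no
# longer calls .to("cuda") on the whole pipeline up front.
pipe.enable_model_cpu_offload()
```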

docs/source/en/api/pipelines/pag.md

Lines changed: 6 additions & 2 deletions
@@ -20,7 +20,7 @@ The abstract from the paper is:
 
 *Recent studies have demonstrated that diffusion models are capable of generating high-quality samples, but their quality heavily depends on sampling guidance techniques, such as classifier guidance (CG) and classifier-free guidance (CFG). These techniques are often not applicable in unconditional generation or in various downstream tasks such as image restoration. In this paper, we propose a novel sampling guidance, called Perturbed-Attention Guidance (PAG), which improves diffusion sample quality across both unconditional and conditional settings, achieving this without requiring additional training or the integration of external modules. PAG is designed to progressively enhance the structure of samples throughout the denoising process. It involves generating intermediate samples with degraded structure by substituting selected self-attention maps in diffusion U-Net with an identity matrix, by considering the self-attention mechanisms' ability to capture structural information, and guiding the denoising process away from these degraded samples. In both ADM and Stable Diffusion, PAG surprisingly improves sample quality in conditional and even unconditional scenarios. Moreover, PAG significantly improves the baseline performance in various downstream tasks where existing guidances such as CG or CFG cannot be fully utilized, including ControlNet with empty prompts and image restoration such as inpainting and deblurring.*
 
-PAG can be used by specifying the `pag_applied_layers` as a parameter when instantiating a PAG pipeline. It can be a single string or a list of strings. Each string can be a unique layer identifier or a regular expression to identify one or more layers. 
+PAG can be used by specifying the `pag_applied_layers` as a parameter when instantiating a PAG pipeline. It can be a single string or a list of strings. Each string can be a unique layer identifier or a regular expression to identify one or more layers.
 
 - Full identifier as a normal string: `down_blocks.2.attentions.0.transformer_blocks.0.attn1.processor`
 - Full identifier as a RegEx: `down_blocks.2.(attentions|motion_modules).0.transformer_blocks.0.attn1.processor`
@@ -46,7 +46,7 @@ Since RegEx is supported as a way for matching layer identifiers, it is crucial
 ## KolorsPAGPipeline
 [[autodoc]] KolorsPAGPipeline
     - all
-    - __call__ 
+    - __call__
 
 ## StableDiffusionPAGPipeline
 [[autodoc]] StableDiffusionPAGPipeline
@@ -78,6 +78,10 @@ Since RegEx is supported as a way for matching layer identifiers, it is crucial
     - all
     - __call__
 
+## StableDiffusionXLControlNetPAGImg2ImgPipeline
+[[autodoc]] StableDiffusionXLControlNetPAGImg2ImgPipeline
+    - all
+    - __call__
 
 ## StableDiffusion3PAGPipeline
 [[autodoc]] StableDiffusion3PAGPipeline
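Tying the pag.md hunks together: `pag_applied_layers` is passed at pipeline-construction time, and the PAG pipeline classes listed above are selected automatically when PAG is enabled. A minimal usage sketch, assuming the standard diffusers auto-pipeline pattern; the checkpoint, layer choice, and `pag_scale` value are illustrative rather than taken from this commit:

```python
import torch
from diffusers import AutoPipelineForText2Image

# enable_pag=True routes to the PAG variant of the pipeline; "mid" is a short
# layer identifier, but full identifiers or RegEx strings (as documented above)
# work as well.
pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # illustrative checkpoint
    enable_pag=True,
    pag_applied_layers=["mid"],
    torch_dtype=torch.float16,
).to("cuda")

image = pipeline(
    prompt="an insect robot preparing a delicious meal",
    guidance_scale=7.0,
    pag_scale=3.0,  # strength of perturbed-attention guidance
).images[0]
```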
