Commit 5d6b341

first draft
1 parent 5313aa6

8 files changed: +156, -349 lines


docs/source/en/_toctree.yml

Lines changed: 2 additions & 4 deletions
@@ -79,8 +79,8 @@
   - local: using-diffusers/custom_pipeline_examples
     title: Community pipelines
   - local: using-diffusers/contribute_pipeline
-    title: How to contribute a community pipeline
-  title: Pipelines for Inference
+    title: Contribute a community pipeline
+  title: Specific pipeline examples
 - sections:
   - local: training/overview
     title: Overview
@@ -162,8 +162,6 @@
 - sections:
   - local: api/attnprocessor
     title: Attention Processor
-  - local: api/diffusion_pipeline
-    title: Diffusion Pipeline
   - local: api/logging
     title: Logging
   - local: api/configuration

docs/source/en/api/diffusion_pipeline.md

Lines changed: 0 additions & 36 deletions
This file was deleted.

docs/source/en/api/pipelines/overview.md

Lines changed: 57 additions & 3 deletions
@@ -12,16 +12,70 @@ specific language governing permissions and limitations under the License.
 
 # Pipelines
 
-Pipelines provide a simple way to run state-of-the-art diffusion models in inference by bundling all of the necessary components (multiple independently-trained models, schedulers, and processors) into a single end-to-end class. Pipelines are flexible and they can be adapted to use different scheduler or even model components.
+Pipelines provide a simple way to run state-of-the-art diffusion models in inference by bundling all of the necessary components (multiple independently-trained models, schedulers, and processors) into a single end-to-end class. Pipelines are flexible and can be adapted to use different schedulers or even model components.
 
-All pipelines are built from the base [`DiffusionPipeline`] class which provides basic functionality for loading, downloading, and saving all the components.
+All pipelines are built from the base [`DiffusionPipeline`] class, which provides basic functionality for loading, downloading, and saving all the components. Specific pipeline types (for example, [`StableDiffusionPipeline`]) loaded with [`~DiffusionPipeline.from_pretrained`] are automatically detected, and the pipeline components are loaded and passed to the `__init__` function of the pipeline.
 
 <Tip warning={true}>
 
-Pipelines do not offer any training functionality. You'll notice PyTorch's autograd is disabled by decorating the [`~DiffusionPipeline.__call__`] method with a [`torch.no_grad`](https://pytorch.org/docs/stable/generated/torch.no_grad.html) decorator because pipelines should not be used for training. If you're interested in training, please take a look at the [Training](../traininig/overview) guides instead!
+Pipelines do not offer any training functionality. You'll notice PyTorch's autograd is disabled by decorating the [`~DiffusionPipeline.__call__`] method with a [`torch.no_grad`](https://pytorch.org/docs/stable/generated/torch.no_grad.html) decorator because pipelines should not be used for training. If you're interested in training, please take a look at the [Training](../../training/overview) guides instead!
 
 </Tip>
 
+The table below lists all the pipelines currently available in 🤗 Diffusers and the tasks they support. Click on a pipeline to view its abstract and published paper.
+
+| Pipeline | Tasks |
+|---|---|
+| [AltDiffusion](alt_diffusion) | image2image |
+| [Attend-and-Excite](attend_and_excite) | text2image |
+| [Audio Diffusion](audio_diffusion) | image2audio |
+| [AudioLDM](audioldm) | text2audio |
+| [AudioLDM2](audioldm2) | text2audio |
+| [BLIP Diffusion](blip_diffusion) | text2image |
+| [Consistency Models](consistency_models) | unconditional image generation |
+| [ControlNet](controlnet) | text2image, image2image, inpainting |
+| [ControlNet with Stable Diffusion XL](controlnet_sdxl) | text2image |
+| [Cycle Diffusion](cycle_diffusion) | image2image |
+| [Dance Diffusion](dance_diffusion) | unconditional audio generation |
+| [DDIM](ddim) | unconditional image generation |
+| [DDPM](ddpm) | unconditional image generation |
+| [DeepFloyd IF](deepfloyd_if) | text2image, image2image, inpainting, super-resolution |
+| [DiffEdit](diffedit) | inpainting |
+| [DiT](dit) | text2image |
+| [GLIGEN](gligen) | text2image |
+| [InstructPix2Pix](pix2pix) | image editing |
+| [Kandinsky](kandinsky) | text2image, image2image, inpainting, interpolation |
+| [Kandinsky 2.2](kandinsky_v22) | text2image, image2image, inpainting |
+| [Latent Diffusion](latent_diffusion) | text2image, super-resolution |
+| [LDM3D](ldm3d_diffusion) | text2image, text-to-3D |
+| [MultiDiffusion](panorama) | text2image |
+| [MusicLDM](musicldm) | text2audio |
+| [PaintByExample](paint_by_example) | inpainting |
+| [ParaDiGMS](paradigms) | text2image |
+| [Pix2Pix Zero](pix2pix_zero) | image editing |
+| [PNDM](pndm) | unconditional image generation |
+| [RePaint](repaint) | inpainting |
+| [ScoreSdeVe](score_sde_ve) | unconditional image generation |
+| [Self-Attention Guidance](self_attention_guidance) | text2image |
+| [Semantic Guidance](semantic_stable_diffusion) | text2image |
+| [Shap-E](shap_e) | text-to-3D, image-to-3D |
+| [Spectrogram Diffusion](spectrogram_diffusion) | |
+| [StableDiffusion](stable_diffusion/overview) | text2image, image2image, depth2image, inpainting, image variation, latent upscaler, super-resolution |
+| [StableDiffusionModelEditing](model_editing) | model editing |
+| [Stable Diffusion XL](stable_diffusion_xl) | text2image, image2image, inpainting |
+| [Stable unCLIP](stable_unclip) | text2image, image variation |
+| [KarrasVe](karras_ve) | unconditional image generation |
+| [T2I Adapter](adapter) | text2image |
+| [Text2Video](text_to_video) | text2video, video2video |
+| [Text2Video Zero](text_to_video_zero) | text2video |
+| [UnCLIP](unclip) | text2image, image variation |
+| [Unconditional Latent Diffusion](latent_diffusion_uncond) | unconditional image generation |
+| [UniDiffuser](unidiffuser) | text2image, image2text, image variation, text variation, unconditional image generation, unconditional audio generation |
+| [Value-guided planning](value_guided_sampling) | value guided sampling |
+| [Versatile Diffusion](versatile_diffusion) | text2image, image variation |
+| [VQ Diffusion](vq_diffusion) | text2image |
+| [Wuerstchen](wuerstchen) | text2image |
+
 ## DiffusionPipeline
 
 [[autodoc]] DiffusionPipeline
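The auto-detection behavior described in the new overview text (a specific pipeline type is detected by `from_pretrained` and its components are passed to `__init__`) can be sketched with a minimal, hypothetical mock. This is not the real diffusers implementation; the registry mechanism and the `_class_name` config key here are illustrative stand-ins.

```python
# Hypothetical sketch: a base class whose from_pretrained detects the
# concrete pipeline subclass from a saved config and forwards the loaded
# components to that subclass's __init__. NOT the actual diffusers code.

class DiffusionPipeline:
    _registry = {}  # maps a saved "_class_name" entry to a pipeline subclass

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        DiffusionPipeline._registry[cls.__name__] = cls

    @classmethod
    def from_pretrained(cls, config):
        # detect the specific pipeline type from the saved config ...
        pipeline_cls = cls._registry[config["_class_name"]]
        # ... then pass the remaining components to its __init__
        components = {k: v for k, v in config.items() if not k.startswith("_")}
        return pipeline_cls(**components)


class StableDiffusionPipeline(DiffusionPipeline):
    def __init__(self, unet, scheduler):
        self.unet = unet
        self.scheduler = scheduler


# calling the base class returns the detected subclass
config = {"_class_name": "StableDiffusionPipeline",
          "unet": "unet-weights", "scheduler": "ddim"}
pipe = DiffusionPipeline.from_pretrained(config)
print(type(pipe).__name__)  # → StableDiffusionPipeline
```

The point of the pattern is that callers never need to know the concrete class up front: the saved config names it, and the base class dispatches.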

docs/source/en/index.md

Lines changed: 1 addition & 51 deletions
@@ -45,54 +45,4 @@ The library has three main components:
     <p class="text-gray-700">Technical descriptions of how 🤗 Diffusers classes and methods work.</p>
   </a>
 </div>
-</div>
-
-## Supported pipelines
-
-| Pipeline | Paper/Repository | Tasks |
-|---|---|:---:|
-| [alt_diffusion](./api/pipelines/alt_diffusion) | [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://arxiv.org/abs/2211.06679) | Image-to-Image Text-Guided Generation |
-| [audio_diffusion](./api/pipelines/audio_diffusion) | [Audio Diffusion](https://github.com/teticio/audio-diffusion.git) | Unconditional Audio Generation |
-| [controlnet](./api/pipelines/controlnet) | [Adding Conditional Control to Text-to-Image Diffusion Models](https://arxiv.org/abs/2302.05543) | Image-to-Image Text-Guided Generation |
-| [cycle_diffusion](./api/pipelines/cycle_diffusion) | [Unifying Diffusion Models' Latent Space, with Applications to CycleDiffusion and Guidance](https://arxiv.org/abs/2210.05559) | Image-to-Image Text-Guided Generation |
-| [dance_diffusion](./api/pipelines/dance_diffusion) | [Dance Diffusion](https://github.com/williamberman/diffusers.git) | Unconditional Audio Generation |
-| [ddpm](./api/pipelines/ddpm) | [Denoising Diffusion Probabilistic Models](https://arxiv.org/abs/2006.11239) | Unconditional Image Generation |
-| [ddim](./api/pipelines/ddim) | [Denoising Diffusion Implicit Models](https://arxiv.org/abs/2010.02502) | Unconditional Image Generation |
-| [if](./if) | [**IF**](./api/pipelines/if) | Image Generation |
-| [if_img2img](./if) | [**IF**](./api/pipelines/if) | Image-to-Image Generation |
-| [if_inpainting](./if) | [**IF**](./api/pipelines/if) | Image-to-Image Generation |
-| [latent_diffusion](./api/pipelines/latent_diffusion) | [High-Resolution Image Synthesis with Latent Diffusion Models](https://arxiv.org/abs/2112.10752)| Text-to-Image Generation |
-| [latent_diffusion](./api/pipelines/latent_diffusion) | [High-Resolution Image Synthesis with Latent Diffusion Models](https://arxiv.org/abs/2112.10752)| Super Resolution Image-to-Image |
-| [latent_diffusion_uncond](./api/pipelines/latent_diffusion_uncond) | [High-Resolution Image Synthesis with Latent Diffusion Models](https://arxiv.org/abs/2112.10752) | Unconditional Image Generation |
-| [paint_by_example](./api/pipelines/paint_by_example) | [Paint by Example: Exemplar-based Image Editing with Diffusion Models](https://arxiv.org/abs/2211.13227) | Image-Guided Image Inpainting |
-| [pndm](./api/pipelines/pndm) | [Pseudo Numerical Methods for Diffusion Models on Manifolds](https://arxiv.org/abs/2202.09778) | Unconditional Image Generation |
-| [score_sde_ve](./api/pipelines/score_sde_ve) | [Score-Based Generative Modeling through Stochastic Differential Equations](https://openreview.net/forum?id=PxTIG12RRHS) | Unconditional Image Generation |
-| [score_sde_vp](./api/pipelines/score_sde_vp) | [Score-Based Generative Modeling through Stochastic Differential Equations](https://openreview.net/forum?id=PxTIG12RRHS) | Unconditional Image Generation |
-| [semantic_stable_diffusion](./api/pipelines/semantic_stable_diffusion) | [Semantic Guidance](https://arxiv.org/abs/2301.12247) | Text-Guided Generation |
-| [stable_diffusion_adapter](./api/pipelines/stable_diffusion/adapter) | [**T2I-Adapter**](https://arxiv.org/abs/2302.08453) | Image-to-Image Text-Guided Generation | -
-| [stable_diffusion_text2img](./api/pipelines/stable_diffusion/text2img) | [Stable Diffusion](https://stability.ai/blog/stable-diffusion-public-release) | Text-to-Image Generation |
-| [stable_diffusion_img2img](./api/pipelines/stable_diffusion/img2img) | [Stable Diffusion](https://stability.ai/blog/stable-diffusion-public-release) | Image-to-Image Text-Guided Generation |
-| [stable_diffusion_inpaint](./api/pipelines/stable_diffusion/inpaint) | [Stable Diffusion](https://stability.ai/blog/stable-diffusion-public-release) | Text-Guided Image Inpainting |
-| [stable_diffusion_panorama](./api/pipelines/stable_diffusion/panorama) | [MultiDiffusion](https://multidiffusion.github.io/) | Text-to-Panorama Generation |
-| [stable_diffusion_pix2pix](./api/pipelines/stable_diffusion/pix2pix) | [InstructPix2Pix: Learning to Follow Image Editing Instructions](https://arxiv.org/abs/2211.09800) | Text-Guided Image Editing|
-| [stable_diffusion_pix2pix_zero](./api/pipelines/stable_diffusion/pix2pix_zero) | [Zero-shot Image-to-Image Translation](https://pix2pixzero.github.io/) | Text-Guided Image Editing |
-| [stable_diffusion_attend_and_excite](./api/pipelines/stable_diffusion/attend_and_excite) | [Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models](https://arxiv.org/abs/2301.13826) | Text-to-Image Generation |
-| [stable_diffusion_self_attention_guidance](./api/pipelines/stable_diffusion/self_attention_guidance) | [Improving Sample Quality of Diffusion Models Using Self-Attention Guidance](https://arxiv.org/abs/2210.00939) | Text-to-Image Generation Unconditional Image Generation |
-| [stable_diffusion_image_variation](./stable_diffusion/image_variation) | [Stable Diffusion Image Variations](https://github.com/LambdaLabsML/lambda-diffusers#stable-diffusion-image-variations) | Image-to-Image Generation |
-| [stable_diffusion_latent_upscale](./stable_diffusion/latent_upscale) | [Stable Diffusion Latent Upscaler](https://twitter.com/StabilityAI/status/1590531958815064065) | Text-Guided Super Resolution Image-to-Image |
-| [stable_diffusion_model_editing](./api/pipelines/stable_diffusion/model_editing) | [Editing Implicit Assumptions in Text-to-Image Diffusion Models](https://time-diffusion.github.io/) | Text-to-Image Model Editing |
-| [stable_diffusion_2](./api/pipelines/stable_diffusion_2) | [Stable Diffusion 2](https://stability.ai/blog/stable-diffusion-v2-release) | Text-to-Image Generation |
-| [stable_diffusion_2](./api/pipelines/stable_diffusion_2) | [Stable Diffusion 2](https://stability.ai/blog/stable-diffusion-v2-release) | Text-Guided Image Inpainting |
-| [stable_diffusion_2](./api/pipelines/stable_diffusion_2) | [Depth-Conditional Stable Diffusion](https://github.com/Stability-AI/stablediffusion#depth-conditional-stable-diffusion) | Depth-to-Image Generation |
-| [stable_diffusion_2](./api/pipelines/stable_diffusion_2) | [Stable Diffusion 2](https://stability.ai/blog/stable-diffusion-v2-release) | Text-Guided Super Resolution Image-to-Image |
-| [stable_diffusion_safe](./api/pipelines/stable_diffusion_safe) | [Safe Stable Diffusion](https://arxiv.org/abs/2211.05105) | Text-Guided Generation |
-| [stable_unclip](./stable_unclip) | Stable unCLIP | Text-to-Image Generation |
-| [stable_unclip](./stable_unclip) | Stable unCLIP | Image-to-Image Text-Guided Generation |
-| [stochastic_karras_ve](./api/pipelines/stochastic_karras_ve) | [Elucidating the Design Space of Diffusion-Based Generative Models](https://arxiv.org/abs/2206.00364) | Unconditional Image Generation |
-| [text_to_video_sd](./api/pipelines/text_to_video) | [Modelscope's Text-to-video-synthesis Model in Open Domain](https://modelscope.cn/models/damo/text-to-video-synthesis/summary) | Text-to-Video Generation |
-| [unclip](./api/pipelines/unclip) | [Hierarchical Text-Conditional Image Generation with CLIP Latents](https://arxiv.org/abs/2204.06125)(implementation by [kakaobrain](https://github.com/kakaobrain/karlo)) | Text-to-Image Generation |
-| [versatile_diffusion](./api/pipelines/versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Text-to-Image Generation |
-| [versatile_diffusion](./api/pipelines/versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Image Variations Generation |
-| [versatile_diffusion](./api/pipelines/versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Dual Image and Text Guided Generation |
-| [vq_diffusion](./api/pipelines/vq_diffusion) | [Vector Quantized Diffusion Model for Text-to-Image Synthesis](https://arxiv.org/abs/2111.14822) | Text-to-Image Generation |
-| [stable_diffusion_ldm3d](./api/pipelines/stable_diffusion/ldm3d_diffusion) | [LDM3D: Latent Diffusion Model for 3D](https://arxiv.org/abs/2305.10853) | Text to Image and Depth Generation |
+</div>
