From d8039c6db9465134f87a702a2c905bda87940cad Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?M=2E=20Tolga=20Cang=C3=B6z?=
Date: Mon, 30 Oct 2023 20:03:27 +0300
Subject: [PATCH 1/5] Add Copyright info

---
 docs/source/en/using-diffusers/control_brightness.md | 12 ++++++++++++
 docs/source/en/using-diffusers/controlnet.md         | 12 ++++++++++++
 docs/source/en/using-diffusers/diffedit.md           | 12 ++++++++++++
 docs/source/en/using-diffusers/distilled_sd.md       | 12 ++++++++++++
 docs/source/en/using-diffusers/freeu.md              | 12 ++++++++++++
 docs/source/en/using-diffusers/sdxl.md               | 12 ++++++++++++
 docs/source/en/using-diffusers/shap-e.md             | 12 ++++++++++++
 .../using-diffusers/stable_diffusion_jax_how_to.md   | 12 ++++++++++++
 .../using-diffusers/textual_inversion_inference.md   | 12 ++++++++++++
 9 files changed, 108 insertions(+)

diff --git a/docs/source/en/using-diffusers/control_brightness.md b/docs/source/en/using-diffusers/control_brightness.md
index c56c757bb1bc..17c107ba57b8 100644
--- a/docs/source/en/using-diffusers/control_brightness.md
+++ b/docs/source/en/using-diffusers/control_brightness.md
@@ -1,3 +1,15 @@
+<!--Copyright 2023 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
 # Control image brightness
 
 The Stable Diffusion pipeline is mediocre at generating images that are either very bright or dark as explained in the [Common Diffusion Noise Schedules and Sample Steps are Flawed](https://huggingface.co/papers/2305.08891) paper. The solutions proposed in the paper are currently implemented in the [`DDIMScheduler`] which you can use to improve the lighting in your images.
diff --git a/docs/source/en/using-diffusers/controlnet.md b/docs/source/en/using-diffusers/controlnet.md
index 9af2806672be..71fd3c7a307e 100644
--- a/docs/source/en/using-diffusers/controlnet.md
+++ b/docs/source/en/using-diffusers/controlnet.md
@@ -1,3 +1,15 @@
+<!--Copyright 2023 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
 # ControlNet
 
 ControlNet is a type of model for controlling image diffusion models by conditioning the model with an additional input image. There are many types of conditioning inputs (canny edge, user sketching, human pose, depth, and more) you can use to control a diffusion model. This is hugely useful because it affords you greater control over image generation, making it easier to generate specific images without experimenting with different text prompts or denoising values as much.
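The controlnet.md page touched above opens with the paragraph quoted in the context line. As an editorial sketch of the workflow it describes (not part of this patch), assuming the `lllyasviel/sd-controlnet-canny` checkpoint, the `runwayml/stable-diffusion-v1-5` base model, `opencv-python` for edge extraction, and a placeholder `input.png` photo:

```python
import cv2
import numpy as np
import torch
from PIL import Image

from diffusers import ControlNetModel, StableDiffusionControlNetPipeline, UniPCMultistepScheduler
from diffusers.utils import load_image

# Build the conditioning image: a canny edge map extracted from any RGB photo
# ("input.png" is a placeholder for your own image)
image = load_image("input.png")
edges = cv2.Canny(np.array(image), 100, 200)
canny_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

# Load the ControlNet and attach it to a Stable Diffusion pipeline
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

# The edge map constrains the layout; the prompt controls the content
result = pipe("a stained glass portrait", image=canny_image).images[0]
result.save("controlnet_canny.png")
```

The scheduler swap is optional; `UniPCMultistepScheduler` is a common choice because it reaches good quality in fewer sampling steps.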
diff --git a/docs/source/en/using-diffusers/diffedit.md b/docs/source/en/using-diffusers/diffedit.md
index 4c32eb4c482b..1c4a347e7396 100644
--- a/docs/source/en/using-diffusers/diffedit.md
+++ b/docs/source/en/using-diffusers/diffedit.md
@@ -1,3 +1,15 @@
+<!--Copyright 2023 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
 # DiffEdit
 
 [[open-in-colab]]
diff --git a/docs/source/en/using-diffusers/distilled_sd.md b/docs/source/en/using-diffusers/distilled_sd.md
index 7653300b92ab..2dd96d98861d 100644
--- a/docs/source/en/using-diffusers/distilled_sd.md
+++ b/docs/source/en/using-diffusers/distilled_sd.md
@@ -1,3 +1,15 @@
+<!--Copyright 2023 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
 # Distilled Stable Diffusion inference
 
 [[open-in-colab]]
diff --git a/docs/source/en/using-diffusers/freeu.md b/docs/source/en/using-diffusers/freeu.md
index 6c23ec754382..4f3c64096705 100644
--- a/docs/source/en/using-diffusers/freeu.md
+++ b/docs/source/en/using-diffusers/freeu.md
@@ -1,3 +1,15 @@
+<!--Copyright 2023 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
 # Improve generation quality with FreeU
 
 [[open-in-colab]]
diff --git a/docs/source/en/using-diffusers/sdxl.md b/docs/source/en/using-diffusers/sdxl.md
index 36286ecad863..1016c57ca0ec 100644
--- a/docs/source/en/using-diffusers/sdxl.md
+++ b/docs/source/en/using-diffusers/sdxl.md
@@ -1,3 +1,15 @@
+<!--Copyright 2023 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
 # Stable Diffusion XL
 
 [[open-in-colab]]
diff --git a/docs/source/en/using-diffusers/shap-e.md b/docs/source/en/using-diffusers/shap-e.md
index 68542bf56773..b5ba7923049d 100644
--- a/docs/source/en/using-diffusers/shap-e.md
+++ b/docs/source/en/using-diffusers/shap-e.md
@@ -1,3 +1,15 @@
+<!--Copyright 2023 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
 # Shap-E
 
 [[open-in-colab]]
diff --git a/docs/source/en/using-diffusers/stable_diffusion_jax_how_to.md b/docs/source/en/using-diffusers/stable_diffusion_jax_how_to.md
index d62ce0bf91bf..9cf82907180c 100644
--- a/docs/source/en/using-diffusers/stable_diffusion_jax_how_to.md
+++ b/docs/source/en/using-diffusers/stable_diffusion_jax_how_to.md
@@ -1,3 +1,15 @@
+<!--Copyright 2023 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
 # JAX/Flax
 
 [[open-in-colab]]
diff --git a/docs/source/en/using-diffusers/textual_inversion_inference.md b/docs/source/en/using-diffusers/textual_inversion_inference.md
index 821b8ec6745a..6e690c62f76a 100644
--- a/docs/source/en/using-diffusers/textual_inversion_inference.md
+++ b/docs/source/en/using-diffusers/textual_inversion_inference.md
@@ -1,3 +1,15 @@
+<!--Copyright 2023 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
 # Textual inversion
 
 [[open-in-colab]]

From dbfcadb13020d6eed6c87dba064e0a7a54b6cb5f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?M=2E=20Tolga=20Cang=C3=B6z?=
Date: Mon, 30 Oct 2023 20:04:16 +0300
Subject: [PATCH 2/5] Fix typos, improve, update

---
 docs/source/en/api/pipelines/deepfloyd_if.md |  4 ++--
 .../en/api/pipelines/paint_by_example.md     |  2 +-
 .../stable_diffusion/ldm3d_diffusion.md      |  4 ++--
 docs/source/en/api/pipelines/unclip.md       |  6 ++---
 docs/source/en/optimization/opt_overview.md  |  4 ++--
 .../en/using-diffusers/pipeline_overview.md  |  2 +-
 examples/README.md                           |  2 +-
 examples/textual_inversion/README.md         | 24 +++++++++++--------
 8 files changed, 26 insertions(+), 22 deletions(-)

diff --git a/docs/source/en/api/pipelines/deepfloyd_if.md b/docs/source/en/api/pipelines/deepfloyd_if.md
index 7769b71d38dc..01c5c0281038 100644
--- a/docs/source/en/api/pipelines/deepfloyd_if.md
+++ b/docs/source/en/api/pipelines/deepfloyd_if.md
@@ -53,8 +53,8 @@ pip install diffusers accelerate transformers safetensors
 The following sections give more in-detail examples of how to use IF.
 Specifically:
 
-- [Text-to-Image Generation](#text-to-image-generation)
-- [Image-to-Image Generation](#text-guided-image-to-image-generation)
+- [Text-to-Image Generation](#texttoimage-generation)
+- [Image-to-Image Generation](#text-guided-imagetoimage-generation)
 - [Inpainting](#text-guided-inpainting-generation)
 - [Reusing model weights](#converting-between-different-pipelines)
 - [Speed optimization](#optimizing-for-speed)
diff --git a/docs/source/en/api/pipelines/paint_by_example.md b/docs/source/en/api/pipelines/paint_by_example.md
index a6d3c255e3dd..d04a378a09d3 100644
--- a/docs/source/en/api/pipelines/paint_by_example.md
+++ b/docs/source/en/api/pipelines/paint_by_example.md
@@ -10,7 +10,7 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
 specific language governing permissions and limitations under the License.
 -->
 
-# PaintByExample
+# Paint By Example
 
 [Paint by Example: Exemplar-based Image Editing with Diffusion Models](https://huggingface.co/papers/2211.13227) is by Binxin Yang, Shuyang Gu, Bo Zhang, Ting Zhang, Xuejin Chen, Xiaoyan Sun, Dong Chen, Fang Wen.
 
diff --git a/docs/source/en/api/pipelines/stable_diffusion/ldm3d_diffusion.md b/docs/source/en/api/pipelines/stable_diffusion/ldm3d_diffusion.md
index 9d70ab4f88e6..da868c6a0393 100644
--- a/docs/source/en/api/pipelines/stable_diffusion/ldm3d_diffusion.md
+++ b/docs/source/en/api/pipelines/stable_diffusion/ldm3d_diffusion.md
@@ -10,9 +10,9 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
 specific language governing permissions and limitations under the License.
 -->
 
-# Text-to-(RGB, depth)
+# Text-to-(RGB, Depth)
 
-LDM3D was proposed in [LDM3D: Latent Diffusion Model for 3D](https://huggingface.co/papers/2305.10853) by Gabriela Ben Melech Stan, Diana Wofk, Scottie Fox, Alex Redden, Will Saxton, Jean Yu, Estelle Aflalo, Shao-Yen Tseng, Fabio Nonato, Matthias Muller, and Vasudev Lal. LDM3D generates an image and a depth map from a given text prompt unlike the existing text-to-image diffusion models such as [Stable Diffusion](./stable_diffusion/overview) which only generates an image. With almost the same number of parameters, LDM3D achieves to create a latent space that can compress both the RGB images and the depth maps.
+LDM3D was proposed in [LDM3D: Latent Diffusion Model for 3D](https://huggingface.co/papers/2305.10853) by Gabriela Ben Melech Stan, Diana Wofk, Scottie Fox, Alex Redden, Will Saxton, Jean Yu, Estelle Aflalo, Shao-Yen Tseng, Fabio Nonato, Matthias Muller, and Vasudev Lal. LDM3D generates an image and a depth map from a given text prompt unlike the existing text-to-image diffusion models such as [Stable Diffusion](./overview) which only generates an image. With almost the same number of parameters, LDM3D achieves to create a latent space that can compress both the RGB images and the depth maps.
 
 The abstract from the paper is:
diff --git a/docs/source/en/api/pipelines/unclip.md b/docs/source/en/api/pipelines/unclip.md
index 74258b7f7026..0cb5dc54dc29 100644
--- a/docs/source/en/api/pipelines/unclip.md
+++ b/docs/source/en/api/pipelines/unclip.md
@@ -7,9 +7,9 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
 specific language governing permissions and limitations under the License.
 -->
 
-# UnCLIP
+# unCLIP
 
-[Hierarchical Text-Conditional Image Generation with CLIP Latents](https://huggingface.co/papers/2204.06125) is by Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen. The UnCLIP model in 🤗 Diffusers comes from kakaobrain's [karlo]((https://github.com/kakaobrain/karlo)).
+[Hierarchical Text-Conditional Image Generation with CLIP Latents](https://huggingface.co/papers/2204.06125) is by Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen. The unCLIP model in 🤗 Diffusers comes from kakaobrain's [karlo]((https://github.com/kakaobrain/karlo)).
 
 The abstract from the paper is following:
@@ -34,4 +34,4 @@ Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers)
 - __call__
 
 ## ImagePipelineOutput
-[[autodoc]] pipelines.ImagePipelineOutput
\ No newline at end of file
+[[autodoc]] pipelines.ImagePipelineOutput
diff --git a/docs/source/en/optimization/opt_overview.md b/docs/source/en/optimization/opt_overview.md
index 1f809bb011ce..40c75008677c 100644
--- a/docs/source/en/optimization/opt_overview.md
+++ b/docs/source/en/optimization/opt_overview.md
@@ -12,6 +12,6 @@ specific language governing permissions and limitations under the License.
 
 # Overview
 
-Generating high-quality outputs is computationally intensive, especially during each iterative step where you go from a noisy output to a less noisy output. One of 🤗 Diffuser's goal is to make this technology widely accessible to everyone, which includes enabling fast inference on consumer and specialized hardware.
+Generating high-quality outputs is computationally intensive, especially during each iterative step where you go from a noisy output to a less noisy output. One of 🤗 Diffuser's goals is to make this technology widely accessible to everyone, which includes enabling fast inference on consumer and specialized hardware.
 
-This section will cover tips and tricks - like half-precision weights and sliced attention - for optimizing inference speed and reducing memory-consumption. You'll also learn how to speed up your PyTorch code with [`torch.compile`](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html) or [ONNX Runtime](https://onnxruntime.ai/docs/), and enable memory-efficient attention with [xFormers](https://facebookresearch.github.io/xformers/). There are also guides for running inference on specific hardware like Apple Silicon, and Intel or Habana processors.
\ No newline at end of file
+This section will cover tips and tricks - like half-precision weights and sliced attention - for optimizing inference speed and reducing memory-consumption. You'll also learn how to speed up your PyTorch code with [`torch.compile`](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html) or [ONNX Runtime](https://onnxruntime.ai/docs/), and enable memory-efficient attention with [xFormers](https://facebookresearch.github.io/xformers/). There are also guides for running inference on specific hardware like Apple Silicon, Intel, or Habana processors.
diff --git a/docs/source/en/using-diffusers/pipeline_overview.md b/docs/source/en/using-diffusers/pipeline_overview.md
index 6d3ee7cc61ce..292ce51d322a 100644
--- a/docs/source/en/using-diffusers/pipeline_overview.md
+++ b/docs/source/en/using-diffusers/pipeline_overview.md
@@ -14,4 +14,4 @@ specific language governing permissions and limitations under the License.
 
 A pipeline is an end-to-end class that provides a quick and easy way to use a diffusion system for inference by bundling independently trained models and schedulers together. Certain combinations of models and schedulers define specific pipeline types, like [`StableDiffusionXLPipeline`] or [`StableDiffusionControlNetPipeline`], with specific capabilities. All pipeline types inherit from the base [`DiffusionPipeline`] class; pass it any checkpoint, and it'll automatically detect the pipeline type and load the necessary components.
 
-This section demonstrates how to use specific pipelines such as Stable Diffusion XL, ControlNet, and DiffEdit. You'll also learn how to use a distilled version of the Stable Diffusion model to speed up inference, how to create reproducible pipelines, and how to use and contribute community pipelines.
\ No newline at end of file
+This section demonstrates how to use specific pipelines such as Stable Diffusion XL, ControlNet, and DiffEdit. You'll also learn how to use a distilled version of the Stable Diffusion model to speed up inference, how to create reproducible pipelines, and how to use and contribute community pipelines.
diff --git a/examples/README.md b/examples/README.md
index 9566e68fc51d..f0d8a6bb57f0 100644
--- a/examples/README.md
+++ b/examples/README.md
@@ -19,7 +19,7 @@ Diffusers examples are a collection of scripts to demonstrate how to effectively
 for a variety of use cases involving training or fine-tuning.
 
 **Note**: If you are looking for **official** examples on how to use `diffusers` for inference,
-please have a look at [src/diffusers/pipelines](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines)
+please have a look at [src/diffusers/pipelines](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines).
 
 Our examples aspire to be **self-contained**, **easy-to-tweak**, **beginner-friendly** and for **one-purpose-only**.
 More specifically, this means:
diff --git a/examples/textual_inversion/README.md b/examples/textual_inversion/README.md
index 21bca526b5d2..0a1d8a459fc6 100644
--- a/examples/textual_inversion/README.md
+++ b/examples/textual_inversion/README.md
@@ -25,12 +25,12 @@ cd diffusers
 pip install .
 ```
 
-Then cd in the example folder and run
+Then cd in the example folder and run:
 ```bash
 pip install -r requirements.txt
 ```
 
-And initialize an [🤗Accelerate](https://github.com/huggingface/accelerate/) environment with:
+And initialize an [🤗 Accelerate](https://github.com/huggingface/accelerate/) environment with:
 
 ```bash
 accelerate config
@@ -56,7 +56,7 @@ snapshot_download("diffusers/cat_toy_example", local_dir=local_dir, repo_type="d
 ```
 
 This will be our training data.
-Now we can launch the training using
+Now we can launch the training using:
 
 **___Note: Change the `resolution` to 768 if you are using the [stable-diffusion-2](https://huggingface.co/stabilityai/stable-diffusion-2) 768x768 model.___**
 
@@ -68,12 +68,14 @@ accelerate launch textual_inversion.py \
   --pretrained_model_name_or_path=$MODEL_NAME \
   --train_data_dir=$DATA_DIR \
   --learnable_property="object" \
-  --placeholder_token="<cat-toy>" --initializer_token="toy" \
+  --placeholder_token="<cat-toy>" \
+  --initializer_token="toy" \
   --resolution=512 \
   --train_batch_size=1 \
   --gradient_accumulation_steps=4 \
   --max_train_steps=3000 \
-  --learning_rate=5.0e-04 --scale_lr \
+  --learning_rate=5.0e-04 \
+  --scale_lr \
   --lr_scheduler="constant" \
   --lr_warmup_steps=0 \
   --push_to_hub \
@@ -85,10 +87,10 @@ A full training run takes ~1 hour on one V100 GPU.
 
 **Note**: As described in [the official paper](https://arxiv.org/abs/2208.01618)
 only one embedding vector is used for the placeholder token, *e.g.* `"<cat-toy>"`.
 However, one can also add multiple embedding vectors for the placeholder token
However, one can also add multiple embedding vectors for the placeholder token -to inclease the number of fine-tuneable parameters. This can help the model to learn -more complex details. To use multiple embedding vectors, you can should define `--num_vectors` +to increase the number of fine-tuneable parameters. This can help the model to learn +more complex details. To use multiple embedding vectors, you should define `--num_vectors` to a number larger than one, *e.g.*: -``` +```bash --num_vectors 5 ``` @@ -131,11 +133,13 @@ python textual_inversion_flax.py \ --pretrained_model_name_or_path=$MODEL_NAME \ --train_data_dir=$DATA_DIR \ --learnable_property="object" \ - --placeholder_token="" --initializer_token="toy" \ + --placeholder_token="" \ + --initializer_token="toy" \ --resolution=512 \ --train_batch_size=1 \ --max_train_steps=3000 \ - --learning_rate=5.0e-04 --scale_lr \ + --learning_rate=5.0e-04 \ + --scale_lr \ --output_dir="textual_inversion_cat" ``` It should be at least 70% faster than the PyTorch script with the same configuration. From 90a521f8df1e39ade8c72bc6f7b6c338d39d769e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?M=2E=20Tolga=20Cang=C3=B6z?= <46008593+standardAI@users.noreply.github.com> Date: Tue, 31 Oct 2023 08:47:18 +0300 Subject: [PATCH 3/5] Update deepfloyd_if.md --- docs/source/en/api/pipelines/deepfloyd_if.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/source/en/api/pipelines/deepfloyd_if.md b/docs/source/en/api/pipelines/deepfloyd_if.md index 01c5c0281038..7769b71d38dc 100644 --- a/docs/source/en/api/pipelines/deepfloyd_if.md +++ b/docs/source/en/api/pipelines/deepfloyd_if.md @@ -53,8 +53,8 @@ pip install diffusers accelerate transformers safetensors The following sections give more in-detail examples of how to use IF. Specifically: -- [Text-to-Image Generation](#texttoimage-generation) -- [Image-to-Image Generation](#text-guided-imagetoimage-generation) +- [Text-to-Image Generation](#text-to-image-generation) +- [Image-to-Image Generation](#text-guided-image-to-image-generation) - [Inpainting](#text-guided-inpainting-generation) - [Reusing model weights](#converting-between-different-pipelines) - [Speed optimization](#optimizing-for-speed) From f8e399f32c9adde6cb9cf68fb76ca8d72180afa0 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?M=2E=20Tolga=20Cang=C3=B6z?= <46008593+standardAI@users.noreply.github.com> Date: Tue, 31 Oct 2023 08:49:34 +0300 Subject: [PATCH 4/5] Update ldm3d_diffusion.md --- .../source/en/api/pipelines/stable_diffusion/ldm3d_diffusion.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/en/api/pipelines/stable_diffusion/ldm3d_diffusion.md b/docs/source/en/api/pipelines/stable_diffusion/ldm3d_diffusion.md index da868c6a0393..2e489c0eeb7c 100644 --- a/docs/source/en/api/pipelines/stable_diffusion/ldm3d_diffusion.md +++ b/docs/source/en/api/pipelines/stable_diffusion/ldm3d_diffusion.md @@ -10,7 +10,7 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o specific language governing permissions and limitations under the License. --> -# Text-to-(RGB, Depth) +# Text-to-(RGB, depth) LDM3D was proposed in [LDM3D: Latent Diffusion Model for 3D](https://huggingface.co/papers/2305.10853) by Gabriela Ben Melech Stan, Diana Wofk, Scottie Fox, Alex Redden, Will Saxton, Jean Yu, Estelle Aflalo, Shao-Yen Tseng, Fabio Nonato, Matthias Muller, and Vasudev Lal. 
LDM3D generates an image and a depth map from a given text prompt unlike the existing text-to-image diffusion models such as [Stable Diffusion](./overview) which only generates an image. With almost the same number of parameters, LDM3D achieves to create a latent space that can compress both the RGB images and the depth maps. From d3870a110ad6a3039f1c43b6135237932cda1035 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?M=2E=20Tolga=20Cang=C3=B6z?= <46008593+standardAI@users.noreply.github.com> Date: Tue, 31 Oct 2023 08:58:36 +0300 Subject: [PATCH 5/5] Update opt_overview.md --- docs/source/en/optimization/opt_overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/en/optimization/opt_overview.md b/docs/source/en/optimization/opt_overview.md index 40c75008677c..3a458291ce5b 100644 --- a/docs/source/en/optimization/opt_overview.md +++ b/docs/source/en/optimization/opt_overview.md @@ -14,4 +14,4 @@ specific language governing permissions and limitations under the License. Generating high-quality outputs is computationally intensive, especially during each iterative step where you go from a noisy output to a less noisy output. One of 🤗 Diffuser's goals is to make this technology widely accessible to everyone, which includes enabling fast inference on consumer and specialized hardware. -This section will cover tips and tricks - like half-precision weights and sliced attention - for optimizing inference speed and reducing memory-consumption. You'll also learn how to speed up your PyTorch code with [`torch.compile`](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html) or [ONNX Runtime](https://onnxruntime.ai/docs/), and enable memory-efficient attention with [xFormers](https://facebookresearch.github.io/xformers/). There are also guides for running inference on specific hardware like Apple Silicon, Intel, or Habana processors. +This section will cover tips and tricks - like half-precision weights and sliced attention - for optimizing inference speed and reducing memory-consumption. You'll also learn how to speed up your PyTorch code with [`torch.compile`](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html) or [ONNX Runtime](https://onnxruntime.ai/docs/), and enable memory-efficient attention with [xFormers](https://facebookresearch.github.io/xformers/). There are also guides for running inference on specific hardware like Apple Silicon, and Intel or Habana processors.
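The opt_overview page updated in patches 2 and 5 names half-precision weights and `torch.compile` as two of the main optimization levers. A minimal sketch of how the two combine in practice (an editorial illustration, not part of the patch series), assuming a CUDA GPU, PyTorch 2.0 or newer, and the `runwayml/stable-diffusion-v1-5` checkpoint:

```python
import torch
from diffusers import DiffusionPipeline

# Half-precision weights roughly halve memory use and speed up inference on most GPUs
pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, use_safetensors=True
).to("cuda")

# torch.compile (PyTorch 2.0+) optimizes the UNet's forward pass; the first call
# pays a one-time compilation cost, and subsequent calls run noticeably faster
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)

image = pipe("an astronaut riding a horse on the moon").images[0]
image.save("astronaut.png")
```

Compiling only the UNet is the usual choice because it dominates per-step latency; the text encoder and VAE contribute comparatively little.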