diff --git a/docs/source/en/tutorials/autopipeline.md b/docs/source/en/tutorials/autopipeline.md index fcc6f5300eab..4f17760e8bc0 100644 --- a/docs/source/en/tutorials/autopipeline.md +++ b/docs/source/en/tutorials/autopipeline.md @@ -50,7 +50,7 @@ Under the hood, [`AutoPipelineForText2Image`]: 1. automatically detects a `"stable-diffusion"` class from the [`model_index.json`](https://huggingface.co/runwayml/stable-diffusion-v1-5/blob/main/model_index.json) file 2. loads the corresponding text-to-image [`StableDiffusionPipeline`] based on the `"stable-diffusion"` class name -Likewise, for image-to-image, [`AutoPipelineForImage2Image`] detects a `"stable-diffusion"` checkpoint from the `model_index.json` file and it'll load the corresponding [`StableDiffusionImg2ImgPipeline`] behind the scenes. You can also pass any additional arguments specific to the pipeline class such as `strength`, which determines the amount of noise or variation added to an input image: +Likewise, for image-to-image, [`AutoPipelineForImage2Image`] detects a `"stable-diffusion"` checkpoint from the `model_index.json` file and it'll load the corresponding [`StableDiffusionImg2ImgPipeline`] behind the scenes. You can also pass any additional arguments specific to the pipeline class such as `strength`, which determines the amount of noise or variation added to an input image: ```py from diffusers import AutoPipelineForImage2Image diff --git a/docs/source/en/tutorials/tutorial_overview.md b/docs/source/en/tutorials/tutorial_overview.md index 85c30256ec89..ee7a49e43851 100644 --- a/docs/source/en/tutorials/tutorial_overview.md +++ b/docs/source/en/tutorials/tutorial_overview.md @@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License. # Overview -Welcome to 🧨 Diffusers! If you're new to diffusion models and generative AI, and want to learn more, then you've come to the right place. These beginner-friendly tutorials are designed to provide a gentle introduction to diffusion models and help you understand the library fundamentals - the core components and how 🧨 Diffusers is meant to be used. +Welcome to 🧨 Diffusers! If you're new to diffusion models and generative AI, and want to learn more, then you've come to the right place. These beginner-friendly tutorials are designed to provide a gentle introduction to diffusion models and help you understand the library fundamentals - the core components and how 🧨 Diffusers is meant to be used. You'll learn how to use a pipeline for inference to rapidly generate things, and then deconstruct that pipeline to really understand how to use the library as a modular toolbox for building your own diffusion systems. In the next lesson, you'll learn how to train your own diffusion model to generate what you want. diff --git a/docs/source/en/tutorials/using_peft_for_inference.md b/docs/source/en/tutorials/using_peft_for_inference.md index da69b712a989..6f317a7610b2 100644 --- a/docs/source/en/tutorials/using_peft_for_inference.md +++ b/docs/source/en/tutorials/using_peft_for_inference.md @@ -58,7 +58,7 @@ image ``` ![toy-face](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/peft_integration/diffusers_peft_lora_inference_8_1.png) - + With the `adapter_name` parameter, it is really easy to use another adapter for inference! Load the [nerijs/pixel-art-xl](https://huggingface.co/nerijs/pixel-art-xl) adapter that has been fine-tuned to generate pixel art images, and let's call it `"pixel"`. 
@@ -80,7 +80,7 @@ image ``` ![pixel-art](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/peft_integration/diffusers_peft_lora_inference_12_1.png) - + ## Combine multiple adapters You can also perform multi-adapter inference where you combine different adapter checkpoints for inference. @@ -112,7 +112,7 @@ image ``` ![toy-face-pixel-art](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/peft_integration/diffusers_peft_lora_inference_16_1.png) - + Impressive! As you can see, the model was able to generate an image that mixes the characteristics of both adapters. If you want to go back to using only one adapter, use the [`~diffusers.loaders.UNet2DConditionLoadersMixin.set_adapters`] method to activate the `"toy"` adapter: diff --git a/docs/source/en/using-diffusers/conditional_image_generation.md b/docs/source/en/using-diffusers/conditional_image_generation.md index 9832f53cffe6..eaca038d59fe 100644 --- a/docs/source/en/using-diffusers/conditional_image_generation.md +++ b/docs/source/en/using-diffusers/conditional_image_generation.md @@ -226,7 +226,7 @@ pipeline = AutoPipelineForText2Image.from_pretrained( "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16 ).to("cuda") image = pipeline( - prompt="Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", + prompt="Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", negative_prompt="ugly, deformed, disfigured, poor details, bad anatomy", ).images[0] image @@ -258,7 +258,7 @@ pipeline = AutoPipelineForText2Image.from_pretrained( ).to("cuda") generator = torch.Generator(device="cuda").manual_seed(30) image = pipeline( - "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", + "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", generator=generator, ).images[0] image diff --git a/docs/source/en/using-diffusers/contribute_pipeline.md b/docs/source/en/using-diffusers/contribute_pipeline.md index 15b4b20ab34a..09d5e7525784 100644 --- a/docs/source/en/using-diffusers/contribute_pipeline.md +++ b/docs/source/en/using-diffusers/contribute_pipeline.md @@ -49,7 +49,7 @@ To ensure your pipeline and its components (`unet` and `scheduler`) can be saved + self.register_modules(unet=unet, scheduler=scheduler) ``` -Cool, the `__init__` step is done and you can move to the forward pass now! 🔥 +Cool, the `__init__` step is done and you can move to the forward pass now! 
🔥 ## Define the forward pass diff --git a/docs/source/en/using-diffusers/controlnet.md b/docs/source/en/using-diffusers/controlnet.md index 71fd3c7a307e..6272d746d923 100644 --- a/docs/source/en/using-diffusers/controlnet.md +++ b/docs/source/en/using-diffusers/controlnet.md @@ -86,7 +86,7 @@ import torch controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16, use_safetensors=True) pipe = StableDiffusionControlNetPipeline.from_pretrained( "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16, use_safetensors=True -).to("cuda") +) pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config) pipe.enable_model_cpu_offload() @@ -146,7 +146,7 @@ import torch controlnet = ControlNetModel.from_pretrained("lllyasviel/control_v11f1p_sd15_depth", torch_dtype=torch.float16, use_safetensors=True) pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained( "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16, use_safetensors=True -).to("cuda") +) pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config) pipe.enable_model_cpu_offload() @@ -231,7 +231,7 @@ import torch controlnet = ControlNetModel.from_pretrained("lllyasviel/control_v11p_sd15_inpaint", torch_dtype=torch.float16, use_safetensors=True) pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained( "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16, use_safetensors=True -).to("cuda") +) pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config) pipe.enable_model_cpu_offload() diff --git a/docs/source/en/using-diffusers/custom_pipeline_examples.md b/docs/source/en/using-diffusers/custom_pipeline_examples.md index 555292568349..e13d7f98a356 100644 --- a/docs/source/en/using-diffusers/custom_pipeline_examples.md +++ b/docs/source/en/using-diffusers/custom_pipeline_examples.md @@ -74,7 +74,7 @@ diffuser_pipeline = DiffusionPipeline.from_pretrained( diffuser_pipeline.enable_attention_slicing() diffuser_pipeline = diffuser_pipeline.to(device) -prompt = ["a photograph of an astronaut riding a horse", +prompt = ["a photograph of an astronaut riding a horse", "Una casa en la playa", "Ein Hund, der Orange isst", "Un restaurant parisien"] diff --git a/docs/source/en/using-diffusers/custom_pipeline_overview.md b/docs/source/en/using-diffusers/custom_pipeline_overview.md index 10627d3163d8..f898bd0dc205 100644 --- a/docs/source/en/using-diffusers/custom_pipeline_overview.md +++ b/docs/source/en/using-diffusers/custom_pipeline_overview.md @@ -117,10 +117,10 @@ from pipeline_t2v_base_pixel import TextToVideoIFPipeline import torch pipeline = TextToVideoIFPipeline( - unet=unet, - text_encoder=text_encoder, - tokenizer=tokenizer, - scheduler=scheduler, + unet=unet, + text_encoder=text_encoder, + tokenizer=tokenizer, + scheduler=scheduler, feature_extractor=feature_extractor ) pipeline = pipeline.to(device="cuda") diff --git a/docs/source/en/using-diffusers/diffedit.md b/docs/source/en/using-diffusers/diffedit.md index 1c4a347e7396..049e17739c32 100644 --- a/docs/source/en/using-diffusers/diffedit.md +++ b/docs/source/en/using-diffusers/diffedit.md @@ -160,12 +160,12 @@ Check out the [generation strategy](https://huggingface.co/docs/transformers/mai Load the text encoder model used by the [`StableDiffusionDiffEditPipeline`] to encode the text. 
You'll use the text encoder to compute the text embeddings: ```py -import torch -from diffusers import StableDiffusionDiffEditPipeline +import torch +from diffusers import StableDiffusionDiffEditPipeline pipeline = StableDiffusionDiffEditPipeline.from_pretrained( "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16, use_safetensors=True -).to("cuda") +) pipeline.enable_model_cpu_offload() pipeline.enable_vae_slicing() diff --git a/docs/source/en/using-diffusers/freeu.md b/docs/source/en/using-diffusers/freeu.md index c5f3577ae3aa..6e8f5773cd75 100644 --- a/docs/source/en/using-diffusers/freeu.md +++ b/docs/source/en/using-diffusers/freeu.md @@ -14,12 +14,12 @@ specific language governing permissions and limitations under the License. [[open-in-colab]] -The UNet is responsible for denoising during the reverse diffusion process, and there are two distinct features in its architecture: +The UNet is responsible for denoising during the reverse diffusion process, and there are two distinct features in its architecture: 1. Backbone features primarily contribute to the denoising process 2. Skip features mainly introduce high-frequency features into the decoder module and can make the network overlook the semantics in the backbone features -However, the skip connection can sometimes introduce unnatural image details. [FreeU](https://hf.co/papers/2309.11497) is a technique for improving image quality by rebalancing the contributions from the UNet’s skip connections and backbone feature maps. +However, the skip connection can sometimes introduce unnatural image details. [FreeU](https://hf.co/papers/2309.11497) is a technique for improving image quality by rebalancing the contributions from the UNet’s skip connections and backbone feature maps. FreeU is applied during inference and it does not require any additional training. The technique works for different tasks such as text-to-image, image-to-image, and text-to-video. 
@@ -27,11 +27,11 @@ In this guide, you will apply FreeU to the [`StableDiffusionPipeline`], [`Stable ## StableDiffusionPipeline -Load the pipeline: +Load the pipeline: ```py from diffusers import DiffusionPipeline -import torch +import torch pipeline = DiffusionPipeline.from_pretrained( "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, safety_checker=None @@ -70,7 +70,7 @@ Let's see how Stable Diffusion 2 results are impacted: ```py from diffusers import DiffusionPipeline -import torch +import torch pipeline = DiffusionPipeline.from_pretrained( "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16, safety_checker=None @@ -92,7 +92,7 @@ Finally, let's take a look at how FreeU affects Stable Diffusion XL results: ```py from diffusers import DiffusionPipeline -import torch +import torch pipeline = DiffusionPipeline.from_pretrained( "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, diff --git a/docs/source/en/using-diffusers/img2img.md b/docs/source/en/using-diffusers/img2img.md index 5caba021f39e..6014d87b7906 100644 --- a/docs/source/en/using-diffusers/img2img.md +++ b/docs/source/en/using-diffusers/img2img.md @@ -27,7 +27,7 @@ from diffusers.utils import load_image, make_image_grid pipeline = AutoPipelineForImage2Image.from_pretrained( "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16, use_safetensors=True -).to("cuda") +) pipeline.enable_model_cpu_offload() # remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed pipeline.enable_xformers_memory_efficient_attention() @@ -79,7 +79,7 @@ from diffusers.utils import make_image_grid, load_image pipeline = AutoPipelineForImage2Image.from_pretrained( "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16", use_safetensors=True -).to("cuda") +) pipeline.enable_model_cpu_offload() # remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed pipeline.enable_xformers_memory_efficient_attention() @@ -117,7 +117,7 @@ from diffusers.utils import make_image_grid, load_image pipeline = AutoPipelineForImage2Image.from_pretrained( "stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True -).to("cuda") +) pipeline.enable_model_cpu_offload() # remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed pipeline.enable_xformers_memory_efficient_attention() @@ -157,7 +157,7 @@ from diffusers.utils import make_image_grid, load_image pipeline = AutoPipelineForImage2Image.from_pretrained( "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16, use_safetensors=True -).to("cuda") +) pipeline.enable_model_cpu_offload() # remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed pipeline.enable_xformers_memory_efficient_attention() @@ -204,7 +204,7 @@ from diffusers.utils import make_image_grid, load_image pipeline = AutoPipelineForImage2Image.from_pretrained( "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16", use_safetensors=True -).to("cuda") +) pipeline.enable_model_cpu_offload() # remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed pipeline.enable_xformers_memory_efficient_attention() @@ -248,7 +248,7 @@ from diffusers.utils import make_image_grid, load_image pipeline = AutoPipelineForImage2Image.from_pretrained( "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16", 
use_safetensors=True -).to("cuda") +) pipeline.enable_model_cpu_offload() # remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed pipeline.enable_xformers_memory_efficient_attention() @@ -290,7 +290,7 @@ from diffusers.utils import make_image_grid, load_image pipeline = AutoPipelineForImage2Image.from_pretrained( "stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True -).to("cuda") +) pipeline.enable_model_cpu_offload() # remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed pipeline.enable_xformers_memory_efficient_attention() @@ -335,7 +335,7 @@ from diffusers.utils import make_image_grid pipeline = AutoPipelineForText2Image.from_pretrained( "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16", use_safetensors=True -).to("cuda") +) pipeline.enable_model_cpu_offload() # remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed pipeline.enable_xformers_memory_efficient_attention() @@ -349,7 +349,7 @@ Now you can pass this generated image to the image-to-image pipeline: ```py pipeline = AutoPipelineForImage2Image.from_pretrained( "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16, use_safetensors=True -).to("cuda") +) pipeline.enable_model_cpu_offload() # remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed pipeline.enable_xformers_memory_efficient_attention() @@ -371,7 +371,7 @@ from diffusers.utils import make_image_grid, load_image pipeline = AutoPipelineForImage2Image.from_pretrained( "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16", use_safetensors=True -).to("cuda") +) pipeline.enable_model_cpu_offload() # remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed pipeline.enable_xformers_memory_efficient_attention() @@ -397,7 +397,7 @@ Pass the latent output from this pipeline to the next pipeline to generate an im ```py pipeline = AutoPipelineForImage2Image.from_pretrained( "ogkalu/Comic-Diffusion", torch_dtype=torch.float16 -).to("cuda") +) pipeline.enable_model_cpu_offload() # remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed pipeline.enable_xformers_memory_efficient_attention() @@ -411,7 +411,7 @@ Repeat one more time to generate the final image in a [pixel art style](https:// ```py pipeline = AutoPipelineForImage2Image.from_pretrained( "kohbanye/pixel-art-style", torch_dtype=torch.float16 -).to("cuda") +) pipeline.enable_model_cpu_offload() # remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed pipeline.enable_xformers_memory_efficient_attention() @@ -434,7 +434,7 @@ from diffusers.utils import make_image_grid, load_image pipeline = AutoPipelineForImage2Image.from_pretrained( "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16", use_safetensors=True -).to("cuda") +) pipeline.enable_model_cpu_offload() # remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed pipeline.enable_xformers_memory_efficient_attention() @@ -462,7 +462,7 @@ from diffusers import StableDiffusionLatentUpscalePipeline upscaler = StableDiffusionLatentUpscalePipeline.from_pretrained( "stabilityai/sd-x2-latent-upscaler", torch_dtype=torch.float16, variant="fp16", use_safetensors=True -).to("cuda") +) upscaler.enable_model_cpu_offload() 
upscaler.enable_xformers_memory_efficient_attention() @@ -476,7 +476,7 @@ from diffusers import StableDiffusionUpscalePipeline super_res = StableDiffusionUpscalePipeline.from_pretrained( "stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16, variant="fp16", use_safetensors=True -).to("cuda") +) super_res.enable_model_cpu_offload() super_res.enable_xformers_memory_efficient_attention() @@ -500,7 +500,7 @@ import torch pipeline = AutoPipelineForImage2Image.from_pretrained( "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16", use_safetensors=True -).to("cuda") +) pipeline.enable_model_cpu_offload() # remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed pipeline.enable_xformers_memory_efficient_attention() @@ -537,7 +537,7 @@ import torch controlnet = ControlNetModel.from_pretrained("lllyasviel/control_v11f1p_sd15_depth", torch_dtype=torch.float16, variant="fp16", use_safetensors=True) pipeline = AutoPipelineForImage2Image.from_pretrained( "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16, variant="fp16", use_safetensors=True -).to("cuda") +) pipeline.enable_model_cpu_offload() # remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed pipeline.enable_xformers_memory_efficient_attention() @@ -571,7 +571,7 @@ Let's apply a new [style](https://huggingface.co/nitrosocke/elden-ring-diffusion ```py pipeline = AutoPipelineForImage2Image.from_pretrained( "nitrosocke/elden-ring-diffusion", torch_dtype=torch.float16, -).to("cuda") +) pipeline.enable_model_cpu_offload() # remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed pipeline.enable_xformers_memory_efficient_attention() diff --git a/docs/source/en/using-diffusers/inpaint.md b/docs/source/en/using-diffusers/inpaint.md index abdfbffb908b..e6b1010f13b0 100644 --- a/docs/source/en/using-diffusers/inpaint.md +++ b/docs/source/en/using-diffusers/inpaint.md @@ -27,7 +27,7 @@ from diffusers.utils import load_image, make_image_grid pipeline = AutoPipelineForInpainting.from_pretrained( "kandinsky-community/kandinsky-2-2-decoder-inpaint", torch_dtype=torch.float16 -).to("cuda") +) pipeline.enable_model_cpu_offload() # remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed pipeline.enable_xformers_memory_efficient_attention() @@ -98,7 +98,7 @@ from diffusers.utils import load_image, make_image_grid pipeline = AutoPipelineForInpainting.from_pretrained( "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16, variant="fp16" -).to("cuda") +) pipeline.enable_model_cpu_offload() # remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed pipeline.enable_xformers_memory_efficient_attention() @@ -124,7 +124,7 @@ from diffusers.utils import load_image, make_image_grid pipeline = AutoPipelineForInpainting.from_pretrained( "diffusers/stable-diffusion-xl-1.0-inpainting-0.1", torch_dtype=torch.float16, variant="fp16" -).to("cuda") +) pipeline.enable_model_cpu_offload() # remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed pipeline.enable_xformers_memory_efficient_attention() @@ -150,7 +150,7 @@ from diffusers.utils import load_image, make_image_grid pipeline = AutoPipelineForInpainting.from_pretrained( "kandinsky-community/kandinsky-2-2-decoder-inpaint", torch_dtype=torch.float16 -).to("cuda") +) pipeline.enable_model_cpu_offload() # remove following 
line if xFormers is not installed or you have PyTorch 2.0 or higher installed pipeline.enable_xformers_memory_efficient_attention() @@ -379,7 +379,7 @@ from diffusers.utils import load_image, make_image_grid pipeline = AutoPipelineForInpainting.from_pretrained( "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16, variant="fp16" -).to("cuda") +) pipeline.enable_model_cpu_offload() # remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed pipeline.enable_xformers_memory_efficient_attention() @@ -424,7 +424,7 @@ from diffusers.utils import load_image, make_image_grid pipeline = AutoPipelineForInpainting.from_pretrained( "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16, variant="fp16" -).to("cuda") +) pipeline.enable_model_cpu_offload() # remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed pipeline.enable_xformers_memory_efficient_attention() @@ -464,7 +464,7 @@ from diffusers.utils import load_image, make_image_grid pipeline = AutoPipelineForInpainting.from_pretrained( "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16, variant="fp16" -).to("cuda") +) pipeline.enable_model_cpu_offload() # remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed pipeline.enable_xformers_memory_efficient_attention() @@ -503,7 +503,7 @@ from diffusers.utils import load_image, make_image_grid pipeline = AutoPipelineForText2Image.from_pretrained( "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16", use_safetensors=True -).to("cuda") +) pipeline.enable_model_cpu_offload() # remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed pipeline.enable_xformers_memory_efficient_attention() @@ -522,7 +522,7 @@ And let's inpaint the masked area with a waterfall: ```py pipeline = AutoPipelineForInpainting.from_pretrained( "kandinsky-community/kandinsky-2-2-decoder-inpaint", torch_dtype=torch.float16 -).to("cuda") +) pipeline.enable_model_cpu_offload() # remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed pipeline.enable_xformers_memory_efficient_attention() @@ -556,7 +556,7 @@ from diffusers.utils import load_image, make_image_grid pipeline = AutoPipelineForInpainting.from_pretrained( "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16, variant="fp16" -).to("cuda") +) pipeline.enable_model_cpu_offload() # remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed pipeline.enable_xformers_memory_efficient_attention() @@ -577,7 +577,7 @@ Now let's pass the image to another inpainting pipeline with SDXL's refiner mode ```py pipeline = AutoPipelineForInpainting.from_pretrained( "stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16, variant="fp16" -).to("cuda") +) pipeline.enable_model_cpu_offload() # remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed pipeline.enable_xformers_memory_efficient_attention() @@ -636,7 +636,7 @@ from diffusers.utils import make_image_grid pipeline = AutoPipelineForInpainting.from_pretrained( "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16, -).to("cuda") +) pipeline.enable_model_cpu_offload() # remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed pipeline.enable_xformers_memory_efficient_attention() @@ -667,7 +667,7 @@ controlnet = 
ControlNetModel.from_pretrained("lllyasviel/control_v11p_sd15_inpai # pass ControlNet to the pipeline pipeline = StableDiffusionControlNetInpaintPipeline.from_pretrained( "runwayml/stable-diffusion-inpainting", controlnet=controlnet, torch_dtype=torch.float16, variant="fp16" -).to("cuda") +) pipeline.enable_model_cpu_offload() # remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed pipeline.enable_xformers_memory_efficient_attention() @@ -705,7 +705,7 @@ from diffusers import AutoPipelineForImage2Image pipeline = AutoPipelineForImage2Image.from_pretrained( "nitrosocke/elden-ring-diffusion", torch_dtype=torch.float16, -).to("cuda") +) pipeline.enable_model_cpu_offload() # remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed pipeline.enable_xformers_memory_efficient_attention() diff --git a/docs/source/en/using-diffusers/kandinsky.md b/docs/source/en/using-diffusers/kandinsky.md index 4ca544270766..a7c15eaf1e6e 100644 --- a/docs/source/en/using-diffusers/kandinsky.md +++ b/docs/source/en/using-diffusers/kandinsky.md @@ -1,3 +1,15 @@ + + # Kandinsky [[open-in-colab]] @@ -91,7 +103,7 @@ Use the [`AutoPipelineForText2Image`] to automatically call the combined pipelin from diffusers import AutoPipelineForText2Image import torch -pipeline = AutoPipelineForText2Image.from_pretrained("kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16).to("cuda") +pipeline = AutoPipelineForText2Image.from_pretrained("kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16) pipeline.enable_model_cpu_offload() prompt = "A alien cheeseburger creature eating itself, claymation, cinematic, moody lighting" @@ -107,7 +119,7 @@ image = pipeline(prompt=prompt, negative_prompt=negative_prompt, prior_guidance_ from diffusers import AutoPipelineForText2Image import torch -pipeline = AutoPipelineForText2Image.from_pretrained("kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16).to("cuda") +pipeline = AutoPipelineForText2Image.from_pretrained("kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16) pipeline.enable_model_cpu_offload() prompt = "A alien cheeseburger creature eating itself, claymation, cinematic, moody lighting" @@ -217,14 +229,14 @@ from io import BytesIO from PIL import Image import os -pipeline = AutoPipelineForImage2Image.from_pretrained("kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16, use_safetensors=True).to("cuda") +pipeline = AutoPipelineForImage2Image.from_pretrained("kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16, use_safetensors=True) pipeline.enable_model_cpu_offload() prompt = "A fantasy landscape, Cinematic lighting" negative_prompt = "low quality, bad quality" url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg" - + response = requests.get(url) original_image = Image.open(BytesIO(response.content)).convert("RGB") original_image.thumbnail((768, 768)) @@ -243,14 +255,14 @@ from io import BytesIO from PIL import Image import os -pipeline = AutoPipelineForImage2Image.from_pretrained("kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16).to("cuda") +pipeline = AutoPipelineForImage2Image.from_pretrained("kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16) pipeline.enable_model_cpu_offload() prompt = "A fantasy landscape, Cinematic lighting" negative_prompt = "low quality, bad quality" url = 
"https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg" - + response = requests.get(url) original_image = Image.open(BytesIO(response.content)).convert("RGB") original_image.thumbnail((768, 768)) diff --git a/docs/source/en/using-diffusers/lcm.md b/docs/source/en/using-diffusers/lcm.md index 39bc2426a92b..f793569cb8fc 100644 --- a/docs/source/en/using-diffusers/lcm.md +++ b/docs/source/en/using-diffusers/lcm.md @@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License. # Performing inference with LCM -Latent Consistency Models (LCM) enable quality image generation in typically 2-4 steps making it possible to use diffusion models in almost real-time settings. +Latent Consistency Models (LCM) enable quality image generation in typically 2-4 steps making it possible to use diffusion models in almost real-time settings. From the [official website](https://latent-consistency-models.github.io/): @@ -59,7 +59,7 @@ Some details to keep in mind: ## Image-to-image -The findings above apply to image-to-image tasks too. Let's look at how we can perform image-to-image generation with LCMs: +The findings above apply to image-to-image tasks too. Let's look at how we can perform image-to-image generation with LCMs: ```python from diffusers import AutoPipelineForImage2Image, UNet2DConditionModel, LCMScheduler @@ -96,8 +96,8 @@ image = pipe( It is possible to generalize the LCM framework to use with [LoRA](../training/lora.md). It effectively eliminates the need to conduct expensive fine-tuning runs as LoRA training concerns just a few number of parameters compared to full fine-tuning. During inference, the [`LCMScheduler`] comes to the advantage as it enables very few-steps inference without compromising the quality. -We recommend to disable `guidance_scale` by setting it 0. The model is trained to follow prompts accurately -even without using guidance scale. You can however, still use guidance scale in which case we recommend +We recommend to disable `guidance_scale` by setting it 0. The model is trained to follow prompts accurately +even without using guidance scale. You can however, still use guidance scale in which case we recommend using values between 1.0 and 2.0. ### Text-to-image diff --git a/docs/source/en/using-diffusers/loading.md b/docs/source/en/using-diffusers/loading.md index 57348e849e6b..d9e19a5bdd2a 100644 --- a/docs/source/en/using-diffusers/loading.md +++ b/docs/source/en/using-diffusers/loading.md @@ -232,7 +232,7 @@ TODO(Patrick) - Make sure to uncomment this part as soon as things are deprecate #### Using `revision` to load pipeline variants is deprecated -Previously the `revision` argument of [`DiffusionPipeline.from_pretrained`] was heavily used to +Previously the `revision` argument of [`DiffusionPipeline.from_pretrained`] was heavily used to load model variants, e.g.: ```python @@ -247,7 +247,7 @@ The above example is therefore deprecated and won't be supported anymore for `di -If you load diffusers pipelines or models with `revision="fp16"` or `revision="non_ema"`, +If you load diffusers pipelines or models with `revision="fp16"` or `revision="non_ema"`, please make sure to update the code and use `variant="fp16"` or `variation="non_ema"` respectively instead. 
diff --git a/docs/source/en/using-diffusers/loading_adapters.md b/docs/source/en/using-diffusers/loading_adapters.md index 8f6bf85da318..e73e042bd4d5 100644 --- a/docs/source/en/using-diffusers/loading_adapters.md +++ b/docs/source/en/using-diffusers/loading_adapters.md @@ -189,7 +189,7 @@ pipeline = StableDiffusionXLPipeline.from_pretrained( ).to("cuda") ``` -Next, load the LoRA checkpoint and fuse it with the original weights. The `lora_scale` parameter controls how much to scale the output by with the LoRA weights. It is important to make the `lora_scale` adjustments in the [`~loaders.LoraLoaderMixin.fuse_lora`] method because it won't work if you try to pass `scale` to the `cross_attention_kwargs` in the pipeline. +Next, load the LoRA checkpoint and fuse it with the original weights. The `lora_scale` parameter controls how much to scale the output by with the LoRA weights. It is important to make the `lora_scale` adjustments in the [`~loaders.LoraLoaderMixin.fuse_lora`] method because it won't work if you try to pass `scale` to the `cross_attention_kwargs` in the pipeline. If you need to reset the original model weights for any reason (use a different `lora_scale`), you should use the [`~loaders.LoraLoaderMixin.unfuse_lora`] method. diff --git a/docs/source/en/using-diffusers/other-formats.md b/docs/source/en/using-diffusers/other-formats.md index 84945a6da87a..6f8e00d1e396 100644 --- a/docs/source/en/using-diffusers/other-formats.md +++ b/docs/source/en/using-diffusers/other-formats.md @@ -34,11 +34,11 @@ There are two options for converting a `.ckpt` file: use a Space to convert the The easiest and most convenient way to convert a `.ckpt` file is to use the [SD to Diffusers](https://huggingface.co/spaces/diffusers/sd-to-diffusers) Space. You can follow the instructions on the Space to convert the `.ckpt` file. -This approach works well for basic models, but it may struggle with more customized models. You'll know the Space failed if it returns an empty pull request or error. In this case, you can try converting the `.ckpt` file with a script. +This approach works well for basic models, but it may struggle with more customized models. You'll know the Space failed if it returns an empty pull request or error. In this case, you can try converting the `.ckpt` file with a script. ### Convert with a script -🤗 Diffusers provides a [conversion script](https://github.com/huggingface/diffusers/blob/main/scripts/convert_original_stable_diffusion_to_diffusers.py) for converting `.ckpt` files. This approach is more reliable than the Space above. +🤗 Diffusers provides a [conversion script](https://github.com/huggingface/diffusers/blob/main/scripts/convert_original_stable_diffusion_to_diffusers.py) for converting `.ckpt` files. This approach is more reliable than the Space above. Before you start, make sure you have a local clone of 🤗 Diffusers to run the script and log in to your Hugging Face account so you can open pull requests and push your converted model to the Hub. @@ -86,11 +86,11 @@ git push origin pr/13:refs/pr/13 -🧪 This is an experimental feature. Only Stable Diffusion v1 checkpoints are supported by the Convert KerasCV Space at the moment. +🧪 This is an experimental feature. Only Stable Diffusion v1 checkpoints are supported by the Convert KerasCV Space at the moment. -[KerasCV](https://keras.io/keras_cv/) supports training for [Stable Diffusion](https://github.com/keras-team/keras-cv/blob/master/keras_cv/models/stable_diffusion) v1 and v2. 
However, it offers limited support for experimenting with Stable Diffusion models for inference and deployment whereas 🤗 Diffusers has a more complete set of features for this purpose, such as different [noise schedulers](https://huggingface.co/docs/diffusers/using-diffusers/schedulers), [flash attention](https://huggingface.co/docs/diffusers/optimization/xformers), and [other +[KerasCV](https://keras.io/keras_cv/) supports training for [Stable Diffusion](https://github.com/keras-team/keras-cv/blob/master/keras_cv/models/stable_diffusion) v1 and v2. However, it offers limited support for experimenting with Stable Diffusion models for inference and deployment whereas 🤗 Diffusers has a more complete set of features for this purpose, such as different [noise schedulers](https://huggingface.co/docs/diffusers/using-diffusers/schedulers), [flash attention](https://huggingface.co/docs/diffusers/optimization/xformers), and [other optimization techniques](https://huggingface.co/docs/diffusers/optimization/fp16). The [Convert KerasCV](https://huggingface.co/spaces/sayakpaul/convert-kerascv-sd-diffusers) Space converts `.pb` or `.h5` files to PyTorch, and then wraps them in a [`StableDiffusionPipeline`] so it is ready for inference. The converted checkpoint is stored in a repository on the Hugging Face Hub. diff --git a/docs/source/en/using-diffusers/schedulers.md b/docs/source/en/using-diffusers/schedulers.md index 9a8dd29ec2ea..6b5d8da465d8 100644 --- a/docs/source/en/using-diffusers/schedulers.md +++ b/docs/source/en/using-diffusers/schedulers.md @@ -14,10 +14,10 @@ specific language governing permissions and limitations under the License. [[open-in-colab]] -Diffusion pipelines are inherently a collection of diffusion models and schedulers that are partly independent from each other. This means that one is able to switch out parts of the pipeline to better customize +Diffusion pipelines are inherently a collection of diffusion models and schedulers that are partly independent from each other. This means that one is able to switch out parts of the pipeline to better customize a pipeline to one's use case. The best example of this is the [Schedulers](../api/schedulers/overview). -Whereas diffusion models usually simply define the forward pass from noise to a less noisy sample, +Whereas diffusion models usually simply define the forward pass from noise to a less noisy sample, schedulers define the whole denoising process, *i.e.*: - How many denoising steps? - Stochastic or deterministic? @@ -77,7 +77,7 @@ PNDMScheduler { } ``` -We can see that the scheduler is of type [`PNDMScheduler`]. +We can see that the scheduler is of type [`PNDMScheduler`]. Cool, now let's compare the scheduler in its performance to other schedulers. First we define a prompt on which we will test all the different schedulers: @@ -102,7 +102,7 @@ image ## Changing the scheduler -Now we show how easy it is to change the scheduler of a pipeline. Every scheduler has a property [`~SchedulerMixin.compatibles`] +Now we show how easy it is to change the scheduler of a pipeline. Every scheduler has a property [`~SchedulerMixin.compatibles`] which defines all compatible schedulers. You can take a look at all available, compatible schedulers for the Stable Diffusion pipeline as follows. ```python @@ -127,7 +127,7 @@ pipeline.scheduler.compatibles diffusers.schedulers.scheduling_k_dpm_2_ancestral_discrete.KDPM2AncestralDiscreteScheduler] ``` -Cool, lots of schedulers to look at. 
Feel free to have a look at their respective class definitions: +Cool, lots of schedulers to look at. Feel free to have a look at their respective class definitions: - [`EulerDiscreteScheduler`], - [`LMSDiscreteScheduler`], @@ -143,7 +143,7 @@ Cool, lots of schedulers to look at. Feel free to have a look at their respectiv - [`DPMSolverSinglestepScheduler`], - [`KDPM2AncestralDiscreteScheduler`]. -We will now compare the input prompt with all other schedulers. To change the scheduler of the pipeline you can make use of the +We will now compare the input prompt with all other schedulers. To change the scheduler of the pipeline you can make use of the convenient [`~ConfigMixin.config`] property in combination with the [`~ConfigMixin.from_config`] function. ```python @@ -171,7 +171,7 @@ FrozenDict([('num_train_timesteps', 1000), ``` This configuration can then be used to instantiate a scheduler -of a different class that is compatible with the pipeline. Here, +of a different class that is compatible with the pipeline. Here, we change the scheduler to the [`DDIMScheduler`]. ```python @@ -198,7 +198,7 @@ If you are a JAX/Flax user, please check [this section](#changing-the-scheduler- ## Compare schedulers -So far we have tried running the stable diffusion pipeline with two schedulers: [`PNDMScheduler`] and [`DDIMScheduler`]. +So far we have tried running the stable diffusion pipeline with two schedulers: [`PNDMScheduler`] and [`DDIMScheduler`]. A number of better schedulers have been released that can be run with much fewer steps; let's compare them here: [`LMSDiscreteScheduler`] usually leads to better results: diff --git a/docs/source/en/using-diffusers/textual_inversion_inference.md b/docs/source/en/using-diffusers/textual_inversion_inference.md index 7583dee63e3b..084101c06ba3 100644 --- a/docs/source/en/using-diffusers/textual_inversion_inference.md +++ b/docs/source/en/using-diffusers/textual_inversion_inference.md @@ -95,7 +95,7 @@ state_dict ``` There are two tensors, `"clip_g"` and `"clip_l"`. -`"clip_g"` corresponds to the bigger text encoder in SDXL and refers to +`"clip_g"` corresponds to the bigger text encoder in SDXL and refers to `pipe.text_encoder_2` and `"clip_l"` refers to `pipe.text_encoder`. Now you can load each tensor separately by passing them along with the correct text encoder and tokenizer diff --git a/docs/source/en/using-diffusers/unconditional_image_generation.md b/docs/source/en/using-diffusers/unconditional_image_generation.md index c055bc75c5a4..1983f6981e8f 100644 --- a/docs/source/en/using-diffusers/unconditional_image_generation.md +++ b/docs/source/en/using-diffusers/unconditional_image_generation.md @@ -35,7 +35,7 @@ from diffusers import DiffusionPipeline generator = DiffusionPipeline.from_pretrained("anton-l/ddpm-butterflies-128", use_safetensors=True) ``` -The [`DiffusionPipeline`] downloads and caches all modeling, tokenization, and scheduling components. +The [`DiffusionPipeline`] downloads and caches all modeling, tokenization, and scheduling components. Because the model consists of roughly 1.4 billion parameters, we strongly recommend running it on a GPU. 
You can move the generator object to a GPU, just like you would in PyTorch: diff --git a/docs/source/en/using-diffusers/weighted_prompts.md b/docs/source/en/using-diffusers/weighted_prompts.md index 5007d235ae99..947d18b86ec8 100644 --- a/docs/source/en/using-diffusers/weighted_prompts.md +++ b/docs/source/en/using-diffusers/weighted_prompts.md @@ -142,7 +142,7 @@ image ## Conjunction A conjunction diffuses each prompt independently and concatenates their results by their weighted sum. Add `.and()` to the end of a list of prompts to create a conjunction: - + ```py prompt_embeds = compel_proc('["a red cat", "playing with a", "ball"].and()') generator = torch.Generator(device="cuda").manual_seed(55) diff --git a/docs/source/en/using-diffusers/write_own_pipeline.md b/docs/source/en/using-diffusers/write_own_pipeline.md index 38fc9e6457dd..d3061e36fe63 100644 --- a/docs/source/en/using-diffusers/write_own_pipeline.md +++ b/docs/source/en/using-diffusers/write_own_pipeline.md @@ -14,7 +14,7 @@ specific language governing permissions and limitations under the License. [[open-in-colab]] -🧨 Diffusers is designed to be a user-friendly and flexible toolbox for building diffusion systems tailored to your use-case. At the core of the toolbox are models and schedulers. While the [`DiffusionPipeline`] bundles these components together for convenience, you can also unbundle the pipeline and use the models and schedulers separately to create new diffusion systems. +🧨 Diffusers is designed to be a user-friendly and flexible toolbox for building diffusion systems tailored to your use-case. At the core of the toolbox are models and schedulers. While the [`DiffusionPipeline`] bundles these components together for convenience, you can also unbundle the pipeline and use the models and schedulers separately to create new diffusion systems. In this tutorial, you'll learn how to use models and schedulers to assemble a diffusion system for inference, starting with a basic pipeline and then progressing to the Stable Diffusion pipeline. @@ -36,7 +36,7 @@ A pipeline is a quick and easy way to run a model for inference, requiring no mo That was super easy, but how did the pipeline do that? Let's breakdown the pipeline and take a look at what's happening under the hood. -In the example above, the pipeline contains a [`UNet2DModel`] model and a [`DDPMScheduler`]. The pipeline denoises an image by taking random noise the size of the desired output and passing it through the model several times. At each timestep, the model predicts the *noise residual* and the scheduler uses it to predict a less noisy image. The pipeline repeats this process until it reaches the end of the specified number of inference steps. +In the example above, the pipeline contains a [`UNet2DModel`] model and a [`DDPMScheduler`]. The pipeline denoises an image by taking random noise the size of the desired output and passing it through the model several times. At each timestep, the model predicts the *noise residual* and the scheduler uses it to predict a less noisy image. The pipeline repeats this process until it reaches the end of the specified number of inference steps. To recreate the pipeline with the model and scheduler separately, let's write our own denoising process. @@ -153,7 +153,7 @@ To speed up inference, move the models to a GPU since, unlike the scheduler, the ### Create text embeddings -The next step is to tokenize the text to generate embeddings. 
The text is used to condition the UNet model and steer the diffusion process towards something that resembles the input prompt. +The next step is to tokenize the text to generate embeddings. The text is used to condition the UNet model and steer the diffusion process towards something that resembles the input prompt. @@ -284,7 +284,7 @@ Lastly, convert the image to a `PIL.Image` to see your generated image! ## Next steps -From basic to complex pipelines, you've seen that all you really need to write your own diffusion system is a denoising loop. The loop should set the scheduler's timesteps, iterate over them, and alternate between calling the UNet model to predict the noise residual and passing it to the scheduler to compute the previous noisy sample. +From basic to complex pipelines, you've seen that all you really need to write your own diffusion system is a denoising loop. The loop should set the scheduler's timesteps, iterate over them, and alternate between calling the UNet model to predict the noise residual and passing it to the scheduler to compute the previous noisy sample. This is really what 🧨 Diffusers is designed for: to make it intuitive and easy to write your own diffusion system using models and schedulers.
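
For reference, a minimal sketch of the device-placement pattern these edits apply throughout the docs: `enable_model_cpu_offload()` manages moving each sub-model to the GPU on demand, so the explicit `.to("cuda")` call is dropped. The checkpoint, dtype, and prompt below are borrowed from the snippets above; treat this as an illustrative example of the pattern rather than an exact excerpt of any one page.

```py
import torch
from diffusers import AutoPipelineForText2Image

# Load in half precision; note there is no .to("cuda") here.
pipeline = AutoPipelineForText2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, use_safetensors=True
)

# Offloading handles device placement: each component is moved to the GPU
# only while it runs, then returned to the CPU to save memory.
pipeline.enable_model_cpu_offload()

image = pipeline(
    "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
).images[0]
```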