diff --git a/docs/source/en/_toctree.yml b/docs/source/en/_toctree.yml
index 2381791a241b..1a0d8f5cd6c8 100644
--- a/docs/source/en/_toctree.yml
+++ b/docs/source/en/_toctree.yml
@@ -4,7 +4,7 @@
- local: quicktour
title: Quicktour
- local: stable_diffusion
- title: Stable Diffusion
+ title: Effective and efficient diffusion
- local: installation
title: Installation
title: Get started
diff --git a/docs/source/en/stable_diffusion.mdx b/docs/source/en/stable_diffusion.mdx
index c1eef6fa3c5c..eebe0ec660f2 100644
--- a/docs/source/en/stable_diffusion.mdx
+++ b/docs/source/en/stable_diffusion.mdx
@@ -1,333 +1,271 @@
-
-
-# The Stable Diffusion Guide 🎨
-
-
-
-
-## Intro
-
-Stable Diffusion is a [Latent Diffusion model](https://github.com/CompVis/latent-diffusion) developed by researchers from the Machine Vision and Learning group at LMU Munich, *a.k.a.* CompVis.
-Model checkpoints were publicly released at the end of August 2022 by a collaboration of Stability AI, CompVis, and Runway with support from EleutherAI and LAION. For more information, you can check out [the official blog post](https://stability.ai/blog/stable-diffusion-public-release).
-
-Since its public release, the community has done an incredible job at working together to make the Stable Diffusion checkpoints **faster**, **more memory efficient**, and **more performant**.
-
-🧨 Diffusers offers a simple API to run Stable Diffusion with all memory, computing, and quality improvements.
-
-This notebook walks you through the improvements one-by-one so you can best leverage [`StableDiffusionPipeline`] for **inference**.
-
-## Prompt Engineering 🎨
-
-When running *Stable Diffusion* in inference, we usually want to generate a certain type or style of image and then improve upon it. Improving upon a previously generated image means running inference over and over again with a different prompt and potentially a different seed until we are happy with our generation.
-
-So to begin with, it is most important to speed up Stable Diffusion so that we can generate as many pictures as possible in a given amount of time.
-
-This can be done by both improving the **computational efficiency** (speed) and the **memory efficiency** (GPU RAM).
-
-Let's start by looking into computational efficiency first.
-
-Throughout the notebook, we will focus on [runwayml/stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5):
-
-``` python
-model_id = "runwayml/stable-diffusion-v1-5"
-```
-
-Let's load the pipeline.
-
-## Speed Optimization
-
-``` python
-from diffusers import DiffusionPipeline
-
-pipe = DiffusionPipeline.from_pretrained(model_id)
-```
-
-We aim to generate a beautiful photograph of an *old warrior chief* and will later try to find the best prompt to generate such a photograph. For now, let's keep the prompt simple:
-
-``` python
-prompt = "portrait photo of a old warrior chief"
-```
-
-To begin with, we should make sure we run inference on a GPU, so let's move the pipeline to the GPU, just as you would with any PyTorch module.
-
-``` python
-pipe = pipe.to("cuda")
-```
-
-To generate an image, you should use the [`~StableDiffusionPipeline.__call__`] method.
-
-To make sure we can reproduce more or less the same image in every call, let's make use of the generator. See the documentation on reproducibility [here](./conceptual/reproducibility) for more information.
-
-``` python
-import torch
-
-generator = torch.Generator("cuda").manual_seed(0)
-```
-
-Now, let's give it a spin.
-
-``` python
-image = pipe(prompt, generator=generator).images[0]
-image
-```
-
-
-
-Cool, this now took roughly 30 seconds on a T4 GPU (you might see faster inference if your allocated GPU is better than a T4).
-
-The default run we did above used full float32 precision and ran the default number of inference steps (50). The easiest speed-ups come from switching to float16 (or half) precision and simply running fewer inference steps. Let's load the model now in float16 instead.
-
-``` python
-import torch
-
-pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
-pipe = pipe.to("cuda")
-```
-
-And we can again call the pipeline to generate an image.
-
-``` python
-generator = torch.Generator("cuda").manual_seed(0)
-
-image = pipe(prompt, generator=generator).images[0]
-image
-```
-
-
-Cool, this is almost three times as fast for arguably the same image quality.
-
-We strongly suggest always running your pipelines in float16, as so far we have very rarely seen any degradation in quality because of it.
-
-Next, let's see if we need to use 50 inference steps or whether we could use significantly fewer. The number of inference steps is associated with the denoising scheduler we use. Choosing a more efficient scheduler could help us decrease the number of steps.
-
-Let's have a look at all the schedulers the stable diffusion pipeline is compatible with.
-
-``` python
-pipe.scheduler.compatibles
-```
-
-```
- [diffusers.schedulers.scheduling_dpmsolver_singlestep.DPMSolverSinglestepScheduler,
- diffusers.schedulers.scheduling_lms_discrete.LMSDiscreteScheduler,
- diffusers.schedulers.scheduling_heun_discrete.HeunDiscreteScheduler,
- diffusers.schedulers.scheduling_pndm.PNDMScheduler,
- diffusers.schedulers.scheduling_euler_discrete.EulerDiscreteScheduler,
- diffusers.schedulers.scheduling_euler_ancestral_discrete.EulerAncestralDiscreteScheduler,
- diffusers.schedulers.scheduling_dpmsolver_multistep.DPMSolverMultistepScheduler,
- diffusers.schedulers.scheduling_ddpm.DDPMScheduler,
- diffusers.schedulers.scheduling_ddim.DDIMScheduler]
-```
-
-Cool, that's a lot of schedulers.
-
-🧨 Diffusers is constantly adding a bunch of novel schedulers/samplers that can be used with Stable Diffusion. For more information, we recommend taking a look at the official documentation [here](https://huggingface.co/docs/diffusers/main/en/api/schedulers/overview).
-
-Right now, Stable Diffusion uses the `PNDMScheduler`, which usually requires around 50 inference steps. However, other schedulers such as `DPMSolverMultistepScheduler` or `DPMSolverSinglestepScheduler` seem to get away with just 20 to 25 inference steps. Let's try them out.
-
-You can set a new scheduler by making use of the [from_config](https://huggingface.co/docs/diffusers/main/en/api/configuration#diffusers.ConfigMixin.from_config) function.
-
-``` python
-from diffusers import DPMSolverMultistepScheduler
-
-pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
-```
-
-Now, let's try to reduce the number of inference steps to just 20.
-
-``` python
-generator = torch.Generator("cuda").manual_seed(0)
-
-image = pipe(prompt, generator=generator, num_inference_steps=20).images[0]
-image
-```
-
-
-
-The image now looks a little different, but it's arguably still of equally high quality, and we have cut the inference time to just 4 seconds.
-
-## Memory Optimization
-
-Using memory more efficiently means we can generate more images per batch. To see how large a batch we can fit, let's define a helper that prepares a batch of inputs:
-
-``` python
-def get_inputs(batch_size=1):
-    generator = [torch.Generator("cuda").manual_seed(i) for i in range(batch_size)]
-    prompts = batch_size * [prompt]
-    num_inference_steps = 20
-
-    return {"prompt": prompts, "generator": generator, "num_inference_steps": num_inference_steps}
-```
-
-This function returns a list of prompts and a list of generators, so we can reuse the generator that produced a result we like.
-
-We also need a method that allows us to easily display a batch of images.
-
-``` python
-from PIL import Image
-
-def image_grid(imgs, rows=2, cols=2):
-    w, h = imgs[0].size
-    grid = Image.new('RGB', size=(cols*w, rows*h))
-
-    for i, img in enumerate(imgs):
-        grid.paste(img, box=(i%cols*w, i//cols*h))
-    return grid
-```
-
-Cool, let's see how much memory we can use starting with `batch_size=4`.
-
-``` python
-images = pipe(**get_inputs(batch_size=4)).images
-image_grid(images)
-```
-
-Going over a `batch_size` of 4 will error out in this notebook (assuming we are running it on a T4 GPU). Also, we can see we only generate slightly more images per second (3.75s/image) compared to 4s/image previously.
-
-However, the community has found some nice tricks to improve the memory constraints further. After Stable Diffusion was released, the community found improvements within days and shared them freely over GitHub - open-source at its finest! I believe the original idea came from [this](https://github.com/basujindal/stable-diffusion/pull/117) GitHub thread.
-
-By far most of the memory is taken up by the cross-attention layers. Instead of running this operation in batch, one can run it sequentially to save a significant amount of memory.
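-
-A minimal sketch of that idea, assuming the `pipe`, `get_inputs`, and `image_grid` objects defined above and a Diffusers version that provides `enable_attention_slicing()` (the batch size of 8 is illustrative and depends on your GPU):
-
-``` python
-# Compute cross-attention in slices instead of all at once to lower peak memory usage.
-pipe.enable_attention_slicing()
-
-# A larger batch should now fit on the same GPU, at a small cost in speed.
-images = pipe(**get_inputs(batch_size=8)).images
-image_grid(images, rows=2, cols=4)
-```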
+
+# Effective and efficient diffusion
+
+[[open-in-colab]]
+
+Getting the [`DiffusionPipeline`] to generate images in a certain style or include what you want can be tricky. Oftentimes, you have to run the [`DiffusionPipeline`] several times before you end up with an image you're happy with. But generating something out of nothing is a computationally intensive process, especially if you're running inference over and over again.
+
+This is why it's important to get the most *computational* (speed) and *memory* (GPU RAM) efficiency from the pipeline to reduce the time between inference cycles so you can iterate faster.
+
+This tutorial walks you through how to generate faster and better with the [`DiffusionPipeline`].
+
+Begin by loading the [`runwayml/stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5) model:
+
+```python
+from diffusers import DiffusionPipeline
+
+model_id = "runwayml/stable-diffusion-v1-5"
+pipeline = DiffusionPipeline.from_pretrained(model_id)
+```
+
+The example prompt you'll use is a portrait of an old warrior chief, but feel free to use your own prompt:
+
+```python
+prompt = "portrait photo of a old warrior chief"
+```
+
+## Speed
+
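+As a rough illustration of the kind of speed levers discussed in the guide removed above (assuming a CUDA GPU is available), a minimal sketch that moves the pipeline to the GPU, loads it in float16, and seeds a generator so results stay comparable between runs:
+
+```python
+import torch
+
+# Half precision roughly halves memory use and speeds up inference with little quality loss.
+pipeline = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
+pipeline = pipeline.to("cuda")
+
+# Seeding a generator makes the same prompt produce comparable images across runs.
+generator = torch.Generator("cuda").manual_seed(0)
+image = pipeline(prompt, generator=generator).images[0]
+```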