From 28ef949cf648176ec1b7b9659b59571c150c8169 Mon Sep 17 00:00:00 2001
From: sayakpaul
Date: Mon, 24 Jun 2024 11:19:32 +0530
Subject: [PATCH 1/3] add note on caching in fast diffusion

---
 docs/source/en/tutorials/fast_diffusion.md | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/docs/source/en/tutorials/fast_diffusion.md b/docs/source/en/tutorials/fast_diffusion.md
index f827d118ca2f..cc1f58930318 100644
--- a/docs/source/en/tutorials/fast_diffusion.md
+++ b/docs/source/en/tutorials/fast_diffusion.md
@@ -34,13 +34,11 @@ Install [PyTorch nightly](https://pytorch.org/) to benefit from the latest and f
 pip3 install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu121
 ```
 
-<Tip>
-
-The results reported below are from a 80GB 400W A100 with its clock rate set to the maximum.
+> [!TIP]
+> The results reported below are from a 80GB 400W A100 with its clock rate set to the maximum.
 
-If you're interested in the full benchmarking code, take a look at [huggingface/diffusion-fast](https://github.com/huggingface/diffusion-fast).
+> If you're interested in the full benchmarking code, take a look at [huggingface/diffusion-fast](https://github.com/huggingface/diffusion-fast).
 
-</Tip>
 
 ## Baseline
 
@@ -170,6 +168,9 @@ Using SDPA attention and compiling both the UNet and VAE cuts the latency from 3
 
 
 
+> [!TIP]
+> Starting with PyTorch 2.3.1, you can control the caching behaviour of `torch.compile()`. This is particularly beneficial for compilation modes like `"max-autotune"` which performs a grid-search over several compilation flags to find the optimal configuration. Learn more about how to enable caching when using `torch.compile()` from [here](https://pytorch.org/tutorials/recipes/torch_compile_caching_tutorial.html).
+
 ### Prevent graph breaks
 
 Specifying `fullgraph=True` ensures there are no graph breaks in the underlying model to take full advantage of `torch.compile` without any performance degradation. For the UNet and VAE, this means changing how you access the return variables.

From 1b4c4d46143e8146e94d1ef2145e495458a65174 Mon Sep 17 00:00:00 2001
From: sayakpaul
Date: Mon, 24 Jun 2024 11:45:34 +0530
Subject: [PATCH 2/3] formatting

---
 docs/source/en/tutorials/fast_diffusion.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/docs/source/en/tutorials/fast_diffusion.md b/docs/source/en/tutorials/fast_diffusion.md
index cc1f58930318..db764aeeed6b 100644
--- a/docs/source/en/tutorials/fast_diffusion.md
+++ b/docs/source/en/tutorials/fast_diffusion.md
@@ -36,7 +36,6 @@ pip3 install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu121
 
 > [!TIP]
 > The results reported below are from a 80GB 400W A100 with its clock rate set to the maximum.
-
 > If you're interested in the full benchmarking code, take a look at [huggingface/diffusion-fast](https://github.com/huggingface/diffusion-fast).
 
 

From 3e3d102f201d7fa64cb180ddf7c6249f7a6a9c72 Mon Sep 17 00:00:00 2001
From: Sayak Paul
Date: Mon, 24 Jun 2024 22:27:13 +0530
Subject: [PATCH 3/3] Update docs/source/en/tutorials/fast_diffusion.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---
 docs/source/en/tutorials/fast_diffusion.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/en/tutorials/fast_diffusion.md b/docs/source/en/tutorials/fast_diffusion.md
index db764aeeed6b..266ea7fa236b 100644
--- a/docs/source/en/tutorials/fast_diffusion.md
+++ b/docs/source/en/tutorials/fast_diffusion.md
@@ -168,7 +168,7 @@ Using SDPA attention and compiling both the UNet and VAE cuts the latency from 3
 
 
 > [!TIP]
-> Starting with PyTorch 2.3.1, you can control the caching behaviour of `torch.compile()`. This is particularly beneficial for compilation modes like `"max-autotune"` which performs a grid-search over several compilation flags to find the optimal configuration. Learn more about how to enable caching when using `torch.compile()` from [here](https://pytorch.org/tutorials/recipes/torch_compile_caching_tutorial.html).
+> From PyTorch 2.3.1, you can control the caching behavior of `torch.compile()`. This is particularly beneficial for compilation modes like `"max-autotune"` which performs a grid-search over several compilation flags to find the optimal configuration. Learn more in the [Compile Time Caching in torch.compile](https://pytorch.org/tutorials/recipes/torch_compile_caching_tutorial.html) tutorial.
 
 ### Prevent graph breaks
 
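For reference, the workflow the new tip describes looks roughly like the sketch below. It is illustrative, not part of the patches above: the two `TORCHINDUCTOR_*` environment variables are the ones covered in the linked PyTorch caching tutorial, the cache path is arbitrary, and SDXL is used only as an example pipeline.

```python
import os

# Persist Inductor's compiled artifacts across process restarts.
# Both variables are covered in the linked PyTorch caching tutorial;
# they must be set before torch is imported. The cache path is arbitrary.
os.environ["TORCHINDUCTOR_FX_GRAPH_CACHE"] = "1"
os.environ["TORCHINDUCTOR_CACHE_DIR"] = "/tmp/inductor_cache"

import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# "max-autotune" grid-searches over several compilation flags, so it gains
# the most from a warm cache: later runs of the script skip most of the search.
pipe.unet = torch.compile(pipe.unet, mode="max-autotune", fullgraph=True)
pipe.vae.decode = torch.compile(pipe.vae.decode, mode="max-autotune", fullgraph=True)

# The first call compiles and populates the cache; subsequent process
# launches reuse the cached artifacts and start up much faster.
image = pipe("a photo of an astronaut riding a horse on mars").images[0]
```

The first cold-cache compilation is still expensive; caching pays off from the second process launch onward.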