From 28ef949cf648176ec1b7b9659b59571c150c8169 Mon Sep 17 00:00:00 2001
From: sayakpaul
Date: Mon, 24 Jun 2024 11:19:32 +0530
Subject: [PATCH 1/3] add note on caching in fast diffusion

---
 docs/source/en/tutorials/fast_diffusion.md | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/docs/source/en/tutorials/fast_diffusion.md b/docs/source/en/tutorials/fast_diffusion.md
index f827d118ca2f..cc1f58930318 100644
--- a/docs/source/en/tutorials/fast_diffusion.md
+++ b/docs/source/en/tutorials/fast_diffusion.md
@@ -34,13 +34,11 @@ Install [PyTorch nightly](https://pytorch.org/) to benefit from the latest and f
 pip3 install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu121
 ```
 
-<Tip>
-
-The results reported below are from a 80GB 400W A100 with its clock rate set to the maximum.
+> [!TIP]
+> The results reported below are from a 80GB 400W A100 with its clock rate set to the maximum.
 
-If you're interested in the full benchmarking code, take a look at [huggingface/diffusion-fast](https://github.com/huggingface/diffusion-fast).
+> If you're interested in the full benchmarking code, take a look at [huggingface/diffusion-fast](https://github.com/huggingface/diffusion-fast).
 
-</Tip>
 
 ## Baseline
 
@@ -170,6 +168,9 @@ Using SDPA attention and compiling both the UNet and VAE cuts the latency from 3
 
 
 
+> [!TIP]
+> Starting with PyTorch 2.3.1, you can control the caching behaviour of `torch.compile()`. This is particularly beneficial for compilation modes like `"max-autotune"` which performs a grid-search over several compilation flags to find the optimal configuration. Learn more about how to enable caching when using `torch.compile()` from [here](https://pytorch.org/tutorials/recipes/torch_compile_caching_tutorial.html).
+
 ### Prevent graph breaks
 
 Specifying `fullgraph=True` ensures there are no graph breaks in the underlying model to take full advantage of `torch.compile` without any performance degradation. For the UNet and VAE, this means changing how you access the return variables.

From 1b4c4d46143e8146e94d1ef2145e495458a65174 Mon Sep 17 00:00:00 2001
From: sayakpaul
Date: Mon, 24 Jun 2024 11:45:34 +0530
Subject: [PATCH 2/3] formatting

---
 docs/source/en/tutorials/fast_diffusion.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/docs/source/en/tutorials/fast_diffusion.md b/docs/source/en/tutorials/fast_diffusion.md
index cc1f58930318..db764aeeed6b 100644
--- a/docs/source/en/tutorials/fast_diffusion.md
+++ b/docs/source/en/tutorials/fast_diffusion.md
@@ -36,7 +36,6 @@ pip3 install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu121
 
 > [!TIP]
 > The results reported below are from a 80GB 400W A100 with its clock rate set to the maximum.
-
 > If you're interested in the full benchmarking code, take a look at [huggingface/diffusion-fast](https://github.com/huggingface/diffusion-fast).
 
 

From 3e3d102f201d7fa64cb180ddf7c6249f7a6a9c72 Mon Sep 17 00:00:00 2001
From: Sayak Paul
Date: Mon, 24 Jun 2024 22:27:13 +0530
Subject: [PATCH 3/3] Update docs/source/en/tutorials/fast_diffusion.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---
 docs/source/en/tutorials/fast_diffusion.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/en/tutorials/fast_diffusion.md b/docs/source/en/tutorials/fast_diffusion.md
index db764aeeed6b..266ea7fa236b 100644
--- a/docs/source/en/tutorials/fast_diffusion.md
+++ b/docs/source/en/tutorials/fast_diffusion.md
@@ -168,7 +168,7 @@ Using SDPA attention and compiling both the UNet and VAE cuts the latency from 3
 
 
 > [!TIP]
-> Starting with PyTorch 2.3.1, you can control the caching behaviour of `torch.compile()`. This is particularly beneficial for compilation modes like `"max-autotune"` which performs a grid-search over several compilation flags to find the optimal configuration. Learn more about how to enable caching when using `torch.compile()` from [here](https://pytorch.org/tutorials/recipes/torch_compile_caching_tutorial.html).
+> From PyTorch 2.3.1, you can control the caching behavior of `torch.compile()`. This is particularly beneficial for compilation modes like `"max-autotune"` which performs a grid-search over several compilation flags to find the optimal configuration. Learn more in the [Compile Time Caching in torch.compile](https://pytorch.org/tutorials/recipes/torch_compile_caching_tutorial.html) tutorial.
 
 ### Prevent graph breaks
 
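For reference, the workflow the new tip describes looks roughly like the sketch below. It is illustrative, not part of the patches above: the two `TORCHINDUCTOR_*` environment variables are the ones covered in the linked PyTorch caching tutorial, the cache path is arbitrary, and SDXL is used only as an example pipeline.

```python
import os

# Persist Inductor's compiled artifacts across process restarts.
# Both variables are covered in the linked PyTorch caching tutorial;
# they must be set before torch is imported. The cache path is arbitrary.
os.environ["TORCHINDUCTOR_FX_GRAPH_CACHE"] = "1"
os.environ["TORCHINDUCTOR_CACHE_DIR"] = "/tmp/inductor_cache"

import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# "max-autotune" grid-searches over several compilation flags, so it gains
# the most from a warm cache: later runs of the script skip most of the search.
pipe.unet = torch.compile(pipe.unet, mode="max-autotune", fullgraph=True)
pipe.vae.decode = torch.compile(pipe.vae.decode, mode="max-autotune", fullgraph=True)

# The first call compiles and populates the cache; subsequent process
# launches reuse the cached artifacts and start up much faster.
image = pipe("a photo of an astronaut riding a horse on mars").images[0]
```

The first cold-cache compilation is still expensive; caching pays off from the second process launch onward.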