From 74eb26b73fcad17ae06271734e6909c4b8ba6607 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?M=2E=20Tolga=20Cang=C3=B6z?= <46008593+standardAI@users.noreply.github.com>
Date: Mon, 20 Mar 2023 12:56:18 +0300
Subject: [PATCH] Update fp16.mdx

Fix typos
---
 docs/source/en/optimization/fp16.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/en/optimization/fp16.mdx b/docs/source/en/optimization/fp16.mdx
index eef1dcec90f5..c18cefbde6a9 100644
--- a/docs/source/en/optimization/fp16.mdx
+++ b/docs/source/en/optimization/fp16.mdx
@@ -221,7 +221,7 @@ image = pipe(prompt).images[0]
 
 Full-model offloading is an alternative that moves whole models to the GPU, instead of handling each model's constituent _modules_. This results in a negligible impact on inference time (compared with moving the pipeline to `cuda`), while still providing some memory savings.
 In this scenario, only one of the main components of the pipeline (typically: text encoder, unet and vae)
-will be in the GPU while the others wait in the CPU. Compoments like the UNet that run for multiple iterations will stay on GPU until they are no longer needed.
+will be in the GPU while the others wait in the CPU. Components like the UNet that run for multiple iterations will stay on GPU until they are no longer needed.
 
 This feature can be enabled by invoking `enable_model_cpu_offload()` on the pipeline, as shown below.
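
For context, the paragraph touched by this patch refers to an example "shown below" in fp16.mdx that is not part of the diff. A minimal sketch of that usage, not taken from the patch itself: it assumes the `runwayml/stable-diffusion-v1-5` checkpoint and recent `diffusers`/`accelerate` versions in which `StableDiffusionPipeline.enable_model_cpu_offload()` is available.

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the pipeline in half precision; model CPU offloading then moves whole
# components (text encoder, UNet, VAE) to the GPU only while they are in use.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
```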