Added LOWVRAM env variable to text-to-image #36
Conversation
Thanks for looking into this! Left some comments.
@@ -94,6 +94,8 @@ def __init__(self, model_id: str):
         self.ldm = AutoPipelineForText2Image.from_pretrained(model_id, **kwargs).to(
             torch_device
         )
+        if os.environ.get("LOWVRAM"):
+            self.ldm.enable_sequential_cpu_offload()
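One detail worth noting about this gating style: `os.environ.get("LOWVRAM")` is truthy for any non-empty string, so `LOWVRAM=0` or `LOWVRAM=false` would still enable offloading. A minimal sketch of a stricter parse, using a hypothetical helper name (`lowvram_enabled` is not in the PR):

```python
import os


def lowvram_enabled(env=os.environ) -> bool:
    """Hypothetical helper: only explicit truthy values enable LOWVRAM.

    A bare os.environ.get("LOWVRAM") check treats any non-empty string
    (including "0" and "false") as enabling offloading; this restricts
    the flag to a conventional set of truthy spellings.
    """
    return env.get("LOWVRAM", "").strip().lower() in {"1", "true", "yes", "on"}
```

Whether the stricter parse is wanted is a design choice; the simpler truthiness check matches how the PR currently reads the flag.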
@Titan-Node Have you tried model offloading instead of sequential CPU offloading? Would be curious how the VRAM savings + inference speed results compare given that model offloading is supposed to have less of an impact on inference speed.
Interestingly, with model offloading it ended up increasing the GPU memory requirements. I tried it with torch_device set to both GPU and CPU, but model offloading either had no effect or increased GPU RAM while also slowing down inference by 4x.
I also tried model offloading with the text, image, and video pipelines with the same results; I gave up on it once I saw the sequential offloading option. I tried both together, but I believe it threw an error.
Here is a breakdown of some tests I did on an H100.
benchmarks.xlsx
@@ -94,6 +94,8 @@ def __init__(self, model_id: str):
         self.ldm = AutoPipelineForText2Image.from_pretrained(model_id, **kwargs).to(
             torch_device
         )
+        if os.environ.get("LOWVRAM"):
L94 will move the pipeline to the GPU when cuda is available (torch_device gets set to cuda) via the .to() call. Based on the diffusers docs, it looks like you should not move the pipeline to the GPU first if either CPU or model offloading is used. So, if LOWVRAM is enabled you'd probably want to do something like this:
self.ldm = AutoPipelineForText2Image.from_pretrained(model_id, **kwargs)
if os.environ.get("LOWVRAM"):
# Enable CPU or model offloading
else:
self.ldm.to(torch_device)
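One way to make that sketch concrete while keeping it testable is to factor the branching into a small function. This is a hypothetical refactor, not code from the PR; the offloading call shown is diffusers' documented `enable_sequential_cpu_offload()`:

```python
import os


def configure_pipeline(pipeline, torch_device: str, env=os.environ) -> str:
    """Hypothetical wiring of the reviewer's sketch.

    When LOWVRAM is set, the offloading hook manages device placement
    itself, so the pipeline must NOT be moved to the GPU with .to()
    first. Returns the chosen mode so callers (and tests) can inspect it.
    """
    if env.get("LOWVRAM"):
        # Sequential CPU offloading: weights are moved to the GPU one
        # submodule at a time as they are needed during inference.
        pipeline.enable_sequential_cpu_offload()
        return "sequential_cpu_offload"
    pipeline.to(torch_device)
    return "to_device"
```

Because the device logic is isolated, it can be exercised with a stub pipeline object, without loading model weights.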
There doesn't seem to be a performance or RAM difference when putting the .to() call before it, but if the docs say not to then I will make that modification.
I'm also running into issues when using the SFAST flag together with the LOWVRAM flag, although it seems the SFAST flag is not working anymore by default.
Getting:
File "/root/.pyenv/versions/3.11.8/lib/python3.11/site-packages/sfast/jit/overrides.py", line 21, in __torch_function__
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
NotImplementedError: Cannot copy out of meta tensor; no data!
Either way, I'm going to do some more testing to make sure the LOWVRAM flag is put in the correct place and does not affect the SFAST flag. I'll close this for now and do everything in a single commit.
Ah I wonder if stable-fast only works if the entire pipeline is loaded to the CUDA device.
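That reading is consistent with the traceback: sequential CPU offloading keeps weights on the "meta" device (no backing data) until each layer is needed, so a compiler that tries to copy tensors at trace time has nothing to copy. A hedged sketch of a startup guard that fails fast instead of crashing mid-compile; `check_flag_compatibility` is a hypothetical name, not part of this repo:

```python
import os


def check_flag_compatibility(env=os.environ) -> None:
    """Hypothetical startup guard for mutually exclusive flags.

    If sequential CPU offloading (LOWVRAM) parks weights on the meta
    device, a tracing compiler enabled via SFAST has no tensor data to
    copy, which would surface later as the NotImplementedError above.
    """
    if env.get("LOWVRAM") and env.get("SFAST"):
        raise RuntimeError(
            "LOWVRAM (sequential CPU offload) and SFAST (stable-fast) "
            "cannot be enabled together: offloaded weights live on the "
            "meta device and cannot be traced."
        )
```

Raising at startup makes the incompatibility explicit to operators rather than relying on a cryptic error deep inside the compile path.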
Enables text-to-image sequential CPU offloading: roughly 1.2x savings of VRAM with 7.6x longer inference time.