Add option for lower VRAM #31
Enable sliced attention computation using `less_vram=True`.
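For context, attention slicing avoids materializing the full attention map for every head at once by processing a few heads at a time, trading a little speed for a lower peak memory footprint. A minimal numpy sketch of the idea (the shapes and the `slice_size` parameter here are illustrative, not this PR's actual kernel):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_full(q, k, v):
    # Materializes the full (heads, tokens, tokens) score array at once.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def attention_sliced(q, k, v, slice_size=2):
    # Processes `slice_size` heads at a time, so the largest temporary
    # is (slice_size, tokens, tokens) instead of (heads, tokens, tokens).
    out = np.empty_like(q)
    for start in range(0, q.shape[0], slice_size):
        s = slice(start, start + slice_size)
        scores = q[s] @ k[s].transpose(0, 2, 1) / np.sqrt(q.shape[-1])
        out[s] = softmax(scores) @ v[s]
    return out
```

Both paths return identical results; only the size of the largest temporary `scores` array changes, which is why the sliced version should show a lower peak VRAM at roughly the same speed.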
Thank you so much for your contribution!! I had @anton-l from diffusers glance at this, and they said it's good to merge. I'd still like to do a sanity check myself by cloning and running this, though - so whenever I get the chance to do that (probably later today) I'll go ahead and merge if there are no issues.
Thanks again! 🚀
I ran some tests and the code runs just fine. But I'm a little confused here... I seem to be getting just about the same speed per frame with both options, but the memory usage is actually higher for `less_vram=True`.
Cheers, and thanks for reviewing the PR! The effect of attention slicing seems to persist within the same kernel. The tests below use a 960x512 image size.

- `less_vram=False` in a new kernel (baseline).
- `less_vram=True` after running `less_vram=False` once: same as above.
- `less_vram=True` in a new kernel: uses less VRAM compared to `less_vram=False` in a new kernel.
- `less_vram=False` after running `less_vram=True` once: uses slightly more VRAM than above, but less than `less_vram=False` in a new kernel.
Based on the above, I expected that using `less_vram=True` would always reduce VRAM usage, so I ran a few more combinations.

It turned out to be again different from what I expected. It seems that in the same kernel session, as soon as the diffuser runs without attention slicing just once, later runs with `less_vram=True` no longer reduce VRAM usage. Do you know why this happens? Is this behavior intended? FYI @anton-l

1) `less_vram=False` in a new kernel (baseline).
2) Run `less_vram=False` first, then `less_vram=True` (`less_vram=False` --> `less_vram=True`): I expected this to be lower than 1), but it ended up using the same VRAM.
3) `less_vram=True` in a new kernel: this is lower than 1), as I expected.
4) `less_vram=True` --> `less_vram=False`: I expected this to be the same as 1), but it ended up higher.
5) `less_vram=True` --> `less_vram=False` --> `less_vram=True`: I expected this to be the same as 3), but it ended up higher.
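One plausible explanation (my assumption, not something verified in this thread): PyTorch's CUDA caching allocator keeps freed blocks cached for reuse rather than returning them to the device, so the VRAM a session holds tends to stay at the high-water mark of its largest allocation, and tools like `nvidia-smi` report that reserved figure. A toy model of that behavior (the `MockAllocator` class is hypothetical, not a PyTorch API):

```python
class MockAllocator:
    """Toy model of a caching allocator: freed blocks go back to an
    internal cache, so bytes reserved from the device only ever grow
    within a session."""

    def __init__(self):
        self.allocated = 0  # bytes currently in use by tensors
        self.reserved = 0   # bytes held from the device (high-water mark)

    def alloc(self, size):
        self.allocated += size
        # Only request more from the device if the cache can't cover it.
        self.reserved = max(self.reserved, self.allocated)

    def free(self, size):
        # Blocks return to the cache, not to the device.
        self.allocated -= size

alloc = MockAllocator()
alloc.alloc(10000)   # big un-sliced attention map
alloc.free(10000)
alloc.alloc(2000)    # later, smaller sliced run
print(alloc.allocated, alloc.reserved)  # prints: 2000 10000
```

If this is the cause, comparing `torch.cuda.memory_allocated()` with `torch.cuda.memory_reserved()` (and calling `torch.cuda.empty_cache()` between runs) should tell the two effects apart: allocated bytes would drop after the sliced run even while reserved bytes stay at the peak.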
Tysm for doing additional experiments! It seems #28 or #25 would make this a bit less of an issue (because it could be set in init). Will address those issues later. As for this feature, we'll leave it out of the app for now so folks don't come asking why it's not working as expected, haha. Thanks again 🚀
Awesome! Rebased ⭐️
Enable sliced attention computation using `less_vram=True`. `enable_attention_slicing()` and `disable_attention_slicing()` are copied from the latest diffusers library.
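In diffusers, these methods essentially toggle a `slice_size` setting that the attention blocks consult at run time, with `"auto"` picking a size that roughly halves the work per slice. A simplified sketch of that toggle pattern (a toy class, and the exact `"auto"` heuristic here is an assumption, not the library's implementation):

```python
class AttentionModule:
    """Toy module with a diffusers-style attention-slicing toggle."""

    def __init__(self, num_heads=8):
        self.num_heads = num_heads
        self.slice_size = None  # None means "compute all heads at once"

    def enable_attention_slicing(self, slice_size="auto"):
        # Assumed "auto" heuristic: half the heads per slice.
        self.slice_size = self.num_heads // 2 if slice_size == "auto" else slice_size

    def disable_attention_slicing(self):
        self.slice_size = None

    def head_batches(self):
        # Sizes of the per-slice attention maps a forward pass would
        # materialize; peak memory scales with the largest batch.
        step = self.slice_size or self.num_heads
        return [min(step, self.num_heads - i) for i in range(0, self.num_heads, step)]

m = AttentionModule(num_heads=8)
print(m.head_batches())       # [8]    -> one full-size attention map
m.enable_attention_slicing()  # "auto" -> 4 heads per slice
print(m.head_batches())       # [4, 4] -> two half-size maps
m.disable_attention_slicing()
print(m.head_batches())       # [8]
```

Because the flag lives on the module rather than being passed per call, it could also be set once at init time, which is what makes #28/#25 relevant to this PR's `less_vram` argument.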