Add option for lower VRAM #31
Enable sliced attention computation using `less_vram=True`.
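For context, attention slicing avoids materializing the full attention map for every head at once by processing a few heads at a time, trading a little speed for a lower peak memory footprint. A minimal numpy sketch of the idea (the shapes and the `slice_size` parameter here are illustrative, not this PR's actual kernel):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_full(q, k, v):
    # Materializes the full (heads, tokens, tokens) score array at once.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def attention_sliced(q, k, v, slice_size=2):
    # Processes `slice_size` heads at a time, so the largest temporary
    # is (slice_size, tokens, tokens) instead of (heads, tokens, tokens).
    out = np.empty_like(q)
    for start in range(0, q.shape[0], slice_size):
        s = slice(start, start + slice_size)
        scores = q[s] @ k[s].transpose(0, 2, 1) / np.sqrt(q.shape[-1])
        out[s] = softmax(scores) @ v[s]
    return out
```

Both paths return identical results; only the size of the largest temporary `scores` array changes, which is why the sliced version should show a lower peak VRAM at roughly the same speed.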
Thank you so much for your contribution!! I had @anton-l from diffusers glance at this, and they said it's good to merge. I'd still like to do a sanity check myself by cloning and running this, though - so whenever I get the chance to do that (probably later today) I'll go ahead and merge if there are no issues.
Thanks again! 🚀
I ran some tests and the code runs just fine. But I'm a little confused here... I seem to be getting just about the same speed per frame with both options, but the memory usage is actually higher for `less_vram=True`.
Cheers, and thanks for reviewing the PR! The effect of attention slicing seems to persist within the same kernel. The tests below use a 960x512 image size.

- `less_vram=False` in a new kernel (baseline).
- `less_vram=True` after running `less_vram=False` once: same as above.
- `less_vram=True` in a new kernel: uses less VRAM compared to `less_vram=False` in a new kernel.
- `less_vram=False` after running `less_vram=True` once: uses slightly more VRAM than above, but less than `less_vram=False` in a new kernel.
Based on the above, I expected that using `less_vram=True` would always reduce VRAM usage, so I ran a few more combinations.

It turned out to be again different from what I expected. It seems that in the same kernel session, as soon as the diffuser runs without attention slicing just once, later runs with `less_vram=True` no longer reduce VRAM usage. Do you know why this happens? Is this behavior intended? FYI @anton-l

1) `less_vram=False` in a new kernel (baseline).
2) Run `less_vram=False` first, then `less_vram=True` (`less_vram=False` --> `less_vram=True`): I expected this to be lower than 1), but it ended up using the same VRAM.
3) `less_vram=True` in a new kernel: this is lower than 1), as I expected.
4) `less_vram=True` --> `less_vram=False`: I expected this to be the same as 1), but it ended up higher.
5) `less_vram=True` --> `less_vram=False` --> `less_vram=True`: I expected this to be the same as 3), but it ended up higher.
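One plausible explanation (my assumption, not something verified in this thread): PyTorch's CUDA caching allocator keeps freed blocks cached for reuse rather than returning them to the device, so the VRAM a session holds tends to stay at the high-water mark of its largest allocation, and tools like `nvidia-smi` report that reserved figure. A toy model of that behavior (the `MockAllocator` class is hypothetical, not a PyTorch API):

```python
class MockAllocator:
    """Toy model of a caching allocator: freed blocks go back to an
    internal cache, so bytes reserved from the device only ever grow
    within a session."""

    def __init__(self):
        self.allocated = 0  # bytes currently in use by tensors
        self.reserved = 0   # bytes held from the device (high-water mark)

    def alloc(self, size):
        self.allocated += size
        # Only request more from the device if the cache can't cover it.
        self.reserved = max(self.reserved, self.allocated)

    def free(self, size):
        # Blocks return to the cache, not to the device.
        self.allocated -= size

alloc = MockAllocator()
alloc.alloc(10000)   # big un-sliced attention map
alloc.free(10000)
alloc.alloc(2000)    # later, smaller sliced run
print(alloc.allocated, alloc.reserved)  # prints: 2000 10000
```

If this is the cause, comparing `torch.cuda.memory_allocated()` with `torch.cuda.memory_reserved()` (and calling `torch.cuda.empty_cache()` between runs) should tell the two effects apart: allocated bytes would drop after the sliced run even while reserved bytes stay at the peak.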
Tysm for doing additional experiments! It seems #28 or #25 would make this a bit less of an issue (because it could be set in init). Will address those issues later. As for this feature, we'll leave it out of the app for now so folks don't come asking why it's not working as expected, haha. Thanks again 🚀
Awesome! Rebased ⭐️
Enable sliced attention computation using `less_vram=True`. `enable_attention_slicing()` and `disable_attention_slicing()` are copied from the latest diffusers library.
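In diffusers, these methods essentially toggle a `slice_size` setting that the attention blocks consult at run time, with `"auto"` picking a size that roughly halves the work per slice. A simplified sketch of that toggle pattern (a toy class, and the exact `"auto"` heuristic here is an assumption, not the library's implementation):

```python
class AttentionModule:
    """Toy module with a diffusers-style attention-slicing toggle."""

    def __init__(self, num_heads=8):
        self.num_heads = num_heads
        self.slice_size = None  # None means "compute all heads at once"

    def enable_attention_slicing(self, slice_size="auto"):
        # Assumed "auto" heuristic: half the heads per slice.
        self.slice_size = self.num_heads // 2 if slice_size == "auto" else slice_size

    def disable_attention_slicing(self):
        self.slice_size = None

    def head_batches(self):
        # Sizes of the per-slice attention maps a forward pass would
        # materialize; peak memory scales with the largest batch.
        step = self.slice_size or self.num_heads
        return [min(step, self.num_heads - i) for i in range(0, self.num_heads, step)]

m = AttentionModule(num_heads=8)
print(m.head_batches())       # [8]    -> one full-size attention map
m.enable_attention_slicing()  # "auto" -> 4 heads per slice
print(m.head_batches())       # [4, 4] -> two half-size maps
m.disable_attention_slicing()
print(m.head_batches())       # [8]
```

Because the flag lives on the module rather than being passed per call, it could also be set once at init time, which is what makes #28/#25 relevant to this PR's `less_vram` argument.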