
Fix non persistant buffer dispatch #1941

Merged: 8 commits into huggingface:main on Nov 20, 2023

Conversation

@SunMarc (Member) commented on Sep 7, 2023

What does this PR do?

This PR fixes the offloading of buffers when using dispatch_model or cpu_offload (related to hooks).
Currently, dispatch_model offloads all buffers when offload_buffers is set to True. This is problematic for non-persistent buffers, which are, by definition, not saved in the model's state_dict. When we use dispatch_model alone, the state_dict of some modules is stored on disk, so those buffers cannot be retrieved afterwards. To fix that, we simply remove the non-persistent buffers from the list of buffers to offload.

PS: the diffusers team actually uses offload_buffers.
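To make the problem concrete, here is a minimal plain-PyTorch sketch (the module and buffer names are made up) showing that non-persistent buffers never appear in a state_dict, so a state_dict offloaded to disk cannot restore them:

import torch
import torch.nn as nn


class TinyBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)
        # Persistent buffer: saved in the state_dict.
        self.register_buffer("running_mean", torch.zeros(4))
        # Non-persistent buffer: excluded from the state_dict by definition.
        self.register_buffer("causal_mask", torch.ones(4, 4), persistent=False)


block = TinyBlock()
print(sorted(block.state_dict().keys()))
# ['linear.bias', 'linear.weight', 'running_mean']  -> no 'causal_mask'

# If 'causal_mask' were offloaded to disk via the state_dict and the live
# tensor discarded, there would be nothing to reload it from.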

@HuggingFaceDocBuilderDev commented on Sep 7, 2023

The documentation is not available anymore as the PR was closed or merged.

@BenjaminBossan (Member) left a comment

Thanks for addressing that issue. In general, the implementation LGTM, so no comments from me on that.

One thing I would like to see is some tests to check for excluding non-persistent buffers. There are some tests for named_module_tensors, which could be extended to check for the new option. This doesn't cover every addition of this PR, so ideally the device hooks could also be tested; not sure how easy that is.

Another more general comment is that the new parameter remove_non_persistant now has to be passed around all over the place. Are there use cases that require including non-persistent buffers? If not, always removing them would simplify things.

Finally, the functions big_modeling.load_checkpoint_and_dispatch and utils.bnb.load_and_quantize_model call dispatch_model; should their signatures be extended to include remove_non_persistant too?
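A rough sketch of the kind of test suggested above, assuming the flag ends up being called remove_non_persistent on named_module_tensors (the model and test names here are illustrative only):

import torch
import torch.nn as nn

from accelerate.utils.modeling import named_module_tensors


class ModelWithBuffers(nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer("persistent_buf", torch.zeros(2))
        self.register_buffer("ephemeral_buf", torch.zeros(2), persistent=False)


def test_named_module_tensors_can_skip_non_persistent_buffers():
    model = ModelWithBuffers()

    # Without the new flag, both buffers are listed.
    all_names = {name for name, _ in named_module_tensors(model, include_buffers=True)}
    assert {"persistent_buf", "ephemeral_buf"} <= all_names

    # With the flag, the non-persistent buffer is skipped.
    filtered = {
        name
        for name, _ in named_module_tensors(
            model, include_buffers=True, remove_non_persistent=True
        )
    }
    assert "persistent_buf" in filtered
    assert "ephemeral_buf" not in filtered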

@SunMarc (Member, Author) commented on Sep 11, 2023

Hi @BenjaminBossan, thanks for your review!

> Another more general comment is that the new parameter remove_non_persistant now has to be passed around all over the place. Are there use cases that require including non-persistent buffers? If not, always removing them would simplify things.

I don't think so. If a user needs to offload non-persistent buffers, they will need to create a mapping directly using named_buffers (see the sketch just below). Since we haven't seen any issue about that, maybe it means that nobody has attempted to offload non-persistent buffers? If so, it would be easier to just remove them. Let me know what you think!
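For reference, that workaround would look roughly like this (a minimal sketch, not an accelerate API): unlike state_dict(), named_buffers() does yield non-persistent buffers, so a user could build their own mapping and restore those tensors manually.

import torch
import torch.nn as nn

model = nn.Module()
model.register_buffer("scale", torch.ones(1))                       # persistent
model.register_buffer("mask", torch.ones(8, 8), persistent=False)   # non-persistent

# named_buffers() lists both kinds of buffers; state_dict() only the persistent one.
buffer_map = dict(model.named_buffers())
print(sorted(buffer_map))          # ['mask', 'scale']
print(sorted(model.state_dict()))  # ['scale']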

> One thing I would like to see is some tests to check for excluding non-persistent buffers. There are some tests for named_module_tensors, which could be extended to check for the new option. This doesn't cover every addition of this PR, so ideally the device hooks could also be tested; not sure how easy that is.

In any case, I will add a few tests for named_module_tensors and try to come up with something for the hooks.

> Finally, the functions big_modeling.load_checkpoint_and_dispatch and utils.bnb.load_and_quantize_model call dispatch_model; should their signatures be extended to include remove_non_persistant too?

Yes, if we decide to keep remove_non_persistant.

@BenjaminBossan (Member) commented

> I don't think so. If a user needs to offload non-persistent buffers, they will need to create a mapping directly using named_buffers. Since we haven't seen any issue about that, maybe it means that nobody has attempted to offload non-persistent buffers? If so, it would be easier to just remove them. Let me know what you think!

Maybe @muellerzr can comment on that? If we decide to make the change, we should make sure to mention it somewhere (e.g. as a note in the next release text).

@SunMarc (Member, Author) commented on Sep 11, 2023

I've added a few tests! Let me know if you want to make the change. If we make the change, @muellerzr, I will keep remove_non_persistent in get_named_tensors, but I will remove it from the other functions' signatures and simply set it to True there whenever offload_buffers is True.

@muellerzr (Collaborator) commented

@SunMarc let's go with that, and if users request that we expand the param/trickle it down to lower levels, we can.

@SunMarc (Member, Author) commented on Sep 14, 2023

Still having an issue with Falcon-180B GPTQ offload. I will put this in draft for now.

@SunMarc marked this pull request as draft on September 14, 2023
@huggingface deleted a comment from the github-actions bot on Oct 18, 2023
@SunMarc marked this pull request as ready for review on November 16, 2023
@SunMarc (Member, Author) commented on Nov 16, 2023

After digging into it a bit, I think we should keep the remove_non_persistent arg for named_module_tensors because I need both cases. Plus, this way we don't break backward compatibility. I only set it to True for offloaded modules. I also don't expose the param in dispatch_model or cpu_offload, as I don't think anyone will want to change that behavior. The diffusers team had an issue about this, and this PR should fix it.
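A hypothetical sketch of that wiring (tensors_to_offload is an illustrative helper, not accelerate's actual code path): the internal offload code can always drop non-persistent buffers instead of exposing the flag in dispatch_model or cpu_offload.

from accelerate.utils.modeling import named_module_tensors


def tensors_to_offload(module, offload_buffers: bool):
    # Illustrative helper: when buffers are offloaded, non-persistent ones
    # are dropped so we never try to reload them later from a state_dict
    # that does not contain them.
    return dict(
        named_module_tensors(
            module,
            include_buffers=offload_buffers,
            remove_non_persistent=True,
        )
    )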

cc @muellerzr @BenjaminBossan @sayakpaul

@sayakpaul (Member) commented

Thanks for tagging me in. From the diffusers side, the following piece of code was breaking:

from diffusers import PixArtAlphaPipeline
import torch

def bytes_to_giga_bytes(bytes):
    return bytes / 1024 / 1024 / 1024

pipe = PixArtAlphaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-XL-2-1024-MS", 
    torch_dtype=torch.float16
)
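# Note: enable_sequential_cpu_offload relies on accelerate's cpu_offload hooks
# (with buffer offloading), which is where the non-persistent buffer issue
# addressed by this PR surfaced.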
pipe.enable_sequential_cpu_offload()

images = pipe("hey", num_images_per_prompt=4).images[0]
print(
    f"Max memory allocated: {bytes_to_giga_bytes(torch.cuda.max_memory_allocated())} GB"
)

@BenjaminBossan (Member) left a comment

Thanks for fixing this issue and adding a couple of tests to check it.

I'm a bit concerned that we have to rely on a private attribute, _non_persistent_buffers_set, but AFAICT it is the only place that information is stored, so there is no way around it. It also seems to have existed for many PyTorch releases, which hopefully means it is safe to use.
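For context, a quick sketch of what that private attribute holds (it is a PyTorch internal, so subject to change, but it has been present for many releases):

import torch
import torch.nn as nn

m = nn.Module()
m.register_buffer("a", torch.zeros(1))
m.register_buffer("b", torch.zeros(1), persistent=False)

# Each nn.Module records the names of its own non-persistent buffers here;
# there is no public accessor for this information.
print(m._non_persistent_buffers_set)  # {'b'}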

@muellerzr (Collaborator) left a comment

This is a great fix! However, there is one big issue with the logic here that we need to address :)

Review thread on src/accelerate/utils/modeling.py (resolved)
@SunMarc merged commit 35b0206 into huggingface:main on Nov 20, 2023
23 checks passed