
Fix non persistant buffer dispatch #1941

Merged: 8 commits into huggingface:main on Nov 20, 2023

Conversation

@SunMarc (Member) commented on Sep 7, 2023

What does this PR do?

This PR fixes the offloading of buffers when using dispatch_model or cpu_offload (related to hooks).
Currently, dispatch_model offloads all buffers when offload_buffers is set to True. This is problematic for non-persistent buffers, which are, by definition, not saved in the model's state_dict. When we use dispatch_model alone, the state_dict of some modules is stored on disk, so those buffers cannot be retrieved afterwards. To fix that, we simply remove the non-persistent buffers from the list of buffers to offload.

PS: the diffusers team actually uses offload_buffers.
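To make the problem concrete, here is a minimal plain-PyTorch sketch (the module and buffer names are made up) showing that non-persistent buffers never appear in a state_dict, so a state_dict offloaded to disk cannot restore them:

import torch
import torch.nn as nn


class TinyBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)
        # Persistent buffer: saved in the state_dict.
        self.register_buffer("running_mean", torch.zeros(4))
        # Non-persistent buffer: excluded from the state_dict by definition.
        self.register_buffer("causal_mask", torch.ones(4, 4), persistent=False)


block = TinyBlock()
print(sorted(block.state_dict().keys()))
# ['linear.bias', 'linear.weight', 'running_mean']  -> no 'causal_mask'

# If 'causal_mask' were offloaded to disk via the state_dict and the live
# tensor discarded, there would be nothing to reload it from.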

@HuggingFaceDocBuilderDev commented on Sep 7, 2023

The documentation is not available anymore as the PR was closed or merged.

@BenjaminBossan (Member) left a comment

Thanks for addressing that issue. In general, the implementation LGTM, so no comments from me on that.

One thing I would like to see is some tests to check for excluding non-persistent buffers. There are some tests for named_module_tensors, which could be extended to check for the new option. This doesn't cover every addition of this PR, so ideally the device hooks could also be tested; not sure how easy that is.

Another more general comment is that the new parameter remove_non_persistant now has to be passed around all over the place. Are there use cases that require including non-persistent buffers? If not, always removing them would simplify things.

Finally, the functions big_modeling.load_checkpoint_and_dispatch and utils.bnb.load_and_quantize_model call dispatch_model; should their signatures be extended to include remove_non_persistant too?
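A rough sketch of the kind of test suggested above, assuming the flag ends up being called remove_non_persistent on named_module_tensors (the model and test names here are illustrative only):

import torch
import torch.nn as nn

from accelerate.utils.modeling import named_module_tensors


class ModelWithBuffers(nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer("persistent_buf", torch.zeros(2))
        self.register_buffer("ephemeral_buf", torch.zeros(2), persistent=False)


def test_named_module_tensors_can_skip_non_persistent_buffers():
    model = ModelWithBuffers()

    # Without the new flag, both buffers are listed.
    all_names = {name for name, _ in named_module_tensors(model, include_buffers=True)}
    assert {"persistent_buf", "ephemeral_buf"} <= all_names

    # With the flag, the non-persistent buffer is skipped.
    filtered = {
        name
        for name, _ in named_module_tensors(
            model, include_buffers=True, remove_non_persistent=True
        )
    }
    assert "persistent_buf" in filtered
    assert "ephemeral_buf" not in filtered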

@SunMarc (Member, Author) commented on Sep 11, 2023

Hi @BenjaminBossan, thanks for your review!

> Another more general comment is that the new parameter remove_non_persistant now has to be passed around all over the place. Are there use cases that require including non-persistent buffers? If not, always removing them would simplify things.

I don't think so. If a user needs to offload non-persistent buffers, they will need to create a mapping directly using named_buffers (see the sketch just below). Since we haven't seen any issue about that, maybe it means that nobody has attempted to offload non-persistent buffers? If so, it would be easier to just remove them. Let me know what you think!
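For reference, that workaround would look roughly like this (a minimal sketch, not an accelerate API): unlike state_dict(), named_buffers() does yield non-persistent buffers, so a user could build their own mapping and restore those tensors manually.

import torch
import torch.nn as nn

model = nn.Module()
model.register_buffer("scale", torch.ones(1))                       # persistent
model.register_buffer("mask", torch.ones(8, 8), persistent=False)   # non-persistent

# named_buffers() lists both kinds of buffers; state_dict() only the persistent one.
buffer_map = dict(model.named_buffers())
print(sorted(buffer_map))          # ['mask', 'scale']
print(sorted(model.state_dict()))  # ['scale']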

> One thing I would like to see is some tests to check for excluding non-persistent buffers. There are some tests for named_module_tensors, which could be extended to check for the new option. This doesn't cover every addition of this PR, so ideally the device hooks could also be tested; not sure how easy that is.

In any case, I will add a few tests for named_module_tensors and try to come up with something for the hooks.

> Finally, the functions big_modeling.load_checkpoint_and_dispatch and utils.bnb.load_and_quantize_model call dispatch_model; should their signatures be extended to include remove_non_persistant too?

Yes, if we decide to keep remove_non_persistant.

@BenjaminBossan (Member) commented

> I don't think so. If a user needs to offload non-persistent buffers, they will need to create a mapping directly using named_buffers. Since we haven't seen any issue about that, maybe it means that nobody has attempted to offload non-persistent buffers? If so, it would be easier to just remove them. Let me know what you think!

Maybe @muellerzr can comment on that? If we decide to make the change, we should make sure to mention it somewhere (e.g. as a note in the next release text).

@SunMarc (Member, Author) commented on Sep 11, 2023

I've added a few tests! Let me know if you want to make the change. If we make the change, @muellerzr, I will keep remove_non_persistent in get_named_tensors, but I will remove it from the other functions' signatures and simply set it to True there whenever offload_buffers is True.

@muellerzr (Collaborator) commented

@SunMarc let's go with that, and if users request that we expand the param/trickle it down to lower levels, we can.

@SunMarc (Member, Author) commented on Sep 14, 2023

Still having an issue with Falcon-180B GPTQ offload. I will put this in draft for now.

@SunMarc marked this pull request as draft on September 14, 2023
@huggingface deleted a comment from the github-actions bot on Oct 18, 2023
@SunMarc marked this pull request as ready for review on November 16, 2023
@SunMarc (Member, Author) commented on Nov 16, 2023

After digging into it a bit, I think we should keep the remove_non_persistent arg for named_module_tensors because I need both cases. Plus, this way we don't break backward compatibility. I only set it to True for offloaded modules. I also don't expose the param in dispatch_model or cpu_offload, as I don't think anyone will want to change that behavior. The diffusers team had an issue about this, and this PR should fix it.
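A hypothetical sketch of that wiring (tensors_to_offload is an illustrative helper, not accelerate's actual code path): the internal offload code can always drop non-persistent buffers instead of exposing the flag in dispatch_model or cpu_offload.

from accelerate.utils.modeling import named_module_tensors


def tensors_to_offload(module, offload_buffers: bool):
    # Illustrative helper: when buffers are offloaded, non-persistent ones
    # are dropped so we never try to reload them later from a state_dict
    # that does not contain them.
    return dict(
        named_module_tensors(
            module,
            include_buffers=offload_buffers,
            remove_non_persistent=True,
        )
    )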

cc @muellerzr @BenjaminBossan @sayakpaul

@sayakpaul (Member) commented

Thanks for tagging me in. From the diffusers side, the following piece of code was breaking:

from diffusers import PixArtAlphaPipeline
import torch

def bytes_to_giga_bytes(bytes):
    return bytes / 1024 / 1024 / 1024

pipe = PixArtAlphaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-XL-2-1024-MS", 
    torch_dtype=torch.float16
)
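# Note: enable_sequential_cpu_offload relies on accelerate's cpu_offload hooks
# (with buffer offloading), which is where the non-persistent buffer issue
# addressed by this PR surfaced.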
pipe.enable_sequential_cpu_offload()

images = pipe("hey", num_images_per_prompt=4).images[0]
print(
    f"Max memory allocated: {bytes_to_giga_bytes(torch.cuda.max_memory_allocated())} GB"
)

@BenjaminBossan (Member) left a comment

Thanks for fixing this issue and adding a couple of tests to check it.

I'm a bit concerned that we have to rely on a private attribute, _non_persistent_buffers_set, but AFAICT it is the only place that information is stored, so there is no way around it. It also seems to have existed for many PyTorch releases, which hopefully means it is safe to use.
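For context, a quick sketch of what that private attribute holds (it is a PyTorch internal, so subject to change, but it has been present for many releases):

import torch
import torch.nn as nn

m = nn.Module()
m.register_buffer("a", torch.zeros(1))
m.register_buffer("b", torch.zeros(1), persistent=False)

# Each nn.Module records the names of its own non-persistent buffers here;
# there is no public accessor for this information.
print(m._non_persistent_buffers_set)  # {'b'}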

@muellerzr (Collaborator) left a comment

This is a great fix! However, there is one big issue with the logic here that we need to address :)

Review thread on src/accelerate/utils/modeling.py (resolved)
@SunMarc merged commit 35b0206 into huggingface:main on Nov 20, 2023
23 checks passed