Add adapter_only option to save_fsdp_model and load_fsdp_model to only save/load PEFT weights #2321

Merged · 8 commits · Jan 26, 2024

Conversation

@AjayP13 (Contributor) commented on Jan 9, 2024

What does this PR do?
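A minimal usage sketch of the option this PR adds (illustrative only: it assumes the existing save_fsdp_model/load_fsdp_model call signatures, a script launched with an FSDP accelerate config, and an already-built PEFT model named peft_model):

    from accelerate import Accelerator
    from accelerate.utils import load_fsdp_model, save_fsdp_model

    accelerator = Accelerator()              # launched via `accelerate launch` with FSDP enabled
    model = accelerator.prepare(peft_model)  # peft_model: a PEFT-wrapped transformer (assumed to exist)

    # Save only the PEFT adapter weights instead of the full model state dict
    save_fsdp_model(accelerator.state.fsdp_plugin, accelerator, model, "ckpt", adapter_only=True)

    # Later, load only the adapter weights back into a freshly prepared PEFT model
    load_fsdp_model(accelerator.state.fsdp_plugin, accelerator, model, "ckpt", adapter_only=True)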

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@AjayP13 marked this pull request as ready for review on January 9, 2024 at 17:44.
@younesbelkada (Contributor) left a comment

Thanks a lot! I left one comment. I will let @pacman100 give his final opinion on the PR, as he is more familiar with FSDP and accelerate.

    def _is_peft_model(model):
        if is_peft_available():
            from peft import PeftModel

            unwrapped_model = getattr(model.module, "_orig_mod", model.module)
@younesbelkada (Contributor):

Do we have an unwrap model utility method in accelerate? If model is not a DDP model, .module will fail here, no?

@AjayP13 (Contributor, Author):

I think you meant FSDP, not DDP, but this should be safe: these functions (save_fsdp_..., load_fsdp_...) are only used with FSDP-wrapped models, so they will always have .module.
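For context, a minimal illustration of the unwrapping step under discussion (the helper name is hypothetical; it just mirrors the line under review):

    def unwrap_fsdp_module(model):  # hypothetical helper mirroring the reviewed line
        # FSDP (like DDP) keeps the wrapped network on `.module`, and torch.compile
        # keeps the original module on `._orig_mod`, so this peels both layers.
        # It raises AttributeError for a model that was never FSDP/DDP-wrapped,
        # which is exactly the concern raised above.
        inner = model.module
        return getattr(inner, "_orig_mod", inner)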

@younesbelkada (Contributor):

OK, perfect then! Maybe worth mentioning in a comment that _is_peft_model is only meant to be used for FSDP models, or maybe change the method name to _is_fsdp_peft_model.

@muellerzr (Collaborator):

I think extract_model_from_parallel in accelerate should cover this case if desired.

@muellerzr (Collaborator):

But worth double checking/writing a test :)
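A sketch of what the suggested check could look like (illustrative only, not the PR's exact code; the helper name and the import paths are assumptions):

    from accelerate.utils import extract_model_from_parallel, is_peft_available  # assumed import paths

    def is_peft_wrapped_model(model):  # hypothetical name for the check
        # extract_model_from_parallel strips DDP/FSDP wrappers, so the check no
        # longer relies on the model having a `.module` attribute.
        if not is_peft_available():
            return False
        from peft import PeftModel  # local import: peft is an optional dependency
        return isinstance(extract_model_from_parallel(model), PeftModel)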

@AjayP13 (Contributor, Author):

Thanks @muellerzr, I've just updated this with the unwrapping utility and re-ran the test we have on the other branch on the multi-GPU machine, and it works.

@pacman100 (Contributor) left a comment

Thank you @AjayP13, the changes in this PR make sense to me. However, they have only been tested with the FULL_STATE_DICT state dict type; would this work with SHARDED_STATE_DICT too? If not, this logic should be limited to the cases where FULL_STATE_DICT is used.

@AjayP13 (Contributor, Author) commented on Jan 18, 2024

Thanks for the review @pacman100. I just tested this, and it works with SHARDED_STATE_DICT as well (similarly, the loss drops, the sharded state dicts per worker get saved to disk, and both resuming and load_best work).
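For reference, a rough sketch of how the state dict type under discussion is usually selected (hedged: this assumes accelerate's FSDP plugin reads the FSDP_STATE_DICT_TYPE setting from the accelerate config/environment, and that the script runs under a distributed FSDP launch):

    import os

    # The layout FSDP uses when accelerate saves/loads checkpoints is governed by
    # the FSDP plugin's state_dict_type, normally set in the accelerate config file
    # or through the corresponding environment variable.
    os.environ["FSDP_STATE_DICT_TYPE"] = "SHARDED_STATE_DICT"  # or "FULL_STATE_DICT"

    from accelerate import Accelerator, FullyShardedDataParallelPlugin

    fsdp_plugin = FullyShardedDataParallelPlugin()      # picks up the setting above
    accelerator = Accelerator(fsdp_plugin=fsdp_plugin)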

@pacman100 (Contributor) left a comment

Thank you @AjayP13 for enabling storage of only the adapter weights when using PEFT+FSDP, very useful! ✨

@muellerzr (Collaborator) left a comment

Thanks! Overall this seems very helpful, but let's work on the design a bit to adhere more closely to our practices. Left some comments and suggestions 😄

Comment on lines 36 to 57

    def _is_peft_model(model):
        if is_peft_available():
            from peft import PeftModel
        return is_peft_available() and isinstance(extract_model_from_parallel(model), PeftModel)


    def _get_model_state_dict(model, adapter_only=False):
        if adapter_only and _is_peft_model(model):
            from peft import get_peft_model_state_dict

            return get_peft_model_state_dict(model, adapter_name=model.active_adapter)
        else:
            return model.state_dict()


    def _set_model_state_dict(model, state_dict, adapter_only=False):
        if adapter_only and _is_peft_model(model):
            from peft import set_peft_model_state_dict

            return set_peft_model_state_dict(model, state_dict, adapter_name=model.active_adapter)
        else:
            return model.load_state_dict(state_dict)
@muellerzr (Collaborator):

I'm not the biggest fan of this pattern we're working with here.

These should live inside one function, and from what I can tell we now have two different ways of getting the state_dict inside accelerate:

  1. Here
  2. Accelerator.get_state_dict

This hints to me that perhaps we need to chunk up Accelerator's get_state_dict into something that can be called elsewhere (and stored in utils.modeling, probably).

Same with _set_state_dict. While yes, it's just FSDP for now, we can write a check for that if necessary.

Secondly: we generally don't follow the practice of hidden function names in Accelerate; everything should be made public except in extreme circumstances, and I'm not convinced this is one of those.

Can we rewrite this a bit to make it more extensible?

Perhaps just do an if/else for calling set_peft_model_state_dict or not, and just import them at the top of the file. That way we can avoid this entirely.

@AjayP13 (Contributor, Author):

@muellerzr See the newest refactor, but I believe this is not possible.

Accelerator.get_state_dict already has logic for FSDP, but it handles a different use case: it always returns the FULL_STATE_DICT from FSDP. Meanwhile, this file gets the state dict according to how the user configured it to be saved (which could be a SHARDED_STATE_DICT). I believe that is why this file originally never called Accelerator.get_state_dict.

As for your suggestion of importing get_peft_model_state_dict at the top of the file vs. encapsulating it in the hidden function: we can't import it at the top because accelerate does not have a dependency on peft, and a top-level import would throw an error if peft is not installed on a user's machine. Also, the hidden function _get_model_state_dict is used three times in this file; I could get rid of it and use an inline if-statement in those three places, but that would introduce three pieces of repeated code that need to be updated together. Between the repeated code and the need to keep the peft imports local rather than at the top of the file, I think it makes sense to keep the two hidden functions here.
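To make the distinction above concrete, a rough sketch (paraphrased, not the real accelerate code; it assumes an already FSDP-wrapped model) of what a forced full gather looks like, as opposed to fsdp_utils honouring whatever state dict type the user configured on the FSDP plugin:

    from torch.distributed.fsdp import FullStateDictConfig, StateDictType
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def gather_full_state_dict(fsdp_model):
        # Roughly what "always returns the FULL_STATE_DICT" means: a consolidated
        # state dict is gathered (offloaded to CPU, rank 0 only), regardless of
        # the state dict type configured for saving elsewhere.
        cfg = FullStateDictConfig(offload_to_cpu=True, rank0_only=True)
        with FSDP.state_dict_type(fsdp_model, StateDictType.FULL_STATE_DICT, cfg):
            return fsdp_model.state_dict()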

@AjayP13 (Contributor, Author):

@muellerzr Have you had a chance to look this over? Waiting on this PR to get huggingface/transformers#28297 merged.

Resolved review threads: src/accelerate/utils/fsdp_utils.py (outdated), src/accelerate/utils/imports.py
@muellerzr (Collaborator) left a comment

Thanks, your comments make sense!

@muellerzr merged commit 581fabb into huggingface:main on Jan 26, 2024 · 23 checks passed