Currently, when doing full-parameter tuning (`peft_config=None`), the model can be fine-tuned and the checkpoint can be saved; however, the checkpoint cannot be loaded back.
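A minimal repro sketch of the failure (the model, dataset, and argument choices here are illustrative, not from the original report, and exact parameter names may vary by TRL version):

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM
from trl import SFTTrainer

model = AutoModelForCausalLM.from_pretrained("gpt2")
dataset = load_dataset("imdb", split="train[:100]")

# Full-parameter tuning: no PEFT adapter attached.
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    dataset_text_field="text",
    peft_config=None,
)
trainer.train()
trainer.save_model("./full-ft-ckpt")

# Reloading fails (or warns about uninitialized weights) because the
# saved checkpoint is missing some tensors.
reloaded = AutoModelForCausalLM.from_pretrained("./full-ft-ckpt")
```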
The root cause was identified in offline discussions. Specifically:
1. `PeftSavingCallback` is not necessary for saving a full-parameter-tuning model. The callback is designed to separately save an adapter-only model; the rationale is that `Trainer` by itself saves everything in the root folder, but for PEFT we want a clean adapter-only folder, so some files are duplicated in a separate place.
2. There is a corner-case bug when saving with safetensors via `save_pretrained` directly: it drops some shared tensors (e.g. `lm_head`) during saving. The better approach is `save_pretrained(..., state_dict=state_dict)`, explicitly passing the full state dict, the same way `Trainer` natively saves it (see the sketch after this list).
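To illustrate item 2, a minimal sketch of the two saving paths (`model` and `output_dir` are placeholder names):

```python
# Risky: with safetensors serialization, shared tensors (e.g. an lm_head
# tied to the input embeddings) can be dropped from the saved file.
model.save_pretrained(output_dir)

# Safer: explicitly pass the full state dict, the same way Trainer saves
# it internally, so every tensor is written out.
model.save_pretrained(output_dir, state_dict=model.state_dict())
```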
So: due to (2), the checkpoint is missing some tensors and thus cannot be loaded back, and due to (1), this callback-generated broken checkpoint either gets used directly or overwrites the original checkpoint.
The solution would be to revise the callback to skip doing anything during full-parameter tuning.
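One possible shape for that fix, as a sketch rather than the actual patch (the `isinstance` check and the file-cleanup details are assumptions):

```python
import os

from peft import PeftModel
from transformers import TrainerCallback


class PeftSavingCallback(TrainerCallback):
    def on_save(self, args, state, control, **kwargs):
        model = kwargs["model"]

        # Full-parameter tuning (peft_config=None): Trainer has already
        # written a complete checkpoint, so do nothing here instead of
        # overwriting it with a broken, partial save.
        if not isinstance(model, PeftModel):
            return

        # PEFT case: keep the existing behavior of saving a clean,
        # adapter-only folder alongside Trainer's own checkpoint.
        checkpoint_dir = os.path.join(args.output_dir, f"checkpoint-{state.global_step}")
        model.save_pretrained(checkpoint_dir)
        full_weights = os.path.join(checkpoint_dir, "pytorch_model.bin")
        if os.path.exists(full_weights):
            os.remove(full_weights)
```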