Fix ZeRO-3 generation context manager #1617
Merged
This PR incorporates a fix from @pacman100 that enables the `unwrap_model_for_generation()` context manager to work with other trainers, such as DPO. The basic issue with our original implementation was that we weren't removing all active parameters during hook removal, which led to DeepSpeed errors like:
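To illustrate the idea, here is a minimal, self-contained sketch using simplified stand-ins. `PartitionedParam`, `ParamOffload`, `HookHandle`, `remove_hooks`, and `hooks_removed` are hypothetical names, not the DeepSpeed or TRL API. The sketch shows that detaching the hooks alone is not enough: each parameter's record of the submodules currently using it also has to be cleared before generation runs without the hooks.

```python
from contextlib import contextmanager


class HookHandle:
    """Hypothetical stand-in for a removable hook handle (similar in spirit to torch's RemovableHandle)."""

    def __init__(self, registry: set, name: str):
        self.registry = registry
        self.name = name
        registry.add(name)  # registering the hook makes it "live"

    def remove(self) -> None:
        self.registry.discard(self.name)


class PartitionedParam:
    """Hypothetical stand-in for a ZeRO-3 parameter that tracks which submodules are using it."""

    def __init__(self):
        self.active_sub_modules: set = set()


class ParamOffload:
    """Hypothetical owner of the forward/backward hooks that gather and release parameters."""

    def __init__(self, params):
        self.params = params
        self.live_hooks: set = set()
        self.forward_hooks: list = []
        self.backward_hooks: list = []
        self.register_hooks()

    def register_hooks(self) -> None:
        self.forward_hooks = [HookHandle(self.live_hooks, "forward")]
        self.backward_hooks = [HookHandle(self.live_hooks, "backward")]


def remove_hooks(offload: ParamOffload) -> None:
    # Clear every parameter's active-submodule bookkeeping; leaving this state
    # behind is the kind of omission this sketch assumes caused the errors.
    for param in offload.params:
        param.active_sub_modules.clear()
    # Then detach the hooks themselves.
    for hook in offload.forward_hooks + offload.backward_hooks:
        hook.remove()
    offload.forward_hooks = []
    offload.backward_hooks = []


@contextmanager
def hooks_removed(offload: ParamOffload):
    """Temporarily strip the hooks (e.g. for generation), then restore them on exit."""
    remove_hooks(offload)
    try:
        yield offload
    finally:
        offload.register_hooks()


if __name__ == "__main__":
    params = [PartitionedParam(), PartitionedParam()]
    params[0].active_sub_modules.add("decoder.layer0")
    offload = ParamOffload(params)
    with hooks_removed(offload):
        # Generation would run here with no hooks and no stale bookkeeping.
        assert not offload.live_hooks
        assert not params[0].active_sub_modules
    assert offload.live_hooks == {"forward", "backward"}
```

Re-registering the hooks in a `finally` clause means training hooks come back even if generation raises inside the `with` block.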
Tested with this gist: https://gist.github.com/pacman100/a6c89a681f8f76bdf17f5bf6874eb983
cc @edbeeching you should rebase on `main` once this is merged so that #1605 will work.

Closes #1543