[REQUEST] Add documentation on how to run fast inference of transformers models with ZeRO-3
#5498
Labels: enhancement (New feature or request)
Is your feature request related to a problem? Please describe.
Hello DeepSpeed team, while looking at how to accelerate text generation in TRL with ZeRO-3, we learned from @pacman100 that the most efficient method is to remove/add hooks within a context manager as follows:
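The original snippet is not reproduced here, but the pattern it implements can be sketched without DeepSpeed: a context manager detaches the ZeRO-3 forward hooks before generation and restores them afterwards. Everything below is illustrative; `Module`, `remove_zero3_hooks`, and the hook names are stand-ins, not DeepSpeed or TRL APIs.

```python
# Framework-free sketch of the hook remove/re-add pattern, assuming a
# hypothetical `forward_hooks` list on the model. In real code these would
# be DeepSpeed's ZeRO-3 partitioning hooks on each submodule.
from contextlib import contextmanager


class Module:
    """Stand-in for a model whose submodules carry ZeRO-3 forward hooks."""

    def __init__(self):
        self.forward_hooks = ["zero3_gather_hook", "zero3_release_hook"]


@contextmanager
def remove_zero3_hooks(model):
    # Detach the partitioning hooks so generation runs on gathered weights.
    saved = model.forward_hooks
    model.forward_hooks = []
    try:
        yield model
    finally:
        # Re-attach the hooks so ZeRO-3 training can resume afterwards.
        model.forward_hooks = saved


model = Module()
with remove_zero3_hooks(model) as unwrapped:
    # model.generate(...) would go here, free of per-forward gather overhead
    pass
```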
This works well for inference, but during DPO training we hit a rather cryptic error, and only when the number of gradient accumulation steps was greater than 1:
The solution @pacman100 found is that one must carefully deregister all still-active parameters while removing the hooks, which led to this fix in TRL: huggingface/trl#1617
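The shape of that fix can be sketched as follows. The `ds_active_sub_modules` attribute mirrors the bookkeeping DeepSpeed keeps on ZeRO-3 partitioned parameters, but treat it, and the `Param`/`Engine` classes, as illustrative assumptions rather than the real API: the key idea is simply to clear each parameter's active-submodule record before dropping the hooks, so the next accumulation step does not trip over stale state.

```python
# Illustrative sketch of the TRL fix: deregister active parameters
# before removing the forward hooks. Classes are stand-ins, not
# DeepSpeed internals.
class Param:
    def __init__(self):
        # ZeRO-3 tracks which submodules currently hold this parameter
        # "active"; stale entries here caused the cryptic error.
        self.ds_active_sub_modules = {0, 1}


class Engine:
    def __init__(self):
        self.params = [Param(), Param()]
        self.forward_hooks = ["gather", "release"]


def remove_hooks(engine):
    # The crux of the fix: clear every parameter's active-submodule
    # record first, only then drop the hooks.
    for p in engine.params:
        p.ds_active_sub_modules.clear()
    engine.forward_hooks.clear()
```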
Getting to the bottom of this issue was quite tricky, and the DeepSpeed documentation unfortunately contains no guidance on how to do this. I'm sharing the issue here for broader visibility, in case others are trying to speed up ZeRO-3 generation during training.
Describe the solution you'd like
An example in the documentation that shows how to run fast text generation with ZeRO-3 inside a training loop. This is very useful for online methods like PPO.
Describe alternatives you've considered
N/A
Additional context