Conversation
```python
all_masks = []
all_values = []

model.eval()
```

Here I'm following the same logic as `transformers.Trainer` to put the model in eval mode during inference - this is needed to ensure the KL divergence is 0 at step 0 with ZeRO-3.
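To make the point concrete: at step 0 the active and reference models have identical weights, so with dropout disabled (eval mode) they emit identical log-probabilities and the KL divergence is exactly 0. A minimal sketch in plain Python (hypothetical logits, no framework - this is an illustration, not TRL code):

```python
import math

def log_softmax(logits):
    # Log-probabilities over a discrete vocabulary (numerically stable).
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def kl_divergence(logp, logq):
    # KL(p || q) = sum_i p_i * (log p_i - log q_i)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(logp, logq))

# At step 0 the active and reference models share weights, so with
# dropout disabled both produce the same logits for the same input.
logits = [2.0, -1.0, 0.5]
active_logp = log_softmax(logits)
ref_logp = log_softmax(logits)
print(kl_divergence(active_logp, ref_logp))  # 0.0

# If dropout were active in one of the forward passes, the logits would
# differ slightly and the KL at step 0 would be spuriously non-zero.
perturbed_logp = log_softmax([2.1, -1.0, 0.5])
print(kl_divergence(perturbed_logp, ref_logp) > 0)  # True
```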
```python
def _prepare_deepspeed_zero3(self, model):
```

Should we move this to a separate utils function that can also be used for e.g. sharding the reward model? In that case, the function signature would be something like `_prepare_deepspeed_zero3(model, accelerator)`.
```python
def _prepare_deepspeed_zero3(self, model):
    # Adapted from accelerate: https://github.com/huggingface/accelerate/blob/739b135f8367becb67ffaada12fe76e3aa60fefd/src/accelerate/accelerator.py#L1473
    # TODO: figure out if any other parameters are needed for inference
```

The kwargs below are a best guess at what's minimally needed - we can tune them later if needed, IMO.
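As a rough illustration of what "minimally needed" could mean here, a sketch of trimming a training-oriented DeepSpeed config down to inference-only ZeRO-3 settings. The helper name and key choices are guesses for illustration, not TRL's actual implementation:

```python
def zero3_inference_config(train_config: dict) -> dict:
    # Hypothetical helper: derive an inference-only DeepSpeed config from a
    # training config, keeping just the ZeRO-3 sharding settings.
    config = {k: v for k, v in train_config.items()
              if k not in ("optimizer", "scheduler")}  # training-only sections
    zero = dict(config.get("zero_optimization", {}))
    zero["stage"] = 3  # only parameter sharding matters for inference
    config["zero_optimization"] = zero
    # deepspeed.initialize still expects a batch size even for inference
    config.setdefault("train_micro_batch_size_per_gpu", 1)
    return config

cfg = zero3_inference_config({
    "optimizer": {"type": "AdamW"},
    "zero_optimization": {"stage": 3, "stage3_param_persistence_threshold": 10000},
    "bf16": {"enabled": True},
})
print("optimizer" in cfg)  # False
```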
```python
# this hack seems to be needed for DS stage 3 to work
if self.accelerator.state.deepspeed_plugin.zero_stage == 3:
    self.model.train()
```

This has been moved to the training loop, where I think it should be done for all models (including DeepSpeed ones).
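A minimal sketch of the control flow being described - train mode set unconditionally at the start of the optimisation step for every backend, rather than only under ZeRO-3, and eval mode restored for generation. The class below is a stand-in for illustration, not TRL's trainer:

```python
class TinyPolicy:
    # Stand-in for an nn.Module-style model: just tracks its mode flag.
    def __init__(self):
        self.training = False

    def train(self):
        self.training = True
        return self

    def eval(self):
        self.training = False
        return self

def ppo_step(model):
    # Train mode for all models, not just DeepSpeed ZeRO-3 ones.
    model.train()
    # ... forward/backward/optimizer updates would happen here ...
    return model.training

def generate_rollouts(model):
    # Eval mode during generation so dropout doesn't skew log-probs.
    model.eval()
    return model.training

policy = TinyPolicy()
print(ppo_step(policy))           # True
print(generate_rollouts(policy))  # False
```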
```python
    train_stats (dict[str, `torch.Tensor`]):
        Dictionary of training statistics
    """
    self.model.train()
```

I've added these configs to make it easier for users to run DeepSpeed in various settings (and also for dev testing).
**lvwerra** left a comment:
Looks good to me! It would be nice to add some info to the docs as well and link to the configs. E.g. here would be a good place:
- `sentiment_tuning.py`: This script shows how to use the `PPOTrainer` to fine-tune a sentiment analysis model using the IMDB dataset.
- `multi_adapter_rl.py`: This script shows how to use the `PPOTrainer` to train a single base model with multiple adapters. This script requires you to run the reward model training example script beforehand.
- `stable_diffusion_tuning_example.py`: This script shows how to use the `DDPOTrainer` to fine-tune a stable diffusion model using reinforcement learning.
Good idea about adding a section to the docs - done in 8a48d12. I'll merge once the CI is green.
* Initialise ref model with ZeRO-3
* Fix deadlock
* Refactor & fix KL div
* Refactor
* Refactor
* Fix imports
* Add types
* Add accelerate configs
* Add more DeepSpeed configs
* Fix types
* Disable debug
* Refactor
* Add docs
* Disable eval mode for peft
* Restore eval mode
* Revert ref model prep for peft
* Update examples/scripts/README.md
* Add docs

Co-authored-by: Philipp Schmid <32632186+philschmid@users.noreply.github.com>
@lewtun With these changes I can now get past the old error.

This stems from this statement in […]. The first forward pass through the regular model succeeds, but this error occurs when we try to run a forward pass through the reference model.

I initially thought this was a separate problem (there's a discussion about a very similar error in deepspeedai/DeepSpeed#4229), but the suggested fix does not work in this situation, and the fact that this occurs when trying to use the reference model - and that the tensor in this error has the shape it does - suggests the two are related.

Could you share the LLaMA run you mention in the description of this PR? Thank you!

For reference, I'm using […].
This PR adds ZeRO-3 support for the `PPOTrainer` by ensuring that the active and reference model weights are sharded in the same manner. I ran a few sentiment tuning tests with GPT-2 and found that the general trend of the mean reward is similar both with and without ZeRO-3, and that the KL divergence is 0 at step 0 (as it should be).

I've also tested that this works with larger models like `llama-2-7b`, and it does (modulo a very small diff in the KL divergence at step 0, which is likely tied to needing `bfloat16`). There are probably a few more optimisations one can do with the DeepSpeed config, but this seems like a good start for now.

I've also added some `accelerate` configs so it's a bit easier for people to run the examples.

Closes #600
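Launching one of the example scripts with one of the new configs would then look something like the following (the config and script paths here are illustrative - check the actual filenames added in this PR):

```shell
# Illustrative only: paths may differ from the files added in this PR.
accelerate launch \
  --config_file examples/accelerate_configs/deepspeed_zero3.yaml \
  examples/scripts/sentiment_tuning.py
```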
Script commands for testing
TODO
- Figure out whether `model.train()` should be unique to ZeRO-3 or not in the train loop