Fix DeepSpeed ZeRO-3 in PPOTrainer #730

Merged: lewtun merged 18 commits into main from fix-ppo-ds3 on Sep 5, 2023
Conversation

lewtun (Member) commented Sep 3, 2023

This PR adds ZeRO-3 support for the PPOTrainer by ensuring that the active and reference model weights are sharded in the same manner. I ran a few sentiment tuning tests with GPT-2 and find that the general trend of the mean reward is similar both with / without ZeRO-3 and the KL divergence is 0 at step 0 (as it should be):

[Screenshots: mean reward and KL divergence curves, with / without ZeRO-3]

I've also tested that this works with larger models like llama-2-7b, and it does (modulo a very small diff in the KL divergence at step 0, which is likely tied to needing bfloat16).

There are probably a few more optimisations one can do with the DeepSpeed config, but this seems like a good start for now.

I've also added some accelerate configs so it's a bit easier for people to run the examples.

Closes #600

Script commands for testing

# Baseline - no DeepSpeed
accelerate launch --config_file=examples/accelerate_configs/multi_gpu.yaml examples/scripts/sentiment_tuning.py --batch_size 32 --mini_batch_size 32 --log_with wandb

# ZeRO-{1,2,3}
accelerate launch --config_file=examples/accelerate_configs/deepspeed_zero{1,2,3}.yaml examples/scripts/sentiment_tuning.py --batch_size 32 --mini_batch_size 32 --log_with wandb
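The "KL is 0 at step 0" check mentioned above can be made concrete: PPO-style trainers estimate the KL from the difference of log-probs under the active and reference policies, so two identical models must give exactly zero. A minimal stdlib sketch of that check (`approx_kl` is an illustrative helper, not the `PPOTrainer` API):

```python
import math

def approx_kl(logprobs_active, logprobs_ref):
    """Mean per-token KL estimate used in PPO-style training:
    E[log p_active(x) - log p_ref(x)] over the sampled tokens."""
    diffs = [a - r for a, r in zip(logprobs_active, logprobs_ref)]
    return sum(diffs) / len(diffs)

# At step 0 the active model is a copy of the reference model, so the
# log-probs of the sampled tokens match exactly and the KL must be 0.
logprobs = [math.log(0.2), math.log(0.5), math.log(0.1)]
assert approx_kl(logprobs, list(logprobs)) == 0.0

# After an update the log-probs drift and the KL estimate becomes non-zero.
shifted = [lp + 0.05 for lp in logprobs]
assert approx_kl(shifted, logprobs) > 0.0
```

A non-zero value at step 0 is therefore a red flag that the two models are not actually identical - which is exactly what the sharding bug fixed here produced.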

TODO

  • Make sure we get parity without DeepSpeed on sentiment tuning
  • Validate with large models
  • Test it works with offloading
  • Decide whether setting model.train() should be unique to ZeRO-3 or not in train loop
  • Add accelerate configs

HuggingFaceDocBuilderDev commented Sep 3, 2023

The documentation is not available anymore as the PR was closed or merged.

Comment thread trl/trainer/ppo_trainer.py Outdated
all_masks = []
all_values = []

model.eval()
lewtun (Member, Author):
Here I'm following the same logic as transformers.Trainer and putting the model in eval mode during inference - this is needed to ensure the KL divergence is 0 at step 0 with ZeRO-3.
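To illustrate why eval mode matters here: layers like dropout make two weight-identical models disagree while in train mode, which would show up as a spurious non-zero KL at step 0. A stdlib toy sketch (`ToyModel` is hypothetical, not the TRL or transformers API):

```python
import random

class ToyModel:
    """Toy stand-in for a model with dropout (for illustration only)."""

    def __init__(self, weights):
        self.weights = list(weights)
        self.training = True

    def train(self):
        self.training = True

    def eval(self):
        self.training = False

    def forward(self, x):
        # Dropout-style noise is only active in train mode, so two models
        # with identical weights can still disagree while training=True.
        out = []
        for w in self.weights:
            v = w * x
            if self.training and random.random() < 0.1:
                v = 0.0
            out.append(v)
        return out

active = ToyModel([0.5, 1.5, -2.0])
ref = ToyModel([0.5, 1.5, -2.0])

# In eval mode both models are deterministic, so their outputs (and
# hence log-probs) match exactly and the step-0 KL divergence is 0.
active.eval()
ref.eval()
assert active.forward(3.0) == ref.forward(3.0) == [1.5, 4.5, -6.0]
```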

Comment thread trl/trainer/ppo_trainer.py Outdated
text.append(" ")
print(text)

def _prepare_deepspeed_zero3(self, model):
lewtun (Member, Author):
Should we move this to a separate utils function that can also be used for e.g. sharding the reward model? In that case, the function signature would be something like _prepare_deepspeed_zero3(model, accelerator).

Comment thread trl/trainer/ppo_trainer.py Outdated

def _prepare_deepspeed_zero3(self, model):
# Adapted from accelerate: https://github.com/huggingface/accelerate/blob/739b135f8367becb67ffaada12fe76e3aa60fefd/src/accelerate/accelerator.py#L1473
# TODO: figure out if any other parameters are needed for inference
lewtun (Member, Author):
The kwargs below are a best guess at what's minimally needed - we can tune them later if necessary, IMO.
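For reference, a rough sketch of the kind of inference-only ZeRO-3 config such a helper builds (the exact kwargs live in the PR diff; the hidden-size-based bucket sizes follow the accelerate code linked above, and all values here are illustrative assumptions):

```python
def build_zero3_inference_config(hidden_size: int) -> dict:
    """Sketch of a minimal ZeRO-3 config for a frozen reference model.
    Bucket sizes scale with the model's hidden size, as in the accelerate
    snippet this PR adapts; the exact multipliers are illustrative."""
    return {
        "train_micro_batch_size_per_gpu": 1,  # required key, unused for inference
        "zero_optimization": {
            "stage": 3,
            # Keep small params resident to avoid a gather per forward pass.
            "stage3_param_persistence_threshold": 10 * hidden_size,
            # Prefetch / reduce bucket sizes scale with hidden_size^2.
            "stage3_prefetch_bucket_size": int(0.9 * hidden_size * hidden_size),
            "reduce_bucket_size": hidden_size * hidden_size,
        },
    }

cfg = build_zero3_inference_config(768)
assert cfg["zero_optimization"]["stage"] == 3
assert cfg["zero_optimization"]["reduce_bucket_size"] == 768 * 768
```

A config like this would then be handed to deepspeed.initialize(model=model, config=cfg) so the reference model is sharded the same way as the active model.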

@lewtun lewtun changed the title [WIP] Fix DeepSpeed ZeRO-3 in PPOTrainer Fix DeepSpeed ZeRO-3 in PPOTrainer Sep 4, 2023
@lewtun lewtun marked this pull request as ready for review September 4, 2023 09:16

# this hack seems to be needed for DS stage 3 to work
if self.accelerator.state.deepspeed_plugin.zero_stage == 3:
self.model.train()
lewtun (Member, Author):
This has been moved to the training loop where I think it should be done for all models (including DeepSpeed ones)

train_stats (dict[str, `torch.Tensor`]):
Dictionary of training statistics
"""
self.model.train()
lewtun (Member, Author):
Ditto as above

lewtun (Member, Author) commented Sep 4, 2023:
I've added these configs to make it easier for users to run DeepSpeed in various settings (and also for dev testing)

Comment thread examples/scripts/README.md Outdated
Co-authored-by: Philipp Schmid <32632186+philschmid@users.noreply.github.com>
lvwerra (Member) left a comment:
Looks good to me! It would be nice to add some info to the docs as well and link to the configs. E.g. here would be a good place:

https://huggingface.co/docs/trl/customization

younesbelkada (Contributor) left a comment:
Thanks a lot @lewtun! I see you have already added instructions on how to use the yaml files directly with accelerate. I agree with @lvwerra that we can also add this to an appropriate section of the documentation - no big deal though, we can also do it in a follow-up PR (to unblock DeepSpeed ZeRO-3 for users).

- `sentiment_tuning.py`: This script shows how to use the `PPOTrainer` to fine-tune a sentiment analysis model using the IMDB dataset
- `multi_adapter_rl.py`: This script shows how to use the `PPOTrainer` to train a single base model with multiple adapters. This script requires you to run the example script with the reward model training beforehand.
- `stable_diffusion_tuning_example.py`: This script shows how to use the `DDPOTrainer` to fine-tune a stable diffusion model using reinforcement learning.

Contributor:

Very nice!!

lewtun (Member, Author) commented Sep 5, 2023

Good idea about adding a section to the docs, done in 8a48d12

I'll merge once the CI is green

@lewtun lewtun merged commit c04074e into main Sep 5, 2023
@lewtun lewtun deleted the fix-ppo-ds3 branch September 5, 2023 09:00
kushal-tri pushed a commit to kushalarora/trl that referenced this pull request Sep 19, 2023
* Initialise ref model with ZeRO-3

* Fix deadlock

* Refactor & fix KL div

* Refactor

* Refactor

* Fix imports

* Add types

* Add accelerate configs

* Add more DeepSpeed configs

* Fix types

* Disable debug

* Refactor

* Add docs

* Disable eval mode for peft

* Restore eval mode

* Revert ref model prep for peft

* Update examples/scripts/README.md

Co-authored-by: Philipp Schmid <32632186+philschmid@users.noreply.github.com>

* Add docs

---------

Co-authored-by: Philipp Schmid <32632186+philschmid@users.noreply.github.com>
andrew-zm-ml commented Sep 27, 2023

@lewtun With these changes I can now get past the old `RuntimeError: 'weight' must be 2-D` issue, but training fails shortly thereafter with the following assertion error:

AssertionError: {'id': 163, 'status': 'NOT_AVAILABLE', 'numel': 0, 'ds_numel': 0, 'shape': (0,), 'ds_shape': (0,), 'requires_grad': True, 'grad_shape': None, 'persist': True, 'active_sub_modules': {207}, 'ds_tensor.shape': torch.Size([0])}

This stems from this statement

assert param.ds_status == ZeroParamStatus.AVAILABLE, param.ds_summary()

in deepspeed/runtime/zero/partitioned_param_coordinator.py.
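For context, this assertion is DeepSpeed's check that a ZeRO-3-partitioned parameter has been all-gathered (made AVAILABLE) before a module's forward touches it; a NOT_AVAILABLE parameter only holds an empty local shard, which matches the shape (0,) in the traceback above. A rough stdlib sketch of that invariant (the class names mirror DeepSpeed's, but this implementation is purely illustrative):

```python
from enum import Enum

class ZeroParamStatus(Enum):
    NOT_AVAILABLE = 0   # partitioned across ranks; local shard is empty
    INFLIGHT = 1        # all-gather in progress
    AVAILABLE = 2       # full tensor materialised on this rank

class ShardedParam:
    """Illustrative stand-in for a ZeRO-3 partitioned parameter."""

    def __init__(self, numel):
        self.full_numel = numel
        self.ds_status = ZeroParamStatus.NOT_AVAILABLE
        self.local_shape = (0,)  # the shape (0,) seen in the traceback

    def all_gather(self):
        # Reassemble the full tensor on this rank before the forward pass.
        self.ds_status = ZeroParamStatus.AVAILABLE
        self.local_shape = (self.full_numel,)

p = ShardedParam(1024)
# Running a forward pass now would trip exactly the failing assertion:
#   assert p.ds_status == ZeroParamStatus.AVAILABLE, ...
p.all_gather()
assert p.ds_status == ZeroParamStatus.AVAILABLE and p.local_shape == (1024,)
```

So the error suggests the reference model's parameters were never gathered for its forward pass, i.e. it was not prepared for ZeRO-3 the way the active model was.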

The first forward pass through the regular model succeeds, but this error occurs when we try to run a forward pass through the reference model in ppo_trainer::step:

        with torch.no_grad():
            all_logprobs, logits_or_none, values, masks = self.batched_forward_pass(
                self.model,
                queries,
                responses,
                model_inputs,
                response_masks=response_masks,
                return_logits=full_kl_penalty,
            )
            # for when the model is a peft model
            if self.is_peft_model and hasattr(
                self.accelerator.unwrap_model(self.model).pretrained_model,
                "disable_adapter",
            ):
                with self.accelerator.unwrap_model(self.model).pretrained_model.disable_adapter():
                    ref_logprobs, ref_logits_or_none, _, _ = self.batched_forward_pass(
                        self.model, queries, responses, model_inputs, return_logits=full_kl_penalty
                    )
            elif self.is_peft_model and not hasattr(self.model.pretrained_model, "disable_adapter"):
                raise ValueError(
                    "You are using a `peft` version that does not support `disable_adapter`. Please update your `peft` version to the latest version."
                )

            else:
                ref_logprobs, ref_logits_or_none, _, _ = self.batched_forward_pass( # <--------- * HERE *
                    self.ref_model, queries, responses, model_inputs, return_logits=full_kl_penalty
                )

I initially thought this was a separate problem (there's a discussion about a very similar error in deepspeedai/DeepSpeed#4229), but the suggested fix does not work in this situation, and the fact that this occurs when trying to use the reference model - and that the tensor in this error has shape (0,) - makes me wonder if it's actually related to the original `'weight' must be 2-D` issue.

Could you share the LLaMA run you mention in the description of this PR? Thank you!

For reference, I'm using

deepspeed==0.10.3
transformers==4.31.0
accelerate==0.23.0

lapp0 pushed a commit to lapp0/trl that referenced this pull request May 10, 2024
yxliu-TAMU pushed a commit to mincheolseong/ECEN743-GRPO-Project-Proposal that referenced this pull request Apr 20, 2025

Development

Successfully merging this pull request may close these issues.

DeepSpeed ZeRO-3 throws RuntimeError: 'weight' must be 2-D for sentiment_tuning.py

6 participants