
Add deepspeed experiment#795

Merged
vwxyzjn merged 5 commits into huggingface:main from vwxyzjn:more-benchmark
Sep 20, 2023

Conversation

@vwxyzjn
Contributor

@vwxyzjn vwxyzjn commented Sep 19, 2023

No description provided.

@vwxyzjn
Contributor Author

vwxyzjn commented Sep 19, 2023

/benchmark-trl-experiments benchmark/benchmark_level1.sh

@github-actions
Contributor

Benchmark on Comment: succeeded ✅
https://github.com/huggingface/trl/actions/runs/6239403115

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Sep 19, 2023

The documentation is not available anymore as the PR was closed or merged.

@vwxyzjn
Contributor Author

vwxyzjn commented Sep 19, 2023

/benchmark-trl-experiments benchmark/benchmark_level2.sh

@github-actions
Contributor

Benchmark on Comment: succeeded ✅
https://github.com/huggingface/trl/actions/runs/6239487870

@vwxyzjn
Contributor Author

vwxyzjn commented Sep 19, 2023

/benchmark-trl-experiments benchmark/benchmark_level1.sh

@vwxyzjn
Contributor Author

vwxyzjn commented Sep 19, 2023

/benchmark-trl-experiments benchmark/benchmark_level2.sh

@github-actions
Contributor

Benchmark on Comment: succeeded ✅
https://github.com/huggingface/trl/actions/runs/6239645721

@github-actions
Contributor

Benchmark on Comment: succeeded ✅
https://github.com/huggingface/trl/actions/runs/6239646225

@vwxyzjn
Contributor Author

vwxyzjn commented Sep 19, 2023

[COSTA BENCHMARK BOT]: Here are the results
different_models.png
different_models-time.png

@vwxyzjn
Contributor Author

vwxyzjn commented Sep 19, 2023

[COSTA BENCHMARK BOT]: Here are the results
different_models.png
deepspeed-time.png
different_models-time.png
deepspeed.png

@vwxyzjn
Contributor Author

vwxyzjn commented Sep 20, 2023

The Cerebras results are expected: it is training against a random reward model, so its reward learning curve should be more chaotic.

@vwxyzjn vwxyzjn requested a review from lewtun September 20, 2023 13:16
Member

@lewtun lewtun left a comment

Thanks a lot for adding this sweet benchmark 🚀! I left a comment about adding a benchmark for ZeRO-3, but that can also be a separate PR if you prefer.

Comment thread benchmark/benchmark_level2.sh Outdated
@@ -1,4 +1,4 @@
-# compound
+# compound: gpt2xl + grad_accu
Member

For my own understanding, is this compound arg documented somewhere?

Contributor Author

The compound comment simply means we are using more features at once (e.g., in this case, we are using a larger model and gradient accumulation at the same time) :)
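To make the "compound" idea concrete, here is a hedged sketch of what such an entry could look like (the exp_name, model, and PPO flag values here are illustrative assumptions modeled on the Cerebras entry in this file, not the script's actual gpt2xl line):

```shell
# Hypothetical "compound" benchmark entry: a larger model (gpt2-xl)
# combined with gradient accumulation. Only --command and
# --slurm-template-path are flags seen in this PR; the PPO config
# values below are illustrative, not the real benchmark settings.
python benchmark/benchmark.py \
    --command "python examples/scripts/sentiment_tuning.py \
        --ppo_config.exp_name sentiment_tuning_gpt2xl_grad_accu \
        --ppo_config.model_name gpt2-xl \
        --ppo_config.mini_batch_size 16 \
        --ppo_config.gradient_accumulation_steps 8 \
        --ppo_config.log_with wandb" \
    --slurm-template-path benchmark/trl.slurm_template
```

This is a script fragment that submits a GPU training job via SLURM, so it is not runnable standalone.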


# compound: Cerebras-GPT-6.7B + deepspeed zero2 + grad_accu
python benchmark/benchmark.py \
--command "accelerate launch --config_file examples/accelerate_configs/deepspeed_zero2.yaml examples/scripts/sentiment_tuning.py --ppo_config.exp_name sentiment_tuning_Cerebras-GPT-6.7B_grad_accu_deepspeed_stage2 --ppo_config.batch_size 32 --ppo_config.mini_batch_size 32 --ppo_config.log_with wandb --ppo_config.model_name cerebras/Cerebras-GPT-6.7B --ppo_config.reward_model sentiment-analysis:cerebras/Cerebras-GPT-6.7B" \
Member

Eventually I think we should do the "proper" thing and fine-tune these models on IMDB so we have a genuinely good policy / reward model. Of course, that's not necessary for this PR, but it would be good to make the benchmark as realistic as possible.

Contributor Author

I think that sounds good. Perhaps we can set up an end-to-end example where we train the reward model and then the policy model at the same time.

Comment thread benchmark/benchmark_level2.sh Outdated
-    --slurm-template-path benchmark/trl.slurm_template
\ No newline at end of file
+    --slurm-template-path benchmark/trl.slurm_template

# compound: Cerebras-GPT-6.7B + deepspeed zero2 + grad_accu
Member

Should we also benchmark ZeRO-3?

Contributor Author

Let's probably do this in a separate PR.
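For reference, a ZeRO-3 variant would presumably just swap the accelerate config file. A hedged sketch, mirroring the zero2 entry above (the deepspeed_zero3.yaml path and the stage3 exp_name are assumptions, not files confirmed by this PR):

```shell
# Hypothetical ZeRO-3 benchmark entry; identical to the zero2 entry in
# this PR except for the assumed config path and exp_name suffix.
python benchmark/benchmark.py \
    --command "accelerate launch --config_file examples/accelerate_configs/deepspeed_zero3.yaml examples/scripts/sentiment_tuning.py --ppo_config.exp_name sentiment_tuning_Cerebras-GPT-6.7B_grad_accu_deepspeed_stage3 --ppo_config.batch_size 32 --ppo_config.mini_batch_size 32 --ppo_config.log_with wandb --ppo_config.model_name cerebras/Cerebras-GPT-6.7B --ppo_config.reward_model sentiment-analysis:cerebras/Cerebras-GPT-6.7B" \
    --slurm-template-path benchmark/trl.slurm_template
```

Like the other entries, this submits a GPU training job through SLURM and is not runnable standalone.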

@vwxyzjn vwxyzjn merged commit b8f0c4c into huggingface:main Sep 20, 2023
lapp0 pushed a commit to lapp0/trl that referenced this pull request May 10, 2024
* Add deepspeed experiment

* add deepspeed pip install

* update hello world.sh

* update comments

* remove cleanup
yxliu-TAMU pushed a commit to mincheolseong/ECEN743-GRPO-Project-Proposal that referenced this pull request Apr 20, 2025
* Add deepspeed experiment

* add deepspeed pip install

* update hello world.sh

* update comments

* remove cleanup
3 participants