Conversation
/benchmark-trl-experiments benchmark/benchmark_level1.sh

Benchmark on Comment: succeeded ✅

The documentation is not available anymore as the PR was closed or merged.

/benchmark-trl-experiments benchmark/benchmark_level2.sh

Benchmark on Comment: succeeded ✅

/benchmark-trl-experiments benchmark/benchmark_level1.sh

/benchmark-trl-experiments benchmark/benchmark_level2.sh

Benchmark on Comment: succeeded ✅

Benchmark on Comment: succeeded ✅
The Cerebras results are expected: the run is training against a random reward model (the sentiment-analysis pipeline is backed by the base Cerebras-GPT-6.7B checkpoint, whose classification head is untrained, so its rewards are essentially noise), so its reward learning curve should be more chaotic.
lewtun left a comment:
Thanks a lot for adding this sweet benchmark 🚀! I left a comment about adding a benchmark for ZeRO-3, but that can also be a separate PR if you prefer.
```diff
@@ -1,4 +1,4 @@
-# compound
+# compound: gpt2xl + grad_accu
```
For my own understanding, is this compound arg documented somewhere?
The compound comment simply means we are using more features at once (e.g., in this case, we are using a larger model and gradient accumulation at the same time) :)
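For illustration, a compound entry of that sort might look like the sketch below. It is modeled on the Cerebras-GPT command further down in this diff; the experiment name, batch size, and the `--ppo_config.gradient_accumulation_steps` flag are assumptions, not lines from this PR.

```bash
# Hypothetical compound entry: gpt2-xl combined with gradient accumulation.
# Modeled on the Cerebras-GPT command below; flags are assumptions, not PR lines.
python benchmark/benchmark.py \
    --command "accelerate launch examples/scripts/sentiment_tuning.py --ppo_config.exp_name sentiment_tuning_gpt2xl_grad_accu --ppo_config.model_name gpt2-xl --ppo_config.mini_batch_size 16 --ppo_config.gradient_accumulation_steps 8 --ppo_config.log_with wandb"
```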
```bash
# compound: Cerebras-GPT-6.7B + deepspeed zero2 + grad_accu
python benchmark/benchmark.py \
    --command "accelerate launch --config_file examples/accelerate_configs/deepspeed_zero2.yaml examples/scripts/sentiment_tuning.py --ppo_config.exp_name sentiment_tuning_Cerebras-GPT-6.7B_grad_accu_deepspeed_stage2 --ppo_config.batch_size 32 --ppo_config.mini_batch_size 32 --ppo_config.log_with wandb --ppo_config.model_name cerebras/Cerebras-GPT-6.7B --ppo_config.reward_model sentiment-analysis:cerebras/Cerebras-GPT-6.7B" \
```
Eventually I think we should do the "proper" thing and fine-tune these models on IMDB so we have a genuinely good policy / reward model. Of course, that's not necessary for this PR, but it would be good to be as realistic as possible for the benchmark.
I think that sounds good. Perhaps we can set up an end-to-end example where we first train the reward model and then the policy model; see the sketch below.
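As a rough sketch of what such an end-to-end pipeline could look like (the `reward_modeling.py` script path, the output directory, and all flags here are hypothetical illustrations, not entry points confirmed by this PR):

```bash
# Hypothetical end-to-end flow: fine-tune a reward model on IMDB first,
# then run PPO against the resulting checkpoint. Script names and flags
# are illustrative assumptions.
accelerate launch examples/scripts/reward_modeling.py --model_name gpt2 --dataset_name imdb --output_dir models/gpt2-imdb-reward
accelerate launch examples/scripts/sentiment_tuning.py --ppo_config.model_name gpt2 --ppo_config.reward_model sentiment-analysis:models/gpt2-imdb-reward
```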
The command ends with the slurm template flag; the only change in the diff here is adding the missing trailing newline at the end of the file:

```bash
    --slurm-template-path benchmark/trl.slurm_template
```
```bash
# compound: Cerebras-GPT-6.7B + deepspeed zero2 + grad_accu
```
Should we also benchmark ZeRO-3?
Let's probably do this in a separate PR.
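For reference, a ZeRO-3 entry would likely just swap the accelerate config and experiment name in the zero2 command above; here is a sketch, assuming a `deepspeed_zero3.yaml` config exists alongside the zero2 one (neither the path nor the exp_name comes from this PR):

```bash
# Hypothetical ZeRO-3 counterpart of the zero2 benchmark entry above.
# The deepspeed_zero3.yaml path and stage3 exp_name are assumptions.
python benchmark/benchmark.py \
    --command "accelerate launch --config_file examples/accelerate_configs/deepspeed_zero3.yaml examples/scripts/sentiment_tuning.py --ppo_config.exp_name sentiment_tuning_Cerebras-GPT-6.7B_grad_accu_deepspeed_stage3 --ppo_config.batch_size 32 --ppo_config.mini_batch_size 32 --ppo_config.log_with wandb --ppo_config.model_name cerebras/Cerebras-GPT-6.7B --ppo_config.reward_model sentiment-analysis:cerebras/Cerebras-GPT-6.7B"
```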
* Add deepspeed experiment
* add deepspeed pip install
* update hello world.sh
* update comments
* remove cleanup



