Conversation
/benchmark-trl-experiments benchmark/benchmark_level1.sh
Benchmark on Comment: succeeded ✅
vwxyzjn
left a comment
Love the standardization! Very nice change. I assume multi_adapter_rl.py is deprecated in favor of multi_adapter_rl_v2.py (the now run_ppo_multi_adapter.py)?
Yes, that's correct!
/benchmark-trl-experiments benchmark/benchmark_level1.sh
Benchmark on Comment: failed ❌
/benchmark-trl-experiments benchmark/benchmark_level1.sh
Benchmark on Comment: succeeded ✅
lvwerra
left a comment
Generally looks great, thanks! Small nit: I don't like the run_xxx.py naming that much, I think just xxx.py would do the job and be less redundant.
Good idea! Done in a6d1d90. I'll merge if all the tests still pass.
LG!
* Standardise example scripts * fix plotting script * Rename run_xxx to xxx * Fix doc --------- Co-authored-by: Costa Huang <costa.huang@outlook.com>
* enable xpu support * fix bug * review commits * fix style * add xou decorator * refactor review commit * fix test * review commit * fix test * Update benchmark.yml (#856) * Standardise example scripts (#842) * Standardise example scripts * fix plotting script * Rename run_xxx to xxx * Fix doc --------- Co-authored-by: Costa Huang <costa.huang@outlook.com> * Fix version check in import_utils.py (#853) * dont use get_peft_model if model is already peft (#857) * merge conflict * add xou decorator * resolve * resolves * upstream * refactor and precommit * fix new tests * add device mapping for xpu --------- Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com> Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> Co-authored-by: Costa Huang <costa.huang@outlook.com> Co-authored-by: Adam Pauls <adpauls@gmail.com> Co-authored-by: abhishek thakur <1183441+abhishekkrthakur@users.noreply.github.com>


This PR standardises all the example scripts to follow the run_xxx.py convention, where xxx typically refers to the algorithm instead of the task (i.e. have just 1 PPO example instead of calling it "sentiment tuning"). The resulting structure is as follows:

IMO this makes it a bit easier for newcomers to know what each script does by filename instead of guessing whether e.g. multi adapter RL refers to PPO or something else.

I also deleted an old and duplicate multi adapter RL script, multi_adapter_rl.py, which seems to be outdated.

Eventually, we could harmonize the scripts so that the SFT and reward models produced by run_sft.py and run_reward_modeling.py are the same ones that feed into run_ppo.py and run_dpo.py. This would give a true end to end pipeline that is maintained & solid for many people to work from :)
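
To make the proposed end to end ordering concrete, here is a rough sketch of the dependency graph between the four scripts. It only models which script consumes which script's output; it does not call the scripts or assume any of their command-line arguments. The `PIPELINE` map and `execution_order` helper are hypothetical names, and the exact edges (PPO needing both the SFT and reward models, DPO needing only the SFT model) are my assumption rather than something this PR implements.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: script -> scripts whose outputs it consumes.
# Assumption (not part of this PR): PPO uses both the SFT model and the
# reward model, while DPO only starts from the SFT model.
PIPELINE = {
    "run_sft.py": set(),
    "run_reward_modeling.py": set(),
    "run_ppo.py": {"run_sft.py", "run_reward_modeling.py"},
    "run_dpo.py": {"run_sft.py"},
}


def execution_order(pipeline):
    """Return the scripts in an order that respects every dependency."""
    return list(TopologicalSorter(pipeline).static_order())


if __name__ == "__main__":
    for step, script in enumerate(execution_order(PIPELINE), start=1):
        print(step, script)
```

Any order it returns puts run_sft.py and run_reward_modeling.py before run_ppo.py, which is the property a harmonized pipeline would need to keep.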