[examples] Big refactor of examples and documentation #509
younesbelkada merged 17 commits into main
The documentation is not available anymore as the PR was closed or merged.
lvwerra left a comment
Overall, looks really great! Left a few small comments and two main points:
Doc structure
What do you think about simplifying/renaming the sections a bit in the docs:
- API
  - Model Classes
  - Trainer Classes
  - Reward Model Training
  - Supervised Fine-Tuning
  - Best-of-N Sampling
- Examples
  - Sentiment Tuning
  - Training with PEFT
  - Detoxifying LLMs
  - StackLlama
  - Multi-Adapter Training
I think we could also do a better job of giving an overview of the docs on the landing page.
Examples structure
For consistency I would call the reward examples folder `reward_trainer`. Having said that, I wonder if we shouldn't simplify further and keep only these folders in the examples:
- research_projects
- notebooks
- scripts
Most of the folders only have 1-2 scripts anyway.
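For illustration, the simplified layout would look roughly like this (folder names from the list above; the contents are placeholders):

```
examples/
├── notebooks/
├── research_projects/
└── scripts/
```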
```diff
 if is_peft_available():
-    from peft import PeftModel, get_peft_model
+    from peft import PeftModel, get_peft_model, prepare_model_for_int8_training
```
It works for both. Currently in PEFT there is a method called `prepare_model_for_kbit_training`, but we haven't made a release yet (it's the same method, just renamed).
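A minimal sketch of how the import could be guarded so it works with both PEFT versions (purely illustrative, not the exact TRL code):

```python
# Prefer the renamed helper from PEFT main; fall back to the released name.
# Per the discussion above, the two are the same method, only renamed.
try:
    from peft import prepare_model_for_kbit_training
except ImportError:
    from peft import prepare_model_for_int8_training as prepare_model_for_kbit_training
```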
younesbelkada left a comment
Thanks @lvwerra for the extensive review, should be all good now!
lvwerra left a comment
Looks great, I think we can merge soon! A few small nits:
Short Section Names
It would be really nice if we could make all the section heads fit on a single line in the sidebar. Suggestions:
- Training your own reward model -> Reward Model Training
- Best of N sampling - Better model output without reinforcement learning -> Best of N Sampling
- Multi Adapter RL (MARL) - a single base model for everything -> Multi Adapter RLHF or Multi Adapter RL?
- Using Llama with TRL -> Training StackLlama
Section Capitalisation
Also, we should be consistent in capitalisation. I am in favour of capitalising the section heads, e.g. Reward Model Training rather than Reward model training. What do you think?
SFTTrainer
It seems to be missing from the Trainer Classes docs page.
> API documentation:
What do you think about adding one sentence here per bullet point, for people who don't know e.g. what an `SFTTrainer` could do or what StackLlama is :)
Looks much cleaner now! Addressed your comment @lvwerra!
Commits:
* added sfttrainer and rmtrainer example scripts.
* added few lines in the documentation.
* moved notebooks.
* delete `examples/summarization`
* remove from docs as well
* refactor sentiment tuning
* more refactoring.
* updated docs for multi-adapter RL.
* add research projects folder
* more refactor
* refactor docs.
* refactor structure
* add correct scripts all over the place
* final touches
* final touches
* updated documentation from feedback.
This PR is an attempt to refactor most of the examples and adapt the documentation accordingly.
Also added a small enhancement for the reward trainer to directly call the `prepare_model_for_int8_training` method under the hood.
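Roughly, the enhancement amounts to something like the following (a hedged sketch, not the actual TRL implementation; `wrap_with_peft` is a hypothetical helper):

```python
from peft import get_peft_model, prepare_model_for_int8_training

def wrap_with_peft(model, peft_config):
    # If the base model was loaded in 8-bit, prepare it for training
    # before attaching the PEFT adapters.
    if getattr(model, "is_loaded_in_8bit", False):
        model = prepare_model_for_int8_training(model)
    return get_peft_model(model, peft_config)
```

TODOs: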
Examples:
- Remove the `examples/summarization` folder
- Remove all scripts related to merging adapter weights, since in PEFT you can simply do the merge directly (see the sketch below)
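For reference, a minimal sketch of merging adapter weights with PEFT directly (the model id and adapter path are placeholders):

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model and attach the trained adapter.
base_model = AutoModelForCausalLM.from_pretrained("base-model-id")  # placeholder id
model = PeftModel.from_pretrained(base_model, "path/to/adapter")    # placeholder path

# Merge the adapter weights into the base model and drop the PEFT wrappers.
merged_model = model.merge_and_unload()
```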
Documentation
To discuss
cc @lvwerra @vwxyzjn