
[examples] Big refactor of examples and documentation#509

Merged
younesbelkada merged 17 commits into main from refactor-examples
Jul 14, 2023

Conversation

@younesbelkada
Contributor

@younesbelkada younesbelkada commented Jul 10, 2023

This PR is an attempt to refactor most of the examples and adapt the documentation accordingly

Also added a small enhancement to the reward trainer so that it directly calls the prepare_model_for_int8_training method under the hood.

TODOs:

Examples:

  • Add a script for SFTTrainer
  • Modify the docs accordingly for SFTTrainer
  • Add a script for RewardTrainer
  • Modify the docs accordingly for RewardTrainer
  • Add one or two scripts for PPOTrainer
  • Modify the docs accordingly for PPOTrainer
  • Try to reproduce StackLlama using only the new scripts - let's move all research projects into a dedicated folder
  • Deal with examples/summarization folder

Removing all scripts related to merging adapter weights, since in PEFT you can simply do:

model = model.merge_and_unload()

Documentation

To discuss

cc @lvwerra @vwxyzjn

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Jul 10, 2023

The documentation is not available anymore as the PR was closed or merged.

@younesbelkada younesbelkada marked this pull request as ready for review July 11, 2023 09:19
@younesbelkada younesbelkada requested review from lvwerra and vwxyzjn July 11, 2023 09:19
Member

@lvwerra lvwerra left a comment


Overall, looks really great! Left a few small comments and two main points:

Doc structure

What do you think about simplifying/renaming the sections a bit in the docs:

  • API

    • Model Classes
    • Trainer Classes
    • Reward Model Training
    • Supervised Fine-Tuning
    • Best-of-N Sampling
  • Examples

    • Sentiment Tuning
    • Training with PEFT
    • Detoxifying LLMs
    • StackLlama
    • Multi-Adapter Training

I think we could also do a better job at giving an overview of the docs in the landing page on the docs.

Examples structure

For consistency I would call the reward examples folder reward_trainer. That said, I wonder if we shouldn't simplify the examples further to have only these folders:

  • research_projects
  • notebooks
  • scripts

Most of the folders only have 1-2 scripts anyway.


 if is_peft_available():
-    from peft import PeftModel, get_peft_model
+    from peft import PeftModel, get_peft_model, prepare_model_for_int8_training
Member


what about int4?

Contributor Author


It works for both. Currently PEFT has a method called prepare_model_for_kbit_training, but we have not made a release yet (it is the same method, just renamed).

Comment thread examples/reward_modeling/train_reward_model.py Outdated
@younesbelkada younesbelkada requested a review from lvwerra July 13, 2023 15:59
Contributor Author

@younesbelkada younesbelkada left a comment


Thanks @lvwerra for the extensive review, should be all good now!

Member

@lvwerra lvwerra left a comment


Looks great, I think we can merge soon! A few small nits:

Short Sections Names

It would be really nice if we could make all the section heads single line in the sidebar. Suggestions:

  • Training your own reward model -> Reward Model Training
  • Best of N sampling - Better model output without reinforcement learning -> Best of N Sampling
  • Multi Adapter RL (MARL) - a single base model for everything -> Multi Adapter RLHF or Multi Adapter RL?
  • Using Llama with TRL -> Training StackLlama

Section Capitalisation

Also, we should be consistent in capitalisation. I am in favour of capitalising the section heads, e.g. Reward Model Training rather than Reward model training. What do you think?

SFTTrainer

It seems to be missing from the Trainer Classes docs page.

Comment thread docs/source/index.mdx
Comment on lines +16 to +17
API documentation:

Member


What do you think about adding one sentence here per bullet point for people who don't know e.g. what an SFTTrainer does or what StackLlama is :)

Contributor Author


Makes sense!

@younesbelkada
Contributor Author

Looks much cleaner now! Addressed your comments @lvwerra!

@younesbelkada younesbelkada requested a review from lvwerra July 14, 2023 09:58
@younesbelkada younesbelkada merged commit 5c7bfbc into main Jul 14, 2023
@younesbelkada younesbelkada deleted the refactor-examples branch July 14, 2023 10:00
yxliu-TAMU pushed a commit to mincheolseong/ECEN743-GRPO-Project-Proposal that referenced this pull request Apr 20, 2025

* added sfttrainer and rmtrainer example scripts.

* added few lines in the documentation.

* moved notebooks.

* delete `examples/summarization`

* remove from docs as well

* refactor sentiment tuning

* more refactoring.

* updated docs for multi-adapter RL.

* add research projects folder

* more refactor

* refactor docs.

* refactor structure

* add correct scripts all over the place

* final touches

* final touches

* updated documentation from feedback.
