
[examples] Big refactor of examples and documentation#509

Merged
younesbelkada merged 17 commits into main from refactor-examples
Jul 14, 2023

Conversation

@younesbelkada
Contributor

@younesbelkada younesbelkada commented Jul 10, 2023

This PR is an attempt to refactor most of the examples and adapt the documentation accordingly

Also added a small enhancement to the reward trainer so that it directly calls the prepare_model_for_int8_training method under the hood.

TODOs:

Examples:

  • Add a script for SFTTrainer
  • Modify the docs accordingly for SFTTrainer
  • Add a script for RewardTrainer
  • Modify the docs accordingly for RewardTrainer
  • Add one or two scripts for PPOTrainer
  • Modify the docs accordingly for PPOTrainer
  • Try to reproduce StackLlama using only the new scripts - let's move all research projects into a dedicated folder
  • Deal with examples/summarization folder

Removing all scripts related to merging adapter weights, since in PEFT you can simply do:

model = model.merge_and_unload()

Documentation

To discuss

cc @lvwerra @vwxyzjn

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Jul 10, 2023

The documentation is not available anymore as the PR was closed or merged.

@younesbelkada younesbelkada marked this pull request as ready for review July 11, 2023 09:19
@younesbelkada younesbelkada requested review from lvwerra and vwxyzjn July 11, 2023 09:19
Member

@lvwerra lvwerra left a comment


Overall, looks really great! Left a few small comments and two main points:

Doc structure

What do you think about simplifying/renaming the sections a bit in the docs:

  • API

    • Model Classes
    • Trainer Classes
    • Reward Model Training
    • Supervised Fine-Tuning
    • Best-of-N Sampling
  • Examples

    • Sentiment Tuning
    • Training with PEFT
    • Detoxifying LLMs
    • StackLlama
    • Multi-Adapter Training

I think we could also do a better job at giving an overview of the docs in the landing page on the docs.

Examples structure

For consistency I would call the reward examples folder reward_trainer. That said, I wonder if we shouldn't simplify the examples further to have only these folders:

  • research_projects
  • notebooks
  • scripts

Most of the folders only have 1-2 scripts anyway.


 if is_peft_available():
-    from peft import PeftModel, get_peft_model
+    from peft import PeftModel, get_peft_model, prepare_model_for_int8_training
Member


what about int4?

Contributor Author


It works for both. Currently PEFT has a method called prepare_model_for_kbit_training, but we have not made a release yet (it is the same method, just renamed).

Comment thread examples/reward_modeling/train_reward_model.py Outdated
@younesbelkada younesbelkada requested a review from lvwerra July 13, 2023 15:59
Contributor Author

@younesbelkada younesbelkada left a comment


Thanks @lvwerra for the extensive review, should be all good now!

Member

@lvwerra lvwerra left a comment


Looks great, I think we can merge soon! A few small nits:

Short Sections Names

It would be really nice if we could make all the section heads single line in the sidebar. Suggestions:

  • Training your own reward model -> Reward Model Training
  • Best of N sampling - Better model output without reinforcement learning -> Best of N Sampling
  • Multi Adapter RL (MARL) - a single base model for everything -> Multi Adapter RLHF or Multi Adapter RL?
  • Using Llama with TRL -> Training StackLlama

Section Capitalisation

Also, we should be consistent in capitalisation. I am in favour of capitalising the section heads, e.g. Reward Model Training rather than Reward model training. What do you think?

SFTTrainer

It seems to be missing from the Trainer Classes docs page.

Comment thread docs/source/index.mdx
Comment on lines +16 to +17
API documentation:

Member


What do you think about adding one sentence here per bullet point for people who don't know e.g. what an SFTTrainer does or what StackLlama is :)

Contributor Author


Makes sense!

@younesbelkada
Contributor Author

Looks much cleaner now! Addressed your comments @lvwerra!

@younesbelkada younesbelkada requested a review from lvwerra July 14, 2023 09:58
@younesbelkada younesbelkada merged commit 5c7bfbc into main Jul 14, 2023
@younesbelkada younesbelkada deleted the refactor-examples branch July 14, 2023 10:00
yxliu-TAMU pushed a commit to mincheolseong/ECEN743-GRPO-Project-Proposal that referenced this pull request Apr 20, 2025

* added sfttrainer and rmtrainer example scripts.

* added few lines in the documentation.

* moved notebooks.

* delete `examples/summarization`

* remove from docs as well

* refactor sentiment tuning

* more refactoring.

* updated docs for multi-adapter RL.

* add research projects folder

* more refactor

* refactor docs.

* refactor structure

* add correct scripts all over the place

* final touches

* final touches

* updated documentation from feedback.
