
[Doc] Improve docs #91

Merged 15 commits into huggingface:main on Jan 18, 2023

Conversation

younesbelkada
Contributor

@younesbelkada younesbelkada commented Jan 17, 2023

Add the following to the documentation:

API

  • Model classes (AutoModelForCausalLMWithValueHead & PreTrainedModelWrapper)
  • Trainer (PPOTrainer & PPOConfig)
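As a rough illustration of the wrapper pattern these model classes follow, here is a hedged plain-Python sketch: a base model is wrapped, and a scalar "value head" is stacked on top of its hidden states. Only the two class names come from this PR; every other name, signature, and the toy base model are hypothetical and far simpler than the real trl implementation.

```python
# Plain-Python sketch of the wrapper pattern described above: a base model is
# wrapped by PreTrainedModelWrapper, and AutoModelForCausalLMWithValueHead
# adds a scalar "value head" on top of its hidden states. Everything except
# the two class names is hypothetical and much simpler than the real trl code.

class PreTrainedModelWrapper:
    """Thin wrapper that holds a pretrained model and forwards calls to it."""

    def __init__(self, pretrained_model):
        self.pretrained_model = pretrained_model

    def forward(self, *args, **kwargs):
        return self.pretrained_model.forward(*args, **kwargs)


class AutoModelForCausalLMWithValueHead(PreTrainedModelWrapper):
    """Adds a value head mapping each hidden state to a scalar value estimate."""

    def __init__(self, pretrained_model, value_weights):
        super().__init__(pretrained_model)
        self.value_weights = value_weights  # one weight per hidden dimension

    def forward(self, *args, **kwargs):
        logits, hidden_states = super().forward(*args, **kwargs)
        # One scalar per token: dot product of hidden state and head weights.
        values = [
            sum(w * h for w, h in zip(self.value_weights, hs))
            for hs in hidden_states
        ]
        return logits, values


class _ToyBaseModel:
    """Stand-in for a pretrained causal LM, returning fixed hidden states."""

    def forward(self):
        return ["<logits>"], [[1.0, 2.0], [3.0, 4.0]]


model = AutoModelForCausalLMWithValueHead(_ToyBaseModel(), [0.5, 0.5])
logits, values = model.forward()
```

The value head is what lets PPO estimate per-token values from the same backbone that produces the language-model logits.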

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Jan 17, 2023

The documentation is not available anymore as the PR was closed or merged.

@younesbelkada younesbelkada mentioned this pull request Jan 17, 2023
26 tasks
@younesbelkada younesbelkada changed the title [Draft] Improve docs [Doc] Improve docs Jan 17, 2023
@younesbelkada younesbelkada marked this pull request as ready for review January 17, 2023 16:32
Member

@lvwerra lvwerra left a comment


Looks great! Left a few small comments. Also, for train_minibatch we could make the docstring a bit better: "Train the model for one PPO mini-batch."
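To make the train_minibatch suggestion concrete: in PPO, the rollout batch is split into mini-batches, and one optimization step is run per chunk. A hedged pure-Python sketch of just the splitting (helper name and shape are illustrative, not trl's actual code, which operates on tensors of queries, responses, and rewards):

```python
# Hypothetical helper illustrating the mini-batch step: the rollout batch is
# split into consecutive mini-batches, and train_minibatch would then run one
# optimization step per chunk. Not trl's actual code.

def split_into_minibatches(batch, minibatch_size):
    return [
        batch[i : i + minibatch_size]
        for i in range(0, len(batch), minibatch_size)
    ]
```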

@@ -0,0 +1,12 @@
# Trainer

In TRL we plan to release several RLHF algorithms. We started our journey with PPO (Proximal Policy Optimisation), with an implementation that largely follows the structure introduced in the paper "Fine-Tuning Language Models from Human Preferences" by D. Ziegler et al. [[paper](https://arxiv.org/pdf/1909.08593.pdf), [code](https://github.com/openai/lm-human-preferences)].
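For context on the algorithm the quoted doc page introduces, PPO's central piece is the clipped surrogate objective. A hedged single-sample, pure-Python sketch (not trl's implementation; the real code works on tensors over whole mini-batches):

```python
import math

def ppo_clipped_objective(logprob_new, logprob_old, advantage, clip_range=0.2):
    # Probability ratio between the new and old policy for this sample.
    ratio = math.exp(logprob_new - logprob_old)
    unclipped = ratio * advantage
    # Clip the ratio so a single update cannot move the policy too far.
    clipped = max(min(ratio, 1.0 + clip_range), 1.0 - clip_range) * advantage
    # PPO maximizes the pessimistic (minimum) of the two terms.
    return min(unclipped, clipped)
```

When the new and old policies agree, the objective reduces to the advantage itself; when the ratio drifts past the clip range, the update stops getting credit for moving further.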
Member


Since adding new algorithms is not on the roadmap at the moment maybe let's just focus on PPO :)

We could also add a sentence or two about the classes, e.g. that they are inspired/influenced by transformers.Trainer and adapted to RL.

Contributor Author


Thanks! Adapted the text in da456cf

trl/trainer/ppo_trainer.py Outdated Show resolved Hide resolved
trl/trainer/ppo_trainer.py Outdated Show resolved Hide resolved
trl/trainer/ppo_trainer.py Outdated Show resolved Hide resolved
docs/source/models.mdx Outdated Show resolved Hide resolved
@younesbelkada younesbelkada merged commit 77273d1 into huggingface:main Jan 18, 2023