docs: add initial version of docs for PPOTrainer #665

Merged on Sep 14, 2023 (5 commits)

davidberenstein1957 (Member) commented on Aug 20, 2023:

As discussed in #623, I am proposing more elaborate docs for the PPOTrainer.

Closes #623

HuggingFaceDocBuilderDev commented on Sep 1, 2023:

The documentation is not available anymore as the PR was closed or merged.

lvwerra (Member) left a comment:

Thanks a lot for the docs contribution! The preview is also working now :) The PR looks in pretty good shape to me! I added some small suggestions here and there. I'll also let @vwxyzjn and @younesbelkada have a look.

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
- specified reference to reward model
- added batched generator
- added line of saving model
- remove reference model
davidberenstein1957 (Member, Author) commented:

@lvwerra I already processed your comments and suggestions.

lvwerra (Member) left a comment:

Looks good to me, some last small nits only!

younesbelkada (Contributor) left a comment:

This is very cool! Thanks a lot for your great effort on this!

younesbelkada merged commit 3f7710a into huggingface:main on Sep 14, 2023
11 checks passed
kushal-tri pushed a commit to kushalarora/trl that referenced this pull request Sep 19, 2023
* docs: add initial version of docs for `PPOTrainer`

* Apply suggestions from code review Leandro

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

* updated docs based on feedback leandro
- specified reference to reward model
- added batched generator
- added line of saving model
- remove reference model

* Apply suggestions from code review

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

---------

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
lapp0 pushed a commit to lapp0/trl that referenced this pull request May 10, 2024
Linked issue: [DOCS] PPOTrainer references are missing in the API docs