[DPO] remove response/pairs from the DPO side (#540)
Conversation
@@ -95,11 +95,12 @@ def split_prompt_and_responses(ex):
def gen():
This function can be simplified a lot now:

```python
def get_hh(split: str, sanity_check: bool = False, silent: bool = False, cache_dir: str = None) -> Dataset:
    """Load the Anthropic Helpful-Harmless dataset from Hugging Face and convert it to the necessary format.

    The dataset is converted to a dictionary with the following structure:
    {
        'prompt': List[str],
        'chosen': List[str],
        'rejected': List[str],
    }

    Prompts should be structured as follows:
      \n\nHuman: <prompt>\n\nAssistant:
    Multiple turns are allowed, but the prompt should always start with \n\nHuman: and end with \n\nAssistant:.
    """
    dataset = load_dataset("Anthropic/hh-rlhf", split=split, cache_dir=cache_dir)
    if sanity_check:
        dataset = dataset.select(range(min(len(dataset), 1000)))

    def split_prompt_and_responses(sample) -> Dict[str, str]:
        prompt = extract_anthropic_prompt(sample["chosen"])
        return {
            "prompt": prompt,
            "chosen": sample["chosen"][len(prompt) :],
            "rejected": sample["rejected"][len(prompt) :],
        }

    return dataset.map(split_prompt_and_responses)
```

I can't push this to the PR directly or add this as a suggestion in the review, sadly.
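For context, the snippet above relies on an `extract_anthropic_prompt` helper to recover the shared prompt prefix from a full HH conversation. A minimal self-contained sketch of how such a helper can work (this is an illustration, not necessarily the exact implementation in the repo):

```python
def extract_anthropic_prompt(prompt_and_response: str) -> str:
    """Return everything up to and including the final Assistant marker.

    An HH sample stores the whole conversation; the prompt is the prefix
    shared by the chosen and rejected completions, ending at the last
    "\\n\\nAssistant:" turn.
    """
    search_term = "\n\nAssistant:"
    search_term_idx = prompt_and_response.rfind(search_term)
    assert search_term_idx != -1, f"Sample does not contain {search_term!r}"
    return prompt_and_response[: search_term_idx + len(search_term)]


sample = "\n\nHuman: What color is the sky?\n\nAssistant: It is blue."
prompt = extract_anthropic_prompt(sample)
# prompt == "\n\nHuman: What color is the sky?\n\nAssistant:"
response = sample[len(prompt):]
# response == " It is blue."
```

Slicing with `len(prompt)` then gives exactly the `chosen`/`rejected` completions that `split_prompt_and_responses` returns.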
you should be able to push to my branch now... please feel free!

let me fix up the docs and tests
@younesbelkada or @lvwerra should be ready for review! |
younesbelkada left a comment:
Looks good on my side, thanks a lot everyone for working on this!
Since we have not released a PyPI version with DPOTrainer yet, it's ok to have the breaking change now!
I appreciate that you left some time for people to find these issues before releasing. I'd love to look into the spamminess of the logs during training as well. Do you know when you intend to release?
yes @tomaarsen let's open another PR for the logging issue... ideally, I wanted to have this all logged to wandb etc., and note we also have the sample generation helper, which is currently not being used...
I can help with this, as I have a good amount of experience with how Transformers does its logging.
yup! for some reason I was not getting it to log with the trainer, so I "forced" it to log things via the call to:

```python
if self.accelerator.is_main_process:
    self.log_metrics("test", metrics)
```

would appreciate any insight here!
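One likely reason the forced call was needed: `log_metrics` only pretty-prints to stdout, whereas routing metrics through the trainer's logging path fans them out to all registered integrations (wandb, TensorBoard, ...). A minimal sketch of that fan-out pattern, with hypothetical names (the real Trainer wires this through its callback handler, not this class):

```python
from typing import Callable, Dict, List


class CallbackLogger:
    """Fan metrics out to every registered sink (stand-ins for wandb/stdout hooks)."""

    def __init__(self) -> None:
        self._sinks: List[Callable[[Dict[str, float]], None]] = []

    def register(self, sink: Callable[[Dict[str, float]], None]) -> None:
        self._sinks.append(sink)

    def log(self, metrics: Dict[str, float]) -> None:
        # Every sink receives the same metrics dict, so all integrations
        # see e.g. the DPO reward/accuracy metrics without a forced call.
        for sink in self._sinks:
            sink(metrics)


logger = CallbackLogger()
seen: List[Dict[str, float]] = []
logger.register(seen.append)  # stand-in for a wandb callback
logger.log({"rewards/accuracies": 0.75})
# seen == [{"rewards/accuracies": 0.75}]
```

The design choice this illustrates: emit metrics once through a single dispatch point and let each backend subscribe, rather than calling each backend directly from the training loop.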
Thanks everyone! For the release, mid-next week would probably be a nice time, but I'm not sure, @lvwerra?
* remove response/pairs from the DPO side
* Simplify get_hh helper function
* removed unused import
* update tests and docs for dpo_trainer

---------

Co-authored-by: Tom Aarsen <Cubiegamedev@gmail.com>
Co-authored-by: Shoaib Burq <saburq@gmail.com>
fixes #537
TODO: