
About the used evaluation set #4

Closed
jiacheng-ye opened this issue Mar 5, 2023 · 3 comments
@jiacheng-ye

Hi, thanks for your great work!

  1. In Figure 3 of the paper, the TL;DR summarization task is used to report the ROUGE metric. I'm wondering where the dataset is. Is it from `load_dataset('openai/summarize_from_feedback', 'validation')`, with ROUGE calculated between the generated summary and the higher-scored summary?
  2. In Figure 4, what does the multiple-choice prompt look like?
@lhao499 (Owner)

lhao499 commented Mar 7, 2023

Thanks for the nice words.

Yes, the validation split is used for evaluation, with the instruction being 'a good summary is:'. For evaluation on hh-rlhf, the choice template is 'The following is a dialogue: {dialogue}. This dialogue is {choice}', where choice is either 'good' or 'bad', selected by likelihood.
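For anyone reproducing this, the likelihood-based selection could be sketched as below. `score_log_likelihood` is a hypothetical stand-in for however the model assigns a sequence log-probability; only the template string comes from the comment above.

```python
def pick_choice(dialogue, score_log_likelihood, choices=("good", "bad")):
    """Fill the choice template with each candidate label and return the
    label whose completed prompt the model scores highest."""
    template = "The following is a dialogue: {dialogue}. This dialogue is {choice}"
    scored = {
        choice: score_log_likelihood(template.format(dialogue=dialogue, choice=choice))
        for choice in choices
    }
    return max(scored, key=scored.get)


if __name__ == "__main__":
    # Toy scorer standing in for a real LM: prefers prompts ending in "good".
    toy_scorer = lambda text: 0.0 if text.endswith("good") else -1.0
    print(pick_choice("Human: Hi! Assistant: Hello!", toy_scorer))  # good
```

In practice the scorer would sum the model's token log-probs over the completed prompt (or just over the choice word), but the argmax-over-templates structure is the same.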

@jiacheng-ye (Author)

Thanks.
I notice that the validation set can contain multiple summaries for the same document, spread across different instances; however, I guess only the "policy"="ref" one is human-written? Did you preprocess first to keep the "ref" summary for each document (which would shrink the validation set), or just use the higher-scored summary as the oracle in each instance (which may not be human-written, as in the figures below)?
[screenshots of example validation instances]

@lhao499 (Owner)

lhao499 commented Jun 14, 2023

I apologize for the delay. We chose the higher-scored summary as the oracle in our experiments.
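Assuming the comparison-style schema of `openai/summarize_from_feedback` (each record holds a list of candidate `summaries` and a `choice` index marking the annotator-preferred one), picking the higher-scored summary as the oracle might look like this sketch; the example record is made up to mimic the layout, not real data.

```python
def oracle_summary(record):
    """Return the preferred (higher-scored) summary text from a
    summarize_from_feedback-style comparison record."""
    return record["summaries"][record["choice"]]["text"]


# Toy record mimicking the assumed dataset layout (not real data).
example = {
    "summaries": [
        {"text": "short summary A", "policy": "sup2"},
        {"text": "short summary B", "policy": "ref"},
    ],
    "choice": 1,  # annotators preferred the second summary
}
print(oracle_summary(example))  # short summary B
```

ROUGE would then be computed between the model's generated summary and `oracle_summary(record)` for each validation instance.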

@lhao499 lhao499 closed this as completed Jun 16, 2023