SamSum dataset #1

Merged: 8 commits merged into neurips_eval on Nov 8, 2023
Conversation

msaroufim (Member) commented Oct 31, 2023

Ok, this is working now.

Right now the summarization task asks for a 1-sentence summary of the content; we can change this by changing the adapter spec:

    adapter_spec = get_summarization_adapter_spec(
        num_sents=1, # change this if you need to @weiweiy 
        max_tokens=128,
        temperature=0.3,
    )
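(The num_sents=1 setting is what produces the "Summarize the above article in 1 sentence." instruction visible in the sample prompt below.)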

Used this as the dataset link, since private links are not playing well with wget: https://gist.githubusercontent.com/msaroufim/3f1845a5d93b50d849c42b7baeb2f716/raw/11c2d1814a69bb2cfa54549eaa50c0dcc104b9e5/samsum.tsv
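For context, a minimal sketch of how these pieces fit together in a HELM RunSpec, mirroring the diff fragment quoted later in this thread. The ScenarioSpec class path below is a hypothetical placeholder, not taken from this PR:

    # Sketch only; assumes the HELM imports used elsewhere in this PR
    # (ScenarioSpec, RunSpec, metric spec helpers). The scenario class
    # path is a hypothetical placeholder, not confirmed by this PR.
    scenario_spec = ScenarioSpec(
        class_name="helm.benchmark.scenarios.samsum_scenario.SamSumScenario",
        args={},
    )
    run_spec = RunSpec(
        name="sam_sum",
        scenario_spec=scenario_spec,
        adapter_spec=adapter_spec,  # the adapter spec defined above
        metric_specs=get_open_ended_generation_metric_specs()
        + get_generative_harms_metric_specs(),
        groups=["sam_sum"],  # assumed group name
    )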

  Sample prompts {
      reference index = None, request_mode = None {
        ###
        Article: Jane: I may be 10 min late. Sorry.
        Alex: Ok. I'm by the left entrance.
        Jane: The bus is running late. I've just passed the supermarket.
        Alex: No worries. I'll be waiting for you.
        Jane: Thnx
        Alex: Did you remember to take the file with the xerox copies?
        Jane: Yes. I hope it will be useful. It's quite a brick.
        Alex: Hope so too
        Jane: We'll do fine :-)
        Alex: No other choice. Seems like it's our last chance of getting through to them.
        Jane: Maybe they will be convinced after they've seen our materials.
        
        Summarize the above article in 1 sentence.

GregBowyer and others added 2 commits on October 29, 2023:

Fixed the readme to not mangle the directory layout

From the run spec diff under review:

    name="sam_sum",
    scenario_spec=scenario_spec,
    adapter_spec=adapter_spec,
    metric_specs=get_open_ended_generation_metric_specs() + get_generative_harms_metric_specs(),
Collaborator:
FYI get_generative_harms_metric_specs() requires a Perspective API key to work. Let me know if you need help with this.
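For reference, HELM credentials typically live in prod_env/credentials.conf; a Perspective API key entry would look like the sketch below (the key name is assumed to follow HELM's credentials convention):

    perspectiveApiKey: "..."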

msaroufim (Member Author):

@weiweiy are we gonna need a Perspective API key, or do we already have one?

Contributor:

No :( But we do have OpenAI API keys

Comment on lines +4 to +6
Extractiveness: Extent to which the generated summary is extracted directly from the source text (also looks at n-grams)
Compression: Length of original doc vs. summary
Faithfulness: Ask an LLM how good the summary is
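As a rough illustration of the first two notes above (not HELM's actual implementation, just a sketch):

    # Illustrative sketch only; HELM's summarization metrics are more involved.
    def compression_ratio(document: str, summary: str) -> float:
        # Length of the original doc vs. the summary, in whitespace tokens.
        return len(document.split()) / max(len(summary.split()), 1)

    def extractive_fraction(document: str, summary: str, n: int = 2) -> float:
        # Fraction of summary n-grams that appear verbatim in the source text.
        def ngrams(tokens: list, size: int) -> set:
            return {tuple(tokens[i:i + size]) for i in range(len(tokens) - size + 1)}
        doc_ngrams = ngrams(document.split(), n)
        summary_ngrams = ngrams(summary.split(), n)
        return len(summary_ngrams & doc_ngrams) / max(len(summary_ngrams), 1)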
Collaborator:

Are you planning to use human evaluation / model evaluation? We have a way of using Mechanical Turk (or GPT-4, simulating a human evaluator) to evaluate generations; let me know and I can send you details.

msaroufim (Member Author):

No human evaluation planned right now; these were just some personal notes for myself.

Contributor:

@yifanmai, can you share the GPT-4 simulated human eval? We could use GPT-4 if it is valid. Otherwise I guess BERTScore will have to do.

yifanmai (Collaborator) commented Nov 8, 2023:

BERTScore should be sufficient!

If you want to try GPT-4 critique, put the following in prod_env/credentials.conf:

critiqueType: model
critiqueModelName: openai/gpt-4-32k-0613

Then add this to metrics in your RunSpec:

MetricSpec(class_name="helm.benchmark.metrics.summarization_critique_metrics.SummarizationCritiqueMetric", args={"num_respondents": 1})

On second thought, I am concerned that the prompt in SummarizationCritiqueMetric is not very good (we did not use it for the official production HELM results).
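In this PR's run spec, wiring that in would mean appending the critique metric to the existing metric_specs list; a sketch based on the diff fragment above:

    # Sketch: append the critique metric to the metrics already in the run spec.
    metric_specs=get_open_ended_generation_metric_specs()
    + get_generative_harms_metric_specs()
    + [
        MetricSpec(
            class_name="helm.benchmark.metrics.summarization_critique_metrics.SummarizationCritiqueMetric",
            args={"num_respondents": 1},
        )
    ],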

Collaborator:

You also need OpenAI API keys in prod_env/credentials.conf:

openaiOrgId: "org-..."
openaiApiKey: "sk-..."

msaroufim merged commit 30ca283 into neurips_eval on Nov 8, 2023. 2 of 6 checks passed.