SamSum dataset #1
Conversation
Fixed the README to not mangle the directory layout
src/helm/benchmark/run_specs.py
name="sam_sum", | ||
scenario_spec=scenario_spec, | ||
adapter_spec=adapter_spec, | ||
metric_specs=get_open_ended_generation_metric_specs() + get_generative_harms_metric_specs(), |
FYI get_generative_harms_metric_specs() requires a Perspective API key to work. Let me know if you need help with this.
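If a key is needed, it would presumably go in prod_env/credentials.conf alongside the other credentials shown later in this thread; the exact key name below is my assumption based on HELM's credential naming, so double-check it against the HELM docs:
perspectiveApiKey: "..."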
@weiweiy are we going to need a Perspective API key, or do we already have one?
No :( But we do have OpenAI API keys
Extractiveness: Extent to which the generated summary is extracted directly from the source text (also looks at n-grams)
Compression: Length of the original document vs. the summary
Faithfulness: Ask an LLM how good the summary is
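As a side note, here is an illustrative sketch (plain Python, not HELM's actual metric code; the function names are mine) of what the compression and extractiveness numbers in those notes roughly compute:

def compression_ratio(document: str, summary: str) -> float:
    # Length of the original document relative to the summary, in word counts.
    return len(document.split()) / max(len(summary.split()), 1)

def extractive_ngram_overlap(document: str, summary: str, n: int = 2) -> float:
    # Fraction of summary n-grams that appear verbatim in the source document.
    def ngrams(tokens: list, size: int) -> set:
        return {tuple(tokens[i:i + size]) for i in range(len(tokens) - size + 1)}
    doc_ngrams = ngrams(document.split(), n)
    summary_ngrams = ngrams(summary.split(), n)
    if not summary_ngrams:
        return 0.0
    return len(summary_ngrams & doc_ngrams) / len(summary_ngrams)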
Are you planning to use human evaluation / model evaluation? We have a way of using Mechanical Turk (or GPT-4, simulating a human evaluator) to evaluate generations; let me know and I can send you details.
No human evaluation is planned right now; these were just some personal notes for myself.
@yifanmai, can you share the GPT-4 simulated human eval? We could use GPT-4 if it is valid. Otherwise I guess BERTScore will have to do.
BERTScore should be sufficient!
If you want to try GPT-4 critique, put the following in prod_env/credentials.conf:
critiqueType: model
critiqueModelName: openai/gpt-4-32k-0613
Then add this to the metric_specs in your RunSpec:
MetricSpec(class_name="helm.benchmark.metrics.summarization_critique_metrics.SummarizationCritiqueMetric", args={"num_respondents": 1})
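For context, a hypothetical sketch (not code from this PR) of how that metric could be wired into the SamSum run spec from the diff above, appended to the existing metric spec helpers:

RunSpec(
    name="sam_sum",
    scenario_spec=scenario_spec,
    adapter_spec=adapter_spec,
    metric_specs=get_open_ended_generation_metric_specs()
    + get_generative_harms_metric_specs()
    + [
        MetricSpec(
            class_name="helm.benchmark.metrics.summarization_critique_metrics.SummarizationCritiqueMetric",
            args={"num_respondents": 1},
        )
    ],
)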
On second thought, I am concerned that the prompt in SummarizationCritiqueMetric is not very good (we did not use this in the official production HELM results).
You also need OpenAI API keys in prod_env/credentials.conf:
openaiOrgId: "org-..."
openaiApiKey: "sk-..."
OK, this is working now.
Right now the summarization task is about summarizing the content in one sentence; we can change this by changing the adapter spec.
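As a rough illustration, changing the instruction could look something like the sketch below; the import paths and field values are my assumptions (they vary across HELM versions), not code from this PR:

from helm.benchmark.adaptation.adapter_spec import AdapterSpec
from helm.benchmark.adaptation.adapters.adapter_factory import ADAPT_GENERATION

adapter_spec = AdapterSpec(
    method=ADAPT_GENERATION,
    # Drop the one-sentence constraint from the instructions.
    instructions="Summarize the following conversation in a few sentences.",
    input_prefix="Conversation: ",
    output_prefix="Summary: ",
    max_tokens=128,
    temperature=0.3,
)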
Used this as a dataset link since private links are not playing well with wget: https://gist.githubusercontent.com/msaroufim/3f1845a5d93b50d849c42b7baeb2f716/raw/11c2d1814a69bb2cfa54549eaa50c0dcc104b9e5/samsum.tsv