Implement the OpenBookQA evaluation #16

StellaAthena · 2020-09-16T16:47:08Z

From the GPT-3 paper:

On OpenBookQA [MCKS18], GPT-3 improves significantly from zero to few shot settings but is still over 20 points short of the overall SOTA. GPT-3’s few-shot performance is similar to a fine-tuned BERT Large baseline on the leaderboard.

Data processing code implemented
Evaluation implemented

The evaluation code should be modeled after the interface in lm_eval/base.py and the example of the BoolQ task in lm_eval/tasks/superglue.py.
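For illustration only, a multiple-choice evaluation of this kind can be sketched as scoring each answer option with a model and counting how often the top-scoring option matches the answer key. This is a minimal sketch under stated assumptions, not the interface in lm_eval/base.py: the record fields (`question_stem`, `choices`, `answerKey`), the helper names `predict` and `accuracy`, the two made-up questions, and the toy scorer standing in for a language model's log-likelihood are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical record layout (an assumption for this sketch): a question stem,
# parallel lists of option texts and labels, and the label of the correct option.
DOCS: List[Dict] = [
    {
        "question_stem": "Which object is attracted to a magnet?",
        "choices": {
            "text": ["a wooden spoon", "an iron nail", "a glass cup", "a rubber band"],
            "label": ["A", "B", "C", "D"],
        },
        "answerKey": "B",
    },
    {
        "question_stem": "What do plants need to make food?",
        "choices": {
            "text": ["sunlight", "iron filings", "darkness", "sand"],
            "label": ["A", "B", "C", "D"],
        },
        "answerKey": "A",
    },
]

# A scorer maps (context, continuation) to a score; higher means more likely.
Scorer = Callable[[str, str], float]


def toy_loglikelihood(context: str, continuation: str) -> float:
    """Stand-in for a real model: it only counts occurrences of the word 'iron'."""
    return float(sum(word == "iron" for word in continuation.lower().split()))


def predict(doc: Dict, score: Scorer) -> str:
    """Return the label of the option that the scorer rates highest."""
    context = doc["question_stem"]
    options = doc["choices"]["text"]
    scores = [score(context, " " + option) for option in options]
    best = max(range(len(scores)), key=lambda i: scores[i])  # first maximum wins ties
    return doc["choices"]["label"][best]


def accuracy(docs: List[Dict], score: Scorer) -> float:
    """Fraction of documents whose predicted label equals the answer key."""
    correct = sum(predict(doc, score) == doc["answerKey"] for doc in docs)
    return correct / len(docs)


print(accuracy(DOCS, toy_loglikelihood))  # 0.5
```

The toy scorer gets the first question right and the second wrong, which is why the accuracy is 0.5. A real evaluation would replace it with a model's log-likelihood of each continuation given the context.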

cfoster0 · 2020-10-01T04:43:15Z

Note: HuggingFace includes this in its datasets package.

https://huggingface.co/datasets/openbookqa

cfoster0 · 2020-10-05T06:24:22Z

Working on adding this.

cfoster0 · 2020-10-22T03:17:32Z

#42

StellaAthena added the feature request A feature that isn't implemented yet. label Sep 16, 2020

StellaAthena added this to To do in Implementing Evaluations via automation Sep 16, 2020

StellaAthena assigned cfoster0 Oct 5, 2020

StellaAthena moved this from To do to In progress in Implementing Evaluations Oct 5, 2020

cfoster0 mentioned this issue Oct 22, 2020

Add OpenBookQA dataset #42

Merged

StellaAthena moved this from In progress to Data integrated, Eval not done in Implementing Evaluations Oct 22, 2020

StellaAthena added Eval Set and removed feature request A feature that isn't implemented yet. labels Oct 23, 2020

StellaAthena linked a pull request Oct 23, 2020 that will close this issue

Add OpenBookQA dataset #42

Merged

StellaAthena closed this as completed Oct 23, 2020

StellaAthena reopened this Jan 5, 2021

StellaAthena unassigned cfoster0 Jan 5, 2021

StellaAthena added feature request A feature that isn't implemented yet. good first issue Good for newcomers labels Jan 5, 2021

leogao2 moved this from In Progress to To do in Implementing Evaluations Jan 28, 2021

jon-tow moved this from To do, Evaluations to Implement to In Progress in Implementing Evaluations Feb 5, 2021

jon-tow mentioned this issue Feb 9, 2021

Implement OpenBookQA evaluation #136

Merged

leogao2 assigned cfoster0 and jon-tow Feb 9, 2021

leogao2 closed this as completed Feb 9, 2021

Implementing Evaluations automation moved this from In Progress to Done, evaluations Feb 9, 2021

StellaAthena pushed a commit that referenced this issue Apr 29, 2022

Merge pull request #16 from jordiclive/webnlg_challenge

d47f453

adding WebNLG challenge sets

qmdnls pushed a commit to qmdnls/lm-evaluation-harness that referenced this issue Aug 17, 2023

Merge pull request EleutherAI#16 from jordiclive/webnlg_challenge

fcb097a

adding WebNLG challenge sets

LZY-the-boys pushed a commit to LZY-the-boys/lm-evaluation-harness-fast that referenced this issue Sep 12, 2023

Merge pull request EleutherAI#16 from jordiclive/webnlg_challenge

7511232

adding WebNLG challenge sets

StellaAthena commented Sep 16, 2020 • edited by jon-tow
