Use standard open-domain validation split in nq_open #3029

craffel · 2021-10-05T14:19:27Z

The nq_open dataset originally drew the validation set from this file:
https://github.com/google-research-datasets/natural-questions/blob/master/nq_open/NQ-open.efficientqa.dev.1.1.sample.jsonl
However, that's the dev set used only for the efficientqa competition, and it's not the dev set used in open-domain question answering papers (including the Lee et al. paper that introduced the open-domain variant of NQ, cited at the top of the dataset file). This PR changes nq_open to use the standard validation split and bumps the version to 2.0.0, since this is a breaking change.
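To see how far two candidate dev files actually diverge, one option is to compare the questions they contain. The following is only a rough sketch: it assumes each file is JSON Lines with a `question` field per record (an assumption about the file layout, not something stated in this PR), and the helper names are hypothetical.

```python
import json
from pathlib import Path


def load_questions(jsonl_path, field="question"):
    """Return the set of question strings found in a JSON Lines file.

    The field name "question" is an assumption about the file layout.
    """
    questions = set()
    with Path(jsonl_path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            questions.add(json.loads(line)[field])
    return questions


def compare_splits(path_a, path_b, field="question"):
    """Count questions unique to each file and questions shared by both."""
    a = load_questions(path_a, field)
    b = load_questions(path_b, field)
    return {"only_a": len(a - b), "only_b": len(b - a), "shared": len(a & b)}
```

Calling `compare_splits` on local copies of the two dev files would give a quick count of how many questions are unique to each.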

nateraw

LGTM

albertvillanova

Thanks for the fix.

I guess the dataset_infos.json should be updated as well:

datasets-cli test datasets/nq_open --save_infos --all_configs

albertvillanova

Also dummy data should be moved to 2.0.0 subdirectory:

From: dummy/nq_open/1.0.0/dummy_data.zip
To: dummy/nq_open/2.0.0/dummy_data.zip
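For illustration, the move could be scripted roughly as below. This is a sketch, not the project's tooling: the helper name is hypothetical, and it assumes the `dummy/<dataset name>/<version>/` layout shown in the two paths above, relative to the dataset directory. In a git checkout, `git mv` does the same job while keeping history.

```python
import shutil
from pathlib import Path


def move_dummy_data(dataset_dir, old_version, new_version, name="dummy_data.zip"):
    """Move dummy/<dataset>/<old_version>/<name> into the <new_version> folder."""
    dataset_dir = Path(dataset_dir)
    # Assumed layout: <dataset_dir>/dummy/<dataset name>/<version>/dummy_data.zip
    dummy_root = dataset_dir / "dummy" / dataset_dir.name
    src = dummy_root / old_version / name
    dst = dummy_root / new_version / name
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dst))
    # Drop the old version directory once it is empty.
    if not any(src.parent.iterdir()):
        src.parent.rmdir()
    return dst
```

For this PR the call would look like `move_dummy_data("datasets/nq_open", "1.0.0", "2.0.0")`.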

albertvillanova

And a minor change: the tag pretty_name should be added to the header of the README.md file:

pretty_name: NQ-Open

craffel · 2021-10-05T14:41:45Z

I had to run datasets-cli with --ignore_verifications the first time since it was complaining about a missing file, but now it runs fine without that flag. I moved dummy_data.zip to the new folder, but also had to modify the filename of the test file in the zip (should I not have done that?). Finally, I added the pretty name tag.
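For reference, a file inside a zip can be renamed by rewriting the archive, since Python's `zipfile` module has no in-place rename. The sketch below is one way to do that; the helper name is hypothetical, and the member names are passed in by the caller because the actual filenames are not given in this thread.

```python
import zipfile
from pathlib import Path


def rename_zip_member(zip_path, old_name, new_name):
    """Rewrite a zip archive so that member old_name is stored as new_name."""
    zip_path = Path(zip_path)
    tmp_path = zip_path.with_name(zip_path.name + ".tmp")
    renamed = False
    with zipfile.ZipFile(zip_path) as src, \
            zipfile.ZipFile(tmp_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.filename == old_name:
                target = new_name
                renamed = True
            else:
                target = info.filename
            # Copy each member's bytes under its (possibly new) name.
            dst.writestr(target, src.read(info.filename))
    if not renamed:
        tmp_path.unlink()
        raise KeyError(f"{old_name!r} not found in {zip_path}")
    # Swap the rewritten archive into place only after it is fully written.
    tmp_path.replace(zip_path)
```

Writing to a temporary file first means the original archive is left untouched if the member is missing.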

albertvillanova

Yes @craffel, you did it right! Renaming the dev data file was also required. Sorry, I forgot to tell you.

Once all tests pass, I can merge to master.

craffel · 2021-10-05T14:50:26Z

Great, thanks for the help.

albertvillanova

Thank you @craffel for the fix!

nateraw approved these changes Oct 5, 2021

albertvillanova requested changes Oct 5, 2021

craffel added 3 commits October 5, 2021 10:37

Update dataset_info.json

ba20773

Move and update dummy_data.zip

e9ada82

Add pretty name

0258074

albertvillanova reviewed Oct 5, 2021

albertvillanova approved these changes Oct 5, 2021

albertvillanova merged commit 83bc8a2 into master Oct 5, 2021

albertvillanova deleted the nq_open_correct_dev branch October 5, 2021 14:56
