Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use standard open-domain validation split in nq_open #3029

Merged
merged 4 commits into from
Oct 5, 2021

Conversation

craffel
Copy link
Contributor

@craffel craffel commented Oct 5, 2021

The nq_open dataset originally drew the validation set from this file:
https://github.com/google-research-datasets/natural-questions/blob/master/nq_open/NQ-open.efficientqa.dev.1.1.sample.jsonl
However, that's the dev set used specifically and only for the efficientqa competition, and it's not the same dev set as is used in every open-domain question answering paper (including the Lee et al paper that introduced the open-domain variant of NQ, cited at the top of the dataset file). This PR changes nq_open to use the standard validation split and bumps the version to 2.0.0 since this is a breaking change.

The nq_open dataset originally drew the validation set from this file:
https://github.com/google-research-datasets/natural-questions/blob/master/nq_open/NQ-open.efficientqa.dev.1.1.sample.jsonl
However, that's the dev set used specifically and only for the efficientqa competition, and it's not the same dev set as is used in every open-domain question answering paper (including the Lee et al paper that introduced the open-domain variant of NQ, cited at the top of the dataset file). This PR changes nq_open to use the standard validation split and bumps the version to 2.0.0 since this is a breaking change.
Copy link
Contributor

@nateraw nateraw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@albertvillanova albertvillanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix.

I guess the dataset_infos.json should be updated as well:

datasets-cli test datasets/nq_open --save_infos --all_configs

Copy link
Member

@albertvillanova albertvillanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also dummy data should be moved to 2.0.0 subdirectory:

  • From: dummy/nq_open/1.0.0/dummy_data.zip
  • To: dummy/nq_open/2.0.0/dummy_data.zip

Copy link
Member

@albertvillanova albertvillanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And a minor change: the tag pretty_name should be added to the header of the README.md file:

pretty_name: NQ-Open

@craffel
Copy link
Contributor Author

craffel commented Oct 5, 2021

I had to run datasets-cli with --ignore_verifications the first time since it was complaining about a missing file, but now it runs without that flag fine. I moved dummy_data.zip to the new folder, but also had to modify the filename of the test file in the zip (should I not have done that?). Finally, I added the pretty name tag.

Copy link
Member

@albertvillanova albertvillanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes @craffel you did right! The renaming of the dev datafile was also required. Sorry I forgot to tell you.

Once all tests pass, I can merge to master.

@craffel
Copy link
Contributor Author

craffel commented Oct 5, 2021

Great, thanks for the help.

Copy link
Member

@albertvillanova albertvillanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @craffel for the fix!

@albertvillanova albertvillanova merged commit 83bc8a2 into master Oct 5, 2021
@albertvillanova albertvillanova deleted the nq_open_correct_dev branch October 5, 2021 14:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants