Introduce web and wiki config in triviaqa dataset #2949

shirte · 2021-09-20T14:17:23Z

The TriviaQA paper suggests that the two subsets (Wikipedia and Web)
should be treated differently. There are also different leaderboards
for the two sets on CodaLab. For that reason, introduce additional
builder configs in the trivia_qa dataset.
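As a rough illustration of how the separate configs might be used, here is a minimal sketch. The config names `rc.web` and `rc.wikipedia` are the ones mentioned in the review below; `SUBSET_TO_CONFIG`, `config_for_subset`, and `load_subset` are hypothetical helpers written for this example, and the `validation` split name is an assumption.

```python
# Config names as mentioned in this thread; the mapping itself is illustrative only.
SUBSET_TO_CONFIG = {"web": "rc.web", "wikipedia": "rc.wikipedia"}


def config_for_subset(subset: str) -> str:
    """Hypothetical helper: map a TriviaQA subset name to a builder config name."""
    try:
        return SUBSET_TO_CONFIG[subset.lower()]
    except KeyError:
        raise ValueError(f"unknown TriviaQA subset: {subset!r}") from None


def load_subset(subset: str, split: str = "validation"):
    """Load one subset through the `datasets` library (needs network access)."""
    # Imported lazily so the mapping above can be used without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("trivia_qa", config_for_subset(subset), split=split)
```

Keeping the two subsets behind separate config names makes it harder to accidentally mix Web and Wikipedia results when reporting against the two leaderboards.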


lhoestq

Nice, thanks! It all looks good to me :)

I only have one comment:

We try to keep the dummy data as small as possible. The rc, rc.web, rc.wikipedia and unfiltered ones are a bit big because of the evidence files. Feel free to only keep a few sentences in them in order to make the dummy data even smaller.
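One possible way to shrink an evidence file to a few sentences is sketched below. `keep_first_sentences` is a hypothetical helper, not part of the dataset script, and the sentence splitting is a naive regex that assumes sentences end in `.`, `!`, or `?` followed by whitespace.

```python
import re


def keep_first_sentences(text: str, max_sentences: int = 3) -> str:
    """Return only the first `max_sentences` sentences of `text`."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:max_sentences])
```

Applied to each evidence text file in the dummy data, this keeps the file parseable while cutting most of its size.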


lhoestq · 2021-10-01T12:42:45Z

I just made the dummy data smaller :)
Once GitHub refreshes the change, I think we can merge!

shirte · 2021-10-02T17:17:06Z

Thank you so much for reviewing and accepting my pull request!! :)

I created these rather large dummy data sets to cover all the different cases for the row structure. For example, in the web configuration a row can have evidence from both Wikipedia ("EntityPages") and the web ("SearchResults"), but either EntityPages or SearchResults might also be empty. I will probably add this note to the dataset description in the future.
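The cases described above can be sketched as a small check. This assumes a row is a plain dict using the raw key names "EntityPages" and "SearchResults" quoted in this comment; the columns exposed by the loaded dataset may be named differently. `evidence_sources` is a hypothetical helper for illustration.

```python
def evidence_sources(row: dict) -> set:
    """Report which evidence sources a row actually carries."""
    sources = set()
    # A missing key and an empty list are both treated as "no evidence".
    if row.get("EntityPages"):
        sources.add("wikipedia")
    if row.get("SearchResults"):
        sources.add("web")
    return sources
```

A row with both lists populated yields both sources, while a row with an empty or absent "EntityPages" yields only `"web"`, so downstream code can handle each case explicitly.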

lhoestq · 2021-10-05T13:20:52Z

OK, I see! Yes, feel free to mention it in the dataset card; this can be useful.

For the dummy data, though, we can keep the small files, since the tests mainly check the parsing done by the dataset script rather than the actual content of the dataset.

shirte added 8 commits September 20, 2021 16:14

Remove invalid config wikipedia.unfiltered

2181c96

Delete dummy_data directory

d9be379

Add helper method in trivia_qa

1bed9d9

Add dummy_data for all trivia_qa configs

2060f0d

Fix style

abf816a

Add tags in TriviaQA dataset card

979eb93

Fix typo

239910b

lhoestq reviewed Sep 29, 2021


lhoestq mentioned this pull request Oct 1, 2021

Fix trivia_qa unfiltered #2995

Merged

lhoestq added 2 commits October 1, 2021 14:36

Merge remote-tracking branch 'upstream/master' into create-web-wiki-c…

66f2f6d

…onfig-triviaqa

smaller dummy data

595d760

lhoestq merged commit f9ee6d4 into huggingface:master Oct 1, 2021
