I know the copyright/distribution of this one is complex, but it would be great to have! That, combined with the existing wikitext, would provide a complete dataset for pretraining models like BERT.
jarednielsen changed the title from "[Feature request] Add Toronto BookCorpus" to "[Feature request] Add Toronto BookCorpus dataset" on May 15, 2020
As far as I understand, wikitext refers to WikiText-103 and WikiText-2, which were created by researchers at Salesforce and are mostly used for traditional language modeling.
You might mean Wikipedia, a dump from the Wikimedia Foundation.
I would also like to have Toronto BookCorpus! Though it involves copyright problems...