You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Thank you for releasing your pretrained model and saving us training time. I am currently exploring possible applications, but I ran into a problem that might also annoy many researchers trying to use your model.
AFAIK, you have not released the WebText corpus (although I know this is currently discussed in issue #24). This is fine by me, except for one aspect: it makes it impossible for me to know if my test
data is somehow included in WebText. Which, in turns, makes it impossible for me to tell if any improvement I am getting is due to the quality of GPT or the fact that the pretrained model has already seen my test data.
If you do not plan to release WebText in the very near future, I was thinking you could release the bloom filters you describe in your technical paper (code + filled filters). This would allow us to evaluate the proportion of 8-grams in our test data that is also in WebText.
Would this be possible?
Thank you.
The text was updated successfully, but these errors were encountered:
Hi,
Thank you for releasing your pretrained model and saving us training time. I am currently exploring possible applications, but I ran into a problem that might also annoy many researchers trying to use your model.
AFAIK, you have not released the WebText corpus (although I know this is currently discussed in issue #24). This is fine by me, except for one aspect: it makes it impossible for me to know if my test
data is somehow included in WebText. Which, in turns, makes it impossible for me to tell if any improvement I am getting is due to the quality of GPT or the fact that the pretrained model has already seen my test data.
If you do not plan to release WebText in the very near future, I was thinking you could release the bloom filters you describe in your technical paper (code + filled filters). This would allow us to evaluate the proportion of 8-grams in our test data that is also in WebText.
Would this be possible?
Thank you.
The text was updated successfully, but these errors were encountered: