-
Notifications
You must be signed in to change notification settings - Fork 816
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix WT2, restructure splits, and add WMT14 #138
Conversation
Nice; the added download functionality is great, especially for Google Drive. Is there any way to unify the splits for the other datasets even more? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a little odd to have the WMT14 be the preprocessed version from Google, but I suppose we can worry about that later if we want to add other versions of WMT.
Agreed, though the others are a lot more complicated and this one allows direct comparison with Google's GNMT/Transformer results |
Also the confirm_token thing sits there for easily five minutes before actually downloading?? I don't know what's up with that; the logic is exactly the same as what Google uses (and it looks like they got theirs from StackOverflow anyway) |
Anyone know what to do about the bogus flake8 error? I think I've seen it before but I don't remember what if anything makes it go away (other than |
Also adds nonoptional dependencies to setup.py so
pip install torchtext
will fetch them.The only known possibly-breaking change (I'd call it a bugfix) is that
Dataset.splits
will now useos.path.join
rather than concatenation to join provided paths.