-
Notifications
You must be signed in to change notification settings - Fork 811
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] added xnli dataset #613
Conversation
Change the title of the PR. Once you have done and need a review, please let us know. Thanks for your contributions. |
@zhangguanheng66 What did you want it changed to? |
Just mark it as WIP. Once you have done, we could start the reviewing. |
test/nli.py
Outdated
val_iter, test_iter = data.Iterator.splits((val, test), batch_size=3) | ||
|
||
batch = next(iter(val_iter)) | ||
print("Numericalize premises:\n", batch.premise) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, we don't have printout for the unit tests. You could use Assert
to check the values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of the other dataset tests, i.e. test/imdb.py, test/sequence_tagging.py, all do it this way. I can re-write the whole of the test/nli.py to use unit tests though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unit tests have been added to the whole of test/nli.py now.
|
||
@classmethod | ||
def iters(cls, *args, **kwargs): | ||
raise NotImplementedError('XNLI dataset does not support iters') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So why XNLI dataset doesn't support iters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because of this line onward. NLIDataset
always assumes there is a training, validation and test set, which is not the case for the XNLI dataset - it only has a validation and test set. I can edit the NLIDataset
class to check if train
is None
and act accordingly?
@bentrevett Do you need a review for this PR? |
@zhangguanheng66 Yes please. |
Rebase to the master branch. Will merge after all the unit tests pass. Thanks for the contributions @bentrevett |
@zhangguanheng66 Sorry, still new to git. It looks like you've rebased to the master for me, is there anything I need to do on my end? |
No. I will just merge the PR. Thanks for the contributions. |
This is initial work on adding the XNLI dataset to TorchText.
Weirdly, the XNLI dataset only has a "dev" and a "test" set, no training set. This means I had to make some decisions that I'm not sure were for the best.
I've either removed or hard coded the train argument to be None, which means the XNLI dataset works fine with .splits, but not with .iters. due to these lines being hardcoded. My solution was to simply raise a
NotImplementedError
when trying to doXNLI.iters
. Hopefully this is sufficient.The XNLI dataset is used for "cross-language sentence encoding", so I have also added a language_field.