[WIP] added xnli dataset #613

bentrevett · 2019-10-07T11:04:41Z

This is initial work on adding the XNLI dataset to TorchText.

Weirdly, the XNLI dataset only has a "dev" and a "test" set, no training set. This means I had to make some decisions that I'm not sure were for the best.

I've either removed the train argument or hard-coded it to None, which means the XNLI dataset works fine with .splits but not with .iters, because these lines are hardcoded. My solution was to simply raise a NotImplementedError when calling XNLI.iters. Hopefully this is sufficient.

The XNLI dataset is used for "cross-language sentence encoding", so I have also added a language_field.
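As a rough illustration of why a language field is useful, the sketch below reads XNLI-style JSON lines and groups the premise/hypothesis pairs by language. The key names (`language`, `gold_label`, `sentence1`, `sentence2`) and the sample records are assumptions made for this example, and `group_by_language` is a hypothetical helper, not part of TorchText.

```python
import json
from collections import defaultdict

# Hypothetical XNLI-style records; the key names are assumed for illustration.
SAMPLE_LINES = [
    '{"language": "en", "gold_label": "neutral", "sentence1": "He is here.", "sentence2": "He arrived today."}',
    '{"language": "fr", "gold_label": "entailment", "sentence1": "Il est ici.", "sentence2": "Il est present."}',
    '{"language": "en", "gold_label": "contradiction", "sentence1": "It rains.", "sentence2": "It is dry."}',
]


def group_by_language(lines):
    """Parse JSON lines and bucket (premise, hypothesis, label) by language."""
    grouped = defaultdict(list)
    for line in lines:
        record = json.loads(line)
        grouped[record["language"]].append(
            (record["sentence1"], record["sentence2"], record["gold_label"])
        )
    return dict(grouped)


examples = group_by_language(SAMPLE_LINES)
print(sorted(examples))     # languages present in the sample
print(len(examples["en"]))  # number of English pairs
```

Keeping the language as its own field means a single validation or test set can be filtered or evaluated per language without re-reading the file.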

zhangguanheng66 · 2019-10-07T16:42:23Z

Change the title of the PR. Once you are done and need a review, please let us know. Thanks for your contributions.

bentrevett · 2019-10-08T19:11:49Z

@zhangguanheng66 What did you want it changed to?

zhangguanheng66 · 2019-10-09T17:46:54Z

> @zhangguanheng66 What did you want it changed to?

Just mark it as WIP. Once you are done, we can start reviewing.

zhangguanheng66 · 2019-10-09T17:51:40Z

test/nli.py

+val_iter, test_iter = data.Iterator.splits((val, test), batch_size=3)
+
+batch = next(iter(val_iter))
+print("Numericalize premises:\n", batch.premise)


In general, we don't print output in unit tests. You could use assertions to check the values instead.

bentrevett · 2019-10-10

All of the other dataset tests, e.g. test/imdb.py and test/sequence_tagging.py, do it this way. I can rewrite the whole of test/nli.py to use unit tests though?

bentrevett · 2019-10-10

Unit tests have now been added throughout test/nli.py.
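For reference, a minimal sketch of the assertion-based style the reviewer asks for, using only the standard library. `numericalize` and the toy vocabulary are hypothetical stand-ins for what a TorchText field and iterator would produce; the point is only that the test checks values instead of printing them.

```python
import unittest

# Toy vocabulary; a hypothetical stand-in for a built TorchText vocab.
VOCAB = {"<unk>": 0, "<pad>": 1, "a": 2, "man": 3, "sleeps": 4}


def numericalize(tokens, length):
    """Map tokens to ids and right-pad to a fixed length."""
    ids = [VOCAB.get(tok, VOCAB["<unk>"]) for tok in tokens]
    return ids + [VOCAB["<pad>"]] * (length - len(ids))


class TestNumericalize(unittest.TestCase):
    def test_known_tokens(self):
        # Assert on the values rather than printing them.
        self.assertEqual(numericalize(["a", "man", "sleeps"], 4), [2, 3, 4, 1])

    def test_unknown_token(self):
        # Out-of-vocabulary tokens map to the <unk> id.
        self.assertEqual(numericalize(["a", "dog"], 2), [2, 0])


# Run the tests explicitly so the script does not exit the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestNumericalize)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

A failing assertion reports the expected and actual values, which is what makes this style more useful in CI than reading printed tensors.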

zhangguanheng66 · 2019-10-09T17:52:44Z

torchtext/datasets/nli.py

+
+    @classmethod
+    def iters(cls, *args, **kwargs):
+        raise NotImplementedError('XNLI dataset does not support iters')


So why doesn't the XNLI dataset support iters?

bentrevett · 2019-10-10 (edited)

Because of this line onward. NLIDataset always assumes there is a training, validation and test set, which is not the case for the XNLI dataset: it only has a validation and a test set. I can edit the NLIDataset class to check whether train is None and act accordingly?
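A minimal sketch of the alternative proposed here, assuming a simplified stand-in for the dataset class: `iters` builds an iterator only for the splits that exist and skips any that are `None`. `ToyNLIDataset`, `batches`, and the sample data are hypothetical, and this does not reproduce TorchText's actual `NLIDataset.iters` or its iterators.

```python
def batches(examples, batch_size):
    """Yield successive fixed-size batches from a list of examples."""
    for start in range(0, len(examples), batch_size):
        yield examples[start:start + batch_size]


class ToyNLIDataset:
    """Hypothetical stand-in; not TorchText's NLIDataset."""

    # Splits available for this dataset; None marks a missing split.
    SPLITS = {
        "train": None,  # XNLI ships without a training set
        "validation": [("p1", "h1"), ("p2", "h2"), ("p3", "h3")],
        "test": [("p4", "h4"), ("p5", "h5")],
    }

    @classmethod
    def splits(cls):
        """Return (train, validation, test); missing splits are None."""
        return tuple(cls.SPLITS[name] for name in ("train", "validation", "test"))

    @classmethod
    def iters(cls, batch_size=2):
        """Return one batch iterator per split that exists, skipping None."""
        present = [split for split in cls.splits() if split is not None]
        return tuple(batches(split, batch_size) for split in present)


# Only two iterators come back because the train split is missing.
val_iter, test_iter = ToyNLIDataset.iters(batch_size=2)
first_val_batch = next(val_iter)
```

The trade-off is that callers can no longer rely on always unpacking three iterators, which may be why raising NotImplementedError was the simpler choice for this PR.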

zhangguanheng66 · 2019-10-23T22:04:56Z

@bentrevett Do you need a review for this PR?

bentrevett · 2019-10-25T12:59:27Z

@zhangguanheng66 Yes please.

zhangguanheng66 · 2019-10-25T16:55:01Z

Rebased to the master branch. Will merge after all the unit tests pass. Thanks for the contributions, @bentrevett.

bentrevett · 2019-10-28T23:39:30Z

@zhangguanheng66 Sorry, still new to git. It looks like you've rebased to the master for me, is there anything I need to do on my end?

zhangguanheng66 · 2019-10-29T14:12:22Z

> @zhangguanheng66 Sorry, still new to git. It looks like you've rebased to the master for me, is there anything I need to do on my end?

No. I will just merge the PR. Thanks for the contributions.

added xnli dataset

36a8869

zhangguanheng66 changed the title ~~added xnli dataset~~ [WIP] added xnli dataset Oct 7, 2019

added newlines at end of files for flake8

ba3a27b

zhangguanheng66 reviewed Oct 9, 2019


bentrevett added 2 commits October 10, 2019 16:40

added unit tests for snli and multinli

824d1df

added tests to xnli. add comments to multinli

ad58cf7

zhangguanheng66 approved these changes Oct 25, 2019


Merge branch 'master' into add_xnli_dataset

a5129af

zhangguanheng66 merged commit c03fe4b into pytorch:master Oct 29, 2019
