
Fixes an issue in text-classification where MNLI eval/test datasets are not being preprocessed. #10621

Merged
merged 2 commits into from Mar 10, 2021

Conversation

allenwang28
Contributor

What does this PR do?

In commit dfd16af for `run_glue.py`, `{train|eval|test}_dataset` was split out and preprocessed individually. However, this misses `datasets["{validation|test}_mismatched"]`, which is appended to the `{eval|test}_dataset` only when MNLI is used.
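The problematic ordering can be sketched as follows; this uses a plain dict and lists as a stand-in for the real `datasets.DatasetDict`, so all names and toy data here are illustrative, not the actual library objects:

```python
# Stand-in for the GLUE DatasetDict loaded in run_glue.py (toy data).
raw = {
    "train": [{"premise": "a", "hypothesis": "b"}],
    "validation": [{"premise": "c", "hypothesis": "d"}],
    "validation_mismatched": [{"premise": "e", "hypothesis": "f"}],
}

def preprocess(example):
    # Stand-in for the tokenizer call that adds input_ids.
    return dict(example, input_ids=[0, 1, 2])

# Each split is selected first, then preprocessed individually...
train_dataset = [preprocess(ex) for ex in raw["train"]]
eval_dataset = [preprocess(ex) for ex in raw["validation"]]

# ...but for MNLI the mismatched split is appended afterwards, untouched:
eval_datasets = [eval_dataset, raw["validation_mismatched"]]

# The second eval dataset has no input_ids, which later triggers the
# ValueError in the model's forward pass.
assert "input_ids" not in eval_datasets[1][0]
```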

When running evaluation on MNLI, this means we eventually hit an un-preprocessed dataset, which leads to a stack trace like this:

Traceback (most recent call last):
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch_xla/distributed/xla_multiprocessing.py", line 329, in _mp_start_fn
    _start_fn(index, pf_cfg, fn, args)
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch_xla/distributed/xla_multiprocessing.py", line 323, in _start_fn
    fn(gindex, *args)
  File "/transformers/examples/text-classification/run_glue.py", line 532, in _mp_fn
    main()
  File "/transformers/examples/text-classification/run_glue.py", line 493, in main
    metrics = trainer.evaluate(eval_dataset=eval_dataset)
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/transformers/trainer.py", line 1657, in evaluate
    metric_key_prefix=metric_key_prefix,
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/transformers/trainer.py", line 1788, in prediction_loop
    loss, logits, labels = self.prediction_step(model, inputs, prediction_loss_only, ignore_keys=ignore_keys)
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/transformers/trainer.py", line 1899, in prediction_step
    loss, outputs = self.compute_loss(model, inputs, return_outputs=True)
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/transformers/trainer.py", line 1458, in compute_loss
    outputs = model(**inputs)
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1015, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/transformers/models/distilbert/modeling_distilbert.py", line 625, in forward
    return_dict=return_dict,
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1015, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/transformers/models/distilbert/modeling_distilbert.py", line 471, in forward
    raise ValueError("You have to specify either input_ids or inputs_embeds")
ValueError: You have to specify either input_ids or inputs_embeds

This PR resolves the issue by moving the `dataset.map(preprocess...)` call to the beginning, so preprocessing runs on the full dataset before the individual splits are selected.
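The fixed ordering can be sketched the same way, again with a plain dict standing in for `datasets.DatasetDict` (names and toy data are illustrative only, not the actual library objects):

```python
# Stand-in for the GLUE DatasetDict loaded in run_glue.py (toy data).
raw = {
    "train": [{"premise": "a", "hypothesis": "b"}],
    "validation": [{"premise": "c", "hypothesis": "d"}],
    "validation_mismatched": [{"premise": "e", "hypothesis": "f"}],
}

def preprocess(example):
    # Stand-in for the tokenizer call that adds input_ids.
    return dict(example, input_ids=[0, 1, 2])

# Map preprocessing over every split up front...
processed = {name: [preprocess(ex) for ex in split] for name, split in raw.items()}

# ...then select splits; the MNLI mismatched split is already tokenized.
eval_datasets = [processed["validation"], processed["validation_mismatched"]]
assert all("input_ids" in ex for ds in eval_datasets for ex in ds)
```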

Fixes #10620

Before submitting

  • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [x] Did you read the contributor guideline, Pull Request section?
  • [ ] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • [ ] Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Collaborator

@sgugger sgugger left a comment


Thanks for fixing!

@sgugger sgugger merged commit 6f52fce into huggingface:master Mar 10, 2021
Iwontbecreative pushed a commit to Iwontbecreative/transformers that referenced this pull request Jul 15, 2021
… are not being preprocessed. (huggingface#10621)

* Fix MNLI tests

* Linter fix