
Fixes an issue in text-classification where MNLI eval/test datasets are not being preprocessed. #10621

Merged
merged 2 commits into from Mar 10, 2021

Conversation

allenwang28
Contributor

What does this PR do?

In commit dfd16af for `run_glue.py`, `{train|eval|test}_dataset` was split out and preprocessed individually. However, this misses `datasets["{validation|test}_mismatched"]`, which is appended to the `{eval|test}_dataset` only when MNLI is used.
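The problematic ordering can be sketched as follows; this uses a plain dict and lists as a stand-in for the real `datasets.DatasetDict`, so all names and toy data here are illustrative, not the actual library objects:

```python
# Stand-in for the GLUE DatasetDict loaded in run_glue.py (toy data).
raw = {
    "train": [{"premise": "a", "hypothesis": "b"}],
    "validation": [{"premise": "c", "hypothesis": "d"}],
    "validation_mismatched": [{"premise": "e", "hypothesis": "f"}],
}

def preprocess(example):
    # Stand-in for the tokenizer call that adds input_ids.
    return dict(example, input_ids=[0, 1, 2])

# Each split is selected first, then preprocessed individually...
train_dataset = [preprocess(ex) for ex in raw["train"]]
eval_dataset = [preprocess(ex) for ex in raw["validation"]]

# ...but for MNLI the mismatched split is appended afterwards, untouched:
eval_datasets = [eval_dataset, raw["validation_mismatched"]]

# The second eval dataset has no input_ids, which later triggers the
# ValueError in the model's forward pass.
assert "input_ids" not in eval_datasets[1][0]
```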

When running evaluation on MNLI, this means we eventually hit an un-preprocessed dataset, which leads to a stack trace like this:

Traceback (most recent call last):
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch_xla/distributed/xla_multiprocessing.py", line 329, in _mp_start_fn
    _start_fn(index, pf_cfg, fn, args)
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch_xla/distributed/xla_multiprocessing.py", line 323, in _start_fn
    fn(gindex, *args)
  File "/transformers/examples/text-classification/run_glue.py", line 532, in _mp_fn
    main()
  File "/transformers/examples/text-classification/run_glue.py", line 493, in main
    metrics = trainer.evaluate(eval_dataset=eval_dataset)
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/transformers/trainer.py", line 1657, in evaluate
    metric_key_prefix=metric_key_prefix,
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/transformers/trainer.py", line 1788, in prediction_loop
    loss, logits, labels = self.prediction_step(model, inputs, prediction_loss_only, ignore_keys=ignore_keys)
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/transformers/trainer.py", line 1899, in prediction_step
    loss, outputs = self.compute_loss(model, inputs, return_outputs=True)
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/transformers/trainer.py", line 1458, in compute_loss
    outputs = model(**inputs)
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1015, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/transformers/models/distilbert/modeling_distilbert.py", line 625, in forward
    return_dict=return_dict,
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1015, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/transformers/models/distilbert/modeling_distilbert.py", line 471, in forward
    raise ValueError("You have to specify either input_ids or inputs_embeds")
ValueError: You have to specify either input_ids or inputs_embeds

This PR resolves the issue by moving the `dataset.map(preprocess...)` call to the beginning, so preprocessing runs on the full dataset before the individual splits are selected.
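The fixed ordering can be sketched the same way, again with a plain dict standing in for `datasets.DatasetDict` (names and toy data are illustrative only, not the actual library objects):

```python
# Stand-in for the GLUE DatasetDict loaded in run_glue.py (toy data).
raw = {
    "train": [{"premise": "a", "hypothesis": "b"}],
    "validation": [{"premise": "c", "hypothesis": "d"}],
    "validation_mismatched": [{"premise": "e", "hypothesis": "f"}],
}

def preprocess(example):
    # Stand-in for the tokenizer call that adds input_ids.
    return dict(example, input_ids=[0, 1, 2])

# Map preprocessing over every split up front...
processed = {name: [preprocess(ex) for ex in split] for name, split in raw.items()}

# ...then select splits; the MNLI mismatched split is already tokenized.
eval_datasets = [processed["validation"], processed["validation_mismatched"]]
assert all("input_ids" in ex for ds in eval_datasets for ex in ds)
```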

Fixes #10620

Before submitting

  • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [x] Did you read the contributor guideline, Pull Request section?
  • [ ] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • [ ] Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Collaborator

@sgugger sgugger left a comment


Thanks for fixing!

@sgugger sgugger merged commit 6f52fce into huggingface:master Mar 10, 2021
Iwontbecreative pushed a commit to Iwontbecreative/transformers that referenced this pull request Jul 15, 2021
… are not being preprocessed. (huggingface#10621)

* Fix MNLI tests

* Linter fix