MNLI eval/test dataset is not being preprocessed in run_glue.py #10620

Closed

allenwang28 opened this issue Mar 10, 2021 · 1 comment

allenwang28 (Contributor) commented Mar 10, 2021

Environment info

  • transformers version: 4.4.0.dev0
  • Platform: Linux-4.9.0-14-amd64-x86_64-with-debian-9.13
  • Python version: 3.6.10
  • PyTorch version (GPU?): 1.8.0 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Using GPU in script?: no, using TPU
  • Using distributed or parallel set-up in script?: distributed

Who can help

N/A, I have a fix upcoming 👍

Information

Model I am using (Bert, XLNet ...): Any model within examples/text-classification/run_glue.py that uses MNLI

The problem arises when using:

  • [x] the official example scripts: (give details below)
  • [ ] my own modified scripts: (give details below)

The task I am working on is:

  • [x] an official GLUE/SQuAD task: (MNLI)
  • [ ] my own task or dataset: (give details below)

Essentially, the issue is that commit dfd16af changed run_glue.py so that {train|eval|test}_dataset are each split out and preprocessed individually. However, this misses datasets["{validation|test}_mismatched"], which are appended to {eval|test}_dataset only when the task is MNLI, so those splits are never tokenized (see the sketch below).
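
A minimal sketch of the problematic pattern, reconstructed from the description above (names are simplified stand-ins for run_glue.py's variables, and preprocess_function is a placeholder for the real tokenization step):

from datasets import load_dataset

raw_datasets = load_dataset("glue", "mnli")

def preprocess_function(examples):
    # In run_glue.py this tokenizes the premise/hypothesis pairs; elided here.
    return examples

# After dfd16af, each split is mapped individually:
train_dataset = raw_datasets["train"].map(preprocess_function, batched=True)
eval_dataset = raw_datasets["validation_matched"].map(preprocess_function, batched=True)

# ...but the MNLI mismatched split is later appended straight from the raw
# dataset dict, so it is never tokenized and carries no input_ids column:
eval_datasets = [eval_dataset]
eval_datasets.append(raw_datasets["validation_mismatched"])  # raw, unprocessed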

To reproduce

Steps to reproduce the behavior:

  1. Run the run_glue.py example on an MNLI dataset and include eval. The full command I'm using on a v2-8 TPU is:
python examples/xla_spawn.py --num_cores 8 \
  examples/text-classification/run_glue.py \
  --logging_dir=./tensorboard-metrics \
  --task_name MNLI \
  --cache_dir ./cache_dir \
  --do_eval \
  --max_seq_length 128 \
  --learning_rate 3e-5 \
  --output_dir MNLI \
  --logging_steps 30 \
  --save_steps 3000 \
  --tpu_metrics_debug \
  --model_name_or_path bert-base-cased \
  --per_device_eval_batch_size 64 \
  --overwrite_output_dir

This results in the following error (the un-preprocessed mismatched split still contains only the raw text columns, so no input_ids reach the model):

Traceback (most recent call last):
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch_xla/distributed/xla_multiprocessing.py", line 329, in _mp_start_fn
    _start_fn(index, pf_cfg, fn, args)
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch_xla/distributed/xla_multiprocessing.py", line 323, in _start_fn
    fn(gindex, *args)
  File "/transformers/examples/text-classification/run_glue.py", line 532, in _mp_fn
    main()
  File "/transformers/examples/text-classification/run_glue.py", line 493, in main
    metrics = trainer.evaluate(eval_dataset=eval_dataset)
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/transformers/trainer.py", line 1657, in evaluate
    metric_key_prefix=metric_key_prefix,
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/transformers/trainer.py", line 1788, in prediction_loop
    loss, logits, labels = self.prediction_step(model, inputs, prediction_loss_only, ignore_keys=ignore_keys)
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/transformers/trainer.py", line 1899, in prediction_step
    loss, outputs = self.compute_loss(model, inputs, return_outputs=True)
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/transformers/trainer.py", line 1458, in compute_loss
    outputs = model(**inputs)
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1015, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/transformers/models/distilbert/modeling_distilbert.py", line 625, in forward
    return_dict=return_dict,
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1015, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/transformers/models/distilbert/modeling_distilbert.py", line 471, in forward
    raise ValueError("You have to specify either input_ids or inputs_embeds")
ValueError: You have to specify either input_ids or inputs_embeds

Expected behavior

All splits should be preprocessed, including the MNLI validation_mismatched and test_mismatched sets (see the sketch below).

Fix: #10621
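
For reference, a minimal sketch of the general fix pattern, i.e. mapping the mismatched split before it is appended. This illustrates the idea only and is not necessarily the exact change merged in #10621:

from datasets import load_dataset

raw_datasets = load_dataset("glue", "mnli")

def preprocess_function(examples):
    # Placeholder for the tokenization done in run_glue.py.
    return examples

eval_dataset = raw_datasets["validation_matched"].map(preprocess_function, batched=True)

eval_datasets = [eval_dataset]
# Preprocess the mismatched split too, instead of appending it raw:
eval_datasets.append(
    raw_datasets["validation_mismatched"].map(preprocess_function, batched=True)
)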

sgugger (Collaborator) commented Mar 10, 2021

Fixed by #10621
Thanks for flagging and fixing :-)

sgugger closed this as completed Mar 11, 2021