XLNet evaluation on SQuAD #9351

slvcsl · 2020-12-29T22:25:01Z

Environment info

transformers version: 4.2.0dev0
Platform: Linux-5.3.0-64-generic-x86_64-with-debian-buster-sid
Python version: 3.7.4
PyTorch version (GPU?): 1.7.1+cu101 (True)
Tensorflow version (GPU?): not installed (NA)
Using GPU in script?: yes
Using distributed or parallel set-up in script?: no

Who can help

XLNet @LysandreJik

Information

Model I am using (Bert, XLNet ...): XLNet

The problem arises when using:

the official example scripts: run_qa.py
my own modified scripts: (give details below)

The task I am working on is:

an official GLUE/SQuAD task: squad v2
my own task or dataset: (give details below)

To reproduce

I installed the transformers package from source, as required.
However, when I try to evaluate XLNet on the SQuAD dataset, I get an error.
In particular, I run the official script as follows:

python run_qa.py \
                --model_name_or_path xlnet-base-cased \
                --dataset_name squad_v2 \
                --do_eval \
                --version_2_with_negative \
                --learning_rate 1e-4 \
                --per_device_eval_batch_size=1  \
                --seed 1 \
                --output_dir ../../../../squad_results

This is the whole output, most of which is probably not relevant, for reference (the error is at the end):

12/29/2020 22:41:21 - WARNING - main - Process rank: -1, device: cuda:0, n_gpu: 2distributed training: False, 16-bits training: False
12/29/2020 22:41:21 - INFO - main - Training/evaluation parameters TrainingArguments(output_dir=../../../../squad_results, overwrite_output_dir=False, do_train=False, do_eval=True, do_predict=False, model_parallel=False, evaluation_strategy=EvaluationStrategy.NO, prediction_loss_only=False, per_device_train_batch_size=8, per_device_eval_batch_size=1, gradient_accumulation_steps=1, eval_accumulation_steps=None, learning_rate=1e-05, weight_decay=0.0, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=3.0, max_steps=-1, lr_scheduler_type=SchedulerType.LINEAR, warmup_steps=0, logging_dir=runs/Dec29_22-41-21_HLTNLP-GPU-B, logging_first_step=False, logging_steps=500, save_steps=500, save_total_limit=None, no_cuda=False, seed=1, fp16=False, fp16_opt_level=O1, local_rank=-1, tpu_num_cores=None, tpu_metrics_debug=False, debug=False, dataloader_drop_last=False, eval_steps=500, dataloader_num_workers=0, past_index=-1, run_name=../../../../squad_results, disable_tqdm=False, remove_unused_columns=True, label_names=None, load_best_model_at_end=False, metric_for_best_model=None, greater_is_better=None, ignore_data_skip=False, fp16_backend=auto, sharded_ddp=False, label_smoothing_factor=0.0, adafactor=False)
Reusing dataset squad_v2 (/home/scasola/.cache/huggingface/datasets/squad_v2/squad_v2/2.0.0/0e44b51f4035c15e218d53dc9eea5fe7123341982e524818b8500e4094fffb7b)
loading configuration file https://huggingface.co/xlnet-base-cased/resolve/main/config.json from cache at /home/scasola/.cache/huggingface/transformers/06bdb0f5882dbb833618c81c3b4c996a0c79422fa2c95ffea3827f92fc2dba6b.da982e2e596ec73828dbae86525a1870e513bd63aae5a2dc773ccc840ac5c346
Model config XLNetConfig {
"architectures": [
"XLNetLMHeadModel"
],
"attn_type": "bi",
"bi_data": false,
"bos_token_id": 1,
"clamp_len": -1,
"d_head": 64,
"d_inner": 3072,
"d_model": 768,
"dropout": 0.1,
"end_n_top": 5,
"eos_token_id": 2,
"ff_activation": "gelu",
"initializer_range": 0.02,
"layer_norm_eps": 1e-12,
"mem_len": null,
"model_type": "xlnet",
"n_head": 12,
"n_layer": 12,
"pad_token_id": 5,
"reuse_len": null,
"same_length": false,
"start_n_top": 5,
"summary_activation": "tanh",
"summary_last_dropout": 0.1,
"summary_type": "last",
"summary_use_proj": true,
"task_specific_params": {
"text-generation": {
"do_sample": true,
"max_length": 250
}
},
"untie_r": true,
"use_mems_eval": true,
"use_mems_train": false,
"vocab_size": 32000
}

loading configuration file https://huggingface.co/xlnet-base-cased/resolve/main/config.json from cache at /home/scasola/.cache/huggingface/transformers/06bdb0f5882dbb833618c81c3b4c996a0c79422fa2c95ffea3827f92fc2dba6b.da982e2e596ec73828dbae86525a1870e513bd63aae5a2dc773ccc840ac5c346
Model config XLNetConfig {
"architectures": [
"XLNetLMHeadModel"
],
"attn_type": "bi",
"bi_data": false,
"bos_token_id": 1,
"clamp_len": -1,
"d_head": 64,
"d_inner": 3072,
"d_model": 768,
"dropout": 0.1,
"end_n_top": 5,
"eos_token_id": 2,
"ff_activation": "gelu",
"initializer_range": 0.02,
"layer_norm_eps": 1e-12,
"mem_len": null,
"model_type": "xlnet",
"n_head": 12,
"n_layer": 12,
"pad_token_id": 5,
"reuse_len": null,
"same_length": false,
"start_n_top": 5,
"summary_activation": "tanh",
"summary_last_dropout": 0.1,
"summary_type": "last",
"summary_use_proj": true,
"task_specific_params": {
"text-generation": {
"do_sample": true,
"max_length": 250
}
},
"untie_r": true,
"use_mems_eval": true,
"use_mems_train": false,
"vocab_size": 32000
}

loading file https://huggingface.co/xlnet-base-cased/resolve/main/spiece.model from cache at /home/scasola/.cache/huggingface/transformers/df73bc9f8d13bf2ea4dab95624895e45a550a0f0a825e41fc25440bf367ee3c8.d93497120e3a865e2970f26abdf7bf375896f97fde8b874b70909592a6c785c9
loading file https://huggingface.co/xlnet-base-cased/resolve/main/tokenizer.json from cache at /home/scasola/.cache/huggingface/transformers/46f47734f3dcaef7e236b9a3e887f27814e18836a8db7e6a49148000058a1a54.2a683f915238b4f560dab0c724066cf0a7de9a851e96b0fb3a1e7f0881552f53
loading weights file https://huggingface.co/xlnet-base-cased/resolve/main/pytorch_model.bin from cache at /home/scasola/.cache/huggingface/transformers/9461853998373b0b2f8ef8011a13b62a2c5f540b2c535ef3ea46ed8a062b16a9.3e214f11a50e9e03eb47535b58522fc3cc11ac67c120a9450f6276de151af987
Some weights of the model checkpoint at xlnet-base-cased were not used when initializing XLNetForQuestionAnsweringSimple: ['lm_loss.weight', 'lm_loss.bias']

This IS expected if you are initializing XLNetForQuestionAnsweringSimple from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
This IS NOT expected if you are initializing XLNetForQuestionAnsweringSimple from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of XLNetForQuestionAnsweringSimple were not initialized from the model checkpoint at xlnet-base-cased and are newly initialized: ['qa_outputs.weight', 'qa_outputs.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Loading cached processed dataset at /home/scasola/.cache/huggingface/datasets/squad_v2/squad_v2/2.0.0/0e44b51f4035c15e218d53dc9eea5fe7123341982e524818b8500e4094fffb7b/cache-c46fe459ef8061d5.arrow
The following columns in the evaluation set don't have a corresponding argument in XLNetForQuestionAnsweringSimple.forward and have been ignored: example_id, offset_mapping.
12/29/2020 22:41:30 - INFO - main - *** Evaluate ***
The following columns in the evaluation set don't have a corresponding argument in XLNetForQuestionAnsweringSimple.forward and have been ignored: example_id, offset_mapping.
***** Running Evaluation *****
Num examples = 12231
Batch size = 2
██████████| 6116/6116 [38:14<00:00, 3.32it/s]
12/29/2020 23:19:57 - INFO - utils_qa - Post-processing 11873 example predictions split into 12231 features.
0%| | 0/11873 [00:00<?, ?it/s]
Traceback (most recent call last):
File "run_qa.py", line 480, in <module>
main()
File "run_qa.py", line 461, in main
results = trainer.evaluate()
File "/home/scasola/survey/squad/xlnet/transformers/examples/question-answering/trainer_qa.py", line 62, in evaluate
eval_preds = self.post_process_function(eval_examples, eval_dataset, output.predictions)
File "run_qa.py", line 407, in post_processing_function
is_world_process_zero=trainer.is_world_process_zero(),
File "/home/scasola/survey/squad/xlnet/transformers/examples/question-answering/utils_qa.py", line 195, in postprocess_qa_predictions
while predictions[i]["text"] == "":
IndexError: list index out of range
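
The failing line walks the ranked predictions looking for the first one with a non-empty answer text. If every candidate has an empty text, the index runs past the end of the list and raises the IndexError above. A minimal sketch of a guarded version of that search follows; the helper name first_non_empty_prediction and the sample data are hypothetical and are not part of utils_qa.py.

```python
from typing import Dict, List, Optional


def first_non_empty_prediction(predictions: List[Dict]) -> Optional[Dict]:
    """Return the best-ranked prediction whose text is non-empty, or None.

    Hypothetical helper: it mirrors the intent of the loop in the traceback
    but stops at the end of the list instead of raising IndexError.
    """
    for prediction in predictions:
        if prediction["text"] != "":
            return prediction
    return None


# All candidates empty: the unguarded `while` loop would raise IndexError here.
only_empty = [{"text": "", "score": 0.9}, {"text": "", "score": 0.1}]
# At least one non-empty candidate: the first non-empty one is returned.
mixed = [{"text": "", "score": 0.9}, {"text": "Paris", "score": 0.4}]

print(first_non_empty_prediction(only_empty))
print(first_non_empty_prediction(mixed))
```

A guard like this only avoids the crash; it does not explain why all candidate texts come back empty.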

Expected behavior

Evaluation of the model saved in the output dir


patrickvonplaten · 2020-12-30T09:27:21Z

Pinging @sgugger here. I think he has more knowledge about the training script than I do.

sgugger · 2021-01-04T21:57:39Z

This is linked to this issue in the tokenizers repo. Until this is solved, the script run_qa does not work properly with XLNet (the offset mappings computed are incorrect). You can use run_qa_beam_search with the XLNet model while waiting for the issue to be solved.
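
For context, an offset mapping pairs each token with the (start, end) character span it covers in the original context, and post-processing slices the context with those spans to recover the answer text. The sketch below uses a toy whitespace tokenizer (an assumption for illustration, not the XLNet tokenizer) to show how a faulty mapping can produce empty answers. The exact form of the bug in the tokenizers library is not shown in this thread, so the collapsed spans here are only a stand-in.

```python
from typing import List, Tuple

Offsets = List[Tuple[int, int]]


def whitespace_offsets(text: str) -> Offsets:
    """Toy tokenizer: return (start, end) character spans of whitespace-separated tokens."""
    offsets = []
    position = 0
    for token in text.split():
        start = text.index(token, position)
        end = start + len(token)
        offsets.append((start, end))
        position = end
    return offsets


def answer_text(context: str, offsets: Offsets, start_token: int, end_token: int) -> str:
    """Slice the context from the start of one token to the end of another."""
    start_char = offsets[start_token][0]
    end_char = offsets[end_token][1]
    return context[start_char:end_char]


context = "XLNet was released in 2019"
good = whitespace_offsets(context)
# Simulate a faulty mapping in which every span collapses to (0, 0).
bad = [(0, 0) for _ in good]

print(repr(answer_text(context, good, 3, 4)))  # span covering tokens 3 and 4
print(repr(answer_text(context, bad, 3, 4)))   # faulty spans give an empty string
```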

slvcsl · 2021-01-05T10:01:32Z

Hi @sgugger, thanks for your answer. However, I'm trying to do a (fair) comparison between models, so using beam search is not an option. I might install another package version that works well with XLNet on SQuAD (I've seen, for example, that v. 3.10 also has some problems in evaluation). Do you know if any previous version is ok, at the moment?

sgugger · 2021-01-05T13:02:33Z

You can always use the legacy script if you can't wait for the fix.

slvcsl · 2021-01-05T15:11:11Z

Thank you very much, I was unaware of legacy scripts.

Do I need a particular transformers version to run them? When I run run_squad.py at the moment, I get the following errors:

01/05/2021 15:51:31 - WARNING - main - Process rank: -1, device: cuda, n_gpu: 1, distributed training: False, 16-bits training: False
[INFO|configuration_utils.py:431] 2021-01-05 15:51:31,306 >> loading configuration file https://huggingface.co/xlnet-base-cased/resolve/main/config.json from cache at /home/scasola/.cache/huggingface/transformers/06bdb0f5882dbb833618c81c3b4c996a0c79422fa2c95ffea3827f92fc2dba6b.da982e2e596ec73828dbae86525a1870e513bd63aae5a2dc773ccc840ac5c346
[INFO|configuration_utils.py:467] 2021-01-05 15:51:31,307 >> Model config XLNetConfig {
"architectures": [
"XLNetLMHeadModel"
],
"attn_type": "bi",
"bi_data": false,
"bos_token_id": 1,
"clamp_len": -1,
"d_head": 64,
"d_inner": 3072,
"d_model": 768,
"dropout": 0.1,
"end_n_top": 5,
"eos_token_id": 2,
"ff_activation": "gelu",
"initializer_range": 0.02,
"layer_norm_eps": 1e-12,
"mem_len": null,
"model_type": "xlnet",
"n_head": 12,
"n_layer": 12,
"pad_token_id": 5,
"reuse_len": null,
"same_length": false,
"start_n_top": 5,
"summary_activation": "tanh",
"summary_last_dropout": 0.1,
"summary_type": "last",
"summary_use_proj": true,
"task_specific_params": {
"text-generation": {
"do_sample": true,
"max_length": 250
}
},
"untie_r": true,
"use_mems_eval": true,
"use_mems_train": false,
"vocab_size": 32000
}

[INFO|configuration_utils.py:431] 2021-01-05 15:51:31,607 >> loading configuration file https://huggingface.co/xlnet-base-cased/resolve/main/config.json from cache at /home/scasola/.cache/huggingface/transformers/06bdb0f5882dbb833618c81c3b4c996a0c79422fa2c95ffea3827f92fc2dba6b.da982e2e596ec73828dbae86525a1870e513bd63aae5a2dc773ccc840ac5c346
[INFO|configuration_utils.py:467] 2021-01-05 15:51:31,608 >> Model config XLNetConfig {
"architectures": [
"XLNetLMHeadModel"
],
"attn_type": "bi",
"bi_data": false,
"bos_token_id": 1,
"clamp_len": -1,
"d_head": 64,
"d_inner": 3072,
"d_model": 768,
"dropout": 0.1,
"end_n_top": 5,
"eos_token_id": 2,
"ff_activation": "gelu",
"initializer_range": 0.02,
"layer_norm_eps": 1e-12,
"mem_len": null,
"model_type": "xlnet",
"n_head": 12,
"n_layer": 12,
"pad_token_id": 5,
"reuse_len": null,
"same_length": false,
"start_n_top": 5,
"summary_activation": "tanh",
"summary_last_dropout": 0.1,
"summary_type": "last",
"summary_use_proj": true,
"task_specific_params": {
"text-generation": {
"do_sample": true,
"max_length": 250
}
},
"untie_r": true,
"use_mems_eval": true,
"use_mems_train": false,
"vocab_size": 32000
}

[INFO|tokenization_utils_base.py:1802] 2021-01-05 15:51:32,221 >> loading file https://huggingface.co/xlnet-base-cased/resolve/main/spiece.model from cache at /home/scasola/.cache/huggingface/transformers/df73bc9f8d13bf2ea4dab95624895e45a550a0f0a825e41fc25440bf367ee3c8.d93497120e3a865e2970f26abdf7bf375896f97fde8b874b70909592a6c785c9
[INFO|tokenization_utils_base.py:1802] 2021-01-05 15:51:32,222 >> loading file https://huggingface.co/xlnet-base-cased/resolve/main/tokenizer.json from cache at /home/scasola/.cache/huggingface/transformers/46f47734f3dcaef7e236b9a3e887f27814e18836a8db7e6a49148000058a1a54.2a683f915238b4f560dab0c724066cf0a7de9a851e96b0fb3a1e7f0881552f53
[INFO|modeling_utils.py:1024] 2021-01-05 15:51:32,564 >> loading weights file https://huggingface.co/xlnet-base-cased/resolve/main/pytorch_model.bin from cache at /home/scasola/.cache/huggingface/transformers/9461853998373b0b2f8ef8011a13b62a2c5f540b2c535ef3ea46ed8a062b16a9.3e214f11a50e9e03eb47535b58522fc3cc11ac67c120a9450f6276de151af987
[WARNING|modeling_utils.py:1132] 2021-01-05 15:51:35,070 >> Some weights of the model checkpoint at xlnet-base-cased were not used when initializing XLNetForQuestionAnsweringSimple: ['lm_loss.weight', 'lm_loss.bias']
...
01/05/2021 15:51:37 - INFO - main - Training/evaluation parameters Namespace(adam_epsilon=1e-08, cache_dir='', config_name='', data_dir='../../../../../squad_data', device=device(type='cuda'), do_eval=True, do_lower_case=False,
do_train=True, doc_stride=128, eval_all_checkpoints=True, evaluate_during_training=True, fp16=False, fp16_opt_level='O1', gradient_accumulation_steps=4, lang_id=0, learning_rate=0.001, local_rank=-1, logging_steps=500, max_answer_length=30, max_grad_norm=1.0, max_query_length=64, max_seq_length=384, max_steps=-1, model_name_or_path='xlnet-base-cased', model_type='xlnet', n_best_size=20, n_gpu=1, no_cuda=False, null_score_diff_threshold=0.0, num_train_epochs=10.0,
output_dir='../../../../squad_results/XLNet/1e-3/1', overwrite_cache=True, overwrite_output_dir=False, per_gpu_eval_batch_size=8, per_gpu_train_batch_size=8, predict_file=None, save_steps=4132, seed=1, server_ip='', server_port='', threads=1, tokenizer_name='', train_file=None, verbose_logging=False, version_2_with_negative=True, warmup_steps=4132, weight_decay=0.0)
01/05/2021 15:51:37 - INFO - main - Creating features from dataset file at ../../../../../squad_data
100%|██████████| 442/442 [00:39<00:00, 11.33it/s]
convert squad examples to features: 0%| | 0/130319 [00:00<?, ?it/s]
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/scasola/anaconda3/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/home/scasola/anaconda3/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "/home/scasola/survey/squad/mypython/lib/python3.7/site-packages/transformers/data/processors/squad.py", line 189, in squad_convert_example_to_features
return_token_type_ids=True,
File "/home/scasola/survey/squad/mypython/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 2462, in encode_plus
**kwargs,
File "/home/scasola/survey/squad/mypython/lib/python3.7/site-packages/transformers/tokenization_utils_fast.py", line 465, in _encode_plus
**kwargs,
File "/home/scasola/survey/squad/mypython/lib/python3.7/site-packages/transformers/tokenization_utils_fast.py", line 378, in _batch_encode_plus
is_pretokenized=is_split_into_words,
TypeError: TextInputSequence must be str
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "run_squad.py", line 833, in <module>
main()
File "run_squad.py", line 772, in main
train_dataset = load_and_cache_examples(args, tokenizer, evaluate=False, output_examples=False)
File "run_squad.py", line 461, in load_and_cache_examples
threads=args.threads,
File "/home/scasola/survey/squad/mypython/lib/python3.7/site-packages/transformers/data/processors/squad.py", line 382, in squad_convert_examples_to_features
disable=not tqdm_enabled,
File "/home/scasola/survey/squad/mypython/lib/python3.7/site-packages/tqdm/std.py", line 1133, in __iter__
for obj in iterable:
File "/home/scasola/anaconda3/lib/python3.7/multiprocessing/pool.py", line 325, in <genexpr>
return (item for chunk in result for item in chunk)
File "/home/scasola/anaconda3/lib/python3.7/multiprocessing/pool.py", line 748, in next
raise value
TypeError: TextInputSequence must be str

This might be related to the tokenizer, as in #7735.
However, the tokenizer used should not be a fast one (see the code snippet below), even though the traceback suggests that the fast tokenizer is actually called. Is there any workaround?
tokenizer = AutoTokenizer.from_pretrained(
    args.tokenizer_name if args.tokenizer_name else args.model_name_or_path,
    do_lower_case=args.do_lower_case,
    cache_dir=args.cache_dir if args.cache_dir else None,
    use_fast=False,  # SquadDataset is not compatible with Fast tokenizers which have a smarter overflow handeling
)
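
Since the traceback passes through tokenization_utils_fast.py, one quick check is to print the class of the tokenizer that was actually loaded together with its is_fast attribute, which transformers tokenizers expose. A minimal sketch follows; the helper describe_tokenizer and the two dummy classes are hypothetical stand-ins so that the snippet runs without downloading a model.

```python
def describe_tokenizer(tokenizer) -> str:
    """Report the tokenizer class and whether it is a fast (Rust-backed) one."""
    # Objects without an `is_fast` attribute are treated as slow.
    is_fast = bool(getattr(tokenizer, "is_fast", False))
    kind = "fast" if is_fast else "slow"
    return f"{type(tokenizer).__name__} ({kind})"


# Stand-in classes so the sketch runs without a model download.
class DummySlowTokenizer:
    is_fast = False


class DummyFastTokenizer:
    is_fast = True


print(describe_tokenizer(DummySlowTokenizer()))
print(describe_tokenizer(DummyFastTokenizer()))
```

In run_squad.py, calling describe_tokenizer(tokenizer) right after the from_pretrained call would show whether use_fast=False took effect. The paths in the traceback point at a site-packages install, so it may also be worth checking that the installed transformers version matches the checkout the script came from.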

github-actions · 2021-03-06T00:13:52Z

This issue has been automatically marked as stale and been closed because it has not had recent activity. Thank you for your contributions.

If you think this still needs to be addressed please comment on this thread.

wenting-zhao · 2021-10-02T04:52:48Z

I am having the same issue, and a fix would be really nice.

LysandreJik · 2021-10-05T12:42:19Z

Thank you for opening an issue. Unfortunately, we're limited on bandwidth, and fixing QA for XLNet is quite low on our priority list. If you would like to go ahead and fix this issue, we would love to review a PR, but we won't find the time to get to it right away.

github-actions bot added the wontfix label Mar 6, 2021

github-actions bot closed this as completed Mar 6, 2021

LysandreJik added the Good Second Issue label and removed the wontfix label Oct 5, 2021
