XLNet evaluation on SQuAD #9351

Closed
2 of 4 tasks
slvcsl opened this issue Dec 29, 2020 · 8 comments
Labels
Good Second Issue Issues that are more difficult to do than "Good First" issues - give it a try if you want!

Comments

@slvcsl

slvcsl commented Dec 29, 2020

Environment info

  • transformers version: 4.2.0dev0
  • Platform: Linux-5.3.0-64-generic-x86_64-with-debian-buster-sid
  • Python version: 3.7.4
  • PyTorch version (GPU?): 1.7.1+cu101 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Using GPU in script?: yes
  • Using distributed or parallel set-up in script?: no

Who can help

XLNet @LysandreJik

Information

Model I am using (Bert, XLNet ...): XLNet

The problem arises when using:

  • the official example scripts: run_qa.py
  • my own modified scripts: (give details below)

The tasks I am working on is:

  • an official GLUE/SQUaD task: squad v2
  • my own task or dataset: (give details below)

To reproduce

I installed the transformers package from source, as required.
However, when I try to evaluate XLNet on the SQuAD dataset, the run fails.
In particular, I run the official script as follows:

python run_qa.py \
                --model_name_or_path xlnet-base-cased \
                --dataset_name squad_v2 \
                --do_eval \
                --version_2_with_negative \
                --learning_rate 1e-4 \
                --per_device_eval_batch_size=1  \
                --seed 1 \
                --output_dir ../../../../squad_results

This is the whole output, most of which is probably not relevant, for reference (the error is the traceback at the end):

12/29/2020 22:41:21 - WARNING - main - Process rank: -1, device: cuda:0, n_gpu: 2distributed training: False, 16-bits training: False
12/29/2020 22:41:21 - INFO - main - Training/evaluation parameters TrainingArguments(output_dir=../../../../squad_results, overwrite_output_dir=False, do_train=False, do_eval=True, do_predict=False, model_parallel=False, evaluation_strategy=EvaluationStrategy.NO, prediction_loss_only=False, per_device_train_batch_size=8, per_device_eval_batch_size=1, gradient_accumulation_steps=1, eval_accumulation_steps=None, learning_rate=1e-05, weight_decay=0.0, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=3.0, max_steps=-1, lr_scheduler_type=SchedulerType.LINEAR, warmup_steps=0, logging_dir=runs/Dec29_22-41-21_HLTNLP-GPU-B, logging_first_step=False, logging_steps=500, save_steps=500, save_total_limit=None, no_cuda=False, seed=1, fp16=False, fp16_opt_level=O1, local_rank=-1, tpu_num_cores=None, tpu_metrics_debug=False, debug=False, dataloader_drop_last=False, eval_steps=500, dataloader_num_workers=0, past_index=-1, run_name=../../../../squad_results, disable_tqdm=False, remove_unused_columns=True, label_names=None, load_best_model_at_end=False, metric_for_best_model=None, greater_is_better=None, ignore_data_skip=False, fp16_backend=auto, sharded_ddp=False, label_smoothing_factor=0.0, adafactor=False)
Reusing dataset squad_v2 (/home/scasola/.cache/huggingface/datasets/squad_v2/squad_v2/2.0.0/0e44b51f4035c15e218d53dc9eea5fe7123341982e524818b8500e4094fffb7b)
loading configuration file https://huggingface.co/xlnet-base-cased/resolve/main/config.json from cache at /home/scasola/.cache/huggingface/transformers/06bdb0f5882dbb833618c81c3b4c996a0c79422fa2c95ffea3827f92fc2dba6b.da982e2e596ec73828dbae86525a1870e513bd63aae5a2dc773ccc840ac5c346
Model config XLNetConfig {
"architectures": [
"XLNetLMHeadModel"
],
"attn_type": "bi",
"bi_data": false,
"bos_token_id": 1,
"clamp_len": -1,
"d_head": 64,
"d_inner": 3072,
"d_model": 768,
"dropout": 0.1,
"end_n_top": 5,
"eos_token_id": 2,
"ff_activation": "gelu",
"initializer_range": 0.02,
"layer_norm_eps": 1e-12,
"mem_len": null,
"model_type": "xlnet",
"n_head": 12,
"n_layer": 12,
"pad_token_id": 5,
"reuse_len": null,
"same_length": false,
"start_n_top": 5,
"summary_activation": "tanh",
"summary_last_dropout": 0.1,
"summary_type": "last",
"summary_use_proj": true,
"task_specific_params": {
"text-generation": {
"do_sample": true,
"max_length": 250
}
},
"untie_r": true,
"use_mems_eval": true,
"use_mems_train": false,
"vocab_size": 32000
}

loading configuration file https://huggingface.co/xlnet-base-cased/resolve/main/config.json from cache at /home/scasola/.cache/huggingface/transformers/06bdb0f5882dbb833618c81c3b4c996a0c79422fa2c95ffea3827f92fc2dba6b.da982e2e596ec73828dbae86525a1870e513bd63aae5a2dc773ccc840ac5c346
Model config XLNetConfig {
"architectures": [
"XLNetLMHeadModel"
],
"attn_type": "bi",
"bi_data": false,
"bos_token_id": 1,
"clamp_len": -1,
"d_head": 64,
"d_inner": 3072,
"d_model": 768,
"dropout": 0.1,
"end_n_top": 5,
"eos_token_id": 2,
"ff_activation": "gelu",
"initializer_range": 0.02,
"layer_norm_eps": 1e-12,
"mem_len": null,
"model_type": "xlnet",
"n_head": 12,
"n_layer": 12,
"pad_token_id": 5,
"reuse_len": null,
"same_length": false,
"start_n_top": 5,
"summary_activation": "tanh",
"summary_last_dropout": 0.1,
"summary_type": "last",
"summary_use_proj": true,
"task_specific_params": {
"text-generation": {
"do_sample": true,
"max_length": 250
}
},
"untie_r": true,
"use_mems_eval": true,
"use_mems_train": false,
"vocab_size": 32000
}

loading file https://huggingface.co/xlnet-base-cased/resolve/main/spiece.model from cache at /home/scasola/.cache/huggingface/transformers/df73bc9f8d13bf2ea4dab95624895e45a550a0f0a825e41fc25440bf367ee3c8.d93497120e3a865e2970f26abdf7bf375896f97fde8b874b70909592a6c785c9
loading file https://huggingface.co/xlnet-base-cased/resolve/main/tokenizer.json from cache at /home/scasola/.cache/huggingface/transformers/46f47734f3dcaef7e236b9a3e887f27814e18836a8db7e6a49148000058a1a54.2a683f915238b4f560dab0c724066cf0a7de9a851e96b0fb3a1e7f0881552f53
loading weights file https://huggingface.co/xlnet-base-cased/resolve/main/pytorch_model.bin from cache at /home/scasola/.cache/huggingface/transformers/9461853998373b0b2f8ef8011a13b62a2c5f540b2c535ef3ea46ed8a062b16a9.3e214f11a50e9e03eb47535b58522fc3cc11ac67c120a9450f6276de151af987
Some weights of the model checkpoint at xlnet-base-cased were not used when initializing XLNetForQuestionAnsweringSimple: ['lm_loss.weight', 'lm_loss.bias']

  • This IS expected if you are initializing XLNetForQuestionAnsweringSimple from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing XLNetForQuestionAnsweringSimple from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    Some weights of XLNetForQuestionAnsweringSimple were not initialized from the model checkpoint at xlnet-base-cased and are newly initialized: ['qa_outputs.weight', 'qa_outputs.bias']
    You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
    Loading cached processed dataset at /home/scasola/.cache/huggingface/datasets/squad_v2/squad_v2/2.0.0/0e44b51f4035c15e218d53dc9eea5fe7123341982e524818b8500e4094fffb7b/cache-c46fe459ef8061d5.arrow
    The following columns in the evaluation set don't have a corresponding argument in XLNetForQuestionAnsweringSimple.forward and have been ignored: example_id, offset_mapping.
    12/29/2020 22:41:30 - INFO - main - *** Evaluate ***
    The following columns in the evaluation set don't have a corresponding argument in XLNetForQuestionAnsweringSimple.forward and have been ignored: example_id, offset_mapping.
    ***** Running Evaluation *****
    Num examples = 12231
    Batch size = 2
    █████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6116/6116 [38:14<00:00, 3.32it/s]12/29/2020 23:19:57 - INFO - utils_qa - Post-processing 11873 example predictions split into 12231 features.
    0%| | 0/11873 [00:00<?, ?it/s]
    Traceback (most recent call last):
      File "run_qa.py", line 480, in <module>
        main()
      File "run_qa.py", line 461, in main
        results = trainer.evaluate()
      File "/home/scasola/survey/squad/xlnet/transformers/examples/question-answering/trainer_qa.py", line 62, in evaluate
        eval_preds = self.post_process_function(eval_examples, eval_dataset, output.predictions)
      File "run_qa.py", line 407, in post_processing_function
        is_world_process_zero=trainer.is_world_process_zero(),
      File "/home/scasola/survey/squad/xlnet/transformers/examples/question-answering/utils_qa.py", line 195, in postprocess_qa_predictions
        while predictions[i]["text"] == "":
    IndexError: list index out of range
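
The failing line in the traceback scans a list of candidate predictions for the first one with non-empty text. The following is a minimal sketch of why such a scan can run off the end of the list. It is a simplified stand-in, not the code in utils_qa.py: the function names and the sample predictions are hypothetical, and the guarded variant is only one possible way to avoid the crash, not a fix for the underlying cause.

```python
# Simplified stand-in for the scan seen in the traceback (not the library code).
def first_non_empty_unguarded(predictions):
    """Return the first prediction whose text is non-empty.

    Raises IndexError when every candidate has empty text, because the
    index keeps growing past the end of the list.
    """
    i = 0
    while predictions[i]["text"] == "":
        i += 1
    return predictions[i]


def first_non_empty_guarded(predictions):
    """Same scan, but returns None instead of raising when nothing matches."""
    for pred in predictions:
        if pred["text"] != "":
            return pred
    return None


# Hypothetical candidates: every extracted answer text is empty.
all_empty = [{"text": "", "score": 1.2}, {"text": "", "score": 0.7}]
# Hypothetical candidates: one non-empty answer exists.
mixed = [{"text": "", "score": 1.2}, {"text": "Normandy", "score": 0.9}]

try:
    first_non_empty_unguarded(all_empty)
    raised = False
except IndexError:
    raised = True

print(raised)                                  # the unguarded scan fails
print(first_non_empty_guarded(all_empty))      # the guarded scan does not
print(first_non_empty_guarded(mixed)["text"])
```

If every candidate for an example comes back with empty text, the unguarded loop has nothing to stop on, which matches the IndexError above.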

Expected behavior

Evaluation of the model saved in the output dir

@patrickvonplaten
Contributor

Pinging @sgugger here. Think he has more knowledge about the training script than I do.

@sgugger
Collaborator

sgugger commented Jan 4, 2021

This is linked to this issue in the tokenizers repo. Until this is solved, the script run_qa does not work properly with XLNet (the offset mappings computed are incorrect). You can use run_qa_beam_search with the XLNet model while waiting for the issue to be solved.
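
To illustrate why incorrect offset mappings matter here, the sketch below shows how a predicted token span is turned back into answer text through character offsets. It uses a toy whitespace tokenizer rather than the XLNet tokenizer, and the "broken" offsets are simulated as zero-length spans purely as an assumption; the thread does not say in what way the real offsets are wrong.

```python
import re


def tokenize_with_offsets(text):
    """Toy whitespace tokenizer returning (token, (start_char, end_char)) pairs."""
    return [(m.group(), (m.start(), m.end())) for m in re.finditer(r"\S+", text)]


def span_text(context, offsets, start_token, end_token):
    """Map a predicted token span back to a substring of the context."""
    start_char = offsets[start_token][0]
    end_char = offsets[end_token][1]
    return context[start_char:end_char]


context = "The Normans settled in Normandy"
offsets = [off for _, off in tokenize_with_offsets(context)]

# With correct offsets, token 1 maps back to the word it came from.
good = span_text(context, offsets, 1, 1)

# Simulated broken offsets (an assumption for illustration only):
# every token claims a zero-length span, so every answer text is empty.
broken_offsets = [(0, 0)] * len(offsets)
bad = span_text(context, broken_offsets, 1, 1)

print(repr(good))
print(repr(bad))
```

In this toy setup, bad offsets make every extracted answer an empty string, which is the kind of input that would leave the post-processing step with no non-empty prediction to pick.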

@slvcsl
Author

slvcsl commented Jan 5, 2021

Hi @sgugger, thanks for your answer. However, I'm trying to do a (fair) comparison between models, so using beam search is not an option. I might install another package version that works well with XLNet on SQuAD (I've seen, for example, that v. 3.10 also has some problems in evaluation). Do you know if any previous version is ok, at the moment?

@sgugger
Collaborator

sgugger commented Jan 5, 2021

You can always use the legacy script if you can't wait for the fix.

@slvcsl
Author

slvcsl commented Jan 5, 2021

Thank you very much, I was unaware of legacy scripts.

Do I need a particular transformers version to run them? When I run run_squad.py at the moment, I get the following errors:

01/05/2021 15:51:31 - WARNING - main - Process rank: -1, device: cuda, n_gpu: 1, distributed training: False, 16-bits training: False
[INFO|configuration_utils.py:431] 2021-01-05 15:51:31,306 >> loading configuration file https://huggingface.co/xlnet-base-cased/resolve/main/config.json from cache at /home/scasola/.cache/huggingface/transformers/06bdb0f5882dbb833618c81c3b4c996a0c79422fa2c95ffea3827f92fc2dba6b.da982e2e596ec73828dbae86525a1870e513bd63aae5a2dc773ccc840ac5c346
[INFO|configuration_utils.py:467] 2021-01-05 15:51:31,307 >> Model config XLNetConfig {
"architectures": [
"XLNetLMHeadModel"
],
"attn_type": "bi",
"bi_data": false,
"bos_token_id": 1,
"clamp_len": -1,
"d_head": 64,
"d_inner": 3072,
"d_model": 768,
"dropout": 0.1,
"end_n_top": 5,
"eos_token_id": 2,
"ff_activation": "gelu",
"initializer_range": 0.02,
"layer_norm_eps": 1e-12,
"mem_len": null,
"model_type": "xlnet",
"n_head": 12,
"n_layer": 12,
"pad_token_id": 5,
"reuse_len": null,
"same_length": false,
"start_n_top": 5,
"summary_activation": "tanh",
"summary_last_dropout": 0.1,
"summary_type": "last",
"summary_use_proj": true,
"task_specific_params": {
"text-generation": {
"do_sample": true,
"max_length": 250
}
},
"untie_r": true,
"use_mems_eval": true,
"use_mems_train": false,
"vocab_size": 32000
}

[INFO|configuration_utils.py:431] 2021-01-05 15:51:31,607 >> loading configuration file https://huggingface.co/xlnet-base-cased/resolve/main/config.json from cache at /home/scasola/.cache/huggingface/transformers/06bdb0f5882dbb833618c81c3b4c996a0c79422fa2c95ffea3827f92fc2dba6b.da982e2e596ec73828dbae86525a1870e513bd63aae5a2dc773ccc840ac5c346
[INFO|configuration_utils.py:467] 2021-01-05 15:51:31,608 >> Model config XLNetConfig {
"architectures": [
"XLNetLMHeadModel"
],
"attn_type": "bi",
"bi_data": false,
"bos_token_id": 1,
"clamp_len": -1,
"d_head": 64,
"d_inner": 3072,
"d_model": 768,
"dropout": 0.1,
"end_n_top": 5,
"eos_token_id": 2,
"ff_activation": "gelu",
"initializer_range": 0.02,
"layer_norm_eps": 1e-12,
"mem_len": null,
"model_type": "xlnet",
"n_head": 12,
"n_layer": 12,
"pad_token_id": 5,
"reuse_len": null,
"same_length": false,
"start_n_top": 5,
"summary_activation": "tanh",
"summary_last_dropout": 0.1,
"summary_type": "last",
"summary_use_proj": true,
"task_specific_params": {
"text-generation": {
"do_sample": true,
"max_length": 250
}
},
"untie_r": true,
"use_mems_eval": true,
"use_mems_train": false,
"vocab_size": 32000
}

[INFO|tokenization_utils_base.py:1802] 2021-01-05 15:51:32,221 >> loading file https://huggingface.co/xlnet-base-cased/resolve/main/spiece.model from cache at /home/scasola/.cache/huggingface/transformers/df73bc9f8d13bf2ea4dab95624895e45a550a0f0a825e41fc25440bf367ee3c8.d93497120e3a865e2970f26abdf7bf375896f97fde8b874b70909592a6c785c9
[INFO|tokenization_utils_base.py:1802] 2021-01-05 15:51:32,222 >> loading file https://huggingface.co/xlnet-base-cased/resolve/main/tokenizer.json from cache at /home/scasola/.cache/huggingface/transformers/46f47734f3dcaef7e236b9a3e887f27814e18836a8db7e6a49148000058a1a54.2a683f915238b4f560dab0c724066cf0a7de9a851e96b0fb3a1e7f0881552f53
[INFO|modeling_utils.py:1024] 2021-01-05 15:51:32,564 >> loading weights file https://huggingface.co/xlnet-base-cased/resolve/main/pytorch_model.bin from cache at /home/scasola/.cache/huggingface/transformers/9461853998373b0b2f8ef8011a13b62a2c5f540b2c535ef3ea46ed8a062b16a9.3e214f11a50e9e03eb47535b58522fc3cc11ac67c120a9450f6276de151af987
[WARNING|modeling_utils.py:1132] 2021-01-05 15:51:35,070 >> Some weights of the model checkpoint at xlnet-base-cased were not used when initializing XLNetForQuestionAnsweringSimple: ['lm_loss.weight', 'lm_loss.bias']
...
01/05/2021 15:51:37 - INFO - main - Training/evaluation parameters Namespace(adam_epsilon=1e-08, cache_dir='', config_name='', data_dir='../../../../../squad_data', device=device(type='cuda'), do_eval=True, do_lower_case=False,
do_train=True, doc_stride=128, eval_all_checkpoints=True, evaluate_during_training=True, fp16=False, fp16_opt_level='O1', gradient_accumulation_steps=4, lang_id=0, learning_rate=0.001, local_rank=-1, logging_steps=500, max_answer_length=30, max_grad_norm=1.0, max_query_length=64, max_seq_length=384, max_steps=-1, model_name_or_path='xlnet-base-cased', model_type='xlnet', n_best_size=20, n_gpu=1, no_cuda=False, null_score_diff_threshold=0.0, num_train_epochs=10.0,
output_dir='../../../../squad_results/XLNet/1e-3/1', overwrite_cache=True, overwrite_output_dir=False, per_gpu_eval_batch_size=8, per_gpu_train_batch_size=8, predict_file=None, save_steps=4132, seed=1, server_ip='', server_port='', threads=1, tokenizer_name='', train_file=None, verbose_logging=False, version_2_with_negative=True, warmup_steps=4132, weight_decay=0.0)
01/05/2021 15:51:37 - INFO - main - Creating features from dataset file at ../../../../../squad_data
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 442/442 [00:39<00:00, 11.33it/s]convert squad examples to features: 0%| | 0/130319 [00:00<?, ?it/s]multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/scasola/anaconda3/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/scasola/anaconda3/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/scasola/survey/squad/mypython/lib/python3.7/site-packages/transformers/data/processors/squad.py", line 189, in squad_convert_example_to_features
    return_token_type_ids=True,
  File "/home/scasola/survey/squad/mypython/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 2462, in encode_plus
    **kwargs,
  File "/home/scasola/survey/squad/mypython/lib/python3.7/site-packages/transformers/tokenization_utils_fast.py", line 465, in _encode_plus
    **kwargs,
  File "/home/scasola/survey/squad/mypython/lib/python3.7/site-packages/transformers/tokenization_utils_fast.py", line 378, in _batch_encode_plus
    is_pretokenized=is_split_into_words,
TypeError: TextInputSequence must be str
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "run_squad.py", line 833, in <module>
    main()
  File "run_squad.py", line 772, in main
    train_dataset = load_and_cache_examples(args, tokenizer, evaluate=False, output_examples=False)
  File "run_squad.py", line 461, in load_and_cache_examples
    threads=args.threads,
  File "/home/scasola/survey/squad/mypython/lib/python3.7/site-packages/transformers/data/processors/squad.py", line 382, in squad_convert_examples_to_features
    disable=not tqdm_enabled,
  File "/home/scasola/survey/squad/mypython/lib/python3.7/site-packages/tqdm/std.py", line 1133, in __iter__
    for obj in iterable:
  File "/home/scasola/anaconda3/lib/python3.7/multiprocessing/pool.py", line 325, in <genexpr>
    return (item for chunk in result for item in chunk)
  File "/home/scasola/anaconda3/lib/python3.7/multiprocessing/pool.py", line 748, in next
    raise value
TypeError: TextInputSequence must be str

This might be related to the tokenizer, as in #7735.
However, the tokenizer should not be the fast one (see the code snippet below), even though the traceback suggests that the fast tokenizer is actually being called. Is there any workaround?
tokenizer = AutoTokenizer.from_pretrained(
    args.tokenizer_name if args.tokenizer_name else args.model_name_or_path,
    do_lower_case=args.do_lower_case,
    cache_dir=args.cache_dir if args.cache_dir else None,
    use_fast=False,  # SquadDataset is not compatible with Fast tokenizers which have a smarter overflow handeling
)
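
One way to confirm which kind of tokenizer was actually loaded is to check its `is_fast` property right after construction, before any features are built. The sketch below is a hypothetical helper, not part of run_squad.py; it relies only on the `is_fast` attribute that transformers tokenizers expose, and it uses dummy stand-in classes so that it runs without downloading a model.

```python
def assert_slow_tokenizer(tokenizer):
    """Fail early if a fast tokenizer was loaded where a slow one is expected.

    Hypothetical helper: it only reads the `is_fast` attribute and treats
    a missing attribute as "not fast".
    """
    if getattr(tokenizer, "is_fast", False):
        raise RuntimeError(
            f"{type(tokenizer).__name__} is a fast tokenizer; "
            "the legacy SQuAD processing expects a slow one."
        )
    return tokenizer


# Dummy stand-ins so the sketch runs without the transformers package.
class DummySlowTokenizer:
    is_fast = False


class DummyFastTokenizer:
    is_fast = True


slow_ok = assert_slow_tokenizer(DummySlowTokenizer()) is not None

try:
    assert_slow_tokenizer(DummyFastTokenizer())
    fast_rejected = False
except RuntimeError:
    fast_rejected = True

print(slow_ok, fast_rejected)
```

In the real script, the call would be something like `assert_slow_tokenizer(tokenizer)` right after `AutoTokenizer.from_pretrained(...)`. If it raises, the `use_fast=False` argument is not taking effect in the installed version.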

@github-actions

github-actions bot commented Mar 6, 2021

This issue has been automatically marked as stale and been closed because it has not had recent activity. Thank you for your contributions.

If you think this still needs to be addressed please comment on this thread.

@wenting-zhao

I am having the same issue, and a fix would be really nice...

@LysandreJik
Member

Thank you for opening an issue - Unfortunately, we're limited on bandwidth and fixing QA for XLNet is quite low on our priority list. If you would like to go ahead and fix this issue, we would love to review a PR, but we won't find the time to get to it right away.

LysandreJik added the Good Second Issue label and removed the wontfix label on Oct 5, 2021