03/08/2021 09:54:08 - WARNING - __main__ - Process rank: -1, device: cuda:0, n_gpu: 2distributed training: False, 16-bits training: False
03/08/2021 09:54:08 - INFO - __main__ - Training/evaluation parameters TrainingArguments(output_dir=./models/, overwrite_output_dir=False, do_train=True, do_eval=True, do_predict=False, evaluation_strategy=IntervalStrategy.NO, prediction_loss_only=False, per_device_train_batch_size=1, per_device_eval_batch_size=8, gradient_accumulation_steps=1, eval_accumulation_steps=None, learning_rate=3e-05, weight_decay=0.0, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=4.0, max_steps=-1, lr_scheduler_type=SchedulerType.LINEAR, warmup_ratio=0.0, warmup_steps=0, logging_dir=runs/Mar08_09-54-06_inf-105-gpu-1, logging_strategy=IntervalStrategy.STEPS, logging_first_step=False, logging_steps=500, save_strategy=IntervalStrategy.STEPS, save_steps=500, save_total_limit=None, no_cuda=False, seed=42, fp16=False, fp16_opt_level=O1, fp16_backend=auto, fp16_full_eval=False, local_rank=-1, tpu_num_cores=None, tpu_metrics_debug=False, debug=False, dataloader_drop_last=False, eval_steps=500, dataloader_num_workers=0, past_index=-1, run_name=./models/, disable_tqdm=False, remove_unused_columns=True, label_names=None, load_best_model_at_end=False, metric_for_best_model=None, greater_is_better=None, ignore_data_skip=False, sharded_ddp=[], deepspeed=None, label_smoothing_factor=0.0, adafactor=False, group_by_length=False, report_to=[], ddp_find_unused_parameters=None, dataloader_pin_memory=True, skip_memory_metrics=False, _n_gpu=2)
03/08/2021 09:54:08 - WARNING - datasets.builder - Reusing dataset squad (/home/blozano/.cache/huggingface/datasets/squad/plain_text/1.0.0/0fd9e01360d229a22adfe0ab7e2dd2adc6e2b3d6d3db03636a51235947d4c6e9)
[INFO|configuration_utils.py:463] 2021-03-08 09:54:09,206 >> loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /home/blozano/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.637c6035640bacb831febcc2b7f7bee0a96f9b30c2d7e9ef84082d9f252f3170
[INFO|configuration_utils.py:499] 2021-03-08 09:54:09,207 >> Model config BertConfig {
"architectures": [
"BertForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"gradient_checkpointing": false,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"model_type": "bert",
"num_attention_heads": 12,
"num_hidden_layers": 12,
"pad_token_id": 0,
"position_embedding_type": "absolute",
"transformers_version": "4.4.0.dev0",
"type_vocab_size": 2,
"use_cache": true,
"vocab_size": 30522
}
[INFO|configuration_utils.py:463] 2021-03-08 09:54:09,509 >> loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /home/blozano/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.637c6035640bacb831febcc2b7f7bee0a96f9b30c2d7e9ef84082d9f252f3170
[INFO|configuration_utils.py:499] 2021-03-08 09:54:09,510 >> Model config BertConfig {
"architectures": [
"BertForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"gradient_checkpointing": false,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"model_type": "bert",
"num_attention_heads": 12,
"num_hidden_layers": 12,
"pad_token_id": 0,
"position_embedding_type": "absolute",
"transformers_version": "4.4.0.dev0",
"type_vocab_size": 2,
"use_cache": true,
"vocab_size": 30522
}
[INFO|tokenization_utils_base.py:1721] 2021-03-08 09:54:10,138 >> loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /home/blozano/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99
[INFO|tokenization_utils_base.py:1721] 2021-03-08 09:54:10,138 >> loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /home/blozano/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4
[INFO|modeling_utils.py:1051] 2021-03-08 09:54:10,501 >> loading weights file https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin from cache at /home/blozano/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f
[WARNING|modeling_utils.py:1158] 2021-03-08 09:54:12,594 >> Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForQuestionAnswering: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
[WARNING|modeling_utils.py:1169] 2021-03-08 09:54:12,594 >> Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['qa_outputs.weight', 'qa_outputs.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
03/08/2021 09:54:12 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /home/blozano/.cache/huggingface/datasets/squad/plain_text/1.0.0/0fd9e01360d229a22adfe0ab7e2dd2adc6e2b3d6d3db03636a51235947d4c6e9/cache-a560de6b2f76743b.arrow
03/08/2021 09:54:12 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /home/blozano/.cache/huggingface/datasets/squad/plain_text/1.0.0/0fd9e01360d229a22adfe0ab7e2dd2adc6e2b3d6d3db03636a51235947d4c6e9/cache-15b011eed342eca6.arrow
[INFO|trainer.py:471] 2021-03-08 09:54:15,885 >> The following columns in the evaluation set don't have a corresponding argument in `BertForQuestionAnswering.forward` and have been ignored: example_id, offset_mapping.
[INFO|trainer.py:929] 2021-03-08 09:54:15,937 >> ***** Running training *****
[INFO|trainer.py:930] 2021-03-08 09:54:15,937 >> Num examples = 88524
[INFO|trainer.py:931] 2021-03-08 09:54:15,937 >> Num Epochs = 4
[INFO|trainer.py:932] 2021-03-08 09:54:15,937 >> Instantaneous batch size per device = 1
[INFO|trainer.py:933] 2021-03-08 09:54:15,937 >> Total train batch size (w. parallel, distributed & accumulation) = 2
[INFO|trainer.py:934] 2021-03-08 09:54:15,937 >> Gradient Accumulation steps = 1
[INFO|trainer.py:935] 2021-03-08 09:54:15,937 >> Total optimization steps = 177048
0%| | 0/177048 [00:00<?, ?it/s]
Traceback (most recent call last):
File "transformers/examples/question-answering/run_qa.py", line 507, in <module>
main()
File "transformers/examples/question-answering/run_qa.py", line 481, in main
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/home/blozano/finetune_qa/transformers/src/transformers/trainer.py", line 1036, in train
tr_loss += self.training_step(model, inputs)
File "/home/blozano/finetune_qa/transformers/src/transformers/trainer.py", line 1420, in training_step
loss = self.compute_loss(model, inputs)
File "/home/blozano/finetune_qa/transformers/src/transformers/trainer.py", line 1452, in compute_loss
outputs = model(**inputs)
File "/home/blozano/finetune_qa/env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/blozano/finetune_qa/env/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 167, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/blozano/finetune_qa/env/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 177, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/blozano/finetune_qa/env/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
output.reraise()
File "/home/blozano/finetune_qa/env/lib/python3.8/site-packages/torch/_utils.py", line 429, in reraise
raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/home/blozano/finetune_qa/env/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
output = module(*input, **kwargs)
File "/home/blozano/finetune_qa/env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/blozano/finetune_qa/transformers/src/transformers/models/bert/modeling_bert.py", line 1775, in forward
outputs = self.bert(
File "/home/blozano/finetune_qa/env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/blozano/finetune_qa/transformers/src/transformers/models/bert/modeling_bert.py", line 971, in forward
encoder_outputs = self.encoder(
File "/home/blozano/finetune_qa/env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/blozano/finetune_qa/transformers/src/transformers/models/bert/modeling_bert.py", line 568, in forward
layer_outputs = layer_module(
File "/home/blozano/finetune_qa/env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/blozano/finetune_qa/transformers/src/transformers/models/bert/modeling_bert.py", line 456, in forward
self_attention_outputs = self.attention(
File "/home/blozano/finetune_qa/env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/blozano/finetune_qa/transformers/src/transformers/models/bert/modeling_bert.py", line 387, in forward
self_outputs = self.self(
File "/home/blozano/finetune_qa/env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/blozano/finetune_qa/transformers/src/transformers/models/bert/modeling_bert.py", line 253, in forward
mixed_query_layer = self.query(hidden_states)
File "/home/blozano/finetune_qa/env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/blozano/finetune_qa/env/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 94, in forward
return F.linear(input, self.weight, self.bias)
File "/home/blozano/finetune_qa/env/lib/python3.8/site-packages/torch/nn/functional.py", line 1753, in linear
return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling `cublasCreate(handle)`
NVIDIA-SMI 460.56 Driver Version: 460.56 CUDA Version: 11.2
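One way to narrow this down, offered as a sketch and not a confirmed fix: the log reports n_gpu: 2 and the traceback passes through torch/nn/parallel/data_parallel.py, so rerunning on a single GPU with synchronous kernel launches can show whether the cuBLAS failure depends on the multi-GPU DataParallel path. The helper name `single_gpu_debug_env` is hypothetical; `CUDA_VISIBLE_DEVICES` and `CUDA_LAUNCH_BLOCKING` are standard CUDA environment variables.

```python
import os


def single_gpu_debug_env(base_env, device_index=0):
    """Return a copy of base_env limited to one GPU with synchronous CUDA launches.

    Hypothetical helper for isolating the error; it does not change any
    training argument and does not modify base_env itself.
    """
    env = dict(base_env)
    # Expose only one device, so the Trainer sees n_gpu == 1 and skips DataParallel.
    env["CUDA_VISIBLE_DEVICES"] = str(device_index)
    # Make CUDA calls synchronous so the error surfaces at the failing call.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


if __name__ == "__main__":
    env = single_gpu_debug_env(os.environ, device_index=0)
    print("CUDA_VISIBLE_DEVICES =", env["CUDA_VISIBLE_DEVICES"])
    print("CUDA_LAUNCH_BLOCKING =", env["CUDA_LAUNCH_BLOCKING"])
```

The returned mapping can be passed as `env=` to `subprocess.run` together with the same run_qa.py command that produced the log above. If the single-GPU run succeeds, the problem is more likely tied to the two-GPU setup than to the model or the data.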
Expected behavior
The expected default behavior as stated in transformers/examples/question-answering/README.md
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Environment info
transformers version: 4.4.0.dev0
Who can help
Information
Model I am using (Bert, XLNet ...):
The problem arises when using:
The tasks I am working on is:
To reproduce