Environment info
- transformers version: 4.10.0
- Platform: Windows-10-10.0.19042-SP0
- Python version: 3.9.6
- PyTorch version (GPU?): 1.9.0+cpu (False)
- Tensorflow version (GPU?): 2.6.0 (False)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No
Who can help: @patrickvonplaten @patil-suraj
Information
I am using EncoderDecoderModel (encoder=TransfoXLModel, decoder=TransfoXLLMHeadModel) to train a generative model for text summarization on the 'multi_x_science_sum' Hugging Face dataset.
When training starts, the following error is raised and training stops:
TypeError: forward() got an unexpected keyword argument 'attention_mask'
To reproduce
from transformers import (
    EncoderDecoderModel,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    TransfoXLTokenizer,
)

tokenizer = TransfoXLTokenizer.from_pretrained('transfo-xl-wt103')
txl2txl = EncoderDecoderModel.from_encoder_decoder_pretrained('transfo-xl-wt103', 'transfo-xl-wt103')

training_args = Seq2SeqTrainingArguments(
    predict_with_generate=True,
    evaluation_strategy="steps",
    per_device_train_batch_size=batch_size,  # batch_size = 4
    per_device_eval_batch_size=batch_size,   # batch_size = 4
    output_dir="output",
    logging_steps=2,
    save_steps=10,
    eval_steps=4,
    num_train_epochs=1,
)

# compute_metrics, train_data_processed and validation_data_processed are
# built earlier in the notebook from the 'multi_x_science_sum' dataset.
trainer = Seq2SeqTrainer(
    model=txl2txl,
    tokenizer=tokenizer,
    args=training_args,
    compute_metrics=compute_metrics,
    train_dataset=train_data_processed,
    eval_dataset=validation_data_processed,
)
trainer.train()
TypeError: forward() got an unexpected keyword argument 'attention_mask'
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
C:\Users\DILEEP~1\AppData\Local\Temp/ipykernel_21416/3777690609.py in <module>
7 eval_dataset=validation_data_processed
8 )
----> 9 trainer.train()
~\.conda\envs\msresearch\lib\site-packages\transformers\trainer.py in train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
1282 tr_loss += self.training_step(model, inputs)
1283 else:
-> 1284 tr_loss += self.training_step(model, inputs)
1285 self.current_flos += float(self.floating_point_ops(inputs))
1286
~\.conda\envs\msresearch\lib\site-packages\transformers\trainer.py in training_step(self, model, inputs)
1787 loss = self.compute_loss(model, inputs)
1788 else:
-> 1789 loss = self.compute_loss(model, inputs)
1790
1791 if self.args.n_gpu > 1:
~\.conda\envs\msresearch\lib\site-packages\transformers\trainer.py in compute_loss(self, model, inputs, return_outputs)
1819 else:
1820 labels = None
-> 1821 outputs = model(**inputs)
1822 # Save past state if it exists
1823 # TODO: this needs to be fixed and made cleaner later.
~\.conda\envs\msresearch\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
1049 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1050 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051 return forward_call(*input, **kwargs)
1052 # Do not call functions when jit is used
1053 full_backward_hooks, non_full_backward_hooks = [], []
~\.conda\envs\msresearch\lib\site-packages\transformers\models\encoder_decoder\modeling_encoder_decoder.py in forward(self, input_ids, attention_mask, decoder_input_ids, decoder_attention_mask, encoder_outputs, past_key_values, inputs_embeds, decoder_inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, return_dict, **kwargs)
423
424 if encoder_outputs is None:
--> 425 encoder_outputs = self.encoder(
426 input_ids=input_ids,
427 attention_mask=attention_mask,
~\.conda\envs\msresearch\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
1049 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1050 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051 return forward_call(*input, **kwargs)
1052 # Do not call functions when jit is used
1053 full_backward_hooks, non_full_backward_hooks = [], []
TypeError: forward() got an unexpected keyword argument 'attention_mask'
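In case it is useful, this looks to me like a plain signature mismatch: EncoderDecoderModel.forward always forwards attention_mask to self.encoder(...), but Transfo-XL's forward does not take that argument at all. A quick check I ran (just my own inspection on transformers 4.10.0, not a statement about the intended design):

import inspect

from transformers import EncoderDecoderModel, TransfoXLModel

# Transfo-XL's forward() accepts input_ids, mems, head_mask, inputs_embeds, ...
# Note that 'attention_mask' is not among the parameters.
print(list(inspect.signature(TransfoXLModel.forward).parameters))

# EncoderDecoderModel.forward(), by contrast, passes attention_mask to the
# encoder unconditionally, which is where the TypeError above comes from.
print(inspect.getsource(EncoderDecoderModel.forward))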
As a side note, when I do the same task with the following setting, training starts without a problem:
tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
bert2bert= EncoderDecoderModel.from_encoder_decoder_pretrained('bert-base-uncased', 'bert-base-uncased')
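My guess (happy to be corrected) is that the BERT pairing works simply because BertModel.forward accepts attention_mask while TransfoXLModel.forward does not. A quick way to compare the two, using a small helper of my own (forward_params is not a transformers function):

import inspect

from transformers import BertModel, TransfoXLModel

def forward_params(model_cls):
    # Names of the arguments that model_cls.forward() accepts.
    return set(inspect.signature(model_cls.forward).parameters)

print("attention_mask" in forward_params(BertModel))       # expected: True
print("attention_mask" in forward_params(TransfoXLModel))  # expected: False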
Could you please advise on how to train a Transformer-XL to Transformer-XL encoder-decoder model?
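In case it helps with triage, below is a rough, purely hypothetical sketch of the kind of shim I imagine would be needed on the encoder side (the TransfoXLEncoderWrapper class is my own, not part of transformers). Even with it, I suspect the decoder call would fail next, since TransfoXLLMHeadModel does not seem to accept encoder_hidden_states / encoder_attention_mask either, so Transfo-XL may simply not be usable with EncoderDecoderModel as-is:

import torch.nn as nn

class TransfoXLEncoderWrapper(nn.Module):
    # Hypothetical shim: drop keyword arguments that Transfo-XL's forward()
    # does not accept before delegating to the wrapped encoder.
    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder
        self.config = encoder.config  # EncoderDecoderModel reads encoder.config

    def forward(self, input_ids=None, attention_mask=None, **kwargs):
        # attention_mask is silently discarded; Transfo-XL tracks context via 'mems'.
        allowed = {"mems", "head_mask", "inputs_embeds", "output_attentions",
                   "output_hidden_states", "return_dict"}
        kwargs = {k: v for k, v in kwargs.items() if k in allowed}
        return self.encoder(input_ids=input_ids, **kwargs)

txl2txl.encoder = TransfoXLEncoderWrapper(txl2txl.encoder)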