TypeError: forward() got an unexpected keyword argument 'attention_mask' #13812

@dpitawela

Description

Environment info

  • transformers version: 4.10.0
  • Platform: Windows-10-10.0.19042-SP0
  • Python version: 3.9.6
  • PyTorch version (GPU?): 1.9.0+cpu (False)
  • Tensorflow version (GPU?): 2.6.0 (False)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: No

@patrickvonplaten @patil-suraj

Information

I am using EncoderDecoderModel (encoder=TransfoXLModel, decoder=TransfoXLLMHeadModel) to train a generative model for text summarization on the 'multi_x_science_sum' Hugging Face dataset.

When training starts, the error below is raised and training stops:
TypeError: forward() got an unexpected keyword argument 'attention_mask'

To reproduce


from transformers import (
    EncoderDecoderModel,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    TransfoXLTokenizer,
)

batch_size = 4

tokenizer = TransfoXLTokenizer.from_pretrained('transfo-xl-wt103')
txl2txl = EncoderDecoderModel.from_encoder_decoder_pretrained('transfo-xl-wt103', 'transfo-xl-wt103')

training_args = Seq2SeqTrainingArguments(
    predict_with_generate=True,
    evaluation_strategy="steps",
    per_device_train_batch_size=batch_size, # 4
    per_device_eval_batch_size=batch_size,  # 4
    output_dir="output",
    logging_steps=2,
    save_steps=10,
    eval_steps=4,
    num_train_epochs=1
)

trainer = Seq2SeqTrainer(
    model=txl2txl,
    tokenizer=tokenizer,
    args=training_args,
    compute_metrics=compute_metrics,
    train_dataset=train_data_processed,
    eval_dataset=validation_data_processed
)
trainer.train()
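
For context, train_data_processed and validation_data_processed are tokenized splits of multi_x_science_sum, built roughly along the lines of the sketch below (the column names, max lengths, and the pad-token workaround are illustrative, not the exact code I run):

from datasets import load_dataset

raw = load_dataset('multi_x_science_sum')

# transfo-xl-wt103 has no pad token by default, so reuse <eos> for padding
# (assumption made for this sketch)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

def preprocess(batch):
    # 'abstract' is used as the source document, 'related_work' as the target summary
    model_inputs = tokenizer(batch['abstract'], truncation=True,
                             padding='max_length', max_length=512)
    targets = tokenizer(batch['related_work'], truncation=True,
                        padding='max_length', max_length=128)
    model_inputs['labels'] = targets['input_ids']
    return model_inputs

train_data_processed = raw['train'].map(
    preprocess, batched=True, remove_columns=raw['train'].column_names)
validation_data_processed = raw['validation'].map(
    preprocess, batched=True, remove_columns=raw['validation'].column_names)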

TypeError: forward() got an unexpected keyword argument 'attention_mask'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
C:\Users\DILEEP~1\AppData\Local\Temp/ipykernel_21416/3777690609.py in <module>
      7     eval_dataset=validation_data_processed
      8 )
----> 9 trainer.train()

~\.conda\envs\msresearch\lib\site-packages\transformers\trainer.py in train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   1282                         tr_loss += self.training_step(model, inputs)
   1283                 else:
-> 1284                     tr_loss += self.training_step(model, inputs)
   1285                 self.current_flos += float(self.floating_point_ops(inputs))
   1286 

~\.conda\envs\msresearch\lib\site-packages\transformers\trainer.py in training_step(self, model, inputs)
   1787                 loss = self.compute_loss(model, inputs)
   1788         else:
-> 1789             loss = self.compute_loss(model, inputs)
   1790 
   1791         if self.args.n_gpu > 1:

~\.conda\envs\msresearch\lib\site-packages\transformers\trainer.py in compute_loss(self, model, inputs, return_outputs)
   1819         else:
   1820             labels = None
-> 1821         outputs = model(**inputs)
   1822         # Save past state if it exists
   1823         # TODO: this needs to be fixed and made cleaner later.

~\.conda\envs\msresearch\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

~\.conda\envs\msresearch\lib\site-packages\transformers\models\encoder_decoder\modeling_encoder_decoder.py in forward(self, input_ids, attention_mask, decoder_input_ids, decoder_attention_mask, encoder_outputs, past_key_values, inputs_embeds, decoder_inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, return_dict, **kwargs)
    423 
    424         if encoder_outputs is None:
--> 425             encoder_outputs = self.encoder(
    426                 input_ids=input_ids,
    427                 attention_mask=attention_mask,

~\.conda\envs\msresearch\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

TypeError: forward() got an unexpected keyword argument 'attention_mask'

As a side note, when I do the same task with the following setting, training starts without a problem:
tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
bert2bert = EncoderDecoderModel.from_encoder_decoder_pretrained('bert-base-uncased', 'bert-base-uncased')
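
The difference seems to be the encoder's forward signature: BertModel.forward accepts attention_mask, while TransfoXLModel.forward does not, so the call EncoderDecoderModel makes to the encoder with attention_mask= fails for Transfo-XL. A quick way to check (sketch):

import inspect
from transformers import BertModel, TransfoXLModel

# BertModel.forward lists attention_mask; TransfoXLModel.forward does not,
# which is what the EncoderDecoderModel forward call trips over.
print(inspect.signature(BertModel.forward))
print(inspect.signature(TransfoXLModel.forward))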

Please advise on how to train a Transformer-XL to Transformer-XL EncoderDecoderModel.
