
ValueError: You have to specify either decoder_input_ids or decoder_inputs_embeds Transformers Translation Tutorial Repro #24254

Closed · SoyGema opened this issue Jun 13, 2023 · 5 comments

SoyGema (Contributor) commented Jun 13, 2023

System Info

Context

Hello there!
First and foremost, congrats on the Transformers Translation tutorial. 👍
It serves as a spark for building English-to-many translation language models!
I'm following it along with TensorFlow, mostly reproducing it in a Jupyter notebook on a Mac with GPU enabled, using the following dependency versions:

tensorflow-macos==2.9.0
tensorflow-metal==0.5.0
transformers==4.29.2

* NOTE: the tensorflow-macos dependencies are pinned to ensure GPU training

Who can help?

@ArthurZucker @younesbelkada
@gante maybe?

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Issue Description

I'm finding the following error when fitting a model for fine-tuning, loaded from the TFAutoModelForSeq2SeqLM autoclass:

with tf.device('/device:GPU:0'):
    model.fit(x=tf_train_set, validation_data=tf_test_set, epochs=1, callbacks=callbacks)

It is returning

ValueError: You have to specify either decoder_input_ids or decoder_inputs_embeds
        
        
        Call arguments received by layer "decoder" (type TFT5MainLayer):
          • self=None
          • input_ids=None
          • attention_mask=None
          • encoder_hidden_states=tf.Tensor(shape=(32, 96, 512), dtype=float32)
          • encoder_attention_mask=tf.Tensor(shape=(32, 96), dtype=int32)
          • inputs_embeds=None
          • head_mask=None
          • encoder_head_mask=None
          • past_key_values=None
          • use_cache=True
          • output_attentions=False
          • output_hidden_states=False
          • return_dict=True
          • training=False
    
    
    Call arguments received by layer "tft5_for_conditional_generation" (type TFT5ForConditionalGeneration):
      • self={'input_ids': 'tf.Tensor(shape=(32, 96), dtype=int64)', 'attention_mask': 'tf.Tensor(shape=(32, 96), dtype=int64)'}
      • input_ids=None
      • attention_mask=None
      • decoder_input_ids=None
      • decoder_attention_mask=None
      • head_mask=None
      • decoder_head_mask=None
      • encoder_outputs=None
      • past_key_values=None
      • inputs_embeds=None
      • decoder_inputs_embeds=None
      • labels=None
      • use_cache=None
      • output_attentions=None
      • output_hidden_states=None
      • return_dict=None
      • training=False

Backtrace

Tried:

model = TFAutoModelForSeq2SeqLM.from_pretrained(checkpoint)

This seems to work correctly, so I assume the pre-trained model is loaded.

Expected behavior

The trained model should be uploaded to the Hub.
Instead, the folder appears empty and there is an error.

Hypothesis

At this point, my guess is that once I load the model I need to redefine something, based on the verbose error trace?
Any help on how to do this, or how to fix it, would be appreciated. :) Do I have to define a specific Trainer? Any idea where I can find this in the docs?

gante (Member) commented Jun 14, 2023

Hey @SoyGema 👋

From your exception, I believe the issue is at the data preparation stage -- it is pretty much complaining that your dataset has no labels. Have you followed the data preprocessing steps described here?
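
For reference, the labels column comes from that preprocessing step. A minimal sketch of what it looks like, assuming the opus_books en-fr setup from the tutorial (the prefix and max_length below are illustrative, not taken from your notebook):

from datasets import load_dataset
from transformers import AutoTokenizer

books = load_dataset("opus_books", "en-fr")
books = books["train"].train_test_split(test_size=0.2)

checkpoint = "t5-small"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

prefix = "translate English to French: "

def preprocess_function(examples):
    # each row has a "translation" dict such as {"en": "...", "fr": "..."}
    inputs = [prefix + ex["en"] for ex in examples["translation"]]
    targets = [ex["fr"] for ex in examples["translation"]]
    # passing text_target is what produces the "labels" column the decoder needs
    return tokenizer(inputs, text_target=targets, max_length=128, truncation=True)

tokenized_books = books.map(preprocess_function, batched=True)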

SoyGema (Contributor, Author) commented Jun 16, 2023

Hello there @gante! Thanks for your quick response and help!
I really appreciate it. 🥇
I've uploaded the notebook here. As far as I can understand (let me know if I'm missing something), I am using the preprocessing function.

In fact, tokenized_books (cell 16) returns something of the form

DatasetDict({
    train: Dataset({
        features: ['id', 'translation', 'input_ids', 'attention_mask', 'labels'],
        num_rows: 1123
    })
    test: Dataset({
        features: ['id', 'translation', 'input_ids', 'attention_mask', 'labels'],
        num_rows: 281
    })
})

And data_collator (cell 19) returns something like

DataCollatorForSeq2Seq(tokenizer=T5Tokenizer(name_or_path='t5-small', vocab_size=32100, model_max_length=512, is_fast=False, padding_side='right', truncation_side='right', special_tokens={'eos_token': '</s>', 'unk_token': '<unk>', 'pad_token': '<pad>', 'additional_special_tokens': ['<extra_id_0>', .....
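
For completeness, the tf_train_set / tf_test_set passed to model.fit are built from these along the lines of the tutorial's prepare_tf_dataset step (a sketch, not the exact notebook cell; the batch size is illustrative):

tf_train_set = model.prepare_tf_dataset(
    tokenized_books["train"],
    shuffle=True,
    batch_size=32,
    collate_fn=data_collator,  # pads input_ids, attention_mask and labels per batch
)
tf_test_set = model.prepare_tf_dataset(
    tokenized_books["test"],
    shuffle=False,
    batch_size=32,
    collate_fn=data_collator,
)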

Am I missing something from the video that should be in code?
For quick testing purposes, I'm using the pt_to_en dataset, which seems to have the same characteristics. I've checked that the tokenized_books function returns the same data structure type for pt_to_en as for fr_to_en.

My apologies in advance for the extremely verbose notebook code around low-level GPU operations. I am trying to optimize for that, hence all the tracing.

Thanks so much for your time on this.
Happy if you can point me in the right direction! 👍

gante (Member) commented Jun 16, 2023

Hey @SoyGema 👋

Your KerasMetricCallback was missing predict_with_generate=True -- metrics that rely on text generation must pass this flag, as generating text is different from a model forward pass. It should become metric_callback = KerasMetricCallback(metric_fn=compute_metrics, eval_dataset=tf_test_set, predict_with_generate=True)

For future reference in case you encounter further bugs, have a look at our complete translation example: https://github.com/huggingface/transformers/blob/main/examples/tensorflow/translation/run_translation.py
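
A minimal sketch of the corrected wiring, assuming compute_metrics, tf_train_set and tf_test_set are defined as in the tutorial:

from transformers.keras_callbacks import KerasMetricCallback

# predict_with_generate=True makes the callback run model.generate() on the eval
# set instead of a plain forward pass, which is what text-based metrics need
metric_callback = KerasMetricCallback(
    metric_fn=compute_metrics,
    eval_dataset=tf_test_set,
    predict_with_generate=True,
)

model.fit(x=tf_train_set, validation_data=tf_test_set, epochs=1, callbacks=[metric_callback])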

SoyGema (Contributor, Author) commented Jun 18, 2023

Hello there @gante 👋

Thanks for the reference. I'm definitely keeping this as a north-star script and also using it!
I've been thinking about how to structure this exploration and index the roadblocks/bugs/solutions so other users can benefit from it.

I'm closing this issue (as it is solved, although others arose) and will probably open new ones in my own repo as I go, so issues stay self-contained. Hope this makes sense. Hope I can take it from there and not disturb you!

Thanks again!

SoyGema (Contributor, Author) commented Jul 2, 2023

Just for reproducibility: if someone wants to go through the script example, documentation about flag configuration and more can be found here.
