🚀 Feature request

Currently `Seq2SeqTrainer` uses `max_length` for the prediction step. However, there is no `--max_length` argument in the trainer itself (see here and here).
During training (with `--predict_with_generate`), when the evaluate function is called, the prediction step runs with `model.config.max_length` (see this line). Unless you call `trainer.evaluate(eval_dataset=eval_dataset, max_length=max_target_length)` manually, training-time evaluation uses `model.config.max_length`.
This is difficult to grasp without reading the source code: during training, `prediction_loop` performs evaluation using `self.model.config.max_length` for generation, which I would say is confusing. Let's look into this:
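The fallback described above can be sketched in isolation (a minimal illustration, not the actual trainer code; `DummyConfig` and `resolve_max_length` are made-up names for this sketch):

```python
# Minimal sketch of the behavior inside the trainer's prediction step:
# an explicitly supplied max_length wins, otherwise the model config's
# default (20 for many checkpoints, as shown below for mt5-large) is used.
class DummyConfig:
    max_length = 20  # default that the checkpoint ships with

def resolve_max_length(requested, config):
    """Explicit argument takes priority; config value is the silent fallback."""
    return requested if requested is not None else config.max_length

cfg = DummyConfig()
print(resolve_max_length(None, cfg))  # no max_length passed: falls back to 20
print(resolve_max_length(128, cfg))   # explicit max_length wins
```

This is why calling `trainer.evaluate(...)` without `max_length` during training silently truncates generation at the config default.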
```python
>>> import transformers
>>> transformers.__version__
'4.10.0.dev0'
>>> model = transformers.AutoModel.from_pretrained("google/mt5-large")
Some weights of the model checkpoint at google/mt5-large were not used when initializing MT5Model: ['lm_head.weight']
- This IS expected if you are initializing MT5Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing MT5Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
>>> model.config.max_length
20
```
A user who is not careful about this argument could easily miss it. Personally, I spent quite a bit of time on this: my `compute_metrics()` values on the dev set during training were not good, but at the end of training the score on the test dataset (using my own `trainer.evaluate()` call) was high.
Motivation
Adding `--max_length` to `Seq2SeqTrainer` would help users be aware of this parameter.
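One possible shape for the request, as a hedged sketch only (the class and field names here, including `generation_max_length`, are illustrative and not the actual transformers API): a generation length on the training arguments that `evaluate()`/`predict()` would pick up by default instead of `model.config.max_length`.

```python
# Hypothetical sketch of the proposed argument surface; names are
# illustrative, not the real transformers API.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Seq2SeqTrainingArgumentsSketch:
    predict_with_generate: bool = False
    # Proposed --max_length: if set, evaluation/prediction would generate
    # with this length rather than silently using model.config.max_length.
    generation_max_length: Optional[int] = None

args = Seq2SeqTrainingArgumentsSketch(
    predict_with_generate=True,
    generation_max_length=128,  # e.g. target length for the task
)
print(args.generation_max_length)
```

Making the value visible in the training arguments would surface it in logs and saved configs, so the silent `20`-token default could not go unnoticed.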
@sgugger