🚀 Feature request

Currently `Seq2SeqTrainer` uses `max_length` for the prediction step. However, there is no `--max_length` argument in the trainer itself (see here and here).
During training (with `--predict_with_generate`), when the evaluate function is called, the prediction step runs with `model.config.max_length` (see this line). Unless you call `trainer.evaluate(eval_dataset=eval_dataset, max_length=max_target_length)` manually, training-time evaluation uses `model.config.max_length`.
This is difficult to grasp without reading the source code: during training, `prediction_loop` performs evaluation using `self.model.config.max_length` for generation, which I would say is confusing. Let's look into this:
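The fallback described above can be sketched in isolation (a minimal illustration, not the actual trainer code; `DummyConfig` and `resolve_max_length` are made-up names for this sketch):

```python
# Minimal sketch of the behavior inside the trainer's prediction step:
# an explicitly supplied max_length wins, otherwise the model config's
# default (20 for many checkpoints, as shown below for mt5-large) is used.
class DummyConfig:
    max_length = 20  # default that the checkpoint ships with

def resolve_max_length(requested, config):
    """Explicit argument takes priority; config value is the silent fallback."""
    return requested if requested is not None else config.max_length

cfg = DummyConfig()
print(resolve_max_length(None, cfg))  # no max_length passed: falls back to 20
print(resolve_max_length(128, cfg))   # explicit max_length wins
```

This is why calling `trainer.evaluate(...)` without `max_length` during training silently truncates generation at the config default.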
```python
>>> import transformers
>>> transformers.__version__
'4.10.0.dev0'
>>> model = transformers.AutoModel.from_pretrained("google/mt5-large")
Some weights of the model checkpoint at google/mt5-large were not used when initializing MT5Model: ['lm_head.weight']
- This IS expected if you are initializing MT5Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing MT5Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
>>> model.config.max_length
20
```
A user who is not careful about this argument could easily miss it. Personally, I spent quite a bit of time on this: my `compute_metrics()` values on the dev set during training were not good, but at the end of training the score on the test dataset (using my own `trainer.evaluate()` call) was high.
Motivation
Adding `--max_length` to `Seq2SeqTrainer` would help users be aware of this parameter.
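One possible shape for the request, as a hedged sketch only (the class and field names here, including `generation_max_length`, are illustrative and not the actual transformers API): a generation length on the training arguments that `evaluate()`/`predict()` would pick up by default instead of `model.config.max_length`.

```python
# Hypothetical sketch of the proposed argument surface; names are
# illustrative, not the real transformers API.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Seq2SeqTrainingArgumentsSketch:
    predict_with_generate: bool = False
    # Proposed --max_length: if set, evaluation/prediction would generate
    # with this length rather than silently using model.config.max_length.
    generation_max_length: Optional[int] = None

args = Seq2SeqTrainingArgumentsSketch(
    predict_with_generate=True,
    generation_max_length=128,  # e.g. target length for the task
)
print(args.generation_max_length)
```

Making the value visible in the training arguments would surface it in logs and saved configs, so the silent `20`-token default could not go unnoticed.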
@sgugger