
Add --max_length argument in seq2seq trainer. #13252

Closed
sbmaruf opened this issue Aug 25, 2021 · 2 comments

Comments

@sbmaruf

sbmaruf commented Aug 25, 2021

🚀 Feature request

Currently the seq2seq Trainer uses --max_length for the prediction step. However, there is no --max_length argument in the trainer itself (here and here).

During training (with --predict_with_generate), when the evaluate function is called it performs the prediction step with model.config.max_length (by this line). Unless you manually call trainer.evaluate(eval_dataset=eval_dataset, max_length=max_target_length), training-time evaluation uses model.config.max_length.

Also, without reviewing the source code, this behavior is difficult to grasp.
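
To make that explicit evaluate() call concrete, here is a minimal sketch; the model, datasets, and the max_target_length value are placeholders I am assuming, not taken from this thread:

```python
# Minimal sketch of calling evaluate() with an explicit max_length so that
# generation does not silently fall back to model.config.max_length.
# `model`, `train_dataset`, and `eval_dataset` are assumed to be prepared already.
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="out",
    predict_with_generate=True,
)
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)

max_target_length = 128  # placeholder; set to the task's real target length
metrics = trainer.evaluate(eval_dataset=eval_dataset, max_length=max_target_length)
```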

So at training time, in prediction_loop, the model performs evaluation based on this: it uses self.model.config.max_length for prediction, which I find confusing. Let's look at it:

>>> import transformers
>>> transformers.__version__
'4.10.0.dev0'
>>> model = transformers.AutoModel.from_pretrained("google/mt5-large")
Some weights of the model checkpoint at google/mt5-large were not used when initializing MT5Model: ['lm_head.weight']
- This IS expected if you are initializing MT5Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing MT5Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
>>> model.config.max_length
20

A user who is not careful about this argument could easily miss it. Personally, I spent quite a bit of time on this: my compute_metrics() values on the dev set during training were poor, but the score on the test dataset at the end of training (using my own trainer.evaluate() call) was high.
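
A workaround that avoids the silent fallback (my own hedged sketch, not something from this thread) is to override the generation length on the model config before handing the model to the trainer:

```python
# Hedged workaround sketch: raise model.config.max_length so that the
# trainer's prediction step, which falls back to this value, generates
# sequences long enough for the task.
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-large")
print(model.config.max_length)  # 20 by default, as shown above
model.config.max_length = 128   # placeholder; use the task's real target length
```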

Motivation

Adding --max_length to Seq2SeqTrainer would help users be aware of this parameter.

@sgugger

@sgugger
Collaborator

sgugger commented Aug 30, 2021

This is added by the PR mentioned above.
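
For later readers, a hedged usage sketch of setting the generation length through the training arguments once that PR is in; the argument name generation_max_length is my assumption from later transformers releases and is not stated in this thread:

```python
# Hedged sketch (assumed API from later transformers releases): set the
# generation length once in the training arguments instead of passing
# max_length to every evaluate()/predict() call.
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="out",
    predict_with_generate=True,
    generation_max_length=128,  # assumed argument name; check your installed version
)
```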

@sbmaruf
Author

sbmaruf commented Aug 30, 2021

Thanks a lot for the new feature. Closing the issue.

sbmaruf closed this as completed Aug 30, 2021