[examples/s2s] add test set predictions #10085
Conversation
examples/seq2seq/run_seq2seq.py
Outdated
```diff
@@ -523,7 +551,7 @@ def compute_metrics(eval_preds):
     if training_args.do_eval:
         logger.info("*** Evaluate ***")

-        results = trainer.evaluate()
+        results = trainer.evaluate(max_length=data_args.val_max_target_length, num_beams=data_args.eval_beams)
```
`eval_beams` was never used before this PR; we should pass it to `evaluate` and `predict`.
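For illustration, a minimal sketch of what forwarding the beam setting through both entry points could look like. `Seq2SeqTrainer.evaluate` and `Seq2SeqTrainer.predict` accept `max_length` and `num_beams` keyword arguments (the `evaluate` call is taken from the diff above); `test_dataset` and `data_args` are assumed from the surrounding script:

```python
# Forward the generation settings to both evaluate() and predict().
if training_args.do_eval:
    results = trainer.evaluate(
        max_length=data_args.val_max_target_length,
        num_beams=data_args.eval_beams,
    )

if training_args.do_predict:
    test_results = trainer.predict(
        test_dataset,
        max_length=data_args.val_max_target_length,
        num_beams=data_args.eval_beams,
    )
```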
In general looks good, but I didn't have a chance to run side-by-side tests.
I propose we finish everything that is planned, then run the tests side by side, note any small discrepancies, and fix them in one go. Does that work?
I'm waiting for the datasets hub to port the datasets to be able to compare the old and the new.
examples/seq2seq/run_seq2seq.py
Outdated
```diff
 source_lang: Optional[str] = field(default=None, metadata={"help": "Source language id for translation."})
 target_lang: Optional[str] = field(default=None, metadata={"help": "Target language id for translation."})
-eval_beams: Optional[int] = field(default=None, metadata={"help": "Number of beams to use for evaluation."})
+eval_beams: Optional[int] = field(
```
It's `num_beams` everywhere in the core, so perhaps while we are re-shuffling things we could match that name? Beam search is only ever used during evaluation, so it can't be `train_beams`; the `eval_` prefix is redundant information in the current name.
`num_beams` seems good to me. In the previous script we called it `eval_beams` because we had two beam arguments, one for eval and one for test. But here we just have one, so `num_beams` makes sense to me. Let's wait for @sgugger's opinion.
I have no strong opinion. It's another variable name change from the old script, but we already have a few renames. If you think it's better as `num_beams`, let's go for the change!
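For concreteness, a sketch of how the renamed field might look. The help text mirrors the suggestion further down in this thread; the enclosing dataclass name is an assumption, since the PR only quotes the field itself:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DataTrainingArguments:  # fragment; the script's full dataclass has more fields
    # Renamed from eval_beams to match the core's naming.
    num_beams: Optional[int] = field(
        default=None,
        metadata={
            "help": "Number of beams to use for evaluation. This argument will be passed to "
            "``model.generate``, which is used during ``evaluate`` and ``predict``."
        },
    )
```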
Yes, this was the last major missing piece from this script. Now I'm going to start running both scripts side by side (manually converting the old datasets to the new `datasets` format) and note the discrepancies; I'll also wait for your tests.
Let's not wait for the hub; for now we can just manually convert the datasets for tests and upload them to the hub later once it's ready.
That works.
Sure - I already wrote the code for wmt en-ro #10044 (comment); I need to adapt it to the others.
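As an illustration of the manual conversion mentioned above, here is one possible sketch. It assumes the old layout of parallel `train.source`/`train.target` text files and the `{"translation": {...}}` record format the translation examples expect; file names and language codes are placeholders, not the code referenced in #10044:

```python
import json

# Convert parallel source/target text files into a JSON-lines file that
# the datasets "json" loader can read. Paths and language codes are placeholders.
with open("train.source") as src, open("train.target") as tgt, open("train.json", "w") as out:
    for s, t in zip(src, tgt):
        out.write(json.dumps({"translation": {"en": s.strip(), "ro": t.strip()}}) + "\n")

# The converted file can then be loaded with:
# from datasets import load_dataset
# dataset = load_dataset("json", data_files={"train": "train.json"})
```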
Thanks for adding this to the script! It looks great to me; I just have one small nit.
examples/seq2seq/run_seq2seq.py
Outdated
"help": "Number of beams to use for evaluation. This argument is used to override the ``num_beams`` " | ||
"param of ``model.generate``, which is used during ``evaluate`` and ``predict``." |
"help": "Number of beams to use for evaluation. This argument is used to override the ``num_beams`` " | |
"param of ``model.generate``, which is used during ``evaluate`` and ``predict``." | |
"help": "Number of beams to use for evaluation. This argument will be passed to ``model.generate``, which is used during ``evaluate`` and ``predict``." |
examples/seq2seq/run_seq2seq.py
Outdated
```python
    column_names = datasets["validation"].column_names
else:
```
Maybe do an `elif training_args.do_predict` here and have an `else` that prints something like "There is nothing to do, please pass along --do_train, --do_eval and/or --do_predict." and returns early?
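A sketch of the suggested control flow, using the script's existing variables (the exact log message is illustrative):

```python
if training_args.do_train:
    column_names = datasets["train"].column_names
elif training_args.do_eval:
    column_names = datasets["validation"].column_names
elif training_args.do_predict:
    column_names = datasets["test"].column_names
else:
    # Nothing was requested: bail out early instead of failing later.
    logger.info("There is nothing to do. Please pass --do_train, --do_eval and/or --do_predict.")
    return
```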
Changed.
What does this PR do?
This PR adds the `do_predict` option to the `run_seq2seq.py` script for test set predictions.

Fixes #10032

cc @stas00
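A rough sketch of what such a `do_predict` branch typically looks like in this script family: run generation on the test split and save the decoded predictions. Variable names (`test_dataset`, `data_args`, `tokenizer`) and the output file name are assumptions, not the PR's exact code:

```python
import os

if training_args.do_predict:
    logger.info("*** Predict ***")
    test_results = trainer.predict(
        test_dataset,
        max_length=data_args.val_max_target_length,
        num_beams=data_args.eval_beams,
    )
    # Only the main process writes the generations to disk.
    if trainer.is_world_process_zero():
        test_preds = tokenizer.batch_decode(test_results.predictions, skip_special_tokens=True)
        output_file = os.path.join(training_args.output_dir, "test_generations.txt")
        with open(output_file, "w") as f:
            f.write("\n".join(pred.strip() for pred in test_preds))
```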