
[examples/s2s] add test set predictions #10085

Merged
3 commits merged into huggingface:master on Feb 9, 2021

Conversation

@patil-suraj (Contributor) commented on Feb 9, 2021:

What does this PR do?

This PR adds the do_predict option to the run_seq2seq.py script for test set predictions.

Fixes #10032

cc. @stas00
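
The merged script is the authoritative implementation; purely as a hedged illustration, a `do_predict` block in the style of the example scripts might look like the sketch below. The `test_dataset` name, the `test_generations.txt` output path, and the exact argument names are assumptions, not necessarily what the PR uses; the sketch also assumes the script's existing objects (`trainer`, `tokenizer`, `data_args`, `training_args`) and imports (`os`, `logger`).

```python
# Illustrative sketch only; not the merged implementation.
if training_args.do_predict:
    logger.info("*** Predict ***")

    # Seq2SeqTrainer.predict forwards max_length/num_beams to model.generate.
    test_results = trainer.predict(
        test_dataset,
        metric_key_prefix="test",
        max_length=data_args.val_max_target_length,
        num_beams=data_args.eval_beams,
    )

    if trainer.is_world_process_zero():
        # Decode the generated token ids and save them next to the checkpoints.
        test_preds = tokenizer.batch_decode(test_results.predictions, skip_special_tokens=True)
        output_path = os.path.join(training_args.output_dir, "test_generations.txt")  # assumed file name
        with open(output_path, "w") as writer:
            writer.write("\n".join(pred.strip() for pred in test_preds))
```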

```diff
@@ -523,7 +551,7 @@ def compute_metrics(eval_preds):
     if training_args.do_eval:
         logger.info("*** Evaluate ***")
 
-        results = trainer.evaluate()
+        results = trainer.evaluate(max_length=data_args.val_max_target_length, num_beams=data_args.eval_beams)
```
@patil-suraj (Contributor, Author):

eval_beams was never used before this PR, we should pass it to evaluate and predict.
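
For context on why this matters: when `evaluate` is called without these arguments, generation falls back to the model config defaults (typically `max_length=20`, `num_beams=1`), which is the behaviour reported in the linked issue (#10032). A minimal sketch of the difference, assuming the script's existing `trainer` and `data_args`:

```python
# Without explicit arguments, generation uses the model config's defaults
# (typically max_length=20, num_beams=1), truncating the outputs.
# results = trainer.evaluate()

# With the data arguments forwarded, model.generate uses the configured values
# during evaluation; the same applies to trainer.predict for the test set.
results = trainer.evaluate(
    max_length=data_args.val_max_target_length,
    num_beams=data_args.eval_beams,
)
```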

@stas00 (Contributor) left a comment:


In general this looks good, but I haven't had a chance to run side-by-side tests yet.

I propose we finish everything that is planned, then run the two scripts side by side, note any small discrepancies, and fix them in one go. Does that work?

I'm waiting for the datasets hub to port the datasets so that I can compare the old and the new.

```diff
     source_lang: Optional[str] = field(default=None, metadata={"help": "Source language id for translation."})
     target_lang: Optional[str] = field(default=None, metadata={"help": "Target language id for translation."})
-    eval_beams: Optional[int] = field(default=None, metadata={"help": "Number of beams to use for evaluation."})
+    eval_beams: Optional[int] = field(
```
@stas00 (Contributor):


It's num_beams everywhere in core, so perhaps while we are re-shuffling things we could match that name?

Beam search is only used at eval time anyway, so there could never be a train_beams; the eval_ prefix is redundant information in the current name.

@patil-suraj (Contributor, Author):


num_beams seems good to me. In the previous script we called it eval_beams because we had two beam arguments, one for eval and one for test. Here we just have one, so num_beams makes sense to me. Let's wait for @sgugger's opinion.

@sgugger (Collaborator):


I have no strong opinion. It's another variable name change from the old script, but we already have a few renames. If you think it's better as num_beams, let's go for the change!
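
For reference, a minimal sketch of what the renamed field could look like, combining the rename agreed here with the help text suggested further down in the review. The surrounding `DataTrainingArguments` dataclass name follows the example scripts; all other fields are omitted.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataTrainingArguments:
    # Only the renamed beam-search field is shown here.
    num_beams: Optional[int] = field(
        default=None,
        metadata={
            "help": "Number of beams to use for evaluation. This argument will be passed to "
            "``model.generate``, which is used during ``evaluate`` and ``predict``."
        },
    )
```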

@patil-suraj (Contributor, Author) commented:

> I propose we finish everything that is planned, then run the two scripts side by side, note any small discrepancies, and fix them in one go. Does that work?

Yes, this was the last major missing piece in this script. I'm now going to start running both scripts side by side (manually converting the old datasets to the new datasets format) and noting the discrepancies. I'll also wait for your tests.

> I'm waiting for the datasets hub to port the datasets so that I can compare the old and the new.

Let's not wait for the hub; for now we can manually convert the datasets for tests and upload them to the hub later once it's ready.

@stas00 (Contributor) commented on Feb 9, 2021:

> > I propose we finish everything that is planned, then run the two scripts side by side, note any small discrepancies, and fix them in one go. Does that work?
>
> Yes, this was the last major missing piece in this script. I'm now going to start running both scripts side by side (manually converting the old datasets to the new datasets format) and noting the discrepancies. I'll also wait for your tests.

That works.

> > I'm waiting for the datasets hub to port the datasets so that I can compare the old and the new.
>
> Let's not wait for the hub; for now we can manually convert the datasets for tests and upload them to the hub later once it's ready.

Sure, I already wrote the code for wmt en-ro in #10044 (comment); I just need to adapt it to the others.
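
For readers following along, the kind of manual conversion being discussed could look roughly like the sketch below. This is not the script referenced in #10044; the file paths, language codes, and the nested "translation" field are illustrative assumptions about the legacy line-file layout and a JSON lines format that the `datasets` library can load.

```python
import json

# Hypothetical paths: the legacy seq2seq examples stored parallel line files
# such as train.source / train.target.
with open("wmt_en_ro/train.source", encoding="utf-8") as src, \
        open("wmt_en_ro/train.target", encoding="utf-8") as tgt, \
        open("wmt_en_ro/train.json", "w", encoding="utf-8") as out:
    for src_line, tgt_line in zip(src, tgt):
        # One JSON object per line; load_dataset("json", ...) can read this format.
        record = {"translation": {"en": src_line.strip(), "ro": tgt_line.strip()}}
        out.write(json.dumps(record, ensure_ascii=False) + "\n")
```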

@sgugger (Collaborator) left a comment:


Thanks for adding this to the script! It looks great to me, I just have one small nit.


@sgugger (Collaborator) commented on lines 182 to 183:

```python
            "help": "Number of beams to use for evaluation. This argument is used to override the ``num_beams`` "
            "param of ``model.generate``, which is used during ``evaluate`` and ``predict``."
```

Suggested change:

```diff
-            "help": "Number of beams to use for evaluation. This argument is used to override the ``num_beams`` "
-            "param of ``model.generate``, which is used during ``evaluate`` and ``predict``."
+            "help": "Number of beams to use for evaluation. This argument will be passed to ``model.generate``, which is used during ``evaluate`` and ``predict``."
```

```python
        column_names = datasets["validation"].column_names
    else:
```

@sgugger (Collaborator):

Maybe do an elif training_args.do_predict here and have an else that prints something like "There is nothing to do, please pass along --do_train, --do_eval and/or --do_predict." and returns early?
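
A rough sketch of that suggestion, assuming the surrounding column-name selection logic of the script (the split names and the exact message wording are illustrative):

```python
# Infer the column names from whichever split the user asked to process.
if training_args.do_train:
    column_names = datasets["train"].column_names
elif training_args.do_eval:
    column_names = datasets["validation"].column_names
elif training_args.do_predict:
    column_names = datasets["test"].column_names
else:
    logger.info("There is nothing to do. Please pass along --do_train, --do_eval and/or --do_predict.")
    return
```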

@patil-suraj (Contributor, Author) commented:

Changed eval_beams to num_beams. Hopefully this is the final name change. Merging!

@patil-suraj merged commit 63fddcf into huggingface:master on Feb 9, 2021.
@patil-suraj deleted the run_s2s_do_predict branch on February 9, 2021 at 15:11.
Closes: generation length always equal to 20 when using run_seq2seq.py script (#10032)