[examples/s2s] add test set predictions #10085
Conversation
examples/seq2seq/run_seq2seq.py
Outdated
```diff
@@ -523,7 +551,7 @@ def compute_metrics(eval_preds):
     if training_args.do_eval:
         logger.info("*** Evaluate ***")

-        results = trainer.evaluate()
+        results = trainer.evaluate(max_length=data_args.val_max_target_length, num_beams=data_args.eval_beams)
```
`eval_beams` was never used before this PR; we should pass it to `evaluate` and `predict`.
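For illustration, a minimal sketch of what forwarding the beam setting through both entry points could look like. `Seq2SeqTrainer.evaluate` and `Seq2SeqTrainer.predict` accept `max_length` and `num_beams` keyword arguments (the `evaluate` call is taken from the diff above); `test_dataset` and `data_args` are assumed from the surrounding script:

```python
# Forward the generation settings to both evaluate() and predict().
if training_args.do_eval:
    results = trainer.evaluate(
        max_length=data_args.val_max_target_length,
        num_beams=data_args.eval_beams,
    )

if training_args.do_predict:
    test_results = trainer.predict(
        test_dataset,
        max_length=data_args.val_max_target_length,
        num_beams=data_args.eval_beams,
    )
```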
In general looks good, but I didn't have a chance to run side-by-side tests.
I propose we finish everything that is planned, then run the tests side by side, note any small discrepancies, and fix them in one go. Does that work?
I'm waiting for the datasets hub to port the datasets to be able to compare the old and the new.
examples/seq2seq/run_seq2seq.py
Outdated
```diff
 source_lang: Optional[str] = field(default=None, metadata={"help": "Source language id for translation."})
 target_lang: Optional[str] = field(default=None, metadata={"help": "Target language id for translation."})
-eval_beams: Optional[int] = field(default=None, metadata={"help": "Number of beams to use for evaluation."})
+eval_beams: Optional[int] = field(
```
It's `num_beams` everywhere in the core, so perhaps while we are re-shuffling things we could match that name? Beam search is only ever used during evaluation, so it can't be `train_beams`; the `eval_` prefix is redundant information in the current name.
`num_beams` seems good to me. In the previous script we called it `eval_beams` because we had two beam arguments, one for eval and one for test. But here we just have one, so `num_beams` makes sense to me. Let's wait for @sgugger's opinion.
I have no strong opinion. It's another variable name change from the old script, but we already have a few renames. If you think it's better as `num_beams`, let's go for the change!
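For concreteness, a sketch of how the renamed field might look. The help text mirrors the suggestion further down in this thread; the enclosing dataclass name is an assumption, since the PR only quotes the field itself:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DataTrainingArguments:  # fragment; the script's full dataclass has more fields
    # Renamed from eval_beams to match the core's naming.
    num_beams: Optional[int] = field(
        default=None,
        metadata={
            "help": "Number of beams to use for evaluation. This argument will be passed to "
            "``model.generate``, which is used during ``evaluate`` and ``predict``."
        },
    )
```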
Yes, this was the last major missing piece from this script. Now I'm going to start running both scripts side by side (manually converting the old datasets to the new `datasets` format) and note the discrepancies; I'll also wait for your tests.
Let's not wait for the hub; for now we can just manually convert the datasets for tests and upload them to the hub later once it's ready.
That works.
Sure - I already wrote the code for wmt en-ro #10044 (comment); I need to adapt it to the others.
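As an illustration of the manual conversion mentioned above, here is one possible sketch. It assumes the old layout of parallel `train.source`/`train.target` text files and the `{"translation": {...}}` record format the translation examples expect; file names and language codes are placeholders, not the code referenced in #10044:

```python
import json

# Convert parallel source/target text files into a JSON-lines file that
# the datasets "json" loader can read. Paths and language codes are placeholders.
with open("train.source") as src, open("train.target") as tgt, open("train.json", "w") as out:
    for s, t in zip(src, tgt):
        out.write(json.dumps({"translation": {"en": s.strip(), "ro": t.strip()}}) + "\n")

# The converted file can then be loaded with:
# from datasets import load_dataset
# dataset = load_dataset("json", data_files={"train": "train.json"})
```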
Thanks for adding this to the script! It looks great to me; I just have one small nit.
examples/seq2seq/run_seq2seq.py
Outdated
"help": "Number of beams to use for evaluation. This argument is used to override the ``num_beams`` " | ||
"param of ``model.generate``, which is used during ``evaluate`` and ``predict``." |
"help": "Number of beams to use for evaluation. This argument is used to override the ``num_beams`` " | |
"param of ``model.generate``, which is used during ``evaluate`` and ``predict``." | |
"help": "Number of beams to use for evaluation. This argument will be passed to ``model.generate``, which is used during ``evaluate`` and ``predict``." |
examples/seq2seq/run_seq2seq.py
Outdated
```python
    column_names = datasets["validation"].column_names
else:
```
Maybe do an `elif training_args.do_predict` here and have an `else` that prints something like "There is nothing to do, please pass along --do_train, --do_eval and/or --do_predict." and returns early?
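A sketch of the suggested control flow, using the script's existing variables (the exact log message is illustrative):

```python
if training_args.do_train:
    column_names = datasets["train"].column_names
elif training_args.do_eval:
    column_names = datasets["validation"].column_names
elif training_args.do_predict:
    column_names = datasets["test"].column_names
else:
    # Nothing was requested: bail out early instead of failing later.
    logger.info("There is nothing to do. Please pass --do_train, --do_eval and/or --do_predict.")
    return
```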
Changed.
What does this PR do?
This PR adds the `do_predict` option to the `run_seq2seq.py` script for test set predictions.

Fixes #10032

cc @stas00
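A rough sketch of what such a `do_predict` branch typically looks like in this script family: run generation on the test split and save the decoded predictions. Variable names (`test_dataset`, `data_args`, `tokenizer`) and the output file name are assumptions, not the PR's exact code:

```python
import os

if training_args.do_predict:
    logger.info("*** Predict ***")
    test_results = trainer.predict(
        test_dataset,
        max_length=data_args.val_max_target_length,
        num_beams=data_args.eval_beams,
    )
    # Only the main process writes the generations to disk.
    if trainer.is_world_process_zero():
        test_preds = tokenizer.batch_decode(test_results.predictions, skip_special_tokens=True)
        output_file = os.path.join(training_args.output_dir, "test_generations.txt")
        with open(output_file, "w") as f:
            f.write("\n".join(pred.strip() for pred in test_preds))
```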