
Fix PABEE & PL CI failure #6433

Closed
wants to merge 12 commits into from

Conversation

@JetRunner (Contributor) commented Aug 12, 2020

codecov bot commented Aug 12, 2020

Codecov Report

Merging #6433 into master will decrease coverage by 2.51%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #6433      +/-   ##
==========================================
- Coverage   79.89%   77.37%   -2.52%     
==========================================
  Files         153      153              
  Lines       27902    27902              
==========================================
- Hits        22291    21588     -703     
- Misses       5611     6314     +703     
Impacted Files Coverage Δ
src/transformers/optimization.py 25.55% <0.00%> (-70.00%) ⬇️
src/transformers/modeling_tf_distilbert.py 34.11% <0.00%> (-63.30%) ⬇️
src/transformers/pipelines.py 26.98% <0.00%> (-52.81%) ⬇️
src/transformers/optimization_tf.py 33.33% <0.00%> (-24.33%) ⬇️
src/transformers/modeling_tf_auto.py 48.79% <0.00%> (-18.08%) ⬇️
src/transformers/modeling_auto.py 64.16% <0.00%> (-14.46%) ⬇️
src/transformers/data/processors/squad.py 13.76% <0.00%> (-14.38%) ⬇️
src/transformers/modeling_tf_gpt2.py 65.68% <0.00%> (-6.16%) ⬇️
src/transformers/modelcard.py 82.71% <0.00%> (-2.47%) ⬇️
src/transformers/modeling_distilbert.py 96.19% <0.00%> (-1.64%) ⬇️
... and 17 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4ffea5c...cd1ca4c. Read the comment docs.

@JetRunner JetRunner changed the title Debug PABEE CI failure Fix PABEE CI failure Aug 12, 2020
@JetRunner JetRunner changed the title Fix PABEE CI failure Fix PABEE & PL CI failure Aug 12, 2020
@LysandreJik (Member) left a comment

LGTM!

@LysandreJik (Member)

Ah, PL test still failing!

@JetRunner (Contributor Author)

@LysandreJik Don't worry. It seems I mistyped a parameter name.

@JetRunner (Contributor Author)

Oops! @sshleifer could you have a look at the PL example? I've tried tweaking the parameters but it doesn't seem to work.

@JetRunner (Contributor Author)

@stas00 Can you take a look? @sshleifer is on vacation. Thanks a lot!

@stas00 (Contributor) commented Aug 13, 2020

Yes, of course, I will be able to investigate in a few hours.

@stas00 (Contributor) commented Aug 13, 2020

(we are talking about examples/test_examples.py::ExamplesTests::test_run_pl_glue)

I'm able to reproduce the low-accuracy problem with the changes proposed in this PR. With this PR I get:

acc = 0.5
f1 = 0.666

The original pre-PR code gives acc/f1 = 1.0 on my machine.

If you have a look at #6034, you'll see I tried various hparams to no avail: the test worked fine on my machine, but CI kept failing. It was very counterproductive trying to experiment without being able to reproduce the failure locally, so after some time I gave up. So the test is not ideal, but at least it verifies that the script runs.

@sshleifer said he was able to match the CI's low accuracy on his hardware (before this PR).

@JetRunner (Contributor Author)

@stas00 Yes, I had already found the problem in #6034 (output_dir) and fixed it in this PR. However, the accuracy is still too low compared to the Trainer version of run_glue. Since you can now reproduce the low acc, please give it a look!

@stas00 (Contributor) commented Aug 13, 2020

Thank you for explaining what is happening, @JetRunner

I have no permission to push, so try using this:

        testargs = """
            run_pl_glue.py
            --model_name_or_path bert-base-cased
            --data_dir ./tests/fixtures/tests_samples/MRPC/
            --task mrpc
            --do_train
            --do_predict
            --output_dir ./tests/fixtures/tests_samples/pl_temp_dir
            --train_batch_size=32
            --learning_rate=1e-4
            --num_train_epochs=4
            --warmup_steps=3
            --seed=42
            --max_seq_length=128
            """.split()`

I get acc/f1 of 1.0 with this config; the key was more --num_train_epochs and some warm-up.
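
A minimal sketch of how a test can feed such testargs to an argparse-based example script by patching sys.argv; run_example and main_fn are hypothetical names for illustration, not the actual test code:

    import sys
    from unittest.mock import patch

    def run_example(main_fn, testargs):
        # testargs[0] plays the role of the script name ("run_pl_glue.py" above);
        # argparse only parses sys.argv[1:], so the remaining flags reach the parser.
        with patch.object(sys, "argv", testargs):
            return main_fn()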

So you uncovered that these tests are very unreliable: they don't clean up after themselves, so re-runs give invalid results. Once a single run has succeeded, all subsequent test re-runs will succeed at the moment. At the very least, pl_glue needs to support --overwrite_output_dir.
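
A minimal sketch of the cleanup such a test would need between runs so a stale result from a previous run cannot be picked up; the helper name is hypothetical, and the path matches the --output_dir from the snippet above:

    import shutil
    from pathlib import Path

    # Matches the --output_dir passed in the testargs above.
    PL_OUTPUT_DIR = Path("./tests/fixtures/tests_samples/pl_temp_dir")

    def clean_pl_output_dir():
        # Remove leftovers from a previous run so a re-run cannot report a stale success.
        if PL_OUTPUT_DIR.exists():
            shutil.rmtree(PL_OUTPUT_DIR)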

That explains why I couldn't get CI to work: my local setup probably wasn't working all along either, other than succeeding once and then always reporting the old success. So I was getting false positives.

Should transformers warn the user when a pre-existing dir filled with outdated data is found, or plainly refuse to run?
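
One possible shape for such a guard, assuming pl_glue grows an --overwrite_output_dir flag (it does not expose one today); this is a sketch, not the script's current behavior:

    import os

    def check_output_dir(output_dir, overwrite_output_dir=False):
        # Refuse to write into a non-empty output dir unless overwriting was requested.
        if os.path.exists(output_dir) and os.listdir(output_dir) and not overwrite_output_dir:
            raise ValueError(
                f"Output directory ({output_dir}) already exists and is not empty. "
                "Pass --overwrite_output_dir to overwrite it."
            )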

@JetRunner (Contributor Author)

@stas00 this set of params also outputs 0.5, sadly. I feel maybe there's another bug in the PL example?
cc @LysandreJik @sshleifer

@JetRunner (Contributor Author)

PABEE's bug is fixed in #6453. The reproducible low acc still exists for PL.
cc @LysandreJik @sshleifer

@JetRunner JetRunner closed this Aug 13, 2020
@JetRunner JetRunner deleted the debug_pabee_ci_failure branch August 13, 2020 16:35