
Fix PABEE & PL CI failure #6433

Closed
wants to merge 12 commits into from

Conversation

@JetRunner (Contributor) commented Aug 12, 2020

codecov bot commented Aug 12, 2020

Codecov Report

Merging #6433 into master will decrease coverage by 2.51%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #6433      +/-   ##
==========================================
- Coverage   79.89%   77.37%   -2.52%     
==========================================
  Files         153      153              
  Lines       27902    27902              
==========================================
- Hits        22291    21588     -703     
- Misses       5611     6314     +703     
Impacted Files Coverage Δ
src/transformers/optimization.py 25.55% <0.00%> (-70.00%) ⬇️
src/transformers/modeling_tf_distilbert.py 34.11% <0.00%> (-63.30%) ⬇️
src/transformers/pipelines.py 26.98% <0.00%> (-52.81%) ⬇️
src/transformers/optimization_tf.py 33.33% <0.00%> (-24.33%) ⬇️
src/transformers/modeling_tf_auto.py 48.79% <0.00%> (-18.08%) ⬇️
src/transformers/modeling_auto.py 64.16% <0.00%> (-14.46%) ⬇️
src/transformers/data/processors/squad.py 13.76% <0.00%> (-14.38%) ⬇️
src/transformers/modeling_tf_gpt2.py 65.68% <0.00%> (-6.16%) ⬇️
src/transformers/modelcard.py 82.71% <0.00%> (-2.47%) ⬇️
src/transformers/modeling_distilbert.py 96.19% <0.00%> (-1.64%) ⬇️
... and 17 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4ffea5c...cd1ca4c. Read the comment docs.

@JetRunner JetRunner changed the title Debug PABEE CI failure Fix PABEE CI failure Aug 12, 2020
@JetRunner JetRunner changed the title Fix PABEE CI failure Fix PABEE & PL CI failure Aug 12, 2020
@LysandreJik (Member) left a comment

LGTM!

@LysandreJik (Member)

Ah, PL test still failing!

@JetRunner (Contributor Author)

@LysandreJik Don't worry. It seems I mistyped a parameter name.

@JetRunner (Contributor Author)

Oops! @sshleifer could you have a look at the PL example? I've tried tweaking the parameters but it doesn't seem to work.

@JetRunner (Contributor Author)

@stas00 Can you take a look? @sshleifer is on vacation. Thanks a lot!

@stas00 (Contributor) commented Aug 13, 2020

Yes, of course, I will be able to investigate in a few hours.

@stas00 (Contributor) commented Aug 13, 2020

(we are talking about examples/test_examples.py::ExamplesTests::test_run_pl_glue)

I'm able to reproduce the low-accuracy problem with the changes proposed in this PR. With this PR I get:

acc = 0.5
f1 = 0.666

The original pre-PR code gives acc/f1 = 1.0 on my machine.

If you have a look at #6034, you'll see I tried various hparams to no avail: the test worked fine on my machine, but CI kept failing. It was very counterproductive trying to experiment without being able to reproduce the failure locally, so after some time I gave up. So the test is not ideal, but at least it verifies that the script runs.

@sshleifer said he was able to match the CI's low accuracy on his hardware (before this PR).

@JetRunner (Contributor Author)

@stas00 Yes, I had already found the problem in #6034 (output_dir) and fixed it in this PR. However, the accuracy is still too low compared to the Trainer version of run_glue. Since you can now reproduce the low acc, please give it a look!

@stas00 (Contributor) commented Aug 13, 2020

Thank you for explaining what is happening, @JetRunner

I have no permission to push, so try using this:

        testargs = """
            run_pl_glue.py
            --model_name_or_path bert-base-cased
            --data_dir ./tests/fixtures/tests_samples/MRPC/
            --task mrpc
            --do_train
            --do_predict
            --output_dir ./tests/fixtures/tests_samples/pl_temp_dir
            --train_batch_size=32
            --learning_rate=1e-4
            --num_train_epochs=4
            --warmup_steps=3
            --seed=42
            --max_seq_length=128
            """.split()`

I get acc/f1 of 1.0 with this config; the key was more --num_train_epochs and some warm-up.
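
A minimal sketch of how a test can feed such testargs to an argparse-based example script by patching sys.argv; run_example and main_fn are hypothetical names for illustration, not the actual test code:

    import sys
    from unittest.mock import patch

    def run_example(main_fn, testargs):
        # testargs[0] plays the role of the script name ("run_pl_glue.py" above);
        # argparse only parses sys.argv[1:], so the remaining flags reach the parser.
        with patch.object(sys, "argv", testargs):
            return main_fn()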

So you uncovered that these tests are very unreliable: they don't clean up after themselves, so re-runs give invalid results. Once a single run has succeeded, all subsequent test re-runs will succeed at the moment. At the very least, pl_glue needs to support --overwrite_output_dir.
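
A minimal sketch of the cleanup such a test would need between runs so a stale result from a previous run cannot be picked up; the helper name is hypothetical, and the path matches the --output_dir from the snippet above:

    import shutil
    from pathlib import Path

    # Matches the --output_dir passed in the testargs above.
    PL_OUTPUT_DIR = Path("./tests/fixtures/tests_samples/pl_temp_dir")

    def clean_pl_output_dir():
        # Remove leftovers from a previous run so a re-run cannot report a stale success.
        if PL_OUTPUT_DIR.exists():
            shutil.rmtree(PL_OUTPUT_DIR)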

That explains why I couldn't get CI to work: my local setup probably wasn't working all along either, other than succeeding once and then always reporting the old success. So I was getting false positives.

Should transformers warn the user when a pre-existing dir filled with outdated data is found, or plainly refuse to run?
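
One possible shape for such a guard, assuming pl_glue grows an --overwrite_output_dir flag (it does not expose one today); this is a sketch, not the script's current behavior:

    import os

    def check_output_dir(output_dir, overwrite_output_dir=False):
        # Refuse to write into a non-empty output dir unless overwriting was requested.
        if os.path.exists(output_dir) and os.listdir(output_dir) and not overwrite_output_dir:
            raise ValueError(
                f"Output directory ({output_dir}) already exists and is not empty. "
                "Pass --overwrite_output_dir to overwrite it."
            )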

@JetRunner (Contributor Author)

@stas00 this set of params also outputs 0.5, sadly. I feel maybe there's another bug in the PL example?
cc @LysandreJik @sshleifer

@JetRunner (Contributor Author)

PABEE's bug is fixed in #6453. The reproducible low acc still exists for PL.
cc @LysandreJik @sshleifer

@JetRunner JetRunner closed this Aug 13, 2020
@JetRunner JetRunner deleted the debug_pabee_ci_failure branch August 13, 2020 16:35