
GLUE MTL issue: low cola score, big variation in restarting #342

Closed · shuningjin opened this issue Aug 10, 2018 · 17 comments
Labels: jiant-v1-legacy (Relevant to versions <= v1.3.2)

Comments

shuningjin (Contributor) commented Aug 10, 2018

Config: defaults + train_for_eval = 0, val_interval = 9000, scaling = none, train_tasks = glue, eval_tasks = glue
Restarted with weighting: proportional (2 runs), power_0.75 (3 runs)
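
To make the two weighting settings concrete: a minimal sketch of what they are generally taken to compute, assuming jiant samples each task in proportion to its training-set size (proportional) or to size^0.75 (power_0.75). The function and numbers below are illustrative, not jiant's actual API.

# Illustrative sketch, not jiant's code: "proportional" weights each task by its
# training-set size; "power_0.75" weights it by size ** 0.75.
def sampling_weights(task_sizes, method="proportional"):
    if method == "proportional":
        raw = dict(task_sizes)
    elif method.startswith("power_"):
        p = float(method.split("_", 1)[1])
        raw = {name: n ** p for name, n in task_sizes.items()}
    else:  # fall back to uniform sampling
        raw = {name: 1.0 for name in task_sizes}
    total = sum(raw.values())
    return {name: w / total for name, w in raw.items()}

# CoLA (~8.5k train examples) vs. MNLI (~393k): proportional samples CoLA about
# 2% as often as MNLI; power_0.75 narrows that gap to about 6%.
print(sampling_weights({"cola": 8551, "mnli": 392702}, method="power_0.75"))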

Issue 1: CoLA score is low (around 5) in four out of five experiments.
Issue 2: GLUE score varies noticeably (1-2 points) between restarts of the same experiment.

Results and logs can be found in: GLUE Dev Results -> Experiment: MTL Mixing -> Rows 273-277

Logs:
/nfs/jsalt/exp/shuning-worker34/sampling_master/proportional_glue/log.log
/nfs/jsalt/exp/shuning-worker31/sampling_master3/proportional_glue/log.log
/nfs/jsalt/exp/shuning-worker38/sampling_master2/power_0.75_glue/log.log
/nfs/jsalt/exp/shuning-worker32/sampling_master3/power_0.75_glue/log.log
/nfs/jsalt/exp/shuning-worker32/sampling_master4/power_0.75_glue/log.log

shuningjin changed the title from "GLUE MTL issue: low COLA score, big variation in restarting" to "GLUE MTL issue: low cola score, big variation in restarting" on Aug 10, 2018
sleepinyourhat (Contributor) commented:

I wonder if CoLA is behaving unusually since it's the first task... Task order seems to matter for the minor issue I noticed as part of #341.

sleepinyourhat (Contributor) commented:

Any further results/guesses? If not, could you point me to the commands you used? (I think I've seen them, but making sure...)

sleepinyourhat (Contributor) commented:

( @shuningjin )

shuningjin (Contributor, Author) commented:

Sorry, no further results. The commands are here (the only things varying are weighting_method and names).

COMMAND='''cd /nfs/jsalt/home/shuning/jiant; python main.py -c config/weighting.conf
-r --notify jinxx596@d.umn.edu
-o "exp_name = sampling_master3, run_name = proportional_glue, weighting_method = proportional, train_tasks = glue, eval_tasks = glue, val_interval = 9000"; sudo poweroff ''';
gcloud compute ssh shuning-worker31 \
--command="tmux new -s demo -d; tmux send '$COMMAND' Enter; tmux attach -t demo" -- -t

COMMAND='''cd /nfs/jsalt/home/shuning/jiant; python main.py -c config/weighting.conf
-r --notify jinxx596@d.umn.edu
-o "exp_name = sampling_master3, run_name = power_0.75_glue, weighting_method = power_0.75, train_tasks = glue, eval_tasks = glue, val_interval = 9000"; sudo poweroff ''';
gcloud compute ssh shuning-worker32 \
--command="tmux new -s demo -d; tmux send '$COMMAND' Enter; tmux attach -t demo" -- -t

sleepinyourhat (Contributor) commented Aug 22, 2018 via email

shuningjin (Contributor, Author) commented Aug 22, 2018

I am running on the master branch. Is there anything I can do about CoLA?

sleepinyourhat (Contributor) commented Aug 22, 2018 via email

sleepinyourhat (Contributor) commented:

If you're out of ideas, how about this:

  • Finish all the remaining issues on your pull request, and we can merge it in.
  • At master/head, rerun everything. Make sure you get a fresh set of preprocessed data, deleting stale files if necessary.

@shuningjin

sleepinyourhat added the "help wanted" label on Aug 30, 2018
W4ngatang (Collaborator) commented Aug 30, 2018

Possibly relevant: I've been running some experiments on just the small tasks in GLUE (CoLA, MRPC, STS-B, SST, RTE) and have found fairly consistent CoLA scores, roughly in the 15-20 range, without ELMo and on a fairly recent branch.
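
For context on these numbers: GLUE scores CoLA with Matthews correlation (reported ×100), so a score around 5 is close to chance while 15-20 is a weak but real signal. A quick sanity check on dumped dev predictions, using scikit-learn outside jiant (the label lists here are made up):

from sklearn.metrics import matthews_corrcoef

# 0/1 acceptability labels for a toy slice of the CoLA dev set (made-up values).
gold = [1, 1, 0, 1, 0, 0, 1, 1]
pred = [1, 0, 0, 1, 0, 1, 1, 1]
print(100 * matthews_corrcoef(gold, pred))  # GLUE-style CoLA score, ~46.7 here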

sleepinyourhat (Contributor) commented:

That's reassuring—seems like what we should expect.

W4ngatang (Collaborator) commented Aug 30, 2018

Some larger runs on full GLUE also have fairly consistent results, both on overall GLUE score (stddev = 0.6 over 3 runs) and on CoLA (stddev is usually relatively high; for a set of runs varying only the seed, it's anywhere from 0.5 to 3).

I'm less sure this is an issue, unless someone has recently run full GLUE MTL or a full train for eval phase on master?

sleepinyourhat (Contributor) commented:

Okay, hopefully this is just a branch-specific issue, then. @shuningjin - Okay to mark this as closed?

shuningjin (Contributor, Author) commented Aug 30, 2018

Can you share the script/configuration of your recent runs? @W4ngatang
When the problem originally arose, it was specific to train_for_eval = 0.

W4ngatang (Collaborator) commented:

I've gotten pretty decent CoLA results (consistently over 10) without train for eval and with the settings below. The command is something like:
python main.py --config config/final.conf --overrides "train_tasks = \"mnli-alt,mrpc,qnli-alt,sst,sts-b-alt,rte,wnli,qqp-alt,cola\", eval_tasks = \"mnli,mrpc,qnli,sst,sts-b,rte,wnli,qqp,cola\", val_interval = 9000, run_name = debug_cola, elmo_chars_only = 1, allow_reuse_of_pretraining_parameters = 0, do_train = 1, train_for_eval = 1, do_eval = 1, cuda = 0"

I was running not on master but on a pretty recent branch (metalearn).

shuningjin (Contributor, Author) commented Aug 31, 2018

  • Why is it "without train for eval", when you are setting "allow_reuse_of_pretraining_parameters = 0, train_for_eval = 1"?
  • I just ran your exact command on master, and got an error:
Fatal error in main():
Traceback (most recent call last):
  File "main.py", line 316, in <module>
    main(sys.argv[1:])
  File "main.py", line 179, in main
    "If you're pretraining on a task you plan to reuse as a target task, set\n"
  File "/nfs/jsalt/home/shuning/jiant/src/utils.py", line 1112, in assert_for_log
    assert condition, error_message
AssertionError: If you're pretraining on a task you plan to reuse as a target task, set
allow_reuse_of_pretraining_parameters = 1(risky), or train in two steps:
  train with do_train = 1, train_for_eval = 0, stop, and restart with
  do_train = 0 and train_for_eval = 1.
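
For concreteness, here is a sketch of the two-step schedule that assertion asks for, reusing the flags from the command above; splitting do_eval across the two phases is my assumption, not something the error message specifies.

# Sketch only: run jiant's main.py twice, as the assertion suggests, instead of
# setting train_for_eval = 1 in the same run as multi-task pretraining.
import subprocess

COMMON = (
    'train_tasks = "mnli-alt,mrpc,qnli-alt,sst,sts-b-alt,rte,wnli,qqp-alt,cola", '
    'eval_tasks = "mnli,mrpc,qnli,sst,sts-b,rte,wnli,qqp,cola", '
    'val_interval = 9000, run_name = debug_cola, elmo_chars_only = 1, '
    'allow_reuse_of_pretraining_parameters = 0, cuda = 0, '
)

# Step 1: multi-task pretraining only (no target-task training or evaluation).
subprocess.run(
    ["python", "main.py", "--config", "config/final.conf",
     "--overrides", COMMON + "do_train = 1, train_for_eval = 0, do_eval = 0"],
    check=True)

# Step 2: restart the same run_name to train and evaluate per-task models.
# (The do_eval placement here is an assumption.)
subprocess.run(
    ["python", "main.py", "--config", "config/final.conf",
     "--overrides", COMMON + "do_train = 0, train_for_eval = 1, do_eval = 1"],
    check=True)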

@W4ngatang

I can rerun things within a few days. Should I just abandon the configuration here and use the same one as Alex's? I am confused about what experiments to run.
@sleepinyourhat

sleepinyourhat (Contributor) commented:

@shuningjin I think your configuration is reasonable, unless you see anything that needs to change. I'd say finish this PR, then run the same commands as before on master.

sleepinyourhat removed the "help wanted" label on Aug 31, 2018
sleepinyourhat (Contributor) commented:

@shuningjin IIRC, you've gotten more normal results recently. If that's right, please close this.
