
GLUE MTL issue: low cola score, big variation in restarting #342

Closed · shuningjin opened this issue Aug 10, 2018 · 17 comments
Labels: jiant-v1-legacy (Relevant to versions <= v1.3.2)

Comments

shuningjin (Contributor) commented Aug 10, 2018

Config: defaults + train_for_eval = 0, val_interval = 9000, scaling = none, train_tasks = glue, eval_tasks = glue
Restarted with weighting: proportional (2 runs), power_0.75 (3 runs)
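
To make the two weighting settings concrete: a minimal sketch of what they are generally taken to compute, assuming jiant samples each task in proportion to its training-set size (proportional) or to size^0.75 (power_0.75). The function and numbers below are illustrative, not jiant's actual API.

# Illustrative sketch, not jiant's code: "proportional" weights each task by its
# training-set size; "power_0.75" weights it by size ** 0.75.
def sampling_weights(task_sizes, method="proportional"):
    if method == "proportional":
        raw = dict(task_sizes)
    elif method.startswith("power_"):
        p = float(method.split("_", 1)[1])
        raw = {name: n ** p for name, n in task_sizes.items()}
    else:  # fall back to uniform sampling
        raw = {name: 1.0 for name in task_sizes}
    total = sum(raw.values())
    return {name: w / total for name, w in raw.items()}

# CoLA (~8.5k train examples) vs. MNLI (~393k): proportional samples CoLA about
# 2% as often as MNLI; power_0.75 narrows that gap to about 6%.
print(sampling_weights({"cola": 8551, "mnli": 392702}, method="power_0.75"))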

Issue 1: CoLA score is low (around 5) in four out of five experiments.
Issue 2: GLUE score varies noticeably (1-2 points) between restarts of the same experiment.

Results and logs can be found in: GLUE Dev Results -> Experiment: MTL Mixing -> Rows 273-277

Logs:
/nfs/jsalt/exp/shuning-worker34/sampling_master/proportional_glue/log.log
/nfs/jsalt/exp/shuning-worker31/sampling_master3/proportional_glue/log.log
/nfs/jsalt/exp/shuning-worker38/sampling_master2/power_0.75_glue/log.log
/nfs/jsalt/exp/shuning-worker32/sampling_master3/power_0.75_glue/log.log
/nfs/jsalt/exp/shuning-worker32/sampling_master4/power_0.75_glue/log.log

shuningjin changed the title from "GLUE MTL issue: low COLA score, big variation in restarting" to "GLUE MTL issue: low cola score, big variation in restarting" on Aug 10, 2018
sleepinyourhat (Contributor) commented:

I wonder if CoLA is behaving unusually since it's the first task... Task order seems to matter for the minor issue I noticed as part of #341.

sleepinyourhat (Contributor) commented:

Any further results/guesses? If not, could you point me to the commands you used? (I think I've seen them, but making sure...)

sleepinyourhat (Contributor) commented:

( @shuningjin )

shuningjin (Contributor, Author) commented:

Sorry, no further results. The commands are here (the only things varying are weighting_method and names).

COMMAND='''cd /nfs/jsalt/home/shuning/jiant; python main.py -c config/weighting.conf
-r --notify jinxx596@d.umn.edu
-o "exp_name = sampling_master3, run_name = proportional_glue, weighting_method = proportional, train_tasks = glue, eval_tasks = glue, val_interval = 9000"; sudo poweroff ''';
gcloud compute ssh shuning-worker31 \
--command="tmux new -s demo -d; tmux send '$COMMAND' Enter; tmux attach -t demo" -- -t

COMMAND='''cd /nfs/jsalt/home/shuning/jiant; python main.py -c config/weighting.conf
-r --notify jinxx596@d.umn.edu
-o "exp_name = sampling_master3, run_name = power_0.75_glue, weighting_method = power_0.75, train_tasks = glue, eval_tasks = glue, val_interval = 9000"; sudo poweroff ''';
gcloud compute ssh shuning-worker32 \
--command="tmux new -s demo -d; tmux send '$COMMAND' Enter; tmux attach -t demo" -- -t

sleepinyourhat (Contributor) commented Aug 22, 2018 via email

shuningjin (Contributor, Author) commented Aug 22, 2018

I am running on the master branch. Is there anything I can do about CoLA?

sleepinyourhat (Contributor) commented Aug 22, 2018 via email

sleepinyourhat (Contributor) commented:

If you're out of ideas, how about this:

  • Finish all the remaining issues on your pull request, and we can merge it in.
  • At master/head, rerun everything. Make sure you get a fresh set of preprocessed data, deleting stale files if necessary.

@shuningjin

sleepinyourhat added the "help wanted" label on Aug 30, 2018
W4ngatang (Collaborator) commented Aug 30, 2018

Possibly relevant: I've been running some experiments on just the small tasks in GLUE (CoLA, MRPC, STS-B, SST, RTE) and have found fairly consistent CoLA scores, roughly in the 15-20 range, without ELMo and on a fairly recent branch.
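
For context on these numbers: GLUE scores CoLA with Matthews correlation (reported ×100), so a score around 5 is close to chance while 15-20 is a weak but real signal. A quick sanity check on dumped dev predictions, using scikit-learn outside jiant (the label lists here are made up):

from sklearn.metrics import matthews_corrcoef

# 0/1 acceptability labels for a toy slice of the CoLA dev set (made-up values).
gold = [1, 1, 0, 1, 0, 0, 1, 1]
pred = [1, 0, 0, 1, 0, 1, 1, 1]
print(100 * matthews_corrcoef(gold, pred))  # GLUE-style CoLA score, ~46.7 here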

sleepinyourhat (Contributor) commented:

That's reassuring—seems like what we should expect.

W4ngatang (Collaborator) commented Aug 30, 2018

Some larger runs on full GLUE also have fairly consistent results, both on overall GLUE score (stddev = 0.6 over 3 runs) and on CoLA (stddev is usually relatively high; for a set of runs varying only the seed, it's anywhere from 0.5 to 3).

I'm less sure this is an issue, unless someone has recently run full GLUE MTL or a full train for eval phase on master?

sleepinyourhat (Contributor) commented:

Okay, hopefully this is just a branch-specific issue, then. @shuningjin - Okay to mark this as closed?

shuningjin (Contributor, Author) commented Aug 30, 2018

Can you share the script/configuration of your recent runs? @W4ngatang
When the problem originally arose, it was specific to train_for_eval = 0.

W4ngatang (Collaborator) commented:

I've gotten pretty decent CoLA results (consistently over 10) without train for eval and with the settings below. The command is something like:
python main.py --config config/final.conf --overrides "train_tasks = \"mnli-alt,mrpc,qnli-alt,sst,sts-b-alt,rte,wnli,qqp-alt,cola\", eval_tasks = \"mnli,mrpc,qnli,sst,sts-b,rte,wnli,qqp,cola\", val_interval = 9000, run_name = debug_cola, elmo_chars_only = 1, allow_reuse_of_pretraining_parameters = 0, do_train = 1, train_for_eval = 1, do_eval = 1, cuda = 0"

I was running not on master but on a pretty recent branch (metalearn).

shuningjin (Contributor, Author) commented Aug 31, 2018

  • Why is it "without train for eval", when you are setting "allow_reuse_of_pretraining_parameters = 0, train_for_eval = 1"?
  • I just ran your exact command on master, and got an error:
Fatal error in main():
Traceback (most recent call last):
  File "main.py", line 316, in <module>
    main(sys.argv[1:])
  File "main.py", line 179, in main
    "If you're pretraining on a task you plan to reuse as a target task, set\n"
  File "/nfs/jsalt/home/shuning/jiant/src/utils.py", line 1112, in assert_for_log
    assert condition, error_message
AssertionError: If you're pretraining on a task you plan to reuse as a target task, set
allow_reuse_of_pretraining_parameters = 1(risky), or train in two steps:
  train with do_train = 1, train_for_eval = 0, stop, and restart with
  do_train = 0 and train_for_eval = 1.
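
For concreteness, here is a sketch of the two-step schedule that assertion asks for, reusing the flags from the command above; splitting do_eval across the two phases is my assumption, not something the error message specifies.

# Sketch only: run jiant's main.py twice, as the assertion suggests, instead of
# setting train_for_eval = 1 in the same run as multi-task pretraining.
import subprocess

COMMON = (
    'train_tasks = "mnli-alt,mrpc,qnli-alt,sst,sts-b-alt,rte,wnli,qqp-alt,cola", '
    'eval_tasks = "mnli,mrpc,qnli,sst,sts-b,rte,wnli,qqp,cola", '
    'val_interval = 9000, run_name = debug_cola, elmo_chars_only = 1, '
    'allow_reuse_of_pretraining_parameters = 0, cuda = 0, '
)

# Step 1: multi-task pretraining only (no target-task training or evaluation).
subprocess.run(
    ["python", "main.py", "--config", "config/final.conf",
     "--overrides", COMMON + "do_train = 1, train_for_eval = 0, do_eval = 0"],
    check=True)

# Step 2: restart the same run_name to train and evaluate per-task models.
# (The do_eval placement here is an assumption.)
subprocess.run(
    ["python", "main.py", "--config", "config/final.conf",
     "--overrides", COMMON + "do_train = 0, train_for_eval = 1, do_eval = 1"],
    check=True)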

@W4ngatang

I can rerun things within a few days. Should I just abandon the configuration here and use the same one as Alex's? I am confused about what experiments to run.
@sleepinyourhat

sleepinyourhat (Contributor) commented:

@shuningjin I think your configuration is reasonable, unless you see anything that needs to change. I'd say finish this PR, then run the same commands as before on master.

sleepinyourhat removed the "help wanted" label on Aug 31, 2018
sleepinyourhat (Contributor) commented:

@shuningjin IIRC, you've gotten more normal results recently. If that's right, please close this.
