GLUE MTL issue: low CoLA score, big variation across restarts #342
Comments
I wonder if CoLA is behaving unusually since it's the first task... Task order seems to matter for the minor issue I noticed as part of #341.
Any further results/guesses? If not, could you point me to the commands you used? (I think I've seen them, but making sure...) (@shuningjin)
Sorry, no further results. The commands are here (the only things varying are weighting_method and names):

```
COMMAND='''cd /nfs/jsalt/home/shuning/jiant; python main.py -c config/weighting.conf -r --notify ***@***.*** -o "exp_name = sampling_master3, run_name = proportional_glue, weighting_method = proportional, train_tasks = glue, eval_tasks = glue, val_interval = 9000"; sudo poweroff'''
gcloud compute ssh shuning-worker31 --command="tmux new -s demo -d; tmux send '$COMMAND' Enter; tmux attach -t demo" -- -t

COMMAND='''cd /nfs/jsalt/home/shuning/jiant; python main.py -c config/weighting.conf -r --notify ***@***.*** -o "exp_name = sampling_master3, run_name = power_0.75_glue, weighting_method = power_0.75, train_tasks = glue, eval_tasks = glue, val_interval = 9000"; sudo poweroff'''
gcloud compute ssh shuning-worker32 --command="tmux new -s demo -d; tmux send '$COMMAND' Enter; tmux attach -t demo" -- -t
```
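(The two runs differ only in weighting_method: proportional vs. power_0.75. A minimal sketch of what these size-based task-sampling schemes typically compute; this is an illustration, not jiant's implementation, and `task_sizes` below holds rough GLUE training-set sizes used only as placeholders.)

```python
# Hypothetical task sizes; placeholders for illustration only.
task_sizes = {"cola": 8551, "sst": 67349, "mrpc": 3668, "qqp": 363846}

def sampling_weights(sizes, method="proportional"):
    """Turn task sizes into sampling probabilities.

    proportional : p_i proportional to n_i
    power_0.75   : p_i proportional to n_i ** 0.75
                   (flattens the skew toward large tasks)
    """
    if method == "proportional":
        raw = dict(sizes)
    elif method.startswith("power_"):
        alpha = float(method.split("_", 1)[1])
        raw = {t: n ** alpha for t, n in sizes.items()}
    else:
        raise ValueError(f"unknown weighting method: {method}")
    total = sum(raw.values())
    return {t: w / total for t, w in raw.items()}

print(sampling_weights(task_sizes, "proportional"))  # qqp dominates
print(sampling_weights(task_sizes, "power_0.75"))    # small tasks get more mass
```

Under a proportional scheme a small task like CoLA is visited rarely, which may be one way small-task and first-task effects interact.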
My guess is that something is wrong with CoLA evaluation more generally (probably because it's the first GLUE task). But to be sure: you're running this on a branch, right? At master, weighting.conf uses the model setup from default.conf, not from final.conf.
I am running on the master branch. Is there anything I can do about CoLA?
Hrm. I guess that isn't a major problem, but it could increase variance, since patience is lower. I guess there's nothing to do right now, unless you can think of possible bugs to investigate.
If you're out of ideas, how about this:
Possibly relevant: I've been running some experiments on just the small tasks in GLUE (CoLA, MRPC, STS-B, SST, RTE) and have found fairly consistent CoLA scores, roughly in the 15-20 range, without using ELMo and on a fairly recent branch.
That's reassuring; it seems like what we should expect.
Some larger runs on full GLUE also show fairly consistent results on the overall GLUE score (stddev = 0.6 over 3 runs) and on CoLA (stddev is usually relatively high; for a set of runs varying only the seed, it's anywhere from 0.5 to 3). I'm less sure this is an issue, unless someone has recently run full GLUE MTL or a full train-for-eval phase on master?
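(For anyone reproducing the variance numbers: what's being quoted is the sample standard deviation over a handful of seed-only reruns. The scores below are placeholders, not the actual results from these runs.)

```python
import statistics

# Placeholder dev scores from runs differing only in random seed.
cola_scores = [16.2, 18.9, 15.1]
glue_scores = [68.4, 69.1, 67.9]

for name, scores in [("CoLA", cola_scores), ("GLUE", glue_scores)]:
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)  # sample stdev, n - 1 denominator
    print(f"{name}: mean = {mean:.2f}, stdev = {std:.2f}")
```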
Okay, hopefully this is just a branch-specific issue, then. @shuningjin - okay to mark this as closed?
Can you share the script/configuration of your recent runs? @W4ngatang
I've gotten pretty decent results (consistently 10+) without train-for-eval, with a command something like ... I was running on not-master but a pretty recent branch (...).
I can rerun things within a few days. Shall I just abandon the configuration here and use the same one as Alex's? I'm confused about what experiments to run.
@shuningjin I think your configuration is reasonable, unless you see anything that needs to change. I'd say finish this PR, then run the same commands as before on master.
@shuningjin IIRC, you've gotten more normal results recently. If that's right, please close this.
Config: defaults + train_for_eval = 0, val_interval = 9000, scaling = none, train_tasks = glue, eval_tasks = glue
Restarts with weighting: proportional × 2, power_0.75 × 3
Issue 1: CoLA score is low (around 5) in four out of five experiments (see the metric sketch after the log list)
Issue 2: GLUE score has large variation (1-2 points) between identical experiments
Results and logs can be found in: GLUE Dev Results -> Experiment: MTL Mixing -> rows 273-277
Logs:
/nfs/jsalt/exp/shuning-worker34/sampling_master/proportional_glue/log.log
/nfs/jsalt/exp/shuning-worker31/sampling_master3/proportional_glue/log.log
/nfs/jsalt/exp/shuning-worker38/sampling_master2/power_0.75_glue/log.log
/nfs/jsalt/exp/shuning-worker32/sampling_master3/power_0.75_glue/log.log
/nfs/jsalt/exp/shuning-worker32/sampling_master4/power_0.75_glue/log.log
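(Background on the metric, for reading the numbers above: CoLA is scored with the Matthews correlation coefficient, reported ×100 in GLUE, so a score "around 5" means an MCC of roughly 0.05, barely above chance. A minimal check with made-up labels, assuming scikit-learn is installed:)

```python
from sklearn.metrics import matthews_corrcoef

# Made-up gold labels and predictions (1 = acceptable, 0 = unacceptable).
gold = [1, 1, 0, 1, 0, 0, 1, 0]
pred = [1, 0, 0, 1, 1, 0, 1, 1]

mcc = matthews_corrcoef(gold, pred)
print(f"MCC = {mcc:.3f} -> GLUE-style score = {100 * mcc:.1f}")
```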