Validation score on WSC decreases with training #18

Closed
Raldir opened this issue Aug 24, 2022 · 3 comments

Comments

@Raldir

Raldir commented Aug 24, 2022

Thank you for the amazing work on t-few! I've noticed strange behavior when running SuperGLUE's WSC. I've been logging the validation score every 40 epochs by setting self.eval_epoch_interval = 40, and when I run the command

python -m src.pl_train -c ia3.json+wsc.json -k save_model=False exp_name=first_exp

the output is as follows:

{"accuracy": 0.6730769230769231, "score_gt": 0.5068197436630726, "score_cand": 0.7191649047801127}
{"accuracy": 0.49038461538461536, "score_gt": 1.4563168384707892, "score_cand": 1.505529030584372}
{"accuracy": 0.47115384615384615, "score_gt": 3.4743554890155792, "score_cand": 2.727144861450562}
{"accuracy": 0.46153846153846156, "score_gt": 4.202766236777489, "score_cand": 3.5702959763316007}
{"accuracy": 0.40384615384615385, "score_gt": 5.157541000499175, "score_cand": 3.5657502871293287}
{"accuracy": 0.3942307692307692, "score_gt": 5.397989429533482, "score_cand": 3.975659689651086}
{"accuracy": 0.40384615384615385, "score_gt": 5.073869264469697, "score_cand": 3.995581218542961}

The last accuracy score is from epoch 240, out of 250 epochs in total.
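To make the size of the drop concrete, the sketch below parses the seven logged lines and converts each accuracy back into a count of correctly classified validation examples. It assumes a 104-example validation split: every logged accuracy is an exact multiple of 1/104 (e.g. 70/104 ≈ 0.673 and 41/104 ≈ 0.394), which matches the size of SuperGLUE WSC's validation set. It also prints score_gt minus score_cand without assuming what either score means. The helper names parse_log and summarize are illustrative and not part of t-few.

```python
import json

# The seven validation log lines quoted in the issue, copied verbatim.
LOG = """\
{"accuracy": 0.6730769230769231, "score_gt": 0.5068197436630726, "score_cand": 0.7191649047801127}
{"accuracy": 0.49038461538461536, "score_gt": 1.4563168384707892, "score_cand": 1.505529030584372}
{"accuracy": 0.47115384615384615, "score_gt": 3.4743554890155792, "score_cand": 2.727144861450562}
{"accuracy": 0.46153846153846156, "score_gt": 4.202766236777489, "score_cand": 3.5702959763316007}
{"accuracy": 0.40384615384615385, "score_gt": 5.157541000499175, "score_cand": 3.5657502871293287}
{"accuracy": 0.3942307692307692, "score_gt": 5.397989429533482, "score_cand": 3.975659689651086}
{"accuracy": 0.40384615384615385, "score_gt": 5.073869264469697, "score_cand": 3.995581218542961}
"""

# Assumption: accuracies are computed over WSC's 104-example validation split.
NUM_VAL_EXAMPLES = 104


def parse_log(text):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def summarize(records, n=NUM_VAL_EXAMPLES):
    """Return (eval index, correct count, accuracy, score_gt - score_cand) per record."""
    rows = []
    for i, rec in enumerate(records):
        correct = round(rec["accuracy"] * n)  # undo the division by n
        gap = rec["score_gt"] - rec["score_cand"]
        rows.append((i, correct, rec["accuracy"], gap))
    return rows


if __name__ == "__main__":
    for i, correct, acc, gap in summarize(parse_log(LOG)):
        print(f"eval {i}: {correct}/{NUM_VAL_EXAMPLES} correct "
              f"(accuracy {acc:.3f}), score_gt - score_cand = {gap:+.3f}")
```

Run as a script, this reports 70 correct at the first evaluation and 42, 41 and 42 at the last three; the score_gt minus score_cand gap is negative for the first two evaluations and positive for every one after that.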

Any ideas on what is going on here? Thanks!

Raldir changed the title from "Val score decreases on WSC" to "Validation score on WSC decreases with training" on Aug 24, 2022
@HaokunLiu
Collaborator

I can try running this experiment, maybe in the latter half of this week. Meanwhile, I remember WSC being a tricky dataset that often produces unstable results. Would you mind running it with a few other seeds and seeing whether this behavior persists?
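One way to follow the suggestion above is to launch the same command once per seed. This is a minimal sketch, assuming the t-few config accepts a `seed` override through -k in the same key=value form as save_model and exp_name (verify this against the repository's config class before relying on it). The seed values, the wsc_seed prefix and the function name sweep_commands are arbitrary and hypothetical. Each run gets its own exp_name, presumably so that outputs from different seeds do not overwrite one another.

```python
import shlex

# Base command from the issue; exp_name and seed are appended per run.
BASE_CMD = ["python", "-m", "src.pl_train", "-c", "ia3.json+wsc.json",
            "-k", "save_model=False"]


def sweep_commands(seeds, prefix="wsc_seed"):
    """Build one argv list per seed.

    Assumption: `seed=<int>` is accepted by -k like save_model and exp_name.
    """
    # BASE_CMD + [...] creates a new list, so BASE_CMD itself is never mutated.
    return [BASE_CMD + [f"exp_name={prefix}{s}", f"seed={s}"] for s in seeds]


if __name__ == "__main__":
    # Arbitrary seeds; print the commands instead of running them.
    # To launch, pass each list to subprocess.run(cmd, check=True).
    for cmd in sweep_commands([0, 1, 2, 3, 4]):
        print(shlex.join(cmd))
```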

@HaokunLiu
Collaborator

And btw, is this just WSC? Do other datasets have this problem?

@Raldir
Author

Raldir commented Aug 29, 2022

Hi Haokun, thank you for the response. Indeed, after changing the seed, the results are more in line with what I expected. I have been having similar problems with WiC, but again they appear to be caused by seed variability. RTE seems more stable.
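When comparing runs across seeds, it helps to know how far a single WSC evaluation can move by chance. The sketch below (seed_summary is a hypothetical helper, not part of t-few) reports the mean and sample standard deviation of per-seed final accuracies, plus the binomial standard error sqrt(p(1 - p)/n) of one accuracy measured on n examples. The default n = 104 is an assumption inferred from the logged accuracies.

```python
import math
import statistics


def seed_summary(accuracies, n_examples=104):
    """Summarize final validation accuracies from several seeds.

    Returns (mean, sample standard deviation across seeds, binomial standard
    error of a single run's accuracy at the mean, sqrt(p * (1 - p) / n)).
    n_examples=104 assumes WSC's validation split.
    """
    mean = statistics.mean(accuracies)
    # stdev needs at least two values; report 0.0 spread for a single run.
    spread = statistics.stdev(accuracies) if len(accuracies) > 1 else 0.0
    single_run_se = math.sqrt(mean * (1 - mean) / n_examples)
    return mean, spread, single_run_se
```

At p = 0.5 and n = 104 the standard error is about 0.049, roughly five accuracy points, so differences of a few points between seeds or evaluations on WSC are hard to distinguish from noise.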

Raldir closed this as completed on Aug 29, 2022