BERT EXTRACTION: Unable to reproduce results on MNLI #52

ahmadrash · 2020-03-30T19:25:30Z

Following the scripts, I trained a teacher model successfully, generated the extraction data, and ran knowledge distillation on the teacher logits. Evaluating on the dev set, I get an accuracy of 0.319.

martiansideofthemoon · 2020-03-30T20:06:30Z

Hi Ahmad, thanks for your interest! An accuracy of 31.9% is worse than random guessing (about 33.3% for the three MNLI classes). A few questions to help debug this:

1. What is the class distribution of the extracted data?
2. What scheme did you use, RANDOM / WIKI?
3. What was the dev set accuracy of the teacher model?

ahmadrash · 2020-03-30T21:09:59Z

Thanks a lot for the prompt response.

1. The class distribution is [26.76%, 26.31%, 46.93%].
2. I used DATA_SCHEME="random_ed_k_uniform".
3. Dev set accuracy of the teacher model is 0.851.

martiansideofthemoon · 2020-03-30T21:19:55Z

Hi Ahmad,
1, 2 and 3 look good to me. A few more follow-up questions:

1. I guess you are using BERT-large?
2. Are you using this file to train the student model? https://github.com/google-research/language/blob/master/language/bert_extraction/steal_bert_classifier/models/run_classifier_distillation.py
3. Is the training loss decreasing? (just confirming that the weight updates are happening)
4. Does the same script work for SST2 / SQuAD?

ahmadrash · 2020-03-30T21:49:07Z

Thanks Kalpesh,

1. Yes, I am using BERT-large.
2. Yes, I am using that file.
3. I am adding the loss curve from TensorBoard. It shows oscillations.
4. I am still running the other experiments.

martiansideofthemoon · 2020-03-30T22:39:22Z

Regarding your curve, how many epochs are you training for, and what is your batch size? A loss of 1.1 (about ln 3, the cross-entropy of a uniform guess over three classes) indicates nothing is being learnt, but I do see a strong decrease after the first few ~10k steps. Also, what are your learning rate, optimizer and learning rate schedule? Finally, what hardware are you using?

ahmadrash · 2020-03-30T22:49:26Z

I am training for 3 epochs with a batch size of 8 on an NVIDIA V100 GPU. The learning rate, optimizer and schedule are the defaults in the script.

--learning_rate=3e-5
--warmup_proportion=0.1

And the optimizer is the same as the default for BERT.

martiansideofthemoon · 2020-03-30T23:42:32Z

I think the batch size might be the issue. Learning is less stable for RANDOM than for the original MNLI, and smaller batch sizes (hence noisier gradient estimates) could push the model off the optimization path. I'd recommend trying a batch size of 32. If it doesn't fit on the GPU, you could try BERT-base or gradient accumulation.
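To illustrate the gradient accumulation idea, here is a minimal sketch on a toy one-parameter regression problem. It is not code from the repository, and every name and data point in it is hypothetical. It only shows that averaging the gradients of four micro-batches of 8 gives the same gradient as one batch of 32 when the loss is a mean over examples and the micro-batches are equally sized.

```python
# Toy illustration only: the model, data and function names are hypothetical
# and are not taken from run_classifier_distillation.py.

def mse_grad(w, batch):
    """Gradient w.r.t. w of the mean squared error of y ~ w * x over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def accumulated_grad(w, data, micro_batch_size):
    """Average the per-micro-batch gradients, as gradient accumulation does."""
    chunks = [data[i:i + micro_batch_size]
              for i in range(0, len(data), micro_batch_size)]
    return sum(mse_grad(w, chunk) for chunk in chunks) / len(chunks)

# 32 synthetic (x, y) pairs standing in for one "large" batch.
data = [(float(i), 3.0 * i + 1.0) for i in range(32)]
w = 0.5

full = mse_grad(w, data)                               # one batch of 32
accum = accumulated_grad(w, data, micro_batch_size=8)  # four micro-batches of 8

print(full, accum)
```

In a real training loop the weight update would be applied once after the four micro-batches, so only 8 examples need to fit in GPU memory at a time.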

Another thing you could try is learning rate decay. From your graph, it is clear that the training loss decreases during the warmup phase of training, but then the learning rate is too high and a bad gradient (from a small batch) can derail the optimization. You could also simply try a smaller learning rate, maybe 1e-5.
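As a rough picture of what warmup followed by decay looks like, here is a generic sketch of a linear-warmup, linear-decay schedule. The function name and its exact shape are assumptions made for illustration and may not match the schedule implemented in the script; the default values reuse the 1e-5 peak and the 0.1 warmup proportion mentioned in this thread.

```python
# Generic sketch: learning_rate() is a hypothetical helper, not the
# repository's optimizer code.

def learning_rate(step, total_steps, peak_lr=1e-5, warmup_proportion=0.1):
    """Linear warmup to peak_lr, then linear decay to zero at total_steps."""
    warmup_steps = int(total_steps * warmup_proportion)
    if warmup_steps > 0 and step < warmup_steps:
        # Warmup phase: ramp up from 0 to peak_lr.
        return peak_lr * step / warmup_steps
    # Decay phase: ramp down from peak_lr to 0.
    remaining = max(total_steps - step, 0)
    return peak_lr * remaining / max(total_steps - warmup_steps, 1)

total = 1000
for step in (0, 50, 100, 550, 1000):
    print(step, learning_rate(step, total))
```

With this shape, the steps taken late in training are much smaller than those right after warmup, so a single noisy gradient from a small batch moves the weights less.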

ahmadrash · 2020-03-30T23:59:07Z

Thanks a lot for the suggestions. I will try these and report back.

ahmadrash · 2020-04-02T16:43:28Z

Thanks Kalpesh! I was able to get 78 on MNLI dev and 90 on SST-2 by reducing the learning rate to 1e-5. The loss curve is still not ideal, but it is much better than what we were seeing before.

Jimntu · 2022-07-25T06:53:43Z

Hi, I am a beginner in deep learning and have little experience implementing code. May I ask how you drew the loss curve from TensorBoard? I would really appreciate it if you could help me!

ahmadrash closed this as completed Apr 2, 2020
