
masked_lm_accuracy is low at 0.51, but next_sentence_accuracy is high at 0.93 #557

Closed
SeekPoint opened this issue Apr 6, 2019 · 4 comments

@SeekPoint

How do I explain this?

My training set is about 1M lines; I ran 50,000 steps with a batch size of 32.
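
For context, a rough back-of-the-envelope check (a sketch only: it assumes roughly one pretraining instance per corpus line, which create_pretraining_data.py only approximates, since it packs sentences up to max_seq_length and duplicates instances dupe_factor times):

```python
# Rough estimate of how many passes over the corpus this setup makes.
num_lines = 1_000_000   # size of the training set described above
batch_size = 32
train_steps = 50_000

examples_seen = batch_size * train_steps      # 1,600,000
approx_epochs = examples_seen / num_lines     # ~1.6 passes over the data
print(f"examples seen: {examples_seen:,}, approx epochs: {approx_epochs:.1f}")
```

For comparison, the original BERT models were pretrained for 1,000,000 steps at batch size 256. The masked LM head is also a softmax over the entire vocabulary, while next-sentence prediction is a binary task with a 50% baseline, so it is normal for NSP accuracy to saturate long before MLM accuracy does.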

@KavyaGujjala

Hi, did you solve this issue?

I am pretraining BERT on a domain-specific dataset (1 million sentences, all as one document), and masked LM accuracy doesn't go beyond 70%.

I also tried running for 100,000 steps, which didn't help much.

What should I do about this?

I am pretraining to get sentence-level embeddings and compare the similarities between them.
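
One thing worth checking, given the "all as one document" setup: create_pretraining_data.py treats blank lines as document boundaries, and next-sentence prediction draws its negative ("random next") segments from other documents, so a corpus formatted as a single document gives the NSP objective little real signal. Per the repo's README, the expected input is one sentence per line, with an empty line between documents:

```
This is the first sentence of document one.
This is the second sentence of document one.

This is the first sentence of document two.
This is the second sentence of document two.
```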

@SeekPoint (Author)

Actually, after 300,000 steps, masked_lm_accuracy is up to 0.88.
So there is no problem; there just weren't enough training steps.

@maggieezzat

Hello, my data isn't that large: only 720 million words, so 300,000 steps with sequence length 128 would mean 50 epochs for me. Would that lead to overfitting? Also, what learning rate and warmup steps did you use? @lovejasmine
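
For what it's worth, a generic way to sanity-check that epoch count (a sketch only: it treats words as tokens, assumes the corpus packs densely into max_seq_length-token sequences, and ignores create_pretraining_data.py's dupe_factor):

```python
# Approximate number of pretraining epochs implied by a step budget.
def approx_epochs(total_tokens, max_seq_length, batch_size, train_steps):
    sequences_in_corpus = total_tokens / max_seq_length
    sequences_seen = batch_size * train_steps
    return sequences_seen / sequences_in_corpus

# 720M words, sequence length 128, 300,000 steps: the epoch count
# depends entirely on the batch size.
for bs in (32, 256, 1024):
    print(bs, round(approx_epochs(720e6, 128, bs, 300_000), 1))
# -> 32: 1.7, 256: 13.7, 1024: 54.6
```

So 50 epochs implies a batch size around 1024; at batch size 32 the same step budget is under 2 epochs. On hyperparameters, the BERT paper used a peak learning rate of 1e-4 with 10,000 warmup steps at batch size 256, which is the usual starting point.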

@linWujl commented Oct 24, 2019

Hi, I followed the pretraining guide, which uses sample_text.txt, but got low MLM accuracy and high NSP accuracy. Do you know why?
