
Question about pre-training language model #43

Open
dy1998 opened this issue Sep 11, 2021 · 6 comments

Comments

dy1998 commented Sep 11, 2021

Hello, I see that the number of epochs in pretrain_language_model.yaml is set to 80, and I'm using 4 Titan Xp GPUs to pre-train the language model. However, one epoch takes 8 hours, so 80 epochs would take 640 hours (nearly 26 days). Do I really have to train for 80 epochs? How do I judge when the language model has converged?

@Jack-Lee-NULL

As far as I can tell from my reproduction, 1~2 epochs are enough, and the final model (vision model + the language model we pre-trained + alignment model) matches the performance the authors report in the paper. But I found another issue: the parameter use_sm is set to False in pretrain_language_model.yaml. Is spelling mutation used when pre-training the language model?
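
For reference, what I mean by spelling mutation is corrupting the input word while keeping the clean word as the training target. A rough sketch of my own reading of it, not the repo's actual implementation (the charset and mutation probability are assumptions):

```python
import random
import string

CHARS = string.ascii_lowercase + string.digits  # assumed charset

def spelling_mutation(word, p=0.2):
    """With probability p, apply one random edit: substitute, insert, or delete a character."""
    if random.random() > p or len(word) < 2:
        return word
    i = random.randrange(len(word))
    op = random.choice(("sub", "ins", "del"))
    if op == "sub":
        return word[:i] + random.choice(CHARS) + word[i + 1:]
    if op == "ins":
        return word[:i] + random.choice(CHARS) + word[i:]
    return word[:i] + word[i + 1:]  # "del"

# A training pair would then be (corrupted input, clean target):
target = "hello"
inp = spelling_mutation(target)
```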

baudm commented Oct 8, 2021

@Jack-Lee-NULL did you try to evaluate the LM separately? I'm training it on a smaller dataset (3.1M words, lowercase alphanumeric) with a similar setup (effective batch size = 4096), but word accuracy always saturates below 40%. This is well below the performance of the VM alone (> 85%), which converges to an acceptable state much more quickly.
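
For clarity, by word accuracy I mean whole-word exact match, computed roughly like this (case-insensitive, restricted to an assumed alphanumeric charset):

```python
import re

def normalize(s):
    # keep lowercase alphanumerics only (assumed evaluation charset)
    return re.sub(r"[^0-9a-z]", "", s.lower())

def word_accuracy(preds, labels):
    correct = sum(normalize(p) == normalize(g) for p, g in zip(preds, labels))
    return correct / max(len(labels), 1)

print(word_accuracy(["hello", "west"], ["hello", "test"]))  # 0.5
```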

baudm commented Oct 8, 2021

@FangShancheng I'm using the pretrained weights for the LM and a small test script to probe its outputs given arbitrary inputs. I'm getting weird results. Below are some examples:

Input: hello tensor([[ 8, 5, 12, 12, 15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]), length: tensor([6])
Output: hello tensor([[ 8, 5, 12, 12, 15, 0, 23, 13, 4, 19, 19, 23, 19, 20, 4, 13, 9, 23, 23, 25, 1, 12, 13, 13, 23, 20]])

Input: hello2 tensor([[ 8, 5, 12, 12, 15, 28, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]), length: tensor([7])
Output: celicw tensor([[ 3, 5, 12, 9, 3, 23, 0, 30, 30, 12, 19, 28, 20, 14, 28, 29, 3, 25, 19, 20, 4, 28, 3, 16, 16, 20]])

Input: hllo tensor([[ 8, 12, 12, 15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]), length: tensor([5])
Output: aaia tensor([[ 1, 1, 9, 1, 0, 23, 19, 1, 1, 2, 19, 23, 23, 1, 9, 13, 23, 23, 23, 1, 1, 9, 23, 23, 23, 1]])

Input: test tensor([[20, 5, 19, 20, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]), length: tensor([5])
Output: west tensor([[23, 5, 19, 20, 0, 19, 19, 9, 1, 19, 19, 19, 7, 2, 5, 5, 5, 19, 19, 9, 14, 5, 5, 19, 19, 19]])

For the first sample, the output is as expected. However, for the next two, the outputs are way off. For the last one, the model erroneously corrected test to west. Is this the expected behavior? I'm getting very low word accuracy when training the LM.
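
For reference, I decode the index tensors above with the mapping inferred from these dumps (0 = end/padding, 1-26 = 'a'-'z', 27-36 = digits '1'-'9' then '0'); the repo's actual charset file may order things differently:

```python
CHARSET = "abcdefghijklmnopqrstuvwxyz1234567890"  # inferred mapping, not taken from the repo

def decode(indices):
    chars = []
    for idx in indices:
        if idx == 0:  # end-of-sequence / padding
            break
        chars.append(CHARSET[idx - 1])
    return "".join(chars)

print(decode([8, 5, 12, 12, 15, 0]))    # -> "hello"
print(decode([3, 5, 12, 9, 3, 23, 0]))  # -> "celicw"
```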

@Jack-Lee-NULL

@baudm I did. I tried different training methods on different datasets (MJ+ST lexicon, Wiki103) and reached similar conclusions:

  1. feeding incorrect words to the language model during training, whose performance exceeds what the paper reports;
  2. feeding correct words to the language model during training, whose performance is similar to what the paper reports.

@FangShancheng I tested the language model weights you published and compared them against my reproduction. I guess training method 2 (as mentioned above) is the one used in the paper. What I don't understand is that training method 1 also looks reasonable.

baudm commented Oct 9, 2021

@Jack-Lee-NULL what's the metric you're using for evaluation? When using the pretrained LM weights, I'm getting unexpected results (see previous comment). Did you try to check the actual individual outputs of the LM? Here's my minimal test script for checking individual inputs.
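
Roughly, the probe does something like the sketch below (this is not the exact script; the encoding format and the forward call are placeholders that need adapting to whatever the repo's language model actually expects):

```python
import torch
import torch.nn.functional as F

CHARSET = "abcdefghijklmnopqrstuvwxyz1234567890"  # inferred mapping, see above
MAX_LEN = 26                     # padded sequence length used in the dumps above
NUM_CLASSES = len(CHARSET) + 1   # +1 for the end/padding token at index 0

def encode(word):
    ids = [CHARSET.index(c) + 1 for c in word] + [0]   # 0 marks end of sequence
    ids = ids + [0] * (MAX_LEN - len(ids))
    return torch.tensor([ids]), torch.tensor([len(word) + 1])

def probe(lm, word):
    ids, length = encode(word)
    probs = F.one_hot(ids, num_classes=NUM_CLASSES).float()  # LM consumes probability vectors
    with torch.no_grad():
        logits = lm(probs, length)  # placeholder call; adapt to the model's actual interface/output
    pred = logits.argmax(dim=-1)[0].tolist()
    chars = []
    for idx in pred:
        if idx == 0:                # stop at the first end/padding token
            break
        chars.append(CHARSET[idx - 1])
    return "".join(chars)
```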

baudm commented Oct 9, 2021

@FangShancheng @Jack-Lee-NULL I just checked Table 4 of the paper, and the word accuracy of BCN is indeed just above 40%. So I guess the results I posted in my previous comments are expected.
