fine-tune with bert models #42

Closed
JaheimLee opened this issue Jan 11, 2021 · 2 comments

@JaheimLee

Have you ever tested AdaBelief for fine-tuning BERT models? And what are the recommended hyperparameters?

@juntang-zhuang (Owner) commented Jan 11, 2021

@JaheimLee Hi, I have not tested fine-tuning BERT; it is too large for me. I tested a small transformer at https://github.com/juntang-zhuang/fairseq-adabelief, and the defaults in adabelief-pytorch==0.2.0 seem to work there. eps=1e-16 helps. I'm not so sure about rectify: sometimes it helps and sometimes it doesn't, so it may need some tuning. For the other hyperparameters, such as lr and betas, the same values as for Adam work.
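
A minimal sketch of those settings, assuming the `AdaBelief` constructor from adabelief-pytorch==0.2.0; the model and learning rate below are placeholders for illustration, not tested recommendations:

```python
import torch
from adabelief_pytorch import AdaBelief

model = torch.nn.Linear(768, 2)  # placeholder stand-in for a real model

optimizer = AdaBelief(
    model.parameters(),
    lr=1e-3,             # same value you would use with Adam
    betas=(0.9, 0.999),  # same betas as Adam
    eps=1e-16,           # the small eps reported to help
    weight_decay=0,
    rectify=True,        # sometimes helps, sometimes not; worth tuning
)
```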
BTW, if you use fp16 to accelerate training, v0.2.0 might be problematic because eps=1e-16 rounds to 0 in fp16. A workaround is to run the forward and backward passes in fp16 but update the parameters in fp32; see #31 (comment). We are considering adding this to the next release.
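
As one illustration of that workaround, here is a sketch using torch.cuda.amp, which keeps the master weights (and thus the optimizer update with eps=1e-16) in fp32 while autocasting the forward pass to fp16. The linked comment may realize the same idea differently, and the model, batch, and loss below are toy placeholders:

```python
import torch
from adabelief_pytorch import AdaBelief

model = torch.nn.Linear(768, 2).cuda()   # toy placeholder model
optimizer = AdaBelief(model.parameters(), lr=1e-3, eps=1e-16)
criterion = torch.nn.CrossEntropyLoss()
inputs = torch.randn(8, 768, device="cuda")          # toy batch
targets = torch.randint(0, 2, (8,), device="cuda")

scaler = torch.cuda.amp.GradScaler()
optimizer.zero_grad()
with torch.cuda.amp.autocast():          # forward runs in fp16/mixed precision
    loss = criterion(model(inputs), targets)
scaler.scale(loss).backward()            # backward on the scaled loss
scaler.step(optimizer)                   # unscales grads, updates fp32 params
scaler.update()
```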

@JaheimLee (Author)

ok, thanks!
