
Why did the Dual gradient collapse on my own Chinese dataset? #10

Open
wangqian97 opened this issue Oct 25, 2022 · 1 comment

Comments

@wangqian97

Dear author, your framework works well on the English dataset, but when I trained with the dual loss on my own Chinese dataset, the gradient collapsed. Each of my Chinese labels is two characters. Is the collapse related to this, or do I need to adjust something? Thank you very much. I look forward to hearing from you soon.

@hiyouga
Owner

hiyouga commented Oct 26, 2022

Hi, in DualCL each label should be tokenized into a single token. I conjecture that if a two-character Chinese label is encoded as two or more tokens, the DualCL loss will behave abnormally. Consider either adding the whole label to the tokenizer's dictionary or altering the label to a single (Chinese) character.
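To illustrate the condition above, here is a minimal, self-contained sketch with a toy greedy tokenizer (the vocabulary and labels are illustrative, not from the actual DualCL code). It shows how a two-character label that is absent from the vocabulary splits into two tokens, and how adding the whole label to the dictionary restores a one-token encoding:

```python
# Toy vocabulary with single-character entries only (illustrative).
vocab = {"好", "坏", "体", "育"}

def tokenize(label: str) -> list[str]:
    """Greedy whole-label match against the toy vocab; otherwise
    fall back to per-character tokens, mimicking how a subword
    tokenizer splits out-of-vocabulary words."""
    if label in vocab:
        return [label]      # whole label is one token: fine for DualCL
    return list(label)      # label spans multiple tokens: problematic

print(tokenize("好"))    # single-character label encodes as 1 token
print(tokenize("体育"))  # two-character label splits into 2 tokens

# Fix 1: add the whole label to the dictionary so it becomes 1 token.
vocab.add("体育")
print(tokenize("体育"))

# Fix 2 (alternative): shorten the label itself to a single character.
print(tokenize("坏"))
```

With a real subword tokenizer (e.g. a Hugging Face one), the analogous check would be whether `tokenizer.tokenize(label)` returns a list of length 1 for every label before training.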
