
Why did the Dual gradient collapse on my own Chinese dataset? #10

Open
wangqian97 opened this issue Oct 25, 2022 · 1 comment

Comments

@wangqian97

Dear author, your framework works well on the English dataset, but when I trained with the dual loss on my own Chinese dataset, the gradient collapsed. Each of my Chinese labels is two characters. Is the collapse related to this, or do I need to adjust something? Thank you very much. I look forward to hearing from you soon.

@hiyouga
Owner

hiyouga commented Oct 26, 2022

Hi, in DualCL each label should be tokenized into a single token. I conjecture that if a two-character Chinese label is encoded as two or more tokens, the DualCL loss will behave abnormally. Consider either adding the whole label to the tokenizer's dictionary or altering the label to a single (Chinese) character.
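To illustrate the condition above, here is a minimal, self-contained sketch with a toy greedy tokenizer (the vocabulary and labels are illustrative, not from the actual DualCL code). It shows how a two-character label that is absent from the vocabulary splits into two tokens, and how adding the whole label to the dictionary restores a one-token encoding:

```python
# Toy vocabulary with single-character entries only (illustrative).
vocab = {"好", "坏", "体", "育"}

def tokenize(label: str) -> list[str]:
    """Greedy whole-label match against the toy vocab; otherwise
    fall back to per-character tokens, mimicking how a subword
    tokenizer splits out-of-vocabulary words."""
    if label in vocab:
        return [label]      # whole label is one token: fine for DualCL
    return list(label)      # label spans multiple tokens: problematic

print(tokenize("好"))    # single-character label encodes as 1 token
print(tokenize("体育"))  # two-character label splits into 2 tokens

# Fix 1: add the whole label to the dictionary so it becomes 1 token.
vocab.add("体育")
print(tokenize("体育"))

# Fix 2 (alternative): shorten the label itself to a single character.
print(tokenize("坏"))
```

With a real subword tokenizer (e.g. a Hugging Face one), the analogous check would be whether `tokenizer.tokenize(label)` returns a list of length 1 for every label before training.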
