Why isn't the k_bias trained? #52

Closed
slchenchn opened this issue Dec 9, 2021 · 2 comments

Comments


slchenchn commented Dec 9, 2021

Hi, I found that the k_bias of every attention layer is set to not require gradients, while q_bias and v_bias do require them (see the code).

Is there a reason for this? In common implementations, all of the biases are learnable.


liyz15 commented Dec 9, 2021

Check this microsoft/unilm#510
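
For readers who don't follow the link: the linked issue is not reproduced in this thread, so the following is an assumption about the usual argument rather than a summary of it. A key bias b_k adds the same term q_i · b_k to every logit in row i of the attention scores, and softmax is unchanged when a constant is added to a whole row. The bias therefore cannot affect the attention weights, so training it would have no effect. The sketch below checks this numerically in plain Python. The names `attention_weights`, `zero_bias` and `random_bias` are illustrative and do not come from the repository, and the usual 1/sqrt(d) scaling is omitted because it does not change the argument.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_weights(queries, keys, k_bias):
    """Row-wise softmax over the logits q . (k + k_bias)."""
    biased_keys = [[k + b for k, b in zip(key, k_bias)] for key in keys]
    return [softmax([dot(q, key) for key in biased_keys]) for q in queries]


random.seed(0)
dim, n_tokens = 4, 3
queries = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
keys = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]

zero_bias = [0.0] * dim
random_bias = [random.gauss(0, 1) for _ in range(dim)]

without_bias = attention_weights(queries, keys, zero_bias)
with_bias = attention_weights(queries, keys, random_bias)

# Largest element-wise difference between the two attention matrices;
# it should be zero up to floating-point rounding.
max_diff = max(
    abs(a - b)
    for row_a, row_b in zip(without_bias, with_bias)
    for a, b in zip(row_a, row_b)
)
print(max_diff)
```

The same cancellation does not apply to q_bias (its contribution b_q · k_j varies across keys within a row) or to v_bias (it is added after the softmax), which would be consistent with those two staying trainable.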

slchenchn (Author) commented

> Check this microsoft/unilm#510

thanks!
