Hi, I found that the `k_bias` of every attention layer is set to not require a gradient, while `q_bias` and `v_bias` do require gradients; see the code links for `k_bias`, `q_bias`, and `v_bias`. Is there a reason behind this? In common implementations, all three biases are learnable.
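For reference, this is the pattern I mean (a minimal sketch of the setup described above, not the repo's exact code; the module and variable names are hypothetical): `q_bias` and `v_bias` are learnable parameters, while the key bias is a fixed zero tensor that never receives gradients.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QKVProjection(nn.Module):
    """Illustrative sketch: learnable q/v biases, fixed zero k bias."""
    def __init__(self, dim: int):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.q_bias = nn.Parameter(torch.zeros(dim))   # learnable
        self.v_bias = nn.Parameter(torch.zeros(dim))   # learnable
        # Key bias registered as a buffer: always zero, requires_grad=False.
        self.register_buffer("k_bias", torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Concatenate the three biases so a single linear call produces q, k, v.
        qkv_bias = torch.cat((self.q_bias, self.k_bias, self.v_bias))
        return F.linear(x, self.qkv.weight, qkv_bias)
```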
Check this: microsoft/unilm#510
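In short (a sketch of the reasoning, assuming standard scaled dot-product attention): adding a bias to the keys shifts every attention logit for a given query by the same per-query constant, and softmax over the keys is invariant to such a shift, so a key bias has no effect on the output and can be fixed at zero. A minimal numerical check, with illustrative shapes:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_q, num_k, dim = 4, 6, 8
q = torch.randn(num_q, dim)   # queries (with their own bias already applied)
k = torch.randn(num_k, dim)   # keys without bias
k_bias = torch.randn(dim)     # a would-be key bias

# Logits with and without the key bias. The bias term adds (q_i . k_bias)
# to every logit in row i, i.e. a constant across keys for each query.
logits_no_bias = q @ k.t()
logits_with_bias = q @ (k + k_bias).t()

# Softmax over the key dimension cancels the per-row constant.
attn_no_bias = F.softmax(logits_no_bias, dim=-1)
attn_with_bias = F.softmax(logits_with_bias, dim=-1)

print(torch.allclose(attn_no_bias, attn_with_bias, atol=1e-5))  # True
```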
thanks!