Hi, I found that the `k_bias` of every attention layer is set to not require a gradient, while `q_bias` and `v_bias` do require gradients; see the code links for `k_bias`, `q_bias`, and `v_bias`. Is there a reason behind this? In common implementations, all three biases are learnable.
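For reference, this is the pattern I mean (a minimal sketch of the setup described above, not the repo's exact code; the module and variable names are hypothetical): `q_bias` and `v_bias` are learnable parameters, while the key bias is a fixed zero tensor that never receives gradients.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QKVProjection(nn.Module):
    """Illustrative sketch: learnable q/v biases, fixed zero k bias."""
    def __init__(self, dim: int):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.q_bias = nn.Parameter(torch.zeros(dim))   # learnable
        self.v_bias = nn.Parameter(torch.zeros(dim))   # learnable
        # Key bias registered as a buffer: always zero, requires_grad=False.
        self.register_buffer("k_bias", torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Concatenate the three biases so a single linear call produces q, k, v.
        qkv_bias = torch.cat((self.q_bias, self.k_bias, self.v_bias))
        return F.linear(x, self.qkv.weight, qkv_bias)
```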
Check this: microsoft/unilm#510
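In short (a sketch of the reasoning, assuming standard scaled dot-product attention): adding a bias to the keys shifts every attention logit for a given query by the same per-query constant, and softmax over the keys is invariant to such a shift, so a key bias has no effect on the output and can be fixed at zero. A minimal numerical check, with illustrative shapes:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_q, num_k, dim = 4, 6, 8
q = torch.randn(num_q, dim)   # queries (with their own bias already applied)
k = torch.randn(num_k, dim)   # keys without bias
k_bias = torch.randn(dim)     # a would-be key bias

# Logits with and without the key bias. The bias term adds (q_i . k_bias)
# to every logit in row i, i.e. a constant across keys for each query.
logits_no_bias = q @ k.t()
logits_with_bias = q @ (k + k_bias).t()

# Softmax over the key dimension cancels the per-row constant.
attn_no_bias = F.softmax(logits_no_bias, dim=-1)
attn_with_bias = F.softmax(logits_with_bias, dim=-1)

print(torch.allclose(attn_no_bias, attn_with_bias, atol=1e-5))  # True
```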
thanks!