Because the same query is dotted with every key, the extra terms contributed by the key bias, (x·W_q)·b_k + b_q·b_k = q·b_k, depend only on the query. For a fixed query they add the same constant to the logit of every key, so they cancel in the softmax and the attention weights are unchanged when the k bias is removed.
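A minimal self-contained sketch of that argument in plain PyTorch (hypothetical shapes, not the repo's code): with or without a key bias, the softmax over keys comes out the same, because the bias only shifts each query's logits by a per-query constant.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n_tokens, dim = 8, 16                           # hypothetical sizes
x = torch.randn(n_tokens, dim)                  # token features
w_q, b_q = torch.randn(dim, dim), torch.randn(dim)
w_k, b_k = torch.randn(dim, dim), torch.randn(dim)
scale = dim ** -0.5

q = x @ w_q + b_q
k_no_bias = x @ w_k                             # k bias dropped, as in the issue
k_with_bias = x @ w_k + b_k                     # "normal" projection with k bias

# For a given query row, q @ b_k is the same for every key, so the
# softmax over the key dimension is identical in both cases.
attn_no_bias = F.softmax(q @ k_no_bias.t() * scale, dim=-1)
attn_with_bias = F.softmax(q @ k_with_bias.t() * scale, dim=-1)
print(torch.allclose(attn_no_bias, attn_with_bias, atol=1e-4))  # True
```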
The k bias is always zero in the code. Is there any reason for this? It is different from the usual implementation.
unilm/beit/modeling_finetune.py, line 124 (commit 421cffe)
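For reference, a hedged paraphrase of the pattern being asked about (not the exact linked code; the class name is made up for illustration): the q and v biases are learnable parameters, while the k slot of the fused qkv bias is a zero tensor that never receives gradients.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QKVWithZeroKBias(nn.Module):
    """Sketch of a fused qkv projection whose k bias is fixed to zero."""
    def __init__(self, dim: int):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.q_bias = nn.Parameter(torch.zeros(dim))   # learnable
        self.v_bias = nn.Parameter(torch.zeros(dim))   # learnable

    def forward(self, x):
        # The k bias is a zero tensor with requires_grad=False,
        # so it stays exactly zero throughout training.
        k_bias = torch.zeros_like(self.v_bias, requires_grad=False)
        qkv_bias = torch.cat((self.q_bias, k_bias, self.v_bias))
        return F.linear(x, self.qkv.weight, qkv_bias)
```

Keeping a frozen zero in the k slot lets a single fused linear layer produce q, k, and v while still omitting the key bias.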
In my tests, the k bias has little effect on finetuning performance, but I have not tested pretraining.