Excellent work! Thanks very much for the repo.
I have a question regarding Equation (5) in the paper. Given that the output of sigmoid() is the attention between the task-specific token and all patch tokens (i.e., As, of size l1 x 1), what does As*Vs mean if Vs is the value of the task-specific token? Why not use the values of the patch tokens instead?
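To make the shapes concrete, here is a minimal sketch of how I currently read the operation. It is not taken from the repo. The function name `patch_task_attention`, the assumption that the scores come from patch-token queries dotted with the task-token key, the scaling by sqrt(d), and the toy numbers are all my own assumptions for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def patch_task_attention(q_patches, k_task, v_task):
    """Hypothetical reading of As*Vs (names and scaling are assumptions).

    q_patches: l1 x d list of patch-token queries
    k_task, v_task: length-d key and value of the single task-specific token
    """
    d = len(k_task)
    # As: one sigmoid score per patch token, i.e. an l1 x 1 column.
    a_s = [
        sigmoid(sum(qi * ki for qi, ki in zip(q, k_task)) / math.sqrt(d))
        for q in q_patches
    ]
    # As*Vs: (l1 x 1) times (1 x d) gives l1 x d, so every row is the
    # same task-token value scaled by that patch's score.
    out = [[a * v for v in v_task] for a in a_s]
    return a_s, out

# Toy example: l1 = 3 patch tokens, d = 2.
q_patches = [[0.0, 0.0], [1.0, 1.0], [2.0, -2.0]]
k_task = [1.0, 1.0]
v_task = [2.0, 4.0]
a_s, out = patch_task_attention(q_patches, k_task, v_task)
```

Under this reading, each output row is a scaled copy of Vs and no patch-token value enters the product, which is what prompts my question.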