
Question about attention's qkv matrix #237

Open
JearSYY opened this issue Sep 21, 2022 · 2 comments

Comments


JearSYY commented Sep 21, 2022

Hello, thanks for this great repo!

I am confused about a detail in vit.py. In the attention section, when computing the q, k, and v matrices, you project x from (b, n, d) to (b, n, 3d) and then split the result into 3 parts using chunk(), like:

qkv = self.to_qkv(x).chunk(3, dim = -1)

However, I think that this way each of the q, k, and v matrices only contains part of the information of the original matrix x, which is not what the transformer paper describes. In the original paper, q, k, and v each contain all the information of the input matrix, and dot products are then performed to compute the attention. Please check xD.

PS: I am a beginner in this topic, so if I have misunderstood anything, please point it out, and sorry for any possible inconvenience.


YGwhere commented Nov 9, 2023

You may want to take a look at the linear layer behind self.to_qkv(x): its output dimension is set to inner_dim * 3.
The author merged the weights of the q, k, and v projections into a single linear layer.
So chunk(3, dim=-1) splits the fused output back into a tuple of three tensors, each computed from the whole input.
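
Here is a minimal sketch (the sizes are made up for illustration; in vit.py, inner_dim is derived from heads * dim_head) showing that chunk() splits the output features of the projection, not x itself:

```python
import torch
import torch.nn as nn

# Made-up sizes for illustration; in vit.py, inner_dim = heads * dim_head.
batch, seq_len, dim, inner_dim = 2, 16, 64, 64

to_qkv = nn.Linear(dim, inner_dim * 3, bias=False)  # one fused projection

x = torch.randn(batch, seq_len, dim)    # (b, n, d)
q, k, v = to_qkv(x).chunk(3, dim=-1)    # each is (b, n, inner_dim)

# chunk() splits the *output* features of the projection, not x,
# so each of q, k, v is computed from all of x.
print(q.shape, k.shape, v.shape)
```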

PS: I am also a beginner, so I don’t know if this is the right way to understand it. 🤓

@chengengliu

It is just a shortcut to avoid repeating code like self.to_q = nn.Linear(dim, inner_dim, bias = bias) three times (for q, k, and v you need three linear mappings). Of course, there are other implementations written in that style, but the idea of building Q, K, and V is the same.
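
A quick sanity check (a sketch with made-up sizes, not the repo's actual code) showing that the fused layer is mathematically identical to three separate linear layers once the weights are lined up:

```python
import torch
import torch.nn as nn

dim, inner_dim = 64, 64                            # made-up sizes
fused = nn.Linear(dim, inner_dim * 3, bias=False)  # fused style, as in vit.py

# The "unfused" style: three separate projections.
to_q = nn.Linear(dim, inner_dim, bias=False)
to_k = nn.Linear(dim, inner_dim, bias=False)
to_v = nn.Linear(dim, inner_dim, bias=False)

# nn.Linear computes x @ weight.T, so rows of the fused weight matrix map to
# output features: the first inner_dim rows are q, the next are k, then v.
with torch.no_grad():
    to_q.weight.copy_(fused.weight[:inner_dim])
    to_k.weight.copy_(fused.weight[inner_dim:2 * inner_dim])
    to_v.weight.copy_(fused.weight[2 * inner_dim:])

x = torch.randn(2, 16, dim)
q, k, v = fused(x).chunk(3, dim=-1)
assert torch.allclose(q, to_q(x), atol=1e-6)
assert torch.allclose(k, to_k(x), atol=1e-6)
assert torch.allclose(v, to_v(x), atol=1e-6)
```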
