Tutorial 6: error in the MultiheadAttention.forward method #51

Closed

MTDzi opened this issue Aug 21, 2022 · 2 comments

Labels: bug (Something isn't working)

MTDzi commented Aug 21, 2022

Tutorial: 6

Describe the bug
In the MultiheadAttention.forward method, the line:

        values = values.reshape(batch_size, seq_length, embed_dim)

should read:

        values = values.reshape(batch_size, seq_length, self.embed_dim)

The embed_dim should not be unpacked from the input tensor's shape, whose last dimension is input_dim rather than embed_dim. That is, instead of:

        batch_size, seq_length, embed_dim = x.size()

we should probably have something like:

        batch_size, seq_length, _ = x.size()

or

        batch_size, seq_length, input_dim = x.size()
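
For reference, here is a minimal, self-contained sketch of how the corrected class could look. Only the forward signature, the o_proj call, and the permute/reshape lines are visible in the traceback below; the scaled_dot_product helper and the qkv_proj layer name are my assumptions about the rest of the tutorial's code:

    import math
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def scaled_dot_product(q, k, v, mask=None):
        # Standard scaled dot-product attention over the last two dims.
        d_k = q.size(-1)
        attn_logits = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(d_k)
        if mask is not None:
            attn_logits = attn_logits.masked_fill(mask == 0, -9e15)
        attention = F.softmax(attn_logits, dim=-1)
        return torch.matmul(attention, v), attention

    class MultiheadAttention(nn.Module):
        def __init__(self, input_dim, embed_dim, num_heads):
            super().__init__()
            assert embed_dim % num_heads == 0, "embed_dim must be divisible by num_heads"
            self.embed_dim = embed_dim
            self.num_heads = num_heads
            self.head_dim = embed_dim // num_heads
            # Project input_dim -> q, k, v stacked along the last dimension.
            self.qkv_proj = nn.Linear(input_dim, 3 * embed_dim)
            self.o_proj = nn.Linear(embed_dim, embed_dim)

        def forward(self, x, mask=None, return_attention=False):
            batch_size, seq_length, _ = x.size()  # last dim is input_dim, NOT embed_dim
            qkv = self.qkv_proj(x)
            qkv = qkv.reshape(batch_size, seq_length, self.num_heads, 3 * self.head_dim)
            qkv = qkv.permute(0, 2, 1, 3)  # [Batch, Head, SeqLen, Dims]
            q, k, v = qkv.chunk(3, dim=-1)
            values, attention = scaled_dot_product(q, k, v, mask=mask)
            values = values.permute(0, 2, 1, 3)  # [Batch, SeqLen, Head, Dims]
            values = values.reshape(batch_size, seq_length, self.embed_dim)  # the fix: self.embed_dim
            o = self.o_proj(values)
            return (o, attention) if return_attention else o

With this version, the reproduction snippet below runs and returns torch.Size([3, 11, 323]), as expected.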

To Reproduce (if any steps necessary)
Steps to reproduce the behavior:

  1. Go to the In [5]: cell, the one containing class MultiheadAttention(nn.Module):
  2. Run it
  3. Insert a cell under it
  4. Run the following:
batch_size = 3
seq_len = 11
input_dim = 13
num_heads = 19
embed_dim = 17 * num_heads

mha = MultiheadAttention(input_dim, embed_dim, num_heads)

input_tensor = torch.rand((batch_size, seq_len, input_dim))
values = mha(input_tensor)

values.shape

which yields the following error:

RuntimeError                              Traceback (most recent call last)
<ipython-input-50-38c850c37259> in <module>
      8 
      9 input_tensor = torch.rand((batch_size, seq_len, input_dim))
---> 10 values = mha(input_tensor)
     11 
     12 values.shape

1 frames
<ipython-input-49-45be71448f04> in forward(self, x, mask, return_attention)
     36         values = values.permute(0, 2, 1, 3) # [Batch, SeqLen, Head, Dims]
     37         # values = values.reshape(batch_size, seq_length, embed_dim)
---> 38         values = values.reshape(batch_size, seq_length, embed_dim)
     39         o = self.o_proj(values)
     40 

RuntimeError: shape '[3, 11, 13]' is invalid for input of size 10659
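
The number in the error message confirms the diagnosis: the tensor being reshaped holds batch_size * seq_length * embed_dim = 3 * 11 * (17 * 19) = 3 * 11 * 323 = 10659 elements, whereas the requested shape [3, 11, 13] (using input_dim = 13 taken from x.size()) would only account for 429.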

Expected behavior
After making the suggested change, the output is:

torch.Size([3, 11, 323])

which is what I was expecting to get.

Runtime environment (please complete the following information):
Google Colab, both CPU and GPU.

MTDzi added the bug label Aug 21, 2022
phlippe (Owner) commented Aug 22, 2022

Hi @MTDzi, thanks a lot for pointing this bug out! It should be fixed in the newest version. Let me know if you find any other bugs :)

phlippe closed this as completed Aug 22, 2022
MTDzi commented

Awesome, thanks!

Great tutorial, BTW, I should have started with that ;)

I'll let you know if I find anything else, sure thing.
