Tutorial 6: error in the MultiheadAttention.forward method #51

Closed

MTDzi opened this issue Aug 21, 2022 · 2 comments

Labels: bug (Something isn't working)

MTDzi commented Aug 21, 2022

Tutorial: 6

Describe the bug
In the MultiheadAttention.forward method, the line:

        values = values.reshape(batch_size, seq_length, embed_dim)

should read:

        values = values.reshape(batch_size, seq_length, self.embed_dim)

The embed_dim should not be unpacked from the input tensor's shape, whose last dimension is input_dim rather than embed_dim. That is, instead of:

        batch_size, seq_length, embed_dim = x.size()

we should probably have something like:

        batch_size, seq_length, _ = x.size()

or

        batch_size, seq_length, input_dim = x.size()
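
For reference, here is a minimal, self-contained sketch of how the corrected class could look. Only the forward signature, the o_proj call, and the permute/reshape lines are visible in the traceback below; the scaled_dot_product helper and the qkv_proj layer name are my assumptions about the rest of the tutorial's code:

    import math
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def scaled_dot_product(q, k, v, mask=None):
        # Standard scaled dot-product attention over the last two dims.
        d_k = q.size(-1)
        attn_logits = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(d_k)
        if mask is not None:
            attn_logits = attn_logits.masked_fill(mask == 0, -9e15)
        attention = F.softmax(attn_logits, dim=-1)
        return torch.matmul(attention, v), attention

    class MultiheadAttention(nn.Module):
        def __init__(self, input_dim, embed_dim, num_heads):
            super().__init__()
            assert embed_dim % num_heads == 0, "embed_dim must be divisible by num_heads"
            self.embed_dim = embed_dim
            self.num_heads = num_heads
            self.head_dim = embed_dim // num_heads
            # Project input_dim -> q, k, v stacked along the last dimension.
            self.qkv_proj = nn.Linear(input_dim, 3 * embed_dim)
            self.o_proj = nn.Linear(embed_dim, embed_dim)

        def forward(self, x, mask=None, return_attention=False):
            batch_size, seq_length, _ = x.size()  # last dim is input_dim, NOT embed_dim
            qkv = self.qkv_proj(x)
            qkv = qkv.reshape(batch_size, seq_length, self.num_heads, 3 * self.head_dim)
            qkv = qkv.permute(0, 2, 1, 3)  # [Batch, Head, SeqLen, Dims]
            q, k, v = qkv.chunk(3, dim=-1)
            values, attention = scaled_dot_product(q, k, v, mask=mask)
            values = values.permute(0, 2, 1, 3)  # [Batch, SeqLen, Head, Dims]
            values = values.reshape(batch_size, seq_length, self.embed_dim)  # the fix: self.embed_dim
            o = self.o_proj(values)
            return (o, attention) if return_attention else o

With this version, the reproduction snippet below runs and returns torch.Size([3, 11, 323]), as expected.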

To Reproduce (if any steps necessary)
Steps to reproduce the behavior:

  1. Go to the In [5]: cell, the one containing class MultiheadAttention(nn.Module):
  2. Run it
  3. Insert a cell under it
  4. Run the following:
batch_size = 3
seq_len = 11
input_dim = 13
num_heads = 19
embed_dim = 17 * num_heads

mha = MultiheadAttention(input_dim, embed_dim, num_heads)

input_tensor = torch.rand((batch_size, seq_len, input_dim))
values = mha(input_tensor)

values.shape

which yields the following error:

RuntimeError                              Traceback (most recent call last)
<ipython-input-50-38c850c37259> in <module>
      8 
      9 input_tensor = torch.rand((batch_size, seq_len, input_dim))
---> 10 values = mha(input_tensor)
     11 
     12 values.shape

1 frames
<ipython-input-49-45be71448f04> in forward(self, x, mask, return_attention)
     36         values = values.permute(0, 2, 1, 3) # [Batch, SeqLen, Head, Dims]
     37         # values = values.reshape(batch_size, seq_length, embed_dim)
---> 38         values = values.reshape(batch_size, seq_length, embed_dim)
     39         o = self.o_proj(values)
     40 

RuntimeError: shape '[3, 11, 13]' is invalid for input of size 10659
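
The number in the error message confirms the diagnosis: the tensor being reshaped holds batch_size * seq_length * embed_dim = 3 * 11 * (17 * 19) = 3 * 11 * 323 = 10659 elements, whereas the requested shape [3, 11, 13] (using input_dim = 13 taken from x.size()) would only account for 429.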

Expected behavior
After making the suggested change, the output is:

torch.Size([3, 11, 323])

which is what I was expecting to get.

Runtime environment (please complete the following information):
Google Colab, both CPU and GPU.

MTDzi added the bug label Aug 21, 2022
phlippe (Owner) commented Aug 22, 2022

Hi @MTDzi, thanks a lot for pointing this bug out! It should be fixed in the newest version. Let me know if you find any other bugs :)

phlippe closed this as completed Aug 22, 2022
MTDzi commented

Awesome, thanks!

Great tutorial, BTW, I should have started with that ;)

I'll let you know if I find anything else, sure thing.
