MultiheadAttention.forward
Tutorial: 6
Describe the bug
In the `MultiheadAttention.forward` method, the line:

```python
values = values.reshape(batch_size, seq_length, embed_dim)
```

should read:

```python
values = values.reshape(batch_size, seq_length, self.embed_dim)
```
The `embed_dim` should not come from the input tensor, i.e. instead of:

```python
batch_size, seq_length, embed_dim = x.size()
```

we should probably have something like:

```python
batch_size, seq_length, _ = x.size()
```

or

```python
batch_size, seq_length, input_dim = x.size()
```
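For reference, here is a minimal self-contained sketch of the module with the fix applied. The layer names other than `o_proj` (which appears in the traceback) and the exact attention details are assumptions, not the tutorial's verbatim code:

```python
import torch
import torch.nn as nn

class MultiheadAttention(nn.Module):
    def __init__(self, input_dim, embed_dim, num_heads):
        super().__init__()
        assert embed_dim % num_heads == 0
        self.embed_dim = embed_dim
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        # qkv_proj is an assumed name; one linear layer producing Q, K, V stacked
        self.qkv_proj = nn.Linear(input_dim, 3 * embed_dim)
        self.o_proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, x):
        # The last dim of x is input_dim, so we deliberately discard it here
        batch_size, seq_length, _ = x.size()
        qkv = self.qkv_proj(x)
        qkv = qkv.reshape(batch_size, seq_length, self.num_heads, 3 * self.head_dim)
        qkv = qkv.permute(0, 2, 1, 3)  # [Batch, Head, SeqLen, 3*Dims]
        q, k, v = qkv.chunk(3, dim=-1)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.head_dim**0.5, dim=-1)
        values = attn @ v                      # [Batch, Head, SeqLen, Dims]
        values = values.permute(0, 2, 1, 3)    # [Batch, SeqLen, Head, Dims]
        # The fix: use self.embed_dim, not the unpacked input dim
        values = values.reshape(batch_size, seq_length, self.embed_dim)
        return self.o_proj(values)
```

With `input_dim != embed_dim` (as in the repro below), this version runs and returns a `[batch_size, seq_length, embed_dim]` tensor.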
To Reproduce (if any steps necessary)
Steps to reproduce the behavior:
Run the `In [5]:` cell, the one containing `class MultiheadAttention(nn.Module):`, followed by:

```python
batch_size = 3
seq_len = 11
input_dim = 13
num_heads = 19
embed_dim = 17 * num_heads

mha = MultiheadAttention(input_dim, embed_dim, num_heads)
input_tensor = torch.rand((batch_size, seq_len, input_dim))
values = mha(input_tensor)

values.shape
```
which yields the following error:

```
RuntimeError                              Traceback (most recent call last)
<ipython-input-50-38c850c37259> in <module>
      8 
      9 input_tensor = torch.rand((batch_size, seq_len, input_dim))
---> 10 values = mha(input_tensor)
     11 
     12 values.shape

1 frames
<ipython-input-49-45be71448f04> in forward(self, x, mask, return_attention)
     36         values = values.permute(0, 2, 1, 3)  # [Batch, SeqLen, Head, Dims]
     37         # values = values.reshape(batch_size, seq_length, embed_dim)
---> 38         values = values.reshape(batch_size, seq_length, embed_dim)
     39         o = self.o_proj(values)
     40 

RuntimeError: shape '[3, 11, 13]' is invalid for input of size 10659
```
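The element counts in the error message check out: `values` holds `batch_size * seq_len * embed_dim` elements, while the buggy reshape target uses the input dim instead of the embedding dim. A quick sanity check with the repro's numbers:

```python
# Numbers from the repro above
batch_size, seq_len, input_dim = 3, 11, 13
num_heads = 19
embed_dim = 17 * num_heads  # 323

# Actual element count of `values` (the "input of size 10659" in the error)
assert batch_size * seq_len * embed_dim == 10659

# What the invalid target shape [3, 11, 13] would require instead
assert batch_size * seq_len * input_dim == 429
```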
Expected behavior
After making the suggested change, the output is:

```
torch.Size([3, 11, 323])
```

which is what I was expecting to get.
Runtime environment (please complete the following information): Google Colab, both CPU and GPU.
Hi @MTDzi, thanks a lot for pointing this bug out! It should be fixed in the newest version. Let me know if you find any other bugs :)
Awesome, thanks!
Great tutorial, BTW, I should have started with that ;)
I'll let you know if I find anything else, sure thing.