Linear3D #2508
Conversation
I added a couple of comments.
@lozhnikov Memory errors fixed! (I don't know how :D)
Looks good to me, but I have a question about the ordering.
If I remember right, we changed the ordering in
Right. I will update the description :)
force-pushed from e560028 to 196b14b
Looks good to me. I added a few style suggestions.
force-pushed from 5196160 to e511f5b
Second approval provided automatically after 24 hours. 👍
@mlpack-jenkins test this please
I checked the Linear3D tests and the Lookup tests with valgrind. It didn't show any memory errors. So, I assume these errors are in other tests/methods. If no one objects, I'll merge the PR tomorrow.
I merged the PR. Thanks for the contribution!
The current Linear layer works well for 2D input: if each data point has `n` features, the input has shape `(n, batchSize)` and the weight of the 2D Linear layer has shape `(inSize, outSize)`.
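For reference, the 2D case can be sketched in plain Python (this is only an illustration with assumed toy sizes, not mlpack code; the weight shape `(inSize, outSize)` follows the description above, so the layer computes `y = Wᵀx` columnwise):

```python
# Toy sizes (assumed, not from the PR).
inSize, outSize, batchSize = 4, 3, 2

W = [[0.1] * outSize for _ in range(inSize)]    # weight: (inSize, outSize)
x = [[1.0] * batchSize for _ in range(inSize)]  # input:  (inSize, batchSize)

# y = W^T x, applied to every column (data point) of the batch,
# giving an output of shape (outSize, batchSize).
y = [[sum(W[i][j] * x[i][b] for i in range(inSize))
      for b in range(batchSize)]
     for j in range(outSize)]

print(len(y), len(y[0]))  # 3 2
```

The point is that the parameter count, `inSize * outSize`, depends only on the number of features per data point, not on the batch size.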
But sometimes the input is 3D: each batch contains multiple data points, and each data point has `n` features. This is the case with word embeddings, where each batch contains multiple sequences/sentences and each sequence contains an embedding vector for each word. The shape of such input is `(sequenceLength, embeddingSize, batchSize)`, and the number of features is `embeddingSize`. To use the existing Linear layer, we would need to vectorize each slice so that the shape becomes `(sequenceLength * embeddingSize, batchSize)`, which treats the number of features as `sequenceLength * embeddingSize`; that is not true. Even if we did that, the number of parameters (the weight would have shape `(sequenceLength * embeddingSize, outSize)`) would be much higher than it should be. In this pull request, I have tried to solve this issue. Let me know your thoughts.
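The shape argument above can be made concrete with a small sketch. Assuming the fix works by sharing one `(embeddingSize, outSize)` weight across the sequence dimension (an assumption about the PR's approach, not a quote of its code), the per-slice transform and the parameter-count comparison look like this:

```python
# Illustration only, not mlpack's implementation. Toy sizes are assumed.
seqLen, embedSize, outSize, batchSize = 5, 8, 3, 2

# One shared weight of shape (embedSize, outSize):
# embedSize * outSize parameters, independent of seqLen.
W = [[0.1] * outSize for _ in range(embedSize)]

# Input: batchSize slices, each holding seqLen embedding vectors.
x = [[[1.0] * embedSize for _ in range(seqLen)] for _ in range(batchSize)]

# Apply the same weight to every embedding vector of every slice.
y = [[[sum(x[b][t][i] * W[i][j] for i in range(embedSize))
       for j in range(outSize)]
      for t in range(seqLen)]
     for b in range(batchSize)]

print(len(y), len(y[0]), len(y[0][0]))  # 2 5 3  (batchSize, seqLen, outSize)
print(embedSize * outSize)              # 24 shared parameters
print(seqLen * embedSize * outSize)     # 120 if the input were flattened
</antml24>

So the shared-weight view keeps the parameter count at `embeddingSize * outSize`, while the flattened 2D workaround would inflate it by a factor of `sequenceLength`.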
(I asked about this on IRC, but when I search for those messages now, I can't find them. Looks like my messages sometimes go to the Bermuda Triangle :D)