
Linear3D #2508

Merged: 5 commits into mlpack:master on Aug 22, 2020

Conversation

mrityunjay-tripathi (Member) commented on Jul 10, 2020:

The current Linear layer works well for 2D input, i.e. when each data point has 'n' features; the input then has shape (n, batchSize). The weight of the 2D Linear layer has shape (inSize, outSize).

But we can also have 3D input, where every batch item contains multiple data points and each data point has 'n' features. This is the case for word embeddings, for example: each batch contains multiple sequences/sentences, and each sequence contains an embedding vector for every word. Effectively, the shape of such input is (sequenceLength, embeddingSize, batchSize), and the number of features is embeddingSize. To use the existing Linear layer, we have to vectorize each slice so that the shape becomes (sequenceLength * embeddingSize, batchSize), which treats the number of features as sequenceLength * embeddingSize; that is not true. Even if we did that, the number of parameters (the weight would have shape (sequenceLength * embeddingSize, outSize)) would be much higher than it should be.

In this pull request, I have tried to solve this issue. Let me know your thoughts.
(I asked about this on IRC, but when I search for those messages now I can't find them. Looks like sometimes my messages go to the Bermuda Triangle :D)
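
To make the shape handling concrete, here is a minimal, self-contained sketch of the forward pass such a layer performs, written with plain Armadillo. This is not mlpack's actual Linear3D implementation; the names (inSize, outSize, seqLen, batchSize) are illustrative, and the features-first (embeddingSize x sequenceLength x batchSize) layout follows the ordering noted later in this thread. The point is that one (outSize x inSize) weight is shared across the whole sequence dimension, so the parameter count does not grow with sequenceLength.

```cpp
// Minimal sketch of a "Linear3D"-style forward pass on an
// (inSize x seqLen x batchSize) input cube. This is NOT mlpack's actual
// implementation; it only illustrates the weight sharing described above.
#include <armadillo>

int main()
{
  const size_t inSize = 4, outSize = 3, seqLen = 5, batchSize = 2;

  arma::cube input(inSize, seqLen, batchSize, arma::fill::randn);
  arma::mat weight(outSize, inSize, arma::fill::randn);  // shared weight
  arma::vec bias(outSize, arma::fill::randn);            // shared bias

  arma::cube output(outSize, seqLen, batchSize);
  for (size_t b = 0; b < batchSize; ++b)
  {
    // The same weight and bias are applied to every column (word) of every
    // sequence, so the parameter count is outSize * inSize + outSize,
    // independent of seqLen.
    output.slice(b) = weight * input.slice(b);
    output.slice(b).each_col() += bias;
  }

  output.print("output");
  return 0;
}
```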

lozhnikov (Contributor) left a comment:


I added a couple of comments.

Review threads on src/mlpack/methods/ann/layer/linear3d.hpp and src/mlpack/methods/ann/layer/linear3d_impl.hpp (resolved).
mrityunjay-tripathi (Member, Author) commented:

@lozhnikov Memory errors fixed! (I don't know how :D)

lozhnikov (Contributor) left a comment:


Looks good to me, but I have a question about the ordering.

Review threads on src/mlpack/methods/ann/layer/linear3d_impl.hpp and src/mlpack/methods/ann/layer/linear3d.hpp (resolved).
lozhnikov (Contributor) commented:

> So, effectively the shape of such input will be (embeddingSize, sequenceLength, batchSize)

If I remember right, we changed the ordering in MultiHeadAttention, didn't we?

mrityunjay-tripathi (Member, Author) commented:

> So, effectively the shape of such input will be (embeddingSize, sequenceLength, batchSize)
>
> If I remember right, we changed the ordering in MultiHeadAttention, didn't we?

Right. I will update the description :)

lozhnikov (Contributor) left a comment:


Looks good to me. I added a few style suggestions.

Review threads on src/mlpack/methods/ann/layer/linear3d_impl.hpp and src/mlpack/methods/ann/layer/lookup_impl.hpp (resolved).
mlpack-bot (bot) left a comment:


Second approval provided automatically after 24 hours. 👍

lozhnikov (Contributor) commented:

@mlpack-jenkins test this please

lozhnikov (Contributor) commented:

I checked the Linear3D tests and the Lookup tests with valgrind; it didn't show any memory errors, so I assume those errors come from other tests/methods. If no one objects, I'll merge the PR tomorrow.

lozhnikov merged commit e6d047b into mlpack:master on Aug 22, 2020.
lozhnikov (Contributor) commented:

I merged the PR. Thanks for the contribution!
