Adapt MultiheadAttention and LayerNorm to new Layer interface #3547
Conversation
(cherry picked from commit 0728f3b)
Changed it to use MakeAlias. This is how the same thing is accomplished in rnn_impl. Not sure it's really "better", but at least that constructor is in the public API.
Realized I was missing some commits for multihead_attention
(cherry picked from commit 7259b71)
So I feel I have to comment here on what these changes are about. In working with the MultiheadAttention class and some others, I discovered that multi_layer_impl was not passing the correct values to the Backward method (unless I am misunderstanding this parameter). I believe the backward call should be passed the INPUTs to the layer, the DELTA (gy) of the output, and the gradient out parameter (g). multi_layer_impl was in fact passing the OUTPUT from the layer. Most layer implementations actually just ignore this parameter, so it didn't matter. However, SoftmaxLayerType and LogSoftmaxLayerType both used it, and assumed that it was in fact the output (since that is what multi_layer was passing). Thus when I initially made this change, those test cases were failing. I believe instead the right thing to do is to make those two layers save their own outputs since they need to use them (or I guess alternately re-compute them). This is what I have done in this checkout. I also cleaned up a couple of other random places that used the input param but didn't need to. All the tests pass now. LMK if you agree with my understanding, or if I should re-do this all.
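The softmax case described above can be illustrated with a small self-contained sketch (this is illustrative code, not mlpack's actual implementation; the class and method names are stand-ins): a layer whose gradient depends on its own forward output caches that output itself, so Backward() can correctly receive the layer INPUT per the interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative softmax "layer" that caches its own forward output, so that
// Backward() can receive the layer INPUT (per the Layer interface) and still
// compute the gradient, which depends only on the output y:
//   g_i = y_i * (gy_i - sum_j y_j * gy_j)
class SoftmaxSketch
{
 public:
  void Forward(const std::vector<double>& input, std::vector<double>& output)
  {
    // Subtract the max for numerical stability, then normalize.
    double maxVal = input[0];
    for (double v : input) maxVal = std::max(maxVal, v);
    double sum = 0.0;
    output.resize(input.size());
    for (std::size_t i = 0; i < input.size(); ++i)
    {
      output[i] = std::exp(input[i] - maxVal);
      sum += output[i];
    }
    for (double& v : output) v /= sum;
    cachedOutput = output; // save our own copy for Backward()
  }

  // `input` is intentionally unused: the gradient needs only the cached
  // output, so the caller no longer has to (incorrectly) pass the output.
  void Backward(const std::vector<double>& /* input */,
                const std::vector<double>& gy,
                std::vector<double>& g)
  {
    const std::vector<double>& y = cachedOutput;
    double dot = 0.0;
    for (std::size_t i = 0; i < y.size(); ++i) dot += y[i] * gy[i];
    g.resize(y.size());
    for (std::size_t i = 0; i < y.size(); ++i) g[i] = y[i] * (gy[i] - dot);
  }

 private:
  std::vector<double> cachedOutput;
};
```

With this shape, a container can uniformly pass each layer its forward-pass input during the backward pass, and layers that need their outputs are self-sufficient.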
comment out unused parameter "input"
Thanks for the deep dive into the |
Sure. Now that it seems that more needs to be done w.r.t. #3551, I might separate out the refactoring of the input stuff from the MultiheadAttention and LayerNorm moves. I had done the changes at the same time, but they should be separable.
Actually, I think it has something to do with aliasing. I installed 9.8 (what we are using on Linux) on my mac, and reproduced the same problem. The MultiheadAttention class was using the constructor which takes a memory pointer directly, rather than the MakeAlias call. Not sure exactly what the failure mode was (the MakeAlias call calls the destructor on the alias first, and then placement new), but the switch seems to have fixed it for 9.8. Apparently something is different about newer versions where this isn't necessary.
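The destructor-then-placement-new idiom mentioned above looks roughly like this (a generic sketch using an illustrative toy matrix type, not mlpack's actual MakeAlias or Armadillo's classes): instead of constructing a fresh aliasing object, an already-constructed member is reseated in place over new memory.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Toy stand-in for a matrix type that can wrap external memory without
// copying it (analogous to Armadillo's advanced mat(ptr, rows, cols) ctor).
struct ToyMat
{
  double* mem;
  std::size_t rows, cols;

  ToyMat() : mem(nullptr), rows(0), cols(0) {}
  ToyMat(double* ptr, std::size_t r, std::size_t c)
      : mem(ptr), rows(r), cols(c) {}
};

// Sketch of the MakeAlias idiom: explicitly destroy the existing object in
// place, then placement-new an alias over the given memory. This reseats an
// already-constructed member to point at new memory without assignment,
// which is what avoids the aliasing problem described above.
template<typename MatType>
void MakeAliasSketch(MatType& m, double* newMem, std::size_t r, std::size_t c)
{
  m.~MatType();               // run the destructor on the old object
  new (&m) MatType(newMem, r, c); // placement-new the alias in the same storage
}
```

The point of going through MakeAlias rather than the raw pointer constructor is that the reseat happens in one well-defined place, so the alias never transiently owns or copies the memory it wraps.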
…n.hpp Co-authored-by: Ryan Curtin <ryan@ratml.org>
Was going to update the HISTORY.md. I'll add in a reference to this PR and the Issue (#3551) about the input parameter. Do you want me to add more line items for the other issues we kind of rolled into here? (e.g. activation functions, bug fixes)
Added to the files where I made meaningful logic changes.
…to adapt_multihead
It's possible the problem you've encountered is related to interactions between move constructors and matrices that use auxiliary memory. This issue was fixed in Armadillo 11.4. Either way, Armadillo 9.800 is ancient history at this point. I believe the only reason that mlpack has 9.800 as the minimum version is that this is the default version in Ubuntu 20.04 LTS. I recommend updating the minimum version of Armadillo to 10.8, which is the default version in Ubuntu 22.04 LTS. The vast majority of folks that use Ubuntu LTS would have moved to 22.04 LTS at this point (especially desktop users). Armadillo 10.8 is faster and has considerably more functionality.
Definitely seems like that was the issue. Fixed here anyway, but agreed that it might be good to up the min version at some point.
Yeah, now I am remembering, I think that is a part of why
I get the argument (you may have written it down more than one time at this point 😄) and I'm not necessarily in disagreement, but it does take a little effort to update the minimum version (build systems need to be updated, build scripts need to be updated, other little things here and there). I'll see if I can find some time to look into that in the future, but at least for the next couple weeks I'm a bit tapped out. I do still think it's important to support versions of dependencies available in commonly-used distributions or environments, to reduce the friction of installation and usage. Fortunately, the situation for mlpack, its dependencies, and C++ in general, is that it's a bit easier now to work around old system versions and install by hand (especially since "installing by hand" for mlpack and all its direct dependencies is just "put headers in place").
Awesome, I am super happy to have these layers available again. Thank you so much for the extensive effort and bouncing back and forth on matters as trivial as spacing, and as complex as the signature that the activation functions require. This is such a nice improvement, and I can't think of anything else that needs to be done before merge. (As per usual I left a few more trivial comments :))
Any idea what's up with the static code checks? That failure is new.
Second approval provided automatically after 24 hours. 👍
Sorry, I think that's an error in the static code job I introduced accidentally elsewhere. I think I fixed it...
@mlpack-jenkins test this please
Did the trick.
Probably a good time to update the minimum Armadillo version is when Ubuntu 24.04 LTS comes out. This is scheduled for late April 2024, which is about 5 months from now. At that stage we can pretty much assume that the vast majority of Ubuntu users would be at least on the previous 22.04 LTS release, which contains Armadillo 10.8. Debian 12 (current release) is at Armadillo 11.4. Fedora and RHEL EPEL are currently at Armadillo 10.8. Other distros like Arch and OpenSuse are on more recent releases.
Updated the MultiheadAttention and LayerNorm classes to conform to the new ANN Layer interface, and moved them from the layer/not_adapted directory to the layer directory. Both have been tested by external code, but don't have test cases in the repo.
Also, fixed the logic in MultiLayer::Backward and FFN::EvaluateWithGradient to pass the appropriate values to the input parameter of the Backward call on the contained layers. Most implementations of Backward didn't actually use this parameter, so the incorrect behavior probably went unnoticed; however, the LayerNorm class does reference this parameter.
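The corrected data flow can be sketched schematically (this is an illustrative toy, not mlpack's actual MultiLayer code; the layer type and function names are made up): during the backward pass, layer i's Backward() receives the input that layer saw in the forward pass, along with the delta of its output, and produces the delta of its input.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal "layer": y = a * x, so dL/dx = a * dL/dy. Backward() takes the
// layer's forward-pass INPUT per the interface, even though this layer,
// like many, does not actually need it.
struct ScaleLayer
{
  double a;
  double Forward(double x) const { return a * x; }
  double Backward(double /* input */, double gy) const { return a * gy; }
};

// Schematic multi-layer backward pass. The key point of the fix described
// above: layer i's Backward() is passed layerInputs[i] (the value that went
// INTO the layer during Forward()), not the layer's output.
double MultiLayerBackward(const std::vector<ScaleLayer>& layers,
                          const std::vector<double>& layerInputs,
                          double gy)
{
  double delta = gy;
  for (std::size_t i = layers.size(); i-- > 0; )
    delta = layers[i].Backward(layerInputs[i], delta);
  return delta;
}
```

A container following this pattern must record each layer's input during the forward pass (mlpack's MultiLayer already stores per-layer buffers for this purpose), which is what makes passing the correct values cheap.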