
Adapt MultiheadAttention and LayerNorm to new Layer interface #3547

Merged: 160 commits into mlpack:master from akropp:adapt_multihead, Nov 22, 2023

Conversation

akropp (Contributor) commented Oct 18, 2023

Updated the MultiheadAttention and LayerNorm classes to conform to the new ANN Layer interface, and moved them from the layer/not_adapted directory to the layer directory. Both have been tested by external code, but don't have test cases in the repo.
Also fixed the logic in MultiLayer::Backward and FFN::EvaluateWithGradient to pass the appropriate values to the input parameter of the Backward call on the contained layers. Most implementations of Backward didn't actually use this parameter, so the incorrect behavior was probably not noticed; however, the LayerNorm class does reference this parameter.

(cherry picked from commit 0728f3b)
akropp (Author) commented Oct 19, 2023

Changed it to use MakeAlias. This is how the same thing is accomplished in rnn_impl. Not sure it's really "better", but at least that constructor is in the public API.
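For concreteness, a minimal sketch of the two approaches (the pointer-based MakeAlias() overload shown here is an assumption; the real signatures live in src/mlpack/core/math/make_alias.hpp and may differ between mlpack versions):

```cpp
#include <mlpack/core.hpp>

// Illustrative only: bind `alias` to external memory `mem` without copying.
void BindWeights(arma::mat& alias, double* mem,
                 const size_t rows, const size_t cols)
{
  // Old approach: Armadillo's advanced constructor taking a raw pointer
  // (copy_aux_mem = false, strict = true):
  // arma::mat view(mem, rows, cols, false, true);

  // New approach: mlpack's public helper (assumed pointer overload):
  mlpack::MakeAlias(alias, mem, rows, cols);
}
```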

akropp and others added 5 commits October 19, 2023 14:18
(cherry picked from commit 78c4ccb)
(cherry picked from commit dfafc15)
Removed default value for eps.
(cherry picked from commit 20f646e)
akropp commented Oct 20, 2023

Realized I was missing some commits for multihead_attention

akropp commented Oct 23, 2023

So I feel I have to comment here on what these changes are about. In working with the MultiheadAttention class and some others, I discovered that multi_layer_impl was not passing the correct values to the Backward method (unless I am misunderstanding this parameter). I believe the Backward call should be passed the INPUTs to the layer, the DELTA (gy) of the output, and the gradient out-parameter (g); multi_layer_impl was in fact passing the OUTPUT from the layer.

Most layer implementations actually just ignore this parameter, so it didn't matter. However, SoftmaxLayerType and LogSoftmaxLayerType both used it, and assumed that it was in fact the output (since that was what multi_layer was passing). Thus when I initially made this change, those test cases were failing. I believe the right thing to do is to make those two layers save their own outputs, since they need to use them (or, alternately, re-compute them). This is what I have done in this checkout, along with fixing a couple of other random places that used the input param but didn't need to.

All the tests pass now. LMK if you agree with my understanding, or if I should re-do this all.
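To make the convention concrete, here is a minimal sketch (not mlpack's actual layer code) of a softmax-style layer that caches its own output in Forward(), so Backward() can treat its first parameter as the layer input:

```cpp
#include <armadillo>

// Sketch only: Backward() receives the layer's *input*, the upstream
// delta `gy`, and writes this layer's delta into `g`.  Since softmax
// needs its own *output* to backpropagate, it caches that output during
// Forward() instead of relying on Backward()'s first parameter.
class SoftmaxSketch
{
 public:
  void Forward(const arma::mat& input, arma::mat& output)
  {
    // Column-wise softmax, shifted by the column max for stability.
    arma::rowvec colMax = arma::max(input, 0);
    output = arma::exp(input.each_row() - colMax);
    output.each_row() /= arma::sum(output, 0);
    cachedOutput = output;  // saved for Backward()
  }

  void Backward(const arma::mat& /* input (unused by softmax) */,
                const arma::mat& gy,
                arma::mat& g)
  {
    // dL/dx = y % (gy - sum(gy % y)) per column, with y the cached output.
    arma::rowvec colDots = arma::sum(gy % cachedOutput, 0);
    arma::mat centered = gy.each_row() - colDots;
    g = cachedOutput % centered;
  }

 private:
  arma::mat cachedOutput;
};
```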

Comment out unused parameter "input"
rcurtin (Member) commented Oct 25, 2023

Thanks for the deep dive into the Backward() function---I left a response on #3551. 👍 (Still reviewing the PR itself.)

akropp commented Oct 26, 2023

Sure. Now that it seems that more needs to be done w.r.t. #3551, I might separate out the refactoring of the input handling from the MultiheadAttention and LayerNorm moves. I had done the changes at the same time, but they should be separable.

akropp commented Nov 16, 2023

Actually, I think it has something to do with aliasing. I installed 9.8 (what we are using on Linux) on my Mac, and reproduced the same problem. The MultiheadAttention class was using the constructor that takes a memory pointer directly, rather than the MakeAlias call. Not sure exactly what the failure mode was (MakeAlias first calls the destructor on the alias, then uses placement new), but switching seems to have fixed it for 9.8. Apparently something is different in newer versions that makes this unnecessary.
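The destroy-then-placement-new rebinding being described looks roughly like this (a sketch of the mechanism, not mlpack's exact implementation):

```cpp
#include <new>        // placement new
#include <cstddef>
#include <armadillo>

using MatType = arma::mat;

// Sketch: rebind `alias` to borrow `mem` without copying, by destroying
// the existing object in place and reconstructing it with Armadillo's
// auxiliary-memory constructor.
void RebindAlias(MatType& alias, double* mem,
                 const std::size_t rows, const std::size_t cols)
{
  alias.~MatType();                    // run the destructor in place...
  new (&alias) MatType(mem, rows, cols,
                       false,          // copy_aux_mem = false: borrow mem
                       true);          // strict = true: size stays fixed
}
```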

akropp and others added 6 commits November 16, 2023 09:18
…n.hpp

Co-authored-by: Ryan Curtin <ryan@ratml.org>
akropp commented Nov 16, 2023

Was going to update HISTORY.md. I'll add in a reference to this PR and the issue (#3551) about the input parameter. Do you want me to add more line items for the other issues we kind of rolled into here (e.g., activation functions, bug fixes)?

Added to the files where I made meaningful logic changes.
conradsnicta (Contributor) commented:
> Actually, I think it has something to do with aliasing. I installed 9.8 (what we are using on Linux) on my Mac, and reproduced the same problem. (...) Apparently something is different in newer versions that makes this unnecessary.

It's possible the problem you've encountered is related to interactions between move constructors and matrices that use auxiliary memory. This issue was fixed in Armadillo 11.4.

Either way, Armadillo 9.800 is ancient history at this point. I believe the only reason that mlpack has 9.800 as the minimum version is that this is the default version in Ubuntu 20.04 LTS.

I recommend updating the minimum version of Armadillo to 10.8, which is the default version in Ubuntu 22.04 LTS. The vast majority of folks that use Ubuntu LTS would have moved to 22.04 LTS at this point (especially desktop users). Armadillo 10.8 is faster and has considerably more functionality.

akropp commented Nov 17, 2023

Definitely seems like that was the issue. Fixed here anyway, but agreed that it might be good to bump the minimum version at some point.

rcurtin commented Nov 20, 2023

> Actually, I think it has something to do with aliasing. I installed 9.8 (what we are using on Linux) on my Mac, and reproduced the same problem. (...) Apparently something is different in newer versions that makes this unnecessary.

> It's possible the problem you've encountered is related to interactions between move constructors and matrices that use auxiliary memory. This issue was fixed in Armadillo 11.4.

Yeah, now I am remembering; I think that is part of why MakeAlias() was written.

> Either way, Armadillo 9.800 is ancient history at this point. I believe the only reason that mlpack has 9.800 as the minimum version is that this is the default version in Ubuntu 20.04 LTS.

> I recommend updating the minimum version of Armadillo to 10.8, which is the default version in Ubuntu 22.04 LTS. The vast majority of folks that use Ubuntu LTS would have moved to 22.04 LTS at this point (especially desktop users). Armadillo 10.8 is faster and has considerably more functionality.

I get the argument (you may have written it down more than one time at this point 😄) and I'm not necessarily in disagreement, but it does take a little effort to update the minimum version (build systems need to be updated, build scripts need to be updated, other little things here and there). I'll see if I can find some time to look into that in the future, but at least for the next couple weeks I'm a bit tapped out. I do still think it's important to support versions of dependencies available in commonly-used distributions or environments, to reduce the friction of installation and usage. Fortunately, the situation for mlpack, its dependencies, and C++ in general, is that it's a bit easier now to work around old system versions and install by hand (especially since "installing by hand" for mlpack and all its direct dependencies is just "put headers in place").

rcurtin (Member) left a comment:

Awesome, I am super happy to have these layers available again. Thank you so much for the extensive effort and bouncing back and forth on matters as trivial as spacing, and as complex as the signature that the activation functions require. This is such a nice improvement, and I can't think of anything else that needs to be done before merge. (As per usual I left a few more trivial comments :))

Review threads (outdated, resolved): HISTORY.md (2), src/mlpack/methods/ann/ffn_impl.hpp (2)
akropp and others added 4 commits November 20, 2023 15:10
Co-authored-by: Ryan Curtin <ryan@ratml.org>
akropp commented Nov 21, 2023

Any idea what's up with the static code checks? That failure is new.

mlpack-bot (bot) left a comment:

Second approval provided automatically after 24 hours. 👍

rcurtin commented Nov 21, 2023

Sorry, I think that's an error in the static code job I introduced accidentally elsewhere. I think I fixed it...

rcurtin commented Nov 21, 2023

@mlpack-jenkins test this please

akropp commented Nov 21, 2023

Did the trick.

rcurtin merged commit 1839274 into mlpack:master on Nov 22, 2023 (19 checks passed).
rcurtin mentioned this pull request on Nov 22, 2023.
akropp deleted the adapt_multihead branch on Nov 22, 2023.
conradsnicta (Contributor) commented:
@rcurtin

> (...) it does take a little effort to update the minimum version (build systems need to be updated, build scripts need to be updated, other little things here and there). I'll see if I can find some time to look into that in the future, but at least for the next couple weeks I'm a bit tapped out.

A good time to update the minimum Armadillo version would probably be when Ubuntu 24.04 LTS comes out. This is scheduled for late April 2024, about 5 months from now.

At that stage we can pretty much assume that the vast majority of Ubuntu users would be at least on the previous 22.04 LTS release, which contains Armadillo 10.8.

Debian 12 (current release) is at Armadillo 11.4. Fedora and RHEL EPEL are currently at Armadillo 10.8. Other distros like Arch and openSUSE are on more recent releases.
