Adapt MultiheadAttention and LayerNorm to new Layer interface #3547
Conversation
(cherry picked from commit 0728f3b)
Changed it to use MakeAlias. This is how the same thing is accomplished in rnn_impl. Not sure it's really "better", but at least that constructor is in the public API.
Realized I was missing some commits for multihead_attention
(cherry picked from commit 7259b71)
So I feel I have to comment here on what these changes are about. In working with the MultiheadAttention class and some others, I discovered that multi_layer_impl was not passing the correct values to the Backward method (unless I am misunderstanding this parameter). I believe the backward call should be passed the INPUTs to the layer, the DELTA (gy) of the output, and the gradient out parameter (g). multi_layer_impl was in fact passing the OUTPUT from the layer. Most layer implementations actually just ignore this parameter, so it didn't matter. However, SoftmaxLayerType and LogSoftmaxLayerType both used it, and assumed that it was in fact the output (since that is what multi_layer was passing). Thus when I initially made this change, those test cases were failing. I believe instead the right thing to do is to make those two layers save their own outputs since they need to use them (or I guess alternately re-compute them). This is what I have done in this checkout. I also cleaned up a couple of other random places that used the input param but didn't need to. All the tests pass now. LMK if you agree with my understanding, or if I should re-do this all.
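The softmax case described above can be illustrated with a small self-contained sketch (this is illustrative code, not mlpack's actual implementation; the class and method names are stand-ins): a layer whose gradient depends on its own forward output caches that output itself, so Backward() can correctly receive the layer INPUT per the interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative softmax "layer" that caches its own forward output, so that
// Backward() can receive the layer INPUT (per the Layer interface) and still
// compute the gradient, which depends only on the output y:
//   g_i = y_i * (gy_i - sum_j y_j * gy_j)
class SoftmaxSketch
{
 public:
  void Forward(const std::vector<double>& input, std::vector<double>& output)
  {
    // Subtract the max for numerical stability, then normalize.
    double maxVal = input[0];
    for (double v : input) maxVal = std::max(maxVal, v);
    double sum = 0.0;
    output.resize(input.size());
    for (std::size_t i = 0; i < input.size(); ++i)
    {
      output[i] = std::exp(input[i] - maxVal);
      sum += output[i];
    }
    for (double& v : output) v /= sum;
    cachedOutput = output; // save our own copy for Backward()
  }

  // `input` is intentionally unused: the gradient needs only the cached
  // output, so the caller no longer has to (incorrectly) pass the output.
  void Backward(const std::vector<double>& /* input */,
                const std::vector<double>& gy,
                std::vector<double>& g)
  {
    const std::vector<double>& y = cachedOutput;
    double dot = 0.0;
    for (std::size_t i = 0; i < y.size(); ++i) dot += y[i] * gy[i];
    g.resize(y.size());
    for (std::size_t i = 0; i < y.size(); ++i) g[i] = y[i] * (gy[i] - dot);
  }

 private:
  std::vector<double> cachedOutput;
};
```

With this shape, a container can uniformly pass each layer its forward-pass input during the backward pass, and layers that need their outputs are self-sufficient.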
comment out unused parameter "input"
Thanks for the deep dive into the |
Sure. Now that it seems that more needs to be done w.r.t. #3551, I might separate out the refactoring of the input stuff from the MultiheadAttention and LayerNorm moves. I had done the changes at the same time, but they should be separable.
Actually, I think it has something to do with aliasing. I installed 9.8 (what we are using on Linux) on my mac, and reproduced the same problem. The MultiheadAttention class was using the constructor which takes a memory pointer directly, rather than the MakeAlias call. Not sure exactly what the failure mode was (the MakeAlias call calls the destructor on the alias first, and then placement new), but the switch seems to have fixed it for 9.8. Apparently something is different about newer versions where this isn't necessary.
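The destructor-then-placement-new idiom mentioned above looks roughly like this (a generic sketch using an illustrative toy matrix type, not mlpack's actual MakeAlias or Armadillo's classes): instead of constructing a fresh aliasing object, an already-constructed member is reseated in place over new memory.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Toy stand-in for a matrix type that can wrap external memory without
// copying it (analogous to Armadillo's advanced mat(ptr, rows, cols) ctor).
struct ToyMat
{
  double* mem;
  std::size_t rows, cols;

  ToyMat() : mem(nullptr), rows(0), cols(0) {}
  ToyMat(double* ptr, std::size_t r, std::size_t c)
      : mem(ptr), rows(r), cols(c) {}
};

// Sketch of the MakeAlias idiom: explicitly destroy the existing object in
// place, then placement-new an alias over the given memory. This reseats an
// already-constructed member to point at new memory without assignment,
// which is what avoids the aliasing problem described above.
template<typename MatType>
void MakeAliasSketch(MatType& m, double* newMem, std::size_t r, std::size_t c)
{
  m.~MatType();               // run the destructor on the old object
  new (&m) MatType(newMem, r, c); // placement-new the alias in the same storage
}
```

The point of going through MakeAlias rather than the raw pointer constructor is that the reseat happens in one well-defined place, so the alias never transiently owns or copies the memory it wraps.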
…n.hpp Co-authored-by: Ryan Curtin <ryan@ratml.org>
Was going to update the HISTORY.md. I'll add in a reference to this PR and the Issue (#3551) about the input parameter. Do you want me to add more line items for the other issues we kind of rolled into here? (e.g. activation functions, bug fixes)
Added to the files where I made meaningful logic changes.
…to adapt_multihead
It's possible the problem you've encountered is related to interactions between move constructors and matrices that use auxiliary memory. This issue was fixed in Armadillo 11.4. Either way, Armadillo 9.800 is ancient history at this point. I believe the only reason that mlpack has 9.800 as the minimum version is that this is the default version in Ubuntu 20.04 LTS. I recommend updating the minimum version of Armadillo to 10.8, which is the default version in Ubuntu 22.04 LTS. The vast majority of folks that use Ubuntu LTS would have moved to 22.04 LTS at this point (especially desktop users). Armadillo 10.8 is faster and has considerably more functionality.
Definitely seems like that was the issue. Fixed here anyway, but agreed that it might be good to up the min version at some point.
Yeah, now I am remembering, I think that is a part of why
I get the argument (you may have written it down more than one time at this point 😄) and I'm not necessarily in disagreement, but it does take a little effort to update the minimum version (build systems need to be updated, build scripts need to be updated, other little things here and there). I'll see if I can find some time to look into that in the future, but at least for the next couple weeks I'm a bit tapped out. I do still think it's important to support versions of dependencies available in commonly-used distributions or environments, to reduce the friction of installation and usage. Fortunately, the situation for mlpack, its dependencies, and C++ in general, is that it's a bit easier now to work around old system versions and install by hand (especially since "installing by hand" for mlpack and all its direct dependencies is just "put headers in place").
Awesome, I am super happy to have these layers available again. Thank you so much for the extensive effort and bouncing back and forth on matters as trivial as spacing, and as complex as the signature that the activation functions require. This is such a nice improvement, and I can't think of anything else that needs to be done before merge. (As per usual I left a few more trivial comments :))
Any idea what's up with the static code checks? That failure is new.
Second approval provided automatically after 24 hours. 👍
Sorry, I think that's an error in the static code job I introduced accidentally elsewhere. I think I fixed it...
@mlpack-jenkins test this please
Did the trick.
Probably a good time to update the minimum Armadillo version is when Ubuntu 24.04 LTS comes out. This is scheduled for late April 2024, which is about 5 months from now. At that stage we can pretty much assume that the vast majority of Ubuntu users would be at least on the previous 22.04 LTS release, which contains Armadillo 10.8. Debian 12 (current release) is at Armadillo 11.4. Fedora and RHEL EPEL are currently at Armadillo 10.8. Other distros like Arch and OpenSuse are on more recent releases.
Updated the MultiheadAttention and LayerNorm classes to conform to the new ANN Layer interface, and moved them from the layer/not_adapted directory to the layer directory. Both have been tested by external code, but don't have test cases in the repo.
Also, fixed the logic in MultiLayer::Backward and FFN::EvaluateWithGradient to pass the appropriate values to the input parameter of the Backward call on the contained layers. Most implementations of Backward didn't actually use this parameter, so the incorrect behavior probably went unnoticed; however, the LayerNorm class does reference this parameter.
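The corrected data flow can be sketched schematically (this is an illustrative toy, not mlpack's actual MultiLayer code; the layer type and function names are made up): during the backward pass, layer i's Backward() receives the input that layer saw in the forward pass, along with the delta of its output, and produces the delta of its input.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal "layer": y = a * x, so dL/dx = a * dL/dy. Backward() takes the
// layer's forward-pass INPUT per the interface, even though this layer,
// like many, does not actually need it.
struct ScaleLayer
{
  double a;
  double Forward(double x) const { return a * x; }
  double Backward(double /* input */, double gy) const { return a * gy; }
};

// Schematic multi-layer backward pass. The key point of the fix described
// above: layer i's Backward() is passed layerInputs[i] (the value that went
// INTO the layer during Forward()), not the layer's output.
double MultiLayerBackward(const std::vector<ScaleLayer>& layers,
                          const std::vector<double>& layerInputs,
                          double gy)
{
  double delta = gy;
  for (std::size_t i = layers.size(); i-- > 0; )
    delta = layers[i].Backward(layerInputs[i], delta);
  return delta;
}
```

A container following this pattern must record each layer's input during the forward pass (mlpack's MultiLayer already stores per-layer buffers for this purpose), which is what makes passing the correct values cheap.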