LAMB: Differences from the paper author's official implementation #107
Hi @binmakeswell, thanks for the issue! Actually, if I remember correctly, I committed my implementation before there was any official implementation. I usually implement the paper directly rather than reproducing other implementations. In this case, I will check your reference and see how it benefits training, performance-wise! If you have a modification suggestion, feel free to open a PR and I'll look into it!
Thanks for your reply. By the way, the LARS implementation seems to have a similar problem: according to the author's and TensorFlow's official implementations, they also skip some parameters, selected by name, during the calculation.
Thanks for the specifics @binmakeswell! A few things about the above discussion:
Looking forward to improving my implementation thanks to your feedback :D
Thanks for your reply.
Hi @binmakeswell 👋 I just opened a PR to allow a different weight decay for normalization layers, meaning that the user can set the WD to 0 for normalization layers. That should cover the modification you mentioned in this issue :) Let me know if you have any questions!
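Not an excerpt from that PR, but a minimal sketch of the parameter-group approach it describes: `split_weight_decay_params` is a hypothetical helper that assigns zero weight decay to normalization layers (and biases), and `AdamW` stands in here for whichever LAMB implementation you plug in.

```python
import torch
from torch import nn

def split_weight_decay_params(model: nn.Module, weight_decay: float = 1e-2):
    """Split parameters into two groups so normalization layers (and biases)
    get zero weight decay, as discussed above."""
    norm_types = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d, nn.LayerNorm, nn.GroupNorm)
    decay, no_decay = [], []
    for module in model.modules():
        for name, param in module.named_parameters(recurse=False):
            if not param.requires_grad:
                continue
            if isinstance(module, norm_types) or name == "bias":
                no_decay.append(param)
            else:
                decay.append(param)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]

# Usage with any optimizer following the torch.optim interface (a LAMB implementation included):
model = nn.Sequential(nn.Linear(16, 32), nn.LayerNorm(32), nn.Linear(32, 10))
param_groups = split_weight_decay_params(model, weight_decay=1e-2)
optimizer = torch.optim.AdamW(param_groups, lr=1e-3)  # substitute your LAMB optimizer here
```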
The PyTorch LAMB implementation you released differs from the official TensorFlow version released by the paper's authors. According to the official implementation published with the paper, the authors' code skips some parameters, selected by name, during the calculation. In your implementation, however, it seems that all parameters are directly involved in the calculation.
For example: `exclude_from_weight_decay=["batch_normalization", "LayerNorm", "layer_norm"]`
Their implementation:
https://github.com/tensorflow/addons/blob/master/tensorflow_addons/optimizers/lamb.py
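For reference, the name-based exclusion described above boils down to a small check like the one below. This is an illustrative sketch rather than the actual code from the linked lamb.py, and `use_weight_decay` is a hypothetical name.

```python
import re

# Patterns to exclude from weight decay, as in the example above.
exclude_from_weight_decay = ["batch_normalization", "LayerNorm", "layer_norm"]

def use_weight_decay(param_name: str) -> bool:
    """Return False when the parameter name matches any excluded pattern,
    so that parameter is skipped when the weight decay term is applied."""
    return not any(re.search(pattern, param_name) for pattern in exclude_from_weight_decay)

print(use_weight_decay("encoder/layer_0/dense/kernel"))      # True: decayed
print(use_weight_decay("encoder/layer_0/LayerNorm/gamma"))   # False: skipped
```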