Hi! I've been trying to reproduce the GSAM results. I noticed that in the code, the learning rate (LR) warmup starts from 0, which is lower than the minimum LR for the post-warmup decay. Because of this, the rho parameter, which is scheduled proportionally with the LR, has negative values early in training.
This does not seem intentional, as rho is never supposed to be negative according to the paper. I'm curious whether fixing this would make any difference to the results in the paper. My guess is that it affects only a very small amount of training (about 1/3 of the first epoch) and wouldn't change anything.
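To illustrate, here is a minimal sketch of how I understand the problem, assuming rho is a linear interpolation of the LR between its post-warmup minimum and maximum. The function name `scheduled_rho`, the `clip` option, and all numeric values are hypothetical and not taken from the repository.

```python
def scheduled_rho(lr, lr_min, lr_max, rho_min, rho_max, clip=False):
    """Map lr in [lr_min, lr_max] linearly onto [rho_min, rho_max].

    Assumed form of the schedule (not copied from the actual code).
    With clip=True, the interpolation fraction is clamped to [0, 1].
    """
    frac = (lr - lr_min) / (lr_max - lr_min)
    if clip:
        frac = min(max(frac, 0.0), 1.0)
    return rho_min + (rho_max - rho_min) * frac


# Hypothetical hyperparameters, chosen only for illustration.
lr_max, lr_min = 1e-3, 1e-5
rho_max, rho_min = 0.6, 0.0

# Warmup starts at lr = 0, which is below lr_min, so the
# interpolation fraction is negative and rho drops below rho_min.
rho_at_warmup_start = scheduled_rho(0.0, lr_min, lr_max, rho_min, rho_max)

# One possible fix: clamp the fraction so rho stays in [rho_min, rho_max].
rho_clipped = scheduled_rho(0.0, lr_min, lr_max, rho_min, rho_max, clip=True)

print(rho_at_warmup_start < 0)  # negative rho during early warmup
print(rho_clipped == rho_min)   # clamped to the minimum instead
```

Clamping is only one option. Starting the warmup from the minimum LR instead of 0 would also keep rho non-negative under this assumed schedule.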
@lucasb-eyer @juntang-zhuang