What do you think about merging AdaBelief with Adas (https://github.com/YanaiEliyahu/AdasOptimizer)? Or do they conflict?

I'm not quite sure just from looking at those sections. The general idea seems to be to perform gradient descent on a per-element learning rate, which is interesting. But I'm concerned that the fast convergence might come from the learning rate decaying rapidly rather than from the model truly learning well. Another concern is computation: Adas needs to take an extra gradient w.r.t. the learning rate, and I'm not sure how much overhead that adds. It probably needs more validation.
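For context, here is a minimal sketch of what "gradient descent on a per-element learning rate" can look like for plain SGD, in the style of hypergradient descent. This is an illustration of the general idea being discussed, not the actual Adas implementation; `hypergrad_sgd_step` and its parameters are hypothetical names.

```python
# Minimal sketch of per-element hypergradient descent on the learning
# rate (an assumption about the mechanism; not the actual Adas code).
import torch

def hypergrad_sgd_step(param, grad, prev_grad, lr, hyper_lr=1e-3):
    # For plain SGD, d(loss)/d(lr) = -grad_t * prev_grad (elementwise),
    # so gradient descent on the lr adds hyper_lr * grad_t * prev_grad.
    # The extra cost is one elementwise multiply plus storing the
    # previous gradient, which bounds the overhead mentioned above.
    lr = lr + hyper_lr * grad * prev_grad
    return param - lr * grad, lr

# Toy usage: minimize 0.5 * ||x||^2 with a per-element learning rate.
x = torch.tensor([1.0, -2.0])
lr = torch.full_like(x, 0.1)
prev_grad = torch.zeros_like(x)
for _ in range(100):
    grad = x.clone()  # gradient of 0.5 * x^2 is x itself
    x, lr = hypergrad_sgd_step(x, grad, prev_grad, lr)
    prev_grad = grad
```

Note that in this sketch the learning rate only decays when consecutive gradients disagree in sign, so whether fast convergence reflects real learning or just rapid lr decay is exactly the kind of thing that would need empirical validation.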