Negative rho values in GSAM training #80

ankitkv · 2023-11-29T23:53:58Z

Hi! I've been trying to reproduce the GSAM results. I noticed that in the code, the learning rate (LR) warmup starts from 0, which is lower than the minimum LR for the post-warmup decay. Because of this, the rho parameter, which is scheduled proportionally with the LR, has negative values early in training.

This does not seem intentional, since rho is never supposed to be negative according to the paper. I'm curious whether fixing it would make any difference to the results in the paper. My guess is that it affects only a very small portion of training (about 1/3 of the first epoch) and wouldn't change anything.
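To illustrate the mechanism, here is a minimal sketch. It assumes rho is a linear interpolation between `rho_min` and `rho_max` driven by the LR, which is my reading of the schedule rather than a copy of the repository code. The function names and all numeric values are hypothetical, chosen only to show the sign flip. Clamping the LR before the interpolation is one possible fix, not necessarily the one the maintainers would choose.

```python
def rho_schedule(lr, lr_min, lr_max, rho_min, rho_max):
    """Map lr linearly so that lr_min -> rho_min and lr_max -> rho_max.

    Assumed form of the schedule; nothing here prevents lr < lr_min.
    """
    return rho_min + (rho_max - rho_min) * (lr - lr_min) / (lr_max - lr_min)


def clamped_rho_schedule(lr, lr_min, lr_max, rho_min, rho_max):
    """Same mapping, but lr is clamped to [lr_min, lr_max] first,
    so rho always stays within [rho_min, rho_max]."""
    lr = min(max(lr, lr_min), lr_max)
    return rho_schedule(lr, lr_min, lr_max, rho_min, rho_max)


# Hypothetical values, for illustration only.
LR_MIN, LR_MAX = 1e-5, 1e-3
RHO_MIN, RHO_MAX = 0.0, 0.6

# Warmup starts at lr = 0, which is below LR_MIN.
rho_at_start = rho_schedule(0.0, LR_MIN, LR_MAX, RHO_MIN, RHO_MAX)
rho_at_start_clamped = clamped_rho_schedule(0.0, LR_MIN, LR_MAX, RHO_MIN, RHO_MAX)

print("unclamped rho at lr=0:", rho_at_start)
print("clamped rho at lr=0:  ", rho_at_start_clamped)
```

With these values the unclamped rho is negative for as long as the warmup LR is below `LR_MIN`, while the clamped version holds rho at `RHO_MIN` over that stretch.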

@lucasb-eyer @juntang-zhuang

ankitkv · 2023-12-12T20:29:52Z

I can confirm that fixing this doesn't make a significant difference.

ankitkv closed this as completed Dec 12, 2023
