Optimizers for coord check #16

xwjabc · 2022-06-01T19:01:09Z

Thank you for your great work! When trying the coord check in the examples, I noticed that the original optimizers (e.g., sgd, adam) are used instead of the muP optimizers (e.g., musgd, muadam). However, according to Table 8 in the paper, the optimizers should be adjusted accordingly to keep activations bounded. Is there a reason the original optimizers are used?

edwardjhu · 2022-06-01T19:37:38Z

Hi Weijian,

The mup coordinate check curves do use mu-optimizers. The conversion happens internally:

`mup/mup/coord_check.py`, line 441 in eac6f1d:

    if mup:
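The kind of internal switch described above can be sketched as follows. This is only an illustration, assuming the coord check is handed an optimizer name plus a `mup` flag; the placeholder classes and the `pick_optimizer_class` helper are hypothetical stand-ins and not the actual code in `coord_check.py`.

```python
# Hypothetical stand-ins for the standard and mu-optimizer classes.
# They are placeholders for illustration, not the real torch/mup classes.
class SGD: ...
class Adam: ...
class MuSGD: ...
class MuAdam: ...

# Same user-facing names, two different class tables.
STANDARD_OPTIMIZERS = {"sgd": SGD, "adam": Adam}
MU_OPTIMIZERS = {"sgd": MuSGD, "adam": MuAdam}


def pick_optimizer_class(name: str, mup: bool):
    """Return the optimizer class for `name`, using the mu-variant when `mup` is True."""
    table = MU_OPTIMIZERS if mup else STANDARD_OPTIMIZERS
    try:
        return table[name]
    except KeyError:
        raise ValueError(f"unknown optimizer: {name!r}") from None
```

With a switch like this, the caller keeps writing `"sgd"` or `"adam"`, and the muP branch swaps in the mu-variant without any change to the example script.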

Does this address your concern?

xwjabc · 2022-06-01T20:38:50Z

Thank you for your quick reply! Yes, it addresses my concern. I just found the automatic conversion right before seeing your reply 😆

thegregyang closed this as completed Jun 5, 2022
