
Same amount of VRAM is taken as in AdamW #22

Open · VCasecnikovs opened this issue Apr 3, 2023 · 6 comments

Comments

@VCasecnikovs

One of the main benefits of Lion is that it needs to store less state per parameter:
Adam keeps both a momentum EMA and an RMSProp-style second-moment EMA, while Lion keeps only the momentum EMA.
But when I try to use Lion, it takes exactly the same amount of memory as AdamW.
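
Not from the thread, but a minimal sketch of how to check this claim directly, assuming this repo's `lion_pytorch` package: the buffers each optimizer keeps per parameter live in `optimizer.state`, so their total size can be summed and compared.

```python
import torch
from torch import nn
from lion_pytorch import Lion

def optimizer_state_bytes(optimizer):
    # Sum the sizes of all tensors held in optimizer.state
    # (AdamW: exp_avg + exp_avg_sq per parameter; Lion: exp_avg only).
    total = 0
    for per_param_state in optimizer.state.values():
        for value in per_param_state.values():
            if torch.is_tensor(value):
                total += value.numel() * value.element_size()
    return total

model = nn.Sequential(nn.Linear(1024, 1024), nn.Linear(1024, 1024))
x = torch.randn(8, 1024)

for opt_cls in (torch.optim.AdamW, Lion):
    opt = opt_cls(model.parameters(), lr=1e-4)
    model(x).sum().backward()
    opt.step()  # state tensors are allocated lazily on the first step
    print(f"{opt_cls.__name__}: {optimizer_state_bytes(opt) / 2**20:.1f} MiB")
    opt.zero_grad(set_to_none=True)
```

On a toy model like this, AdamW's state should come out at roughly twice Lion's (two fp32 EMA buffers versus one).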

@xiangning-chen
Contributor

Hi, what is the model size in your setting?
When the model is small, the main memory overhead comes from activations, so the saved second-moment buffer may not be significant.
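
To make this point concrete, a rough sketch (assumptions: a CUDA device and this repo's `lion_pytorch` package) that measures the peak allocated memory around one training step; for a small model the weights, gradients, and activations dominate the peak, so the missing second-moment buffer moves it only slightly.

```python
import torch
from torch import nn
from lion_pytorch import Lion

def peak_after_one_step(opt_cls, model, x):
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    opt = opt_cls(model.parameters(), lr=1e-4)
    model(x).sum().backward()
    opt.step()  # allocates the optimizer state buffers
    opt.zero_grad(set_to_none=True)
    torch.cuda.synchronize()
    return torch.cuda.max_memory_allocated() / 2**20

model = nn.Sequential(*[nn.Linear(2048, 2048) for _ in range(4)]).cuda()
x = torch.randn(64, 2048, device="cuda")

for opt_cls in (torch.optim.AdamW, Lion):
    print(f"{opt_cls.__name__}: peak {peak_after_one_step(opt_cls, model, x):.0f} MiB")
```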

@VCasecnikovs
Author

@xiangning-chen
178M parameters, convolutional.
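
For scale (a back-of-envelope, not from the thread): with 178M fp32 parameters, the second-moment buffer that AdamW keeps and Lion drops is about 178e6 × 4 B ≈ 0.66 GiB, so the saving should in principle be visible at this size. If the usage is read from nvidia-smi, note that PyTorch's caching allocator holds on to reserved memory, so the reported process footprint can blur a difference of that size.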

@feffy380

feffy380 commented Apr 7, 2023

Are you comparing this to AdamW8bit by chance?

@VCasecnikovs
Author

No, to AdamW

@konev-artem

In my setting, Lion takes less memory than AdamW (9.9 GB vs. 10.1 GB), but Lion is slower in terms of steps/sec. Has anyone noticed the same? I am comparing Lion with the Triton kernel against fused AdamW.
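
A minimal timing sketch to reproduce this kind of comparison (assumptions: a CUDA device, PyTorch's fused AdamW via `fused=True`, and this repo's Triton kernel via `use_triton=True`):

```python
import time
import torch
from torch import nn
from lion_pytorch import Lion

model = nn.Sequential(*[nn.Linear(2048, 2048) for _ in range(8)]).cuda()
x = torch.randn(32, 2048, device="cuda")

def steps_per_sec(opt, n_steps=50):
    # one warm-up step so lazy state allocation and kernel compilation
    # are excluded from the timed region
    model(x).sum().backward()
    opt.step()
    opt.zero_grad(set_to_none=True)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n_steps):
        model(x).sum().backward()
        opt.step()
        opt.zero_grad(set_to_none=True)
    torch.cuda.synchronize()
    return n_steps / (time.perf_counter() - start)

print("fused AdamW:", steps_per_sec(torch.optim.AdamW(model.parameters(), lr=1e-4, fused=True)))
print("triton Lion:", steps_per_sec(Lion(model.parameters(), lr=1e-4, use_triton=True)))
```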

@nicosouth

Did you solve the problem? I have the same problem.
