
Could you please implement an Adafactor optimizer? :) #1256

Closed
christophschuhmann opened this issue Sep 12, 2019 · 3 comments · Fixed by #6722

Comments

@christophschuhmann

🚀 Feature

Could you please implement an Adafactor optimizer? :)

( https://arxiv.org/abs/1804.04235 )

Motivation

In contrast to Adam, it requires much less GPU memory.
I tried to use the FairSeq implementation with pytorch-transformers, but I'm no expert and I couldn't get it working.

Could you please do that? :)

Additional context
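
For context, a rough, untested sketch of what dropping fairseq's Adafactor into a pytorch-transformers fine-tuning step might look like; the Adafactor constructor arguments follow the fairseq implementation and may differ across versions.

```python
# Rough, untested sketch: swap Adam for fairseq's Adafactor when fine-tuning
# a pytorch-transformers model. Constructor kwargs follow the fairseq
# implementation and may differ across fairseq revisions.
import torch
from fairseq.optim.adafactor import Adafactor
from pytorch_transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased")

# relative_step=False uses a fixed learning rate, mirroring how Adam is
# usually configured; the fairseq defaults instead derive the step size
# from the update count.
optimizer = Adafactor(
    model.parameters(),
    lr=1e-3,
    scale_parameter=False,
    relative_step=False,
)

input_ids = torch.tensor(
    [tokenizer.encode("Adafactor uses far less optimizer memory than Adam.")]
)
labels = torch.tensor([1])

model.train()
loss = model(input_ids, labels=labels)[0]  # pytorch-transformers returns a tuple, loss first
loss.backward()
optimizer.step()
optimizer.zero_grad()
```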

@thomwolf
Member

What didn't work for you with the fairseq implementation?

It seems pretty self-contained: https://github.com/pytorch/fairseq/blob/master/fairseq/optim/adafactor.py#L65-L213
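
For anyone skimming that file, the memory saving comes from replacing Adam's full second-moment tensor with per-row and per-column running averages. A minimal sketch of that factored update from the paper (not the fairseq code itself, and ignoring the paper's time-dependent decay rate and update clipping):

```python
import torch

def factored_rms_update(grad, row_acc, col_acc, beta2=0.999, eps=1e-30):
    """One factored second-moment step for a 2-D gradient (Shazeer & Stern, 2018)."""
    sq = grad.pow(2) + eps                                     # squared gradient
    row_acc.mul_(beta2).add_(sq.mean(dim=1), alpha=1 - beta2)  # shape (n,) -- O(n) state
    col_acc.mul_(beta2).add_(sq.mean(dim=0), alpha=1 - beta2)  # shape (m,) -- O(m) state
    # Rank-1 reconstruction of the n x m second-moment matrix Adam would store.
    v_hat = (row_acc.unsqueeze(1) * col_acc.unsqueeze(0)) / row_acc.mean()
    return grad / v_hat.sqrt()                                 # preconditioned step direction

# Toy example: a 4x3 "parameter" keeps only 4 + 3 accumulator entries instead of 12.
g = torch.randn(4, 3)
row_acc, col_acc = torch.zeros(4), torch.zeros(3)
update = factored_rms_update(g, row_acc, col_acc)
```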

@stale

stale bot commented Nov 17, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Nov 17, 2019
@stale stale bot closed this as completed Nov 24, 2019
@moscow25
Contributor

FYI @sshleifer -- I was wrong -- able to train T5-large even at batch==1 with FP32, no gradient checkpointing, and Adam. Given that the T5 team strongly recommends Adafactor, I'm giving it a try; the other pieces are perhaps more difficult...
