Hi @geronimi73
Thanks for the issue. Yes, this is expected: under the hood, the GaLore optimizers fall back to the native optimizer if you don't pass any GaLore kwargs.
Yes, @jiaweizzhao told me the same; that's why I closed the issue. I would have noticed this myself if I had looked at the code more carefully. Thank you for your answer.
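For reference, a sketch of how GaLore-specific kwargs could be passed so that the fallback described above does not kick in. This is illustrative only: the argument names (`optim_args`, `optim_target_modules`) come from my reading of the transformers GaLore integration docs, and the specific values are assumptions, not taken from this thread.

```python
# Illustrative config fragment; treat exact names/values as assumptions.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    optim="galore_adamw_layerwise",
    # Without optim_args, the optimizer may fall back to plain AdamW.
    optim_args="rank=128, update_proj_gap=200, scale=0.25",
    optim_target_modules=["attn", "mlp"],
)
```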
System Info
Hey everyone,
I noticed something in the current implementation of GaLore (layerwise): the same optimizer is hooked onto `galore_params` and `non_galore_params`. Is this on purpose?
transformers/src/transformers/trainer.py
Lines 1296 to 1301 in 76a33a1
The official implementation hooks `GaLoreAdamW8bit` to `galore_params` and `bnb.optim.Adam8bit` to all others: https://github.com/jiaweizzhao/GaLore/blob/864eeb361dc96c1932c3fa429ad0119aaed8e617/torchrun_main.py#L339-L342
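To make the distinction concrete, here is a minimal sketch of the param-group split that the official repo performs before attaching the two different optimizers. The helper name and the keyword filter are hypothetical; only the idea (2-D weights matching target modules get GaLore's low-rank updates, everything else gets the plain 8-bit optimizer) reflects the linked code.

```python
# Hypothetical sketch of the official GaLore-style parameter split;
# split_param_groups and target_keywords are illustrative names.
import torch.nn as nn

def split_param_groups(model: nn.Module, target_keywords=("attn", "mlp")):
    """Split parameters into GaLore-projected and regular groups."""
    galore_params, regular_params = [], []
    for name, p in model.named_parameters():
        if p.ndim == 2 and any(k in name for k in target_keywords):
            galore_params.append(p)   # low-rank projected updates
        else:
            regular_params.append(p)  # plain optimizer updates
    return galore_params, regular_params
```

The issue being reported is that the layerwise integration in `trainer.py` attaches the *same* optimizer to both groups instead of two different ones.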
Who can help?
@younesbelkada
Information
Tasks
`examples` folder (such as GLUE/SQuAD, ...)

Reproduction
Any script that uses `galore_*_layerwise`.
Expected behavior
Not sure.