FIX Make special LoRA inits DeepSpeed compatible #1887
Conversation
Resolves huggingface/accelerate#2886. Possibly resolves huggingface#896 (comment).

Some LoRA init methods need to access the base layer weight. Getting this access can fail or stall in distributed settings. With DeepSpeed, the weight is now gathered before it is accessed.

Note: without DeepSpeed, this is a no-op and should thus not have any disadvantage. We don't have DeepSpeed in our CI, so this is not tested.

I also made some small changes to the OLoRA init to use `self.get_base_layer()` instead of `self.base_layer`.
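The gathering pattern described above can be sketched as follows. This is a minimal illustration, not PEFT's exact code: the helper name `gather_params_ctx` and its fallback behavior are assumptions, though `deepspeed.zero.GatheredParameters` is the real DeepSpeed API for temporarily materializing ZeRO-3-partitioned parameters.

```python
from contextlib import nullcontext


def gather_params_ctx(param):
    """Return a context manager that gathers a possibly sharded parameter.

    Under DeepSpeed ZeRO-3 the base layer weight is partitioned across
    ranks, so reading it directly can fail or stall. Gathering it first
    makes the full tensor available. Without DeepSpeed this is a no-op,
    matching the "no-op" note in the PR description.
    Sketch only: the helper name is an assumption, not PEFT's exact API.
    """
    try:
        import deepspeed  # only present in DeepSpeed-enabled runs
        return deepspeed.zero.GatheredParameters(param, modifier_rank=None)
    except ImportError:
        # No DeepSpeed installed: nothing is sharded, nothing to gather.
        return nullcontext()


# Hypothetical use inside a special LoRA init method:
# with gather_params_ctx(self.get_base_layer().weight):
#     weight = self.get_base_layer().weight  # safe to read here
```

Using a context manager keeps the gathered (full) weight alive only for the duration of the init computation, so memory is released again once the block exits.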
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
ping @tokenizer-decode as there are some slight changes to OLoRA.
LGTM! Maybe we can leave a comment on why we added that for these inits, since we don't do it for the other inits.
Could you explain to me why
The reason why we have
This change is not related to the DeepSpeed issue; that's what I wanted to convey when I wrote "I also made some small changes to OLoRA".
Oh, now I see. Thanks for notifying. Looks good.