
FIX Make special LoRA inits DeepSpeed compatible #1887

Conversation

BenjaminBossan
Member

Resolves huggingface/accelerate#2886

Possibly resolves #896 (comment)

Some LoRA init methods need to access the base layer's weight. This access can fail or stall in distributed settings. For DeepSpeed, the weight is now gathered before it is accessed.

Note: Without DeepSpeed, this is a no-op and should thus not have any disadvantage. We don't have DS in our CI, so this is not tested.
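For illustration, a minimal sketch of the gathering pattern under DeepSpeed ZeRO-3 (the function name here is illustrative; the actual PEFT code may use its own wrapper around this context manager):

```python
# Sketch only: gather a ZeRO-3-partitioned weight before reading it.
# Assumes DeepSpeed is installed and ZeRO-3 sharding is active.
import deepspeed
import torch


def read_base_weight(base_layer: torch.nn.Linear) -> torch.Tensor:
    # GatheredParameters temporarily materializes the full parameter on
    # every rank; modifier_rank=None means no rank modifies it in place.
    with deepspeed.zero.GatheredParameters(base_layer.weight, modifier_rank=None):
        # Clone while the full tensor is available; after the context
        # exits, the parameter is re-partitioned and may be empty on this rank.
        weight = base_layer.weight.detach().clone()
    return weight
```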

I also made some small changes to OLoRA init to use self.get_base_layer() instead of self.base_layer.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@BenjaminBossan
Member Author

ping @tokenizer-decode as there are some slight changes to OLoRA.

@SunMarc
Member


LGTM! Maybe we can leave a comment on why we added that for these inits, since we don't do it for other inits.

@tokenizer-decode
Contributor

Could you explain to me why self.get_base_layer() rather than self.base_layer? No mutation to the base weights occurred at that point, so I'd assume both should return the same? How did this solve the problem? I am surprised.

@BenjaminBossan
Member Author

Could you explain to me why self.get_base_layer() rather than self.base_layer? No mutation to base weights occured at that part so I'd assume both should return the same?

The reason why we have get_base_layer() is that we support nesting LoRA adapters and other adapter types. Calling the method ensures we really get back the "original" base layer. When there is no nesting, both approaches are indeed the same.
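Roughly, the unwrapping looks like the following sketch (illustrative, not the exact PEFT source):

```python
# Sketch of the idea behind get_base_layer(): when adapter layers are
# nested (e.g. a LoRA layer wrapping another adapter layer that wraps the
# original nn.Linear), keep unwrapping until no ``base_layer`` attribute
# remains. Without nesting, this is equivalent to self.base_layer.
def get_base_layer(self):
    base_layer = self
    while hasattr(base_layer, "base_layer"):
        base_layer = base_layer.base_layer
    return base_layer
```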

How did this solve the problem? I am surprised.

This change is not related to the DeepSpeed issue, that's what I wanted to convey when I wrote "I also made some small changes to OLoRA".

@tokenizer-decode
Contributor

Oh, now I see. Thanks for clarifying. Looks good.

@BenjaminBossan BenjaminBossan merged commit 184beaf into huggingface:main Jun 26, 2024
14 checks passed
@BenjaminBossan BenjaminBossan deleted the fix-lora-special-init-deepspeed-compatible branch June 26, 2024 09:25
Development

Successfully merging this pull request may close these issues.

Can't apply LoRA's PiSSA weight init when using DeepSpeed ZeRO3 + LoRA to finetune!