FIX Make special LoRA inits DeepSpeed compatible #1887
Conversation
Resolves huggingface/accelerate#2886. Possibly resolves huggingface#896 (comment).

Some LoRA init methods need to access the base layer weight. Getting this access can fail or stall in distributed settings. With DeepSpeed, the weight is now gathered before it is accessed.

Note: without DeepSpeed, this is a no-op and should thus not have any disadvantage. We don't have DeepSpeed in our CI, so this is not tested.

I also made some small changes to the OLoRA init to use `self.get_base_layer()` instead of `self.base_layer`.
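The gathering pattern described above can be sketched as follows. This is a minimal illustration, not PEFT's exact code: the helper name `gather_params_ctx` and its fallback behavior are assumptions, though `deepspeed.zero.GatheredParameters` is the real DeepSpeed API for temporarily materializing ZeRO-3-partitioned parameters.

```python
from contextlib import nullcontext


def gather_params_ctx(param):
    """Return a context manager that gathers a possibly sharded parameter.

    Under DeepSpeed ZeRO-3 the base layer weight is partitioned across
    ranks, so reading it directly can fail or stall. Gathering it first
    makes the full tensor available. Without DeepSpeed this is a no-op,
    matching the "no-op" note in the PR description.
    Sketch only: the helper name is an assumption, not PEFT's exact API.
    """
    try:
        import deepspeed  # only present in DeepSpeed-enabled runs
        return deepspeed.zero.GatheredParameters(param, modifier_rank=None)
    except ImportError:
        # No DeepSpeed installed: nothing is sharded, nothing to gather.
        return nullcontext()


# Hypothetical use inside a special LoRA init method:
# with gather_params_ctx(self.get_base_layer().weight):
#     weight = self.get_base_layer().weight  # safe to read here
```

Using a context manager keeps the gathered (full) weight alive only for the duration of the init computation, so memory is released again once the block exits.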
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
ping @tokenizer-decode as there are some slight changes to OLoRA.
LGTM! Maybe we can leave a comment on why we added that for these inits, since we don't do it for the other inits.
Could you explain to me why
The reason why we have
This change is not related to the DeepSpeed issue; that's what I wanted to convey when I wrote "I also made some small changes to OLoRA".
Oh, now I see. Thanks for notifying. Looks good.