AB matrix initialization in layers.py does not conform to the description of the paper #98
Hi Jinxin,
We didn’t apply LoRA to embedding layers in the paper. In any case, it shouldn’t make a meaningful difference whether A or B is initialized to zero, as long as the other one is not. Let me know if you see a substantial difference, though!
On Jul 9, 2023, at 11:23 PM, jinxin-zhu ***@***.*** wrote:
"We use a random Gaussian initialization for A and zero for B,” in paper but:
```python
def reset_parameters(self):
    nn.Embedding.reset_parameters(self)
    if hasattr(self, 'lora_A'):
        # initialize A the same way as the default for nn.Linear and B to zero
        nn.init.zeros_(self.lora_A)
        nn.init.normal_(self.lora_B)
```
in layers.py
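To see why it doesn't matter which factor is zeroed, here is a minimal sketch; the dimension `d` and rank `r` below are illustrative assumptions, not values from the repo. Whichever factor is zeroed, the initial LoRA update B @ A is exactly the zero matrix:

```python
import torch

d, r = 8, 2  # illustrative feature dimension and LoRA rank

# Paper convention: Gaussian A, zero B
A_paper = torch.randn(r, d)
B_paper = torch.zeros(d, r)

# Embedding convention in layers.py: zero A, Gaussian B
A_embed = torch.zeros(r, d)
B_embed = torch.randn(d, r)

# Either way, the initial update B @ A is exactly the zero matrix,
# so the adapted layer starts out identical to the pretrained one.
assert torch.equal(B_paper @ A_paper, torch.zeros(d, d))
assert torch.equal(B_embed @ A_embed, torch.zeros(d, d))
```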
@edwardjhu can you please tell us why at least one of A or B has to be non-zero?
Maybe the paper says that to ensure that, at the beginning of the training phase, the matrix product of LoRA's A and B is 0? Perhaps it's to make training start off stable.
We want at least one of the matrices to be zero so that LoRA is a no-op in the first forward pass, which indeed stabilizes training. Say we are generating some content with an LM. If both matrices are non-zero, the random LoRA init, if large enough, might move the entire model so far from the original that we start generating garbage, which is bad for training.
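A small sketch of that point (the shapes, seed, and rank here are illustrative assumptions): with B zeroed, the adapted weight equals the pretrained weight on the first forward pass, while with both factors random the output drifts from the base model's immediately.

```python
import torch

torch.manual_seed(0)
d, r = 8, 2                    # illustrative dimensions
W = torch.randn(d, d)          # frozen pretrained weight
x = torch.randn(1, d)          # some input

# Paper-style init: A ~ Gaussian, B = 0  =>  delta_W = B @ A = 0
A = torch.randn(r, d)
B = torch.zeros(d, r)
# The first forward pass through the adapted layer is a no-op.
assert torch.equal(x @ (W + B @ A).T, x @ W.T)

# If both factors are random, the adapted output differs from the
# base model's right away, which can derail generation/training.
B_rand = torch.randn(d, r)
print((x @ (W + B_rand @ A).T - x @ W.T).abs().max())  # > 0
```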
"We use a random Gaussian initialization for A and zero for B,” in paper but:
`
def reset_parameters(self):
`
in layers.py
The text was updated successfully, but these errors were encountered: