
AB matrix initialization in layers.py does not conform to the description of the paper #98

Open
jinxin-zhu opened this issue Jul 10, 2023 · 4 comments

jinxin-zhu commented Jul 10, 2023

"We use a random Gaussian initialization for A and zero for B,” in paper but:
`
def reset_parameters(self):

    nn.Embedding.reset_parameters(self)

    if hasattr(self, 'lora_A'):

        # initialize A the same way as the default for nn.Linear and B to zero

        nn.init.zeros_(self.lora_A)

        nn.init.normal_(self.lora_B)

`
in layers.py
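For comparison, here is a minimal sketch of the initialization the paper describes (random Gaussian for A, zero for B); the names and shapes are assumed for illustration and this is not a quote from the repo:

```python
import torch
import torch.nn as nn

# Illustrative sketch (not the repo's code) of the paper's stated init for a
# Linear-style LoRA adapter: random Gaussian for A, zero for B, so the
# update B @ A is exactly zero at the start of training.
r, d_in, d_out = 4, 32, 32
lora_A = nn.Parameter(torch.empty(r, d_in))
lora_B = nn.Parameter(torch.empty(d_out, r))
nn.init.normal_(lora_A)   # random Gaussian for A
nn.init.zeros_(lora_B)    # zero for B
print((lora_B @ lora_A).abs().max())  # tensor(0.) -> delta-W starts at zero
```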

edwardjhu (Collaborator) commented Jul 10, 2023 via email

aliasvishnu commented:

@edwardjhu can you please tell us why at least one of A or B has to be non-zero?

haiduo commented Jan 5, 2024

@edwardjhu can you please tell us why at least one of A or B has to be non-zero?

Maybe the paper says that? The point is to ensure that at the beginning of the training phase, the matrix product of LoRA's A and B is 0. Perhaps that makes the start of training more stable.

edwardjhu (Collaborator) commented:

We want at least one of the two matrices to be zero so that LoRA is a no-op in the first forward pass, which indeed stabilizes training. Say we are generating some content with an LM. If both matrices are non-zero, the random LoRA init, if large enough, might move the entire model so far from the original that we start generating garbage, which is bad for training.
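To make the no-op argument concrete, here is a minimal PyTorch sketch (shapes and names are assumed for illustration, not taken from the repo): with B initialized to zero, the first forward pass reproduces the frozen model's output exactly, yet B still receives a non-zero gradient because its gradient depends on the Gaussian-initialized A, so training can move the adapter away from zero.

```python
import torch

# Minimal sketch: zero-initializing one LoRA factor makes the adapter a no-op
# on the first forward pass while still letting gradients flow into it.
d, k, r = 16, 16, 4
W0 = torch.randn(d, k)                           # frozen pretrained weight
lora_A = torch.nn.Parameter(0.02 * torch.randn(r, k))  # random Gaussian init
lora_B = torch.nn.Parameter(torch.zeros(d, r))          # zero init -> B @ A == 0

x = torch.randn(1, k)
base = x @ W0.T                        # output of the frozen model
out = base + x @ (lora_B @ lora_A).T   # LoRA update is exactly zero here
print(torch.allclose(out, base))       # True: first forward pass is unchanged

out.sum().backward()
print(lora_B.grad.abs().sum() > 0)     # True: grad of B depends on non-zero A
print(lora_A.grad.abs().sum() == 0)    # True: grad of A is zero this step (depends on B)
```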
