
AB matrix initialization in layers.py does not conform to the description of the paper #98

Open
jinxin-zhu opened this issue Jul 10, 2023 · 4 comments

jinxin-zhu commented Jul 10, 2023

"We use a random Gaussian initialization for A and zero for B,” in paper but:
`
def reset_parameters(self):

    nn.Embedding.reset_parameters(self)

    if hasattr(self, 'lora_A'):

        # initialize A the same way as the default for nn.Linear and B to zero

        nn.init.zeros_(self.lora_A)

        nn.init.normal_(self.lora_B)

`
in layers.py
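For comparison, here is a minimal sketch of the initialization the paper describes (random Gaussian for A, zero for B); the names and shapes are assumed for illustration and this is not a quote from the repo:

```python
import torch
import torch.nn as nn

# Illustrative sketch (not the repo's code) of the paper's stated init for a
# Linear-style LoRA adapter: random Gaussian for A, zero for B, so the
# update B @ A is exactly zero at the start of training.
r, d_in, d_out = 4, 32, 32
lora_A = nn.Parameter(torch.empty(r, d_in))
lora_B = nn.Parameter(torch.empty(d_out, r))
nn.init.normal_(lora_A)   # random Gaussian for A
nn.init.zeros_(lora_B)    # zero for B
print((lora_B @ lora_A).abs().max())  # tensor(0.) -> delta-W starts at zero
```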

edwardjhu (Collaborator) commented Jul 10, 2023 via email

aliasvishnu commented:

@edwardjhu can you please tell us why at least one of A or B has to be non-zero?

haiduo commented Jan 5, 2024

@edwardjhu can you please tell us why at least one of A or B has to be non-zero?

Maybe the paper says that? The point is to ensure that at the beginning of the training phase, the matrix product of LoRA's A and B is 0. Perhaps that makes the start of training more stable.

edwardjhu (Collaborator) commented:

We want at least one of the two matrices to be zero so that LoRA is a no-op in the first forward pass, which indeed stabilizes training. Say we are generating some content with an LM. If both matrices are non-zero, the random LoRA init, if large enough, might move the entire model so far from the original that we start generating garbage, which is bad for training.
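To make the no-op argument concrete, here is a minimal PyTorch sketch (shapes and names are assumed for illustration, not taken from the repo): with B initialized to zero, the first forward pass reproduces the frozen model's output exactly, yet B still receives a non-zero gradient because its gradient depends on the Gaussian-initialized A, so training can move the adapter away from zero.

```python
import torch

# Minimal sketch: zero-initializing one LoRA factor makes the adapter a no-op
# on the first forward pass while still letting gradients flow into it.
d, k, r = 16, 16, 4
W0 = torch.randn(d, k)                           # frozen pretrained weight
lora_A = torch.nn.Parameter(0.02 * torch.randn(r, k))  # random Gaussian init
lora_B = torch.nn.Parameter(torch.zeros(d, r))          # zero init -> B @ A == 0

x = torch.randn(1, k)
base = x @ W0.T                        # output of the frozen model
out = base + x @ (lora_B @ lora_A).T   # LoRA update is exactly zero here
print(torch.allclose(out, base))       # True: first forward pass is unchanged

out.sum().backward()
print(lora_B.grad.abs().sum() > 0)     # True: grad of B depends on non-zero A
print(lora_A.grad.abs().sum() == 0)    # True: grad of A is zero this step (depends on B)
```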
