
ENH: Different initialization methods for LoRA #1189

Merged · 4 commits into main · Nov 29, 2023

Conversation

BenjaminBossan (Member):

This PR adds the possibility to use different initialization methods for LoRA, which is a requirement for a completely backwards-compatible adoption of PEFT in diffusers.

Description

The default is still the same as always, namely the one from the reference implementation by Microsoft. On top of that, it is now possible to pass `init_lora_weights='gaussian'` to initialize the LoRA weights the same way diffusers does by default, namely from a normal distribution scaled by 1/r.
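
For illustration, a minimal usage sketch (hyperparameters are placeholders and target modules are omitted; this is not code taken from the PR):

```python
from peft import LoraConfig

# Default: initialization from the Microsoft reference implementation
default_config = LoraConfig(r=8, lora_alpha=16)

# New option added by this PR: diffusers-style Gaussian initialization
gaussian_config = LoraConfig(r=8, lora_alpha=16, init_lora_weights="gaussian")
```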

The init method currently applies to LoRA linear and conv layers, but not embedding layers, which are always initialized from a normal distribution (and are probably irrelevant for diffusers).
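
Roughly, the two modes amount to something like the following sketch (function and variable names are illustrative, not the PR's actual implementation; the reference init is assumed to be the usual kaiming-uniform for A with B zeroed):

```python
import math

import torch.nn as nn


def reset_lora_weights(lora_A: nn.Linear, lora_B: nn.Linear, r: int, init_lora_weights=True):
    """Initialize the low-rank factors lora_A and lora_B (illustrative only)."""
    if init_lora_weights is True:
        # Reference (Microsoft) init: kaiming-uniform for A
        nn.init.kaiming_uniform_(lora_A.weight, a=math.sqrt(5))
    elif init_lora_weights == "gaussian":
        # diffusers-style init: normal distribution scaled by 1/r
        nn.init.normal_(lora_A.weight, std=1 / r)
    else:
        raise ValueError(f"Unknown init method: {init_lora_weights}")
    # B starts at zero in both cases, so the adapter is a no-op before training
    nn.init.zeros_(lora_B.weight)
```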

In the future, similar extensions could be added for other adapter methods.

Notes

For testing, a rather simple test is added which calculates the Kolmogorov-Smirnov distance between the weights of a LoRA layer and the expected distribution. If someone has a better idea for a test, please let me know.
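
For reference, such a check could look roughly like the following hypothetical sketch using scipy (not the actual test added in this PR):

```python
import torch
from scipy import stats

r = 8
lora_A = torch.nn.Linear(128, r)
torch.nn.init.normal_(lora_A.weight, std=1 / r)

weights = lora_A.weight.detach().flatten().numpy()
# Two-sided KS test against the expected N(0, (1/r)^2) distribution
statistic, p_value = stats.kstest(weights, "norm", args=(0.0, 1.0 / r))
assert statistic < 0.1, "weights deviate too far from the expected distribution"
```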

HuggingFaceDocBuilderDev commented Nov 27, 2023:

The documentation is not available anymore as the PR was closed or merged.

A review comment was left on the following change from the PR's diff (adding scipy to the test extras):

@@ -18,7 +18,9 @@
 extras["quality"] = ["black ~= 22.0", "ruff>=0.0.241", "urllib3<=2.0.0"]
 extras["docs_specific"] = ["hf-doc-builder"]
 extras["dev"] = extras["quality"] + extras["docs_specific"]
-extras["test"] = extras["dev"] + ["pytest", "pytest-cov", "pytest-xdist", "parameterized", "datasets", "diffusers<0.21.0"]
+extras["test"] = extras["dev"] + [
+    "pytest", "pytest-cov", "pytest-xdist", "parameterized", "datasets", "diffusers<0.21.0", "scipy"
+]

Member:
Why's the version restriction needed on the diffusers side?

BenjaminBossan (Member, Author):
It was pinned in #936 and can probably be unpinned now.

sayakpaul (Member) left a review comment:

Very nice! Particularly, the tests are very thorough!

How should this be reflected in huggingface/diffusers#5419?

BenjaminBossan (Member, Author):

> How should this be reflected in huggingface/diffusers#5419?

Hmm, not sure about that PR specifically. In general, for diffusers to use this feature, every time a `LoraConfig` is created, `init_lora_weights="gaussian"` should be passed as an additional argument. On top of that, either:

  1. diffusers needs to depend on a PEFT version that contains this feature, or
  2. the argument needs to be added in a backwards-compatible way, since older PEFT versions would error out when encountering the unknown argument (see the sketch below).
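
For option 2, a hypothetical sketch of how the argument could be passed conditionally (illustrative only, not actual diffusers code):

```python
import inspect

from peft import LoraConfig

lora_config_kwargs = {"r": 8, "lora_alpha": 16}

# Only pass the new argument if the installed PEFT version knows about it
if "init_lora_weights" in inspect.signature(LoraConfig.__init__).parameters:
    lora_config_kwargs["init_lora_weights"] = "gaussian"

lora_config = LoraConfig(**lora_config_kwargs)
```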

sayakpaul (Member):

Hmm the changes sound good to me. @patrickvonplaten WDYT?

pacman100 (Contributor) left a review comment:

Thank you @BenjaminBossan for adding support for different weight initialization methods for LoRA. The tests are really good, never thought that I would see statistical significance analysis in tests. LGTM! 🤩

BenjaminBossan (Member, Author):

Before merging, should we wait until we have finalized the change required on diffusers side, in case we find that this PR doesn't quite cut it?

pacman100 (Contributor):

> Before merging, should we wait until we have finalized the change required on diffusers side, in case we find that this PR doesn't quite cut it?

Makes sense

sayakpaul (Member) commented Nov 28, 2023:

> Before merging, should we wait until we have finalized the change required on diffusers side, in case we find that this PR doesn't quite cut it?

Do you mean testing it on huggingface/diffusers#5388? I can do that tomorrow.

sayakpaul (Member):

BenjaminBossan (Member, Author):

Great, thanks for checking it.

BenjaminBossan merged commit f0fb951 into main on Nov 29, 2023 (14 checks passed).
BenjaminBossan deleted the enh-init-methods-for-lora branch on Nov 29, 2023 at 11:37.
BenjaminBossan added a commit to BenjaminBossan/peft that referenced this pull request Nov 30, 2023