Description
I encountered the same problem as #41, which states "LoRAs have no effects".
Background
I'm using SSDT to train LoRAs. Its LoRA layer implementation comes from loralib, which applies a weight scaling of (alpha / rank) in the forward pass.
So to use an alpha = 1, rank-16 LoRA produced by SSDT in AddNet, the scale must be set to 1/16.
Some users found this additional scaling inconvenient, so I added an "unscale weight" option that multiplies the weights by (alpha / rank) when converting an SSDT checkpoint to AddNet format.
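The conversion step can be sketched as follows. This is a minimal illustration, not SSDT's actual converter: the key layout (`lora_up` / `lora_down`) is an assumption, and NumPy arrays stand in for torch tensors. Only one of the two low-rank factors is multiplied, so the product picks up the full (alpha / rank) factor exactly once.

```python
import numpy as np

def unscale_state_dict(state_dict, alpha, rank):
    # Bake loralib's (alpha / rank) scaling into the up-projection weights,
    # so the converted LoRA works in AddNet with scale = 1.
    scale = alpha / rank
    return {k: v * scale if "lora_up" in k else v
            for k, v in state_dict.items()}
```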
Investigation
I looked at the state dict after unscaling.
All of the tensors have very small values, which hurts numerical stability. About 20% of them contain zero values; of those, 15% are in the text encoder and 85% in the UNet.
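The kind of inspection described above can be sketched like this (key names are illustrative, and NumPy arrays stand in for torch tensors; here "zero values" is read as tensors that underflowed to all zeros):

```python
import numpy as np

def zero_stats(state_dict):
    # Fraction of tensors that are entirely zero, and the split of those
    # zeroed tensors between text encoder and UNet.
    zeroed = [k for k, v in state_dict.items() if not np.any(v)]
    te = sum("text_encoder" in k for k in zeroed)
    unet = sum("unet" in k for k in zeroed)
    return len(zeroed) / len(state_dict), te, unet
```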
Experiment
For one LoRA I trained (rank = 16, alpha = 1):
- The unscaled version has essentially no effect.
- The original (not unscaled) version, applied with scale = 0.0625 (1/16), works as normal.
Conclusion and Solution
I suspect those zeros are the product of underflow, which is probably the cause of #41.
The underflow happens more often at high rank, because the scaling factor (alpha / rank) is smaller.
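The underflow is easy to reproduce in isolation: float16's smallest subnormal is about 5.96e-8, so multiplying an already-small trained weight by (alpha / rank) can round to exactly zero. The specific magnitudes below are illustrative:

```python
import numpy as np

w = np.float16(2e-7)        # a small but nonzero trained weight (illustrative)
scale = np.float16(1 / 16)  # alpha / rank for alpha = 1, rank = 16
scaled = w * scale          # ~1.1e-8, below float16's smallest subnormal

assert w != 0
assert scaled == 0          # the weight underflows to exactly zero
```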
At training time, add an "alpha" option that scales the LoRA like loralib does, and save alpha to the LoRA metadata.
At inference time, add a "scale weight" option that scales the LoRA weight by alpha / rank.
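The proposed training-time behavior can be sketched as a loralib-style layer. The class and attribute names are my own, and NumPy stands in for torch; the point is that the weights themselves stay at full magnitude, and (alpha / rank) is applied only in the forward pass:

```python
import numpy as np

class LoRALinear:
    """Base linear layer plus a low-rank update scaled by alpha / rank."""
    def __init__(self, weight, lora_down, lora_up, rank, alpha=1.0):
        self.weight = weight        # (out_features, in_features)
        self.lora_down = lora_down  # (rank, in_features), saved unscaled
        self.lora_up = lora_up      # (out_features, rank), saved unscaled
        self.scale = alpha / rank   # alpha is saved to metadata, not baked in

    def __call__(self, x):
        base = x @ self.weight.T
        update = (x @ self.lora_down.T) @ self.lora_up.T
        return base + update * self.scale
```

With alpha = rank the scale is 1, i.e. the update is applied at full strength.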
Backward Compatibility
Unfortunately, as you can imagine, almost all existing LoRAs have already underflowed.
To keep supporting old LoRAs when "scale weight" is enabled: if a LoRA has no alpha in its metadata, do not scale it.
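The fallback can be as simple as the following sketch (the metadata key name is a placeholder, not the actual format):

```python
def weight_scale(metadata, rank):
    # Old LoRAs were saved with the scaling already (lossily) baked into
    # the weights, so they must be applied with scale 1.
    alpha = metadata.get("alpha")  # hypothetical metadata key
    if alpha is None:
        return 1.0
    return float(alpha) / rank
```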
Additional: NaNs
Since AUTOMATIC1111/stable-diffusion-webui@9991967, these underflowed LoRAs sometimes produce NaN errors when generating images.
Some users have also reported loss = NaN when training with https://github.com/Linaqruf/kohya-trainer/ and https://github.com/Mikubill/naifu-diffusion/, especially at high rank. I suspect that is related to this issue.


