
Adding weighted adapter as LoRAs combination gives unexpected result with StableDiffusion compared to webui #643

Closed
2 of 4 tasks
kovalexal opened this issue Jun 27, 2023 · 7 comments

@kovalexal
Contributor

kovalexal commented Jun 27, 2023

System Info

Python 3.8; diffusers, transformers, accelerate, and peft installed from the main branch of each library (I used a slightly modified version of your peft-gpu Dockerfile)

Who can help?

@pacman100 @younesbelkada

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder
  • My own task or dataset (give details below)

Reproduction

Hi!

I've discovered some unexpected results when combining multiple LoRA adapters for StableDiffusion with PEFT, compared to the results webui produces.
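All snippets below assume a StableDiffusion pipeline has already been created; here is a minimal sketch of that setup (the checkpoint path is hypothetical):

```python
import torch
from diffusers import StableDiffusionPipeline, EulerAncestralDiscreteScheduler
from peft import PeftModel  # used in the snippets below

# Hypothetical local path to a diffusers-format deliberate_v2 checkpoint
pipe = StableDiffusionPipeline.from_pretrained(
    "path/to/deliberate_v2",
    torch_dtype=torch.float16,
    safety_checker=None,
).to("cuda")
# Match webui's "Euler a" sampler
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
```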

webui setting: (screenshot)

peft setting: (screenshot)

Sanity check

Let's check that both setups give the same results without using any LoRAs.

candid RAW portrait photo of a woman (Crystal Simmerman:1.0) with (dark hair:1.0) and a (purple colored suit:1.0) on a dark street with shopping windows (at night:1.2), bokeh, Ilford Delta 3200 film, dof, high definition, detailed, intricate, flashlight

Negative prompt: bad-hands-5, asian, cropped, lowres, poorly drawn face, out of frame, blurry, blurred, text, watermark, disfigured, closed eyes, ugly, cartoon, render, 3d, plastic, 3d (artwork), rendered, comic

Steps: 20, Sampler: Euler a, CFG scale: 7, Seed: 1428928479, Size: 512x512, Model hash: 9aba26abdf, Model: deliberate_v2
  • webui output:

webui_nolora_output

  • diffusers + peft sample code:
```python
torch.manual_seed(1428928479)
image = pipe(
    prompt="candid RAW portrait photo of a woman (Crystal Simmerman:1.0) with (dark hair:1.0) and a (purple colored suit:1.0) on a dark street with shopping windows (at night:1.2), bokeh, Ilford Delta 3200 film, dof, high definition, detailed, intricate, flashlight",
    negative_prompt="bad-hands-5, asian, cropped, lowres, poorly drawn face, out of frame, blurry, blurred, text, watermark, disfigured, closed eyes, ugly, cartoon, render, 3d, plastic, 3d (artwork), rendered, comic",
    num_inference_steps=20,
    guidance_scale=7,
).images[0]
```
  • diffusers + peft output:

diffusers_peft_nolora_output

The results are quite similar, so this test has passed.

Single LoRA

Let's check that both setups give the same results when using a single LoRA (let's use Detail Tweaker aka add-detail).

<lora:add_detail:1> candid RAW portrait photo of a woman (Crystal Simmerman:1.0) with (dark hair:1.0) and a (purple colored suit:1.0) on a dark street with shopping windows (at night:1.2), bokeh, Ilford Delta 3200 film, dof, high definition, detailed, intricate, flashlight

Negative prompt: bad-hands-5, asian, cropped, lowres, poorly drawn face, out of frame, blurry, blurred, text, watermark, disfigured, closed eyes, ugly, cartoon, render, 3d, plastic, 3d (artwork), rendered, comic

Steps: 20, Sampler: Euler a, CFG scale: 7, Seed: 1428928479, Size: 512x512, Model hash: 9aba26abdf, Model: deliberate_v2
  • webui output:

webui_singlelora_output

  • diffusers + peft sample code:
```python
# Path elided; lora_path points to the add_detail LoRA in PEFT format
lora_path = ...
lora_name = "add_detail"

pipe.unet = PeftModel.from_pretrained(
    pipe.unet,
    f"{lora_path}/unet",
    adapter_name=lora_name,
)

pipe.text_encoder = PeftModel.from_pretrained(
    pipe.text_encoder,
    f"{lora_path}/text_encoder",
    adapter_name=lora_name,
)

pipe.unet.set_adapter(lora_name)
pipe.text_encoder.set_adapter(lora_name)

torch.manual_seed(1428928479)
image = pipe(
    prompt="candid RAW portrait photo of a woman (Crystal Simmerman:1.0) with (dark hair:1.0) and a (purple colored suit:1.0) on a dark street with shopping windows (at night:1.2), bokeh, Ilford Delta 3200 film, dof, high definition, detailed, intricate, flashlight",
    negative_prompt="bad-hands-5, asian, cropped, lowres, poorly drawn face, out of frame, blurry, blurred, text, watermark, disfigured, closed eyes, ugly, cartoon, render, 3d, plastic, 3d (artwork), rendered, comic",
    num_inference_steps=20,
    guidance_scale=7,
).images[0]
```
  • diffusers + peft output:

diffusers_peft_singlelora_output

The results are also quite similar, so this test has passed.

Mixture of two LoRAs

Let's check that both setups give the same results when using a mixture of two LoRAs (let's use Detail Tweaker aka add-detail and 3D rendering style aka 3DMM_V11, both with weight 1.0).

<lora:add_detail:1> <lora:3DMM_V11:1> candid RAW portrait photo of a woman (Crystal Simmerman:1.0) with (dark hair:1.0) and a (purple colored suit:1.0) on a dark street with shopping windows (at night:1.2), bokeh, Ilford Delta 3200 film, dof, high definition, detailed, intricate, flashlight

Negative prompt: bad-hands-5, asian, cropped, lowres, poorly drawn face, out of frame, blurry, blurred, text, watermark, disfigured, closed eyes, ugly, cartoon, render, 3d, plastic, 3d (artwork), rendered, comic

Steps: 20, Sampler: Euler a, CFG scale: 7, Seed: 1428928479, Size: 512x512, Model hash: 9aba26abdf, Model: deliberate_v2
  • webui output:

webui_twolora_output

  • diffusers + peft sample code:
```python
# Add the first LoRA
lora_path = ...  # path to the add_detail LoRA
lora_name = "add_detail"
pipe.unet = PeftModel.from_pretrained(
    pipe.unet,
    f"{lora_path}/unet",
    adapter_name=lora_name,
)
pipe.text_encoder = PeftModel.from_pretrained(
    pipe.text_encoder,
    f"{lora_path}/text_encoder",
    adapter_name=lora_name,
)

# Add the second LoRA
lora_path = ...  # path to the 3DMM_V11 LoRA
lora_name = "3DMM_V11"
pipe.unet.load_adapter(
    f"{lora_path}/unet",
    adapter_name=lora_name,
)
pipe.text_encoder.load_adapter(
    f"{lora_path}/text_encoder",
    adapter_name=lora_name,
)

# Mix the two LoRAs together
pipe = create_weighted_lora_adapter(pipe, ["add_detail", "3DMM_V11"], [1.0, 1.0], "combined")
pipe.unet.set_adapter("combined")
pipe.text_encoder.set_adapter("combined")

torch.manual_seed(1428928479)
image = pipe(
    prompt="candid RAW portrait photo of a woman (Crystal Simmerman:1.0) with (dark hair:1.0) and a (purple colored suit:1.0) on a dark street with shopping windows (at night:1.2), bokeh, Ilford Delta 3200 film, dof, high definition, detailed, intricate, flashlight",
    negative_prompt="bad-hands-5, asian, cropped, lowres, poorly drawn face, out of frame, blurry, blurred, text, watermark, disfigured, closed eyes, ugly, cartoon, render, 3d, plastic, 3d (artwork), rendered, comic",
    num_inference_steps=20,
    guidance_scale=7,
).images[0]
```
  • diffusers + peft output:

diffusers_peft_twolora_output

We can see that the results differ dramatically.
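Note: the create_weighted_lora_adapter helper used above isn't defined in this issue; a minimal sketch of what it presumably does, assuming it simply wraps LoraModel.add_weighted_adapter for both sub-models:

```python
# Hypothetical helper (not part of PEFT): create a combined adapter on both
# the UNet and the text encoder, then return the pipeline.
def create_weighted_lora_adapter(pipe, adapters, weights, adapter_name):
    pipe.unet.add_weighted_adapter(adapters, weights, adapter_name)
    pipe.text_encoder.add_weighted_adapter(adapters, weights, adapter_name)
    return pipe
```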

Mixture of two LoRAs - what is going on?

Let's try this approach in peft:

  1. Load the first LoRA and merge it into the base model
  2. Load the second LoRA, apply it on top of the merged model, and investigate the results
  • diffusers + peft sample code:
```python
# Add the first LoRA
lora_path = ...  # path to the add_detail LoRA
lora_name = "add_detail"
pipe.unet = PeftModel.from_pretrained(
    pipe.unet,
    f"{lora_path}/unet",
    adapter_name=lora_name,
)
pipe.text_encoder = PeftModel.from_pretrained(
    pipe.text_encoder,
    f"{lora_path}/text_encoder",
    adapter_name=lora_name,
)

# Merge the first LoRA into the model weights
pipe.unet = pipe.unet.merge_and_unload()
pipe.text_encoder = pipe.text_encoder.merge_and_unload()

# Load the second LoRA on top of the merged weights
lora_path = ...  # path to the 3DMM_V11 LoRA
lora_name = "3DMM_V11"
pipe.unet = PeftModel.from_pretrained(
    pipe.unet,
    f"{lora_path}/unet",
    adapter_name=lora_name,
)
pipe.text_encoder = PeftModel.from_pretrained(
    pipe.text_encoder,
    f"{lora_path}/text_encoder",
    adapter_name=lora_name,
)

torch.manual_seed(1428928479)
image = pipe(
    prompt="candid RAW portrait photo of a woman (Crystal Simmerman:1.0) with (dark hair:1.0) and a (purple colored suit:1.0) on a dark street with shopping windows (at night:1.2), bokeh, Ilford Delta 3200 film, dof, high definition, detailed, intricate, flashlight",
    negative_prompt="bad-hands-5, asian, cropped, lowres, poorly drawn face, out of frame, blurry, blurred, text, watermark, disfigured, closed eyes, ugly, cartoon, render, 3d, plastic, 3d (artwork), rendered, comic",
    num_inference_steps=20,
    guidance_scale=7,
).images[0]
```
  • diffusers + peft output:

diffusers_peft_twolora_step_by_step_output

We can see that the results are quite similar to what we get in webui. So we can definitely say that the problem lies in how the weighted adapter for two LoRAs is created.

Mixture of two LoRAs - what is going on? - diving deeper

From my perspective, there is a possible error inside the LoraModel.add_weighted_adapter method. These are the relevant lines (loop context reconstructed around the excerpt):

```python
for adapter, weight in zip(adapters, weights):
    if adapter not in target.lora_A:
        continue
    target.lora_A[adapter_name].weight.data += (
        target.lora_A[adapter].weight.data * weight * target.scaling[adapter]
    )
    target.lora_B[adapter_name].weight.data += target.lora_B[adapter].weight.data * weight
```

A LoRA is an addition to the base weights:

$h = W_0 x + B A x $

So a mixture of multiple LoRAs should be calculated like this:

$h = W_0 x + \alpha_1 B_1 A_1 x + \alpha_2 B_2 A_2 x + \ldots$

But currently, for LoRAs of the same rank, the mixture is calculated like this:

$h = W_0 x + (B_1 + B_2 + \ldots) (\alpha_1 A_1 + \alpha_2 A_2 + \ldots) x$

Expanding this product introduces cross terms such as $\alpha_2 B_1 A_2 x$ and $\alpha_1 B_2 A_1 x$ that do not appear in the correct mixture.
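A quick numerical check (shapes chosen arbitrarily, names hypothetical) confirms that the two expressions disagree:

```python
import torch

torch.manual_seed(0)
d, r = 8, 4
A1, A2 = torch.randn(r, d), torch.randn(r, d)
B1, B2 = torch.randn(d, r), torch.randn(d, r)
a1, a2 = 1.0, 1.0

correct = a1 * (B1 @ A1) + a2 * (B2 @ A2)  # sum of the individual deltas
current = (B1 + B2) @ (a1 * A1 + a2 * A2)  # what add_weighted_adapter computes
print(torch.allclose(correct, current))    # False: cross terms B1 @ A2, B2 @ A1 remain
```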

Mixture of multiple LoRAs - possible solutions:

I see the following possible solutions to overcome this issue (minimal sketches of options 1 and 2 follow this list):

  1. Perform concatenation instead of a sum (concatenate $B$ and $A$ along different dims):

    • pros: easy to implement, and we can mix LoRAs with different ranks;
    • cons: the output rank grows with every LoRA we mix in, so combining many LoRAs can produce an adapter with a very large rank, which leads to a serious performance drawback.
  2. Perform some decomposition (like SVD) of just the LoRA mixture $\alpha_1 B_1 A_1 + \alpha_2 B_2 A_2 + \ldots$ and drop the least important components:

    • pros: we can get any output rank we want, and we can mix LoRAs with different ranks;
    • cons: there will definitely be some accuracy loss if the rank is too small, and the interface of add_weighted_adapter changes.
  3. Replace the base weights with the merged LoRAs and store a copy of the base weights for unmerging/unmixing:

    • pros: the most reliable solution; we would be able to merge, unmerge, mix, and unmix anything we want (as far as I understand, this is what webui does);
    • cons: we need to store a copy of the base weights, a lot of code would have to be rewritten, and current interfaces would break.
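Hypothetical helpers (assumed names and shapes, not the PEFT API) sketching options 1 and 2 for a single layer:

```python
import torch

def concat_merge(loras, weights):
    """Option 1: concatenate factors so the deltas add exactly (rank = sum of ranks).

    loras: list of (B, A) pairs, B of shape (d_out, r_i), A of shape (r_i, d_in).
    """
    B_new = torch.cat([B for B, _ in loras], dim=1)
    A_new = torch.cat([w * A for (_, A), w in zip(loras, weights)], dim=0)
    return B_new, A_new  # B_new @ A_new == sum_i w_i * (B_i @ A_i)

def svd_merge(loras, weights, out_rank):
    """Option 2: sum the weighted deltas, then truncate back to out_rank via SVD."""
    delta = sum(w * (B @ A) for (B, A), w in zip(loras, weights))
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
    B_new = U[:, :out_rank] * S[:out_rank].sqrt()                # (d_out, out_rank)
    A_new = S[:out_rank].sqrt().unsqueeze(1) * Vh[:out_rank, :]  # (out_rank, d_in)
    return B_new, A_new
```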

@pacman100 I am not sure whether this applies only to my case (maybe things work differently for text models), but I would be happy to help your team fix this issue.

Expected behavior

From my perspective, merging multiple LoRAs in peft should work just like merging in webui.

@kovalexal kovalexal changed the title Adding weighted adapter as LoRAs combination gives unexpected result on StableDiffusion compared to webui Adding weighted adapter as LoRAs combination gives unexpected result with StableDiffusion compared to webui Jun 27, 2023
@pacman100
Contributor

Hello @kovalexal, what a detailed, insightful, and helpful issue description. Thank you!

Yes, I know the weighted adapter method isn't mathematically equivalent to merging LoRAs one after another. I mentioned consecutive merging in #280 (comment).

The current implementation is inspired by https://github.com/cloneofsimo/lora/tree/master, which seems to work in practice:
Screenshot 2023-06-28 at 12 58 31 PM

I agree that it is mathematically incorrect, but it is an easier way of mixing LoRAs.

I believe point 2 would fit properly without many changes:

    Perform some decomposition (like SVD) of just the LoRA mixture and drop the least important components:

    pros: we can get any output rank we want, and we can mix LoRAs with different ranks;
    cons: there will definitely be some accuracy loss if the rank is too small, and the interface of add_weighted_adapter changes.

@kovalexal
Contributor Author

Hello @pacman100, thanks for the clarification!

I'll dig into it when I have some capacity.

@pacman100
Contributor

Hello, the merged PR #695 should address this using point 2 you suggested, the SVD decomposition. The new rank is the max of the ranks of the LoRAs being combined.
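For reference, a usage sketch (assuming the call signature stays the same as in the snippets above):

```python
# After PR #695, add_weighted_adapter combines the LoRA deltas via SVD.
pipe.unet.add_weighted_adapter(["add_detail", "3DMM_V11"], [1.0, 1.0], "combined")
pipe.text_encoder.add_weighted_adapter(["add_detail", "3DMM_V11"], [1.0, 1.0], "combined")
pipe.unet.set_adapter("combined")
pipe.text_encoder.set_adapter("combined")
```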

@pacman100
Contributor

Also, I have been working on adding PEFT support to Kohya-ss for training and to webui extensions for inference.

PEFT training of DreamBooth: pacman100/peft-dreambooth (Branch: peft-dreambooth/ at smangrul/add-peft-support)

Extension to use PEFT in webui: pacman100/peft-sd-webui-additional-networks (Branch: peft-sd-webui-additional-networks/ at smangrul/add-peft-support)

Sample output trying it out:

Screenshot 2023-07-15 at 2 25 05 PM

@kovalexal
Contributor Author

@pacman100 Wow, great, thank you, a very useful addition!

I've also worked on my own version of SVD decomposition for LoRA weights. I assumed it could be useful to also let users specify an output rank for the combined adapter (so one can create an adapter with similar characteristics at the cost of some precision). Would you mind if I create a PR for this?

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

@kovalexal
Contributor Author

This issue was fully addressed in #817; we can now get results identical to what we get in webui!
