
Error loading multiple LORAs on Transformers Adapter #4371

Closed

Xeba111 opened this issue Oct 23, 2023 · 13 comments
Labels: bug (Something isn't working), stale

Comments

Xeba111 commented Oct 23, 2023

Describe the bug

When trying to load multiple LoRAs at the same time, an error occurs and none of the LoRAs are applied.
Because of this bug, only one LoRA can be applied at a time when using the Transformers adapter.

Is there an existing issue for this?

  • I have searched the existing issues

Reproduction

  • LLAMA2 13B, loaded with Transformers, model loaded in 4-bit
  • 2 LoRAs trained, with the exact same configuration in both of them
  • 2 LoRAs selected to load onto the model
  • The error appears

Logs

2023-10-23 12:51:02 INFO:Applying the following LoRAs to llama2-13b: 13B-alpaca-v1, 13B-dolly-v1
Traceback (most recent call last):
  File "/home/cslab02/Desktop/TesisSebasMena/text-generation-webui/installer_files/env/lib/python3.11/site-packages/gradio/queueing.py", line 407, in call_prediction
    output = await route_utils.call_process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cslab02/Desktop/TesisSebasMena/text-generation-webui/installer_files/env/lib/python3.11/site-packages/gradio/route_utils.py", line 226, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cslab02/Desktop/TesisSebasMena/text-generation-webui/installer_files/env/lib/python3.11/site-packages/gradio/blocks.py", line 1550, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cslab02/Desktop/TesisSebasMena/text-generation-webui/installer_files/env/lib/python3.11/site-packages/gradio/blocks.py", line 1199, in call_function
    prediction = await utils.async_iteration(iterator)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cslab02/Desktop/TesisSebasMena/text-generation-webui/installer_files/env/lib/python3.11/site-packages/gradio/utils.py", line 519, in async_iteration
    return await iterator.__anext__()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cslab02/Desktop/TesisSebasMena/text-generation-webui/installer_files/env/lib/python3.11/site-packages/gradio/utils.py", line 512, in __anext__
    return await anyio.to_thread.run_sync(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cslab02/Desktop/TesisSebasMena/text-generation-webui/installer_files/env/lib/python3.11/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cslab02/Desktop/TesisSebasMena/text-generation-webui/installer_files/env/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/home/cslab02/Desktop/TesisSebasMena/text-generation-webui/installer_files/env/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cslab02/Desktop/TesisSebasMena/text-generation-webui/installer_files/env/lib/python3.11/site-packages/gradio/utils.py", line 495, in run_sync_iterator_async
    return next(iterator)
           ^^^^^^^^^^^^^^
  File "/home/cslab02/Desktop/TesisSebasMena/text-generation-webui/installer_files/env/lib/python3.11/site-packages/gradio/utils.py", line 649, in gen_wrapper
    yield from f(*args, **kwargs)
  File "/home/cslab02/Desktop/TesisSebasMena/text-generation-webui/modules/ui_model_menu.py", line 222, in load_lora_wrapper
    add_lora_to_model(selected_loras)
  File "/home/cslab02/Desktop/TesisSebasMena/text-generation-webui/modules/LoRA.py", line 27, in add_lora_to_model
    add_lora_transformers(lora_names)
  File "/home/cslab02/Desktop/TesisSebasMena/text-generation-webui/modules/LoRA.py", line 180, in add_lora_transformers
    merge_loras()
  File "/home/cslab02/Desktop/TesisSebasMena/text-generation-webui/modules/LoRA.py", line 16, in merge_loras
    shared.model.add_weighted_adapter(shared.lora_names, [1] * len(shared.lora_names), "__merged")
  File "/home/cslab02/Desktop/TesisSebasMena/text-generation-webui/installer_files/env/lib/python3.11/site-packages/peft/tuners/lora.py", line 660, in add_weighted_adapter
    new_rank = svd_rank or max(adapters_ranks)
                           ^^^^^^^^^^^^^^^^^^^
ValueError: max() arg is an empty sequence

System Info

- Ubuntu 20.04
- Latest Ooba (text-generation-webui) version
- CUDA 12.1
- Python 3.11
- NVIDIA A4000
Xeba111 added the bug label on Oct 23, 2023

FartyPants (Contributor) commented Oct 23, 2023

I wrote in #3120 that the PR uses obsolete PEFT code, but it was merged anyway. So ¯\_(ツ)_/¯

There are far more problems with this approach than just making merging work. For example, add_weighted_adapter will silently bail out the next time you try to merge LoRAs into the same adapter name, making the user think the new ones were applied when in fact nothing was done, etc. This is not handled in the merge at all.
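
For illustration, here is a minimal sketch of the kind of guard I mean, assuming shared.model is a peft.PeftModel with the selected LoRAs already loaded. The helper name remerge_loras is a placeholder, and the "__merged" adapter name just mirrors modules/LoRA.py; this is not the actual webui code.

def remerge_loras(model, lora_names, weights=None):
    # Hypothetical helper, not the webui implementation.
    weights = weights or [1.0] * len(lora_names)

    # add_weighted_adapter() silently returns if an adapter with the target
    # name already exists, so drop any previous merge result first
    # (delete_adapter is available on LoRA models in recent PEFT versions).
    if "__merged" in model.peft_config:
        model.delete_adapter("__merged")

    model.add_weighted_adapter(lora_names, weights, "__merged")
    model.set_adapter("__merged")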

But I don't know how to convince people that this is the wrong approach.

  1. The LoRA drop-down in the main UI should allow adding just a single LoRA using from_pretrained - it is the safest and always-working method. Reset the model, then use from_pretrained. No weird secret merging into a third adapter (a minimal sketch follows this list). This does NOT work like Stable Diffusion.
  2. A new tab (but I prefer an extension) for LoRA merging and switching needs to be made where the user has full control of it, otherwise it is useless. It needs to be transparent to the user what is going on (e.g. merging two LoRAs actually physically creates a third LoRA), and it needs to allow changing the weights, because 99.99% of the time merging two LoRAs with weights of 1 will not produce the result you want. It needs to handle PEFT peculiarities too...
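
As a minimal sketch of point 1, assuming the plain Transformers/PEFT stack (the model ID and LoRA path below are placeholders, not webui code):

from transformers import AutoModelForCausalLM
from peft import PeftModel

base_model_id = "meta-llama/Llama-2-13b-hf"   # placeholder base model
lora_path = "loras/13B-alpaca-v1"             # placeholder LoRA folder

# Reset/reload the base model, then attach exactly one LoRA.
base = AutoModelForCausalLM.from_pretrained(base_model_id, device_map="auto")
model = PeftModel.from_pretrained(base, lora_path)  # no hidden merging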

I know people want it to work like Stable Diffusion, but text is not images. A funny LoRA and a poetry LoRA will not create a funny-poetry merge. So we should work with what it is, not with what people imagine it is.

Edit: Retracting my statement (but leaving it here unedited). Since LoRA merging works fine on ExLlamav2, points 1 and 2 above are not a solution, as they apply to Transformers only.

Xeba111 (Author) commented Oct 23, 2023

@FartyPants it seems to me you understand a lot more about PEFT and LoRAs than I do. I am currently working on my undergraduate thesis, so I really need multiple LoRAs to work simultaneously. Could you explain to me how to do it or refer me to some online resource, please?

oobabooga (Owner) commented

I disagree with the idea that the whole web UI has to be rewritten to support this feature, and find it counterproductive to criticize everything without providing alternatives.

The current LoRA menu works fine for ExLlamav2 in the multi-lora case.

FartyPants (Contributor) commented Oct 23, 2023

@FartyPants it seems to me you understand a lot more about PEFT and LoRAs than I do. I am currently working on my undergraduate thesis, so I really need multiple LoRAs to work simultaneously. Could you explain to me how to do it or refer me to some online resource, please?

Generally, PEFT supports loading multiple LoRAs, but the main purpose is switching between them (which is instantaneous), because LLMs do not work the way pictures do.
The vastly different weights of two LoRAs produced during training make it almost impossible to arbitrarily merge them without one being absolutely dominant, so you need to adjust the weights = a lot of experimenting. I would often end up with one at 0.4 and the other at 0.9, while I could swear they were trained the same way...

There are also a few methods to merge LoRAs. One is linear (they need to be the same rank), but the result is basically a big pile of bull anyway, because you are averaging them together. Average two sets of weights and tell me the resulting effect before you do it - hahaha - it's just voodoo.
Another method is svd, which is sort of the same but can work with different ranks by re-ranking the smaller one. The last one is cat, which adds the adapters together and arguably produces a result that is close to what you imagine. The problem is that two rank-128 adapters create a new rank-256 adapter, etc... a few things like that and your GPU is gone, as all the sources and the result live in VRAM.
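
For reference, a rough sketch of what those three methods look like as PEFT calls, assuming model is a peft.PeftModel with adapters "lora_a" and "lora_b" already loaded (the adapter names and weights here are illustrative):

# linear: adapters must share the same rank; deltas are summed with the given weights.
model.add_weighted_adapter(["lora_a", "lora_b"], [0.5, 0.5], "merge_linear",
                           combination_type="linear")

# svd: handles different ranks by re-ranking the combined deltas through an SVD.
model.add_weighted_adapter(["lora_a", "lora_b"], [1.0, 1.0], "merge_svd",
                           combination_type="svd")

# cat: concatenates the adapters, so two rank-128 LoRAs yield a rank-256 result
# (more VRAM, but arguably closest to "both adapters applied").
model.add_weighted_adapter(["lora_a", "lora_b"], [0.9, 0.9], "merge_cat",
                           combination_type="cat")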

I've been wrestling with PEFT for a long time (and they have a number of issues),
so get my https://github.com/FartyPants/Playground extension - there is a tab called Lora-Rama.
Now:
  1. Load the model - don't do anything else.
  2. Load the first LoRA in Playground using Load Adapter; again, don't do anything else.
  3. Select the second LoRA and then use +Add Adapter. If everything went well, you should see two adapters loaded at the bottom.
  4. Unroll the tools accordion and there will be sliders. Select the method to be cat (it's useless to try the other methods). Move the sliders to 0.9 and 0.9 as a first trial.
  5. Click Merge A+B; hopefully it will create a third adapter - test it. It likely won't do what you want, so now you need to mess with the sliders to attenuate one so the other can peek through. Merge again, test. It will keep creating more and more adapters.
  6. If you find a good mix, dump all to a folder, then go there, pick the one that was good, and save it somewhere - this will be your merged LoRA that you can use at any time.
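
For those who prefer code, here is roughly what those steps amount to at the PEFT level - a sketch only, with placeholder model and LoRA paths, not the actual Playground implementation:

from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-13b-hf",
                                            device_map="auto")

# "Load Adapter": the first LoRA becomes the active adapter.
model = PeftModel.from_pretrained(base, "loras/lora_a", adapter_name="lora_a")
# "+Add Adapter": the second LoRA is loaded alongside the first.
model.load_adapter("loras/lora_b", adapter_name="lora_b")

# "Merge A+B" with the cat method and 0.9 / 0.9 as a first trial.
model.add_weighted_adapter(["lora_a", "lora_b"], [0.9, 0.9], "a_plus_b",
                           combination_type="cat")
model.set_adapter("a_plus_b")  # test this merged adapter

# "Dump all to folder": each adapter is written to its own subfolder,
# so you can keep the merge that worked and load it on its own later.
model.save_pretrained("merged_loras")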

  • Don't delete adapters - you likely won't be able to merge again.
  • If things go wrong, reload the model in the Model tab, then add the adapters again. Always Load, then Add.
  • If things still go wrong, try updating PEFT - PEFT is a can of worms and they don't bother bumping the version, so you don't really know which PEFT you have, only that it says version 5.

It is fascinating to see how two LoRAs work together - but remember, most of the time the LLM will just act schizophrenic. If you load, say, a philosophical LoRA and a funny LoRA, you will probably not get funny philosophy; rather, the model will try to be funny at one moment and philosophical at another. (You get some overlap, but not the way you imagine.)
The best usage I can think of is, for example, a LoRA that writes long text (trained with a low rank) and a LoRA that writes specific text (high rank); after the merge you get a LoRA that writes specific long text. Good luck.

Edit: to be clear - the above ONLY applies to Transformers.

FartyPants (Contributor) commented Oct 23, 2023

I disagree with the idea that the whole web UI has to be rewritten to support this feature, and find it counterproductive to criticize everything without providing alternatives.

The current LoRA menu works fine for ExLlamav2 in the multi-lora case.

That's good to know. I only know very little about PEFT and nothing about ExLlamav2. If it works with ExLlama as it should, then there is no issue, as it would probably be preferable for most people to use ExLlamav2 anyway. So forget I said anything (people should not listen to me anyway - just ask my wife).

I only wanted to avoid problems down the line, with people posting that merging doesn't work well or at all when I already knew a month ago that it doesn't. But that's only the PEFT case - so I'll be quiet, because I really know nothing about ExLlama. Again, sorry I interjected. Rock on. I'll keep my musings to myself next time.

Ph0rk0z (Contributor) commented Oct 23, 2023

It works in exl2 because, I think, it uses its own code. For GGUF, multiple LoRAs merge fine into the model, even into quants. For PEFT, the exact same LoRAs merge together but raise the model's perplexity scores and do not have the desired effect. The mind boggles.

I guess now that it works, I need to try running the same loras over exl2 and see what I get. It would be mighty funny if it worked well there too.

edit: welp... tried it, and it gets identical scores to the LoRA merged in Playground. At least we know Playground is working. Dunno why GGUF is the outlier; it definitely merged, as the LoRAs killed the previous repeat issue in the model.

FartyPants (Contributor) commented Oct 23, 2023

Did you also try to apply LoRAs to exl2 in Playground? I'd be interested to see if it works or needs some attention. I don't use anything except Transformers.
Answering my own question: no, it doesn't. Fixing it now...

Xeba111 (Author) commented Oct 23, 2023

@Ph0rk0z hey, your comment has made me curious about exllamav2. How do I train a LoRA with exllamav2? Or do I just use a normal Transformers LoRA with an exllamav2 model? Thanks in advance.

FartyPants (Contributor) commented Oct 24, 2023

In Training and Training PRO, the adapter is definitely created through PEFT - so it 100% expects the model to be Transformers (see the sketch below).
If ExLlama actually supports training (check with them), then it has to be handled their way, not the PEFT way.
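
To illustrate what "creates the adapter through PEFT" means, here is a minimal sketch of the usual pattern; the model ID and the LoRA hyperparameters are illustrative placeholders, not the webui's training defaults:

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-13b-hf")

config = LoraConfig(
    r=128,                                 # LoRA rank
    lora_alpha=256,
    target_modules=["q_proj", "v_proj"],   # which projections get adapters
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# get_peft_model() wraps a Transformers (PyTorch) model, which is why this
# training path requires the Transformers loader rather than ExLlama.
model = get_peft_model(base, config)
model.print_trainable_parameters()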

Ph0rk0z (Contributor) commented Oct 24, 2023

ExLlama v1/v2 can't train, but it loads LoRAs just fine. Only AutoGPTQ/alpaca_lora_4bit trains on GPTQ models. Otherwise you need the FP16 model and can use --load-in-4bit.

BTW, to merge LoRAs in Playground I used alpaca_lora_4bit to load the model, then performed the merges and saved. I then used the results on ExLlama and friends.

FartyPants (Contributor) commented Oct 24, 2023

Yeah, compared to PEFT, the LoRA handling in ExLlamav2 is more simplistic (but then, PEFT is used for Stable Diffusion and many other projects as well), yet also super clean and non-problematic. You load LoRA(s) sequentially and that's it. If you want to merge LoRAs, set weights, and then spawn a new LoRA, you still need to go through Transformers and PEFT right now.

It's more reminiscent of how Automatic1111 uses LoRAs as a chain, instead of how PEFT merges LoRAs, which comes with a ton of problems attached.

I added ExLlamav2 LoRA loading to Playground (just one LoRA for now, as a test).
If there is interest, I could later add setting different weights - it's not hard. Or just wait, as someone may add it on the ExLlamav2 end and then I don't have to do anything, hehehe.

Honestly, the ExLlamav2 LoRA approach is WAY better and much cleaner code to build upon than PEFT, even if it is missing stuff. It is far less over-engineered and very clean, not a hairball like PEFT. So I hope they continue. If they add training, I'd abandon PEFT in a heartbeat.

github-actions bot added the stale label on Dec 6, 2023

github-actions bot commented Dec 6, 2023

This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.

github-actions bot closed this as completed on Dec 6, 2023

linzm1007 commented

python server.py --auto-devices --gpu-memory 80 --listen --api --model-dir /data/mlops/modelDir/ --model Qwen1.5-7B-Chat --trust-remote-code --lora-dir /data/mlops/adapterDir --lora ouputDir_1 ouputDir_2

Applying the following LoRAs to Qwen1.5-7B-Chat: ouputDir_1, ouputDir_2
╭────────────────────────────────────────────────── Traceback (most recent call last) ──────────────────────────────────────────────────╮
│ /app/server.py:243 in │
│ │
│ 242 if shared.args.lora: │
│ ❱ 243 add_lora_to_model(shared.args.lora) │
│ 244 │
│ │
│ /app/modules/LoRA.py:18 in add_lora_to_model │
│ │
│ 17 else: │
│ ❱ 18 add_lora_transformers(lora_names) │
│ 19 │
│ │
│ /app/modules/LoRA.py:130 in add_lora_transformers │
│ │
│ 129 if len(lora_names) > 1: │
│ ❱ 130 merge_loras() │
│ 131 │
│ │
│ /app/modules/LoRA.py:152 in merge_loras │
│ │
│ 151 │
│ ❱ 152 shared.model.add_weighted_adapter(shared.lora_names, [1] * len(shared.lora_names), " │
│ 153 shared.model.set_adapter("__merged") │
│ │
│ /venv/lib/python3.10/site-packages/peft/tuners/lora.py:660 in add_weighted_adapter │
│ │
│ 659 # new rank is the max of all ranks of the adapters if not provided │
│ ❱ 660 new_rank = svd_rank or max(adapters_ranks) │
│ 661 else: │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: max() arg is an empty sequence

How to solve it?
