Error loading multiple LORAs on Transformers Adapter #4371
Comments
I wrote in (#3120) that the PR uses obsolete PEFT code, but it was merged anyway. There are far more problems with this approach than just making merging work. For example, add_weighted_adapter will silently bail out the next time you try to merge LoRAs into the same adapter name, making the user think the new ones were applied when in fact nothing was done, and this is not handled in the merge at all. But I don't know how to convince people that this is the wrong approach.
I know people want it to work like Stable Diffusion, but text is not images. A funny LoRA and a poetry LoRA will not produce a funny-poetry merge. So we should work with what it is, not what people imagine it is. Edit: Retracting my statement (but leaving it unedited here). Since LoRA merging works fine on exllama2, the above 1 and 2 are not a solution, as they apply to transformers only.
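For context, a minimal sketch of the PEFT flow being discussed, assuming current PEFT; the model id, adapter paths, names, and weights are placeholders for illustration, and the silent-no-op behavior described above is what you may hit when the target adapter name already exists:

```python
# Minimal sketch of merging two loaded LoRA adapters with PEFT.
# Model id, adapter paths, names, and weights are placeholders.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("base-model")  # placeholder model id

# Load two LoRA adapters under distinct names.
model = PeftModel.from_pretrained(base, "loras/funny", adapter_name="funny")
model.load_adapter("loras/poetry", adapter_name="poetry")

# Combine them into a new adapter; a linear combination requires equal ranks.
model.add_weighted_adapter(
    adapters=["funny", "poetry"],
    weights=[0.5, 0.5],
    adapter_name="merged",
    combination_type="linear",
)
model.set_adapter("merged")

# Calling add_weighted_adapter again with adapter_name="merged" can be a
# silent no-op on some PEFT versions (the behavior described above), so
# drop the stale adapter first if you intend to re-merge, e.g. with
# model.delete_adapter("merged") on versions that support it.
```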
@FartyPants it seems to me you understand a lot more about PEFT and LoRAs than I do. I am currently working on my undergraduate thesis, so I really need multiple LoRAs to work simultaneously. Could you explain to me how to do it, or refer me to some online resource, please?
I disagree with the idea that the whole web UI has to be rewritten to support this feature, and find it counterproductive to criticize everything without providing alternatives. The current LoRA menu works fine for ExLlamav2 in the multi-lora case.
Generally PEFT supports loading multiple LoRAs, but the main purpose is switching between them (which is instantaneous), because LLMs do not work the way images do. There are also a few methods to merge LoRAs - linear (they need to be the same rank) - but the result is basically a big pile of bull anyway, because you are averaging them together. Average two sets of weights and tell me the resulting effect before you do it - hahaha - it's just voodoo. I've been wrestling with PEFT for a long time (and they have a number of issues).
It is fascinating to see how two LoRAs work together - but remember, most of the time the LLM will try to be schizophrenic. If you load, say, a philosophical LoRA and a funny LoRA, you will probably not get funny philosophy; rather, the model will try to be funny at one moment and philosophical at another. (You get some overlap, but not how you imagine it.) Edit: to be clear - the above ONLY applies to transformers.
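To illustrate the switching use case mentioned above, a minimal sketch with placeholder model id, paths, and adapter names; switching the active adapter is cheap because all the adapter weights stay loaded:

```python
# Minimal sketch: load several LoRA adapters with PEFT and switch between them.
# Model id, paths, and adapter names are placeholders.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("base-model")  # placeholder model id
model = PeftModel.from_pretrained(base, "loras/philosophical", adapter_name="philosophical")
model.load_adapter("loras/funny", adapter_name="funny")

model.set_adapter("philosophical")  # active adapter for the next generations
# ... generate ...
model.set_adapter("funny")          # switching is near-instant; nothing is re-merged
```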
That's good to know. I know only very little about PEFT and nothing about exllama2. If it works with exllama as it should, then there is no issue, as it would probably be preferable to use exllama2 anyway for most people. So forget I said anything (people should not listen to me anyway - just ask my wife). I only wanted to avoid problems down the line, with people posting that merging doesn't work well or at all, when I already knew a month ago that it doesn't. But that's only the PEFT case - so I'll be quiet, because I really know nothing about exllama. Again, sorry I interjected. Rock on. I'll keep my musings to myself next time.
It works in exl2 because I think it uses its own code. For GGUF, multiple LoRAs merge fine into the model, even into quants. For PEFT, the same exact LoRAs merge together but raise the model's perplexity scores and do not have the desired effect. The mind boggles. I guess now that it works, I need to try running the same LoRAs over exl2 and see what I get. It would be mighty funny if it worked well there too. Edit: welp... tried it and it gets identical scores to the LoRA merged in Playground. At least we know Playground is working. Dunno why GGUF is the outlier; it definitely merged, as the LoRAs killed the previous repeat issue in the model.
Did you also try to apply LoRAs to exl2 in Playground? I'd be interested to see if it works or needs some attention. I don't use anything else except transformers.
@Ph0rk0z hey, your comment has made me curious about exllamav2. How do I train a LoRA with exllamav2? Or do I just use a normal Transformers LoRA with an exllamav2 model? Thanks in advance.
In Training and Training PRO it definitely creates the adapter through PEFT - so it 100% expects the model to be transformers.
Exllama v1/v2 can't train, but they load LoRAs just fine. Only autogptq/alpaca_lora_4bit trains on GPTQ models. Otherwise you need the FP16 model and can use --load-in-4bit. BTW, to merge a LoRA in Playground I used alpaca_lora_4bit to load the model, then performed the merges and saved. I then used the results on exllama and friends.
Yeah, compared to PEFT, the LoRA handling in exllama2 is more simplistic (but then PEFT is used for Stable Diffusion and many other projects as well), but also super clean and non-problematic. You load the LoRA(s) sequentially and that's it. If you want to merge LoRAs, set weights, and then spawn a new LoRA, you still need to go through transformers and PEFT right now. It's more reminiscent of how Automatic1111 uses LoRAs as a chain, instead of how PEFT uses LoRAs as a merge, which comes with a ton of problems attached to it. I added exllama2 LoRA loading in Playground (just one LoRA for now, as a test). Honestly, the exllama2 LoRA approach is WAY better and much cleaner code to build upon than PEFT, even if it is missing stuff. Far less over-engineered than PEFT and very clean, not like the PEFT hairball. So I hope they continue. If they add training, I'd abandon PEFT in a heartbeat.
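For anyone curious what the "load LoRA(s) sequentially" approach looks like on the exllamav2 side, here is a rough sketch; the class and method names are from memory and may differ between exllamav2 versions, and all paths are placeholders, so treat it as an illustration rather than a reference:

```python
# Rough sketch of applying a LoRA directly with exllamav2.
# API names are from memory and may vary between versions; paths are placeholders.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler
from exllamav2.lora import ExLlamaV2Lora

config = ExLlamaV2Config()
config.model_dir = "models/my-exl2-model"  # placeholder model directory
config.prepare()

model = ExLlamaV2(config)
model.load()
tokenizer = ExLlamaV2Tokenizer(config)
cache = ExLlamaV2Cache(model)

# Load the LoRA from its directory; it is applied per generation call
# rather than merged into the model weights.
lora = ExLlamaV2Lora.from_directory(model, "loras/my-lora")  # placeholder LoRA directory

generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)
settings = ExLlamaV2Sampler.Settings()
output = generator.generate_simple("Hello", settings, 64, loras=[lora])
print(output)
```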
This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.
python server.py --auto-devices --gpu-memory 80 --listen --api --model-dir /data/mlops/modelDir/ --model Qwen1.5-7B-Chat --trust-remote-code --lora-dir /data/mlops/adapterDir --lora ouputDir_1 ouputDir_2
Applying the following LoRAs to Qwen1.5-7B-Chat: ouputDir_1, ouputDir_2
How do I solve this?
Describe the bug
When trying to load multiple LORAs at the same time, an error occurs and none of the LORAs are applied.
This bug means only one LORA can be applied at a time using the Transformers adapter.
Is there an existing issue for this?
Reproduction
Screenshot
Logs
System Info