initial multi-lora support #1103
Conversation
I'll try to review this one and #1098 later today. Better supporting LoRAs is a priority for me now, and your PRs are very helpful. I'm also considering using a custom version of PEFT in requirements.txt to support applying LoRAs to 4-bit models.
server.py (outdated)
@@ -211,8 +211,9 @@ def create_model_menus():
     ui.create_refresh_button(shared.gradio['model_menu'], lambda: None, lambda: {'choices': get_available_models()}, 'refresh-button')
     with gr.Column():
         with gr.Row():
-            shared.gradio['lora_menu'] = gr.Dropdown(choices=available_loras, value=shared.lora_name, label='LoRA')
+            shared.gradio['lora_menu'] = gr.CheckboxGroup(choices=available_loras, value=shared.lora_names, label='LoRA model(s)')
I don't know how many LoRAs people might end up with, but you could maybe keep this a Dropdown and add the multiselect=True argument. Probably a clearer UI experience?
Oh wow, I'm dumb. In #853 this is what the auto webui used for styles (my suggested option #4), but I never looked into how that was done. This is indeed much cleaner. Going to change that and push, though it seems I'll need to rebase and force-push, as the main branch changed code right next to the server.py edits here.
rebuilt off main (435c600 to 0143b92)
If you select multiple LoRAs (4 or 5, say), the row containing the new button and the LoRA dropdown grows in an awkward way. Is it possible to implement this menu in such a way that it occupies a constant area and never grows?
uhh... by really forcing it with CSS, it prevents vertical growth, at the cost of a different awkwardness: if you have more LoRAs than fit on one line, they're hidden behind a scrollbar. I don't know if that's worth doing? I can add it if you prefer.
if lora_name not in ['None', '']:
    print(f"Adding the LoRA {lora_name} to the model...")
    # Only adding, and already peft? Do it the easy way.
Is it correct to assume that the model is already peft? For instance, if you load llama-7b without the --lora argument, it will not have been loaded with PeftModel.from_pretrained.

Edit: okay, this is only executed if len(set(shared.lora_names)) > 0, in which case the model will have been loaded with PeftModel.from_pretrained.
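To make the branch being discussed concrete, here is a minimal sketch of the dispatch this guard implies: if LoRAs are already applied, the model must be a PeftModel and new adapters can be attached cheaply; otherwise the base model is wrapped first. The stub classes and the function name apply_loras are illustrative stand-ins, not the real transformers/peft objects or the PR's actual code.

```python
class BaseModel:
    """Stand-in for a plain (non-peft) transformers model."""


class PeftWrapped:
    """Stand-in for peft's PeftModel; tracks which adapters were attached."""

    def __init__(self, base, first_lora):
        self.base = base
        self.adapters = [first_lora]

    def load_adapter(self, lora):
        # Cheap path: attach another adapter to an already-wrapped model.
        self.adapters.append(lora)


def apply_loras(model, current_loras, new_loras):
    """Attach any LoRAs in new_loras that aren't already applied."""
    added = [l for l in new_loras if l not in current_loras]
    if not added:
        return model
    if len(current_loras) > 0:
        # Model is already peft-wrapped: "do it the easy way".
        for lora in added:
            model.load_adapter(lora)
        return model
    # Fresh model: wrap it with the first LoRA, then attach the rest.
    model = PeftWrapped(model, added[0])
    for lora in added[1:]:
        model.load_adapter(lora)
    return model
```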
@@ -25,7 +38,11 @@ def add_lora_to_model(lora_name):
    elif shared.args.load_in_8bit:
        params['device_map'] = {'': 0}

-    shared.model = PeftModel.from_pretrained(shared.model, Path(f"{shared.args.lora_dir}/{lora_name}"), **params)
+    shared.model = PeftModel.from_pretrained(shared.model, Path(f"{shared.args.lora_dir}/{lora_names[0]}"), **params)
Also related to the comment above: if the model is "fresh", is it necessary to reload it with PeftModel.from_pretrained?
If I'm not mistaken, this isn't actually a full reload; it just takes the non-peft model and wraps it (applying the first LoRA in the process). It definitely runs a lot faster than a full model load, and at least it doesn't print any of the loading noise to the console.
requests
rwkv==0.7.3
safetensors==0.3.0
sentencepiece
pyyaml
tqdm
git+https://github.com/huggingface/peft
Just to be sure, is the dev version of peft required? The code seems to run without errors with peft==0.2.0.
peft 0.2.0 was released March 9th (https://github.com/huggingface/peft/releases), and multi-adapter support was merged April 6th (huggingface/peft#263), so yes, it's needed. I'm not sure what could lead it to seemingly work on 0.2.0 for you; possibly you accidentally had a different version installed while testing, since you were recently testing johnsmith0031/alpaca_lora_4bit#13 as well? idk.
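Since multi-adapter support postdates the 0.2.0 release, any environment reporting peft 0.2.0 or older cannot have it. A hedged sketch of that check; the function names and the simple dotted-version parsing are mine, not part of the PR:

```python
def parse_version(v: str) -> tuple:
    """Parse a dotted version string like '0.2.0' into a comparable tuple,
    ignoring non-numeric suffixes such as 'dev0'."""
    return tuple(int(part) for part in v.split(".") if part.isdigit())


def peft_supports_multi_adapter(installed: str) -> bool:
    """Multi-adapter support was merged after the 0.2.0 release, so any
    version at or below 0.2.0 cannot include it (hypothetical check)."""
    return parse_version(installed) > parse_version("0.2.0")
```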
It's definitely needed then.
I did some reorganizing and it looks fine now, no need to change the CSS. What I really wanted was for the model and the lora dropdowns to be on the same line.
For #853
Contains initial support for loading multiple LoRAs at once.
Works as a checkbox group, with a refresh button, and an apply button.
If you checkmark new loras that weren't checkmarked before, it loads them very quickly. If you uncheck prior loras, it removes all of them for now and then re-adds the remaining selection. Either way, it's much faster than a full model reload.
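The add/remove behavior described above reduces to a set comparison of the previous and current checkbox selections. A minimal sketch, assuming this shape of the logic (the function name and return format are hypothetical, not the PR's code):

```python
def plan_lora_update(previous, selected):
    """Decide how to apply a changed LoRA selection: additions alone can be
    loaded incrementally, but any removal forces a remove-all-and-re-add
    cycle, as described in the PR (hypothetical helper)."""
    prev, new = set(previous), set(selected)
    added = new - prev
    removed = prev - new
    if removed:
        # Something was unchecked: drop everything, then re-add the selection.
        return {"action": "rebuild", "loras": list(selected)}
    if added:
        # Only additions: fast path, just load the new adapters.
        return {"action": "add", "loras": sorted(added)}
    return {"action": "none", "loras": []}
```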
The merge_and_unload function seems to not support 8-bit.

This alters requirements.txt to require a direct git copy of peft for now, as they haven't published a release with this feature yet.
I have not fully tested the results of generating with multiple LoRAs, only that they load/unload and the model still works.
I have also not tested against the possibility of memleaks or other issues arising from repeatedly mucking with loras on the fly.
I have only tested in 8bit with LLaMA-13B currently.