
Adapter support #378

Closed
QLutz opened this issue May 30, 2023 · 7 comments
Labels
feature request New feature or request

Comments

@QLutz

QLutz commented May 30, 2023

Feature request

Enable the use of locally stored adapters created by huggingface/peft. Ideally, this should be compatible with the most notable benefits of TGI (e.g. sharding and flash attention).

Motivation

Using models fine-tuned with PEFT is currently possible only by merging the adapter back into the original weights of the model. This is especially cumbersome in terms of disk space for use cases where the user has many adapters for a single base model.
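
For reference, a minimal sketch of that workaround using PEFT's `merge_and_unload` (the model ID and paths below are placeholders). Each merged copy duplicates the full base weights on disk, which is what makes the many-adapters case so costly:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model and attach a locally stored adapter.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16
)
model = PeftModel.from_pretrained(base, "path/to/local-adapter")

# Fold the LoRA weights into the base model and save a plain
# transformers checkpoint that TGI can serve as-is.
merged = model.merge_and_unload()
merged.save_pretrained("path/to/merged-model")
```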

Your contribution

I'm not sure how much work this would involve or whether it is feasible at all (notably, enabling sharding with adapters). I'll gladly read any insights on the complexity and the relevance of adding this feature.

@OlivierDehaene
Member

Hello!
This is something that we want to support in the future, but our bandwidth is very limited at the moment.

@OlivierDehaene OlivierDehaene added the feature request New feature or request label May 30, 2023
@chris-aeviator

Here's a WIP/PoC of loading an adapter model via PEFT: ohmytofu-ai@aba56c1.

This is addressing #896 (comment). I cannot test hot-swapping right now since I'm trying to finish my LlamaModel inference server, and it seems to be missing load_adapter methods, which I guess I'm going to implement.

@rudolpheric

I am also very interested. @OlivierDehaene: Are there any updates on when it will be implemented?

@xiaoyunwu

Instead of supporting more models, shouldn't we get this working first? I am interested in this. S-LoRA is already out there.

@Narsil
Collaborator

Narsil commented Nov 27, 2023

Actually, this has already been working for quite a while.

Closing this.

@Narsil Narsil closed this as completed Nov 27, 2023
@QLutz
Author

QLutz commented Nov 27, 2023

@Narsil Unless I missed something (in which case I'd be very grateful for any pointers), TGI does not yet support loading multiple adapters for a single base model simultaneously.

Some mechanisms have been implemented that automate the merging of an adapter into its base model (PR #935), but that is more of a (most welcome!) convenience feature than the more ambitious adapter support first described in this thread (best represented in the state of the art today by S-LoRA).
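
For concreteness, here is a minimal sketch (using PEFT directly, not TGI) of what serving several adapters on top of one set of base weights looks like; the model ID, adapter paths, and adapter names are illustrative only, and an S-LoRA-style server would additionally batch requests across adapters rather than switching the active one:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Attach several adapters to the same base weights.
model = PeftModel.from_pretrained(base, "adapters/customer-a", adapter_name="customer-a")
model.load_adapter("adapters/customer-b", adapter_name="customer-b")

# Only one adapter is active per call here; a multi-LoRA server would
# route each request in a batch to its own adapter without merging.
model.set_adapter("customer-a")
```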

Obviously, the final decision will come from your end, but I think many of us would like to know if you plan on adding this to TGI.

@mhillebrand

@Narsil vLLM recently merged a multi-LoRA feature into their main branch. Perhaps this ticket should be reopened?
