
Adapter support #378

Closed
QLutz opened this issue May 30, 2023 · 7 comments
Labels
feature request New feature or request

Comments

@QLutz

QLutz commented May 30, 2023

Feature request

Enable the use of locally stored adapters created by huggingface/peft. Ideally, this should be compatible with the most notable benefits of TGI (e.g. sharding and flash attention).

Motivation

Using models fine-tuned with PEFT is currently possible only by merging the adapter back into the original weights of the model. This is especially cumbersome in terms of disk space for use cases where the user has many adapters for a single base model.
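
For reference, a minimal sketch of that workaround using PEFT's `merge_and_unload` (the model ID and paths below are placeholders). Each merged copy duplicates the full base weights on disk, which is what makes the many-adapters case so costly:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model and attach a locally stored adapter.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16
)
model = PeftModel.from_pretrained(base, "path/to/local-adapter")

# Fold the LoRA weights into the base model and save a plain
# transformers checkpoint that TGI can serve as-is.
merged = model.merge_and_unload()
merged.save_pretrained("path/to/merged-model")
```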

Your contribution

I'm not sure how much work this would involve or whether it is feasible at all (notably, enabling sharding with adapters). I'll gladly read any insights on the complexity and the relevance of adding this feature.

@OlivierDehaene
Member

Hello!
This is something that we want to support in the future, but our bandwidth is very limited at the moment.

@OlivierDehaene OlivierDehaene added the feature request New feature or request label May 30, 2023
@chris-aeviator

Here's a WIP/PoC of loading an adapter model via PEFT: ohmytofu-ai@aba56c1.

This is addressing #896 (comment). I cannot test hot-swapping right now since I'm trying to finish my LlamaModel inference server, and it seems to be missing load_adapter methods, which I guess I'm going to implement.

@rudolpheric

I am also very interested. @OlivierDehaene: Are there any updates on when it will be implemented?

@xiaoyunwu

Instead of supporting more models, shouldn't we get this working first? I am interested in this. S-LoRA is already out there.

@Narsil
Collaborator

Narsil commented Nov 27, 2023

Actually, this has already been working for quite a while.

Closing this.

@Narsil Narsil closed this as completed Nov 27, 2023
@QLutz
Author

QLutz commented Nov 27, 2023

@Narsil Unless I missed something (in which case I'd be very grateful for any pointers), TGI does not yet support loading multiple adapters for a single base model simultaneously.

Some mechanisms have been implemented that automate the merging of an adapter into its base model (PR #935), but that is more of a (most welcome!) convenience feature than the more ambitious adapter support first described in this thread (best represented in the state of the art today by S-LoRA).
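
For concreteness, here is a minimal sketch (using PEFT directly, not TGI) of what serving several adapters on top of one set of base weights looks like; the model ID, adapter paths, and adapter names are illustrative only, and an S-LoRA-style server would additionally batch requests across adapters rather than switching the active one:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Attach several adapters to the same base weights.
model = PeftModel.from_pretrained(base, "adapters/customer-a", adapter_name="customer-a")
model.load_adapter("adapters/customer-b", adapter_name="customer-b")

# Only one adapter is active per call here; a multi-LoRA server would
# route each request in a batch to its own adapter without merging.
model.set_adapter("customer-a")
```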

Obviously, the final decision will come from your end, but I think many of us would like to know if you plan on adding this to TGI.

@mhillebrand

@Narsil vLLM recently merged a multi-LoRA feature into their main branch. Perhaps this ticket should be reopened?
