
Add support for HF Inference endpoints via text-generation-inference / EasyLLM #190

Open
ggbetz opened this issue Aug 21, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

ggbetz commented Aug 21, 2023

Hi Luca! It's excellent that one can use 🔥LMQL with locally served open models. Now, it would be handy if we could use open LLMs running in the cloud as easily as OpenAI's models. E.g.:

argmax 
    "Say 'this is a test':[RESPONSE]" 
from 
    "hf:meta-llama/Llama-2-70b-chat-hf" 
where 
    len(TOKENS(RESPONSE)) < 10

@philschmid has built EasyLLM, and I wonder whether it might be fairly straightforward to implement this feature on top of it?

If you think it's worthwhile, too, some pointers about where to start and a short instruction/plan for implementing that feature would be very welcome.

Cheers, Gregor

@lbeurerkellner lbeurerkellner added the enhancement New feature or request label Aug 21, 2023
lbeurerkellner (Collaborator) commented Aug 21, 2023

Hi Gregor, thanks for the suggestion. Additional backends are always welcome. We recently added first support for replicate.com, which lets you run open models in the cloud (@charles-dyfis-net is working on this).

After a brief look, EasyLLM seems promising and could be a way to swap OpenAI models for other models while mocking the same interface. However, judging from https://github.com/philschmid/easyllm/blob/main/easyllm/schema/openai.py#L60, EasyLLM appears to lack the most crucial feature LMQL requires to operate with its full feature set: the logit_bias parameter.
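To make the requirement concrete, here is a hedged sketch of the request shape at issue: an OpenAI-style completion payload carrying a logit_bias map (token id, as a string, mapped to a bias in [-100, 100]). A backend mocking the OpenAI interface would need to accept and honor this field. The function name and the token id below are made up for illustration, not part of any existing API.

```python
def build_completion_request(model, prompt, banned_token_ids, max_tokens=10):
    """Build an OpenAI-style completion payload that suppresses given token ids.

    Hypothetical helper for illustration; the `logit_bias` field follows the
    OpenAI API convention: string token ids mapped to biases in [-100, 100],
    where -100 effectively bans a token and +100 effectively forces it.
    """
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "stream": True,  # LMQL consumes tokens as a stream
        "logit_bias": {str(t): -100 for t in banned_token_ids},
    }

payload = build_completion_request(
    "meta-llama/Llama-2-70b-chat-hf", "Say 'this is a test':", [50256]
)
print(payload["logit_bias"])  # {'50256': -100}
```

A compliant backend would have to honor this field consistently across streaming and batched requests, which is exactly the compatibility gap discussed below.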

logit_bias is what allows LMQL to guide the model during text generation according to the query program and its constraints. Unfortunately, of the projects implementing OpenAI-like APIs that I have looked at, none so far implements it with full fidelity (e.g. streaming, batching + logit bias), which is required for LMQL compatibility. I admit that keeping up with the OpenAI API and achieving good compliance is hard and a moving target, but it is unfortunately a requirement for use with LMQL.
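Mechanically, what a backend has to do with logit_bias is simple: add the per-token biases to the raw logits before picking the next token, so a large negative bias masks a token out and a large positive bias forces it. The toy sketch below (not LMQL's or any backend's actual implementation) illustrates this with a three-token vocabulary and greedy decoding:

```python
def apply_logit_bias(logits, logit_bias):
    """Return logits with per-token-id biases added (token id -> bias)."""
    return [l + logit_bias.get(i, 0.0) for i, l in enumerate(logits)]

def greedy_pick(logits):
    """Greedy decoding: pick the token id with the highest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])

logits = [1.0, 3.0, 2.0]                         # toy vocabulary of 3 tokens
biased = apply_logit_bias(logits, {1: -100.0})   # ban token 1

print(greedy_pick(logits))  # 1  (token 1 has the highest raw logit)
print(greedy_pick(biased))  # 2  (token 1 is masked, token 2 wins)
```

This per-step masking is what lets a constraint engine steer generation; if a backend drops or misapplies the biases, especially under streaming or batching, constraint enforcement silently breaks.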

Now, since the OpenAI APIs change a lot, deprecate quickly, and are increasingly proprietary and opaque (vendor lock-in), we decided to move away from relying on them too much and to build our own open protocol for efficient language model streaming instead: the language model transport protocol (LMTP). All non-OpenAI backends in LMQL have since moved to this protocol, and I would suggest it as a good starting point for building new backends (consider the random model for a simple reference implementation). We could already add simple (non-constraint-enabled) HuggingFace text-generation-inference support, but we are still waiting for logit_bias support on their end, cf. huggingface/text-generation-inference#810.

ggbetz (Author) commented Aug 21, 2023

Thanks for the detailed explanation, Luca. I understand. So let's wait for logit_bias to be implemented in TGI (I just upvoted the PR).

@ggbetz ggbetz closed this as completed Aug 21, 2023
@ggbetz ggbetz reopened this Aug 21, 2023
@lbeurerkellner lbeurerkellner changed the title HF Inference endpoints via EasyLLM? HF Inference endpoints via text-generation-inference / EasyLLM Sep 22, 2023
@lbeurerkellner lbeurerkellner changed the title HF Inference endpoints via text-generation-inference / EasyLLM Add support for HF Inference endpoints via text-generation-inference / EasyLLM Sep 22, 2023