
Add support for HF Inference endpoints via text-generation-inference / EasyLLM #190

Open
ggbetz opened this issue Aug 21, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

ggbetz commented Aug 21, 2023

Hi Luca! It's excellent that one can use 🔥LMQL with locally served open models. Now, it would be handy if we could use open LLMs running in the cloud as easily as OpenAI's models. E.g.:

argmax 
    "Say 'this is a test':[RESPONSE]" 
from 
    "hf:meta-llama/Llama-2-70b-chat-hf" 
where 
    len(TOKENS(RESPONSE)) < 10

@philschmid has built EasyLLM, and I wonder whether it might be fairly straightforward to implement this feature on top of it?

If you think it's worthwhile, too, some pointers about where to start and a short instruction/plan for implementing that feature would be very welcome.

Cheers, Gregor

@lbeurerkellner lbeurerkellner added the enhancement New feature or request label Aug 21, 2023
lbeurerkellner (Collaborator) commented Aug 21, 2023

Hi Gregor, thanks for the suggestion. Additional backends are always welcome. We recently added first support for replicate.com, which lets you run open models in the cloud (@charles-dyfis-net is working on this).

After a brief look, EasyLLM seems promising and could be a way to swap OpenAI models for other models while mocking the same interface. However, judging from https://github.com/philschmid/easyllm/blob/main/easyllm/schema/openai.py#L60, EasyLLM appears to lack the most crucial feature LMQL requires to operate with its full feature set: the logit_bias parameter.
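To make the requirement concrete, here is a hedged sketch of the request shape at issue: an OpenAI-style completion payload carrying a logit_bias map (token id, as a string, mapped to a bias in [-100, 100]). A backend mocking the OpenAI interface would need to accept and honor this field. The function name and the token id below are made up for illustration, not part of any existing API.

```python
def build_completion_request(model, prompt, banned_token_ids, max_tokens=10):
    """Build an OpenAI-style completion payload that suppresses given token ids.

    Hypothetical helper for illustration; the `logit_bias` field follows the
    OpenAI API convention: string token ids mapped to biases in [-100, 100],
    where -100 effectively bans a token and +100 effectively forces it.
    """
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "stream": True,  # LMQL consumes tokens as a stream
        "logit_bias": {str(t): -100 for t in banned_token_ids},
    }

payload = build_completion_request(
    "meta-llama/Llama-2-70b-chat-hf", "Say 'this is a test':", [50256]
)
print(payload["logit_bias"])  # {'50256': -100}
```

A compliant backend would have to honor this field consistently across streaming and batched requests, which is exactly the compatibility gap discussed below.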

logit_bias is what allows LMQL to guide the model during text generation according to the query program and its constraints. Unfortunately, of the projects implementing OpenAI-like APIs that I have looked at, none so far implements it with full fidelity (e.g. streaming, batching + logit bias), which is required for LMQL compatibility. I admit that keeping up with the OpenAI API and achieving good compliance is hard and a moving target, but it is unfortunately a requirement for use with LMQL.
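Mechanically, what a backend has to do with logit_bias is simple: add the per-token biases to the raw logits before picking the next token, so a large negative bias masks a token out and a large positive bias forces it. The toy sketch below (not LMQL's or any backend's actual implementation) illustrates this with a three-token vocabulary and greedy decoding:

```python
def apply_logit_bias(logits, logit_bias):
    """Return logits with per-token-id biases added (token id -> bias)."""
    return [l + logit_bias.get(i, 0.0) for i, l in enumerate(logits)]

def greedy_pick(logits):
    """Greedy decoding: pick the token id with the highest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])

logits = [1.0, 3.0, 2.0]                         # toy vocabulary of 3 tokens
biased = apply_logit_bias(logits, {1: -100.0})   # ban token 1

print(greedy_pick(logits))  # 1  (token 1 has the highest raw logit)
print(greedy_pick(biased))  # 2  (token 1 is masked, token 2 wins)
```

This per-step masking is what lets a constraint engine steer generation; if a backend drops or misapplies the biases, especially under streaming or batching, constraint enforcement silently breaks.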

Now, since the OpenAI APIs change a lot, deprecate quickly, and are increasingly proprietary and opaque (vendor lock-in), we decided to move away from relying on them too much and to build our own open protocol for efficient language model streaming instead: the language model transport protocol (LMTP). All non-OpenAI backends in LMQL have since moved to this protocol, and I would suggest it as a good starting point for building new backends (consider the random model for a simple reference implementation). We could already add simple (non-constraint-enabled) HuggingFace text-generation-inference support, but we are still waiting for logit_bias support on their end, cf. huggingface/text-generation-inference#810.

ggbetz (Author) commented Aug 21, 2023

Thanks for the detailed explanation, Luca. I understand. So let's wait for logit_bias to be implemented in TGI (I just upvoted the PR).

@ggbetz ggbetz closed this as completed Aug 21, 2023
@ggbetz ggbetz reopened this Aug 21, 2023
@lbeurerkellner lbeurerkellner changed the title HF Inference endpoints via EasyLLM? HF Inference endpoints via text-generation-inference / EasyLLM Sep 22, 2023
@lbeurerkellner lbeurerkellner changed the title HF Inference endpoints via text-generation-inference / EasyLLM Add support for HF Inference endpoints via text-generation-inference / EasyLLM Sep 22, 2023