add an OpenAI-compatible provider as a generic Enterprise LLM adapter #3218
Conversation
Experimental insiders build for testing purposes: cody.vsix.zip. To try it:
Ensure your …
Then reload …

ignore this, this build should not be used

Updated build which is anticipated to fix/improve the 'document this code', 'generate unit tests', and 'fix this code' functionality
Increasingly, LLM software is standardizing around the use of OpenAI-compatible endpoints. Some examples:

* [OpenLLM](https://github.com/bentoml/OpenLLM) (commonly used to self-host/deploy various LLMs in enterprises)
* [Huggingface TGI](huggingface/text-generation-inference#735) (and, by extension, [AWS SageMaker](https://aws.amazon.com/blogs/machine-learning/announcing-the-launch-of-new-hugging-face-llm-inference-containers-on-amazon-sagemaker/))
* [Ollama](https://github.com/ollama/ollama) (commonly used for running LLMs locally, useful for local testing)

All of these projects either have OpenAI-compatible API endpoints already, or are actively building out support for them.

On strat we are regularly working with enterprise customers that self-host their own specific-model LLM via one of these methods, and wish for Cody to consume an OpenAI endpoint (understanding that some specific model is on the other side and that Cody should optimize for / target that specific model).

Since Cody needs to tailor its behavior to the specific model (prompt generation, stop sequences, context limits, timeouts, etc.) and handle other provider-specific nuances, it is insufficient to simply expect that a customer-provided OpenAI-compatible endpoint is in fact 1:1 compatible with e.g. GPT-3.5 or GPT-4. We need to be able to configure/tune many of these aspects for the specific provider/model, even though it presents as an OpenAI endpoint.

In response to these needs, I am working on adding an 'OpenAI-compatible' provider proper: the ability for a Sourcegraph enterprise instance to advertise that, although it is connected to an OpenAI-compatible endpoint, there is in fact a specific model on the other side (starting with Starchat and Starcoder) and that Cody should target that configuration.

The _first step_ of this work is this change. After this change, an existing (current-version) Sourcegraph enterprise instance can configure an OpenAI endpoint for completions via the site config such as:

```
"cody.enabled": true,
"completions": {
  "provider": "openai",
  "accessToken": "asdf",
  "endpoint": "http://openllm.foobar.com:3000",
  "completionModel": "gpt-4",
  "chatModel": "gpt-4",
  "fastChatModel": "gpt-4",
},
```

The `gpt-4` model parameters will be sent to the OpenAI-compatible endpoint specified, but will otherwise be unused today. Users may then specify in their VS Code configuration that Cody should treat the LLM on the other side as if it were e.g. Starchat:

```
"cody.autocomplete.advanced.provider": "experimental-openaicompatible",
"cody.autocomplete.advanced.model": "starchat-16b-beta",
"cody.autocomplete.advanced.timeout.multiline": 10000,
"cody.autocomplete.advanced.timeout.singleline": 10000,
```

In the future, we will make it possible to configure the above options via the Sourcegraph site configuration, instead of each user needing to configure it in their VS Code settings explicitly.

Signed-off-by: Stephen Gutekanst <stephen@sourcegraph.com>
@philipp-spiess could I ask you for a quick review on this? I'd like to get this merged ASAP for a customer I've been iterating with. Once merged, I will continue improving / working on this over the next month, so if there's anything you think should be improved, just let me know.
It looks pretty straightforward, super excited for us to continue on this! I saw you were already looking at LiteLLM as well. I'm having some issues with starchat-beta so I haven't run it on my machine yet, but given that it's behind an experimental flag and in an alpha state, I'm not too concerned by that either. Great description of the next steps.

My only observation, though I'm assuming it's already on your radar too, is that there's quite a sprawl of "LLM prompt template" code. This means every model we add is going to make the code branch even more, and it's also very hard to see if you've missed something specific for a particular type of model. I feel we need to work towards a more robust way of configuring and generating prompts from standard prompt components that each implementation can assemble and disassemble, and to keep these specifics close together so that you can quickly scan and implement them as a whole. Some inspiration that comes to mind:

Also, in LMStudio the proxy can apply prompt formatting for you:
@RXminuS Yeah, totally agree - there's a lot we should do around prompt templating/generation; check out some of the discussion docs in #wg-cody-architecture. The general plan is to revise how we do that logic in a more thoughtful way and move it to the backend side.
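As a rough sketch of what "standard prompt components" could look like, the per-model specifics might be kept in one declarative place rather than branching code. This is purely hypothetical - not an existing Cody or Sourcegraph configuration key - and the Starchat control tokens below reflect its published chat format as best I recall:

```
{
  "starchat-16b-beta": {
    "system": "<|system|>\n{system}<|end|>\n",
    "user": "<|user|>\n{message}<|end|>\n",
    "assistant": "<|assistant|>\n{message}<|end|>\n",
    "stopSequences": ["<|end|>"]
  }
}
```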
What is this?
This is a first step towards an 'OpenAI-compatible API BYOLLM' provider.
For context, much LLM software is standardizing around the use of OpenAI-compatible APIs. Some examples:

* [OpenLLM](https://github.com/bentoml/OpenLLM) (commonly used to self-host/deploy various LLMs in enterprises)
* [Huggingface TGI](huggingface/text-generation-inference#735) (and, by extension, [AWS SageMaker](https://aws.amazon.com/blogs/machine-learning/announcing-the-launch-of-new-hugging-face-llm-inference-containers-on-amazon-sagemaker/))
* [Ollama](https://github.com/ollama/ollama) (commonly used for running LLMs locally, useful for local testing)
All of these projects either have OpenAI-compatible API endpoints already, or are actively building out support for them. For self-hosted enterprise customers, it is valuable to provide this as an option, as many customers need to host their own LLM due to data provenance concerns, regulations, etc. Effectively, we are working towards 'If you can provide Cody with an OpenAI-compatible API, Cody can connect to it and use it' as a self-hosted BYOLLM strategy.
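For concreteness, the wire format these servers emulate is roughly the OpenAI completions API. The model name, prompt, and numbers below are illustrative only, not taken from this PR:

```
POST /v1/completions
{
  "model": "starchat-16b-beta",
  "prompt": "def fibonacci(n):",
  "max_tokens": 64,
  "temperature": 0.2,
  "stop": ["\n\n"]
}

// Response (abridged)
{
  "model": "starchat-16b-beta",
  "choices": [
    { "text": "\n    if n <= 1:\n        return n\n    return fibonacci(n - 1) + fibonacci(n - 2)", "index": 0, "finish_reason": "stop" }
  ],
  "usage": { "prompt_tokens": 6, "completion_tokens": 28, "total_tokens": 34 }
}
```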
What is involved?
There are two parts to this:

* Client-side, this change adds an `experimental-openaicompatible` autocomplete provider based on the `fireworks` provider, since it is the most robust in terms of supporting multiple models and was just a good starting point for me.
* Server-side, this currently relies on the existing `openai` completions provider. But soon, I will also fork the backend `openai` provider and add support for an `openaicompatible` provider.

The reason for doing this (rather than simply reusing the existing `openai` provider) is that although all of this software aims to provide an 'OpenAI-compatible API', in practice implementations can differ a fair amount. Some examples:

* Some return stop sequences as the literal characters `"\n"` rather than actual newlines, so at least for local testing with Ollama we'd like to be able to handle that.
* Some expect the `model` field of the request to be `gpt3.5` or `gpt4`, matching OpenAI, while the model behind the scenes is actually Starchat, whereas others may return an error in such a case and expect you to set the `model` field to `starchat` instead.
* Unlike with the regular `openai` provider, we want to know more information client-side about the specific model.
* Some expose a `completions`-style API vs. a chat-based API; I'm still researching what various implementations like Ollama do here.
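To make the second bullet concrete, here are two hypothetical request bodies for the same Starchat deployment. The field names follow the OpenAI completions API, but the servers and values are invented purely for illustration:

```
// Server A maps OpenAI model names onto whatever model it actually hosts:
{ "model": "gpt-4", "prompt": "def fib(n):", "max_tokens": 64, "stop": ["\n\n"] }

// Server B rejects unknown OpenAI names and wants the real model id instead:
{ "model": "starchat", "prompt": "def fib(n):", "max_tokens": 64, "stop": ["\n\n"] }
```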
Test plan
For now, this is only confirmed working with Starchat beta, and in specific circumstances. I am actively working on making testing with Ollama and a few other models more straightforward.
After this change, an existing (current-version) Sourcegraph enterprise instance can configure an OpenAI endpoint for completions via a site config such as the following, pointing at a Starchat model:
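(This mirrors the example from the PR description above; the endpoint and access token are placeholders for an OpenLLM deployment serving Starchat.)

```
"cody.enabled": true,
"completions": {
  "provider": "openai",
  "accessToken": "asdf",
  "endpoint": "http://openllm.foobar.com:3000",
  "completionModel": "gpt-4",
  "chatModel": "gpt-4",
  "fastChatModel": "gpt-4",
},
```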
The `gpt-4` model parameters will be sent to the OpenAI-compatible endpoint specified, but are otherwise unused/unrespected today by the autocomplete provider. Users then need to configure, in their VS Code settings, that Cody should treat the LLM on the other side as if it were e.g. Starchat, via the provider options shown below.

I am working on making it possible to configure these options via the Sourcegraph site configuration, as an admin, for all users, instead of each user needing to configure them in their VS Code settings themselves.
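The provider options referred to above are the same as in the PR description:

```
"cody.autocomplete.advanced.provider": "experimental-openaicompatible",
"cody.autocomplete.advanced.model": "starchat-16b-beta",
"cody.autocomplete.advanced.timeout.multiline": 10000,
"cody.autocomplete.advanced.timeout.singleline": 10000,
```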
OpenLLM is rather tricky to set up, so Ollama is the easiest way to test; however, it has some notable differences. I'm working on making testing with Ollama easier. Given this is confirmed working/useful by some customers, I'd like to merge this as-is.
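For local experimentation, a site config pointing at Ollama might look roughly like the sketch below. This is an assumption, not something verified against this PR: it presumes Ollama's OpenAI-compatible endpoint on the default `localhost:11434` port (the exact path may differ), and Ollama ignores the access token, but the field still needs a value.

```
"cody.enabled": true,
"completions": {
  "provider": "openai",
  "accessToken": "ollama",
  "endpoint": "http://localhost:11434/v1",
  "completionModel": "gpt-4",
  "chatModel": "gpt-4",
  "fastChatModel": "gpt-4",
},
```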