add an OpenAI-compatible provider as a generic Enterprise LLM adapter #3218
Conversation
Experimental insiders build for testing purposes: cody.vsix.zip. To try it:
Ensure your …
Then reload …

ignore this, this build should not be used

Updated build which is anticipated to fix/improve the 'document this code', 'generate unit tests', and 'fix this code' functionality
Increasingly, LLM software is standardizing around the use of OpenAI-compatible endpoints. Some examples:

* [OpenLLM](https://github.com/bentoml/OpenLLM) (commonly used to self-host/deploy various LLMs in enterprises)
* [Huggingface TGI](huggingface/text-generation-inference#735) (and, by extension, [AWS SageMaker](https://aws.amazon.com/blogs/machine-learning/announcing-the-launch-of-new-hugging-face-llm-inference-containers-on-amazon-sagemaker/))
* [Ollama](https://github.com/ollama/ollama) (commonly used for running LLMs locally, useful for local testing)

All of these projects either have OpenAI-compatible API endpoints already, or are actively building out support for them.

On strat we are regularly working with enterprise customers that self-host their own specific-model LLM via one of these methods, and wish for Cody to consume an OpenAI endpoint (understanding that some specific model is on the other side and that Cody should optimize for / target that specific model).

Since Cody needs to tailor its behavior to the specific model (prompt generation, stop sequences, context limits, timeouts, etc.) and handle other provider-specific nuances, it is insufficient to simply expect that a customer-provided OpenAI-compatible endpoint is in fact 1:1 compatible with e.g. GPT-3.5 or GPT-4. We need to be able to configure/tune many of these aspects for the specific provider/model, even though it presents as an OpenAI endpoint.

In response to these needs, I am working on adding an 'OpenAI-compatible' provider proper: the ability for a Sourcegraph enterprise instance to advertise that, although it is connected to an OpenAI-compatible endpoint, there is in fact a specific model on the other side (starting with Starchat and Starcoder) and that Cody should target that configuration.

The _first step_ of this work is this change. After this change, an existing (current-version) Sourcegraph enterprise instance can configure an OpenAI endpoint for completions via the site config such as:

```
"cody.enabled": true,
"completions": {
  "provider": "openai",
  "accessToken": "asdf",
  "endpoint": "http://openllm.foobar.com:3000",
  "completionModel": "gpt-4",
  "chatModel": "gpt-4",
  "fastChatModel": "gpt-4",
},
```

The `gpt-4` model parameters will be sent to the OpenAI-compatible endpoint specified, but will otherwise be unused today. Users may then specify in their VS Code configuration that Cody should treat the LLM on the other side as if it were e.g. Starchat:

```
"cody.autocomplete.advanced.provider": "experimental-openaicompatible",
"cody.autocomplete.advanced.model": "starchat-16b-beta",
"cody.autocomplete.advanced.timeout.multiline": 10000,
"cody.autocomplete.advanced.timeout.singleline": 10000,
```

In the future, we will make it possible to configure the above options via the Sourcegraph site configuration, instead of each user needing to configure it in their VS Code settings explicitly.

Signed-off-by: Stephen Gutekanst <stephen@sourcegraph.com>
@philipp-spiess could I ask you for a quick review on this? I'd like to get this merged ASAP for a customer I've been iterating with. Once merged, I will continue improving / working on this over the next month, so if there's anything you think should be improved, just let me know.
It looks pretty straightforward, super excited for us to continue on this! I saw you were already looking at LiteLLM as well. I'm having some issues with starchat-beta so I haven't run it on my machine yet, but given that it's behind an experimental flag and in an alpha state, I'm not too concerned by that either. Great description of the next steps.

My only observation, though I'm assuming it's already on your radar too, is that there's quite a sprawl of "LLM prompt template" code. This means every model we add is going to make the code branch even more, and it's also very hard to see if you've missed something specific for a particular type of model. I feel we need to work towards a more robust way of configuring and generating prompts from standard prompt components that each implementation can assemble and disassemble, and to keep these specifics close together so that you can quickly scan and implement them as a whole. Some inspiration that comes to mind:

Also, in LMStudio the proxy can apply prompt formatting for you:
@RXminuS Yeah, totally agree - there's a lot we should do around prompt templating/generation; check out some of the discussion docs in #wg-cody-architecture. The general plan is to revise how we do that logic in a more thoughtful way and move it to the backend side.
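As a rough sketch of what "standard prompt components" could look like, the per-model specifics might be kept in one declarative place rather than branching code. This is purely hypothetical - not an existing Cody or Sourcegraph configuration key - and the Starchat control tokens below reflect its published chat format as best I recall:

```
{
  "starchat-16b-beta": {
    "system": "<|system|>\n{system}<|end|>\n",
    "user": "<|user|>\n{message}<|end|>\n",
    "assistant": "<|assistant|>\n{message}<|end|>\n",
    "stopSequences": ["<|end|>"]
  }
}
```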
What is this?
This is a first step towards an 'OpenAI-compatible API BYOLLM' provider.
For context, much LLM software is standardizing around the use of OpenAI-compatible APIs. Some examples:

* [OpenLLM](https://github.com/bentoml/OpenLLM) (commonly used to self-host/deploy various LLMs in enterprises)
* [Huggingface TGI](huggingface/text-generation-inference#735) (and, by extension, [AWS SageMaker](https://aws.amazon.com/blogs/machine-learning/announcing-the-launch-of-new-hugging-face-llm-inference-containers-on-amazon-sagemaker/))
* [Ollama](https://github.com/ollama/ollama) (commonly used for running LLMs locally, useful for local testing)
All of these projects either have OpenAI-compatible API endpoints already, or are actively building out support for them. For self-hosted enterprise customers, it is valuable to provide this as an option, as many customers need to host their own LLM due to data provenance concerns, regulations, etc. Effectively, we are working towards 'If you can provide Cody with an OpenAI-compatible API, Cody can connect to it and use it' as a self-hosted BYOLLM strategy.
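For concreteness, the wire format these servers emulate is roughly the OpenAI completions API. The model name, prompt, and numbers below are illustrative only, not taken from this PR:

```
POST /v1/completions
{
  "model": "starchat-16b-beta",
  "prompt": "def fibonacci(n):",
  "max_tokens": 64,
  "temperature": 0.2,
  "stop": ["\n\n"]
}

// Response (abridged)
{
  "model": "starchat-16b-beta",
  "choices": [
    { "text": "\n    if n <= 1:\n        return n\n    return fibonacci(n - 1) + fibonacci(n - 2)", "index": 0, "finish_reason": "stop" }
  ],
  "usage": { "prompt_tokens": 6, "completion_tokens": 28, "total_tokens": 34 }
}
```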
What is involved?
There are two parts to this:

* Client-side, this change adds an `experimental-openaicompatible` autocomplete provider based on the `fireworks` provider, since it is the most robust in terms of supporting multiple models and was just a good starting point for me.
* Server-side, this currently relies on the existing `openai` completions provider. But soon, I will also fork the backend `openai` provider and add support for an `openaicompatible` provider.

The reason for doing this (rather than simply reusing the existing `openai` provider) is that although all of this software aims to provide an 'OpenAI-compatible API', in practice implementations can differ a fair amount. Some examples:

* Some return stop sequences as the literal characters `"\n"` rather than actual newlines, so at least for local testing with Ollama we'd like to be able to handle that.
* Some expect the `model` field of the request to be `gpt3.5` or `gpt4`, matching OpenAI, while the model behind the scenes is actually Starchat, whereas others may return an error in such a case and expect you to set the `model` field to `starchat` instead.
* Unlike with the regular `openai` provider, we want to know more information client-side about the specific model.
* Some expose a `completions`-style API vs. a chat-based API; I'm still researching what various implementations like Ollama do here.
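To make the second bullet concrete, here are two hypothetical request bodies for the same Starchat deployment. The field names follow the OpenAI completions API, but the servers and values are invented purely for illustration:

```
// Server A maps OpenAI model names onto whatever model it actually hosts:
{ "model": "gpt-4", "prompt": "def fib(n):", "max_tokens": 64, "stop": ["\n\n"] }

// Server B rejects unknown OpenAI names and wants the real model id instead:
{ "model": "starchat", "prompt": "def fib(n):", "max_tokens": 64, "stop": ["\n\n"] }
```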
Test plan
For now, this is only confirmed working with Starchat beta, and in specific circumstances. I am actively working on making testing with Ollama and a few other models more straightforward.
After this change, an existing (current-version) Sourcegraph enterprise instance can configure an OpenAI endpoint for completions via a site config such as the following, pointing at a Starchat model:
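(This mirrors the example from the PR description above; the endpoint and access token are placeholders for an OpenLLM deployment serving Starchat.)

```
"cody.enabled": true,
"completions": {
  "provider": "openai",
  "accessToken": "asdf",
  "endpoint": "http://openllm.foobar.com:3000",
  "completionModel": "gpt-4",
  "chatModel": "gpt-4",
  "fastChatModel": "gpt-4",
},
```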
The `gpt-4` model parameters will be sent to the OpenAI-compatible endpoint specified, but are otherwise unused/unrespected today by the autocomplete provider. Users then need to configure, in their VS Code settings, that Cody should treat the LLM on the other side as if it were e.g. Starchat, via the provider options shown below.

I am working on making it possible to configure these options via the Sourcegraph site configuration, as an admin, for all users, instead of each user needing to configure them in their VS Code settings themselves.
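The provider options referred to above are the same as in the PR description:

```
"cody.autocomplete.advanced.provider": "experimental-openaicompatible",
"cody.autocomplete.advanced.model": "starchat-16b-beta",
"cody.autocomplete.advanced.timeout.multiline": 10000,
"cody.autocomplete.advanced.timeout.singleline": 10000,
```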
OpenLLM is rather tricky to set up, so Ollama is the easiest way to test; however, it has some notable differences. I'm working on making testing with Ollama easier. Given this is confirmed working/useful by some customers, I'd like to merge this as-is.
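For local experimentation, a site config pointing at Ollama might look roughly like the sketch below. This is an assumption, not something verified against this PR: it presumes Ollama's OpenAI-compatible endpoint on the default `localhost:11434` port (the exact path may differ), and Ollama ignores the access token, but the field still needs a value.

```
"cody.enabled": true,
"completions": {
  "provider": "openai",
  "accessToken": "ollama",
  "endpoint": "http://localhost:11434/v1",
  "completionModel": "gpt-4",
  "chatModel": "gpt-4",
  "fastChatModel": "gpt-4",
},
```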