
add an OpenAI-compatible provider as a generic Enterprise LLM adapter #3218

Merged
merged 10 commits into main from sg/openaicompatible on Mar 28, 2024

Conversation

@slimsag (Member) commented Feb 20, 2024

What is this?

This is a first step towards an 'OpenAI-compatible API BYOLLM' provider.

For context, much LLM software is standardizing around the use of OpenAI-compatible APIs. Some examples:

* [OpenLLM](https://github.com/bentoml/OpenLLM) (commonly used to self-host/deploy various LLMs in enterprises)
* [Huggingface TGI](huggingface/text-generation-inference#735) (and, by extension, [AWS SageMaker](https://aws.amazon.com/blogs/machine-learning/announcing-the-launch-of-new-hugging-face-llm-inference-containers-on-amazon-sagemaker/))
* [Ollama](https://github.com/ollama/ollama) (commonly used for running LLMs locally, useful for local testing)

All of these projects either have OpenAI-compatible API endpoints already, or are actively building out support for them. For self-hosted enterprise customers, it is valuable to provide this as an option, as many customers need to host their own LLM due to data provenance concerns, regulations, etc. Effectively, we are working towards 'if you can provide Cody with an OpenAI-compatible API, Cody can connect to it and use it' as a self-hosted BYOLLM strategy.
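To make the 'OpenAI-compatible' part concrete, here is a minimal sketch of the kind of chat-completions request such endpoints accept. The URL, access token, and model name below are placeholders for illustration, not values from this change:

```
// Minimal sketch of an OpenAI-compatible chat completions call (placeholder URL/token/model).
async function exampleChatCompletion(): Promise<void> {
    const response = await fetch('http://openllm.example.com:3000/v1/chat/completions', {
        method: 'POST',
        headers: {
            'Content-Type': 'application/json',
            Authorization: 'Bearer <access-token>',
        },
        body: JSON.stringify({
            model: 'starchat-16b-beta', // whatever model name the endpoint expects
            messages: [{ role: 'user', content: 'Write a hello world function in Go.' }],
            stream: false,
        }),
    })
    const completion = await response.json()
    // OpenAI-compatible responses put the generated text under choices[0].message.content.
    console.log(completion.choices?.[0]?.message?.content)
}

exampleChatCompletion().catch(console.error)
```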

What is involved?

There are two parts to this:

  1. Client-side provider (this change)
  • This change starts by supporting Starchat specifically, and has been tested/confirmed working for some customers who bring their own LLM via OpenAI-compatible endpoints.
  • This begins as a fork of the fireworks provider, since it is the most robust in terms of supporting multiple models and was just a good starting point for me.
  • This is where model-specific logic is implemented today (prompt generation, stop sequences, context limits, timeouts, etc.) - although some of this may move to the backend later depending on #wg-cody-architecture outcomes.
  2. Backend provider (future change)
  • Currently this PR primarily works against a Sourcegraph instance that is configured to use an openai completions provider. But soon, I will also fork the backend openai provider and add support for an openaicompatible provider.
  • The reason for this (rather than just sticking with the current openai provider) is that although all of this software aims to provide an 'OpenAI-compatible API', in practice implementations can differ a fair amount (a small sketch of the kind of normalization this implies follows this list). Some examples:
    • Ollama sometimes appears to handle newlines incorrectly, so responses contain "\\n" rather than actual newlines; at least for local testing with Ollama we'd like to be able to handle that.
    • Different OpenAI-API providers can expect different things. For example, one API endpoint may expect the model field of the request to be gpt3.5 or gpt4 (matching OpenAI) even though the model behind the scenes is actually Starchat, while another may return an error in that case and expect you to set the model field to starchat instead.
    • Some OpenAI-API providers expect you to send stop sequences appropriate for the model, while others expect you to configure that behind the scenes.
    • When debugging against e.g. your own OpenAI-compatible API endpoint (which some customers do plan to host/run themselves), it's nice to disable certain features until you get them working. For example, one may wish to develop against a Cody+Sourcegraph deployment and disable the streaming APIs until the basic non-streaming endpoints are working.
    • We want to be able to communicate to clients what the configured backend model is, e.g. even though the instance says it is an openai provider, the client should know more about the specific model behind it.
    • In some cases, e.g. someone intending to use StarCoder for autocomplete and a different model for chat, we may need to provide the ability to configure different endpoints for a completions-style API vs. a chat-based API. I'm still researching what various implementations like Ollama do here.
      • It is worth noting OpenAI does have a completions-style API, not just a chat-based one, although it is officially deprecated :)
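To illustrate the kind of per-endpoint normalization the future openaicompatible backend provider would need, here is a rough sketch with hypothetical option names (not the actual implementation):

```
// Rough sketch only: hypothetical quirk handling for an 'openaicompatible' provider.
interface OpenAICompatibleOptions {
    endpoint: string
    // Some endpoints want OpenAI-style names ('gpt-4'); others want the real model name ('starchat').
    modelNameOverride?: string
    // Some endpoints apply stop sequences server-side; others need them sent with each request.
    stopSequences?: string[]
    // Useful while bringing up a new endpoint: fall back to non-streaming requests.
    disableStreaming?: boolean
}

function normalizeCompletion(raw: string, options: OpenAICompatibleOptions): string {
    // e.g. Ollama has been observed returning literal "\n" sequences instead of real newlines.
    let text = raw.replace(/\\n/g, '\n')
    // Trim at the first stop sequence if the endpoint did not apply it server-side.
    for (const stop of options.stopSequences ?? []) {
        const index = text.indexOf(stop)
        if (index !== -1) {
            text = text.slice(0, index)
        }
    }
    return text
}
```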

Test plan

For now, this is only confirmed working with Starchat beta - and in specific circumstances. I am working actively on making testing with Ollama and with a few other models more straightforward.

After this change, an existing (current-version) Sourcegraph enterprise instance can configure an OpenAI endpoint for completions via the site config such as this pointing to a Starchat model:

  "cody.enabled": true,
  "completions": {
    "provider": "openai",
    "accessToken": "asdf",
    "endpoint": "http://openllm.foobar.com:3000",
    "completionModel": "gpt-4",
    "chatModel": "gpt-4",
    "fastChatModel": "gpt-4",
  },

The gpt-4 model parameters will be sent to the OpenAI-compatible endpoint specified, but are otherwise unused/ignored by the autocomplete provider today. Users then need to configure, in their VS Code settings, that Cody should treat the LLM on the other side as if it were e.g. Starchat, via the following provider options:

    "cody.autocomplete.advanced.provider": "experimental-openaicompatible",
    "cody.autocomplete.advanced.model": "starchat-16b-beta",
    "cody.autocomplete.advanced.timeout.multiline": 10000,
    "cody.autocomplete.advanced.timeout.singleline": 10000,

I am working on making it possible to configure these options via the Sourcegraph site configuration as an admin for all users, instead of each user needing to configure it in their VS Code settings themselves.

OpenLLM is rather tricky to set up, so Ollama is the easiest way to test; however, it has some notable differences. I'm working on making testing with Ollama easier. Given this is confirmed working/useful by some customers, I'd like to merge this as-is.

@slimsag (Member, Author) commented Feb 20, 2024

Experimental insiders build for testing purposes: cody.vsix.zip

To try it:

  1. Download + extract the .vsix file.
  2. Uninstall the current version of Cody you have in VS Code.
  3. Install the new version via the VSIX file:
(screenshot showing installation from the .vsix file)

Ensure your user settings (>Preferences: Open User Settings (JSON)) include something like this:

    "cody.autocomplete.advanced.provider": "experimental-openaicompatible",
    "cody.autocomplete.advanced.model": "starchat-16b-beta",
    "cody.autocomplete.advanced.timeout.multiline": 20000,
    "cody.autocomplete.advanced.timeout.singleline": 20000,

Then reload VS Code (>Developer: Reload Window) and it should take effect.

@slimsag (Member, Author) commented Mar 1, 2024

ignore this, this build should not be used

Updated build which is anticipated to fix/improve the 'document this code', 'generate unit tests', and 'fix this code' functionality

cody.vsix.zip

Increasingly, LLM software is standardizing around the use of
OpenAI-compatible endpoints. Some examples:

* [OpenLLM](https://github.com/bentoml/OpenLLM) (commonly used to self-host/deploy various LLMs in enterprises)
* [Huggingface TGI](huggingface/text-generation-inference#735) (and, by extension, [AWS SageMaker](https://aws.amazon.com/blogs/machine-learning/announcing-the-launch-of-new-hugging-face-llm-inference-containers-on-amazon-sagemaker/))
* [Ollama](https://github.com/ollama/ollama) (commonly used for running LLMs locally, useful for local testing)

All of these projects either have OpenAI-compatible API endpoints already,
or are actively building out support for them. On strat, we are regularly
working with enterprise customers that self-host their own specific-model
LLM via one of these methods and wish for Cody to consume an OpenAI
endpoint (understanding that some specific model is on the other side and
that Cody should optimize for / target that specific model).

Since Cody needs to tailor to a specific model (prompt generation, stop
sequences, context limits, timeouts, etc.) and handle other provider-specific
nuances, it is insufficient to simply expect that a customer-provided OpenAI
compatible endpoint is in fact 1:1 compatible with e.g. GPT-3.5 or GPT-4.
We need to be able to configure/tune many of these aspects to the specific
provider/model, even though it presents as an OpenAI endpoint.
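As a sketch of what 'tailoring to the specific model' could look like on the client side (the shape and values below are illustrative only, not the final configuration):

```
// Illustrative only: the kind of per-model knobs this work needs to make configurable.
interface ModelProfile {
    name: string                 // e.g. 'starchat-16b-beta'
    contextWindowChars: number   // how much context to pack into the prompt
    stopSequences: string[]      // model-specific stop sequences
    multilineTimeoutMs: number   // autocomplete timeouts, since self-hosted models can be slower
    singlelineTimeoutMs: number
    buildPrompt: (prefix: string, suffix: string) => string // model-specific prompt format
}

// Hypothetical values, loosely mirroring the VS Code settings shown elsewhere in this PR.
const exampleProfile: ModelProfile = {
    name: 'starchat-16b-beta',
    contextWindowChars: 8192,
    stopSequences: ['<|end|>'],
    multilineTimeoutMs: 10000,
    singlelineTimeoutMs: 10000,
    buildPrompt: (prefix, suffix) => `${prefix}<CURSOR>${suffix}`, // placeholder format
}
```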

In response to these needs, I am working on adding an 'OpenAI-compatible'
provider proper: the ability for a Sourcegraph enterprise instance to
advertise that although it is connected to an OpenAI compatible endpoint,
there is in fact a specific model on the other side (starting with Starchat
and Starcoder) and that Cody should target that configuration. The _first
step_ of this work is this change.

After this change, an existing (current-version) Sourcegraph enterprise
instance can configure an OpenAI endpoint for completions via the site
config such as:

```
  "cody.enabled": true,
  "completions": {
    "provider": "openai",
    "accessToken": "asdf",
    "endpoint": "http://openllm.foobar.com:3000",
    "completionModel": "gpt-4",
    "chatModel": "gpt-4",
    "fastChatModel": "gpt-4",
  },
```

The `gpt-4` model parameters will be sent to the OpenAI-compatible endpoint
specified, but will otherwise be unused today. Users may then specify in
their VS Code configuration that Cody should treat the LLM on the other
side as if it were e.g. Starchat:

```
    "cody.autocomplete.advanced.provider": "experimental-openaicompatible",
    "cody.autocomplete.advanced.model": "starchat-16b-beta",
    "cody.autocomplete.advanced.timeout.multiline": 10000,
    "cody.autocomplete.advanced.timeout.singleline": 10000,
```

In the future, we will make it possible to configure the above options
via the Sourcegraph site configuration instead of each user needing to
configure it in their VS Code settings explicitly.

slimsag marked this pull request as ready for review March 28, 2024 03:35
@slimsag (Member, Author) commented Mar 28, 2024

@philipp-spiess could I ask you for a quick review on this? I'd like to get this merged ASAP for a customer I've been iterating with. Once merged, I will continue improving / working on this over the next month, so if there's anything you think should be improved, just let me know.

@RXminuS (Contributor) commented Mar 28, 2024

It looks pretty straightforward, super excited for us to continue on this! Saw you were already looking at LiteLLM as well. I'm having some issues with starchat-beta so I haven't run it on my machine yet, but given that it's behind an experimental flag and in an alpha state, I'm not too concerned.

Great description of the next steps. My only observation, though I'm assuming it's already on your radar too, is that there's quite a sprawl of "LLM prompt template" code. This means every model we add introduces even more branching in the code, and it's also very hard to see if you've missed something specific to a particular type of model.

I feel we need to work towards a more robust way of configuring and generating prompts from standard prompt components that each implementation can assemble and disassemble, and to keep these specifics close together so that you can quickly scan and implement them as a whole (a rough sketch of what I mean follows below). Some inspiration that comes to mind:

Also in LMStudio the proxy can apply prompt formatting for you:
(screenshot of LMStudio's prompt format configuration) They have a few configs there that we might want to look at.
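For example (just a rough sketch of the direction I mean, with made-up names): a small set of standard prompt components that each model-specific formatter assembles in its own format, so all of a model's quirks live in one place:

```
// Rough sketch with made-up names: standard prompt components each model formats its own way.
interface PromptComponents {
    system: string
    context: string[] // e.g. snippets pulled from other files
    userInput: string
}

interface PromptFormatter {
    assemble(components: PromptComponents): string
    stopSequences: string[]
}

// One hypothetical formatter per model keeps that model's specifics close together.
const chatMlStyleFormatter: PromptFormatter = {
    assemble: ({ system, context, userInput }) =>
        [
            `<|system|>\n${system}<|end|>`,
            ...context.map(snippet => `<|user|>\nContext:\n${snippet}<|end|>`),
            `<|user|>\n${userInput}<|end|>`,
            '<|assistant|>',
        ].join('\n'),
    stopSequences: ['<|end|>'],
}
```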

slimsag enabled auto-merge (squash) March 28, 2024 15:16
@slimsag (Member, Author) commented Mar 28, 2024

@RXminuS Yeah, totally agree - there's a lot we should do around prompt templating/generation; check out some of the discussion docs in #wg-cody-architecture. The general plan is to revise how we do that logic in a more thoughtful way and move it to the backend side.

slimsag merged commit d296d98 into main Mar 28, 2024
19 of 20 checks passed
slimsag deleted the sg/openaicompatible branch March 28, 2024 16:21
slimsag added a commit that referenced this pull request Mar 28, 2024
slimsag mentioned this pull request Mar 28, 2024
valerybugakov pushed a commit that referenced this pull request Apr 22, 2024