A simple proxy allowing you to use generative AI models hosted on Google Cloud (Vertex AI) together with Tabby.
It's an HTTP server that listens on localhost, uses your Google Cloud Application Default Credentials (ADC) to obtain an access token, calls the appropriate Vertex AI endpoint, and translates requests and responses between the two APIs.
- `openai/chat` ⇒ DeepSeek-V3.2
- `mistral/chat` ⇒ Codestral 2
- `mistral/completion` ⇒ Codestral 2
- `azure/embedding` ⇒ Text embeddings API
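The route-to-model mapping can be sketched as follows. This is an illustration, not the proxy's actual code: the Vertex AI URL pattern follows the publicly documented publisher-model endpoints, but the publisher IDs, model IDs, and the `:rawPredict` verb used here are assumptions.

```python
# Sketch of how a proxy like this might resolve a Tabby model "kind"
# to a Vertex AI endpoint URL. The mapping is illustrative only.

BASE = "https://{region}-aiplatform.googleapis.com/v1/projects/{project}/locations/{region}"

# Tabby "kind" -> (publisher, model) on Vertex AI (hypothetical mapping)
ROUTES = {
    "mistral/completion": ("mistralai", "codestral-2"),
    "mistral/chat": ("mistralai", "codestral-2"),
    "openai/chat": ("deepseek-ai", "deepseek-v3.2-maas"),
    "azure/embedding": ("google", "text-embedding-005"),
}

def vertex_url(kind: str, project: str, region: str) -> str:
    """Build a Vertex AI publisher-model URL for a given Tabby model kind."""
    publisher, model = ROUTES[kind]
    base = BASE.format(project=project, region=region)
    return f"{base}/publishers/{publisher}/models/{model}:rawPredict"

print(vertex_url("mistral/chat", "my-project", "europe-west4"))
```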
First, download and install Tabby.
You can use the CPU-only version for Linux, Windows, or macOS if you don't need any local models.
Then, create `config.toml` in `$HOME/.tabby` on Linux or macOS, or `%USERPROFILE%\.tabby` on Windows, with the following contents:
```toml
[model.completion.http]
kind = "mistral/completion"
model_name = "codestral-2"
api_endpoint = "http://localhost:4000"
api_key = ""

[model.embedding.http]
kind = "azure/embedding"
model_name = "text-embedding"
api_endpoint = "http://localhost:4000"
api_key = ""

[model.chat.http]
kind = "openai/chat"
model_name = "deepseek-v3.2-maas"
api_endpoint = "http://localhost:4000"
api_key = ""
```
If you'd like to use Codestral 2 instead of DeepSeek for chat, replace the `model.chat.http` entry with this:
```toml
[model.chat.http]
kind = "mistral/chat"
model_name = "codestral-2"
api_endpoint = "http://localhost:4000"
api_key = ""
```
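Under the hood, the `azure/embedding` route presumably translates between the Azure/OpenAI-style embedding request Tabby sends and Vertex AI's `:predict` format. A rough sketch of that translation: the field names on the Vertex side (`instances`, `content`, `predictions`, `embeddings.values`) follow the public Text embeddings API, while the function names and everything else here are illustrative assumptions.

```python
# Sketch: translate an Azure/OpenAI-style embedding call to Vertex AI's
# predict format and back. Illustrative only; the proxy's real code may differ.

def to_vertex(request: dict) -> dict:
    """Azure-style {"input": ...} -> Vertex {"instances": [{"content": ...}]}"""
    texts = request["input"]
    if isinstance(texts, str):
        texts = [texts]
    return {"instances": [{"content": t} for t in texts]}

def from_vertex(response: dict) -> dict:
    """Vertex predictions -> Azure-style {"data": [{"embedding": [...]}]}"""
    return {
        "data": [
            {"object": "embedding", "index": i, "embedding": p["embeddings"]["values"]}
            for i, p in enumerate(response["predictions"])
        ]
    }

print(to_vertex({"input": ["hello", "world"]}))
```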
Depending on your needs, you'll need to enable one or more of the models listed above in your Google Cloud project.
First, install Pixi and the gcloud CLI.
Then, initialize your ADC for Google Cloud and set the environment variables:
```bash
gcloud auth application-default login
export GOOGLE_PROJECT_ID="your-cloud-project-id"
export GOOGLE_REGION="europe-west4"  # or "us-central1"; only used by Codestral 2
```

Finally, start the proxy:

```bash
pixi run start
```

Then start Tabby and go to http://localhost:8080/system to verify the models work.
To make things easier, you can use the provided docker image which already includes Tabby with the proxy.
First, follow the steps above to install gcloud and generate a token.
```bash
gcloud auth application-default login
mkdir secrets
cp "$HOME/.config/gcloud/application_default_credentials.json" "./secrets/google_adc.json"
```
Then, start the Docker container. In this example, Tabby will be accessible at http://localhost:11000/
```bash
# Note: bind mounts passed via --mount require absolute source paths,
# hence $(pwd) below.
docker run \
  --env GOOGLE_PROJECT_ID=your-cloud-project-id \
  --env GOOGLE_REGION=europe-west4 \
  --mount source="$(pwd)/secrets",destination=/run/secrets,type=bind \
  --mount source="$(pwd)/data",destination=/root/.tabby,type=bind \
  -p 11000:8080 \
  ghcr.io/fstanis/vertex2tabby:latest
```