This project is a drop-in replacement REST API for Vertex AI that is compatible with the OpenAI API specifications.
Examples:
- Chat with Bard in Chatbot UI
- Get help from Bard in VSCode
This project is inspired by the idea of LocalAI, but focuses on making the Vertex AI PaLM models on Google Cloud Platform more accessible to everyone.
It installs a Google Cloud Run service that translates OpenAI API calls into Vertex AI (PaLM) calls.
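To illustrate the translation idea, here is a minimal sketch of how an OpenAI `messages` list could be split into the pieces a PaLM-style chat request is built from, assuming a target format made of a context string (from `system` messages), the earlier conversation turns, and the newest user message. The function `split_openai_messages` and its return shape are hypothetical; the project itself is built on LangChain, and its actual mapping may differ.

```python
from typing import Dict, List, Tuple


# Hypothetical helper: illustrates mapping OpenAI chat messages onto a
# PaLM-style chat request. It is not the project's actual implementation.
def split_openai_messages(
    messages: List[Dict[str, str]],
) -> Tuple[str, List[Tuple[str, str]], str]:
    """Return (context, history, last_user_message) for an OpenAI message list."""
    context_parts: List[str] = []
    history: List[Tuple[str, str]] = []
    for msg in messages:
        role, content = msg["role"], msg["content"]
        if role == "system":
            # System prompts are collected into a single context string.
            context_parts.append(content)
        elif role in ("user", "assistant"):
            # Earlier turns are kept in order as (role, content) pairs.
            history.append((role, content))
        else:
            raise ValueError(f"unsupported role: {role!r}")
    if not history or history[-1][0] != "user":
        raise ValueError("the last non-system message must come from the user")
    # The newest user message is the prompt the model should answer.
    last_user_message = history.pop()[1]
    return "\n".join(context_parts), history, last_user_message
```

For example, a conversation of system, user, assistant, and user messages yields the system text as context, the first two turns as history, and the final user message as the prompt.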
Supported OpenAI API services:
OpenAI service | API endpoint | Supported |
---|---|---|
List models | /v1/models | ✅ |
Chat Completions | /v1/chat/completions | ✅ |
Completions (Legacy) | /v1/completions | ✅ |
Embeddings | /v1/embeddings | ✅ |
The software is developed in Python and based on FastAPI and LangChain.
Everything is designed to be very simple, so you can easily adjust the source code to your individual needs.
Note: You can find an example of customization in the `cologne` branch.
The Jupyter notebook `Vertex_AI_Chat.ipynb` contains step-by-step instructions that help you deploy the API backend and the Chatbot UI frontend as Google Cloud Run services.
Requirements:
Your user (the account used for deployment) must have the necessary permissions in the project. For a fast and hassle-free deployment, the "Owner" role is recommended.
In addition, the default compute service account (`[PROJECT_NR]-compute@developer.gserviceaccount.com`) must have the "Vertex AI User" role (`roles/aiplatform.user`).
Authenticate:
```bash
gcloud auth login
```
Set default project:
```bash
gcloud config set project [PROJECT_ID]
```
Run the following script to create a container image and deploy that container as a public API (which allows unauthenticated calls) in Google Cloud Run:
```bash
bash deploy.sh
```
Note: You can change the OpenAI API key and the Google Cloud region with environment variables:
```bash
export OPENAI_API_KEY="sk-XYZ"
export GOOGLE_CLOUD_LOCATION="europe-west1"
bash deploy.sh
```
The software is tested on Python 3.11. Create a virtual environment with the Python version you want to use and activate it before proceeding:
```bash
python3 -m venv tvenv
source tvenv/bin/activate
```
You also need the Google Cloud CLI, which includes the `gcloud` command-line tool.
Install requirements:
```bash
pip install -r requirements.txt
```
Authenticate:
```bash
gcloud auth application-default login
```
Set default project:
```bash
gcloud config set project [PROJECT_ID]
gcloud auth application-default set-quota-project [PROJECT_ID]
```
Run with the default model:
```bash
export DEBUG="True"
export OPENAI_API_KEY="sk-XYZ"
uvicorn vertex:app --reload
```
Or run with the `codechat-bison-32k` model and a higher output token limit:
```bash
export DEBUG="True"
export OPENAI_API_KEY="sk-XYZ"
export MODEL_NAME="codechat-bison-32k"
export MAX_OUTPUT_TOKENS="16000"
uvicorn vertex:app --reload
```
The application is now running on your local computer. You can reach it in a web browser at the following address:
http://localhost:8000/
HTTP request and response formats are consistent with the OpenAI API.
For example, to generate a chat completion, send a POST request to the `/v1/chat/completions` endpoint with the chat messages as the JSON request body:
```bash
curl --location 'http://localhost:8000/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer sk-XYZ' \
--data '{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "user",
      "content": "Say this is a test!"
    }
  ]
}'
```
Response:
```json
{
  "id": "cmpl-efccdeb3d2a6cfe144fdde11",
  "created": 1691577522,
  "object": "chat.completion",
  "model": "gpt-3.5-turbo",
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 0,
    "total_tokens": 0
  },
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Sure, this is a test."
      },
      "finish_reason": "stop",
      "index": 0
    }
  ]
}
```
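The same exchange can be done from Python without extra dependencies. This is a minimal sketch using only the standard library; it assumes the API runs locally on the default port with the example key `sk-XYZ`, and the helper names `build_chat_request`, `first_message_content`, and `chat` are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict

# Assumptions: local server on the default port, example API key.
API_BASE = "http://localhost:8000/v1"
API_KEY = "sk-XYZ"


def build_chat_request(prompt: str, model: str = "gpt-3.5-turbo") -> urllib.request.Request:
    """Build a POST request equivalent to the curl example (it is not sent here)."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )


def first_message_content(response: Dict[str, Any]) -> str:
    """Return the assistant text of the first choice in a chat.completion response."""
    if response.get("object") != "chat.completion":
        raise ValueError(f"unexpected object type: {response.get('object')!r}")
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


def chat(prompt: str) -> str:
    """Send the request and return the reply; requires the server to be running."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=60) as resp:
        return first_message_content(json.load(resp))
```

With the server running, `chat("Say this is a test!")` should return the assistant's text, such as "Sure, this is a test." in the response above. Since the example response reports 0 for every `usage` count, the sketch reads only the message content.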
The software is configured with environment variables.
The following variables exist, with these default values:
Variable | Default | Description |
---|---|---|
DEBUG | False | Show debug messages that help during development. |
GOOGLE_CLOUD_LOCATION | us-central1 | Google Cloud Platform region for API calls. |
GOOGLE_CLOUD_PROJECT_ID | [DEFAULT_AUTH_PROJECT] | Identifier for your project. If not specified, the project of authentication is used. |
HOST | 0.0.0.0 | Bind socket to this host. |
MAX_OUTPUT_TOKENS | 512 | Token limit determines the maximum amount of text output from one prompt. Can be overridden by the end user as required by the OpenAI API specification. |
MODEL_NAME | chat-bison | One of the foundation models that are available in Vertex AI. |
OPENAI_API_KEY | sk-[RANDOM_HEX] | Key used for authentication against the application. |
PORT | 8000 | Bind socket to this port. |
TEMPERATURE | 0.2 | Sampling temperature; controls the degree of randomness in token selection. Can be overridden by the end user as required by the OpenAI API specification. |
TOP_K | 40 | Changes how the model selects tokens for output: the next token is selected from among the K most probable tokens. |
TOP_P | 0.8 | Changes how the model selects tokens for output: tokens are selected from most probable to least probable until the sum of their probabilities equals the top-p value. Can be overridden by the end user as required by the OpenAI API specification. |
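The table can be read as "environment variable or default, unless the request says otherwise". Below is a minimal sketch of that idea under stated assumptions: the names `Settings`, `load_settings`, `_override`, and `sampling_params`, the accepted spellings of `DEBUG`, the length of the generated key, and the mapping of OpenAI's `max_tokens` onto `MAX_OUTPUT_TOKENS` are all hypothetical, not the project's code. `TOP_K` is never overridden because the OpenAI API has no `top_k` request parameter.

```python
import os
import secrets
from dataclasses import dataclass
from typing import Any, Dict, Mapping, Optional


@dataclass(frozen=True)
class Settings:
    debug: bool
    location: str
    project_id: Optional[str]  # None: use the project from authentication
    host: str
    port: int
    model_name: str
    api_key: str
    max_output_tokens: int
    temperature: float
    top_k: int
    top_p: float


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from environment variables, falling back to the documented defaults."""
    return Settings(
        debug=env.get("DEBUG", "False").lower() in ("1", "true", "yes"),
        location=env.get("GOOGLE_CLOUD_LOCATION", "us-central1"),
        project_id=env.get("GOOGLE_CLOUD_PROJECT_ID"),
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
        model_name=env.get("MODEL_NAME", "chat-bison"),
        # Assumption: mirrors the sk-[RANDOM_HEX] default with 48 hex characters.
        api_key=env.get("OPENAI_API_KEY") or "sk-" + secrets.token_hex(24),
        max_output_tokens=int(env.get("MAX_OUTPUT_TOKENS", "512")),
        temperature=float(env.get("TEMPERATURE", "0.2")),
        top_k=int(env.get("TOP_K", "40")),
        top_p=float(env.get("TOP_P", "0.8")),
    )


def _override(request: Mapping[str, Any], key: str, default: Any) -> Any:
    """Use the request value unless it is missing or null (0 is a valid value)."""
    value = request.get(key)
    return default if value is None else value


def sampling_params(settings: Settings, request: Mapping[str, Any]) -> Dict[str, Any]:
    """Merge per-request OpenAI parameters over the server defaults."""
    return {
        "max_output_tokens": _override(request, "max_tokens", settings.max_output_tokens),
        "temperature": _override(request, "temperature", settings.temperature),
        "top_p": _override(request, "top_p", settings.top_p),
        # The OpenAI API has no top_k field, so the server default always applies.
        "top_k": settings.top_k,
    }
```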
If your application uses the client libraries provided by OpenAI, you only need to set the `OPENAI_API_BASE` environment variable to your Google Cloud Run endpoint URL:
```bash
export OPENAI_API_BASE="https://openai-api-vertex-XYZ.a.run.app/v1"
python your_openai_app.py
```
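A common mistake is a doubled scheme (`https://https://...`) or a missing `/v1` suffix. The hypothetical helper `api_base_from_run_url` below sketches one way to normalize a Cloud Run URL into an API base URL; it is an illustration, not part of the project. Note also that `OPENAI_API_BASE` is read by the pre-1.0 `openai` Python package; version 1.0 and later reads `OPENAI_BASE_URL` instead, or accepts a `base_url` argument.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def api_base_from_run_url(run_url: str) -> str:
    """Turn a service URL into an OpenAI-style API base URL ending in /v1 (hypothetical helper)."""
    url = run_url.strip()
    # Drop repeated leading schemes, keeping only the last one ("https://https://x" -> "https://x").
    url = re.sub(r"^(?:https?://)+(?=https?://)", "", url)
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"expected an http(s) URL, got {run_url!r}")
    # Normalize the path so it ends in exactly one /v1.
    path = parts.path.rstrip("/")
    if not path.endswith("/v1"):
        path += "/v1"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
```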
When deploying the Chatbot UI application, the following environment variables must be set:
Variable | Value |
---|---|
OPENAI_API_KEY | API key generated during deployment |
OPENAI_API_HOST | Google Cloud Run URL |
Run the following script to create a container image from the GitHub source code and deploy that container as a public website (which allows unauthenticated calls) in Google Cloud Run:
```bash
export OPENAI_API_KEY="sk-XYZ"
export OPENAI_API_HOST="https://openai-api-vertex-XYZ.a.run.app"
bash chatbot-ui.sh
```
Set the following Chatbox settings:
Setting | Value |
---|---|
AI Provider | OpenAI API |
OpenAI API Key | API key generated during deployment |
API Host | Google Cloud Run URL |
The VSCode-OpenAI extension integrates OpenAI features into the Visual Studio Code editor.
To open its setup, you have two options:
- run the command "vscode-openai.configuration.show.quickpick", or
- click the vscode-openai status bar item in the bottom-left corner of VSCode.

During setup, select `openai.com` and enter the Google Cloud Run URL with `/v1` appended.
When deploying the Discord Bot application, the following environment variables must be set:
Variable | Value |
---|---|
OPENAI_API_KEY | API key generated during deployment |
OPENAI_API_BASE | Google Cloud Run URL with /v1 |
When deploying the ChatGPT in Slack application, the following environment variables must be set:
Variable | Value |
---|---|
OPENAI_API_KEY | API key generated during deployment |
OPENAI_API_BASE | Google Cloud Run URL with /v1 |
When deploying the ChatGPT Telegram Bot application, the following environment variables must be set:
Variable | Value |
---|---|
OPENAI_API_KEY | API key generated during deployment |
OPENAI_API_BASE | Google Cloud Run URL with /v1 |
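The integrations above differ mainly in whether they expect the bare Cloud Run URL (`OPENAI_API_HOST` for Chatbot UI) or the URL with `/v1` (`OPENAI_API_BASE` for the Discord, Slack, and Telegram bots). The hypothetical helper `integration_env` sketches how both forms could be derived from one URL; the dictionary keys are made-up labels, not names used by those projects.

```python
from typing import Dict


def integration_env(run_url: str, api_key: str) -> Dict[str, Dict[str, str]]:
    """Environment variables per client, following the tables above (hypothetical helper)."""
    # Bare service URL without a trailing slash or /v1 suffix.
    host = run_url.rstrip("/")
    if host.endswith("/v1"):
        host = host[: -len("/v1")]
    base = host + "/v1"
    with_base = {"OPENAI_API_KEY": api_key, "OPENAI_API_BASE": base}
    return {
        "chatbot-ui": {"OPENAI_API_KEY": api_key, "OPENAI_API_HOST": host},
        "discord-bot": dict(with_base),
        "chatgpt-in-slack": dict(with_base),
        "chatgpt-telegram-bot": dict(with_base),
    }
```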
Have a patch that will benefit this project? Awesome! Follow these steps to have it accepted.
- Please read how to contribute.
- Fork this Git repository and make your changes.
- Create a Pull Request.
- Incorporate review feedback into your changes.
- Accepted!
All files in this repository are under the Apache License, Version 2.0 unless noted otherwise.