# OpenAI Compatible API

[txtai](https://github.com/neuml/txtai) is an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows.

`txtai` has long been able to host a [FastAPI based service](https://neuml.github.io/txtai/api/). There are clients for [Python](https://github.com/neuml/txtai.py), [JavaScript](https://github.com/neuml/txtai.js), [Java](https://github.com/neuml/txtai.java), [Rust](https://github.com/neuml/txtai.rs) and [Go](https://github.com/neuml/txtai.go).

The API service also supports hosting OpenAI-compatible API endpoints. A standard OpenAI client can then be used to connect to a `txtai` service. This enables quickly trying `txtai` with a familiar-to-use client. It's also a way to do local/offline development testing using the OpenAI client.

This notebook walks through examples covering chat completions, vision models, embeddings, text to speech and transcription.

# Start API service

For this notebook, we'll run `txtai` through Docker.

Save the following to `/tmp/config/config.yml`.

## config.yml
```yaml
# Enable OpenAI compat endpoint
openai: True

# Load Wikipedia Embeddings index
cloud:
  provider: huggingface-hub
  container: neuml/txtai-wikipedia

# LLM instance
llm:
  path: llava-hf/llava-interleave-qwen-0.5b-hf

# RAG pipeline configuration
rag:
  path: hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4
  output: flatten
  system: You are a friendly assistant. You answer questions from users.
  template: |
    Answer the following question using only the context below. Only include information
    specifically discussed.

    question: {question}
    context: {context}

# Text to Speech
texttospeech:
  path: neuml/kokoro-fp16-onnx

# Transcription
transcription:
  path: distil-whisper/distil-large-v3
```

Start Docker service.

```
docker run -it -p 8000:8000 -v /tmp/config:/config -e CONFIG=/config/config.yml \
--entrypoint uvicorn neuml/txtai-gpu --host 0.0.0.0 txtai.api:app
```

Alternatively, txtai can be directly installed and run as follows:

```
pip install "txtai[all]" autoawq autoawq-kernels
CONFIG=/tmp/config/config.yml uvicorn "txtai.api:app"
```
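
Loading the models can take a while, so it can help to wait until the service answers HTTP requests before running the cells below. The following is a minimal sketch using only the standard library. The helper name `wait_for_http` and its parameters are illustrative and not part of `txtai`. It only confirms that something responds over HTTP at the URL, not that every pipeline is ready.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_http(url: str, timeout: float = 120.0, interval: float = 1.0) -> bool:
    """Poll url until any HTTP response arrives; return False if timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5):
                return True
        except urllib.error.HTTPError:
            # An error status (404, 401, ...) still means the server is answering
            return True
        except (urllib.error.URLError, http.client.HTTPException, OSError):
            # Connection refused or reset: service not up yet, retry shortly
            time.sleep(interval)
    return False


# Example (assumes the port mapping used in this notebook):
# wait_for_http("http://localhost:8000/", timeout=300)
```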

The API has token-based authorization built-in. [Read more on that here](https://neuml.github.io/txtai/api/customization/#dependencies).
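
As a hedged illustration of how this might be wired up: my understanding of the linked documentation is that the service reads a `TOKEN` environment variable holding the SHA-256 hash of the token, and that clients send the plain token as a Bearer header, which the OpenAI client does with its `api_key` value. Treat both points as assumptions and check them against the documentation. The helper name `token_digest` is illustrative.

```python
import hashlib


def token_digest(token: str) -> str:
    """Return the SHA-256 hex digest of a token.

    Assumption: this is the value the service expects in the TOKEN
    environment variable, while clients send the plain token.
    """
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


# Example: compute the value to export as TOKEN before starting the service
# print(token_digest("api-key"))
```

Under those assumptions, the service would be started with `TOKEN` set to the printed digest, and the `api_key` passed to the OpenAI client would be the original token.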

# Run a text chat completion

The first example will run a text chat completion. The model is a RAG pipeline - this is more sophisticated than just a simple LLM call!

Agents, pipelines and workflows can all be run through this interface!

In [1]:
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="api-key",
)

response = client.chat.completions.create(
    messages=[{
        "role": "user",
        "content": "Tell me about the iPhone",
    }],
    model="rag",
    stream=True
)

for chunk in response:
    if chunk.choices:
        # delta.content can be None (for example on the final chunk)
        print(chunk.choices[0].delta.content or "", end="")


The iPhone is a line of smartphones designed and marketed by Apple Inc. that uses Apple's iOS mobile operating system. The first-generation iPhone was announced by former Apple CEO Steve Jobs on January 9, 2007. 

Since then, Apple has annually released new iPhone models and iOS updates. The most recent models being the iPhone 16 and 16 Plus, and the higher-end iPhone 16 Pro and 16 Pro Max. 

More than 2.3 billion iPhones have been sold as of January 1, 2024, making Apple the largest vendor of mobile phones in 2023.
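
When the full text is needed rather than incremental printing, the streaming loop above can be wrapped in a small helper. This is a sketch that only relies on the chunk attributes already used in this notebook (`choices`, `delta`, `content`). The name `collect_stream` is illustrative.

```python
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Join the text deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in chunks:
        # Some chunks may carry no choices at all
        if not chunk.choices:
            continue
        content = chunk.choices[0].delta.content
        # Skip empty or None deltas
        if content:
            parts.append(content)
    return "".join(parts)


# Example: text = collect_stream(client.chat.completions.create(..., stream=True))
```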

As mentioned above, the `model` parameter can point to much of what's available in `txtai`. For example, let's run a chat completion backed by an embeddings search.

In [2]:
response = client.chat.completions.create(
    messages=[{
        "role": "user",
        "content": "Tell me about the iPhone",
    }],
    model="embeddings",
)

print(response.choices[0].message.content)

The iPhone is a line of smartphones designed and marketed by Apple Inc. that uses Apple's iOS mobile operating system. The first-generation iPhone was announced by former Apple CEO Steve Jobs on January 9, 2007. Since then, Apple has annually released new iPhone models and iOS updates. iPhone naming has followed various patterns throughout its history.


# Vision models

Any supported `txtai` LLM can be run through the chat completion API. Let's run an example that describes an image.

In [3]:
response = client.chat.completions.create(
    model="llm",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url", "image_url": {
                "url": "https://raw.githubusercontent.com/neuml/txtai/master/logo.png",
            }}
        ]}
    ]
)
print(response.choices[0].message.content)


The image shows a logo with the text "Txtai" in blue and green colors. The
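
The message structure used above follows the OpenAI chat format for mixed text and image content. A small helper, sketched here with an illustrative name (`image_message`), keeps that nesting in one place when asking about several images.

```python
def image_message(prompt: str, url: str) -> dict:
    """Build a user message that pairs a text prompt with an image URL."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": url}},
        ],
    }


# Example, reusing the request shown above:
# client.chat.completions.create(
#     model="llm",
#     messages=[image_message("What is in this image?", "https://raw.githubusercontent.com/neuml/txtai/master/logo.png")],
# )
```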


# Embeddings API

Next let's generate embeddings. 

In [4]:
for x in client.embeddings.create(input="This is a test", model="vectors").data:
    print(x)

Embedding(embedding=[-0.01969079300761223, 0.024085085839033127, 0.0043829963542521, -0.027423616498708725, 0.040405914187431335, 0.017446696758270264, ... (output truncated)

This uses the vector model associated with the current embeddings database.
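
A common next step with vectors like these is comparing two texts. The following sketch computes cosine similarity with the standard library only. The function name is illustrative, and the commented usage assumes the service accepts a list for `input`, as the OpenAI embeddings API does.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Return 0.0 when either vector has zero length to avoid dividing by zero
    return dot / norm if norm else 0.0


# Example (assumption: list input is supported by the service):
# data = client.embeddings.create(input=["This is a test", "A quick trial"], model="vectors").data
# print(cosine_similarity(data[0].embedding, data[1].embedding))
```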

# Text to Speech (TTS)

This API can do more than just work with text. Let's generate speech. 

In [5]:
from IPython.display import Audio, display

with client.audio.speech.with_streaming_response.create(
    model="neuml/kokoro-fp16-onnx",
    input="txtai is an all-in-one embeddings database for semantic search, LLM orchestration and semantic workflows",
    voice="bm_lewis",
) as response:
    response.stream_to_file(file="out.mp3")

display(Audio("out.mp3"))

# Transcription

The generated speech can also be transcribed back to text.

In [6]:
# Open the audio file with a context manager so the handle is closed afterwards
with open("out.mp3", "rb") as f:
    text = client.audio.transcriptions.create(
        model="whisper",
        file=f
    ).text

text

"Text AI is an all in one embedding's database for semantic search, LLM orchestration and semantic workflows."
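
The transcription is close to the input but not identical ("Text AI" for "txtai", "embedding's" for "embeddings"). One way to quantify that round trip is a word-level similarity score. This is a sketch using `difflib` from the standard library. The function names and the normalization rules are illustrative choices, not something `txtai` provides.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> list:
    """Lowercase, drop apostrophes and split on anything that is not a letter or digit."""
    text = text.lower().replace("'", "")
    return re.sub(r"[^a-z0-9]+", " ", text).split()


def word_similarity(expected: str, actual: str) -> float:
    """Similarity ratio between two texts, compared word by word (1.0 means identical)."""
    return SequenceMatcher(None, normalize(expected), normalize(actual)).ratio()


original = "txtai is an all-in-one embeddings database for semantic search, LLM orchestration and semantic workflows"
transcribed = "Text AI is an all in one embedding's database for semantic search, LLM orchestration and semantic workflows."
print(round(word_similarity(original, transcribed), 2))
```

A score near 1.0 indicates that only a few words differ, here the product name itself.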

# JavaScript client

Given that this is an OpenAI-compatible API, other OpenAI clients are also supported. Let's try a few examples with the JavaScript client.

Install via `npm install openai`

In [7]:
%%writefile chat.js

import OpenAI from "openai";

const openai = new OpenAI({
    baseURL: "http://localhost:8000/v1",
    apiKey: "api-key"
});

async function main() {
    const stream = await openai.chat.completions.create({
        model: "rag",
        messages: [{ role: "user", content: "Tell me about the iPhone" }],
        stream: true,
    });

    for await (const chunk of stream) {
        process.stdout.write(chunk.choices[0]?.delta?.content || "");
    }
}

main();

Overwriting chat.js


In [8]:
!node --no-warnings chat.js

The iPhone is a line of smartphones designed and marketed by Apple Inc. that uses Apple's iOS mobile operating system. The first-generation iPhone was announced by former Apple CEO Steve Jobs on January 9, 2007. As of January 1, 2024, more than 2.3 billion iPhones have been sold, making Apple the largest vendor of mobile phones in 2023.

As we can see, the output is consistent with what the Python client returned earlier.

Let's try generating speech.

In [None]:
%%writefile speech.js

import fs from "fs";
import path from "path";
import OpenAI from "openai";

const openai = new OpenAI({
    baseURL: "http://localhost:8000/v1",
    apiKey: "api-key"
});

const speechFile = path.resolve("./speech.mp3");

async function main() {
  const mp3 = await openai.audio.speech.create({
    model: "neuml/kokoro-fp16-onnx",
    input: "txtai is an all-in-one embeddings database for semantic search, LLM orchestration and semantic workflows",
    voice: "bm_lewis",
  });

  const buffer = Buffer.from(await mp3.arrayBuffer());
  await fs.promises.writeFile(speechFile, buffer);
}

main();

Overwriting speech.js


In [None]:
!node --no-warnings speech.js

In [11]:
display(Audio("speech.mp3"))

Speech is the same as above, as expected.

In [12]:
%%writefile transcribe.js

import fs from "fs";
import OpenAI from "openai";

const openai = new OpenAI({
    baseURL: "http://localhost:8000/v1",
    apiKey: "api-key"
});

async function main() {
  const transcription = await openai.audio.transcriptions.create({
    file: fs.createReadStream("speech.mp3"),
    model: "whisper",
  });

  console.log(transcription.text);
}

main();

Overwriting transcribe.js


In [13]:
!node --no-warnings transcribe.js

Text AI is an all in one embedding's database for semantic search, LLM orchestration and semantic workflows.


# Wrapping up

This notebook covered how to set up an OpenAI-compatible API endpoint for `txtai`. It enables quickly trying `txtai` with a familiar-to-use client. It's also a way to do local/offline development testing using the OpenAI client. Just another way to make it easier to use `txtai`!