In [None]:
# Install the OpenAI Python package
%pip install openai

## Basic Usage

The RAPID framework does not use API keys. The OpenAI SDK, however, requires an `api_key` argument to initialize the client, so any placeholder string will do. The endpoint URL is where the router service is hosted; this example assumes the service is running locally and reachable on port 8000. Change the URL if your deployment differs.
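
Because the endpoint varies by deployment, one option is to read it from the environment rather than hard-coding it. The following is a minimal sketch, assuming a hypothetical `RAPID_ENDPOINT` environment variable (this name is an illustration and is not defined by RAPID or the OpenAI SDK); it falls back to the local URL used in this example.

```python
import os

# Local router URL used in this example
DEFAULT_RAPID_ENDPOINT = "http://localhost:8000/api/v1/dev/apirouter/lb/"


def resolve_rapid_endpoint(env=None):
    """Return the router endpoint, preferring the hypothetical RAPID_ENDPOINT variable."""
    env = os.environ if env is None else env
    # Treat an unset or blank variable as "use the local default"
    endpoint = env.get("RAPID_ENDPOINT", "").strip() or DEFAULT_RAPID_ENDPOINT
    # Normalize to a single trailing slash, matching the URL form used in this notebook
    return endpoint.rstrip("/") + "/"


print(resolve_rapid_endpoint())
```

The returned string could then be passed as `azure_endpoint` when constructing the client.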

In [None]:
import os
from openai import AzureOpenAI

AZURE_OAI_API_KEY = "NOT_NEEDED_FOR_RAPID_EXAMPLE"
azure_oai_api_version = "2024-02-01"
azure_oai_endpoint = "http://localhost:8000/api/v1/dev/apirouter/lb/"

# The appId for the RAPID route that points to the OpenAI deployment
rapid_app_id = "chatbot-gpt4o"

# The SDK requires an api_key value, but RAPID does not use it
client = AzureOpenAI(
    api_key=AZURE_OAI_API_KEY,
    api_version=azure_oai_api_version,
    azure_endpoint=azure_oai_endpoint,
)

Instead of the Azure OpenAI model deployment name, pass the app ID of an available RAPID route as `model`. The Azure API Management gateway routes the request to the Azure OpenAI model defined for that app ID and applies any enabled RAPID features (e.g., caching).

In [None]:
response = client.chat.completions.create(
    model = rapid_app_id,
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Does Azure OpenAI support customer managed keys?"},
        {"role": "assistant",
            "content": "Yes, customer managed keys are supported by Azure OpenAI."},
        {"role": "user", "content": "Do other Azure AI services support this too?"}
    ]
)

print(f"Assistant response: {response.choices[0].message.content}")
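
In the request above, the earlier turns of the conversation are carried in `messages` itself. As a minimal sketch of assembling that list on the client side, the hypothetical helper below (`build_messages` is not part of the OpenAI SDK or RAPID) builds it from a system prompt, prior (user, assistant) pairs, and the new user message.

```python
def build_messages(system_prompt, history, user_message):
    """Assemble a chat `messages` list from prior (user, assistant) turns."""
    messages = [{"role": "system", "content": system_prompt}]
    for user_text, assistant_text in history:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    # The new question always goes last
    messages.append({"role": "user", "content": user_message})
    return messages


messages = build_messages(
    "You are a helpful assistant.",
    [("Does Azure OpenAI support customer managed keys?",
      "Yes, customer managed keys are supported by Azure OpenAI.")],
    "Do other Azure AI services support this too?",
)
```

The resulting list can be passed as `messages=` to `client.chat.completions.create`.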

## Using State / Chat History

The state management feature must be enabled and configured correctly for this to work. Conversation history is stored in a database and processed by the framework based on the `x-thread-id` header.

In [None]:
thread_id = None

# Use with_raw_response to access the HTTP response headers, which carry the thread ID
response = client.chat.completions.with_raw_response.create(
    model = rapid_app_id,
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Does Azure OpenAI support customer managed keys?"},
    ]
)

thread_id = response.headers.get("x-thread-id")
completion = response.parse()
if thread_id:
    print(f"Thread ID: {thread_id}\nAssistant response: {completion.choices[0].message.content}")
else:
    print(f"Assistant response: {completion.choices[0].message.content}")

Once a thread ID has been generated, send it as the `x-thread-id` header in subsequent requests. The framework then preserves the chat history for the current conversation session and supplies that history to the LLM as additional context.

_Note:_ To start a new session, omit the `x-thread-id` header from your request.

In [None]:
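# Send the header only if a thread ID was returned; header values must be strings, not None
thread_header = {"x-thread-id": thread_id} if thread_id else {}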

response2 = client.chat.completions.create(
    model = rapid_app_id,
    messages = [
        {"role": "user", "content": "Do other Azure AI services support this too?"},
    ],
    extra_headers = thread_header
)

print(f"Assistant response: {response2.choices[0].message.content}")
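
The capture-and-reuse pattern in the two cells above can be wrapped in a small helper. This is a sketch only: `ThreadSession` is a hypothetical class, not part of RAPID or the OpenAI SDK, and it assumes the router returns `x-thread-id` on each response as described above. As in the cells above, the system prompt is sent only on the first turn of a session.

```python
class ThreadSession:
    """Tracks the x-thread-id returned by the router across turns (hypothetical helper)."""

    def __init__(self, client, model, system_prompt=None):
        self.client = client
        self.model = model
        self.system_prompt = system_prompt
        self.thread_id = None  # no thread yet: the first request starts a new session

    def ask(self, user_message):
        messages = []
        headers = {}
        if self.thread_id:
            # Continue the existing conversation
            headers["x-thread-id"] = self.thread_id
        elif self.system_prompt:
            # First turn of a session: include the system prompt
            messages.append({"role": "system", "content": self.system_prompt})
        messages.append({"role": "user", "content": user_message})

        raw = self.client.chat.completions.with_raw_response.create(
            model=self.model,
            messages=messages,
            extra_headers=headers,
        )
        # Keep the previous ID if the response carries no header
        self.thread_id = raw.headers.get("x-thread-id") or self.thread_id
        return raw.parse().choices[0].message.content

    def reset(self):
        """Forget the thread ID so the next request starts a new session."""
        self.thread_id = None
```

With the `client` and `rapid_app_id` defined earlier, usage might look like `session = ThreadSession(client, rapid_app_id, "You are a helpful assistant.")` followed by repeated `session.ask(...)` calls; calling `session.reset()` drops the header so that the next request begins a fresh conversation.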