llmai

llmai is a Python library for working with OpenAI, Azure OpenAI, Vertex AI, Anthropic, Google Gemini, DeepSeek, Amazon Bedrock, and ChatGPT through a shared set of message, tool, schema, and response primitives.

Today the repository includes adapters for:

ChatGPT
OpenAI
Azure OpenAI
Vertex AI
DeepSeek
Anthropic
Google Gemini
Amazon Bedrock

Each provider client exposes the same core entrypoint:

generate(..., stream=False)

Why This Exists

Provider SDKs differ in how they represent messages, tool calls, structured output, and streaming events. llmai normalizes those differences so application code can be written against one set of types and one call shape.

Installation

Install the project locally with uv:

uv sync

Or install it in editable mode with pip:

pip install -e .

Quick Start

from llmai import OpenAIClient
from llmai.shared import UserMessage

# "OPENAI_API_KEY" is a placeholder; pass your actual key string here.
client = OpenAIClient(api_key="OPENAI_API_KEY")

result = client.generate(
    model="your-openai-model",
    messages=[
        UserMessage(content="Write a two-line poem about clean interfaces."),
    ],
)

print(result.content)
print(result.usage)
print(result.duration_seconds)

For text-only prompts, UserMessage(content="...") is the simplest form. You can also pass explicit content parts like TextContentPart when you need mixed multimodal input or tighter control over message structure.

If you want to swap providers, the overall call shape stays the same. In most cases you only need to change the client class, credentials, and model name.
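Because every client shares the same generate(...) entrypoint, application code can be written against that call shape rather than a specific provider. The following is a minimal sketch, assuming only that a client has a generate(model=..., messages=...) method whose result has a content attribute. SupportsGenerate, ask, and EchoClient are hypothetical names used for illustration and are not part of llmai.

```python
from types import SimpleNamespace
from typing import Any, Protocol


class SupportsGenerate(Protocol):
    """Hypothetical structural type: anything with a generate() method."""

    def generate(self, *, model: str, messages: list[Any], **kwargs: Any) -> Any: ...


def ask(client: SupportsGenerate, model: str, messages: list[Any]) -> Any:
    """Send messages through any client that follows the shared call shape."""
    result = client.generate(model=model, messages=messages)
    return result.content


class EchoClient:
    """Stand-in client for offline testing; it is NOT an llmai class."""

    def generate(self, *, model: str, messages: list[Any], **kwargs: Any) -> Any:
        # Echo the last message back so the helper can be exercised locally.
        return SimpleNamespace(content=f"[{model}] {messages[-1]}")


print(ask(EchoClient(), "demo-model", ["hello"]))
```

In real use, the stand-in would be replaced by a provider client, and only the client construction and the model name would change between providers.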

Azure OpenAI

from llmai import AzureOpenAIClient
from llmai.shared import UserMessage


client = AzureOpenAIClient(
    api_key="AZURE_OPENAI_API_KEY",
    endpoint="https://your-resource.openai.azure.com",
    api_version="2024-10-21",
)

result = client.generate(
    model="your-azure-deployment",
    messages=[
        UserMessage(content="Write a two-line poem about clean interfaces."),
    ],
)

print(result.content)

AzureOpenAIClient uses the official OpenAI SDK's Azure client and supports either API-key auth or Entra token auth. By default it reads the credential from AZURE_OPENAI_API_KEY or AZURE_OPENAI_AD_TOKEN, the endpoint from AZURE_OPENAI_ENDPOINT or AZURE_OPENAI_BASE_URL, the API version from AZURE_OPENAI_API_VERSION or OPENAI_API_VERSION, and, optionally, the deployment from AZURE_OPENAI_DEPLOYMENT.
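To make the environment-variable fallbacks concrete, here is a rough sketch of how such a lookup could work. It is not llmai's actual implementation: first_env and resolve_azure_settings are hypothetical helpers, and the precedence (the first name listed wins when both are set) is an assumption.

```python
import os
from typing import Mapping, Optional


def first_env(names: tuple[str, ...], env: Mapping[str, str]) -> Optional[str]:
    """Return the first non-empty value among the given variable names."""
    for name in names:
        value = env.get(name)
        if value:
            return value
    return None


def resolve_azure_settings(env: Mapping[str, str] = os.environ) -> dict[str, Optional[str]]:
    """Collect Azure OpenAI settings from an environment-like mapping."""
    # Assumed precedence: the first name in each tuple wins when both are set.
    return {
        "api_key": first_env(("AZURE_OPENAI_API_KEY",), env),
        "ad_token": first_env(("AZURE_OPENAI_AD_TOKEN",), env),
        "endpoint": first_env(("AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_BASE_URL"), env),
        "api_version": first_env(("AZURE_OPENAI_API_VERSION", "OPENAI_API_VERSION"), env),
        "deployment": first_env(("AZURE_OPENAI_DEPLOYMENT",), env),
    }


# Example with an explicit mapping instead of the real process environment.
settings = resolve_azure_settings(
    {
        "AZURE_OPENAI_BASE_URL": "https://your-resource.openai.azure.com",
        "OPENAI_API_VERSION": "2024-10-21",
    }
)
print(settings["endpoint"], settings["api_version"])
```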

Vertex AI

from llmai import VertexAIClient
from llmai.shared import UserMessage


client = VertexAIClient(
    project="your-gcp-project",
    location="us-central1",
)

result = client.generate(
    model="gemini-2.5-flash",
    messages=[
        UserMessage(content="Write a two-line poem about clean interfaces."),
    ],
)

print(result.content)

VertexAIClient uses the google-genai Vertex AI path internally. It supports Application Default Credentials (ADC) or explicit credentials. It also accepts the provider-specific environment variables VERTEX_PROJECT, VERTEX_LOCATION, and VERTEX_API_KEY, while still honoring the upstream SDK's standard GOOGLE_CLOUD_PROJECT, GOOGLE_CLOUD_LOCATION, and Google API-key environment variables.

ChatGPT

from llmai import ChatGPTClient
from llmai.shared import UserMessage


client = ChatGPTClient(access_token="CHATGPT_ACCESS_TOKEN")

result = client.generate(
    model="chatgpt-4o-latest",
    messages=[
        UserMessage(content="Write a two-line poem about clean interfaces."),
    ],
)

print(result.content)

ChatGPTClient targets ChatGPT's Codex backend at https://chatgpt.com/backend-api/codex. It always uses the Responses API internally, and reads CHATGPT_ACCESS_TOKEN or CODEX_ACCESS_TOKEN by default, with optional CHATGPT_ACCOUNT_ID or CODEX_ACCOUNT_ID.

DeepSeek

from llmai import DeepSeekClient
from llmai.shared import JSONSchemaResponse, UserMessage


client = DeepSeekClient()

result = client.generate(
    model="deepseek-chat",
    messages=[
        UserMessage(content="Return a JSON object with one field named answer."),
    ],
    response_format=JSONSchemaResponse(
        name="final_answer",
        json_schema={
            "type": "object",
            "properties": {
                "answer": {"type": "string"},
            },
            "required": ["answer"],
        },
    ),
)

print(result.content)

DeepSeekClient uses the OpenAI SDK against DeepSeek's OpenAI-compatible API and reads DEEPSEEK_API_KEY by default. For structured output, it always uses an internal function-tool schema because DeepSeek does not support response_format={"type":"json_schema"}. During streaming, the internal response tool is surfaced as incremental JSON content chunks, and the stream still ends with parsed JSON on the final completion chunk's content. If you need DeepSeek's server-side strict tool enforcement, point base_url at https://api.deepseek.com/beta.
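The streaming behavior described above means that individual content chunks are JSON fragments that only become parseable once joined. A small sketch of that accumulation, using hard-coded hypothetical fragments in place of the chunk.chunk values a real stream would yield:

```python
import json

# Hypothetical fragments standing in for the chunk.chunk values of
# type="content" chunks during a structured-output stream.
fragments = ['{"ans', 'wer": "4', '2"}']

buffer = []
for fragment in fragments:
    # Each fragment is partial JSON, so it is only collected here.
    buffer.append(fragment)

# Once the stream is finished, the joined text is a complete JSON document.
parsed = json.loads("".join(buffer))
print(parsed["answer"])
```

When you consume the final completion chunk, this manual join should not be needed, since the parsed JSON is already available on its content.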

Amazon Bedrock

from llmai import BedrockClient
from llmai.shared import UserMessage


client = BedrockClient(
    region="us-east-1",
    aws_access_key_id="AWS_ACCESS_KEY_ID",
    aws_secret_access_key="AWS_SECRET_ACCESS_KEY",
)

# Or use Bedrock API-key auth:
# client = BedrockClient(region="us-east-1", api_key="BEDROCK_API_KEY")

result = client.generate(
    model="us.anthropic.claude-3-5-haiku-20241022-v1:0",
    messages=[
        UserMessage(content="Say hello."),
    ],
)

print(result.content)

Structured Output

from pydantic import BaseModel

from llmai import GoogleClient
from llmai.shared import JSONSchemaResponse, UserMessage


class Summary(BaseModel):
    title: str
    bullets: list[str]


client = GoogleClient(api_key="GOOGLE_API_KEY")

result = client.generate(
    model="your-google-model",
    messages=[
        UserMessage(content="Summarize retrieval-augmented generation in simple terms."),
    ],
    response_format=JSONSchemaResponse(json_schema=Summary),
)

print(result.content)

Use JSONSchemaResponse, JSONObjectResponse, or TextResponse to request different response shapes.

Multimodal Content

from llmai import GoogleClient
from llmai.shared import ImageContentPart, TextContentPart, UserMessage


client = GoogleClient(api_key="GOOGLE_API_KEY")

result = client.generate(
    model="your-google-model",
    messages=[
        UserMessage(
            content=[
                TextContentPart(text="Describe this image."),
                ImageContentPart(url="https://example.com/cat.png"),
            ]
        ),
    ],
)

print(result.content)
print(result.thinking)

Use explicit content parts when you need multimodal inputs or want to mix text with images in one message. Normal completion content is surfaced as list[TextContentPart | ImageContentPart] when the provider returns message content, including text-only replies. Reasoning is exposed on ResponseContent.thinking as list[str] when the provider returns one or more thinking blocks, and the same value is also available on the final AssistantMessage.
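Since completion content is a list of parts even for text-only replies, a small helper that flattens it to a string can be convenient. This is a sketch under assumptions: TextPart and ImagePart are stand-ins for TextContentPart and ImageContentPart (mirroring only the text and url fields used in the example above), and join_text is a hypothetical helper, not an llmai function.

```python
from dataclasses import dataclass


@dataclass
class TextPart:
    """Stand-in for TextContentPart; only the `text` field is modeled."""

    text: str


@dataclass
class ImagePart:
    """Stand-in for ImageContentPart; only the `url` field is modeled."""

    url: str


def join_text(parts) -> str:
    """Concatenate the text of every part that carries a `text` field."""
    return "".join(part.text for part in parts if hasattr(part, "text"))


content = [
    TextPart(text="A cat "),
    ImagePart(url="https://example.com/cat.png"),
    TextPart(text="on a sofa."),
]
print(join_text(content))
```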

Tool Calling

from pydantic import BaseModel

from llmai import OpenAIClient
from llmai.shared import Tool, ToolResponseMessage, UserMessage


class WeatherArgs(BaseModel):
    city: str


weather_tool = Tool(
    name="get_weather",
    description="Look up the weather for a city.",
    schema=WeatherArgs,
)

client = OpenAIClient(api_key="OPENAI_API_KEY")

first = client.generate(
    model="your-openai-model",
    messages=[
        UserMessage(content="What is the weather in Kathmandu?"),
    ],
    tools=[weather_tool],
    tool_choice={"tools": ["get_weather"]},
)

for tool_call in first.tool_calls:
    if tool_call.name != "get_weather":
        continue

    follow_up = client.generate(
        model="your-openai-model",
        messages=[
            *first.messages,
            ToolResponseMessage(
                id=tool_call.id,
                content=["It is sunny in Kathmandu."],
            ),
        ],
        tools=[weather_tool],
    )
    print(follow_up.content)

llmai returns tool calls in first.tool_calls and leaves execution to the caller.
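Because execution is left to the caller, applications typically keep a mapping from tool names to local functions. The sketch below shows one way to do that. The tool-call objects are simulated with SimpleNamespace, and the arguments attribute (a dict of parsed arguments) is an assumption about the tool-call object, not something confirmed here; HANDLERS and run_tool_calls are hypothetical names.

```python
from types import SimpleNamespace


def get_weather(city: str) -> str:
    """Hypothetical local implementation of the get_weather tool."""
    return f"It is sunny in {city}."


# Map tool names to the Python callables that implement them.
HANDLERS = {"get_weather": get_weather}


def run_tool_calls(tool_calls):
    """Execute each known tool call and return (call id, output) pairs."""
    outputs = []
    for call in tool_calls:
        handler = HANDLERS.get(call.name)
        if handler is None:
            continue  # skip tools this application does not implement
        # `arguments` as a dict is an assumption about the tool-call object.
        outputs.append((call.id, handler(**call.arguments)))
    return outputs


# Stand-in for first.tool_calls; these are not real llmai objects.
fake_calls = [
    SimpleNamespace(id="call_1", name="get_weather", arguments={"city": "Kathmandu"}),
    SimpleNamespace(id="call_2", name="unknown_tool", arguments={}),
]
print(run_tool_calls(fake_calls))
```

Each (id, output) pair would then be sent back as a ToolResponseMessage in the follow-up request, as in the example above.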

Hosted Web Search

llmai also supports a provider-hosted web search tool that is not a function tool:

from llmai import OpenAIClient
from llmai.shared import UserMessage, WebSearchTool

client = OpenAIClient(api_key="OPENAI_API_KEY")

result = client.generate(
    model="your-openai-model",
    messages=[
        UserMessage(content="What was a positive news story from today? Cite sources."),
    ],
    tools=[WebSearchTool()],
    api_type="responses",
)

print(result.content)
print(result.thinking)

You can also target it explicitly in tool_choice:

tool_choice = {
    "mode": "required",
    "tools": ["web_search"],
}

Current llmai behavior:

OpenAI Responses: attaches built-in web_search
Azure OpenAI: follows the same OpenAI adapter surface; service support depends on your Azure API version and deployment
Vertex AI: attaches google_search
ChatGPT/Codex: attaches built-in web_search
Anthropic: attaches Anthropic's hosted web-search tool
Google Gemini: attaches google_search
OpenAI Chat Completions: ignores hosted web_search
DeepSeek: ignores hosted web_search
Amazon Bedrock: ignores hosted web_search

web_search can be mixed with normal function tools in the same request.

Streaming

from llmai import AnthropicClient
from llmai.shared import UserMessage

client = AnthropicClient(api_key="ANTHROPIC_API_KEY")

for chunk in client.generate(
    model="your-anthropic-model",
    messages=[
        UserMessage(content="Explain recursion in one paragraph."),
    ],
    stream=True,
):
    if chunk.type == "content":
        print(chunk.chunk, end="")
    elif chunk.type == "completion":
        print("\nDone:", chunk.usage)

generate(..., stream=True) yields marker chunks with type="event" and event="start" / event="end" around each content, thinking, and tool section. If a provider returns multiple reasoning blocks, each block gets its own thinking start/end pair. The final chunk has type="completion" and includes top-level content, thinking, usage, and accumulated messages.
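A consumer that only cares about the final text can ignore the marker chunks and collect content chunks until the completion chunk arrives. The sketch below simulates a stream with SimpleNamespace objects that carry only the fields mentioned above (type, event, chunk, usage). The usage dict and the collect_text helper are illustrative assumptions, not llmai objects.

```python
from types import SimpleNamespace


def collect_text(chunks):
    """Join content chunks and return (text, usage) from the completion chunk."""
    parts = []
    usage = None
    for chunk in chunks:
        if chunk.type == "content":
            parts.append(chunk.chunk)
        elif chunk.type == "completion":
            usage = chunk.usage
        # type="event" markers and any other chunk types are ignored here.
    return "".join(parts), usage


# Stand-in stream; real chunks come from client.generate(..., stream=True).
fake_stream = [
    SimpleNamespace(type="event", event="start"),
    SimpleNamespace(type="content", chunk="Recursion "),
    SimpleNamespace(type="content", chunk="calls itself."),
    SimpleNamespace(type="event", event="end"),
    SimpleNamespace(type="completion", usage={"output_tokens": 4}),
]

text, usage = collect_text(fake_stream)
print(text)
```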

Package Layout

llmai/openai: OpenAI adapter
llmai/azure: Azure OpenAI adapter
llmai/vertex: Vertex AI adapter
llmai/deepseek: DeepSeek adapter
llmai/anthropic: Anthropic adapter
llmai/google: Google Gemini adapter
llmai/bedrock: Amazon Bedrock adapter
llmai/shared: common message, tool, schema, and response models

Core Types

The shared layer includes the main primitives you will use across providers:

UserMessage, SystemMessage, AssistantMessage
TextContentPart, ImageContentPart
Tool, WebSearchTool, ToolResponseMessage
JSONSchemaResponse, JSONObjectResponse, TextResponse
ResponseContent, ResponseStreamChunk, ResponseStreamContentChunk, ResponseStreamThinkingChunk, ResponseStreamToolChunk, ResponseStreamToolCompleteChunk, ResponseStreamCompletionChunk
ResponseUsage
