A blazing fast LLM API translation layer in pure Rust. One interface, every provider.
p50 latency over 20 runs. Same prompt, same models, same machine.
| Metric | llmshim | litellm | langchain |
|---|---|---|---|
| Anthropic (p50) | 1,234ms | 1,288ms | 1,363ms |
| OpenAI (p50) | 648ms | 1,180ms | 700ms |
| Streaming TTFT | 1,065ms | 1,658ms | 1,623ms |
| Cold start | 1,396ms | 2,382ms | 1,619ms |
| Memory (RSS) | 8 MB | 255 MB | 255 MB |
All three libraries hit the same APIs (Responses API for OpenAI, Messages API for Anthropic). llmshim adds ~2µs of translation overhead per request — the rest is network.
Run it yourself:

```bash
cargo run --release --example bench    # Rust (llmshim)

uv run --with litellm --with langchain-anthropic --with langchain-openai \
    python benchmarks/bench_python.py  # Python (litellm + langchain)
```

Send requests through llmshim → it translates to whichever provider you choose → translates the response back. Zero infrastructure, zero databases, ~5MB binary.
```python
import llmshim

resp = llmshim.chat("claude-sonnet-4-6", "What is Rust?")
print(resp["message"]["content"])
```

Switch providers by changing the model string. Everything else stays the same.
```bash
brew install sanjay920/tap/llmshim       # macOS
pip install llmshim                      # Python (any platform)
cargo install llmshim --features proxy   # from source
```

```python
import llmshim

# Set API keys once — persisted to ~/.llmshim/config.toml
llmshim.configure(
    anthropic="sk-ant-...",
    openai="sk-...",
    gemini="AIza...",
    xai="xai-...",
)
```

Or from the CLI: `llmshim configure`
| Provider | Models | Reasoning visible |
|---|---|---|
| OpenAI | gpt-5.4, gpt-5.4-mini, gpt-5.4-nano | Yes (summaries) |
| Anthropic | claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5-20251001 | Yes (full thinking) |
| Google Gemini | gemini-3.1-pro-preview, gemini-3-flash-preview, gemini-3.1-flash-lite-preview | Yes (thought summaries) |
| xAI | grok-4.20-multi-agent-beta-0309, grok-4.20-beta-0309-reasoning, grok-4.20-beta-0309-non-reasoning, grok-4-1-fast-reasoning, grok-4-1-fast-non-reasoning | No (hidden) |
```python
import llmshim

# Simple
resp = llmshim.chat("claude-sonnet-4-6", "Hello!", max_tokens=500)
print(resp["message"]["content"])

# With message history
resp = llmshim.chat("gpt-5.4", [
    {"role": "system", "content": "You are a pirate."},
    {"role": "user", "content": "Hello!"},
], max_tokens=500)
```

```python
for event in llmshim.stream("claude-sonnet-4-6", "Write a poem"):
    if event["type"] == "content":
        print(event["text"], end="", flush=True)
    elif event["type"] == "reasoning":
        pass  # thinking tokens
    elif event["type"] == "usage":
        print(f"\n[↑{event['input_tokens']} ↓{event['output_tokens']}]")
```

Switch models mid-conversation. History carries over.
```python
messages = [{"role": "user", "content": "What is a closure?"}]

r1 = llmshim.chat("claude-sonnet-4-6", messages, max_tokens=500)
print(f"Claude: {r1['message']['content']}")

messages.append({"role": "assistant", "content": r1["message"]["content"]})
messages.append({"role": "user", "content": "Now explain differently."})

r2 = llmshim.chat("gpt-5.4", messages, max_tokens=500)
print(f"GPT: {r2['message']['content']}")
```

```python
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = llmshim.chat("claude-sonnet-4-6", "Weather in Tokyo?", max_tokens=500, tools=tools)
for tc in resp["message"].get("tool_calls", []):
    print(f"{tc['function']['name']}({tc['function']['arguments']})")
```

Tools are accepted in OpenAI Chat Completions format and auto-translated to each provider's native format.
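The loop above prints each tool call's raw arguments. To actually dispatch the call, the arguments need decoding first. A minimal sketch, assuming llmshim follows the Chat Completions convention of delivering `arguments` as a JSON-encoded string (the sample response dict below is illustrative, not live API output):

```python
import json

# Sample response shaped like the tool-call output above
resp = {
    "message": {
        "tool_calls": [
            {"function": {"name": "get_weather", "arguments": '{"city": "Tokyo"}'}}
        ]
    }
}

for tc in resp["message"].get("tool_calls", []):
    # Decode the JSON-string arguments into a dict before dispatching
    args = json.loads(tc["function"]["arguments"])
    print(tc["function"]["name"], args)
```

From here you would run the named function locally and append its result to the message history before the next `chat` call.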
```python
resp = llmshim.chat(
    "claude-sonnet-4-6",
    "Solve: x^2 - 5x + 6 = 0",
    max_tokens=4000,
    reasoning_effort="high",
)
print(resp["reasoning"])           # thinking content
print(resp["message"]["content"])  # answer
```

```python
resp = llmshim.chat(
    "anthropic/claude-sonnet-4-6",
    "Hello",
    max_tokens=100,
    fallback=["openai/gpt-5.4", "gemini/gemini-3-flash-preview"],
)
```

llmshim runs as an HTTP proxy with its own API spec. Any language can talk to it.
```bash
llmshim proxy
# Listening on http://localhost:3000
```

```bash
curl http://localhost:3000/v1/chat \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-sonnet-4-6","messages":[{"role":"user","content":"Hi"}],"config":{"max_tokens":100}}'
```

| Method | Path | Description |
|---|---|---|
| POST | /v1/chat | Chat completion (or streaming with `stream: true`) |
| POST | /v1/chat/stream | Always-streaming SSE with typed events |
| GET | /v1/models | List available models |
| GET | /health | Health check |
Full API spec: `api/openapi.yaml`
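Consuming the `/v1/chat/stream` endpoint from another language means parsing SSE `data:` lines. A hedged sketch, assuming the wire events mirror the typed events of the Python `stream()` example (`content`, `reasoning`, `usage`); the sample payload below is hypothetical, not captured output — check `api/openapi.yaml` for the authoritative format:

```python
import json

# Hypothetical SSE body from /v1/chat/stream
raw = (
    'data: {"type": "content", "text": "Hello"}\n\n'
    'data: {"type": "usage", "input_tokens": 5, "output_tokens": 2}\n\n'
)

# Keep only the data lines and decode each JSON payload
events = [
    json.loads(line[len("data: "):])
    for line in raw.splitlines()
    if line.startswith("data: ")
]

# Concatenate the content events into the final text
text = "".join(e["text"] for e in events if e["type"] == "content")
print(text)
```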
```bash
llmshim docker build
llmshim docker start
llmshim docker status
llmshim docker logs
llmshim docker stop
```

```bash
llmshim                     # show help
llmshim chat                # interactive multi-model chat
llmshim configure           # set API keys
llmshim set <key> <value>   # set a config value
llmshim list                # show configured keys
llmshim models              # list available models
llmshim proxy               # start HTTP proxy
```

No canonical struct. Requests flow as `serde_json::Value` — each provider maps only what it understands. Adding a provider = implementing one trait with three methods.
```text
llmshim::completion(router, request)
  → router.resolve("anthropic/claude-sonnet-4-6")
  → provider.transform_request(model, &value)
  → HTTP
  → provider.transform_response(model, body)
```
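A minimal Python analogue of this flow may help make the design concrete (names and field mappings here are illustrative; the real implementation is a Rust trait operating on `serde_json::Value`):

```python
class AnthropicProvider:
    def transform_request(self, model, value):
        # Map only the fields this provider understands
        return {
            "model": model,
            "max_tokens": value.get("max_tokens", 1024),
            "messages": value["messages"],
        }

    def transform_response(self, model, body):
        # Anthropic's Messages API puts text under content[0]["text"]
        return {"message": {"content": body["content"][0]["text"]}}

PROVIDERS = {"anthropic": AnthropicProvider()}

def resolve(model_string):
    # "anthropic/claude-sonnet-4-6" → (provider, bare model name)
    provider_name, _, model = model_string.partition("/")
    return PROVIDERS[provider_name], model

provider, model = resolve("anthropic/claude-sonnet-4-6")
req = provider.transform_request(model, {"messages": [{"role": "user", "content": "Hi"}]})
out = provider.transform_response(model, {"content": [{"text": "Hello!"}]})
print(req["model"], out["message"]["content"])
```

Because requests stay plain JSON end to end, a new provider only needs to define its own pair of transforms; no shared struct has to grow a field for every provider quirk.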
- Multi-model conversations — switch providers mid-chat, history carries over
- Reasoning/thinking — visible chain-of-thought from OpenAI, Anthropic, and Gemini
- Streaming — token-by-token with thinking in dim grey
- Tool use — Chat Completions format auto-translated to each provider
- Vision/images — send images in any format, auto-translated between providers
- Fallback chains — automatic failover across providers with exponential backoff
- Cross-provider translation — system messages, tool calls, and provider-specific fields all handled
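The fallback-chain behavior above can be sketched in a few lines (illustrative logic only, not llmshim internals): try each model in order, retrying with exponential backoff before failing over to the next.

```python
import time

def chat_with_fallback(call, models, retries=3, base_delay=0.1):
    last_err = None
    for model in models:
        delay = base_delay
        for _ in range(retries):
            try:
                return call(model)
            except Exception as err:
                last_err = err
                time.sleep(delay)
                delay *= 2  # exponential backoff between retries
    raise last_err  # every model in the chain failed

# Simulate a primary provider that is down and a healthy fallback
calls = []
def flaky(model):
    calls.append(model)
    if model.startswith("anthropic/"):
        raise RuntimeError("provider down")
    return f"ok from {model}"

result = chat_with_fallback(
    flaky,
    ["anthropic/claude-sonnet-4-6", "openai/gpt-5.4"],
    retries=2,
    base_delay=0,
)
print(result)  # ok from openai/gpt-5.4
```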
```bash
cargo build                                  # dev build
cargo build --release --features proxy       # release build
cargo test --features proxy --tests          # unit tests (~326)
cargo test --features proxy -- --ignored     # integration tests (needs API keys)
```