A blazing fast LLM API translation layer in pure Rust. One interface, every provider.
p50 latency over 20 runs. Same prompt, same models, same machine.
| Metric | llmshim | litellm | langchain |
|---|---|---|---|
| Anthropic (p50) | 1,234ms | 1,288ms | 1,363ms |
| OpenAI (p50) | 648ms | 1,180ms | 700ms |
| Streaming TTFT | 1,065ms | 1,658ms | 1,623ms |
| Cold start | 1,396ms | 2,382ms | 1,619ms |
| Memory (RSS) | 8 MB | 255 MB | 255 MB |
All three libraries hit the same APIs (Responses API for OpenAI, Messages API for Anthropic). llmshim adds ~2µs of translation overhead per request — the rest is network.
Run it yourself:

```bash
cargo run --release --example bench    # Rust (llmshim)

uv run --with litellm --with langchain-anthropic --with langchain-openai \
    python benchmarks/bench_python.py  # Python (litellm + langchain)
```

Send requests through llmshim → it translates to whichever provider you choose → translates the response back. Zero infrastructure, zero databases, ~5MB binary.
```python
import llmshim

resp = llmshim.chat("claude-sonnet-4-6", "What is Rust?")
print(resp["message"]["content"])
```

Switch providers by changing the model string. Everything else stays the same.
```bash
brew install sanjay920/tap/llmshim       # macOS
pip install llmshim                      # Python (any platform)
cargo install llmshim --features proxy   # from source
```

```python
import llmshim

# Set API keys once — persisted to ~/.llmshim/config.toml
llmshim.configure(
    anthropic="sk-ant-...",
    openai="sk-...",
    gemini="AIza...",
    xai="xai-...",
)
```

Or from the CLI: `llmshim configure`
| Provider | Models | Reasoning visible |
|---|---|---|
| OpenAI | gpt-5.4, gpt-5.4-mini, gpt-5.4-nano | Yes (summaries) |
| Anthropic | claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5-20251001 | Yes (full thinking) |
| Google Gemini | gemini-3.1-pro-preview, gemini-3-flash-preview, gemini-3.1-flash-lite-preview | Yes (thought summaries) |
| xAI | grok-4.20-multi-agent-beta-0309, grok-4.20-beta-0309-reasoning, grok-4.20-beta-0309-non-reasoning, grok-4-1-fast-reasoning, grok-4-1-fast-non-reasoning | No (hidden) |
```python
import llmshim

# Simple
resp = llmshim.chat("claude-sonnet-4-6", "Hello!", max_tokens=500)
print(resp["message"]["content"])

# With message history
resp = llmshim.chat("gpt-5.4", [
    {"role": "system", "content": "You are a pirate."},
    {"role": "user", "content": "Hello!"},
], max_tokens=500)
```

```python
for event in llmshim.stream("claude-sonnet-4-6", "Write a poem"):
    if event["type"] == "content":
        print(event["text"], end="", flush=True)
    elif event["type"] == "reasoning":
        pass  # thinking tokens
    elif event["type"] == "usage":
        print(f"\n[↑{event['input_tokens']} ↓{event['output_tokens']}]")
```

Switch models mid-conversation. History carries over.
```python
messages = [{"role": "user", "content": "What is a closure?"}]

r1 = llmshim.chat("claude-sonnet-4-6", messages, max_tokens=500)
print(f"Claude: {r1['message']['content']}")

messages.append({"role": "assistant", "content": r1["message"]["content"]})
messages.append({"role": "user", "content": "Now explain differently."})

r2 = llmshim.chat("gpt-5.4", messages, max_tokens=500)
print(f"GPT: {r2['message']['content']}")
```

```python
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = llmshim.chat("claude-sonnet-4-6", "Weather in Tokyo?", max_tokens=500, tools=tools)
for tc in resp["message"].get("tool_calls", []):
    print(f"{tc['function']['name']}({tc['function']['arguments']})")
```

Tools are accepted in OpenAI Chat Completions format and auto-translated to each provider's native format.
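The loop above prints each tool call's raw arguments. To actually dispatch the call, the arguments need decoding first. A minimal sketch, assuming llmshim follows the Chat Completions convention of delivering `arguments` as a JSON-encoded string (the sample response dict below is illustrative, not live API output):

```python
import json

# Sample response shaped like the tool-call output above
resp = {
    "message": {
        "tool_calls": [
            {"function": {"name": "get_weather", "arguments": '{"city": "Tokyo"}'}}
        ]
    }
}

for tc in resp["message"].get("tool_calls", []):
    # Decode the JSON-string arguments into a dict before dispatching
    args = json.loads(tc["function"]["arguments"])
    print(tc["function"]["name"], args)
```

From here you would run the named function locally and append its result to the message history before the next `chat` call.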
```python
resp = llmshim.chat(
    "claude-sonnet-4-6",
    "Solve: x^2 - 5x + 6 = 0",
    max_tokens=4000,
    reasoning_effort="high",
)
print(resp["reasoning"])           # thinking content
print(resp["message"]["content"])  # answer
```

```python
resp = llmshim.chat(
    "anthropic/claude-sonnet-4-6",
    "Hello",
    max_tokens=100,
    fallback=["openai/gpt-5.4", "gemini/gemini-3-flash-preview"],
)
```

llmshim runs as an HTTP proxy with its own API spec. Any language can talk to it.
```bash
llmshim proxy
# Listening on http://localhost:3000
```

```bash
curl http://localhost:3000/v1/chat \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-sonnet-4-6","messages":[{"role":"user","content":"Hi"}],"config":{"max_tokens":100}}'
```

| Method | Path | Description |
|---|---|---|
| POST | /v1/chat | Chat completion (or streaming with `stream: true`) |
| POST | /v1/chat/stream | Always-streaming SSE with typed events |
| GET | /v1/models | List available models |
| GET | /health | Health check |
Full API spec: `api/openapi.yaml`
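Consuming the `/v1/chat/stream` endpoint from another language means parsing SSE `data:` lines. A hedged sketch, assuming the wire events mirror the typed events of the Python `stream()` example (`content`, `reasoning`, `usage`); the sample payload below is hypothetical, not captured output — check `api/openapi.yaml` for the authoritative format:

```python
import json

# Hypothetical SSE body from /v1/chat/stream
raw = (
    'data: {"type": "content", "text": "Hello"}\n\n'
    'data: {"type": "usage", "input_tokens": 5, "output_tokens": 2}\n\n'
)

# Keep only the data lines and decode each JSON payload
events = [
    json.loads(line[len("data: "):])
    for line in raw.splitlines()
    if line.startswith("data: ")
]

# Concatenate the content events into the final text
text = "".join(e["text"] for e in events if e["type"] == "content")
print(text)
```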
```bash
llmshim docker build
llmshim docker start
llmshim docker status
llmshim docker logs
llmshim docker stop
```

```bash
llmshim                     # show help
llmshim chat                # interactive multi-model chat
llmshim configure           # set API keys
llmshim set <key> <value>   # set a config value
llmshim list                # show configured keys
llmshim models              # list available models
llmshim proxy               # start HTTP proxy
```

No canonical struct. Requests flow as `serde_json::Value` — each provider maps only what it understands. Adding a provider = implementing one trait with three methods.
```text
llmshim::completion(router, request)
  → router.resolve("anthropic/claude-sonnet-4-6")
  → provider.transform_request(model, &value)
  → HTTP
  → provider.transform_response(model, body)
```
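A minimal Python analogue of this flow may help make the design concrete (names and field mappings here are illustrative; the real implementation is a Rust trait operating on `serde_json::Value`):

```python
class AnthropicProvider:
    def transform_request(self, model, value):
        # Map only the fields this provider understands
        return {
            "model": model,
            "max_tokens": value.get("max_tokens", 1024),
            "messages": value["messages"],
        }

    def transform_response(self, model, body):
        # Anthropic's Messages API puts text under content[0]["text"]
        return {"message": {"content": body["content"][0]["text"]}}

PROVIDERS = {"anthropic": AnthropicProvider()}

def resolve(model_string):
    # "anthropic/claude-sonnet-4-6" → (provider, bare model name)
    provider_name, _, model = model_string.partition("/")
    return PROVIDERS[provider_name], model

provider, model = resolve("anthropic/claude-sonnet-4-6")
req = provider.transform_request(model, {"messages": [{"role": "user", "content": "Hi"}]})
out = provider.transform_response(model, {"content": [{"text": "Hello!"}]})
print(req["model"], out["message"]["content"])
```

Because requests stay plain JSON end to end, a new provider only needs to define its own pair of transforms; no shared struct has to grow a field for every provider quirk.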
- Multi-model conversations — switch providers mid-chat, history carries over
- Reasoning/thinking — visible chain-of-thought from OpenAI, Anthropic, and Gemini
- Streaming — token-by-token with thinking in dim grey
- Tool use — Chat Completions format auto-translated to each provider
- Vision/images — send images in any format, auto-translated between providers
- Fallback chains — automatic failover across providers with exponential backoff
- Cross-provider translation — system messages, tool calls, and provider-specific fields all handled
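The fallback-chain behavior above can be sketched in a few lines (illustrative logic only, not llmshim internals): try each model in order, retrying with exponential backoff before failing over to the next.

```python
import time

def chat_with_fallback(call, models, retries=3, base_delay=0.1):
    last_err = None
    for model in models:
        delay = base_delay
        for _ in range(retries):
            try:
                return call(model)
            except Exception as err:
                last_err = err
                time.sleep(delay)
                delay *= 2  # exponential backoff between retries
    raise last_err  # every model in the chain failed

# Simulate a primary provider that is down and a healthy fallback
calls = []
def flaky(model):
    calls.append(model)
    if model.startswith("anthropic/"):
        raise RuntimeError("provider down")
    return f"ok from {model}"

result = chat_with_fallback(
    flaky,
    ["anthropic/claude-sonnet-4-6", "openai/gpt-5.4"],
    retries=2,
    base_delay=0,
)
print(result)  # ok from openai/gpt-5.4
```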
```bash
cargo build                                  # dev build
cargo build --release --features proxy       # release build
cargo test --features proxy --tests          # unit tests (~326)
cargo test --features proxy -- --ignored     # integration tests (needs API keys)
```