A modern, high-performance proxy that translates between different LLM API protocols. Make tools think they're talking to Ollama while actually proxying to OpenAI, Anthropic, OpenRouter, and more.
✨ Full Protocol Support - Translate between Ollama, OpenAI, Anthropic, OpenRouter, Azure, and more
⚡ Streaming Support - Real-time streaming for chat completions
🛠️ Tool Calls - Full OpenAI-style tool/function calling support
🔍 Model Queries - /api/tags, /v1/models, and other model endpoints
🛡️ No Environment Collisions - Explicit parameters only, no .env file conflicts
🎯 Zero Configuration - Run with a single command, no setup needed
📡 Async Architecture - High-performance async/await implementation
🎨 Beautiful CLI - Rich terminal output with helpful formatting
```bash
# Install uv if you don't have it
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone and install
git clone <repository>
cd llm-proxy
uv sync
uv run pip install -e .
```

Or install from PyPI:

```bash
pip install llm-proxy
```

Proxy Ollama-style requests to OpenRouter:

```bash
llm-proxy serve \
  --from ollama \
  --to https://openrouter.ai/api/v1 \
  --to-proto openrouter \
  --model "openai/gpt-4" \
  --key "your-openrouter-key" \
  --port 11434
```

Now use an Ollama client:
```bash
OLLAMA_HOST=http://localhost:11434 ollama run
```

Proxy OpenAI-style requests to Anthropic:

```bash
llm-proxy serve \
  --from openai \
  --to https://api.anthropic.com \
  --to-proto anthropic \
  --model "claude-3-opus-20240229" \
  --key "your-anthropic-key"
```

Use it with the OpenAI SDK:
```python
import openai

client = openai.OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="dummy",  # the real key is passed via the proxy's --key parameter
)
```

| Protocol | Source (Accepts) | Target (Proxies To) | Features |
|---|---|---|---|
| Ollama | ✅ | ❌ | Chat, Generate, Tools, Streaming |
| OpenAI | ✅ | ✅ | Chat, Completions, Tools, Streaming, Embeddings |
| Anthropic | ✅ | ✅ | Messages, Tools, Streaming |
| OpenRouter | ❌ | ✅ | Chat, Tools, Streaming |
| Azure | ❌ | ✅ | Chat, Tools, Streaming |
| Cohere | ❌ | ✅ | Chat, Generate |
| VertexAI | ❌ | ✅ | Chat, Tools |
| Bedrock | ❌ | ✅ | Chat, Tools |
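The matrix above reads as follows: a protocol marked under Source can sit on the proxy's listening side (`--from`), and one marked under Target can be proxied to (`--to-proto`). As a quick reference in code (illustrative only, mirroring the table):

```python
# Support matrix from the table above: (usable as source, usable as target).
PROTOCOLS = {
    "ollama":     (True,  False),
    "openai":     (True,  True),
    "anthropic":  (True,  True),
    "openrouter": (False, True),
    "azure":      (False, True),
    "cohere":     (False, True),
    "vertexai":   (False, True),
    "bedrock":    (False, True),
}

def can_route(from_proto: str, to_proto: str) -> bool:
    """Check whether `--from from_proto` can be proxied to `--to-proto to_proto`."""
    src = PROTOCOLS.get(from_proto, (False, False))[0]
    dst = PROTOCOLS.get(to_proto, (False, False))[1]
    return src and dst

print(can_route("ollama", "openrouter"))  # True: the quick-start setup
print(can_route("cohere", "openai"))      # False: Cohere is target-only
```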
```bash
llm-proxy --help
llm-proxy serve --help
llm-proxy quickstart
llm-proxy protocols
```

```bash
# Test with a local Ollama instance
llm-proxy serve \
  --from ollama \
  --to http://localhost:11434 \
  --to-proto ollama \
  --model llama2 \
  --port 11435
```

```bash
# Bind to a specific host and port
llm-proxy serve \
  --from ollama \
  --to https://api.openai.com/v1 \
  --to-proto openai \
  --model gpt-4 \
  --key "sk-..." \
  --host 127.0.0.1 \
  --port 8080
```

Depending on your `--from` protocol, different endpoints are available:
Ollama protocol (`--from ollama`):

- `POST /api/chat` - Chat with tool-call support
- `POST /api/generate` - Text generation
- `GET /api/tags` - List available models
- `GET /api/version` - Version information

OpenAI protocol (`--from openai`):

- `POST /v1/chat/completions` - Chat completions with tools
- `POST /v1/completions` - Legacy completions
- `GET /v1/models` - List models
- `POST /v1/embeddings` - Embeddings (mock implementation)

Anthropic protocol (`--from anthropic`):

- `POST /v1/messages` - Anthropic Messages API
- `GET /v1/models` - List models
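For example, an Ollama client calling `GET /api/tags` expects a JSON payload shaped roughly like this (a sketch; the field values are illustrative, and the proxy would report the configured upstream model under an Ollama-style tag):

```python
import json

# Illustrative /api/tags response body in Ollama's format (values made up).
tags_response = {
    "models": [
        {
            "name": "gpt-4:latest",           # upstream model, Ollama-style name
            "modified_at": "2024-01-01T00:00:00Z",
            "size": 0,                         # no local weights behind a proxy
        }
    ]
}

body = json.dumps(tags_response)
names = [m["name"] for m in json.loads(body)["models"]]
print(names)  # ['gpt-4:latest']
```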
```bash
# List models
curl http://localhost:11434/api/tags

# Simple chat
curl http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "test",
    "messages": [{"role": "user", "content": "Hello, how are you?"}],
    "stream": false
  }'
```

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer dummy" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "What is the weather in San Francisco?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get weather in a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {"type": "string"}
          }
        }
      }
    }],
    "stream": true
  }'
```
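When the backend decides to call the tool, the reply comes back in OpenAI's tool-call format. A client might unpack it like this (a sketch over a hand-written, non-streaming response body; the values are made up):

```python
import json

# Illustrative non-streaming response carrying a tool call (OpenAI-style shape).
response = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": "call_1",
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "arguments": "{\"location\": \"San Francisco\"}",
                },
            }],
        },
        "finish_reason": "tool_calls",
    }]
}

message = response["choices"][0]["message"]
for call in message.get("tool_calls", []):
    name = call["function"]["name"]
    args = json.loads(call["function"]["arguments"])  # arguments arrive as a JSON string
    print(name, args)  # get_weather {'location': 'San Francisco'}
```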
```bash
# Ollama streaming
curl http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "test",
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": true
  }'

# OpenAI streaming
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": true
  }'
```

```bash
llm-proxy serve \
  --from ollama \
  --to https://api.openai.com/v1 \
  --to-proto openai \
  --model gpt-4 \
  --key "sk-..." \
  --timeout 120  # 2 minute timeout
```

```bash
llm-proxy serve \
  --from ollama \
  --to https://api.openai.com/v1 \
  --to-proto openai \
  --model gpt-4 \
  --key "sk-..." \
  --verbose  # Enable debug logging
```

How it works:

- Client Request: Your tool sends a request in the source protocol (e.g., Ollama format)
- Protocol Translation: The proxy translates the request to the target protocol (e.g., OpenAI format)
- Forward Request: The translated request is sent to the target API
- Response Translation: The response is translated back to the source protocol
- Client Response: Your tool receives the response in the expected format
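The steps above can be sketched as a pair of pure translation functions (illustrative only; the real proxy delegates protocol translation to LiteLLM):

```python
def ollama_to_openai(req: dict, model: str) -> dict:
    """Step 2: translate an Ollama /api/chat body into OpenAI /v1/chat/completions form."""
    return {
        "model": model,                     # substitute the configured upstream model
        "messages": req["messages"],        # the message shape is compatible
        "stream": req.get("stream", True),  # Ollama streams by default
    }

def openai_to_ollama(resp: dict, model: str) -> dict:
    """Step 4: translate an OpenAI completion back into Ollama's response shape."""
    choice = resp["choices"][0]
    return {
        "model": model,
        "message": choice["message"],
        "done": choice.get("finish_reason") is not None,  # done once a finish reason is set
    }

request = {"messages": [{"role": "user", "content": "Hello"}], "stream": False}
upstream = ollama_to_openai(request, "gpt-4")
reply = {"choices": [{"message": {"role": "assistant", "content": "Hi!"},
                      "finish_reason": "stop"}]}
print(openai_to_ollama(reply, "gpt-4"))
```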
This tool is designed as a "hack tool" for quick prototyping and testing. We avoid .env files to prevent:
- Accidental key collisions
- Environment pollution
- Configuration drift between projects
- Surprising behavior from inherited environment variables
All configuration must be explicit via command-line arguments.
- Local Development: Test tools that expect Ollama with cloud models
- Protocol Migration: Gradually migrate from Ollama to OpenAI APIs
- Cost Testing: Compare different model providers without changing code
- Feature Testing: Test tool calling support with different backends
- Load Testing: Proxy to different endpoints for performance comparison
- Ensure the target URL is correct and accessible
- Check if you need to use HTTPS vs HTTP
- Verify API keys are correct
- Ensure your client supports Server-Sent Events (SSE)
- Check that `"stream": true` is set in the request
- For the Ollama protocol, use NDJSON format
- Verify the target model supports tool calls
- Check tool definition format matches OpenAI specification
- Ensure you're using the correct endpoint for the protocol
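The two streaming formats differ by source protocol: Ollama emits newline-delimited JSON (NDJSON), while the OpenAI-style endpoints emit Server-Sent Events. A client-side parsing sketch over hand-written sample streams:

```python
import json

# Ollama-style stream: one JSON object per line (NDJSON).
ndjson_stream = (
    '{"message": {"content": "Hel"}, "done": false}\n'
    '{"message": {"content": "lo"}, "done": true}\n'
)
ollama_text = "".join(
    json.loads(line)["message"]["content"]
    for line in ndjson_stream.splitlines() if line
)

# OpenAI-style stream: SSE "data:" lines, terminated by "data: [DONE]".
sse_stream = (
    'data: {"choices": [{"delta": {"content": "Hel"}}]}\n\n'
    'data: {"choices": [{"delta": {"content": "lo"}}]}\n\n'
    "data: [DONE]\n\n"
)
openai_text = ""
for line in sse_stream.splitlines():
    if not line.startswith("data: ") or line == "data: [DONE]":
        continue
    delta = json.loads(line[len("data: "):])["choices"][0]["delta"]
    openai_text += delta.get("content", "")

print(ollama_text, openai_text)  # Hello Hello
```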
- Fork the repository
- Create a feature branch
- Make your changes
- Run tests
- Submit a pull request
MIT License - see LICENSE file for details.
- Built with LiteLLM for protocol translation
- CLI powered by Typer
- Beautiful outputs with Rich
- Package management with uv
Note: This tool is for development and testing purposes. Always follow the terms of service for the APIs you're proxying to.