A modern, high-performance proxy that translates between different LLM API protocols. Make tools think they're talking to Ollama while actually proxying to OpenAI, Anthropic, OpenRouter, and more.
✨ Full Protocol Support - Translate between Ollama, OpenAI, Anthropic, OpenRouter, Azure, and more
⚡ Streaming Support - Real-time streaming for chat completions
🛠️ Tool Calls - Full OpenAI-style tool/function calling support
🔍 Model Queries - /api/tags, /v1/models, and other model endpoints
🛡️ No Environment Collisions - Explicit parameters only, no .env file conflicts
🎯 Zero Configuration - Run with a single command, no setup needed
📡 Async Architecture - High-performance async/await implementation
🎨 Beautiful CLI - Rich terminal output with helpful formatting
```bash
# Install uv if you don't have it
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone and install
git clone <repository>
cd llm-proxy
uv sync
uv run pip install -e .
```

Or install from PyPI:

```bash
pip install llm-proxy
```

Proxy Ollama-style requests to OpenRouter:

```bash
llm-proxy serve \
  --from ollama \
  --to https://openrouter.ai/api/v1 \
  --to-proto openrouter \
  --model "openai/gpt-4" \
  --key "your-openrouter-key" \
  --port 11434
```

Now use an Ollama client:
```bash
OLLAMA_HOST=http://localhost:11434 ollama run
```

Proxy OpenAI-style requests to Anthropic:

```bash
llm-proxy serve \
  --from openai \
  --to https://api.anthropic.com \
  --to-proto anthropic \
  --model "claude-3-opus-20240229" \
  --key "your-anthropic-key"
```

Use it with the OpenAI SDK:
```python
import openai

client = openai.OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="dummy",  # the real key is passed via the proxy's --key parameter
)
```

| Protocol | Source (Accepts) | Target (Proxies To) | Features |
|---|---|---|---|
| Ollama | ✅ | ❌ | Chat, Generate, Tools, Streaming |
| OpenAI | ✅ | ✅ | Chat, Completions, Tools, Streaming, Embeddings |
| Anthropic | ✅ | ✅ | Messages, Tools, Streaming |
| OpenRouter | ❌ | ✅ | Chat, Tools, Streaming |
| Azure | ❌ | ✅ | Chat, Tools, Streaming |
| Cohere | ❌ | ✅ | Chat, Generate |
| VertexAI | ❌ | ✅ | Chat, Tools |
| Bedrock | ❌ | ✅ | Chat, Tools |
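The matrix above reads as follows: a protocol marked under Source can sit on the proxy's listening side (`--from`), and one marked under Target can be proxied to (`--to-proto`). As a quick reference in code (illustrative only, mirroring the table):

```python
# Support matrix from the table above: (usable as source, usable as target).
PROTOCOLS = {
    "ollama":     (True,  False),
    "openai":     (True,  True),
    "anthropic":  (True,  True),
    "openrouter": (False, True),
    "azure":      (False, True),
    "cohere":     (False, True),
    "vertexai":   (False, True),
    "bedrock":    (False, True),
}

def can_route(from_proto: str, to_proto: str) -> bool:
    """Check whether `--from from_proto` can be proxied to `--to-proto to_proto`."""
    src = PROTOCOLS.get(from_proto, (False, False))[0]
    dst = PROTOCOLS.get(to_proto, (False, False))[1]
    return src and dst

print(can_route("ollama", "openrouter"))  # True: the quick-start setup
print(can_route("cohere", "openai"))      # False: Cohere is target-only
```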
```bash
llm-proxy --help
llm-proxy serve --help
llm-proxy quickstart
llm-proxy protocols
```

```bash
# Test with a local Ollama instance
llm-proxy serve \
  --from ollama \
  --to http://localhost:11434 \
  --to-proto ollama \
  --model llama2 \
  --port 11435
```

```bash
# Bind to a specific host and port
llm-proxy serve \
  --from ollama \
  --to https://api.openai.com/v1 \
  --to-proto openai \
  --model gpt-4 \
  --key "sk-..." \
  --host 127.0.0.1 \
  --port 8080
```

Depending on your `--from` protocol, different endpoints are available:
Ollama protocol (`--from ollama`):

- `POST /api/chat` - Chat with tool-call support
- `POST /api/generate` - Text generation
- `GET /api/tags` - List available models
- `GET /api/version` - Version information

OpenAI protocol (`--from openai`):

- `POST /v1/chat/completions` - Chat completions with tools
- `POST /v1/completions` - Legacy completions
- `GET /v1/models` - List models
- `POST /v1/embeddings` - Embeddings (mock implementation)

Anthropic protocol (`--from anthropic`):

- `POST /v1/messages` - Anthropic Messages API
- `GET /v1/models` - List models
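For example, an Ollama client calling `GET /api/tags` expects a JSON payload shaped roughly like this (a sketch; the field values are illustrative, and the proxy would report the configured upstream model under an Ollama-style tag):

```python
import json

# Illustrative /api/tags response body in Ollama's format (values made up).
tags_response = {
    "models": [
        {
            "name": "gpt-4:latest",           # upstream model, Ollama-style name
            "modified_at": "2024-01-01T00:00:00Z",
            "size": 0,                         # no local weights behind a proxy
        }
    ]
}

body = json.dumps(tags_response)
names = [m["name"] for m in json.loads(body)["models"]]
print(names)  # ['gpt-4:latest']
```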
```bash
# List models
curl http://localhost:11434/api/tags

# Simple chat
curl http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "test",
    "messages": [{"role": "user", "content": "Hello, how are you?"}],
    "stream": false
  }'
```

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer dummy" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "What is the weather in San Francisco?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get weather in a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {"type": "string"}
          }
        }
      }
    }],
    "stream": true
  }'
```
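When the backend decides to call the tool, the reply comes back in OpenAI's tool-call format. A client might unpack it like this (a sketch over a hand-written, non-streaming response body; the values are made up):

```python
import json

# Illustrative non-streaming response carrying a tool call (OpenAI-style shape).
response = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": "call_1",
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "arguments": "{\"location\": \"San Francisco\"}",
                },
            }],
        },
        "finish_reason": "tool_calls",
    }]
}

message = response["choices"][0]["message"]
for call in message.get("tool_calls", []):
    name = call["function"]["name"]
    args = json.loads(call["function"]["arguments"])  # arguments arrive as a JSON string
    print(name, args)  # get_weather {'location': 'San Francisco'}
```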
```bash
# Ollama streaming
curl http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "test",
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": true
  }'

# OpenAI streaming
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": true
  }'
```

```bash
llm-proxy serve \
  --from ollama \
  --to https://api.openai.com/v1 \
  --to-proto openai \
  --model gpt-4 \
  --key "sk-..." \
  --timeout 120  # 2 minute timeout
```

```bash
llm-proxy serve \
  --from ollama \
  --to https://api.openai.com/v1 \
  --to-proto openai \
  --model gpt-4 \
  --key "sk-..." \
  --verbose  # Enable debug logging
```

How it works:

- Client Request: Your tool sends a request in the source protocol (e.g., Ollama format)
- Protocol Translation: The proxy translates the request to the target protocol (e.g., OpenAI format)
- Forward Request: The translated request is sent to the target API
- Response Translation: The response is translated back to the source protocol
- Client Response: Your tool receives the response in the expected format
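The steps above can be sketched as a pair of pure translation functions (illustrative only; the real proxy delegates protocol translation to LiteLLM):

```python
def ollama_to_openai(req: dict, model: str) -> dict:
    """Step 2: translate an Ollama /api/chat body into OpenAI /v1/chat/completions form."""
    return {
        "model": model,                     # substitute the configured upstream model
        "messages": req["messages"],        # the message shape is compatible
        "stream": req.get("stream", True),  # Ollama streams by default
    }

def openai_to_ollama(resp: dict, model: str) -> dict:
    """Step 4: translate an OpenAI completion back into Ollama's response shape."""
    choice = resp["choices"][0]
    return {
        "model": model,
        "message": choice["message"],
        "done": choice.get("finish_reason") is not None,  # done once a finish reason is set
    }

request = {"messages": [{"role": "user", "content": "Hello"}], "stream": False}
upstream = ollama_to_openai(request, "gpt-4")
reply = {"choices": [{"message": {"role": "assistant", "content": "Hi!"},
                      "finish_reason": "stop"}]}
print(openai_to_ollama(reply, "gpt-4"))
```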
This tool is designed as a "hack tool" for quick prototyping and testing. We avoid .env files to prevent:
- Accidental key collisions
- Environment pollution
- Configuration drift between projects
- Surprising behavior from inherited environment variables
All configuration must be explicit via command-line arguments.
- Local Development: Test tools that expect Ollama with cloud models
- Protocol Migration: Gradually migrate from Ollama to OpenAI APIs
- Cost Testing: Compare different model providers without changing code
- Feature Testing: Test tool calling support with different backends
- Load Testing: Proxy to different endpoints for performance comparison
- Ensure the target URL is correct and accessible
- Check if you need to use HTTPS vs HTTP
- Verify API keys are correct
- Ensure your client supports Server-Sent Events (SSE)
- Check that `"stream": true` is set in the request
- For the Ollama protocol, use NDJSON format
- Verify the target model supports tool calls
- Check tool definition format matches OpenAI specification
- Ensure you're using the correct endpoint for the protocol
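The two streaming formats differ by source protocol: Ollama emits newline-delimited JSON (NDJSON), while the OpenAI-style endpoints emit Server-Sent Events. A client-side parsing sketch over hand-written sample streams:

```python
import json

# Ollama-style stream: one JSON object per line (NDJSON).
ndjson_stream = (
    '{"message": {"content": "Hel"}, "done": false}\n'
    '{"message": {"content": "lo"}, "done": true}\n'
)
ollama_text = "".join(
    json.loads(line)["message"]["content"]
    for line in ndjson_stream.splitlines() if line
)

# OpenAI-style stream: SSE "data:" lines, terminated by "data: [DONE]".
sse_stream = (
    'data: {"choices": [{"delta": {"content": "Hel"}}]}\n\n'
    'data: {"choices": [{"delta": {"content": "lo"}}]}\n\n'
    "data: [DONE]\n\n"
)
openai_text = ""
for line in sse_stream.splitlines():
    if not line.startswith("data: ") or line == "data: [DONE]":
        continue
    delta = json.loads(line[len("data: "):])["choices"][0]["delta"]
    openai_text += delta.get("content", "")

print(ollama_text, openai_text)  # Hello Hello
```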
- Fork the repository
- Create a feature branch
- Make your changes
- Run tests
- Submit a pull request
MIT License - see LICENSE file for details.
- Built with LiteLLM for protocol translation
- CLI powered by Typer
- Beautiful outputs with Rich
- Package management with uv
Note: This tool is for development and testing purposes. Always follow the terms of service for the APIs you're proxying to.