kristopolous/llm-proxy

LLM Protocol Proxy 🚀

A modern, high-performance proxy that translates between different LLM API protocols. Make tools think they're talking to Ollama while actually proxying to OpenAI, Anthropic, OpenRouter, and more.

Features

  • Full Protocol Support - Translate between Ollama, OpenAI, Anthropic, OpenRouter, Azure, and more
  • Streaming Support - Real-time streaming for chat completions
  • 🛠️ Tool Calls - Full OpenAI-style tool/function calling support
  • 🔍 Model Queries - /api/tags, /v1/models, and other model endpoints
  • 🛡️ No Environment Collisions - Explicit parameters only, no .env file conflicts
  • 🎯 Zero Configuration - Run with a single command, no setup needed
  • 📡 Async Architecture - High-performance async/await implementation
  • 🎨 Beautiful CLI - Rich terminal output with helpful formatting

Installation

Using uv (Recommended)

# Install uv if you don't have it
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone and install
git clone <repository>
cd llm-proxy
uv sync
uv run pip install -e .

Using pip

pip install llm-proxy

Quick Start

Ollama → OpenRouter (Most Common Use Case)

llm-proxy serve \
  --from ollama \
  --to https://openrouter.ai/api/v1 \
  --to-proto openrouter \
  --model "openai/gpt-4" \
  --key "your-openrouter-key" \
  --port 11434

Now use any Ollama client (note that `ollama run` requires a model name):

OLLAMA_HOST=http://localhost:11434 ollama run <model>

OpenAI → Anthropic

llm-proxy serve \
  --from openai \
  --to https://api.anthropic.com \
  --to-proto anthropic \
  --model "claude-3-opus-20240229" \
  --key "your-anthropic-key"

Use with OpenAI SDK:

import openai
client = openai.OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="dummy"  # Key is passed via --key parameter
)
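Under the hood, the proxy must map OpenAI-style messages onto Anthropic's Messages API, which takes the system prompt as a top-level field rather than as a message in the list. A minimal sketch of that mapping (the `to_anthropic` helper and its field choices are illustrative assumptions, not the proxy's actual code):

```python
# Hypothetical sketch of the OpenAI -> Anthropic request mapping.
# Function and field names are illustrative, not taken from llm-proxy's source.

def to_anthropic(openai_request: dict) -> dict:
    """Split out the system prompt and keep user/assistant turns."""
    system_parts = []
    messages = []
    for msg in openai_request.get("messages", []):
        if msg["role"] == "system":
            # Anthropic's Messages API expects the system prompt as a
            # top-level "system" field, not as a message in the list.
            system_parts.append(msg["content"])
        else:
            messages.append({"role": msg["role"], "content": msg["content"]})
    translated = {
        "model": openai_request["model"],
        "max_tokens": openai_request.get("max_tokens", 1024),
        "messages": messages,
    }
    if system_parts:
        translated["system"] = "\n".join(system_parts)
    return translated
```

The 1024-token default is an arbitrary placeholder; Anthropic's API requires `max_tokens`, while OpenAI's treats it as optional, so a translator has to pick some fallback.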

Supported Protocols

| Protocol   | Features                                        |
|------------|-------------------------------------------------|
| Ollama     | Chat, Generate, Tools, Streaming                |
| OpenAI     | Chat, Completions, Tools, Streaming, Embeddings |
| Anthropic  | Messages, Tools, Streaming                      |
| OpenRouter | Chat, Tools, Streaming                          |
| Azure      | Chat, Tools, Streaming                          |
| Cohere     | Chat, Generate                                  |
| VertexAI   | Chat, Tools                                     |
| Bedrock    | Chat, Tools                                     |
Usage Examples

Show Help

llm-proxy --help
llm-proxy serve --help

Quickstart Guide

llm-proxy quickstart

List Protocols

llm-proxy protocols

Local Testing

# Test with local Ollama instance
llm-proxy serve \
  --from ollama \
  --to http://localhost:11434 \
  --to-proto ollama \
  --model llama2 \
  --port 11435

Custom Host and Port

llm-proxy serve \
  --from ollama \
  --to https://api.openai.com/v1 \
  --to-proto openai \
  --model gpt-4 \
  --key "sk-..." \
  --host 127.0.0.1 \
  --port 8080

API Endpoints

Based on your --from protocol, different endpoints will be available:

Ollama Protocol

  • POST /api/chat - Chat with tool calls support
  • POST /api/generate - Text generation
  • GET /api/tags - List available models
  • GET /api/version - Version information

OpenAI Protocol

  • POST /v1/chat/completions - Chat completions with tools
  • POST /v1/completions - Legacy completions
  • GET /v1/models - List models
  • POST /v1/embeddings - Embeddings (mock implementation)

Anthropic Protocol

  • POST /v1/messages - Anthropic messages API
  • GET /v1/models - List models

Testing with cURL

Test Ollama Endpoints

# List models
curl http://localhost:11434/api/tags

# Simple chat
curl http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "test",
    "messages": [{"role": "user", "content": "Hello, how are you?"}],
    "stream": false
  }'

Test with Tool Calls

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer dummy" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "What is the weather in San Francisco?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get weather in a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {"type": "string"}
          }
        }
      }
    }],
    "stream": true
  }'

Test Streaming

# Ollama streaming
curl http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "test",
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": true
  }'

# OpenAI streaming
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": true
  }'

Advanced Configuration

Timeout Settings

llm-proxy serve \
  --from ollama \
  --to https://api.openai.com/v1 \
  --to-proto openai \
  --model gpt-4 \
  --key "sk-..." \
  --timeout 120  # 2 minute timeout

Verbose Logging

llm-proxy serve \
  --from ollama \
  --to https://api.openai.com/v1 \
  --to-proto openai \
  --model gpt-4 \
  --key "sk-..." \
  --verbose  # Enable debug logging

How It Works

  1. Client Request: Your tool sends a request in the source protocol (e.g., Ollama format)
  2. Protocol Translation: The proxy translates the request to the target protocol (e.g., OpenAI format)
  3. Forward Request: The translated request is sent to the target API
  4. Response Translation: The response is translated back to the source protocol
  5. Client Response: Your tool receives the response in the expected format
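The steps above can be sketched as a pair of pure translation functions. This is an illustrative outline of steps 2 and 4, not the proxy's actual implementation; names and field defaults are assumptions:

```python
# Illustrative sketch of the translation steps (hypothetical helper names):
# step 2 maps an Ollama /api/chat body to OpenAI /v1/chat/completions,
# step 4 maps the OpenAI response back to Ollama's response shape.

def ollama_to_openai(req: dict, target_model: str) -> dict:
    """Step 2: translate the incoming Ollama request to OpenAI format."""
    return {
        "model": target_model,              # the --model override
        "messages": req.get("messages", []),
        "stream": req.get("stream", True),  # Ollama streams by default
    }

def openai_to_ollama(resp: dict, model: str) -> dict:
    """Step 4: translate the OpenAI response back to Ollama format."""
    choice = resp["choices"][0]
    return {
        "model": model,
        "message": choice["message"],
        "done": choice.get("finish_reason") is not None,
    }
```

The real translation layer also has to handle tool calls, token counts, and streaming chunks, but the shape of the problem is the same: field-by-field mapping in both directions.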

Why No .env Files?

This tool is designed as a "hack tool" for quick prototyping and testing. We avoid .env files to prevent:

  • Accidental key collisions
  • Environment pollution
  • Configuration drift between projects
  • Surprising behavior from inherited environment variables

All configuration must be explicit via command-line arguments.

Common Use Cases

  1. Local Development: Test tools that expect Ollama with cloud models
  2. Protocol Migration: Gradually migrate from Ollama to OpenAI APIs
  3. Cost Testing: Compare different model providers without changing code
  4. Feature Testing: Test tool calling support with different backends
  5. Load Testing: Proxy to different endpoints for performance comparison

Troubleshooting

"Connection refused" errors

  • Ensure the target URL is correct and accessible
  • Check if you need to use HTTPS vs HTTP
  • Verify API keys are correct

Streaming not working

  • Ensure your client supports Server-Sent Events (SSE)
  • Check that "stream": true is in the request
  • For the Ollama protocol, expect newline-delimited JSON (NDJSON) rather than SSE
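Ollama-style streaming emits one JSON object per line. A quick sanity check for what a client should do with that stream, using inline sample data in place of a live response body (the chunk shape below follows Ollama's /api/chat format):

```python
import json

# Sample NDJSON body standing in for a live streamed response.
sample_stream = b'''{"message": {"role": "assistant", "content": "Hel"}, "done": false}
{"message": {"role": "assistant", "content": "lo"}, "done": false}
{"message": {"role": "assistant", "content": ""}, "done": true}
'''

def collect_ndjson(body: bytes) -> str:
    """Concatenate the content field from each newline-delimited JSON chunk."""
    text = []
    for line in body.splitlines():
        if not line.strip():
            continue
        chunk = json.loads(line)
        text.append(chunk["message"]["content"])
        if chunk.get("done"):
            break
    return "".join(text)
```

If your client hangs on a stream, printing each raw line before parsing it is usually the fastest way to see whether the proxy is emitting NDJSON, SSE (`data: ...` prefixed lines), or an error body.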

Tool calls not working

  • Verify the target model supports tool calls
  • Check tool definition format matches OpenAI specification
  • Ensure you're using the correct endpoint for the protocol
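A quick structural check can catch most malformed tool definitions before they reach the backend. This is a sketch covering only the required fields; the full OpenAI tool specification has additional optional fields not checked here:

```python
# Minimal structural check for an OpenAI-style tool definition (a sketch;
# the real specification allows more optional fields than are checked here).

def looks_like_openai_tool(tool: dict) -> bool:
    """Return True if the dict has the required OpenAI tool-definition shape."""
    fn = tool.get("function", {})
    params = fn.get("parameters", {})
    return (
        tool.get("type") == "function"
        and isinstance(fn.get("name"), str)
        and params.get("type") == "object"
        and isinstance(params.get("properties"), dict)
    )
```

Running the `get_weather` definition from the cURL section above through a check like this is a fast way to rule out schema mistakes before blaming the backend model.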

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Run tests
  5. Submit a pull request

License

MIT License - see LICENSE file for details.

Acknowledgments

  • Built with LiteLLM for protocol translation
  • CLI powered by Typer
  • Beautiful outputs with Rich
  • Package management with uv

Note: This tool is for development and testing purposes. Always follow the terms of service for the APIs you're proxying to.
