aiproxy

An OpenAI-compatible gateway that turns any LLM into a more capable one by fusing it with a reusable fabric of MCP servers.

📖 Documentation: https://sirmmo.github.io/aiproxy/

Point any OpenAI client at aiproxy, pick one of your configured assistants as the model, and the gateway runs the whole agentic tool loop for you — calling the wrapped LLM, executing tools against your Model Context Protocol servers, feeding results back — and returns a normal OpenAI response. From the client's side it just looks like a smarter model.

flowchart LR
    client(["Any OpenAI client"])
    client -- "POST /v1/chat/completions" --> gate
    subgraph proxy["aiproxy"]
        direction TB
        gate["auth chain<br/>static keys | Apiman"]
        loop(["agent loop"])
        backend["backend adapter<br/>OpenAI-compat | native Anthropic"]
        mcp["MCP servers<br/>fetch | filesystem | http ..."]
        gate --> loop
        loop -- "LLM turn" --> backend
        backend -. "assistant / tool_calls" .-> loop
        loop -- "tool calls" --> mcp
        mcp -. "results" .-> loop
    end
    backend -- "chat / messages API" --> llm(["Upstream LLM"])
    proxy -- "OpenAI response" --> client

Why

MCP gives you a growing ecosystem of tool servers (web fetch, filesystem, databases, search, your own APIs). But wiring those tools into every app and every model is repetitive. aiproxy makes that infrastructure reusable and model-agnostic: define your MCP servers and LLM backends once, compose them into named assistants, and every OpenAI-compatible app in your stack gets a tool-augmented model for free — no client changes, no SDK lock-in.

Features

- OpenAI-compatible API: /v1/chat/completions (streaming and non-streaming) and /v1/models. Works with the OpenAI SDKs, LangChain, LlamaIndex, curl, etc.
- Wraps any LLM: OpenAI-compatible backends (OpenAI, Groq, Together, Mistral, vLLM, Ollama, LM Studio, OpenRouter…) and native Anthropic (Messages API), behind one interface.
- Reusable MCP fabric: attach any number of MCP servers (stdio, sse, streamable-http) to each assistant. Tools are namespaced per server and executed transparently.
- Assistants as virtual models: each model a client can pick is a backend + system prompt + set of MCP servers + tool-loop budget.
- Runtime admin API: add/edit/remove assistants, backends and MCP servers without a restart; introspect any server's tools.
- Pluggable auth: static API keys and Apiman key validation (gateway round-trip or trusted-header topologies) run in parallel.
- Docker-first: run docker compose up and you have an endpoint. Node (npx) and uvx are baked in so most MCP servers install on demand.

Quick start (Docker)

Use the prebuilt image from GitHub Container Registry:

cp .env.example .env                # add your OPENAI_API_KEY / ANTHROPIC_API_KEY
cp config.example.yaml config.yaml  # define backends, MCP servers, assistants

docker run --rm -p 8000:8000 --env-file .env \
  -v "$PWD/config.yaml:/app/config.yaml:ro" \
  ghcr.io/sirmmo/aiproxy:latest

…or build from source with compose:

cp .env.example .env
cp config.example.yaml config.yaml
docker compose up --build

Then talk to it like OpenAI:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "research-assistant",
    "messages": [{"role": "user", "content": "Summarize https://modelcontextprotocol.io"}]
  }'

Or with the OpenAI Python SDK:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused-or-your-PROXY_API_KEY")

resp = client.chat.completions.create(
    model="research-assistant",              # an assistant, not a raw model
    messages=[{"role": "user", "content": "What's on the MCP homepage?"}],
)
print(resp.choices[0].message.content)

Streaming works exactly as clients expect (stream=True) — content tokens flow through while tool rounds run transparently between them.
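
A minimal sketch of consuming the stream from Python, assuming the quick-start proxy is listening on localhost:8000 and a research-assistant assistant is configured. The helper names collect_text and stream_answer are illustrative and are not part of aiproxy or the SDK.

```python
from typing import Iterable


def collect_text(chunks: Iterable) -> str:
    """Concatenate the content deltas of a chat.completion.chunk stream."""
    parts = []
    for chunk in chunks:
        # Some chunks (for example a trailing usage chunk) carry no choices.
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        # Role-only and finish chunks have no content.
        if getattr(delta, "content", None):
            parts.append(delta.content)
    return "".join(parts)


def stream_answer(prompt: str) -> str:
    """Stream one answer from the proxy (needs a running aiproxy)."""
    # Imported lazily so collect_text stays usable without the SDK installed.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1",
                    api_key="unused-or-your-PROXY_API_KEY")
    stream = client.chat.completions.create(
        model="research-assistant",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    return collect_text(stream)
```

Tool rounds produce no content deltas of their own in this sketch, so the caller only ever sees answer text.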

Configuration

Everything is declared in config.yaml. ${VAR} / ${VAR:-default} are expanded from the environment, so keep secrets in .env.

mcp_servers:
  fetch:                                   # a reusable MCP server
    transport: stdio
    command: uvx
    args: ["mcp-server-fetch"]

backends:
  anthropic:                               # a wrapped LLM provider
    kind: anthropic                        # or "openai" for any compat endpoint
    base_url: https://api.anthropic.com/v1
    api_key: ${ANTHROPIC_API_KEY}

assistants:
  - name: research-assistant               # <- clients pass this as `model`
    backend: anthropic
    model: claude-sonnet-5
    system_prompt: "You are a meticulous research assistant. Cite your sources."
    mcp_servers: [fetch]
    max_tool_iterations: 8
    temperature: 0.2

See config.example.yaml for the fully annotated version, including sse/http MCP servers, local model backends, and API-key auth.
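
To make the ${VAR} / ${VAR:-default} syntax concrete, here is a rough sketch of shell-style expansion. It is not aiproxy's actual code: the function name expand_env is hypothetical, and expanding an unset variable without a default to an empty string is an assumption (the real loader may raise instead).

```python
import os
import re

# Matches ${VAR} and ${VAR:-default}.
_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def expand_env(text: str, env=None) -> str:
    """Replace ${VAR} and ${VAR:-default} with values from env."""
    env = os.environ if env is None else env

    def substitute(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        value = env.get(name)
        if value:
            return value
        # Shell-style ":-" falls back when the variable is unset or empty.
        # Assumption: with no default, the reference expands to "".
        return default if default is not None else ""

    return _PATTERN.sub(substitute, text)
```

For example, expand_env("port: ${PORT:-8000}", {}) yields "port: 8000".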

Backends

`kind`	Talks to	Auth header
`openai`	any OpenAI-compatible `/chat/completions`	`Authorization: Bearer`
`anthropic`	native Anthropic `/messages`	`x-api-key`

The Anthropic backend translates between the canonical chat messages and the Messages API (system prompt, tool_use/tool_result blocks, streaming events, stop-reason mapping), so tool use is first-class with Claude.
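
The request-side half of that translation could look roughly like the sketch below. It assumes the canonical form is OpenAI-style messages; the names to_anthropic and STOP_REASON_TO_FINISH are hypothetical, the stop-reason table is one plausible mapping rather than the adapter's confirmed one, and streaming events are left out.

```python
import json

# Assumed mapping from Anthropic stop_reason to OpenAI finish_reason.
STOP_REASON_TO_FINISH = {
    "end_turn": "stop",
    "stop_sequence": "stop",
    "max_tokens": "length",
    "tool_use": "tool_calls",
}


def to_anthropic(messages: list) -> dict:
    """Convert OpenAI-style chat messages into Messages API fields."""
    system_parts, out = [], []
    for msg in messages:
        role = msg["role"]
        if role == "system":
            # The Messages API takes the system prompt as a top-level field.
            system_parts.append(msg["content"])
        elif role == "assistant":
            blocks = []
            if msg.get("content"):
                blocks.append({"type": "text", "text": msg["content"]})
            for call in msg.get("tool_calls") or []:
                blocks.append({
                    "type": "tool_use",
                    "id": call["id"],
                    "name": call["function"]["name"],
                    # OpenAI sends arguments as a JSON string; Anthropic wants an object.
                    "input": json.loads(call["function"]["arguments"] or "{}"),
                })
            out.append({"role": "assistant", "content": blocks})
        elif role == "tool":
            block = {
                "type": "tool_result",
                "tool_use_id": msg["tool_call_id"],
                "content": msg["content"],
            }
            # Results of one assistant turn are grouped into a single user message.
            if out and out[-1]["role"] == "user" and isinstance(out[-1]["content"], list):
                out[-1]["content"].append(block)
            else:
                out.append({"role": "user", "content": [block]})
        else:  # plain user message
            out.append({"role": "user", "content": msg["content"]})
    return {"system": "\n\n".join(system_parts), "messages": out}
```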

MCP servers

`transport`	Fields
`stdio`	`command`, `args`, `env`, `cwd`
`sse`	`url`, `headers`
`http` / `streamable-http`	`url`, `headers`

Tools are exposed to the model as "<server>__<tool>" and routed back to the right server on call. Sessions are persistent (one subprocess per stdio server, reused across requests) and started lazily on first use.
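
The naming scheme can be sketched as follows. This is an illustration only: the function names are hypothetical, the example tool names (read_file, list_dir) are made up, and splitting at the first "__" assumes server names never contain a double underscore.

```python
SEPARATOR = "__"


def namespaced(server: str, tool: str) -> str:
    """Build the name the model sees, e.g. "fetch__fetch"."""
    return f"{server}{SEPARATOR}{tool}"


def split_tool_name(full_name: str) -> tuple:
    """Split "<server>__<tool>" at the first separator."""
    server, sep, tool = full_name.partition(SEPARATOR)
    if not sep or not server or not tool:
        raise ValueError(f"not a namespaced tool name: {full_name!r}")
    return server, tool


def build_router(tools_by_server: dict) -> dict:
    """Map every exposed name back to (server, original tool name)."""
    return {
        namespaced(server, tool): (server, tool)
        for server, tools in tools_by_server.items()
        for tool in tools
    }
```

Keeping an explicit router table, rather than re-splitting names on every call, sidesteps any ambiguity when a tool name itself contains "__".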

Admin API

Mutate the live registry without restarting (set ADMIN_API_KEY to protect it):

# Inspect current state (secrets redacted).
# If ADMIN_API_KEY is set, add -H "Authorization: Bearer $ADMIN_API_KEY" to every admin call.
curl localhost:8000/admin/config

# See what tools a server actually advertises
curl localhost:8000/admin/mcp/fetch/tools

# Add/replace an assistant
curl -X PUT localhost:8000/admin/assistants/coder \
  -H "Content-Type: application/json" \
  -d '{"backend":"openai","model":"gpt-4o","mcp_servers":["filesystem"],
       "system_prompt":"You are a coding agent."}'

Method & path	Purpose
`GET /admin/config`	Dump current registry (secrets redacted)
`GET/PUT/DELETE /admin/assistants[/{name}]`	Manage assistants
`GET/PUT/DELETE /admin/backends[/{name}]`	Manage LLM backends
`GET/PUT/DELETE /admin/mcp[/{name}]`	Manage MCP servers
`GET /admin/mcp/{name}/tools`	Introspect a server's tools

Admin changes are held in memory only, so they are lost on restart. To keep them, export the current state with GET /admin/config and persist it into config.yaml yourself.
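
The same admin calls can be scripted from Python with the standard library. The sketch below is hedged: admin_request and export_config are hypothetical helpers, the base URL is the quick-start default, and it assumes GET /admin/config returns JSON.

```python
import json
import urllib.request


def admin_request(method, path, body=None, admin_key=None,
                  base="http://localhost:8000"):
    """Build (but do not send) a request to the admin API."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(base + path, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    if admin_key:
        # Only needed when ADMIN_API_KEY is set on the server.
        req.add_header("Authorization", f"Bearer {admin_key}")
    return req


def export_config(admin_key=None):
    """Fetch the redacted registry so it can be merged into config.yaml by hand."""
    req = admin_request("GET", "/admin/config", admin_key=admin_key)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # assumption: the endpoint responds with JSON
```

Because secrets are redacted in the export, API keys still have to be restored as ${VAR} references when the result is copied into config.yaml.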

Auth

/v1/* auth is a pluggable chain — a request is authorized if any enabled provider accepts it, so static keys and Apiman run in parallel:

Provider	Enable with	Accepts a request when…
Static keys	non-empty `proxy_api_keys`	the caller's key matches an entry
Apiman `gateway_probe`	`apiman.enabled: true`, `mode: gateway_probe`	the key validates via a round-trip through the Apiman gateway (2xx)
Apiman `trusted_header`	`apiman.enabled: true`, `mode: trusted_header`	the request carries the shared secret the gateway injects

The caller's key is read from Authorization: Bearer <key>, the X-API-Key header, or the ?apikey= query param (matching Apiman's own conventions). If no provider is configured, /v1/* is open (handy for local dev).
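
A sketch of that chain, under stated assumptions: the function names are hypothetical, and the lookup order (Authorization, then X-API-Key, then ?apikey=) is a guess, since the text lists the three locations without giving a precedence.

```python
from typing import Callable, Iterable, Mapping, Optional
from urllib.parse import parse_qs, urlsplit


def extract_key(headers: Mapping[str, str], url: str) -> Optional[str]:
    """Read the caller's key from the three documented locations."""
    lowered = {k.lower(): v for k, v in headers.items()}
    auth = lowered.get("authorization", "")
    if auth.lower().startswith("bearer "):
        return auth[7:].strip()
    if lowered.get("x-api-key"):
        return lowered["x-api-key"]
    values = parse_qs(urlsplit(url).query).get("apikey")
    return values[0] if values else None


def is_authorized(key: Optional[str],
                  providers: Iterable[Callable[[Optional[str]], bool]]) -> bool:
    """Accept when any provider accepts; with no providers the API is open."""
    providers = list(providers)
    if not providers:
        return True
    return any(provider(key) for provider in providers)


def static_keys(allowed: set) -> Callable[[Optional[str]], bool]:
    """Hypothetical provider backed by a non-empty proxy_api_keys list."""
    return lambda key: key is not None and key in allowed
```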

/admin/* is authenticated separately: if ADMIN_API_KEY is set, admin calls must send Authorization: Bearer <ADMIN_API_KEY>.

Apiman integration

Apiman is an open-source API-management layer (clients, plans, contracts, quotas, policies). Two topologies are supported:

gateway_probe — aiproxy stays reachable directly and validates each caller's key against Apiman itself. Register a tiny "auth check" API in Apiman whose backend is aiproxy's /health, then:

apiman:
  enabled: true
  mode: gateway_probe
  gateway_url: http://apiman-gateway:8080/apiman-gateway
  probe_api: aiproxy/authcheck/1.0    # {org}/{api}/{version}
  probe_path: health                  # backend path that returns 2xx
  cache_ttl: 60

To validate a caller, aiproxy calls GET {gateway_url}/aiproxy/authcheck/1.0/health with X-API-Key: <caller-key>. A 2xx response authorizes the request and is cached for cache_ttl seconds; a 401 or 403 rejects it.
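
The probe-and-cache behaviour might be modelled like this. It is a sketch, not the actual implementation: ProbeCache and probe_url are hypothetical names, the probe is injected as a callable so no HTTP client is assumed, and treating every non-2xx status as a rejection (and caching only acceptances) is an assumption.

```python
import time
from typing import Callable, Dict


class ProbeCache:
    """Cache positive probe results for cache_ttl seconds."""

    def __init__(self, probe: Callable[[str], int], cache_ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._probe = probe      # returns the HTTP status for a caller key
        self._ttl = cache_ttl
        self._clock = clock
        self._expires: Dict[str, float] = {}

    def accepts(self, key: str) -> bool:
        now = self._clock()
        if self._expires.get(key, 0.0) > now:
            return True          # still cached, skip the round-trip
        status = self._probe(key)
        if 200 <= status < 300:
            self._expires[key] = now + self._ttl
            return True
        self._expires.pop(key, None)
        return False


def probe_url(gateway_url: str, probe_api: str, probe_path: str) -> str:
    """Compose {gateway_url}/{probe_api}/{probe_path}."""
    return "/".join(part.strip("/") for part in (gateway_url, probe_api, probe_path))
```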

trusted_header — put the Apiman gateway in front of aiproxy and have an "Add Header" policy inject a shared secret; aiproxy trusts any request carrying it and never sees raw keys:

apiman:
  enabled: true
  mode: trusted_header
  header: X-Apiman-Gateway-Token
  secret: ${APIMAN_SHARED_SECRET}
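
The check on the aiproxy side reduces to comparing one header against the configured secret. A minimal sketch, with the hypothetical helper name trusted_header_ok and the assumption that an empty secret should never authorize anything:

```python
import hmac
from typing import Mapping


def trusted_header_ok(headers: Mapping[str, str], header: str, secret: str) -> bool:
    """Accept only requests carrying the exact shared secret."""
    if not secret:
        return False  # never accept when the secret is unset or empty
    lowered = {k.lower(): v for k, v in headers.items()}
    presented = lowered.get(header.lower(), "")
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(presented.encode(), secret.encode())
```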

Development & testing

The recommended path is Docker — it pins a clean Python 3.12 with node/uvx available, so you never fight a host interpreter or a leaked PYTHONPATH.

docker build -t aiproxy:latest .

# End-to-end check — spawns the demo MCP server and drives the full agent
# loop (streaming + non-streaming) with a scripted fake LLM. No API key needed:
docker run --rm aiproxy:latest python scripts/smoke_test.py

# Run the server against your config.yaml (or just `docker compose up`):
docker run --rm -p 8000:8000 -v "$PWD/config.yaml:/app/config.yaml:ro" aiproxy:latest

Running without Docker (uv venv)

uv venv --python 3.12 .venv
uv pip install --python .venv/bin/python -e '.[dev]'
# Clear PYTHONPATH if your shell injects system site-packages that shadow the venv:
env PYTHONPATH= .venv/bin/python scripts/smoke_test.py
env PYTHONPATH= .venv/bin/uvicorn app.main:app --reload --port 8000

How a request flows

1. The client sends POST /v1/chat/completions with model: "<assistant>".
2. The gateway resolves the assistant to its backend and MCP servers, and ensures those servers are connected.
3. It builds the OpenAI tool schema from the servers' tools and enters the agent loop:
   - call the backend LLM with the messages + tools;
   - if the model returns tool calls, execute them concurrently against the MCP servers and append the results;
   - repeat until the model answers or max_tool_iterations is hit (the last turn drops tools to force a final answer).
4. It returns an OpenAI chat.completion (or streams chat.completion.chunk objects), with the assistant name as the model and aggregated token usage.
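
The loop described above can be sketched in a few lines of asyncio. This is not the code in app/agent.py: run_agent, llm and call_tool are hypothetical names, messages follow the OpenAI tool-calling shape, and counting the tool-less final turn as one extra turn beyond max_tool_iterations is an assumption about how the budget is applied.

```python
import asyncio
import json


async def run_agent(llm, call_tool, messages, tools, max_tool_iterations=8):
    """Alternate LLM turns and tool rounds; return (final_message, usage)."""
    messages = list(messages)
    usage = {"prompt_tokens": 0, "completion_tokens": 0}
    for iteration in range(max_tool_iterations + 1):
        # On the last allowed turn, withhold tools to force a plain answer.
        offered = tools if iteration < max_tool_iterations else None
        reply, turn_usage = await llm(messages, offered)
        for field in usage:
            usage[field] += turn_usage.get(field, 0)
        calls = reply.get("tool_calls") or []
        if not calls:
            return reply, usage
        messages.append(reply)
        # Run all tool calls of this turn concurrently.
        results = await asyncio.gather(*(
            call_tool(c["function"]["name"],
                      json.loads(c["function"]["arguments"] or "{}"))
            for c in calls
        ))
        for call, result in zip(calls, results):
            messages.append(
                {"role": "tool", "tool_call_id": call["id"], "content": result}
            )
    # Reached only if the model still requested tools on the tool-less turn.
    return reply, usage
```

Here llm is any coroutine taking (messages, tools) and returning (assistant_message, usage_dict), and call_tool takes a namespaced tool name plus parsed arguments and returns the result text.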

Project layout

app/
  main.py            FastAPI app + lifespan
  config.py          YAML config models, ${env} expansion
  state.py           live registry (assistants/backends/MCP), runtime mutation
  mcp_manager.py     persistent MCP sessions + namespaced ToolSet router
  agent.py           the agentic loop (streaming + non-streaming)
  backends/          openai + anthropic adapters behind one interface
  routes/            /v1 (chat) and /admin (registry) endpoints
  auth.py            pluggable auth chain (static keys + Apiman)
examples/echo_mcp_server.py   demo stdio MCP server
scripts/smoke_test.py         end-to-end test, no real LLM required

License

MIT
