A Universal LLM API Gateway & Transformation Layer.
Plexus unifies interactions with multiple AI providers (OpenAI, Anthropic, Gemini, etc.) under a single API. Switch models and providers without rewriting client code.
- OAuth Providers (pi-ai): Authenticate to Anthropic, GitHub Copilot, Gemini CLI, Antigravity, and OpenAI Codex via the Admin UI and route them with `oauth://providers`
- OAuth Management APIs: Start, poll, prompt, and cancel OAuth login sessions via `/v0/management/oauth/*`
- Quota Tracking System: Monitor provider rate limits and quotas with configurable checkers
- OAuth-backed Quota Checkers: `claude-code` and `openai-codex` quota checkers read tokens from `auth.json` by default (no hardcoded quota API key required)
- Audio Transcriptions API: Full OpenAI-compatible `/v1/audio/transcriptions` endpoint support with multipart file uploads
- Embeddings API: Full OpenAI-compatible `/v1/embeddings` endpoint support
- Model Type System: Distinguish between chat, embeddings, and transcriptions models with automatic API filtering
- Token Estimation: Automatic token counting for providers that don't return usage data
- Bulk Model Import: Import models directly in provider configuration
- Direct Model Routing: Route directly to provider models with the `direct/provider/model` format
- Responses API Support: Full OpenAI `/v1/responses` endpoint with multi-turn conversation support. Includes support for `previous_response_id` tracking and injection, something many proxy tools lack.
- Automatic Response Cleanup: Responses are retained for 7 days with hourly cleanup jobs to prevent database bloat
Plexus uses Drizzle ORM with SQLite or Postgres for data persistence:
- Schema Management: Type-safe database schemas in `packages/backend/drizzle/schema/`
- Automatic Migrations: Migrations run automatically on startup
- Tables: Usage tracking, provider cooldowns, debug logs, inference errors, performance metrics, quota snapshots
```bash
docker run -p 4000:4000 \
  -v $(pwd)/config/plexus.yaml:/app/config/plexus.yaml \
  -e AUTH_JSON=/app/auth.json \
  -v $(pwd)/auth.json:/app/auth.json \
  -v plexus-data:/app/data \
  ghcr.io/mcowger/plexus:latest
```

AUTH_JSON points Plexus at the OAuth credentials file (default: `./auth.json`).
For OAuth-backed quota checkers (`claude-code`, `openai-codex`), Plexus also uses this file automatically unless an explicit `options.apiKey` override is provided.
See Installation Guide for other options.
Plexus supports the OpenAI `/v1/responses` endpoint with full multi-turn conversation support:
```bash
curl -X POST http://localhost:4000/v1/responses \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "input": "What is 2+2?",
    "previous_response_id": "resp_abc123"
  }'
```

Unlike many LLM gateways that lack multi-turn state management, Plexus correctly handles `previous_response_id`:
- Stateful Conversations: Send just the new input and `previous_response_id` - no need to resend conversation history
- Automatic Context Loading: Previous response output items are merged into the current request automatically
- Storage & Linking: Responses are stored with TTL cleanup (7 days), linked via `previous_response_id` references
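Conceptually, the automatic context loading amounts to looking up the stored output items of the previous response and prepending them to the new request's input before forwarding upstream. A simplified sketch (the data shapes and in-memory store here are illustrative; Plexus's internal representation will differ):

```python
def merge_previous_context(stored_responses: dict, request: dict) -> list:
    """Build the effective input list for a /v1/responses request.

    stored_responses maps response IDs to their stored output items.
    The new input is appended after the prior output so the upstream
    model sees the full conversation without the client resending it.
    """
    items = []
    prev_id = request.get("previous_response_id")
    if prev_id is not None:
        items.extend(stored_responses[prev_id]["output"])
    new_input = request["input"]
    if isinstance(new_input, str):  # string shorthand becomes a user message
        new_input = [{"role": "user", "content": new_input}]
    items.extend(new_input)
    return items

store = {"resp_abc123": {"output": [{"role": "assistant", "content": "2+2 is 4."}]}}
request = {"model": "gpt-4o", "input": "Now double it.",
           "previous_response_id": "resp_abc123"}
print(merge_previous_context(store, request))
```

The merged list contains the assistant's earlier answer followed by the new user turn, which is what gets sent to the provider.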
Responses are stored for multi-turn conversation support:
- Retention: 7-day TTL (configurable)
- Automatic Cleanup: Hourly job removes expired responses and orphaned conversations
- Management API: Retrieve, list, or delete stored responses via `/v1/responses/:response_id`
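The retention check reduces to comparing each stored response's timestamp against the TTL; the hourly job deletes whatever has expired. A minimal sketch (the field names and in-memory store are hypothetical, standing in for the Drizzle-backed tables):

```python
from datetime import datetime, timedelta, timezone

RESPONSE_TTL = timedelta(days=7)  # the documented default retention

def expired_ids(responses: dict[str, datetime], now: datetime) -> list[str]:
    """Return IDs of stored responses older than the TTL.

    responses maps response IDs to created_at timestamps; an hourly
    cleanup job would delete every ID this returns.
    """
    return [rid for rid, created_at in responses.items()
            if now - created_at > RESPONSE_TTL]

now = datetime(2024, 6, 15, tzinfo=timezone.utc)
store = {
    "resp_old": now - timedelta(days=8),   # past the 7-day TTL -> removed
    "resp_new": now - timedelta(hours=3),  # fresh -> kept
}
print(expired_ids(store, now))  # ['resp_old']
```

In the real system this runs as a scheduled job against the database, which also removes conversations orphaned by the deletions.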
MIT License - see LICENSE file.
