🔄 OpenAI Compatible Proxy

A lightweight, low-dependency proxy that presents an OpenAI-compatible API while forwarding all requests to any real backend — forcing your chosen model on every request.

Use case: You have a client hardcoded to call gpt-4 but you actually want it hitting accounts/fireworks/models/deepseek-v4-pro (or any other model/backend). This proxy sits in the middle and rewrites the model name transparently. Supports streaming (SSE) responses too.

✨ Features

OpenAI-compatible API — Drop-in replacement for any client expecting /v1/chat/completions
Model forcing — Every incoming model name gets rewritten to your configured model
Fast & Smart model routing — Request model="fast" for your cheap/fast model, model="smart" for your capable/slow model (optional — falls back to fast if not configured)
Automatic fallback on failure — If one backend times out, returns a server error (5xx), or rate-limits you (429), the proxy automatically retries on the other model. Works bidirectionally: fast↔smart.
Streaming support — SSE streaming works out of the box (no timeouts on long LLM requests)
Configurable timeout — Per-request timeout (default 120s) so stalled backends trigger fallback instead of hanging forever
Smart path stitching — Handles nested base paths (e.g., /inference/v1) without double-prefixing
Bind to any interface — Bind to a specific LAN IP to expose the proxy to your local network while keeping it off the WAN side (great for running on a router!)
Deployed via apt — Every dependency is available as a native Ubuntu package, no pip needed

🚀 Quick Start

Option A: Ubuntu native (no pip)

# Install dependencies via apt
./ubuntu-install.sh

# Copy and edit your config
cp .env.example .env
nano .env

# Run
./start.sh

Option B: pip / uv

pip install -r requirements.txt
# or: uv pip install -r requirements.txt

cp .env.example .env
nano .env

./start.sh

⚙️ Configuration

All config lives in .env:

Fast model (default)

Variable	Description	Example
`FAST_BASE_URL`	The actual backend to forward to	`https://api.fireworks.ai/inference/v1`
`FAST_API_KEY`	Your real API key for the backend	`sk-...`
`FAST_MODEL_NAME`	The model name to force	`accounts/fireworks/models/deepseek-v4-pro`

Backward compatibility: If you have an existing .env using the old REAL_* names (REAL_BASE_URL, REAL_API_KEY, REAL_MODEL_NAME), those still work as fallbacks.

Smart model (optional)

Variable	Description	Example
`SMART_BASE_URL`	Backend for "smart" requests (falls back to `FAST_BASE_URL`)	`https://api.openai.com/v1`
`SMART_API_KEY`	API key for the smart backend (falls back to `FAST_API_KEY`)	`sk-...`
`SMART_MODEL_NAME`	Model to use for "smart" requests (falls back to `FAST_MODEL_NAME`)	`gpt-4o`

If none of the SMART_* variables are set, requesting model="smart" will behave identically to model="fast" — and fallback will be disabled since there's nowhere else to fall back to.

Server settings

Variable	Description	Example
`PROXY_PORT`	Port the proxy listens on	`8086`
`BIND_HOST`	IP address the server binds to (set to LAN IP to avoid WAN exposure)	`192.168.1.1`
`PROXY_TIMEOUT`	Seconds before a backend request times out and triggers fallback (0 = no timeout)	`120`

🔄 Fallback behavior

When both fast and smart models are configured (i.e., they point to different backends or different model names), the proxy will automatically retry on the other model if the primary one fails due to:

Connection errors — can't reach the backend at all
Timeouts — backend doesn't respond within PROXY_TIMEOUT seconds
Server errors — backend returns HTTP 5xx
Rate limiting — backend returns HTTP 429

The client sees only the successful response from whichever backend worked. If both fail, the proxy returns the primary backend's error (504 for timeouts, or the actual error status otherwise).

Note: For streaming responses that have already started sending data to the client, mid-stream failures cannot be recovered — the fallback only applies to errors that occur before the first byte is sent.

🔒 Binding to LAN only

By default BIND_HOST=0.0.0.0 listens on all network interfaces — which includes the WAN side if you're on a router. To keep the proxy accessible from your LAN but hidden from the internet, set BIND_HOST to your router's LAN IP:

# Only listen on the LAN interface — WAN stays blind
BIND_HOST=192.168.1.1

Now devices on your local network can hit http://192.168.1.1:8086 but the outside world cannot.

📡 Usage

Once running, point any OpenAI-compatible client at your proxy:

# Use the fast model explicitly
curl http://localhost:8086/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer any-key-works" \
  -d '{
    "model": "fast",
    "messages": [{"role": "user", "content": "Tell me a joke!"}]
  }'

# Use the smart model for heavier tasks
curl http://localhost:8086/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer any-key-works" \
  -d '{
    "model": "smart",
    "messages": [{"role": "user", "content": "Write a detailed analysis of..."}]
  }'

# Any unrecognized model name goes to the fast backend
curl http://localhost:8086/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer any-key-works" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

The model field will be silently rewritten to your configured FAST_MODEL_NAME or SMART_MODEL_NAME. If the primary backend fails, the proxy transparently retries on the other one.

🧱 Dependency Mapping

pip package	Ubuntu apt package
`fastapi`	`python3-fastapi`
`uvicorn`	`python3-uvicorn`
`httpx`	`python3-httpx`
`python-dotenv`	`python3-dotenv`

🛡️ Security Note

Never commit .env to git. It contains your real API key. The proxy itself doesn't validate incoming auth — it replaces whatever the client sends with your real key. Keep this thing on localhost or behind a firewall/VPN.
Use BIND_HOST to lock down exposure. If running on a router or multi-interface machine, set BIND_HOST to your LAN IP so the proxy isn't reachable from the WAN side. The .gitignore already covers your live .env.

📄 License

MIT — do whatever you want with it.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
.env.example		.env.example
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
main.py		main.py
requirements.txt		requirements.txt
start.sh		start.sh
ubuntu-install.sh		ubuntu-install.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🔄 OpenAI Compatible Proxy

✨ Features

🚀 Quick Start

Option A: Ubuntu native (no pip)

Option B: pip / uv

⚙️ Configuration

Fast model (default)

Smart model (optional)

Server settings

🔄 Fallback behavior

🔒 Binding to LAN only

📡 Usage

🧱 Dependency Mapping

🛡️ Security Note

📄 License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

🔄 OpenAI Compatible Proxy

✨ Features

🚀 Quick Start

Option A: Ubuntu native (no pip)

Option B: pip / uv

⚙️ Configuration

Fast model (default)

Smart model (optional)

Server settings

🔄 Fallback behavior

🔒 Binding to LAN only

📡 Usage

🧱 Dependency Mapping

🛡️ Security Note

📄 License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages