This is the big one: the gateway grows from a personal proxy into something a whole team can run. You can hand out API keys, see what each team spends — and how much routing saves them — and put guardrails on cost, speed, and which models they can reach. It adds rate limiting and a response cache for instant, free repeats, and lets anyone steer routing right from the chat box. Everything is opt-in, and the routing decision is still made instantly and offline, with no extra model call.
Hand out keys, scope teams
Mint a key with one command — the gateway only ever stores a hash of it, so your config never holds a usable secret:
wayfinder-router keys new --id team-a
Paste the printed block into your config and give each key whatever limits you like, such as a daily budget, a rate limit, and the models it is allowed to use:
[gateway.keys.team-a]
hash = "…" # from `keys new`
models = ["local"] # this team can only use the local model
[gateway.keys.team-a.budget]
limit = 20.0 # $20/day
Once any key exists, the gateway requires one on every request; with no keys configured, it stays wide open. Every request is tagged with its key, so you can see each team's usage and, uniquely, how much routing saved them. If a team asks for a model it isn't allowed to use, the request quietly drops to the nearest model it can use instead of failing.
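To make the "stores only a hash" idea concrete, here is a minimal sketch of minting a key and checking a presented key against its stored digest. It is not Wayfinder's code: the SHA-256 digest, the `wf-` prefix, and the `mint_key` / `verify_key` names are all assumptions made for illustration, and the real key format and hash function may differ.

```python
import hashlib
import hmac
import secrets


def mint_key(prefix: str = "wf") -> tuple[str, str]:
    """Return (plaintext_key, hex_digest).

    Hypothetical helper: only the digest would be written to config,
    and the plaintext is shown to the operator once.
    """
    key = f"{prefix}-{secrets.token_urlsafe(32)}"
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return key, digest


def verify_key(presented: str, stored_hash: str) -> bool:
    """Hash the presented key and compare it to the stored digest."""
    digest = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(digest, stored_hash)
```

Because only the digest is kept, someone who reads the config file cannot recover a working key from it; they would have to guess the original random string.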
Put a ceiling on volume
Cap how many requests (or tokens) a minute the gateway will handle, so one runaway client or retry loop can't swamp your provider. Over the limit, callers get a polite "try again shortly":
[gateway.rate_limit]
rpm = 600
Set it gateway-wide or per key. Every response now tells a client how much of its allowance is left and when it resets, so a well-behaved one can ease off before it ever gets turned away.
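One simple way to picture a requests-per-minute cap that also reports the remaining allowance and the reset time is a fixed-window counter. The sketch below is an illustration only: the release notes do not say which algorithm the gateway uses, and the `FixedWindowLimiter` class and its `check` method are hypothetical names.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FixedWindowLimiter:
    """Hypothetical limiter: allow at most `rpm` requests per window."""

    rpm: int
    window: float = 60.0  # seconds
    _start: Optional[float] = field(default=None, init=False)
    _count: int = field(default=0, init=False)

    def check(self, now: Optional[float] = None) -> tuple[bool, int, float]:
        """Return (allowed, remaining, seconds_until_reset)."""
        now = time.monotonic() if now is None else now
        # Open a fresh window on first use or once the old one has expired.
        if self._start is None or now - self._start >= self.window:
            self._start, self._count = now, 0
        reset_in = self.window - (now - self._start)
        if self._count >= self.rpm:
            return False, 0, reset_in
        self._count += 1
        return True, self.rpm - self._count, reset_in
```

A client that reads the remaining count and reset time from each response can slow down as the count approaches zero instead of waiting to be refused.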
Skip the repeats
Turn on the cache and identical requests come back instantly — and free — instead of going out again. Great for tests, dev loops, and agent tools that re-ask the same thing. Off by default; everything stays in memory, and the prompt is only ever stored as a hash.
[gateway.cache]
enabled = true
ttl = 300 # seconds
A cached answer costs nothing, and Wayfinder keeps a running tally of what the cache saved you.
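As a rough sketch of how an in-memory cache can serve identical requests while holding the prompt only as a hash, the example below keys each entry by a SHA-256 digest of the canonicalised request and expires it after a TTL. The `HashedTTLCache` class, the JSON canonicalisation, and the choice of SHA-256 are assumptions for illustration, not the gateway's actual implementation.

```python
import hashlib
import json
import time
from typing import Any, Optional


class HashedTTLCache:
    """Hypothetical in-memory cache keyed by a hash of the request."""

    def __init__(self, ttl: float = 300.0) -> None:
        self.ttl = ttl
        self._store: dict = {}  # digest -> (expires_at, response)

    @staticmethod
    def key_for(request: dict) -> str:
        # Sorting keys makes logically identical requests hash identically.
        canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get(self, request: dict, now: Optional[float] = None) -> Any:
        now = time.monotonic() if now is None else now
        key = self.key_for(request)
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, response = entry
        if now >= expires_at:
            del self._store[key]  # expired: drop it and report a miss
            return None
        return response

    def put(self, request: dict, response: Any, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self._store[self.key_for(request)] = (now + self.ttl, response)
```

Only the digest is used as the lookup key, so the prompt text itself never needs to be kept alongside the cached response.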
Steer from the chat box
Sometimes you just want this message on a particular model. Turn on slash directives and start a message with one — no settings, no headers, works in any chat box (and in Claude Code):
[gateway]
slash_directives = true
Now "/local explain this regex" forces the message to the local model, "/prefer-hosted draft the proposal" sends it to the top tier, and "/auto …" hands it back to the router. The directive is stripped before the model sees it, and only real directives are touched: a message that starts with a path or a /help is left exactly as you typed it.
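The "only real directives are touched" behaviour can be sketched as a small parser that recognises a known directive at the very start of a message and leaves everything else alone. This is an illustrative guess, not the gateway's parser: the `split_directive` function and the regular expression are hypothetical, and the directive set contains only the three names mentioned above.

```python
import re
from typing import Optional

# Directives named in this post; the real gateway may accept others.
KNOWN_DIRECTIVES = {"local", "prefer-hosted", "auto"}

# A slash, a lowercase word (hyphens allowed), then whitespace or end of text.
_DIRECTIVE = re.compile(r"^/([a-z][a-z-]*)(?:\s+|$)")


def split_directive(message: str) -> tuple[Optional[str], str]:
    """Return (directive, remaining_text).

    If the message does not start with a known directive, return
    (None, message) with the message unchanged.
    """
    match = _DIRECTIVE.match(message)
    if match is None or match.group(1) not in KNOWN_DIRECTIVES:
        return None, message
    return match.group(1), message[match.end():]
```

Requiring whitespace or end of text after the directive name is what keeps a path such as /local/bin from being mistaken for /local, and checking against a known set is what leaves /help untouched.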
Nothing changed about how it routes
All of this sits around the router, not inside it. Keys, limits, budgets, caching, and slash directives decide whether and how a request is delivered — they never change how Wayfinder picks a model, which is still computed instantly and offline with no model call. And virtual keys are separate from your real provider keys, which never leave your environment.
Upgrading
pip install -U "wayfinder-router[gateway]"
Everything here is additive and opt-in: with no keys configured the gateway behaves exactly as before, and the cache, rate limiter, and slash directives do nothing until you switch them on.
Full changelog: https://github.com/itsthelore/wayfinder-router/blob/main/CHANGELOG.md