Skip to content

Releases: itsthelore/wayfinder-router

Wayfinder Router 2026.6.9 - the gateway as a control plane

28 Jun 07:44
7c600ac

Choose a tag to compare

This is the big one: the gateway grows from a personal proxy into something a whole team can run. You can hand out API keys, see what each team spends — and how much routing saves them — and put guardrails on cost, speed, and which models they can reach. It adds rate limiting and a response cache for instant, free repeats, and lets anyone steer routing right from the chat box. Everything is opt-in, and the routing decision is still made instantly and offline, with no extra model call.

Hand out keys, scope teams

Mint a key with one command — the gateway only ever stores a hash of it, so your config never holds a usable secret:

wayfinder-router keys new --id team-a

Paste the printed block into your config and give each key whatever limits you like — a daily budget, a rate limit, and which models it's allowed to use:

[gateway.keys.team-a]
hash = ""            # from `keys new`
models = ["local"]    # this team can only use the local model
[gateway.keys.team-a.budget]
limit = 20.0          # $20/day

Once any key exists the gateway requires one (and stays wide open if you set none). Every request is tagged with its key, so you can see each team's usage — and, uniquely, how much routing saved them. If a team asks for a model it isn't allowed, the request quietly drops to the nearest model it can use instead of failing.

Put a ceiling on volume

Cap how many requests (or tokens) a minute the gateway will handle, so one runaway client or retry loop can't swamp your provider. Over the limit, callers get a polite "try again shortly":

[gateway.rate_limit]
rpm = 600

Set it gateway-wide, or per key. And every response now tells a client how much of its allowance is left and when it resets — so a well-behaved one can ease off before it ever gets turned away.

Skip the repeats

Turn on the cache and identical requests come back instantly — and free — instead of going out again. Great for tests, dev loops, and agent tools that re-ask the same thing. Off by default; everything stays in memory, and the prompt is only ever stored as a hash.

[gateway.cache]
enabled = true
ttl = 300    # seconds

A cached answer costs nothing, and Wayfinder keeps a running tally of what the cache saved you.

Steer from the chat box

Sometimes you just want this message on a particular model. Turn on slash directives and start a message with one — no settings, no headers, works in any chat box (and in Claude Code):

[gateway]
slash_directives = true

Now /local explain this regex forces it local, /prefer-hosted draft the proposal sends it to the top tier, and /auto … hands it back to the router. The directive is stripped before the model sees it, and only real directives are touched — a message that starts with a path or a /help is left exactly as you typed it.

Nothing changed about how it routes

All of this sits around the router, not inside it. Keys, limits, budgets, caching, and slash directives decide whether and how a request is delivered — they never change how Wayfinder picks a model, which is still computed instantly and offline with no model call. And virtual keys are separate from your real provider keys, which never leave your environment.

Upgrading

pip install -U "wayfinder-router[gateway]"

Everything here is additive and opt-in: with no keys configured the gateway behaves exactly as before, and the cache, rate limiter, and slash directives do nothing until you switch them on.

Full changelog: https://github.com/itsthelore/wayfinder-router/blob/main/CHANGELOG.md

Wayfinder Router 2026.6.7 - the gateway as a control plane

25 Jun 05:59

Choose a tag to compare

This release turns the gateway from a personal proxy into something a whole team can run. You can hand out API keys, see what each team spends — and how much routing saves them - and put guardrails on cost, speed, and which models they can reach. It also adds rate limiting and a response cache for instant, free repeats. Everything is opt-in, and the routing decision is still made instantly and offline, with no extra model call.

Hand out keys, scope teams

Mint a key with one command - the gateway only ever stores a hash of it, so your config never holds a usable secret:

wayfinder-router keys new --id team-a

Paste the printed block into your config and give each key whatever limits you like - a daily budget, a rate limit, and which models it's allowed to use:

[gateway.keys.team-a]
hash = ""            # from `keys new`
models = ["local"]    # this team can only use the local model
[gateway.keys.team-a.budget]
limit = 20.0          # $20/day

Once any key exists the gateway requires one (and stays wide open if you set none). Every request is tagged with its key, so you can see each team's usage - and, uniquely, how much routing saved them. If a team asks for a model it isn't allowed, the request quietly drops to the nearest model it can use instead of failing.

Put a ceiling on volume

Cap how many requests (or tokens) a minute the gateway will handle, so one runaway client or retry loop can't swamp your provider. Over the limit, callers get a polite "try again shortly":

[gateway.rate_limit]
rpm = 600

Set it gateway-wide, or per key.

Skip the repeats

Turn on the cache and identical requests come back instantly - and free - instead of going out again. Great for tests, dev loops, and agent tools that re-ask the same thing. Off by default; everything stays in memory, and the prompt is only ever stored as a hash.

[gateway.cache]
enabled = true
ttl = 300    # seconds

A cached answer costs nothing, and Wayfinder keeps a running tally of what the cache saved you.

Nothing changed about how it routes

All of this sits around the router, not inside it. Keys, limits, budgets, and caching decide whether and how a request is delivered — they never change how Wayfinder picks a model, which is still computed instantly and offline with no model call. And virtual keys are separate from your real provider keys, which never leave your environment.

Upgrading

pip install -U "wayfinder-router[gateway]"

Everything here is additive and opt-in: with no keys configured the gateway behaves exactly as before, and the cache and rate limiter do nothing until you switch them on.

Full changelog: https://github.com/itsthelore/wayfinder-router/blob/main/CHANGELOG.md

Wayfinder Router 2026.6.6 - Claude Code, and a budget that bends

24 Jun 05:44

Choose a tag to compare

Two ways the gateway goes further. It now speaks Anthropic's Messages API, so Claude Code - and anything else pointed at ANTHROPIC_BASE_URL - routes through Wayfinder with a single environment variable. And it grows a spend budget that, when you hit your cap, degrades to the cheaper tier instead of cutting you off. Both are pure additions around the same offline scorer: the routing decision is unchanged, still computed with no model call.

Claude Code, through Wayfinder

Claude Code speaks Anthropic's Messages API, not OpenAI Chat Completions, so a base_url swap never worked - until now. A new POST /v1/messages adapter translates Messages ⇄ Chat Completions in both directions (buffered and streaming, tool use included) and hands the request to the same router everything else uses. Point Claude Code at the gateway:

pip install -U "wayfinder-router[gateway]"
wayfinder-router serve

export ANTHROPIC_BASE_URL="http://localhost:8088"   # the client appends /v1/messages
export ANTHROPIC_API_KEY="unused"                    # the gateway uses each upstream's own key
claude

Wayfinder scores each turn and routes it to the configured tier; the Claude model id rides along but the decision is Wayfinder's (send a configured endpoint name to pin one call). Every x-wayfinder-router-* header you get on the OpenAI endpoint - model, score, mode, served-by, budget - rides along here too, because it is the same decision. Streaming and tool calls are translated end to end; image/vision blocks and extended thinking are not translated yet. Full recipe in docs/integrations.md.

A budget that bends instead of breaks

A spend cap that is true to a router. Set a limit, and when the period's realized cost reaches it Wayfinder doesn't fail your requests - it degrades them to the cheapest tier (the same degrade primitive failover uses, which never raises cost). Want a hard stop instead? on_breach = "block" returns an HTTP 402.

[routing]
threshold = 0.55          # below → local, at/above → cloud

[gateway.budget]
limit = 5.0               # in your price table's unit (USD when cost_per_1k is set)
window = "day"            # day | month | all
on_breach = "degrade"     # degrade to the cheapest tier (default) | block (HTTP 402)

[gateway.models.local]
base_url = "http://localhost:11434/v1"
model = "llama3.1:8b"
cost_per_1k = 0.0

[gateway.models.cloud]
base_url = "https://api.openai.com/v1"
model = "gpt-4o"
api_key_env = "OPENAI_API_KEY"
cost_per_1k = 0.0125

With that config, once the day's spend hits $5 a prompt that would route to cloud is served by local instead - and the response says so: x-wayfinder-router-budget: degraded, mode: budget-degraded, with the true complexity score still in x-wayfinder-router-score. A breach is never silent. Budgets enforce only when real cost_per_1k prices are configured; a relative-units demo has nothing to cap.

The boundary still holds

Both features live in the invocation layer. The adapter scores nothing; the budget recomputes nothing - the complexity score and tier choice are the same deterministic, offline decision as always. A budget changes which tier a request is delivered to; the adapter changes only the request's shape. No model call enters the decision path.

Upgrading

pip install -U "wayfinder-router[gateway]"

Fully additive: existing configs, clients, and the JSON/header contract are unchanged. [gateway.budget] is opt-in (no cap unless you set one), and /v1/messages is a new endpoint that leaves the OpenAI path untouched.

Full changelog: https://github.com/itsthelore/wayfinder-router/blob/main/CHANGELOG.md

Wayfinder Router 2026.6.3 - the "terminal-first" release

22 Jun 18:09

Choose a tag to compare

Everything since the last dated changelog lands together, and the headline is where Wayfinder now lives. Routing stopped being something you watch through response headers and became somewhere you can sit - a decision-first chat in your terminal, set up with one command, with keys that never leave your environment. The deterministic core is untouched; all of this is presentation and glue around the same offline scorer.

Routing you can sit inside

wayfinder-router chat is now a full-screen terminal app (built on Textual). A scrolling transcript headed by the wordmark, every prompt routed inline as ● LOCAL or ◆ CLOUD with the score, an expandable /why breakdown of the structural features behind the call, a /settings panel for thresholds and scope, and model replies streamed token-by-token. It talks to your models two ways: in-process via your [gateway.models], or over HTTP to a running gateway with --base-url. No model call is made to decide — the routing is still pure and offline.

pip install "wayfinder-router[tui]"
wayfinder-router chat

One command to set up

wayfinder-router init scaffolds a starter wayfinder-router.toml and a .env.example of variable names, from a preset. hybrid pairs a keyless local Ollama arm with an Anthropic cloud arm; openai is a two-tier gpt-4o-mini → gpt-4o. It then prints a per-model key check so you know exactly what's left to do.

pip install "wayfinder-router[gateway]"
wayfinder-router init
wayfinder-router init --preset openai

Pick your stack, step by step

For anything the presets don't cover, --interactive walks you through it: choose a provider for each tier (Ollama, OpenAI, Anthropic, or a custom OpenAI-compatible endpoint), name the arms, set the score cuts, mix as many tiers as you like. It writes explicit [[routing.tiers]] that load straight back through the parser — and still asks only for variable names, never a secret.

wayfinder-router init --interactive

Is it wired up?

wayfinder-router doctor checks the nearest config and whether each model's key resolves — ✓ set, ✗ not set, or keyless — with no server to start. It exits non-zero when a named key is missing, so it drops cleanly into a script or a pre-flight check.

wayfinder-router doctor

Keys stay in your environment

Wayfinder never stores a secret. A model names an env var via api_key_env; the key is read from your environment at request time and is never written to the config, the logs, or anywhere else. init and doctor only ever name the variables and tell you which to export — so there is nothing to install and nothing to leak.

Still in your browser

The browser demo lives on as wayfinder-router webchat — the gateway opened straight at /demo, with the live threshold slider and the same decision-first view. When no models are configured, both chat and webchat now point you at wayfinder-router init to get started.

wayfinder-router webchat --dry-run

Calendar versioning

Wayfinder moves to CalVer (YYYY.M.MICRO). This is the line the roadmap tracked as v0.3.0; from here, the version tells you when a release shipped. It's PEP 440-native, so the source attribute, the 2026.6.3 tag, and the published PyPI version all read the same.

Upgrading

pip install --upgrade wayfinder-router
uv tool upgrade wayfinder-router

Everything is additive for the core: the scorer, the JSON contract, the gateway, and exit codes work as before, and the package stays stdlib-only with no runtime dependencies. The terminal chat is an opt-in extra — [tui] now pulls in both rich and textual — imported lazily, so nothing changes for installs that don't use it.

Wayfinder Router v0.1.6 - Prove it: an honest, offline routing benchmark

19 Jun 05:56

Choose a tag to compare

v0.1.6 is a big one — it caps four releases of work since v0.1.2 and lands the piece the project most needed: proof.

Wayfinder claims to route with no model call, deterministically, offline. This release backs that claim with a reproducible benchmark - one that's deliberately honest about where structural routing loses. Along the way it also makes the chat-UI path actually work
(streaming), makes routing visible, and lets clients discover the routing modes.

What's new

An honest, offline benchmark (the headline).
A new benchmarks/ harness (make benchmark) scores routers against per-model correctness labels — no model is called, nothing hits the network, and it reproduces byte-for-byte. Metrics are the ones the field already uses (RouteLLM's PGR / call-fraction, RouterArena's cost / latency), not a flattering invention, and it reports the full cost-quality curve with a cost-aware knee.

The point is credibility, so the harness is built to not flatter us:

  • it ships honest baselines — always-local, always-cloud, a stable-random, a tuned length-threshold, and an oracle upper bound;
  • the shipped dataset includes Wayfinder's failure mode — short-but-hard prompts ("prove √2 is irrational") that score structurally invisible;
  • on that illustrative set, the length baseline slightly beats Wayfinder (PGR 0.67 vs 0.60). We published that on purpose.
  • routers that need a model call to decide (RouteLLM, NotDiamond, …) get a pluggable adapter and a comparison citing their own published numbers with provenance - never presented as ours.
make benchmark            # reproduce benchmarks/results.md, offline, no keys

The README gains a "How it compares" section with the precise positioning: the only offline, zero-model-call, calibrate-on-your-data, self-hosted structural router - and a link to the benchmark.

The chat path is real now: streaming + async (v0.1.4).
A request with stream: true is relayed back as Server-Sent-Events, so chat clients (LibreChat, Open WebUI, …) render tokens progressively. The gateway forwards asynchronously (httpx.AsyncClient), so concurrent requests no longer block one another. Upstream timeouts and connection failures now return an OpenAI-shaped wayfinder_router_upstream_error (a 502, or a terminal SSE event) instead of a bare 500.

Production hardening (v0.1.4).
A configurable upstream timeout (WAYFINDER_ROUTER_TIMEOUT / serve --timeout); an optional WAYFINDER_ROUTER_FEEDBACK_TOKEN that gates the /v1/feedback write behind a bearer token; an x-wayfinder-router-request-id on every response with routing decisions and config-reload failures logged; and GET /healthz reporting degraded + missing_keys. New serve --dry-run returns the routing decision with no backends configured - try the router in 30 seconds.

Routing made visible (v0.1.5).
A read-only dashboard answers "is it actually routing, and where?" without digging through headers: GET /router serves a tiny self-contained page (no CDN, no build) and GET /router/recent is the JSON behind it - recent decisions, a per-model count, and scores at a glance, metadata only, never prompt text. For clients that hide headers, opt-in X-Wayfinder-Debug: true surfaces the
decision in the response body; the default response stays byte-clean.

Model discovery + no-fork chat UIs (v0.1.3).
GET /v1/models advertises the selectable routing options — auto, prefer-local / prefer-hosted, and each configured endpoint - so any OpenAI-compatible client auto-populates its model dropdown as a routing-mode picker, no hand-written list. New examples/ recipes put a chat UI in front with no fork: a LibreChat custom-endpoint config + a Compose sidecar override, and
Open WebUI notes. The high-end directive is now prefer-hosted (with prefer-cloud kept as a silent back-compat alias).

The boundary still holds

None of this touches the deterministic core. The score is still computed offline with no model call, no key read, and no network; only the optional gateway/UI layers reach the network or keys, and only from the environment.

Install / upgrade

pip install -U "wayfinder-router[gateway]"

Fully backward-compatible with v0.1.2: existing clients and wayfinder-router.toml
configs are unaffected, and prefer-cloud keeps working.

Looking ahead → v0.2.0

This release finishes hardening the router so it's safe and pleasant to put behind any chat client. Next is wayfinder-chat — the turnkey path: a LibreChat fork with the gateway inside, where the per-request override and the visibility surface shipped here become a built-in per-conversation routing-mode picker and threshold slider. wayfinder-router stays the lean, deterministic, bring-your-own-client router; wayfinder-chat will be the one-thing-to-install experience for everyone else.

Full changelog: https://github.com/itsthelore/wayfinder-router/blob/main/CHANGELOG.md

Wayfinder Router v0.1.2 - Steer any single request

18 Jun 16:54

Choose a tag to compare

v0.1.2 makes the gateway's routing decision controllable per request. Until now the local-vs-cloud boundary was a property of your wayfinder-router.toml - the same for every call. Now any client can override it for one request, over plain OpenAI-compatible transport, with no application code change and no change to the deterministic core.

What's new

The model field is now a routing directive. It rides on every OpenAI-style request and was previously ignored:

  • auto (or any ordinary model id like gpt-4o) — Wayfinder scores and decides, exactly as before.
  • a configured endpoint name (local, cloud, …) — pin this call straight to that endpoint.
  • prefer-local / prefer-cloud — pin to the low / high end of your router without naming a concrete endpoint.

A new X-Wayfinder-Threshold header re-cuts a binary router for a single request — a number in 0.01.0, reusing your configured scoring weights:

client.chat.completions.create(model="cloud", messages=[...])              # pin one call to cloud
client.chat.completions.create(model="auto", messages=[...],
    extra_headers={"X-Wayfinder-Threshold": "0.8"})                        # move the cut for one call

Every response now reports how it was routed. Alongside the existing
x-wayfinder-router-model and x-wayfinder-router-score, responses carry a new
x-wayfinder-router-mode header: scored, pinned, or threshold-override.

The boundary still holds

An override only changes which endpoint a request is forwarded to — the prompt is still scored deterministically and offline, and no override adds a model call, key read, or provider logic to the core (WF-ADR-0001 / WF-ADR-0004). The threshold override is deliberately scoped to binary routers; against a classifier or multi-tier config it returns a clear 400 rather than guessing.

Fully backwards compatible

Existing clients are unaffected: an unrecognized model value falls through to scoring (so Wayfinder stays a drop-in proxy) and any unrelated header is ignored.

Install / upgrade

pip install -U "wayfinder-router[gateway]"

Looking ahead → v0.2.0

This release is the first, router-side piece of the v0.2.0 direction: wayfinder-chat, a turnkey chat experience (a LibreChat fork that depends on wayfinder-router) where local-vs-cloud routing and its controls are built into the UI. The per-request override shipped here is exactly what lets a chat client expose a routing-mode picker and a threshold slider per conversation - without compromising the router's transparent, bring-your-own-client identity. The router stays lean for power users; the chat app will serve everyone who just wants one thing to install and talk to.

Full changelog: https://github.com/itsthelore/wayfinder-router/blob/main/CHANGELOG.md

Wayfinder 0.1.1 - sharper console, one-step install

18 Jun 16:38

Choose a tag to compare

A polish release on top of v0.1.0: the operator console gets a branded redesign, the gateway and UI install in one step, and the docs make it clear where Wayfinder actually sits. No behavior, API, or config changes — drop-in over v0.1.0.

Installation

pip install -U "wayfinder-router[gateway]"   # route traffic through the gateway
pip install -U "wayfinder-router[all]"        # gateway + UI together (new)
pip install -U wayfinder-router               # just the scorer + CLI + Python API

What's new

A branded console
wayfinder-router ui is no longer a generic form — it's a deliberate, modern surface: a teal-on-cream/navy palette, automatic light and dark (it follows your OS), a wordmark + tagline header, carded sections, a custom threshold slider, a recommendation pill, and refined tables and contribution bars. Same four tabs (Explain / Calibrate / Configure / Onboard), same behavior — just slick now.

One-step install for the full local experience
pip install "wayfinder-router[all]" brings up the gateway and the UI together, instead of adding the extras separately. The deterministic core stays zero-dependency; all is just a convenience aggregate of the existing extras.

Docs that explain where Wayfinder sits
A new "Where Wayfinder sits" section (with a diagram) makes the model explicit: Wayfinder is transparent middleware behind any OpenAI-compatible client — Open WebUI, LibreChat, an IDE assistant, your own app — and "local" vs "hosted" are backends, not separate UIs. Install guidance now leads with [gateway], the extra you need to actually route traffic.

Upgrade

Fully backward-compatible with v0.1.0 — no changes to the CLI, the gateway endpoints, the JSON contract, or wayfinder-router.toml. Just pip install -U.

Looking ahead: v0.2.0 and wayfinder-chat

Today Wayfinder is something you put behind your own client. Next, we want a version you can just open and chat with — routing built in, nothing to wire up.

wayfinder-chat will be a companion app: a fork of LibreChat (MIT) with the Wayfinder gateway inside, so a casual user installs one thing and gets —

  • a real chat window, not a config form;
  • routing controls in the settings — set the local-vs-cloud threshold and watch it take effect;
  • a clear read on which model answered each message, and why.

It's a deliberate split, not a pivot: wayfinder-router stays the lean, deterministic, bring-your-own-client router for power users and production gateways, while wayfinder-chat is the turnkey, batteries-included path for everyone else — two products, used independently.

The direction is recorded in WF-ADR-0010, and it's still exploratory - no firm timeline yet - but that's where we're headed.

Wayfinder v0.1.0 - the "first cut" release

18 Jun 06:07

Choose a tag to compare

The first public release. Wayfinder is a deterministic prompt-complexity router:
hand it a prompt, get back a reproducible structural score and a recommendation —
route this one to your local model, or to the cloud. No model call to make
the decision, no API key, no network in the scored path. Same prompt, same
threshold, same answer, every time.

Installation

pip install wayfinder-router              # the `wayfinder-router` CLI + Python API
pip install "wayfinder-router[gateway]"   # plus the OpenAI-compatible routing gateway
pip install "wayfinder-router[ui]"        # plus the local calibration/explain UI

Its own product

Wayfinder was prototyped inside rac-core
and split out because routing is a runtime inference concern, not recorded
knowledge — a prompt router shouldn't make you install a requirements engine. It
has zero dependency on RAC: no rac import, no .rac/ reads, a stdlib-only
core, Python 3.11+. Apache-2.0.

Key Features

Structure, Not a Second Opinion
The score comes from the shape of a prompt — word count, headings and their
depth, list items, links, code blocks, table rows — combined into a bounded
0.0–1.0 value. The obvious alternative, asking a model how hard a prompt is, is
non-deterministic and costs a model call to decide whether to make a model call.
Wayfinder takes the opposite stance.

echo "Refactor the auth module to support OAuth2 and SAML." | wayfinder-router route -

Drop It In Front of Your Models
wayfinder-router serve runs an OpenAI-compatible gateway: point any client's
base_url at it and easy prompts go local, hard ones go cloud — no application
code change. Each response carries x-wayfinder-router-model and
x-wayfinder-router-score so you can see the routing. Keys are read from the
environment at request time, never stored.

Tune It to Your Traffic
The cut is a proxy, so calibrate it. wayfinder-router calibrate turns a labeled
JSONL dataset into a config — a binary threshold, ordered tiers across any number
of models, or a fitted classifier (deterministic Newton/IRLS, pure Python).

Learn the Cut From Feedback
wayfinder-router onboard A/B-tests a local vs hosted model and records your
good-enough judgment; the gateway's /v1/feedback endpoint captures the rest.
The label log is the calibrate dataset, and wayfinder-router recalibrate
(cron or a click in the UI) re-fits it live — a running gateway hot-reloads with
no restart.

See Why It Routed
wayfinder-router ui is a local web app — per-feature contribution bars, a live
threshold slider, a calibration view, and a config editor with real validation.
A thin consumer of the same pure core; no secret ever appears in it.

Ship It as a Sidecar
A slim Dockerfile runs the gateway as a service, with a docker-compose example
that persists config and the feedback log. The library also runs in-process; the
CLI and UI are the operator surfaces.

The boundary

Wayfinder scores and recommends — it never invokes a model, picks a provider,
reads a credential, or tokenizes per a vendor model. The caller runs inference.
This is an early (Alpha) first release; the deterministic core is stable and
fully tested, and the gateway/UI layers are the place that growth will land.