feat: add per-alias enforce_limits toggle for pre-dispatch context check #167
Merged
mcowger merged 3 commits into mcowger:main · Apr 16, 2026
Conversation
Owner
Please rebase this on main. Also, I've changed how I'm doing migrations: you'll no longer commit the journals, SQL files, etc. Please see CONTRIBUTING.md. Also, please run a
Adds an opt-in boolean on model aliases that runs a fast token-estimation check before dispatching to the upstream provider. When enabled, the dispatcher rejects locally with a 400 context_length_exceeded error if the estimated input tokens plus reserved output tokens exceed the model's context window, avoiding a wasted upstream round-trip and an opaque provider-side 400.

Behavior:
- Toggle lives on ModelConfig alongside use_image_fallthrough (not in metadata.overrides): it is routing policy, not catalog data.
- Reservation uses min(request.max_tokens, metadata.max_completion_tokens) to minimize false rejections when the caller asked for a small completion.
- Fails open (with a debug log) when no context_length is known; we can't enforce what we don't know.
- Reuses the existing estimateInputTokens() heuristic (microseconds, no WASM) with a 10% safety multiplier to cover the estimator's ±20–30% variance.
- The context_length_exceeded code propagates through each endpoint's native error envelope (chat/messages/responses/gemini).

Includes 12 new tests covering toggle on/off, oversized/under-limit, missing metadata (fail-open), max_tokens-vs-metadata precedence, and all four API shapes (chat, messages, gemini, responses).

https://claude.ai/code/session_0118yZYx8rXc4oV2SFpAiBcF
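The check described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the type shapes, the 4-chars-per-token stand-in for estimateInputTokens(), and the function signature are all assumptions; only the min() reservation, the 10% safety multiplier, and the fail-open rule come from the description.

```typescript
// Hypothetical shapes mirroring the PR description, not the project's real types.
interface ModelMetadata {
  context_length?: number;        // total context window, if known
  max_completion_tokens?: number; // model's maximum output size
}

// Crude stand-in for the project's estimateInputTokens() heuristic:
// roughly 4 characters per token.
function estimateInputTokens(messages: { content: string }[]): number {
  const chars = messages.reduce((n, m) => n + m.content.length, 0);
  return Math.ceil(chars / 4);
}

const SAFETY_MULTIPLIER = 1.1; // cover the estimator's ±20–30% variance

// Returns the error code when the request would overflow the context
// window, or null when it fits (or when we can't know: fail open).
function checkContextLimit(
  messages: { content: string }[],
  requestedMaxTokens: number | undefined,
  meta: ModelMetadata,
): string | null {
  // Fail open: can't enforce what we don't know.
  if (meta.context_length === undefined) return null;

  // Reserve the *smaller* of the caller's max_tokens and the model's max
  // completion, minimizing false rejections for small completions.
  const reserved = Math.min(
    requestedMaxTokens ?? Infinity,
    meta.max_completion_tokens ?? Infinity,
  );
  const estimated = Math.ceil(
    estimateInputTokens(messages) * SAFETY_MULTIPLIER,
  );

  const total = estimated + (Number.isFinite(reserved) ? reserved : 0);
  return total > meta.context_length ? "context_length_exceeded" : null;
}
```

A caller that hits the limit would then be rejected locally with a 400 carrying this code, rather than waiting on an upstream round-trip.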
- Move enforceContextLimit into the dispatcher's per-target loop, right
after vision fallthrough completes and cooldown selects a live target.
This validates the finalized (possibly fallthrough-expanded) prompt
against the context window instead of the raw request. A thrown
ContextLengthExceededError still escapes the loop since it's a
client-side problem that failover can't resolve.
- Stop passing the whole UnifiedChatRequest to estimateInputTokens when
originalBody is absent — unified fields like tools/metadata/model
would inflate the estimate. Defensive fallback is now a minimal
{ messages } body.
- Clarify the Models page copy: the reservation is the *smaller* of
max_tokens and the model's max completion, not an either/or.
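The two dispatcher changes above can be sketched roughly like this. Everything here is illustrative: the loop shape, the stub enforceContextLimit/sendUpstream helpers, and the type fields are assumptions; what the sketch preserves from the PR is the minimal { messages } fallback for estimation and the rule that a ContextLengthExceededError escapes the failover loop.

```typescript
// Illustrative error type matching the PR's description.
class ContextLengthExceededError extends Error {
  readonly status = 400;
  readonly code = "context_length_exceeded";
}

// Hypothetical shape of the unified request; not the project's real type.
interface UnifiedChatRequest {
  model: string;
  messages: { role: string; content: string }[];
  tools?: unknown[];      // unified fields that would inflate an estimate
  metadata?: unknown;
  originalBody?: unknown; // raw client body, when available
}

// When originalBody is absent, estimate against a minimal { messages }
// body only: tools/metadata/model would inflate the token estimate.
function bodyForEstimation(req: UnifiedChatRequest): unknown {
  return req.originalBody ?? { messages: req.messages };
}

// Stubs standing in for the real implementations.
function enforceContextLimit(body: unknown): void { /* may throw ContextLengthExceededError */ }
async function sendUpstream(target: string, req: UnifiedChatRequest): Promise<string> {
  return `ok:${target}`;
}

// Hypothetical per-target dispatch loop: the context check runs after the
// prompt is finalized (vision fallthrough done, live target selected), so
// it validates what will actually be sent upstream.
async function dispatch(req: UnifiedChatRequest, targets: string[]): Promise<string> {
  let lastError: unknown;
  for (const target of targets) {
    try {
      enforceContextLimit(bodyForEstimation(req));
      return await sendUpstream(target, req);
    } catch (err) {
      // A client-side overflow can't be fixed by failover: rethrow immediately.
      if (err instanceof ContextLengthExceededError) throw err;
      lastError = err; // provider failure: try the next target
    }
  }
  throw lastError;
}
```

The design point is the asymmetry in the catch: provider errors advance the loop, while a context overflow is the client's problem and short-circuits all remaining targets.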
github-actions bot pushed a commit that referenced this pull request on Apr 17, 2026:
…gle-O1uyF feat: add per-alias enforce_limits toggle for pre-dispatch context check