fix(pricing): prevent cross-family fuzzy match + warn on fallback#143
Merged
fix(pricing): prevent cross-family fuzzy match + warn on fallback#143
Conversation
When get_pricing() resolves a model name via the longest-prefix or suffix-strip fallback paths, it silently returned a sibling model's ModelPricing — including its context_window — with no log line. Names like "claude-opus-4-1m-internal" inherited claude-opus-4's 200K window even though the suffix suggests 1M, and the dashboard / cost calc treated those numbers as authoritative. This change emits a one-time logging.warning per requested model name when get_pricing() returns a non-exact entry, naming both the requested model and the matched key. Exact matches, overrides, and unknown models (None) do not warn. De-duped via a module-level set so hot-loop callers don't spam logs. Behavior is otherwise unchanged — this is the smallest viable change suggested in #137 and is fully backward-compatible. Closes #137 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ix-strip code Tightens the longest-prefix fallback in get_pricing() to require a '-' delimiter after the matched key (`model.startswith(known_model + "-")`). Without the delimiter, names that share a textual prefix with a known key but belong to a different model family — e.g. claude-opus-4.7-high matching claude-opus-4 — silently inherited the wrong context_window and pricing. The four repro names from #137 now correctly return None and degrade gracefully (dashboard hides the bar; cost is null) rather than reporting confidently wrong data. Real versioned names like claude-sonnet-4-20250514 still match claude-sonnet-4 because the date suffix is preceded by '-'. Also removes the suffix-strip and suffix-strip+longest-prefix branches: they were unreachable because longest-prefix runs first and catches every name they would have simplified. Updates the strategy label in fuzzy-match warnings from "longest-prefix" to "versioned-suffix" to reflect the new, narrower matching semantics. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
fd6cd61 to
c4d7b52
Compare
jrob5756
added a commit
that referenced
this pull request
May 4, 2026
…ion (#129) * fix(copilot): pass streaming=True to SDK to prevent tool-call truncation The Copilot SDK's create_session accepts a 'streaming' parameter that defaults to false. In non-streaming mode the model must emit its entire turn (text + tool_use blocks + arguments) under a single per-turn output budget. For agents that issue large tool-call arguments — most commonly 'create' with multi-KB 'file_text' — that budget is exhausted mid-JSON and the CLI silently executes the partial tool call (path only, no file_text). The model sees the tool succeed with no content, retries the same broken call, and loops indefinitely until the wall-clock session limit fires (default 1800s). The interactive 'copilot' CLI defaults to streaming, which is why the same model + tool combination works there but not via the SDK without this flag. Empirically verified red→green on the same workflow + model (claude-opus-4.7-1m-internal, single ~50 KB create tool call): - Without streaming=True: 9m08s wall-clock failure, 0 bytes written (ProviderError: tool 'create' was executing). - With streaming=True: 4m57s success, 62 KB written in a single create call. Tests: - tests/test_providers/test_copilot_streaming.py — unit test that verifies create_session is called with streaming=True (and that the existing required kwargs are preserved). - tests/test_integration/test_copilot_large_write.py — opt-in (real_api marker) regression test that builds a workflow inline, asks the writer agent to produce a single large create call, and asserts the file is at least 30 KB. Skips automatically when no copilot CLI is available. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * docs: add changelog entry for streaming fix (#129) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * docs: add #107 and #109 to unreleased changelog Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * docs: add #100, #110, #111, #139, #142, #143, #144 to unreleased changelog Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
3 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes #137. Two related changes to
get_pricing():1. Boundary check on prefix matching
The longest-prefix fallback now requires a
-delimiter after the matched key (model.startswith(known_model + "-")). Without it, names that share a textual prefix with a known key but belong to a different model family silently inherited the wrongcontext_windowand pricing. The four repro names from #137 now correctly returnNoneand degrade gracefully (dashboard hides the bar; cost isnull) rather than reporting confidently wrong data:claude-opus-4.7claude-opus-4(200K) ❌None✓claude-opus-4.7-highclaude-opus-4(200K) ❌None✓claude-opus-4.7-xhighclaude-opus-4(200K) ❌None✓claude-opus-4.7-1m-internalclaude-opus-4(200K) ❌None✓claude-sonnet-4-20250514claude-sonnet-4✓claude-sonnet-4✓claude-3-5-sonnet-latestclaude-3-5-sonnet✓claude-3-5-sonnet✓2. One-time warning on non-exact match
When a versioned-suffix match still happens (legitimate dated/
-latestnames),get_pricing()now emits a one-timelogging.warningper requested name, naming the requested model, the matched key, and the strategy. Exact matches, overrides, and unknown models (returningNone) do not warn. De-duped via a module-levelsetso hot-loop callers don't spam logs.3. Dead code removal
The suffix-strip branches in
get_pricing()were unreachable — longest-prefix runs first and catches anything they would have simplified. Removed for clarity.Behavior change note
This is technically a behavior change: any caller relying on the loose prefix matching (e.g. a hypothetical
gpt-5.4-minisilently inheritinggpt-5pricing) will now getNoneinstead. That's the asymmetric "unknown name degrades gracefully" path the issue reporter explicitly prefers — and matches the spirit of one of the alternatives proposed in #137 ("drop the longest-prefix step"; this is a softer version that preserves the dated-version use case).All existing tests pass unchanged, including
tests/test_providers/test_context_window.py::TestPrefixMatch.Changes
src/conductor/engine/pricing.pylogger,_FUZZY_MATCH_WARNED: set[str]for de-dupe._warn_fuzzy_match()helper.+ "-"delimiter check.tests/test_engine/test_pricing.py— newTestFuzzyMatchWarningsclass (7 tests):None, no warnTesting