feat: add support for tiered model pricing #67605
Conversation
🔒 Aisle Security Analysis

We found 2 potential security issues in this PR:
1. 🟡 Unbounded response buffering in pricing catalog fetch (memory/availability DoS)
Description
Vulnerable code:

```ts
const buffer = await response.arrayBuffer();
if (buffer.byteLength > MAX_PRICING_CATALOG_BYTES) {
  throw new Error(`${source} pricing response too large: ${buffer.byteLength} bytes`);
}
```

Recommendation

Avoid buffering the entire response before checking its size; enforce the limit while streaming instead. Example (Node/undici fetch):

```ts
async function readJsonObjectWithLimit(response: Response, source: string) {
  const reader = response.body?.getReader();
  if (!reader) throw new Error(`${source} response has no body`);
  const chunks: Uint8Array[] = [];
  let total = 0;
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    if (!value) continue;
    total += value.byteLength;
    if (total > MAX_PRICING_CATALOG_BYTES) {
      try { await reader.cancel(); } catch {}
      throw new Error(`${source} pricing response too large: >${MAX_PRICING_CATALOG_BYTES} bytes`);
    }
    chunks.push(value);
  }
  const buffer = Buffer.concat(chunks, total);
  const payload = JSON.parse(buffer.toString("utf8"));
  if (!payload || typeof payload !== "object" || Array.isArray(payload)) {
    throw new Error(`${source} pricing response is not a JSON object`);
  }
  return payload as Record<string, unknown>;
}
```

Additionally, consider limiting decompressed size (the streaming check covers this) and setting reasonable fetch/agent limits (timeouts, max response size) to reduce DoS risk.

2. 🟡 Unbounded concurrency in channels.status can exhaust resources (DoS)
Description
Because the non-probe path sets `limit` to the full input length, concurrency grows unbounded with the number of accounts or plugins. Vulnerable code:

```ts
const { results } = await runTasksWithConcurrency({
  tasks: accountIds.map(...),
  limit: probe ? CHANNEL_STATUS_PROBE_CONCURRENCY : accountIds.length || 1,
});
...
const { results: channelResults } = await runTasksWithConcurrency({
  tasks: plugins.map(...),
  limit: probe ? CHANNEL_STATUS_PROBE_CONCURRENCY : plugins.length || 1,
});
```

Recommendation

Always cap concurrency regardless of input size. For example:

```ts
const MAX_STATUS_CONCURRENCY = 10; // tune per deployment
const accountLimit = probe ? CHANNEL_STATUS_PROBE_CONCURRENCY : MAX_STATUS_CONCURRENCY;
const pluginLimit = probe ? CHANNEL_STATUS_PROBE_CONCURRENCY : MAX_STATUS_CONCURRENCY;
await runTasksWithConcurrency({ tasks, limit: accountLimit });
await runTasksWithConcurrency({ tasks: pluginTasks, limit: pluginLimit });
```

Additionally consider:
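For context, a bounded task runner of the kind the snippet above assumes can be sketched as follows. This is a hypothetical `runBounded`; the repo's actual `runTasksWithConcurrency` and its signature may differ.

```typescript
// Minimal bounded task runner: each task is a thunk, at most `limit`
// run at once, and results preserve input order. Rejections are
// captured per-task rather than aborting the whole batch.
async function runBounded<T>(
  tasks: Array<() => Promise<T>>,
  limit: number,
): Promise<PromiseSettledResult<T>[]> {
  const results: PromiseSettledResult<T>[] = new Array(tasks.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++; // single-threaded JS: safe index claim
      try {
        results[index] = { status: "fulfilled", value: await tasks[index]() };
      } catch (reason) {
        results[index] = { status: "rejected", reason };
      }
    }
  }
  const workerCount = Math.max(1, Math.min(limit, tasks.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

With a helper like this, callers can pass a hard cap such as `MAX_STATUS_CONCURRENCY` and never fall back to an input-sized limit.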
Analyzed PR: #67605 · Last updated on: 2026-04-21T01:59:59Z
Evidence — Manual Test Transcript

Config used:

```json
"cost": {
  "tieredPricing": [
    { "input": 1, "output": 1, "cacheRead": 0, "cacheWrite": 0, "range": [0, 13000] },
    { "input": 10000, "output": 10000, "cacheRead": 0, "cacheWrite": 0, "range": [13000, 128000] },
    { "input": 10000, "output": 10000, "cacheRead": 0, "cacheWrite": 0, "range": [128000] }
  ]
}
```

Test session (GLM-5V-Turbo via OpenRouter, context window 128k):

Key observations:
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 25d509ab83
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Greptile Summary

This PR fixes three integration gaps that blocked end-to-end tiered pricing: the Zod … The core …

Confidence Score: 5/5

Safe to merge — all remaining findings are P2 style/type improvements; no logic bugs found. The bug fixes are correct (Zod schema, resolveModelCost passthrough, scanTranscriptFile recompute). computeTieredCost math is verified against test comments. The LiteLLM fetch has a proper timeout and graceful fallback. The two P2 comments (duplicate branches, range type width) don't block correctness. No files require special attention.

Prompt To Fix All With AI

This is a comment left during a code review.
Path: src/infra/session-cost-usage.ts
Line: 258-264
Comment:
**Redundant duplicate branches**
Both the `if` and `else if` bodies call `estimateUsageCost` with identical arguments; the only difference is the condition under which the call happens. These can be collapsed into a single branch.
```suggestion
if ((cost?.tieredPricing && cost.tieredPricing.length > 0) || entry.costTotal === undefined) {
// When tiered pricing is configured, always recompute to override
// the flat-rate cost that the transport layer wrote into the transcript.
// Otherwise, only fill in missing cost estimates.
entry.costTotal = estimateUsageCost({ usage: entry.usage, cost });
}
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: src/config/types.models.ts
Line: 66-72
Comment:
**`range: [number, number]` doesn't match the documented one-element shorthand**
The PR description documents `[start]` (one-element array) as the supported shorthand for an unbounded top tier. The Zod schema correctly accepts it via `z.union([z.tuple([z.number(), z.number()]), z.tuple([z.number()])])`, and `normalizeTieredPricing` normalizes it to `[start, Infinity]` at load time. However, this TypeScript type still says `[number, number]`, so any TypeScript caller (e.g. an extension author) who writes `range: [128_000]` will get a compile error and have no indication that the JSON shorthand exists.
Widening the type to match the Zod schema input makes the contract self-documenting:
```suggestion
tieredPricing?: Array<{
input: number;
output: number;
cacheRead: number;
cacheWrite: number;
/** Bounded tier: `[start, end)`. Open-ended top tier: `[start]` (normalized to `[start, Infinity]` at load time). */
range: [number, number] | [number];
}>;
```
How can I resolve this? If you propose a fix, please make it concise.

Reviews (1) — Last reviewed commit: "feat: add support for tiered model prici..."
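Beyond the type widening, the load-time normalization mentioned above can be sketched as follows. The shapes are assumed for illustration; the repo's actual `normalizeTieredPricing` may differ.

```typescript
type TierRange = [number, number] | [number];

interface PricingTier {
  input: number;
  output: number;
  cacheRead: number;
  cacheWrite: number;
  range: TierRange;
}

// Normalize the `[start]` shorthand to `[start, Infinity]` so that
// downstream cost math only ever sees bounded two-element ranges.
function normalizeTierRange(tier: PricingTier): PricingTier & { range: [number, number] } {
  const [start, end] = tier.range;
  return { ...tier, range: [start, end ?? Number.POSITIVE_INFINITY] };
}
```

With the widened `[number, number] | [number]` type, TypeScript callers can write `range: [128_000]` and this normalization makes it equivalent to `[128_000, Infinity]`.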
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 344e4eb584
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f32a6c53c5
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ab4b3724f9
```ts
fetchOpenRouterPricingCatalog(fetchImpl).catch((error: unknown) => {
  log.warn(`OpenRouter pricing fetch failed: ${String(error)}`);
  return new Map<string, OpenRouterPricingEntry>();
}),
```
Preserve existing entries when one pricing source fails
Catching a catalog fetch error and returning an empty map makes a transient source outage look like a successful empty response; if the other source still returns at least one model, the later full-cache replacement drops every model that depended on the failed source. In a mixed config (for example, OpenRouter-only models plus a LiteLLM-tiered model), a single OpenRouter failure will silently remove OpenRouter prices until the next successful refresh, regressing cost lookups despite having a previously healthy cache.
Fixed — The `.catch()` handlers now track which source failed (`openRouterFailed` / `litellmFailed`). On partial failure, models missing from `nextPricing` are back-filled from the existing cache, so a single-source outage no longer silently drops pricing for models that depended on the failed source. When both sources fail, the entire existing cache is retained as before.
```diff
@@ -36,6 +37,8 @@ type OpenRouterModelPayload = {
 export { getCachedGatewayModelPricing };

 const OPENROUTER_MODELS_URL = "https://openrouter.ai/api/v1/models";
 const LITELLM_PRICING_URL =
   "https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json";
```
/nit this is technically not an official API, and although it should be stable, it's a bit odd to me to depend on it at runtime. Something like pulling the JSON into the repo during a build step would be more reliable. Not consequential, though, as we handle the failures gracefully.
Acknowledged — agreed it's not an official API. We handle fetch failures gracefully (empty catalog → retain existing cache), so the risk is limited. We can consider vendoring the JSON at build time in a follow-up if this becomes fragile.
```diff
@@ -2767,6 +2767,51 @@ export const GENERATED_BASE_CONFIG_SCHEMA: BaseConfigSchemaResponse = {
       cacheWrite: {
         type: "number",
       },
       tieredPricing: {
```
Looks like `docs/.generated/config-baseline.sha256` should be updated and committed.
```ts
  continue;
}
const inputInTier = Math.min(inputRemaining, tierWidth);
const fraction = inputInTier / totalInputTokens;
```
Does the billing actually apply proportionally within the request? Or is it a fixed cost per request based on the token threshold?
It's proportional. The algorithm splits the input tokens across tier boundaries, then uses each tier's share as a fraction to apportion output/cache costs proportionally. For example, with 40k input tokens and tiers at [0, 32k) and [32k, 128k): 80% of input falls in tier 1 and 20% in tier 2, so output and cache costs are also split 80/20 across those tiers. This matches how providers like Anthropic and Google bill — the per-token rate depends on which tier the token falls into, not a single fixed rate for the whole request.
I know the code applies it proportionally, but is this actually aligned with how most/all providers do the billing? I thought not.
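For reference, the proportional apportionment being discussed above can be sketched like this. The tier shape and per-million-token rates are assumptions for illustration, and cache costs are omitted; the repo's actual `computeTieredCost` may differ in detail.

```typescript
interface PricedTier {
  input: number; // USD per million input tokens (assumed unit)
  output: number; // USD per million output tokens (assumed unit)
  range: [number, number]; // input-token interval [start, end)
}

// Split input tokens across tier boundaries, then use each tier's
// input share as the fraction that apportions the output cost.
function tieredCost(inputTokens: number, outputTokens: number, tiers: PricedTier[]): number {
  if (inputTokens <= 0) return 0;
  let cost = 0;
  for (const tier of tiers) {
    const [start, end] = tier.range;
    // Input tokens falling inside this tier's interval.
    const inTier = Math.min(Math.max(inputTokens - start, 0), end - start);
    if (inTier <= 0) continue;
    const fraction = inTier / inputTokens; // this tier's share of input
    cost += (inTier * tier.input + outputTokens * fraction * tier.output) / 1_000_000;
  }
  return cost;
}
```

With 40k input tokens and tiers `[0, 32k)` and `[32k, 128k)`, 80% of input lands in tier 1 and 20% in tier 2, so the output cost is also split 80/20 across the two tiers' rates.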
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 850a516792
```ts
// Only use provider-qualified key to avoid cross-provider pricing collisions.
return params.catalog.get(`${params.ref.provider}/${params.ref.model}`);
```
Resolve LiteLLM tiers for wrapped OpenRouter refs
collectConfiguredModelPricingRefs includes wrapper-style refs like openrouter/anthropic/claude-sonnet-4-6, but this lookup only queries LiteLLM with the literal provider/model key from that ref. That means it searches for openrouter/anthropic/... instead of the underlying provider key (anthropic/...), so litellmPricing is missed and the merge path never adds tieredPricing for wrapped OpenRouter models. In practice, tiered pricing silently does not apply for those models even when LiteLLM has tier data.
Fixed — resolveLiteLLMPricingForRef now unwraps wrapper-provider refs (e.g. openrouter/anthropic/claude-sonnet-4-6 → looks up anthropic/claude-sonnet-4-6 in the LiteLLM catalog), consistent with how OpenRouter catalog lookups already handle nested refs. This ensures tiered pricing is correctly resolved for wrapped models.
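The unwrapping described in this fix can be sketched as follows. `lookupWithUnwrap` is a hypothetical helper under assumed catalog shapes; the real `resolveLiteLLMPricingForRef` may differ.

```typescript
// Try the provider-qualified key first; if the "model" segment itself
// contains a provider prefix (a wrapper-style ref such as
// "openrouter/anthropic/claude-sonnet-4-6"), retry under the
// underlying provider key ("anthropic/claude-sonnet-4-6").
function lookupWithUnwrap<T>(
  catalog: Map<string, T>,
  provider: string,
  model: string,
): T | undefined {
  const direct = catalog.get(`${provider}/${model}`);
  if (direct !== undefined) return direct;
  if (model.includes("/")) return catalog.get(model);
  return undefined;
}
```

This keeps the collision-avoiding provider-qualified lookup as the primary path while still resolving tier data for wrapped refs.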
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 74dbb06fa2
```ts
for (const result of results) {
  if (result.status === "fulfilled") {
    resolvedAccounts[result.value.accountId] = result.value.account;
    accounts.push(result.value.snapshot);
  }
```
Preserve default account snapshot when account tasks reject
Filtering Promise.allSettled results down to only fulfilled entries drops failed accounts entirely, which creates a new path where the default account snapshot is missing even though other accounts succeeded. In that case defaultAccount falls back to another account (accounts[0]), so buildChannelSummary can combine the default account id/object with a non-default snapshot and report incorrect health/config state for the default channel account.
Fixed — Rejected account tasks now produce a minimal degraded snapshot (configured: false, error) instead of being silently dropped. This ensures accounts.find() always locates the default account entry, preventing the fallback to accounts[0] from mixing a non-default account's snapshot with the default account's identity.
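The degraded-snapshot approach can be sketched like this. The snapshot shape is assumed; the sketch relies on `Promise.allSettled` preserving input order so rejected entries can be matched back to their account ids.

```typescript
interface AccountSnapshot {
  accountId: string;
  configured: boolean;
  error?: string;
}

// Map settled results back to snapshots one-for-one: fulfilled tasks
// pass through, rejected tasks become a minimal degraded snapshot
// instead of disappearing from the result set.
function collectSnapshots(
  accountIds: string[],
  settled: PromiseSettledResult<AccountSnapshot>[],
): AccountSnapshot[] {
  return settled.map((result, i) =>
    result.status === "fulfilled"
      ? result.value
      : { accountId: accountIds[i], configured: false, error: String(result.reason) },
  );
}
```

Because every input id yields an output entry, `accounts.find()` can always locate the default account, and the `accounts[0]` fallback never mixes identities.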
```ts
if (result.status === "fulfilled") {
  channelsMap[result.value.pluginId] = result.value.summary;
  accountsMap[result.value.pluginId] = result.value.accounts;
  defaultAccountIdMap[result.value.pluginId] = result.value.defaultAccountId;
}
```
Surface rejected channel-status jobs instead of dropping channels
At the channel level, rejected tasks from Promise.allSettled are silently ignored and the handler still returns success, so a thrown buildChannelAccounts/buildChannelSummary causes that plugin to disappear from channels, channelAccounts, and channelDefaultAccountId. Because the CLI formatter only prints channels present in these payload maps, users get an incomplete status report with no error signal instead of seeing the failed channel as degraded.
Force-pushed 74dbb06 to 9374604.
Landed in b938e63.

Maintainer follow-up before merge:
- Replayed the PR on current main without the old merge-from-main noise.
- Added latest bundled Moonshot Kimi K2.6/K2.5 cost estimates.
- Hardened tiered range billing, oversized remote pricing catalogs, and channel status probe concurrency/timeout.
- Added changelog credit.

Verification:
- pnpm test src/utils/usage-format.test.ts src/gateway/model-pricing-cache.test.ts src/gateway/server-methods/channels.status.test.ts extensions/moonshot/provider-catalog.test.ts
- pnpm check:changed
- pnpm check
- GitHub checks green

Thanks @sliverp.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 937460463c
```ts
if (!nextPricing.has(key)) {
  const existing = getCachedGatewayModelPricing({
```
Preserve tiered cache data during LiteLLM fetch outages
When litellmFailed is true, the refresh only back-fills models that are completely missing from nextPricing. For models that still resolve via OpenRouter, the key is present, so previously cached tieredPricing is dropped and replaced with flat-only pricing for the outage window. This causes temporary cost regressions (tiered → flat) even though valid tier data is already in memory; the fallback should also preserve existing tier metadata for overlapping keys when the LiteLLM source failed.
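The fallback this finding asks for can be sketched as follows. The cache entry shape and the `applyOutageFallback` name are assumptions: when the LiteLLM source failed, cached tier metadata is preserved for overlapping keys and keys that vanished entirely are back-filled.

```typescript
interface CachedPricing {
  input: number;
  output: number;
  tieredPricing?: unknown[];
}

// On a failed LiteLLM refresh: keep fresh flat rates where available,
// but carry forward previously cached tier metadata for keys that
// still resolved via OpenRouter, and back-fill keys that disappeared.
function applyOutageFallback(
  next: Map<string, CachedPricing>,
  previous: Map<string, CachedPricing>,
  litellmFailed: boolean,
): Map<string, CachedPricing> {
  if (!litellmFailed) return next;
  const merged = new Map(next);
  for (const [key, old] of previous) {
    const fresh = merged.get(key);
    if (!fresh) {
      merged.set(key, old); // model dropped by the outage
    } else if (!fresh.tieredPricing && old.tieredPricing) {
      merged.set(key, { ...fresh, tieredPricing: old.tieredPricing });
    }
  }
  return merged;
}
```

This avoids the temporary tiered-to-flat regression for models that overlap both sources during the outage window.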
Adds tiered model pricing support for cost tracking, keeps configured pricing ahead of cached catalog values, and includes the latest Moonshot Kimi K2.6/K2.5 cost estimates.

Thanks @sliverp.
Summary
The `computeTieredCost` function and `ModelCostConfig` type existed internally but were never wired into the config-to-display pipeline. `calculateCost()` in `@mariozechner/pi-ai` is untouched. Tiered cost recomputation happens at read time in `scanTranscriptFile`.

Change Type (select all)
Regression Test Plan (if applicable)
Tests: `src/config/zod-schema.core.test.ts`, `src/utils/usage-format.test.ts`, `src/infra/session-cost-usage.test.ts`

Cases: (1) `cost` with a valid `tieredPricing` array, (2) `resolveModelCost` preserves `tieredPricing` in output, (3) `scanTranscriptFile` computes cost using tiered pricing when configured.

User-visible / Behavior Changes
- Users can configure `tieredPricing` inside `cost` in `openclaw.json` model definitions.
- Extensions can ship `tieredPricing` in their model catalog `cost` objects.
- The `/usage` page displays tiered-pricing-computed costs when `tieredPricing` is configured.
- `tieredPricing` range format: `[start, end]` for bounded tiers, `[start]` for the unbounded top tier.
- When `tieredPricing` is present, it takes priority over flat-rate `input`/`output` fields.

Diagram (if applicable)
Security Impact (required)
Repro + Verification
Environment
Steps
1. Add `tieredPricing` to a model's `cost` in `openclaw.json`
2. Open the `/usage` page

Expected
Human Verification (required)
- `range: [128000]` (single-element unbounded tier), `tieredPricing`-only cost (no top-level input/output fields)
- `/usage` page after rebuild (pending rebuild + restart)

OpenRouter / LiteLLM Merge Priority Strategy
Design Principle
The gateway pricing cache merges data from two sources with different strengths:
Merge Rules
Priority Order
Rationale
LiteLLM's tiered data adds information that OpenRouter doesn't provide, so it takes priority. But for flat pricing, OpenRouter tends to be more accurate, so when both only offer flat rates, OpenRouter is the authoritative source.
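A minimal sketch of these merge rules follows. The entry shape and the `mergePricing` name are assumptions for illustration; the actual cache merge may differ.

```typescript
interface PricingEntry {
  input: number;
  output: number;
  tieredPricing?: unknown[];
}

// LiteLLM wins when it contributes tier data OpenRouter lacks: graft
// its tiers onto OpenRouter's flat rates. When both sources only
// offer flat rates, OpenRouter is authoritative.
function mergePricing(
  openRouter?: PricingEntry,
  litellm?: PricingEntry,
): PricingEntry | undefined {
  if (!openRouter) return litellm;
  if (!litellm) return openRouter;
  if (litellm.tieredPricing && litellm.tieredPricing.length > 0) {
    return { ...openRouter, tieredPricing: litellm.tieredPricing };
  }
  return openRouter;
}
```

Under this rule a model present in both catalogs keeps OpenRouter's flat `input`/`output` rates while gaining LiteLLM's tier boundaries, and single-source models fall through unchanged.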