
feat: add support for tiered model pricing #67605

Merged
steipete merged 6 commits into main from feat/supprt-tiered-token-pricing
Apr 21, 2026

Conversation

@sliverp
Member

@sliverp sliverp commented Apr 16, 2026

Summary

  • Problem: There is no way to configure context-dependent token pricing (e.g. cheaper rates under 32K tokens, premium rates above 128K). The computeTieredCost function and ModelCostConfig type existed internally but were never wired into the config-to-display pipeline.
  • Why it matters: Many model providers charge different rates based on context length. Without tiered pricing support, cost tracking is inaccurate for these models.
  • What changed: Completed the tiered pricing pipeline across 4 files — config validation, cost resolution, cost type propagation, and usage display recomputation.
  • What did NOT change (scope boundary): Transport-layer calculateCost() in @mariozechner/pi-ai is untouched. Tiered cost recomputation happens at read time in scanTranscriptFile.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Regression Test Plan (if applicable)

  • Coverage level:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: src/config/zod-schema.core.test.ts, src/utils/usage-format.test.ts, src/infra/session-cost-usage.test.ts
  • Scenario the test should lock in: (1) Zod schema accepts cost with valid tieredPricing array, (2) resolveModelCost preserves tieredPricing in output, (3) scanTranscriptFile computes cost using tiered pricing when configured.
  • Why this is the smallest reliable guardrail: Unit tests on each changed function directly verify the feature without requiring a running gateway.

User-visible / Behavior Changes

  • Users can now configure tieredPricing inside cost in openclaw.json model definitions.
  • Extension authors can declare tieredPricing in their model catalog cost objects.
  • Web UI /usage page displays tiered-pricing-computed costs when tieredPricing is configured.
  • tieredPricing range format: [start, end) for bounded tiers (end-exclusive), [start] for the unbounded top tier.
  • When tieredPricing is present, it takes priority over flat-rate input/output fields.
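
For illustration, a hedged sketch of a `cost` object using these conventions (rates and tier boundaries here are placeholders, not real provider pricing):

```json
"cost": {
  "tieredPricing": [
    { "input": 0.5, "output": 1.5, "cacheRead": 0.1, "cacheWrite": 0.6, "range": [0, 32000] },
    { "input": 1.0, "output": 3.0, "cacheRead": 0.2, "cacheWrite": 1.2, "range": [32000] }
  ]
}
```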

Diagram (if applicable)

## Cost Resolution (resolveModelCostConfig — multi-layer lookup)

                         ┌─────────────────────────────┐
                         │   resolveModelCostConfig()   │
                         └──────────┬──────────────────┘
                                    │
              ┌─────────────────────┼─────────────────────────────┐
              ▼                     ▼                             ▼
    Layer 1: models.json    Layer 2: config.models       Layer 3: Gateway cache
    (auto-generated from    .providers[].models[]        (OpenRouter / LiteLLM
     openclaw.json)         (openclaw.json + extension    remote pricing)
                             catalogs)
              │                     │                             │
              │  ← first match wins, short-circuits →            │
              └─────────────────────┴─────────────────────────────┘
                                    │
                                    ▼
                    ┌───────────────────────────────┐
                    │  ModelCostConfig returned      │
                    │  (may include tieredPricing)   │
                    └───────────┬───────────────────┘
                                │
                                ▼
                    ┌───────────────────────────────┐
                    │    estimateUsageCost()         │
                    │                               │
                    │  tieredPricing exists?         │
                    │    YES → computeTieredCost()   │
                    │    NO  → flat-rate calculation │
                    └───────────┬───────────────────┘
                                │
                  ┌─────────────┴──────────────┐
                  ▼                            ▼
        In-chat display              Web UI /usage page
    [agent-runner-usage-line.ts]   [session-cost-usage.ts]
                                   (recomputes when tieredPricing
                                    is configured, overriding
                                    transport flat-rate value)


## Config Ingestion (how tieredPricing enters the system)

  openclaw.json                    Extension model catalog
  (user config)                    (e.g. extensions/deepseek/models.ts)
        │                                    │
        ▼                                    ▼
  Zod validates tieredPricing        ModelDefinitionConfig type
  [zod-schema.core.ts]              supports tieredPricing
        │                                    │
        ▼                                    ▼
  resolveModelCost()              catalog/onboard registration
  preserves tieredPricing          → config.models.providers
  [defaults.ts]                          │
        │                                │
        ├───── models.json (auto-sync) ──┘
        │
        ▼
  buildProviderCostIndex()
  normalizes tier ranges
  (e.g. [128000] → [128000, Infinity])
  [usage-format.ts]
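
The `[128000] → [128000, Infinity]` normalization above can be sketched as follows (a minimal sketch; `normalizeRange` is an illustrative name, not necessarily the helper used in `usage-format.ts`):

```typescript
// Bounded tiers arrive as [start, end]; the open-ended top tier as [start].
// Normalizing to [start, Infinity] lets downstream code treat all tiers uniformly.
function normalizeRange(range: [number, number] | [number]): [number, number] {
  return range.length === 2 ? range : [range[0], Number.POSITIVE_INFINITY];
}
```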

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No

Repro + Verification

Environment

  • OS: Linux
  • Runtime/container: Node.js / OpenClaw gateway
  • Model/provider: OpenRouter / z-ai/glm-5v-turbo
  • Relevant config (redacted):
```json
"cost": {
  "tieredPricing": [
    { "input": 1, "output": 1, "cacheRead": 0, "cacheWrite": 0, "range": [0, 13000] },
    { "input": 10000, "output": 10000, "cacheRead": 0, "cacheWrite": 0, "range": [13000, 128000] },
    { "input": 10000, "output": 10000, "cacheRead": 0, "cacheWrite": 0, "range": [128000] }
  ]
}
```

Steps

  1. Add tieredPricing to a model's cost in openclaw.json
  2. Build and restart gateway
  3. Send messages to accumulate tokens past the first tier boundary
  4. Check in-chat cost reporting and Web UI /usage page

Expected

  • Config loads successfully
  • Cost stays low while within tier 1 range
  • Cost jumps when crossing into tier 2 range
  • New session resets to tier 1 pricing

Human Verification (required)

  • Verified scenarios: Config accepts tieredPricing, in-chat cost uses tiered rates, cost jumps at tier boundary, new session resets to lower tier
  • Edge cases checked: range: [128000] (single-element unbounded tier), tieredPricing-only cost (no top-level input/output fields)
  • What you did not verify: Web UI /usage page after rebuild (pending rebuild + restart)

OpenRouter / LiteLLM Merge Priority Strategy

Design Principle

The gateway pricing cache merges data from two sources with different strengths:

  • OpenRouter — provides more accurate flat pricing (base input/output rates).
  • LiteLLM — provides richer tiered pricing data (context-window-dependent rates).

Merge Rules

| Scenario | Strategy |
| --- | --- |
| OpenRouter ✅ + LiteLLM ✅ (has tiers) | Use OpenRouter as base pricing, overlay LiteLLM tiered pricing |
| OpenRouter ✅ + LiteLLM ✅ (no tiers) | Prefer OpenRouter flat pricing |
| OpenRouter ✅ + LiteLLM ❌ | Use OpenRouter |
| OpenRouter ❌ + LiteLLM ✅ | Use LiteLLM |

Priority Order

  1. LiteLLM tiered pricing — highest value data, always preferred when available
  2. OpenRouter flat pricing — more accurate base rates, preferred over LiteLLM flat rates
  3. LiteLLM flat pricing — fallback when OpenRouter has no data

Rationale

LiteLLM's tiered data adds information that OpenRouter doesn't provide, so it takes priority. But for flat pricing, OpenRouter tends to be more accurate, so when both only offer flat rates, OpenRouter is the authoritative source.
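
The rules above amount to a small merge function; a hedged sketch (type and function names here are illustrative, not the gateway's actual identifiers):

```typescript
type CatalogEntry = {
  input: number;  // $/M input tokens
  output: number; // $/M output tokens
  tieredPricing?: Array<{ input: number; output: number; range: [number, number] }>;
};

// Merge one model's pricing from both catalogs per the priority order:
// LiteLLM tiers > OpenRouter flat > LiteLLM flat.
function mergeModelPricing(
  openRouter: CatalogEntry | undefined,
  litellm: CatalogEntry | undefined,
): CatalogEntry | undefined {
  if (openRouter && litellm?.tieredPricing?.length) {
    // OpenRouter base rates with the LiteLLM tier table overlaid.
    return { ...openRouter, tieredPricing: litellm.tieredPricing };
  }
  // Flat pricing: prefer OpenRouter, fall back to LiteLLM.
  return openRouter ?? litellm;
}
```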

@aisle-research-bot

aisle-research-bot Bot commented Apr 16, 2026

🔒 Aisle Security Analysis

We found 2 potential security issue(s) in this PR:

| # | Severity | Title |
| --- | --- | --- |
| 1 | 🟡 Medium | Unbounded response buffering in pricing catalog fetch (memory/availability DoS) |
| 2 | 🟡 Medium | Unbounded concurrency in channels.status can exhaust resources (DoS) |
1. 🟡 Unbounded response buffering in pricing catalog fetch (memory/availability DoS)
| Property | Value |
| --- | --- |
| Severity | Medium |
| CWE | CWE-400 |
| Location | src/gateway/model-pricing-cache.ts:127-139 |

Description

readPricingJsonObject() attempts to enforce a 5MB maximum pricing catalog size, but it uses await response.arrayBuffer() which buffers the entire HTTP response body in memory before enforcing the byte limit.

  • If the upstream server omits or lies about Content-Length, the pre-check can be bypassed.
  • A large response body (or decompression-expanded body) can be fully downloaded/allocated, potentially causing excessive memory usage or process termination (OOM) before the post-check runs.

Vulnerable code:

```ts
const buffer = await response.arrayBuffer();
if (buffer.byteLength > MAX_PRICING_CATALOG_BYTES) {
  throw new Error(`${source} pricing response too large: ${buffer.byteLength} bytes`);
}
```

Recommendation

Avoid arrayBuffer() / response.json() for untrusted or potentially large responses. Stream the response body and enforce a hard cap while reading, aborting once the limit is exceeded.

Example (Node/undici fetch):

```ts
async function readJsonObjectWithLimit(response: Response, source: string) {
  const reader = response.body?.getReader();
  if (!reader) throw new Error(`${source} response has no body`);

  const chunks: Uint8Array[] = [];
  let total = 0;

  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    if (!value) continue;

    total += value.byteLength;
    if (total > MAX_PRICING_CATALOG_BYTES) {
      try { await reader.cancel(); } catch {}
      throw new Error(`${source} pricing response too large: >${MAX_PRICING_CATALOG_BYTES} bytes`);
    }
    chunks.push(value);
  }

  const buffer = Buffer.concat(chunks, total);
  const payload = JSON.parse(buffer.toString("utf8"));
  if (!payload || typeof payload !== "object" || Array.isArray(payload)) {
    throw new Error(`${source} pricing response is not a JSON object`);
  }
  return payload as Record<string, unknown>;
}
```

Additionally consider limiting decompressed size (streaming check covers this) and setting reasonable fetch/agent limits (timeouts, max response size) to reduce DoS risk.

2. 🟡 Unbounded concurrency in channels.status can exhaust resources (DoS)
| Property | Value |
| --- | --- |
| Severity | Medium |
| CWE | CWE-400 |
| Location | src/gateway/server-methods/channels.ts:270-327 |

Description

channels.status changed from sequential processing to parallel execution using runTasksWithConcurrency. For non-probe requests (probe === false), the concurrency limit is set to the full size of the task list:

  • limit: accountIds.length || 1 for per-account snapshot building
  • limit: plugins.length || 1 for per-plugin summary building

Because runTasksWithConcurrency spawns resolvedLimit worker promises (up to tasks.length), a large number of configured accounts/plugins can cause a burst of simultaneous async work (plugin hooks, I/O, CPU) and memory usage. An authenticated caller with READ scope can repeatedly invoke channels.status to overload the gateway.

Vulnerable code:

```ts
const { results } = await runTasksWithConcurrency({
  tasks: accountIds.map(...),
  limit: probe ? CHANNEL_STATUS_PROBE_CONCURRENCY : accountIds.length || 1,
});
// ...
const { results: channelResults } = await runTasksWithConcurrency({
  tasks: plugins.map(...),
  limit: probe ? CHANNEL_STATUS_PROBE_CONCURRENCY : plugins.length || 1,
});
```

Recommendation

Always cap concurrency regardless of probe to prevent resource exhaustion.

For example:

```ts
const MAX_STATUS_CONCURRENCY = 10; // tune per deployment

const accountLimit = probe ? CHANNEL_STATUS_PROBE_CONCURRENCY : MAX_STATUS_CONCURRENCY;
const pluginLimit = probe ? CHANNEL_STATUS_PROBE_CONCURRENCY : MAX_STATUS_CONCURRENCY;

await runTasksWithConcurrency({ tasks, limit: accountLimit });
await runTasksWithConcurrency({ tasks: pluginTasks, limit: pluginLimit });
```

Additionally consider:

  • applying server-side rate limiting for channels.status
  • enforcing maximum total accounts processed per request (or pagination)

Analyzed PR: #67605 at commit 9374604

Last updated on: 2026-04-21T01:59:59Z

@openclaw-barnacle openclaw-barnacle Bot added the `gateway` (Gateway runtime), `size: L`, and `maintainer` (Maintainer-authored PR) labels Apr 16, 2026
@sliverp
Member Author

sliverp commented Apr 16, 2026

Evidence — Manual Test Transcript

Config used:

```json
"cost": {
  "tieredPricing": [
    { "input": 1, "output": 1, "cacheRead": 0, "cacheWrite": 0, "range": [0, 13000] },
    { "input": 10000, "output": 10000, "cacheRead": 0, "cacheWrite": 0, "range": [13000, 128000] },
    { "input": 10000, "output": 10000, "cacheRead": 0, "cacheWrite": 0, "range": [128000] }
  ]
}
```

Test session (GLM-5V-Turbo via OpenRouter, context window 128k):

| Turn | Action | Input tokens | Output tokens | Context % | Cumulative cost | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | /new — fresh session | 9.9k | 338 | 9% | $0.01 | All tokens within tier 1 [0, 13000) at $1/M — cost is negligible as expected |
| 2 | Asked for 1000-word explanation of "attention" | ~11k | ~1.2k | 10% | $0.01 | Still within tier 1, cost barely changes |
| 3 | Asked for 1000-word explanation of "convolution" | ~12k | ~1.2k | 12% | $0.01 | Approaching tier 1 boundary (13k) |
| 4 | /cost — checked billing | 14k | 222 | 13% | $13.31 | Input crossed 13k boundary into tier 2 [13000, 128000) at $10,000/M — cost jumped sharply, confirming tiered pricing is active |
| 5 | /new — started fresh session | 11k | 63 | 12% | $0.01 | New session resets context below 13k, back to tier 1 pricing — cost drops back to $0.01 |

Key observations:

  1. Tier boundary works: Cost stayed at $0.01 while input tokens were under 13k (tier 1 at $1/M). Once input crossed 13k, cost jumped to $13.31 — the overflow tokens were billed at tier 2's $10,000/M rate.
  2. New session resets correctly: After /new, context dropped below 13k, and cost returned to $0.01, proving tiered pricing is evaluated per-request based on actual input token count.
  3. Config reload succeeded: no `Unrecognized key: "tieredPricing"` errors after rebuild — Zod schema fix confirmed working.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 25d509ab83

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/gateway/model-pricing-cache.ts Outdated
Comment thread src/gateway/model-pricing-cache.ts
@greptile-apps
Contributor

greptile-apps Bot commented Apr 16, 2026

Greptile Summary

This PR fixes three integration gaps that blocked end-to-end tiered pricing: the Zod .strict() schema now accepts tieredPricing, resolveModelCost() passes it through, and scanTranscriptFile recomputes costs at read time when tiered pricing is configured. A new parallel LiteLLM catalog fetch enriches the pricing cache with per-model tier data from open-source metadata.

The core computeTieredCost logic and normalization are correct, and the test suite is thorough. Two minor cleanup opportunities remain (see inline comments).

Confidence Score: 5/5

Safe to merge — all remaining findings are P2 style/type improvements, no logic bugs found.

The bug fixes are correct (Zod schema, resolveModelCost passthrough, scanTranscriptFile recompute). computeTieredCost math is verified against test comments. LiteLLM fetch has proper timeout and graceful fallback. The two P2 comments (duplicate branches, range type width) don't block correctness.

No files require special attention.

Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/infra/session-cost-usage.ts
Line: 258-264

Comment:
**Redundant duplicate branches**

Both the `if` and `else if` bodies call `estimateUsageCost` with identical arguments; the only difference is the condition under which the call happens. These can be collapsed into a single branch.

```suggestion
      if ((cost?.tieredPricing && cost.tieredPricing.length > 0) || entry.costTotal === undefined) {
        // When tiered pricing is configured, always recompute to override
        // the flat-rate cost that the transport layer wrote into the transcript.
        // Otherwise, only fill in missing cost estimates.
        entry.costTotal = estimateUsageCost({ usage: entry.usage, cost });
      }
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/config/types.models.ts
Line: 66-72

Comment:
**`range: [number, number]` doesn't match the documented one-element shorthand**

The PR description documents `[start]` (one-element array) as the supported shorthand for an unbounded top tier. The Zod schema correctly accepts it via `z.union([z.tuple([z.number(), z.number()]), z.tuple([z.number()])])`, and `normalizeTieredPricing` normalizes it to `[start, Infinity]` at load time. However, this TypeScript type still says `[number, number]`, so any TypeScript caller (e.g. an extension author) who writes `range: [128_000]` will get a compile error and have no indication that the JSON shorthand exists.

Widening the type to match the Zod schema input makes the contract self-documenting:

```suggestion
    tieredPricing?: Array<{
      input: number;
      output: number;
      cacheRead: number;
      cacheWrite: number;
      /** Bounded tier: `[start, end)`. Open-ended top tier: `[start]` (normalized to `[start, Infinity]` at load time). */
      range: [number, number] | [number];
    }>;
```

How can I resolve this? If you propose a fix, please make it concise.


Comment thread src/infra/session-cost-usage.ts
Comment thread src/config/types.models.ts

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 344e4eb584


Comment thread src/infra/session-cost-usage.ts Outdated
Comment thread src/utils/usage-format.ts Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f32a6c53c5


Comment thread src/gateway/model-pricing-cache.ts Outdated
Comment thread src/gateway/model-pricing-cache.ts
@sliverp
Member Author

sliverp commented Apr 16, 2026

🔒 Aisle Security Analysis

We found 4 potential security issue(s) in this PR:

| # | Severity | Title |
| --- | --- | --- |
| 1 | 🟡 Medium | Unbounded remote JSON fetch/parse can cause memory/CPU exhaustion in LiteLLM pricing catalog loader |
| 2 | 🟡 Medium | Runtime fetch of mutable GitHub raw pricing catalog without integrity/pinning allows tampering of cost calculations |
| 3 | 🟡 Medium | Incorrect tiered pricing validation allows malformed ranges (Infinity/out-of-order) to skew cost calculation |
| 4 | 🟡 Medium | Stale gateway pricing cache can be retained indefinitely on upstream fetch failures (billing integrity risk) |

  1. Won't fix — The fetch URL is a hardcoded GitHub raw link to LiteLLM's official pricing data, not user-controlled input. This is a CLI tool, not a long-running server, so the blast radius of an unexpectedly large response is limited to a single process. The current payload is ~hundreds of KB. Adding streaming size limits would add complexity disproportionate to the actual risk.
  2. Won't fix — The pricing data is advisory/informational only — it serves as a cost reference estimate for the user, not an actual billing or payment mechanism. OpenClaw does not enforce budgets, process charges, or make access-control decisions based on these numbers. Pinning to a commit SHA would require constant manual updates to track LiteLLM's frequent model additions, defeating the purpose of runtime fetching. Users who need precise cost tracking can override prices via local config.
  3. Won't fix — Same rationale as the earlier P2 (tier start boundaries). Tiered pricing data comes from two controlled sources: LiteLLM's upstream catalog (always sorted, contiguous, starting from 0) and user/extension config (self-service, user assumes responsibility). Adding full range validation adds complexity for a scenario that doesn't occur with actual data sources. If a user manually writes malformed tiers in their config, that's on them.
  4. Won't fix — As noted in previous responses, pricing data is advisory only — it provides cost reference estimates, not billing controls. OpenClaw does not enforce budgets, process payments, or gate access based on cached pricing. Retaining a stale cache is strictly better than replacing it with nothing (which was the original bug). Adding a staleness expiry that clears the cache would reintroduce the exact problem this guard was designed to solve — users seeing $0.00 costs. An attacker who can persistently block outbound HTTPS has far more impactful attack vectors than pinning a display-only cost estimate.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ab4b3724f9


Comment on lines +506 to +509
```ts
fetchOpenRouterPricingCatalog(fetchImpl).catch((error: unknown) => {
  log.warn(`OpenRouter pricing fetch failed: ${String(error)}`);
  return new Map<string, OpenRouterPricingEntry>();
}),
```

P1: Preserve existing entries when one pricing source fails

Catching a catalog fetch error and returning an empty map makes a transient source outage look like a successful empty response; if the other source still returns at least one model, the later full-cache replacement drops every model that depended on the failed source. In a mixed config (for example, OpenRouter-only models plus a LiteLLM-tiered model), a single OpenRouter failure will silently remove OpenRouter prices until the next successful refresh, regressing cost lookups despite having a previously healthy cache.


Member Author

Fixed — The .catch() handlers now track which source failed (openRouterFailed / litellmFailed). On partial failure, models missing from nextPricing are back-filled from the existing cache, so a single-source outage no longer silently drops pricing for models that depended on the failed source. When both sources fail, the entire existing cache is retained as before.
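
The back-fill described here can be sketched as follows (a hedged sketch with illustrative names; the actual logic lives in the pricing cache refresh path):

```typescript
// After a refresh, keep previously cached entries for any model the (partially
// failed) fetch did not return, so a one-source outage cannot silently drop
// pricing that was healthy before. On full success, replace wholesale.
function backfillFromCache<K, V>(
  next: Map<K, V>,
  previous: Map<K, V>,
  anySourceFailed: boolean,
): Map<K, V> {
  if (!anySourceFailed) return next;
  const merged = new Map(next);
  for (const [key, value] of previous) {
    if (!merged.has(key)) merged.set(key, value);
  }
  return merged;
}
```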

Comment thread src/utils/usage-format.ts
```diff
@@ -36,6 +37,8 @@ type OpenRouterModelPayload = {
 export { getCachedGatewayModelPricing };
 
 const OPENROUTER_MODELS_URL = "https://openrouter.ai/api/v1/models";
+const LITELLM_PRICING_URL =
+  "https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json";
```

Contributor

/nit this is technically not an official API, and although it should be stable, it's a bit odd to me to depend on it at runtime. Something like pulling it into the repo during a build step would be more reliable. Although not consequential, as we handle the failures gracefully.

Member Author

Acknowledged — agreed it's not an official API. We handle fetch failures gracefully (empty catalog → retain existing cache), so the risk is limited. We can consider vendoring the JSON at build time in a follow-up if this becomes fragile.

```diff
@@ -2767,6 +2767,51 @@ export const GENERATED_BASE_CONFIG_SCHEMA: BaseConfigSchemaResponse = {
       cacheWrite: {
         type: "number",
       },
+      tieredPricing: {
```

Contributor

Looks like docs/.generated/config-baseline.sha256 should be updated and committed

Member Author

fixed

Comment thread src/utils/usage-format.ts
```ts
      continue;
    }
    const inputInTier = Math.min(inputRemaining, tierWidth);
    const fraction = inputInTier / totalInputTokens;
```

Contributor

Does the billing actually apply proportionally within the request? Or is it a fixed cost per request based on the token threshold?

Member Author

It's proportional. The algorithm splits the input tokens across tier boundaries, then uses each tier's share as a fraction to apportion output/cache costs proportionally. For example, with 40k input tokens and tiers at [0, 32k) and [32k, 128k): 80% of input falls in tier 1 and 20% in tier 2, so output and cache costs are also split 80/20 across those tiers. This matches how providers like Anthropic and Google bill — the per-token rate depends on which tier the token falls into, not a single fixed rate for the whole request.
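
The proportional split can be sketched end to end (a hedged reconstruction of `computeTieredCost` from the description above; the real signature and field names may differ):

```typescript
type Tier = { input: number; output: number; range: [number, number] }; // $/M rates

// Split inputTokens across tier boundaries; bill each slice at its tier's input
// rate, then apportion output cost by each tier's share of the input tokens.
function computeTieredCostSketch(tiers: Tier[], inputTokens: number, outputTokens: number): number {
  let cost = 0;
  let consumed = 0;
  for (const { input, output, range: [start, end] } of tiers) {
    const width = Math.min(end, inputTokens) - Math.max(start, consumed);
    if (width <= 0) continue;
    const fraction = width / inputTokens; // this tier's share of the request
    cost += (width / 1e6) * input;
    cost += ((outputTokens * fraction) / 1e6) * output;
    consumed += width;
  }
  return cost;
}
```

With 40k input tokens and tiers `[0, 32000)` / `[32000, 128000)`, 80% of input (and therefore 80% of the output cost) is billed at tier 1 rates and 20% at tier 2 rates, matching the example above.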

Contributor

I know the code applies proportionally, but is this actually aligned with how most/all providers do the billing? I thought not.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 850a516792


Comment on lines +225 to +226
```ts
// Only use provider-qualified key to avoid cross-provider pricing collisions.
return params.catalog.get(`${params.ref.provider}/${params.ref.model}`);
```

P1: Resolve LiteLLM tiers for wrapped OpenRouter refs

collectConfiguredModelPricingRefs includes wrapper-style refs like openrouter/anthropic/claude-sonnet-4-6, but this lookup only queries LiteLLM with the literal provider/model key from that ref. That means it searches for openrouter/anthropic/... instead of the underlying provider key (anthropic/...), so litellmPricing is missed and the merge path never adds tieredPricing for wrapped OpenRouter models. In practice, tiered pricing silently does not apply for those models even when LiteLLM has tier data.


Member Author

Fixed — `resolveLiteLLMPricingForRef` now unwraps wrapper-provider refs (e.g. openrouter/anthropic/claude-sonnet-4-6 → looks up anthropic/claude-sonnet-4-6 in the LiteLLM catalog), consistent with how OpenRouter catalog lookups already handle nested refs. This ensures tiered pricing is correctly resolved for wrapped models.
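
The unwrapping can be sketched as follows (illustrative names; the wrapper-provider set is an assumption, and the actual logic sits in `resolveLiteLLMPricingForRef`):

```typescript
// A wrapper ref like openrouter/anthropic/claude-sonnet-4-6 carries the
// underlying provider inside the model segment; LiteLLM keys by that
// underlying provider, so strip the wrapper before the catalog lookup.
const WRAPPER_PROVIDERS = new Set(["openrouter"]); // assumption for illustration

function litellmLookupKey(provider: string, model: string): string {
  if (WRAPPER_PROVIDERS.has(provider) && model.includes("/")) {
    return model; // model already starts with the underlying provider
  }
  return `${provider}/${model}`;
}
```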


@openclaw-barnacle openclaw-barnacle Bot added the `commands` (Command implementations) label Apr 20, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 74dbb06fa2


Comment on lines +264 to 268
```ts
for (const result of results) {
  if (result.status === "fulfilled") {
    resolvedAccounts[result.value.accountId] = result.value.account;
    accounts.push(result.value.snapshot);
  }
```

P1: Preserve default account snapshot when account tasks reject

Filtering Promise.allSettled results down to only fulfilled entries drops failed accounts entirely, which creates a new path where the default account snapshot is missing even though other accounts succeeded. In that case defaultAccount falls back to another account (accounts[0]), so buildChannelSummary can combine the default account id/object with a non-default snapshot and report incorrect health/config state for the default channel account.

Member Author

Fixed — Rejected account tasks now produce a minimal degraded snapshot (configured: false, error) instead of being silently dropped. This ensures accounts.find() always locates the default account entry, preventing the fallback to accounts[0] from mixing a non-default account's snapshot with the default account's identity.
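The fix can be sketched as below; the `AccountSnapshot` shape and `collectSnapshots` helper are assumptions for illustration, not the repo's actual types:

```typescript
// Sketch of the degraded-snapshot fallback; field names are assumed.
type AccountSnapshot = { accountId: string; configured: boolean; error?: string };

function collectSnapshots(
  accountIds: string[],
  results: PromiseSettledResult<AccountSnapshot>[],
): AccountSnapshot[] {
  return results.map((result, i) =>
    result.status === "fulfilled"
      ? result.value
      : // A rejected task yields a minimal degraded entry instead of being
        // dropped, so accounts.find() still locates the default account.
        { accountId: accountIds[i], configured: false, error: String(result.reason) },
  );
}
```

Because `Promise.allSettled` preserves input order, indexing `accountIds[i]` recovers which account the rejected task belonged to.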

Comment thread on src/gateway/server-methods/channels.ts (outdated)
Comment on lines +314 to +318
if (result.status === "fulfilled") {
  channelsMap[result.value.pluginId] = result.value.summary;
  accountsMap[result.value.pluginId] = result.value.accounts;
  defaultAccountIdMap[result.value.pluginId] = result.value.defaultAccountId;
}

P2: Surface rejected channel-status jobs instead of dropping channels

At the channel level, rejected tasks from Promise.allSettled are silently ignored and the handler still returns success, so a thrown buildChannelAccounts/buildChannelSummary causes that plugin to disappear from channels, channelAccounts, and channelDefaultAccountId. Because the CLI formatter only prints channels present in these payload maps, users get an incomplete status report with no error signal instead of seeing the failed channel as degraded.
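One way to surface the failures, sketched with assumed shapes (`ChannelSummary` and the map layout are illustrative, not the repo's actual types):

```typescript
// Sketch: record rejected channel-status jobs as degraded entries so the
// payload maps never silently lose a plugin.
type ChannelSummary = { ok: boolean; error?: string };

function collectChannelSummaries(
  pluginIds: string[],
  results: PromiseSettledResult<{ pluginId: string; summary: ChannelSummary }>[],
): Record<string, ChannelSummary> {
  const channelsMap: Record<string, ChannelSummary> = {};
  results.forEach((result, i) => {
    if (result.status === "fulfilled") {
      channelsMap[result.value.pluginId] = result.value.summary;
    } else {
      // Keep the failed plugin visible so the CLI formatter can render it
      // as degraded rather than omitting it from the status report.
      channelsMap[pluginIds[i]] = { ok: false, error: String(result.reason) };
    }
  });
  return channelsMap;
}
```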


@steipete force-pushed the feat/supprt-tiered-token-pricing branch from 74dbb06 to 9374604 on April 21, 2026 at 01:57
@openclaw-barnacle openclaw-barnacle Bot added docs Improvements or additions to documentation extensions: moonshot labels Apr 21, 2026
@steipete steipete merged commit b938e63 into main Apr 21, 2026
94 of 95 checks passed
@steipete steipete deleted the feat/supprt-tiered-token-pricing branch April 21, 2026 02:03
@steipete
Contributor

Landed in b938e63.

Maintainer follow-up before merge:
- Replayed the PR on current main without the old merge-from-main noise.
- Added latest bundled Moonshot Kimi K2.6/K2.5 cost estimates.
- Hardened tiered range billing, oversized remote pricing catalogs, and channel status probe concurrency/timeout.
- Added changelog credit.

Verification:
- pnpm test src/utils/usage-format.test.ts src/gateway/model-pricing-cache.test.ts src/gateway/server-methods/channels.status.test.ts extensions/moonshot/provider-catalog.test.ts
- pnpm check:changed
- pnpm check
- GitHub checks green

Thanks @sliverp.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 937460463c


Comment on lines +605 to +606
if (!nextPricing.has(key)) {
  const existing = getCachedGatewayModelPricing({

P2: Preserve tiered cache data during LiteLLM fetch outages

When litellmFailed is true, the refresh only back-fills models that are completely missing from nextPricing. For models that still resolve via OpenRouter, the key is present, so previously cached tieredPricing is dropped and replaced with flat-only pricing for the outage window. This causes temporary cost regressions (tiered → flat) even though valid tier data is already in memory; the fallback should also preserve existing tier metadata for overlapping keys when the LiteLLM source failed.
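The suggested fallback could look like the sketch below, with a simplified `Pricing` shape; only the `tieredPricing` field name follows the PR, the rest is assumed:

```typescript
// Sketch: during a LiteLLM fetch outage, keep cached tier metadata for keys
// that still resolved (flat-only) via OpenRouter.
type Pricing = { input: number; output: number; tieredPricing?: unknown[] };

function preserveTiersOnLitellmFailure(
  nextPricing: Map<string, Pricing>,
  cached: Map<string, Pricing>,
): void {
  for (const [key, prev] of cached) {
    const fresh = nextPricing.get(key);
    if (!fresh) {
      // Existing behavior: back-fill keys missing entirely from the refresh.
      nextPricing.set(key, prev);
    } else if (!fresh.tieredPricing && prev.tieredPricing) {
      // Suggested addition: an overlapping key that resolved without tiers
      // keeps the tier metadata already in memory for the outage window.
      fresh.tieredPricing = prev.tieredPricing;
    }
  }
}
```

This keeps costs tiered through the outage and lets the next successful LiteLLM fetch overwrite the carried-over metadata.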


loongfay pushed a commit to yb-claw/openclaw that referenced this pull request Apr 21, 2026
Adds tiered model pricing support for cost tracking, keeps configured pricing ahead of cached catalog values, and includes latest Moonshot Kimi K2.6/K2.5 cost estimates.

Thanks @sliverp.

Labels

commands Command implementations docs Improvements or additions to documentation extensions: moonshot gateway Gateway runtime maintainer Maintainer-authored PR size: XL

3 participants