Integrate models.dev pricing flow #884

Merged
steipete merged 4 commits into steipete:main from iam-brain:iam-brain/models-dev-pricing-integration-883
May 10, 2026
Conversation

@iam-brain
Contributor

Summary

  • Prefer cached models.dev pricing for Codex and Claude cost calculations before bundled fallback tables.
  • Thread provider-scoped models.dev catalog/cache context through Codex, Claude, and Pi session cost scanners.
  • Recompute report costs from the current catalog before falling back to cached packed cost rows, with focused regression coverage.
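The precedence described in the summary can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `ModelPrice`, `resolvePrice`, and the dictionary-based caches are hypothetical names introduced here, and the rates are placeholders.

```swift
// Hypothetical types for illustration only; the PR's real pricing types are not shown in this thread.
struct ModelPrice: Equatable {
    let inputUSDPerMTok: Double
    let outputUSDPerMTok: Double
}

/// Prefers a cached models.dev entry and falls back to the bundled table.
/// Returns nil when neither source knows the model.
func resolvePrice(
    model: String,
    modelsDevCache: [String: ModelPrice],
    bundled: [String: ModelPrice]) -> ModelPrice?
{
    modelsDevCache[model] ?? bundled[model]
}

// Placeholder data: the cached entry wins over the bundled one for the same model.
let cached = ["example-model": ModelPrice(inputUSDPerMTok: 2, outputUSDPerMTok: 8)]
let bundledTable = [
    "example-model": ModelPrice(inputUSDPerMTok: 3, outputUSDPerMTok: 15),
    "bundled-only": ModelPrice(inputUSDPerMTok: 1, outputUSDPerMTok: 4),
]
let preferred = resolvePrice(model: "example-model", modelsDevCache: cached, bundled: bundledTable)
let fallback = resolvePrice(model: "bundled-only", modelsDevCache: cached, bundled: bundledTable)
```

Keeping the lookup as a single `??` chain makes the precedence order explicit and easy to cover with a focused test.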

Reasoning

Scope

  • Codex and Claude pricing layer.
  • Cost usage scanner report construction.
  • Pi session merged cost reports.
  • Focused pricing tests for models.dev precedence, threshold pricing, research-preview override, and isolated bundled fallback behavior.

Screenshots

N/A

GIFs

N/A

References

Validation

  • swift test --filter CostUsagePricingTests --filter PiSessionCostScannerTests --filter CostUsageScannerTests
  • make check
  • ./Scripts/compile_and_run.sh

Audit

  • GPT-5.5 high subagent audit completed. Initial findings were fixed, and the final pass found no material issues.

@steipete
Owner

Pushed a small test-only CI fix in 59a1ad7: precompute the pricing expectations so Swift does not time out type-checking #expect arithmetic expressions.

Validation:

  • swift test --filter 'CostUsagePricingTests|PiSessionCostScannerTests|CostUsageScannerTests'
  • make check
  • CI is green on the pushed commit.
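As a rough illustration of the kind of change described, the expected value can be built from small, explicitly typed steps so the type checker never sees one large mixed-type expression inside the assertion. The token counts and rates below are hypothetical and do not come from commit 59a1ad7.

```swift
// Hypothetical token counts and rates, chosen only for illustration.
let inputTokens = 1_000
let outputTokens = 500
let inputUSDPerToken: Double = 3.0 / 1_000_000
let outputUSDPerToken: Double = 15.0 / 1_000_000

// Each step has an explicit type annotation, which keeps type inference cheap.
let expectedInputCost: Double = Double(inputTokens) * inputUSDPerToken
let expectedOutputCost: Double = Double(outputTokens) * outputUSDPerToken
let expectedTotal: Double = expectedInputCost + expectedOutputCost

// Inside a Swift Testing test, the assertion then reduces to a simple comparison, e.g.:
// #expect(abs(actualCost - expectedTotal) < 1e-12)
```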

@steipete steipete marked this pull request as ready for review May 10, 2026 14:54
@steipete steipete merged commit 03c6801 into steipete:main May 10, 2026
4 checks passed

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 680561a4a3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +664 to +665
let cost = currentPricingCost
    ?? (cachedCost > 0 ? Double(cachedCost) / costScale : nil)
[P1] Preserve per-request threshold costs for Claude rows

This now prioritizes currentPricingCost over cachedCost, but currentPricingCost is computed from day/model aggregates instead of original per-request usage. For Claude models with long-context threshold pricing, aggregating many sub-200k requests can incorrectly cross the 200k tier and overstate cost (or understate in other mixes). Before this change, cached nanos preserved per-request pricing behavior, so this introduces report inaccuracies whenever threshold-priced models are used multiple times in a day.
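The selection order under review can be reproduced in isolation as below. This is a sketch under assumptions: `reportCost` is a hypothetical wrapper name, and `costScale` is assumed to be 1e9 because the comment refers to cached nanos; the actual constant is not shown in this thread.

```swift
// Assumption: cached costs are stored as integer nano-dollars, so the scale is 1e9.
let costScale = 1_000_000_000.0

/// Mirrors the reviewed expression: a freshly computed cost always wins,
/// and the cached value is used only when no current price is available.
func reportCost(currentPricingCost: Double?, cachedCost: Int64) -> Double? {
    currentPricingCost ?? (cachedCost > 0 ? Double(cachedCost) / costScale : nil)
}

// With both values present, the recomputed cost hides the cached per-request total.
let both = reportCost(currentPricingCost: 1.8, cachedCost: 900_000_000)
// Without a current price, the cached nanos are converted back to dollars.
let cachedOnly = reportCost(currentPricingCost: nil, cachedCost: 900_000_000)
// With neither, no cost is reported.
let neither = reportCost(currentPricingCost: nil, cachedCost: 0)
```

Because the cached value is consulted only when the recomputation returns nil, any inaccuracy in the recomputed figure goes straight into the report.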

Comment on lines +623 to +627
let currentPricingCost = self.computedCostUSD(
    provider: provider,
    modelName: modelName,
    usage: packed,
    pricingContext: pricingContext)
[P1] Avoid re-rating aggregated Pi usage with tiered pricing

computedCostUSD is being re-run on the aggregated packed totals for each model/day, which loses per-message boundaries needed for correct tiered Claude pricing. If multiple requests are each below the long-context threshold, recomputing on the sum can apply above-threshold rates that never actually occurred per request. The prior behavior used accumulated packed.costNanos from parse-time samples and did not introduce this aggregation error.
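A small worked example shows the aggregation error. It uses a simplified model in which a request whose input exceeds the threshold is billed entirely at the higher rate. `TieredRate` and the rates are hypothetical and are not actual Claude pricing.

```swift
// Hypothetical tiered price; threshold and rates are illustrative, not real Claude pricing.
struct TieredRate {
    let thresholdTokens: Int
    let baseUSDPerMTok: Double
    let longContextUSDPerMTok: Double

    /// Simplified rule: a request above the threshold is billed entirely at the long-context rate.
    func cost(inputTokens: Int) -> Double {
        let rate = inputTokens > thresholdTokens ? longContextUSDPerMTok : baseUSDPerMTok
        return Double(inputTokens) * rate / 1_000_000
    }
}

let tiered = TieredRate(thresholdTokens: 200_000, baseUSDPerMTok: 3, longContextUSDPerMTok: 6)

// Two requests on the same day, each below the 200k threshold.
let requests = [150_000, 150_000]

// Pricing each request separately keeps both in the base tier: 0.45 + 0.45 = 0.90 USD.
let perRequestTotal = requests.map(tiered.cost(inputTokens:)).reduce(0, +)

// Pricing the daily sum (300k tokens) crosses the threshold: 300_000 * 6 / 1e6 = 1.80 USD.
let aggregatedTotal = tiered.cost(inputTokens: requests.reduce(0, +))
```

In this example the aggregated figure is double the per-request total, even though no single request ever reached the long-context tier.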

2 participants