Skip to content

refactor(core): source pricing + context limits from tokenlens registry#8

Merged
faraa2m merged 1 commit into
mainfrom
refactor/tokenlens-pricing-registry
May 9, 2026
Merged

refactor(core): source pricing + context limits from tokenlens registry#8
faraa2m merged 1 commit into
mainfrom
refactor/tokenlens-pricing-registry

Conversation

@faraa2m
Copy link
Copy Markdown
Owner

@faraa2m faraa2m commented May 9, 2026

Summary

  • Replace the hand-maintained RATES / MODELS tables in packages/core/src/rates.ts with a registry built at module load from @tokenlens/models (anthropic + openai + google subpaths). 7 curated models → 42 stable models, no extra manual maintenance.
  • Every ModelDescriptor now carries contextWindow, maxOutputTokens, and pricingSource: 'local' | 'tokenlens' (from models.dev).
  • Surface the new metadata in CLI (Limits: block under the cost table), web Playground (provider-grouped selector with · 128k context chips), and GitHub Action (Limits: line in the sticky PR comment).

Public API (getRate, getModel, KNOWN_MODELS, RateEntry, ModelDescriptor) is byte-compatible — no consumer code outside the touched files needs to change.

Why a hybrid registry, not a full swap?

claude-haiku-4-5 / claude-opus-4-7 / claude-sonnet-4-6 are not yet in tokenlens upstream (its anthropic catalog tops at claude-opus-4-1-20250805). They stay in a small LOCAL_OVERRIDES table that wins on id collision. A weekly cron (.github/workflows/registry-check.yml) runs scripts/check-overrides.mjs:

  1. Detects when upstream picks up an overridden id → fail with "drop the override".
  2. Diffs tokenlens-sourced pricing/context against packages/core/src/__snapshots__/registry.json → fail on drift, prompt to regenerate the snapshot.

On failure the workflow opens (or comments on) a tracking issue.

Other changes

  • benchmarks/run.mjs gains --filter / --models (also honors BENCH_MODELS env) so the regenerate sweep can be scoped — the matrix grew from 7×N×M to 42×N×M cells.
  • benchmarks/results.json regenerated against the full sweep.
  • Rates tests loosened from exact-dollar assertions → per-model positive-price invariants + canary checks on stable IDs + registry-size floor (>=15) so a broken registry can't silently pass.
  • packages/core adds two npm scripts: snapshot:registry, check:overrides.

Test plan

  • npm run build (all packages compile)
  • npm run lint (biome clean)
  • npm run typecheck
  • npx vitest run — 48 tests pass (was 46; added 2 for renderModelLimits)
  • npm run benchmarks — drift check passes against regenerated results.json
  • npm run check:overrides -w @tokenometer/coreOK — 42 models, 3 overrides intact, snapshot in sync.
  • CLI smoke: echo "..." | node packages/cli/dist/index.js - --model claude-opus-4-7,gpt-4o produces the expected table + new Limits: block + summary
  • First scheduled run of registry-check.yml (Mondays 12:00 UTC) — verify via workflow_dispatch after merge

Open follow-ups (out of scope)

  • The web bundle grows by the size of the bundled models.dev provider data (~33KB raw across the 3 providers). For an even smaller browser footprint, tokenlens ships an async fetchModels() path that could be opted into for packages/web only.
  • Adding gpt-3.5-turbo / gpt-4 / gpt-4-turbo (now in KNOWN_MODELS) reuses the existing o200k_base openai dispatch in tokenize.ts. Those older models are technically cl100k_base. Pre-existing bug surfaced by the wider catalog; worth a dedicated fix.

🤖 Generated with Claude Code

Replace the hand-maintained RATES/MODELS tables in packages/core with
a registry built at module load from @tokenlens/models (anthropic +
openai + google subpaths). The 7-entry curated set grows to 42 stable
models with no additional manual maintenance, and every model now
carries contextWindow + maxOutputTokens + pricingSource metadata
sourced from models.dev.

Three Anthropic models (claude-haiku-4-5, claude-opus-4-7,
claude-sonnet-4-6) are not yet in tokenlens upstream, so they remain
in a small LOCAL_OVERRIDES table that wins on id collision. A weekly
.github/workflows/registry-check.yml runs scripts/check-overrides.mjs
which detects when upstream catches up (action: drop the override) or
when tokenlens-sourced pricing drifts from the checked-in snapshot at
packages/core/src/__snapshots__/registry.json. CI opens a tracking
issue on findings.

Public API (getRate, getModel, KNOWN_MODELS, RateEntry,
ModelDescriptor) is byte-compatible. Consumers gain the new metadata:

- CLI prints a Limits: block under the cost table showing ctx + max
  output per unique model
- web Playground groups model checkboxes by provider with a
  context-window chip suffix (gpt-4o · 128k)
- GitHub Action appends a Limits: line to the sticky PR comment

benchmarks/run.mjs gains a --filter / --models flag (also BENCH_MODELS
env) so the regenerate sweep can be scoped — the matrix grew from
7×N×M to 42×N×M cells. Existing results.json is regenerated to match.

Tests are loosened from exact-dollar assertions to (a) per-model
positive-price invariants, (b) canary checks on stable IDs (gpt-4o,
gpt-4o-mini), (c) a registry-size floor (>=15) so a broken/empty
registry can't pass silently. 48 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vercel
Copy link
Copy Markdown

vercel Bot commented May 9, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
tokenometer Ready Ready Preview, Comment May 9, 2026 3:34am

@faraa2m faraa2m merged commit 009479f into main May 9, 2026
7 checks passed
faraa2m added a commit that referenced this pull request May 9, 2026
- replace `--format md` with real format names (markdown, text)
- drop stale `gemini-2.5-pro` example; add stdin example and `--help` pointer
- correct empirical-mode description: real provider countTokens, no caching claim
- fix CI snippet to match @v0 action inputs (models/formats/budget)
- methodology table: empirical column reflects shipped countTokens dispatch
- note pricing now comes from tokenlens registry (post #8 refactor)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@faraa2m faraa2m deleted the refactor/tokenlens-pricing-registry branch May 9, 2026 04:11
faraa2m added a commit that referenced this pull request May 11, 2026
…ry (#8)

Replace the hand-maintained RATES/MODELS tables in packages/core with
a registry built at module load from @tokenlens/models (anthropic +
openai + google subpaths). The 7-entry curated set grows to 42 stable
models with no additional manual maintenance, and every model now
carries contextWindow + maxOutputTokens + pricingSource metadata
sourced from models.dev.

Three Anthropic models (claude-haiku-4-5, claude-opus-4-7,
claude-sonnet-4-6) are not yet in tokenlens upstream, so they remain
in a small LOCAL_OVERRIDES table that wins on id collision. A weekly
.github/workflows/registry-check.yml runs scripts/check-overrides.mjs
which detects when upstream catches up (action: drop the override) or
when tokenlens-sourced pricing drifts from the checked-in snapshot at
packages/core/src/__snapshots__/registry.json. CI opens a tracking
issue on findings.

Public API (getRate, getModel, KNOWN_MODELS, RateEntry,
ModelDescriptor) is byte-compatible. Consumers gain the new metadata:

- CLI prints a Limits: block under the cost table showing ctx + max
  output per unique model
- web Playground groups model checkboxes by provider with a
  context-window chip suffix (gpt-4o · 128k)
- GitHub Action appends a Limits: line to the sticky PR comment

benchmarks/run.mjs gains a --filter / --models flag (also BENCH_MODELS
env) so the regenerate sweep can be scoped — the matrix grew from
7×N×M to 42×N×M cells. Existing results.json is regenerated to match.

Tests are loosened from exact-dollar assertions to (a) per-model
positive-price invariants, (b) canary checks on stable IDs (gpt-4o,
gpt-4o-mini), (c) a registry-size floor (>=15) so a broken/empty
registry can't pass silently. 48 tests pass.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
faraa2m added a commit that referenced this pull request May 11, 2026
- replace `--format md` with real format names (markdown, text)
- drop stale `gemini-2.5-pro` example; add stdin example and `--help` pointer
- correct empirical-mode description: real provider countTokens, no caching claim
- fix CI snippet to match @v0 action inputs (models/formats/budget)
- methodology table: empirical column reflects shipped countTokens dispatch
- note pricing now comes from tokenlens registry (post #8 refactor)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant