Skip to content

feat: Improve lib with cachedTokens and reasoningTokens#1

Merged
rmescandon merged 3 commits into
rmescandon:mainfrom
lmcalvo:main
Apr 14, 2026
Merged

feat: Improve lib with cachedTokens and reasoningTokens#1
rmescandon merged 3 commits into
rmescandon:mainfrom
lmcalvo:main

Conversation

@lmcalvo
Copy link
Copy Markdown
Contributor

@lmcalvo lmcalvo commented Apr 14, 2026

No description provided.

Comment thread modelcost/calculator.py
Comment thread modelcost/calculator.py
prompt_rate = pricing["prompt"]
completion_rate = pricing["completion"]
cache_read_rate = pricing.get("cache_read", prompt_rate)
cache_creation_rate = pricing.get("cache_creation", prompt_rate)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[CORRECTNESS] Reported reasoning token count can differ from billed count.
Why: Billing clamps with effective_reasoning = min(reasoning_tokens, output_tokens) but CostResult.reasoning_tokens keeps the original value, so CLI output can over-report reasoning usage.
Suggestion: Store/report the clamped reasoning value (or include both raw and billed values with clear labels) so displayed usage matches billed usage.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically valid observation, but I'd push back on the fix. The clamping (min(reasoning_tokens, output_tokens)) is a defensive guard, not an expected case. If reasoning > output, the caller passed bad data. Reporting the raw value in CostResult is more useful for debugging — the caller can see their own input. Silently changing it would be confusing. The total_cost_usd is always correct regardless.

Comment thread modelcost/calculator.py

results["tokencost"] = _tokencost_source(model, input_tokens, output_tokens)

# Preserve order: litellm -> openrouter -> tokencost
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BEHAVIOR] tokencost path silently drops cache and reasoning token inputs.
Why: The tokencost branch only calls _tokencost_source(model, input_tokens, output_tokens), so cached, cache-creation, and reasoning tokens are ignored while other sources include them.
Suggestion: Either include equivalent token components in tokencost calculations or emit an explicit warning/flag that those token types are excluded for that source.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already addressed. We added a comment in calculator.py:107-110 explaining this, and a note in the README. The tokencost library simply doesn't have these fields — we can't invent pricing it doesn't provide. Adding a warning/flag would add complexity for a known limitation that's already documented.

Copy link
Copy Markdown
Contributor

@robertomier robertomier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compatibilidad/API: al introducir * antes de source en calculate_cost, el 4o argumento posicional deja de funcionar. Impacto: llamadas existentes como calculate_cost(m, i, o, "openrouter") pasan a fallar con TypeError. Sugerencia: manten source como cuarto parametro posicional y mueve los nuevos campos a keyword-only (..., output_tokens, source="litellm", *, cached_input_tokens=0, ...). El resto del ajuste de formula y documentacion va en buena direccion.

@rmescandon rmescandon merged commit 18f7afd into rmescandon:main Apr 14, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants