bug: Turn/Session cost is exactly 2× the actual per-call cost — mathematical doubling in cost accumulation #292

@kenotron

Repo: amplifier-module-provider-anthropic + amplifier-module-hooks-streaming-ui

Evidence (~/Downloads/image (1).png)

Simple session — amplifier-dev bundle, claude-opus-4-7, NO tools, NO delegation:

📊 Token Usage (anthropic/claude-opus-4-7) [3.2s]
 Input: 92,650 (caching...) | Output: 15 | Total: 92,665 | Cost: $0.58
💰 Turn: $1.16 | Session: $1.16

📊 Token Usage (anthropic/claude-opus-4-7) [3.7s]
 Input: 92,683 (94% cached) | Output: 110 | Total: 92,793 | Cost: $0.08
💰 Turn: $0.15 | Session: $1.31
| Turn | Per-call Cost: | 💰 Turn: | Ratio |
|---|---|---|---|
| "hi" | $0.58 | $1.16 | exactly 2.00× |
| "tip of the day" | $0.08 | $0.15 | ≈ 2× (actual ~$0.076) |
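The figures above can be checked with exact decimal arithmetic. This is a small sketch using only the numbers quoted in this issue; the `shown` helper and its rounding mode are assumptions about how a two-decimal display would round, not code from the repository.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

def shown(amount: Decimal) -> Decimal:
    """Round to cents as a two-decimal display would (rounding mode is an assumption)."""
    return amount.quantize(CENT, rounding=ROUND_HALF_UP)

# Turn 1 ("hi"): displayed turn cost divided by displayed per-call cost.
ratio_hi = Decimal("1.16") / Decimal("0.58")

# Turn 2 ("tip of the day"): the issue quotes ~$0.076 as the unrounded per-call cost.
actual = Decimal("0.076")
per_call_shown = shown(actual)      # what the Cost: field would display
turn_shown = shown(actual * 2)      # what Turn: would display if the cost were added twice

# Session is the sum of the two displayed turn costs.
session = Decimal("1.16") + Decimal("0.15")

print(ratio_hi, per_call_shown, turn_shown, session)
```

Under these assumptions the second turn is consistent with doubling as well: the displayed $0.08 and $0.15 are both roundings of ~$0.076 and twice that amount.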

This is mathematical doubling, not visual confusion

  • Cost: $0.58 comes from usage.cost_usd in the content_block:end event — stamped once by compute_cost() inside _convert_to_chat_response()
  • Turn: $1.16 comes from collect_contributions("session.cost") which reads _totals["cost_usd"]
  • For _totals["cost_usd"] to be $1.16 when one API call cost $0.58, _add_cost($0.58) must have been called twice

What we know from code inspection

_add_cost is called exactly once in _convert_to_chat_response() (line 3395 in amplifier_module_provider_anthropic/__init__.py). _convert_to_chat_response is called once in complete() (line 2669). So either:

  1. complete() is being called twice by the orchestrator — possible if the _fallback_on_overload path retries after a partial failure, or if the streaming orchestrator makes two calls (stream-to-display then parse-for-tools). This is the most likely cause.

  2. A second session.cost contributor is registered with the same value — e.g., if mount() is called twice and both contributors somehow reflect the same _totals. The Rust coordinator APPENDS contributors (confirmed: coordinator.rs: .push(entry)), so two registrations would both be collected. No second session.cost contributor was found in code inspection.

Why Cost: shows half the Turn

Cost: in the per-call line shows chat_response.usage.cost_usd — the cost of the final call only. Turn: shows the accumulated _totals["cost_usd"] across all calls in the turn. If the orchestrator makes 2 calls per user turn, _totals accumulates both while the display only shows the last.

Investigation needed

Add instrumentation to confirm:

import traceback

def _add_cost(cost) -> None:
    # Temporary instrumentation: log every call with its full stack to expose duplicate callers.
    logger.warning("_add_cost called: cost=%s, stack=\n%s", cost, "".join(traceback.format_stack()))
    if cost is not None:
        _totals["cost_usd"] = (_totals["cost_usd"] or Decimal("0")) + cost
        _totals["has_data"] = True

This will reveal whether _add_cost is called once or twice per user turn, and from which code path.

Impact

Every Turn and Session cost shown to users is 2× the actual API cost. Because the Session total sums the doubled Turn values, it is overstated by the same factor, and the absolute overcount grows with every turn in a long session.

Note on previous display fixes

A previous PR/fix on feat/m0-cost-management removed Cost: from the per-call token line (issue #291). That hides the mismatch but does NOT fix the Turn: doubling; users will still see doubled Turn/Session costs.
