observability: per-stage token and cost breakdown in budget_summary #13

@Mathews-Tom

Context

A 2026-04-19 smoke run produced this budget_summary.per_agent in the session JSON:

"per_agent": {
  "apprentice_pipeline": {
    "tokens_used": 3987,
    "cost_usd": 0.059805,
    "calls": 1,
    "duration_seconds": 47.27
  }
}

All six pipeline stages (discovery, implementation, instrumentation, visualization, assessment, review) are aggregated into a single row labeled apprentice_pipeline.

Problem

When a generated artifact fails quality review, there is no way to tell which stage is responsible without re-running with manual instrumentation. A per-stage breakdown is also needed to:

  • Tune individual prompt costs
  • Detect when one stage dominates the budget
  • Compare model choices stage-by-stage (e.g., Haiku for assessment, gpt-5.4 for implementation)
  • Validate that gates actually ran
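Once per-stage rows exist, "detect when one stage dominates the budget" becomes a simple check over per_agent. A minimal sketch, assuming a Python codebase and the row shape proposed below; dominant_stage and its 0.5 threshold are hypothetical, not anything apprentice defines today:

```python
def dominant_stage(per_agent: dict[str, dict], threshold: float = 0.5):
    """Return (stage, share) if one stage's share of total cost exceeds
    threshold, else None. threshold is a hypothetical tuning knob."""
    total = sum(row["cost_usd"] for row in per_agent.values())
    if total <= 0:
        # No spend recorded (or no stages): nothing can dominate.
        return None
    # Pick the most expensive stage and compute its fraction of the total.
    stage, row = max(per_agent.items(), key=lambda kv: kv[1]["cost_usd"])
    share = row["cost_usd"] / total
    return (stage, share) if share > threshold else None
```

The same loop could rank stages by tokens_used or duration_seconds instead of cost.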

Proposed fix

budget_summary.per_agent should report one row per stage defined in src/apprentice/stages/:

"per_agent": {
  "discovery":       {"tokens_used": N, "cost_usd": N, "calls": N, "duration_seconds": N},
  "implementation":  {...},
  "instrumentation": {...},
  "visualization":   {...},
  "assessment":      {...},
  "review":          {...}
}
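A minimal sketch of a tracker that produces this shape, assuming a Python codebase. StageBudget, StageUsage, record_call, and timed are hypothetical names, not existing apprentice APIs; the point is only that usage is keyed by stage name at the moment each call is made, rather than rolled into one apprentice_pipeline row:

```python
import time
from contextlib import contextmanager
from dataclasses import asdict, dataclass
from typing import Iterable, Iterator


@dataclass
class StageUsage:
    # Hypothetical counters mirroring one proposed per_agent row.
    tokens_used: int = 0
    cost_usd: float = 0.0
    calls: int = 0
    duration_seconds: float = 0.0


class StageBudget:
    """Hypothetical tracker keyed by stage name instead of a single
    aggregate 'apprentice_pipeline' row."""

    def __init__(self, stages: Iterable[str] = ()) -> None:
        # Pre-seed known stages so a stage that never ran still reports a zero row.
        self._stages: dict[str, StageUsage] = {s: StageUsage() for s in stages}

    def record_call(self, stage: str, tokens: int, cost_usd: float) -> None:
        # Attribute one model call's tokens and cost to the named stage.
        usage = self._stages.setdefault(stage, StageUsage())
        usage.tokens_used += tokens
        usage.cost_usd += cost_usd
        usage.calls += 1

    @contextmanager
    def timed(self, stage: str) -> Iterator[None]:
        # Add the wall-clock time spent inside the with-block to the stage.
        start = time.perf_counter()
        try:
            yield
        finally:
            usage = self._stages.setdefault(stage, StageUsage())
            usage.duration_seconds += time.perf_counter() - start

    def per_agent(self) -> dict[str, dict]:
        # Emit the proposed budget_summary.per_agent shape, one row per stage.
        return {
            name: {
                **asdict(u),
                "cost_usd": round(u.cost_usd, 6),
                "duration_seconds": round(u.duration_seconds, 2),
            }
            for name, u in self._stages.items()
        }
```

Seeding the tracker with the stage names from src/apprentice/stages/ up front makes a skipped stage visible as a zero row instead of silently absent, and keeps rows in pipeline order.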

Additionally, record per-stage gate verdicts (gate_name → pass/fail/skipped) in the session JSON so a reader can confirm which gates executed and in what order. The current session JSON contains no evidence of gate ordering.
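One way to capture both the verdicts and their order is an append-only log with an explicit sequence number, so ordering does not depend on readers trusting JSON key order. A minimal sketch, assuming a Python codebase; GateLog, GateRecord, and the "gates" key are hypothetical names, not part of the current session format:

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Literal

Verdict = Literal["pass", "fail", "skipped"]


@dataclass
class GateRecord:
    # Hypothetical record of one gate evaluation.
    order: int       # 1-based position in execution sequence
    stage: str       # stage the gate belongs to
    gate_name: str
    verdict: Verdict


@dataclass
class GateLog:
    """Hypothetical append-only log; list order is execution order."""
    records: list[GateRecord] = field(default_factory=list)

    def record(self, stage: str, gate_name: str, verdict: Verdict) -> None:
        # Reject anything outside the three proposed verdicts.
        if verdict not in ("pass", "fail", "skipped"):
            raise ValueError(f"unknown verdict: {verdict!r}")
        self.records.append(
            GateRecord(len(self.records) + 1, stage, gate_name, verdict)
        )

    def by_stage(self) -> dict[str, dict[str, str]]:
        # Per-stage gate_name -> verdict mapping, as the issue proposes.
        out: dict[str, dict[str, str]] = {}
        for r in self.records:
            out.setdefault(r.stage, {})[r.gate_name] = r.verdict
        return out

    def to_json(self) -> str:
        # Serialize the ordered log for inclusion in the session JSON.
        return json.dumps({"gates": [asdict(r) for r in self.records]}, indent=2)
```

by_stage() gives the per-stage gate_name → verdict view, while the ordered records answer "in what order".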

Related

Observability is the "no magic" principle applied to apprentice itself. Today apprentice is opaque about its own pipeline, which is ironic given the project it serves.

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
