feat(token-usage): per-turn rows with ΔET and compounded ET in step summary#33628
Conversation
… step summary - parseTokenUsageJsonl: populate entries[] array with per-turn deltaET - generateTokenUsageSummary: show one row per turn (# | Model | tokens | ΔET | ET | Duration) in chronological order, plus aggregate totals row - Update/add tests for entries array, deltaET computation, new header, and chronological ordering Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
|
✅ Design Decision Gate 🏗️ completed the design decision gate check. No ADR enforcement needed: PR #33628 does not have the 'implementation' label and has 0 new lines of code in business logic directories (threshold: 100). |
|
🧪 Test Quality Sentinel completed test quality analysis. |
|
🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅ |
|
✅ PR Code Quality Reviewer completed the code quality review. |
There was a problem hiding this comment.
Pull request overview
Updates the Token Usage step summary to show one row per API call (turn) in chronological order, including per-turn effective tokens (ΔET) and a running compounded ET total, while keeping model-level aggregates for compatibility.
Changes:
- Extend
parseTokenUsageJsonlto record per-turnentries[]includingdurationMsand computeddeltaET. - Rework
generateTokenUsageSummaryto render a per-turn table withΔETand runningET, plus a totals row and existing footer disclosures. - Update/add tests to validate
entries[]ordering and presence of the new columns.
Show a summary per file
| File | Description |
|---|---|
| actions/setup/js/parse_mcp_gateway_log.cjs | Adds per-turn entries[] and updates token usage markdown summary to per-turn rows with ΔET and running ET. |
| actions/setup/js/parse_mcp_gateway_log.test.cjs | Adjusts tests for the new table schema and adds coverage for entries[] and per-turn deltaET. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 2/2 changed files
- Comments generated: 2
| const entries = summary.entries || []; | ||
| let compoundedET = 0; | ||
| for (let i = 0; i < entries.length; i++) { | ||
| const entry = entries[i]; | ||
| const deltaET = Math.round(entry.deltaET || 0); | ||
| compoundedET += deltaET; | ||
| lines.push( | ||
| `| ${model} | ${usage.inputTokens.toLocaleString()} | ${usage.outputTokens.toLocaleString()} | ${usage.cacheReadTokens.toLocaleString()} | ${usage.cacheWriteTokens.toLocaleString()} | ${et} | ${usage.requests} | ${formatDurationMs(usage.durationMs)} |` | ||
| `| ${i + 1} | ${entry.model} | ${entry.inputTokens.toLocaleString()} | ${entry.outputTokens.toLocaleString()} | ${entry.cacheReadTokens.toLocaleString()} | ${entry.cacheWriteTokens.toLocaleString()} | ${formatET(deltaET)} | ${formatET(compoundedET)} | ${formatDurationMs(entry.durationMs)} |` |
| expect(md).toContain("**Total**"); | ||
| // The last entry's compounded ET equals totalEffectiveTokens so must appear in the table | ||
| const totalRounded = Math.round(summary.totalEffectiveTokens); | ||
| expect(totalRounded).toBeGreaterThan(0); |
There was a problem hiding this comment.
Code Quality Review
✅ Approved — Clean refactoring with excellent test coverage
Summary
This PR successfully transforms token usage reporting from aggregated model summaries to chronological per-turn rows showing both delta (ΔET) and compounded effective tokens (ET). The implementation is well-structured and maintains backward compatibility.
What Was Reviewed
- Logic correctness: ✅ Entry ordering, deltaET calculation, compounding logic all correct
- Test coverage: ✅ Comprehensive tests for new entries array, deltaET computation, and sum validation
- Backward compatibility: ✅ byModel aggregates preserved
- Code clarity: ✅ JSDoc updated, variable names clear
Observations (Non-blocking)
Minor architectural note
The compoundedET running total is calculated during rendering in generateTokenUsageSummary(). This works correctly but couples calculation with view formatting. Consider computing this during parsing if the compounded sequence becomes a first-class concept used elsewhere.
This is fine for display-only use. No action needed unless compounded ET is used in other calculations.
Test Quality
✅ Excellent additions:
- populates per-turn entries array in order — validates chronological preservation
- computes deltaET for each entry — ensures per-turn ET calculation
- sum of entry deltaET equals totalEffectiveTokens — critical invariant check
- compounded ET equals sum of per-turn delta ET values — validates rendering math
Verdict
No issues found. The code is production-ready with strong test coverage and clear intent. Nice work on the progressive disclosure improvement for token usage visibility.
🔎 Code quality review by PR Code Quality Reviewer · ● 1.6M
|
✅ Design Decision Gate 🏗️ completed the design decision gate check. No ADR enforcement needed: PR #33628 does not have the 'implementation' label and has 0 new lines of code in default business logic directories (threshold: 100). |
|
✅ PR Code Quality Reviewer completed the code quality review. |
There was a problem hiding this comment.
Skills-Based Review 🧠
Applied /tdd and /zoom-out based on this feature enhancement that changes token usage rendering from aggregated-by-model to per-turn chronological display.
Key Themes
/tdd findings:
- Missing edge case tests: Empty entries array, single entry (no compounding), large entry counts
- Test-implementation gap: The "compounded ET equals sum" test doesn't verify the actual rendered markdown values
- Magic numbers: Floating-point precision
5intoBeCloseTo()is undocumented
/zoom-out findings:
- Separation of concerns: Compounding arithmetic happens during rendering rather than in data preparation
- Visual consistency: Empty cells in the Totals row break table scannability
- JSDoc completeness: Type annotation for
entries[]could be more complete
Positive Highlights ✅
- Backward compatibility: The
byModelfield is preserved for existing consumers - Clear naming:
deltaETvsETmakes the distinction between per-turn and compounded values immediately obvious - Chronological ordering: This change unlocks per-turn analysis — excellent product decision
- Test names are descriptive: Tests read as specifications ("populates per-turn entries array in order")
Verdict
This is a solid enhancement with good test coverage. The comments above suggest improvements to edge case handling and separation of concerns, but none are blocking. The chronological rendering is a meaningful UX improvement for understanding token consumption patterns.
The issues raised are about making the code more maintainable and resilient to future changes, not about correctness of the current implementation.
Skills applied: /tdd emphasizes test-driven development and comprehensive edge case coverage; /zoom-out provides architectural perspective on code organization and maintainability.
🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · ● 4.7M
| const summary = parseTokenUsageJsonl(lines.join("\n")); | ||
| const sumDelta = summary.entries.reduce((acc, e) => acc + e.deltaET, 0); | ||
| expect(sumDelta).toBeCloseTo(summary.totalEffectiveTokens, 5); | ||
| }); |
There was a problem hiding this comment.
[/tdd] Missing edge case test for empty entries array. The new chronological rendering should handle zero entries gracefully.
Consider adding:
test("handles empty entries array", () => {
const summary = { totalRequests: 0, entries: [] };
const md = generateTokenUsageSummary(summary);
expect(md).toBe("");
});
test("renders single entry without compounding confusion", () => {
const content = JSON.stringify({ model: "m", input_tokens: 100, output_tokens: 50, cache_read_tokens: 0, cache_write_tokens: 0, duration_ms: 100 });
const summary = parseTokenUsageJsonl(content);
const md = generateTokenUsageSummary(summary);
// First and only row: ΔET should equal ET
expect(summary.entries[0].deltaET).toBe(summary.totalEffectiveTokens);
});Edge cases are where compounding logic breaks in practice.
| expect(summary.entries[0].deltaET).toBeGreaterThan(0); | ||
| }); | ||
|
|
||
| test("sum of entry deltaET equals totalEffectiveTokens", () => { |
There was a problem hiding this comment.
[/tdd] The precision value 5 in toBeCloseTo() should be documented or derived from the actual ET calculation precision.
Why 5 decimal places? Is this the known precision of computeEffectiveTokens()? Consider:
// Document the precision contract
const ET_PRECISION = 5; // matches computeEffectiveTokens rounding
expect(sumDelta).toBeCloseTo(summary.totalEffectiveTokens, ET_PRECISION);Magic numbers in tests reduce future maintainability — if the ET calculation changes precision, this test should fail clearly.
| `| ${i + 1} | ${entry.model} | ${entry.inputTokens.toLocaleString()} | ${entry.outputTokens.toLocaleString()} | ${entry.cacheReadTokens.toLocaleString()} | ${entry.cacheWriteTokens.toLocaleString()} | ${formatET(deltaET)} | ${formatET(compoundedET)} | ${formatDurationMs(entry.durationMs)} |` | ||
| ); | ||
| } | ||
|
|
There was a problem hiding this comment.
[/zoom-out] The Totals row has empty cells (| **Total** | |) which breaks the visual rhythm of the table.
Consider using semantic placeholders for clarity:
lines.push(
`| **Total** | — | **${summary.totalInputTokens.toLocaleString()}** | ... | — | **${totalET}** | **${formatDurationMs(summary.totalDurationMs)}** |`
);Alternatively, use Σ (sum symbol) in the ΔET column to indicate "sum of deltas = total ET". Empty cells reduce scannability when users are trying to understand the table structure.
| expect(md).toContain("●"); | ||
| expect(md).not.toContain("Cache efficiency"); | ||
| }); | ||
|
|
There was a problem hiding this comment.
[/tdd] This test name says "compounded ET equals sum of per-turn delta ET values" but the test body only checks toBeGreaterThan(0) — it doesn't actually verify the compounding arithmetic.
Consider parsing the markdown and verifying the actual ET column progression:
test("compounded ET equals sum of per-turn delta ET values", () => {
const lines = [
JSON.stringify({ model: "m", input_tokens: 100, output_tokens: 50, cache_read_tokens: 0, cache_write_tokens: 0, duration_ms: 100 }),
JSON.stringify({ model: "m", input_tokens: 200, output_tokens: 100, cache_read_tokens: 0, cache_write_tokens: 0, duration_ms: 200 }),
];
const summary = parseTokenUsageJsonl(lines.join("\n"));
// Verify the arithmetic: row 2 ET = row 1 deltaET + row 2 deltaET
const expectedET1 = Math.round(summary.entries[0].deltaET);
const expectedET2 = Math.round(summary.entries[0].deltaET + summary.entries[1].deltaET);
const md = generateTokenUsageSummary(summary);
expect(md).toContain(formatET(expectedET1)); // First row ET
expect(md).toContain(formatET(expectedET2)); // Second row compounded ET
});The test should verify what the user sees in the table, not just internal state.
| const et = formatET(Math.round(usage.effectiveTokens || 0)); | ||
| lines.push("| # | Model | Input | Output | Cache Read | Cache Write | ΔET | ET | Duration |"); | ||
| lines.push("|--:|-------|------:|-------:|-----------:|------------:|----:|---:|---------:|"); | ||
|
|
There was a problem hiding this comment.
[/zoom-out] The compoundedET variable is mutated during rendering, coupling display logic with arithmetic. While this works (markdown is generated once), it violates separation of concerns.
Consider pre-computing compounded values in the data structure:
// After computing deltaET for each entry:
let runningTotal = 0;
for (const entry of summary.entries) {
runningTotal += Math.round(entry.deltaET || 0);
entry.compoundedET = runningTotal;
}
// Then rendering becomes pure:
for (let i = 0; i < entries.length; i++) {
const entry = entries[i];
lines.push(
`| ${i + 1} | ${entry.model} | ... | ${formatET(entry.deltaET)} | ${formatET(entry.compoundedET)} | ...`
);
}This makes the rendering logic stateless and easier to test independently.
|
🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅ |
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
✅ Test Quality Sentinel: 50/100. All tests verify behavioral contracts (excellent!), but attention needed: (1) Test inflation at 5:1 ratio exceeds 2:1 threshold, (2) Zero error/edge case coverage across all 6 new tests. Consider adding tests for malformed inputs, missing fields, and boundary conditions to improve robustness.
There was a problem hiding this comment.
Code Quality Review Summary
This PR introduces a valuable improvement to token usage reporting by showing per-turn progression with ΔET and compounded ET columns. The implementation is well-tested and follows good coding practices.
The existing review comments (7 total) comprehensively cover all significant issues I identified during my review. Rather than duplicate feedback, I recommend addressing those comments, particularly:
Critical (Must Fix)
- Rounding drift (line 156 in parse_mcp_gateway_log.cjs) — Accumulate unrounded deltaET values and round only for display to maintain consistency with totalEffectiveTokens
Important (Should Fix)
- Weak test assertion (line 1477) — Test doesn't verify actual compounded ET arithmetic
- Missing edge cases (line 1399) — Add tests for empty entries and single entry scenarios
- Separation of concerns (line 148) — Pre-compute compounded values in data structure rather than mutating during rendering
Nice to Have
- Magic number documentation (line 1391) — Document the precision contract for toBeCloseTo()
- Table formatting (line 159) — Use semantic placeholders instead of empty cells
Overall: The PR is well-executed. Once the rounding drift issue is addressed, this will be a solid addition to the codebase.
🔎 Code quality review by PR Code Quality Reviewer · ● 2.4M
|
🧪 Test Quality Sentinel completed test quality analysis. |
There was a problem hiding this comment.
Skills-Based Review 🧠
Applied /tdd and /zoom-out based on the feature addition with table rendering refactor in this PR.
Key Themes
Test Coverage (/tdd):
- ✅ Good foundation: tests added for new
entries[]array, chronological ordering, and delta ET computation ⚠️ Weak assertion in "compounded ET equals sum" test — validates presence but not correctness⚠️ Missing edge case: emptyentries[]array (common when MCP gateway log is absent)
Architecture (/zoom-out):
- ✅ Backward compatibility preserved:
byModelaggregates still populated - ✅ Clear JSDoc updates explaining the new chronological rendering
⚠️ Performance: separate loop fordeltaETcomputation could be merged into initial parsing⚠️ Naming:compoundedETvs.totalEffectiveTokens— same concept, different names⚠️ UX: empty Model cell in Total row may appear as a rendering bug
Positive Highlights
- ✅ Excellent user value: per-turn progression is much more useful than aggregated-by-model view
- ✅ Clean separation: parsing logic remains distinct from rendering logic
- ✅ Test names are descriptive and read as specifications
- ✅
toBeCloseToused correctly for floating-point comparisons
Verdict
No blocking issues — this is solid work. The suggestions are optimizations and test hardening for future maintainability. Approving with optional improvements.
🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · ● 4.7M
| const md = generateTokenUsageSummary(summary); | ||
| // Total row and last data row ET should both show the overall total ET | ||
| expect(md).toContain("**Total**"); | ||
| // The last entry's compounded ET equals totalEffectiveTokens so must appear in the table |
There was a problem hiding this comment.
[/tdd] This test assertion is too weak — it only verifies totalRounded > 0 but doesn't validate that the compounded ET in the table actually matches expectations.
Consider asserting the actual compounded values:
const delta1 = summary.entries[0].deltaET;
const delta2 = summary.entries[1].deltaET;
const expectedCompounded = Math.round(delta1 + delta2);
expect(expectedCompounded).toBe(Math.round(summary.totalEffectiveTokens));
// Also verify the compounded value appears in the markdown output
const formatETExpected = formatET(expectedCompounded);
expect(md).toContain(formatETExpected);Without this, a bug in the compounding logic could pass the test.
| } | ||
|
|
||
| const totalET = formatET(Math.round(summary.totalEffectiveTokens || 0)); | ||
| lines.push( |
There was a problem hiding this comment.
[/zoom-out] The empty cell in the Total row for the Model column may confuse users.
Consider either:
- Adding explicit text:
| **Total** | **(all models)** | ... - Or a comment explaining why it's empty
The | **Total** | | pattern creates a visually ambiguous table cell that might appear as a rendering bug.
| @@ -1378,7 +1408,7 @@ not-json | |||
| test("renders header and table columns", () => { | |||
There was a problem hiding this comment.
[/tdd] Missing edge case test: what happens when summary.entries is empty?
Consider adding:
test("handles empty entries array gracefully", () => {
const summary = parseTokenUsageJsonl("");
const md = generateTokenUsageSummary(summary);
expect(md).toBe(""); // Should return empty string for no data
});Empty data is a common real-world scenario (e.g., MCP gateway log doesn't exist or contains no token usage).
| lines.push("| # | Model | Input | Output | Cache Read | Cache Write | ΔET | ET | Duration |"); | ||
| lines.push("|--:|-------|------:|-------:|-----------:|------------:|----:|---:|---------:|"); | ||
|
|
||
| const entries = summary.entries || []; |
There was a problem hiding this comment.
[/zoom-out] Naming inconsistency: the local variable compoundedET (line 149) represents the same concept as summary.totalEffectiveTokens (line 123), just accumulated progressively.
For clarity, consider renaming to runningET or cumulativeET to emphasize that it's a running total, not a final "compounded" value. This makes the relationship between the loop variable and the final totalEffectiveTokens more obvious.
| } | ||
| summary.totalEffectiveTokens = totalEffectiveTokens; | ||
|
|
||
| // Compute per-turn delta ET |
There was a problem hiding this comment.
[/zoom-out] Performance opportunity: the deltaET computation could be done during the initial loop (line 106) rather than in a separate pass.
// Inside the main loop after line 106:
const deltaET = computeEffectiveTokens(model, inputTokens, outputTokens, cacheReadTokens, cacheWriteTokens);
summary.entries.push({ model, inputTokens, outputTokens, cacheReadTokens, cacheWriteTokens, durationMs, deltaET });This eliminates a full second iteration over entries[], which could matter for long-running workflows with many API calls.
🧪 Test Quality Sentinel ReportTest Quality Score: 43/100🔶 Needs improvement
Test Classification DetailsView detailed per-test classification (6 tests)
Flagged Tests — Requires Review
|
There was a problem hiding this comment.
✅ Test Quality Sentinel: 43/100. Test quality is acceptable — 17% of new tests are implementation tests (threshold: 30%). Strong behavioral coverage (83% design tests) with good verification of chronological ordering, deltaET calculations, and mathematical invariants. Consider adding error case tests to improve resilience.
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Fixed in the latest commit. The |
The Token Usage step summary showed a single aggregated row per model, hiding per-turn progression. This replaces it with one row per API call in chronological order, showing both the per-turn delta ET and the running compounded ET.
Changes
parse_mcp_gateway_log.cjsparseTokenUsageJsonl— addsentries[]to the returned summary; each entry recordsmodel, token counts,durationMs, anddeltaET(per-turn ET viacomputeEffectiveTokens).byModelaggregates preserved for backward compat.generateTokenUsageSummary— new table schema:parse_mcp_gateway_log.test.cjsentries[]population order,deltaETcomputation, and sum-of-deltas ==totalEffectiveTokens