feat(token-usage): per-turn rows with ΔET and compounded ET in step summary #33628

Merged
pelikhan merged 2 commits into main from copilot/improve-token-usage-rendering on May 20, 2026

Copilot AI commented May 20, 2026

The Token Usage step summary showed a single aggregated row per model, hiding per-turn progression. This replaces it with one row per API call in chronological order, showing both the per-turn delta ET and the running compounded ET.

Changes

parse_mcp_gateway_log.cjs

  • parseTokenUsageJsonl — adds entries[] to the returned summary; each entry records model, token counts, durationMs, and deltaET (per-turn ET via computeEffectiveTokens). byModel aggregates preserved for backward compat.

  • generateTokenUsageSummary — new table schema:

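    | # | Model | Input | Output | Cache Read | Cache Write | ΔET | ET | Duration |
    |--:|-------|------:|-------:|-----------:|------------:|----:|---:|---------:|
    | 1 | gpt-… | 23,192 | 285 | 0 | 0 | 1.2K | 1.2K | 5.6s |
    | 2 | gpt-… | 23,533 | 123 | 0 | 0 | 625 | 1.8K | 3.4s |
    | Total | | | | | | | sum ET | dur |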
    • ΔET — effective tokens for that turn
    • ET — compounded running total through that turn
    • Totals row, ● footer, and ET weight disclosure retained
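As a rough illustration of the two new columns, the sketch below walks a list of per-turn entries and emits one row per turn with ΔET and a running ET. It is not the PR's code: `renderTurnRows` and `formatCount` are hypothetical names, the table is cut down to four columns, and `formatCount` only stands in for the project's `formatET`, whose exact output format is not shown here.

```javascript
// Hypothetical stand-in for the project's formatET: rounds and adds
// thousands separators, nothing more.
function formatCount(n) {
  return Math.round(n).toLocaleString("en-US");
}

// Build markdown lines (header, separator, one row per turn) from entries
// that already carry a per-turn deltaET.
function renderTurnRows(entries) {
  const lines = [
    "| # | Model | ΔET | ET |",
    "|--:|-------|----:|---:|",
  ];
  let runningET = 0; // compounded ET through the current turn
  entries.forEach((entry, i) => {
    runningET += entry.deltaET;
    lines.push(`| ${i + 1} | ${entry.model} | ${formatCount(entry.deltaET)} | ${formatCount(runningET)} |`);
  });
  return lines;
}

// Sample values are invented for the example.
const rows = renderTurnRows([
  { model: "model-a", deltaET: 1200 },
  { model: "model-a", deltaET: 625 },
]);
console.log(rows.join("\n"));
// Last row printed: | 2 | model-a | 625 | 1,825 |
```

The ET cell of the final row is the sum of every ΔET above it, which is the property the totals row relies on.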

parse_mcp_gateway_log.test.cjs

  • Updated header assertion, chronological-order test (replaces sort-by-tokens test), and ΔET/ET column checks
  • Added tests: entries[] population order, deltaET computation, and sum-of-deltas == totalEffectiveTokens

… step summary

- parseTokenUsageJsonl: populate entries[] array with per-turn deltaET
- generateTokenUsageSummary: show one row per turn (# | Model | tokens | ΔET | ET | Duration) in chronological order, plus aggregate totals row
- Update/add tests for entries array, deltaET computation, new header, and chronological ordering

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI requested a review from pelikhan May 20, 2026 20:45
@pelikhan pelikhan marked this pull request as ready for review May 20, 2026 21:34
Copilot AI review requested due to automatic review settings May 20, 2026 21:35

github-actions Bot commented May 20, 2026

Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #33628 does not have the 'implementation' label and has 0 new lines of code in business logic directories (threshold: 100).

github-actions Bot commented May 20, 2026

🧪 Test Quality Sentinel completed test quality analysis.

github-actions Bot commented May 20, 2026

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

github-actions Bot commented May 20, 2026

PR Code Quality Reviewer completed the code quality review.

Copilot AI left a comment

Pull request overview

Updates the Token Usage step summary to show one row per API call (turn) in chronological order, including per-turn effective tokens (ΔET) and a running compounded ET total, while keeping model-level aggregates for compatibility.

Changes:

  • Extend parseTokenUsageJsonl to record per-turn entries[] including durationMs and computed deltaET.
  • Rework generateTokenUsageSummary to render a per-turn table with ΔET and running ET, plus a totals row and existing footer disclosures.
  • Update/add tests to validate entries[] ordering and presence of the new columns.
Show a summary per file
| File | Description |
|------|-------------|
| actions/setup/js/parse_mcp_gateway_log.cjs | Adds per-turn entries[] and updates token usage markdown summary to per-turn rows with ΔET and running ET. |
| actions/setup/js/parse_mcp_gateway_log.test.cjs | Adjusts tests for the new table schema and adds coverage for entries[] and per-turn deltaET. |
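
A minimal sketch of how per-turn `entries[]` could be collected while parsing, assuming the JSONL field names used in this PR's tests (`input_tokens`, `output_tokens`, and so on). `parseUsageSketch` and `effectiveTokensSketch` are hypothetical names, and the weights are invented for illustration; the real `computeEffectiveTokens` is not reproduced in this thread.

```javascript
// Hypothetical weights, for illustration only. The PR's computeEffectiveTokens
// applies its own weighting, which this thread does not show.
function effectiveTokensSketch(e) {
  return e.inputTokens + 4 * e.outputTokens + 0.1 * e.cacheReadTokens + e.cacheWriteTokens;
}

// Parse JSONL content into chronological per-turn entries plus a total.
function parseUsageSketch(content) {
  const summary = { entries: [], totalEffectiveTokens: 0 };
  for (const line of content.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed) continue;
    let record;
    try {
      record = JSON.parse(trimmed);
    } catch {
      continue; // skip lines that are not valid JSON
    }
    if (record === null || typeof record !== "object") continue;
    const entry = {
      model: record.model || "unknown",
      inputTokens: record.input_tokens || 0,
      outputTokens: record.output_tokens || 0,
      cacheReadTokens: record.cache_read_tokens || 0,
      cacheWriteTokens: record.cache_write_tokens || 0,
      durationMs: record.duration_ms || 0,
    };
    entry.deltaET = effectiveTokensSketch(entry);
    parsedPush(summary, entry);
  }
  return summary;
}

// Append in file order so rows stay chronological, and keep the total in step.
function parsedPush(summary, entry) {
  summary.entries.push(entry);
  summary.totalEffectiveTokens += entry.deltaET;
}

const sample = [
  JSON.stringify({ model: "m", input_tokens: 100, output_tokens: 50 }),
  "not-json",
  JSON.stringify({ model: "m", input_tokens: 200, output_tokens: 100 }),
].join("\n");
const parsed = parseUsageSketch(sample);
console.log(parsed.entries.map(e => e.deltaET), parsed.totalEffectiveTokens);
// With the made-up weights above this prints: [ 300, 600 ] 900
```

Because the total is accumulated from the same per-entry values, the sum of `deltaET` over `entries` matches `totalEffectiveTokens` by construction.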

Copilot's findings

  • Files reviewed: 2/2 changed files
  • Comments generated: 2

Comment on lines +149 to +156
const entries = summary.entries || [];
let compoundedET = 0;
for (let i = 0; i < entries.length; i++) {
const entry = entries[i];
const deltaET = Math.round(entry.deltaET || 0);
compoundedET += deltaET;
lines.push(
`| ${model} | ${usage.inputTokens.toLocaleString()} | ${usage.outputTokens.toLocaleString()} | ${usage.cacheReadTokens.toLocaleString()} | ${usage.cacheWriteTokens.toLocaleString()} | ${et} | ${usage.requests} | ${formatDurationMs(usage.durationMs)} |`
`| ${i + 1} | ${entry.model} | ${entry.inputTokens.toLocaleString()} | ${entry.outputTokens.toLocaleString()} | ${entry.cacheReadTokens.toLocaleString()} | ${entry.cacheWriteTokens.toLocaleString()} | ${formatET(deltaET)} | ${formatET(compoundedET)} | ${formatDurationMs(entry.durationMs)} |`
expect(md).toContain("**Total**");
// The last entry's compounded ET equals totalEffectiveTokens so must appear in the table
const totalRounded = Math.round(summary.totalEffectiveTokens);
expect(totalRounded).toBeGreaterThan(0);
@github-actions github-actions Bot mentioned this pull request May 20, 2026

@github-actions github-actions Bot left a comment

Code Quality Review

Approved — Clean refactoring with excellent test coverage

Summary

This PR successfully transforms token usage reporting from aggregated model summaries to chronological per-turn rows showing both delta (ΔET) and compounded effective tokens (ET). The implementation is well-structured and maintains backward compatibility.

What Was Reviewed

  • Logic correctness: ✅ Entry ordering, deltaET calculation, compounding logic all correct
  • Test coverage: ✅ Comprehensive tests for new entries array, deltaET computation, and sum validation
  • Backward compatibility: ✅ byModel aggregates preserved
  • Code clarity: ✅ JSDoc updated, variable names clear

Observations (Non-blocking)

Minor architectural note

The compoundedET running total is calculated during rendering in generateTokenUsageSummary(). This works correctly but couples calculation with view formatting. Consider computing this during parsing if the compounded sequence becomes a first-class concept used elsewhere.

This is fine for display-only use. No action needed unless compounded ET is used in other calculations.

Test Quality

✅ Excellent additions:

  • populates per-turn entries array in order — validates chronological preservation
  • computes deltaET for each entry — ensures per-turn ET calculation
  • sum of entry deltaET equals totalEffectiveTokens — critical invariant check
  • compounded ET equals sum of per-turn delta ET values — validates rendering math

Verdict

No issues found. The code is production-ready with strong test coverage and clear intent. Nice work on the progressive disclosure improvement for token usage visibility.

🔎 Code quality review by PR Code Quality Reviewer · ● 1.6M

github-actions Bot commented May 20, 2026

Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #33628 does not have the 'implementation' label and has 0 new lines of code in default business logic directories (threshold: 100).

github-actions Bot commented May 20, 2026

PR Code Quality Reviewer completed the code quality review.

@github-actions github-actions Bot left a comment

Skills-Based Review 🧠

Applied /tdd and /zoom-out based on this feature enhancement that changes token usage rendering from aggregated-by-model to per-turn chronological display.

Key Themes

/tdd findings:

  • Missing edge case tests: Empty entries array, single entry (no compounding), large entry counts
  • Test-implementation gap: The "compounded ET equals sum" test doesn't verify the actual rendered markdown values
  • Magic numbers: Floating-point precision 5 in toBeCloseTo() is undocumented

/zoom-out findings:

  • Separation of concerns: Compounding arithmetic happens during rendering rather than in data preparation
  • Visual consistency: Empty cells in the Totals row break table scannability
  • JSDoc completeness: Type annotation for entries[] could be more complete

Positive Highlights ✅

  • Backward compatibility: The byModel field is preserved for existing consumers
  • Clear naming: deltaET vs ET makes the distinction between per-turn and compounded values immediately obvious
  • Chronological ordering: This change unlocks per-turn analysis — excellent product decision
  • Test names are descriptive: Tests read as specifications ("populates per-turn entries array in order")

Verdict

This is a solid enhancement with good test coverage. The comments above suggest improvements to edge case handling and separation of concerns, but none are blocking. The chronological rendering is a meaningful UX improvement for understanding token consumption patterns.

The issues raised are about making the code more maintainable and resilient to future changes, not about correctness of the current implementation.


Skills applied: /tdd emphasizes test-driven development and comprehensive edge case coverage; /zoom-out provides architectural perspective on code organization and maintainability.

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · ● 4.7M

const summary = parseTokenUsageJsonl(lines.join("\n"));
const sumDelta = summary.entries.reduce((acc, e) => acc + e.deltaET, 0);
expect(sumDelta).toBeCloseTo(summary.totalEffectiveTokens, 5);
});

[/tdd] Missing edge case test for empty entries array. The new chronological rendering should handle zero entries gracefully.

Consider adding:

test("handles empty entries array", () => {
  const summary = { totalRequests: 0, entries: [] };
  const md = generateTokenUsageSummary(summary);
  expect(md).toBe("");
});

test("renders single entry without compounding confusion", () => {
  const content = JSON.stringify({ model: "m", input_tokens: 100, output_tokens: 50, cache_read_tokens: 0, cache_write_tokens: 0, duration_ms: 100 });
  const summary = parseTokenUsageJsonl(content);
  const md = generateTokenUsageSummary(summary);
  // First and only row: ΔET should equal ET
  expect(summary.entries[0].deltaET).toBe(summary.totalEffectiveTokens);
});

Edge cases are where compounding logic breaks in practice.

expect(summary.entries[0].deltaET).toBeGreaterThan(0);
});

test("sum of entry deltaET equals totalEffectiveTokens", () => {

[/tdd] The precision value 5 in toBeCloseTo() should be documented or derived from the actual ET calculation precision.

Why 5 decimal places? Is this the known precision of computeEffectiveTokens()? Consider:

// Document the precision contract
const ET_PRECISION = 5; // matches computeEffectiveTokens rounding
expect(sumDelta).toBeCloseTo(summary.totalEffectiveTokens, ET_PRECISION);

Magic numbers in tests reduce future maintainability — if the ET calculation changes precision, this test should fail clearly.
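
For context on what the second argument means: in Jest, and in Vitest which mirrors it, `toBeCloseTo(expected, numDigits)` passes when the absolute difference is below `10 ** -numDigits / 2`. The helper below is a hypothetical re-implementation of that check for illustration, not the matcher itself.

```javascript
// Hypothetical re-implementation of the toBeCloseTo tolerance check.
// numDigits defaults to 2, as it does in the matcher.
function isCloseTo(received, expected, numDigits = 2) {
  return Math.abs(expected - received) < 10 ** -numDigits / 2;
}

// Floating-point noise from addition is far below the 5e-6 tolerance.
const withinTolerance = isCloseTo(0.1 + 0.2, 0.3, 5);

// A difference of about 1e-5 exceeds the 5e-6 tolerance.
const outsideTolerance = isCloseTo(1.00001, 1, 5);

console.log(withinTolerance, outsideTolerance); // true false
```

So a precision of 5 tolerates accumulated floating-point error from summing per-turn values, while still failing on any difference of a whole effective token.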

`| ${i + 1} | ${entry.model} | ${entry.inputTokens.toLocaleString()} | ${entry.outputTokens.toLocaleString()} | ${entry.cacheReadTokens.toLocaleString()} | ${entry.cacheWriteTokens.toLocaleString()} | ${formatET(deltaET)} | ${formatET(compoundedET)} | ${formatDurationMs(entry.durationMs)} |`
);
}

[/zoom-out] The Totals row has empty cells (| **Total** | |) which breaks the visual rhythm of the table.

Consider using semantic placeholders for clarity:

lines.push(
  `| **Total** | — | **${summary.totalInputTokens.toLocaleString()}** | ... | — | **${totalET}** | **${formatDurationMs(summary.totalDurationMs)}** |`
);

Alternatively, use Σ (sum symbol) in the ΔET column to indicate "sum of deltas = total ET". Empty cells reduce scannability when users are trying to understand the table structure.

expect(md).toContain("●");
expect(md).not.toContain("Cache efficiency");
});

[/tdd] This test name says "compounded ET equals sum of per-turn delta ET values" but the test body only checks toBeGreaterThan(0) — it doesn't actually verify the compounding arithmetic.

Consider parsing the markdown and verifying the actual ET column progression:

test("compounded ET equals sum of per-turn delta ET values", () => {
  const lines = [
    JSON.stringify({ model: "m", input_tokens: 100, output_tokens: 50, cache_read_tokens: 0, cache_write_tokens: 0, duration_ms: 100 }),
    JSON.stringify({ model: "m", input_tokens: 200, output_tokens: 100, cache_read_tokens: 0, cache_write_tokens: 0, duration_ms: 200 }),
  ];
  const summary = parseTokenUsageJsonl(lines.join("\n"));
  
  // Verify the arithmetic: row 2 ET = row 1 deltaET + row 2 deltaET
  const expectedET1 = Math.round(summary.entries[0].deltaET);
  const expectedET2 = Math.round(summary.entries[0].deltaET + summary.entries[1].deltaET);
  
  const md = generateTokenUsageSummary(summary);
  expect(md).toContain(formatET(expectedET1)); // First row ET
  expect(md).toContain(formatET(expectedET2)); // Second row compounded ET
});

The test should verify what the user sees in the table, not just internal state.

const et = formatET(Math.round(usage.effectiveTokens || 0));
lines.push("| # | Model | Input | Output | Cache Read | Cache Write | ΔET | ET | Duration |");
lines.push("|--:|-------|------:|-------:|-----------:|------------:|----:|---:|---------:|");

[/zoom-out] The compoundedET variable is mutated during rendering, coupling display logic with arithmetic. While this works (markdown is generated once), it violates separation of concerns.

Consider pre-computing compounded values in the data structure:

// After computing deltaET for each entry:
let runningTotal = 0;
for (const entry of summary.entries) {
  runningTotal += Math.round(entry.deltaET || 0);
  entry.compoundedET = runningTotal;
}

// Then rendering becomes pure:
for (let i = 0; i < entries.length; i++) {
  const entry = entries[i];
  lines.push(
    `| ${i + 1} | ${entry.model} | ... | ${formatET(entry.deltaET)} | ${formatET(entry.compoundedET)} | ...`
  );
}

This makes the rendering logic stateless and easier to test independently.

github-actions Bot commented May 20, 2026

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

@github-actions github-actions Bot left a comment

✅ Test Quality Sentinel: 50/100. All tests verify behavioral contracts (excellent!), but attention needed: (1) Test inflation at 5:1 ratio exceeds 2:1 threshold, (2) Zero error/edge case coverage across all 6 new tests. Consider adding tests for malformed inputs, missing fields, and boundary conditions to improve robustness.

@github-actions github-actions Bot left a comment

Code Quality Review Summary

This PR introduces a valuable improvement to token usage reporting by showing per-turn progression with ΔET and compounded ET columns. The implementation is well-tested and follows good coding practices.

The existing review comments (7 total) comprehensively cover all significant issues I identified during my review. Rather than duplicate feedback, I recommend addressing those comments, particularly:

Critical (Must Fix)

  • Rounding drift (line 156 in parse_mcp_gateway_log.cjs) — Accumulate unrounded deltaET values and round only for display to maintain consistency with totalEffectiveTokens

Important (Should Fix)

  • Weak test assertion (line 1477) — Test doesn't verify actual compounded ET arithmetic
  • Missing edge cases (line 1399) — Add tests for empty entries and single entry scenarios
  • Separation of concerns (line 148) — Pre-compute compounded values in data structure rather than mutating during rendering

Nice to Have

  • Magic number documentation (line 1391) — Document the precision contract for toBeCloseTo()
  • Table formatting (line 159) — Use semantic placeholders instead of empty cells

Overall: The PR is well-executed. Once the rounding drift issue is addressed, this will be a solid addition to the codebase.
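
The rounding drift flagged above can be reproduced in isolation. The sketch below compares rounding each ΔET before accumulating, as the rendering loop in this PR does, with accumulating the raw values and rounding only for display. The function names and the fractional sample values are made up for illustration.

```javascript
// Round each delta first, then accumulate (the pattern under review).
function runningRoundedPerTurn(deltas) {
  let running = 0;
  return deltas.map(d => (running += Math.round(d)));
}

// Accumulate raw deltas; round only the value that is displayed.
function runningRoundedForDisplay(deltas) {
  let running = 0;
  return deltas.map(d => Math.round((running += d)));
}

// Hypothetical fractional per-turn ET values.
const deltas = [0.4, 0.4, 0.4, 0.4, 0.4];

const perTurn = runningRoundedPerTurn(deltas);
const forDisplay = runningRoundedForDisplay(deltas);
const roundedTotal = Math.round(deltas.reduce((sum, d) => sum + d, 0));

console.log(perTurn);      // [ 0, 0, 0, 0, 0 ]
console.log(forDisplay);   // [ 0, 1, 1, 2, 2 ]
console.log(roundedTotal); // 2
```

Only the second variant ends on the same number as the rounded total. The trade-off is that the displayed ΔET cells no longer have to add up exactly to the displayed ET cell, since each is rounded independently.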

🔎 Code quality review by PR Code Quality Reviewer · ● 2.4M

github-actions Bot commented May 20, 2026

🧪 Test Quality Sentinel completed test quality analysis.

@github-actions github-actions Bot left a comment

Skills-Based Review 🧠

Applied /tdd and /zoom-out based on the feature addition with table rendering refactor in this PR.

Key Themes

Test Coverage (/tdd):

  • ✅ Good foundation: tests added for new entries[] array, chronological ordering, and delta ET computation
  • ⚠️ Weak assertion in "compounded ET equals sum" test — validates presence but not correctness
  • ⚠️ Missing edge case: empty entries[] array (common when MCP gateway log is absent)

Architecture (/zoom-out):

  • ✅ Backward compatibility preserved: byModel aggregates still populated
  • ✅ Clear JSDoc updates explaining the new chronological rendering
  • ⚠️ Performance: separate loop for deltaET computation could be merged into initial parsing
  • ⚠️ Naming: compoundedET vs. totalEffectiveTokens — same concept, different names
  • ⚠️ UX: empty Model cell in Total row may appear as a rendering bug

Positive Highlights

  • ✅ Excellent user value: per-turn progression is much more useful than aggregated-by-model view
  • ✅ Clean separation: parsing logic remains distinct from rendering logic
  • ✅ Test names are descriptive and read as specifications
  • toBeCloseTo used correctly for floating-point comparisons

Verdict

No blocking issues — this is solid work. The suggestions are optimizations and test hardening for future maintainability. Approving with optional improvements.

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · ● 4.7M

const md = generateTokenUsageSummary(summary);
// Total row and last data row ET should both show the overall total ET
expect(md).toContain("**Total**");
// The last entry's compounded ET equals totalEffectiveTokens so must appear in the table

[/tdd] This test assertion is too weak — it only verifies totalRounded > 0 but doesn't validate that the compounded ET in the table actually matches expectations.

Consider asserting the actual compounded values:

const delta1 = summary.entries[0].deltaET;
const delta2 = summary.entries[1].deltaET;
const expectedCompounded = Math.round(delta1 + delta2);
expect(expectedCompounded).toBe(Math.round(summary.totalEffectiveTokens));
// Also verify the compounded value appears in the markdown output
const formatETExpected = formatET(expectedCompounded);
expect(md).toContain(formatETExpected);

Without this, a bug in the compounding logic could pass the test.

}

const totalET = formatET(Math.round(summary.totalEffectiveTokens || 0));
lines.push(

[/zoom-out] The empty cell in the Total row for the Model column may confuse users.

Consider either:

  1. Adding explicit text: | **Total** | **(all models)** | ...
  2. Or a comment explaining why it's empty

The | **Total** | | pattern creates a visually ambiguous table cell that might appear as a rendering bug.

@@ -1378,7 +1408,7 @@ not-json
test("renders header and table columns", () => {

[/tdd] Missing edge case test: what happens when summary.entries is empty?

Consider adding:

test("handles empty entries array gracefully", () => {
  const summary = parseTokenUsageJsonl("");
  const md = generateTokenUsageSummary(summary);
  expect(md).toBe(""); // Should return empty string for no data
});

Empty data is a common real-world scenario (e.g., MCP gateway log doesn't exist or contains no token usage).

lines.push("| # | Model | Input | Output | Cache Read | Cache Write | ΔET | ET | Duration |");
lines.push("|--:|-------|------:|-------:|-----------:|------------:|----:|---:|---------:|");

const entries = summary.entries || [];

[/zoom-out] Naming inconsistency: the local variable compoundedET (line 149) represents the same concept as summary.totalEffectiveTokens (line 123), just accumulated progressively.

For clarity, consider renaming to runningET or cumulativeET to emphasize that it's a running total, not a final "compounded" value. This makes the relationship between the loop variable and the final totalEffectiveTokens more obvious.

}
summary.totalEffectiveTokens = totalEffectiveTokens;

// Compute per-turn delta ET

[/zoom-out] Performance opportunity: the deltaET computation could be done during the initial loop (line 106) rather than in a separate pass.

// Inside the main loop after line 106:
const deltaET = computeEffectiveTokens(model, inputTokens, outputTokens, cacheReadTokens, cacheWriteTokens);
summary.entries.push({ model, inputTokens, outputTokens, cacheReadTokens, cacheWriteTokens, durationMs, deltaET });

This eliminates a full second iteration over entries[], which could matter for long-running workflows with many API calls.

@github-actions

🧪 Test Quality Sentinel Report

Test Quality Score: 43/100

🔶 Needs improvement

| Metric | Value |
|--------|-------|
| New/modified tests analyzed | 6 |
| ✅ Design tests (behavioral contracts) | 5 (83%) |
| ⚠️ Implementation tests (low value) | 1 (17%) |
| Tests with error/edge cases | 0 (0%) |
| Duplicate test clusters | 0 |
| Test inflation detected | Yes (5:1 ratio) |
| 🚨 Coding-guideline violations | 0 |

Test Classification Details

View detailed per-test classification (6 tests)
| Test | File | Classification | Issues Detected |
|------|------|----------------|-----------------|
| populates per-turn entries array in order | actions/setup/js/parse_mcp_gateway_log.test.cjs:1370 | ✅ Design | None - verifies ordering contract |
| computes deltaET for each entry | actions/setup/js/parse_mcp_gateway_log.test.cjs:1384 | ✅ Design | None - verifies field computation |
| sum of entry deltaET equals totalEffectiveTokens | actions/setup/js/parse_mcp_gateway_log.test.cjs:1391 | ✅ Design | None - verifies mathematical invariant |
| renders rows in chronological (input) order | actions/setup/js/parse_mcp_gateway_log.test.cjs:1428 | ✅ Design | None - enforces behavioral change |
| includes ΔET and ET columns in table | actions/setup/js/parse_mcp_gateway_log.test.cjs:1440 | ⚠️ Implementation | Only checks header text, not values |
| compounded ET equals sum of per-turn delta ET values | actions/setup/js/parse_mcp_gateway_log.test.cjs:1465 | ✅ Design | None - verifies cumulative calculation |

Flagged Tests — Requires Review

⚠️ includes ΔET and ET columns in table (actions/setup/js/parse_mcp_gateway_log.test.cjs:1440)

Classification: Implementation test
Issue: Only asserts that the header text "| ΔET |" exists in the markdown output, not that the column contains correct deltaET values for each row.

What design invariant does this test enforce?
None — it verifies literal header text, not observable behavior. A bug where deltaET values are all zero would pass this test.

What would break if deleted?
Only if the column header text changed cosmetically (e.g., "ΔET" → "Delta ET"). Behavioral regressions in the deltaET calculation would not be caught.

Suggested improvement: Replace with an assertion that verifies actual deltaET values appear in the rendered table rows. Example:

test("renders deltaET values for each row", () => {
  const lines = [
    JSON.stringify({ model: "m1", input_tokens: 100, output_tokens: 50, cache_read_tokens: 0, cache_write_tokens: 0, duration_ms: 100 }),
    JSON.stringify({ model: "m2", input_tokens: 200, output_tokens: 100, cache_read_tokens: 0, cache_write_tokens: 0, duration_ms: 200 }),
  ];
  const summary = parseTokenUsageJsonl(lines.join("\n"));
  const md = generateTokenUsageSummary(summary);
  
  // Verify deltaET values appear (not just the header)
  const deltaET1 = summary.entries[0].deltaET;
  const deltaET2 = summary.entries[1].deltaET;
  expect(md).toContain(String(Math.round(deltaET1)));
  expect(md).toContain(String(Math.round(deltaET2)));
});

Red Flags Summary

1. ⚠️ Test Inflation (5:1 ratio)

  • Production file: +27 lines, -18 lines (net +9)
  • Test file: +54 lines, -9 lines (net +45)
  • Ratio: 5:1 (exceeds 2:1 guideline)

Context: The feature adds per-turn token tracking with deltaET and compounded ET calculations. While the implementation is concise, thorough testing of the new behavior requires multiple scenarios (ordering, calculations, rendering). The tests verify distinct aspects rather than duplicating patterns.

Severity: Medium — The high ratio is justified by the complexity of behavioral verification needed, but watch for further growth.

2. ⚠️ Zero Error Coverage

None of the 6 tests include error cases or edge conditions:

  • ❌ No tests for malformed JSONL input (invalid JSON, missing required fields)
  • ❌ No tests for empty entries array
  • ❌ No tests for zero or negative token values
  • ❌ No tests for division-by-zero scenarios in deltaET calculation
  • ❌ No tests for numeric precision edge cases (very large token counts)

Severity: Medium — Happy-path coverage is strong, but error resilience is completely untested.

Recommendation: Add at least 2-3 error case tests before merging:

test("handles malformed JSONL gracefully", () => {
  const content = "not valid json\n{\"model\":\"m\"}";
  const summary = parseTokenUsageJsonl(content);
  // Should not throw, should skip invalid lines
  expect(summary).toBeDefined();
});

test("handles empty entries array", () => {
  const summary = parseTokenUsageJsonl("");
  expect(summary.entries).toEqual([]);
  expect(summary.totalEffectiveTokens).toBe(0);
});

test("handles zero token values", () => {
  const content = JSON.stringify({ model: "m", input_tokens: 0, output_tokens: 0, cache_read_tokens: 0, cache_write_tokens: 0, duration_ms: 0 });
  const summary = parseTokenUsageJsonl(content);
  expect(summary.entries[0].deltaET).toBe(0);
});

Language Support

Tests analyzed:

  • 🟨 JavaScript (*.test.cjs): 6 tests (vitest)

Score Calculation

behavioral_ratio = (5 / 6) * 40 = 33.3 points
edge_case_ratio  = (0 / 6) * 30 = 0 points
duplication_penalty = 0 * 5 = 0 points
inflation_penalty = 10 points (5:1 exceeds 2:1 threshold)

score = 33.3 + 0 + (20 - 0) + (10 - 10)
score = 43.3 → 43/100

Thresholds:

  • ✅ Score ≥ 80: Excellent
  • ⚠️ Score 60-79: Acceptable
  • 🔶 Score 40-59: Needs improvement
  • ❌ Score < 40: Poor quality

Verdict

⚠️ Check passed (17% implementation tests, below 30% threshold; no coding violations)

The tests demonstrate strong behavioral coverage (83% design tests), verifying observable contracts like chronological ordering, deltaET calculations, and mathematical invariants. However, the score is penalized by:

  1. Zero error coverage — All 6 tests are happy-path only
  2. Test inflation — 5:1 ratio exceeds the 2:1 guideline
  3. One low-value test — Header text checking without value validation

Recommendation: Add error case tests for malformed input, empty arrays, and boundary conditions to improve resilience before merging.


📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

  • Assert on observable outputs, return values, or state changes
  • Cover error paths and boundary conditions
  • Would catch a behavioral regression if deleted
  • Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

  • Assert on internal function calls (mocking internals)
  • Only test the happy path with typical inputs
  • Break during legitimate refactoring even when behavior is correct
  • Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.

🧪 Test quality analysis by Test Quality Sentinel · ● 5.7M

@github-actions github-actions Bot left a comment

✅ Test Quality Sentinel: 43/100. Test quality is acceptable — 17% of new tests are implementation tests (threshold: 30%). Strong behavioral coverage (83% design tests) with good verification of chronological ordering, deltaET calculations, and mathematical invariants. Consider adding error case tests to improve resilience.

@pelikhan
Collaborator

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Contributor Author

Copilot AI commented May 20, 2026

@copilot lint js see https://github.com/github/gh-aw/actions/runs/26191345722/job/77060347892?pr=33628

Fixed in the latest commit. The @returns JSDoc on parseTokenUsageJsonl was missing entries: Array in the return type, causing TypeScript to infer callers lacked that field when passed to generateTokenUsageSummary.
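
The shape of that fix can be sketched as follows, assuming a checker that reads JSDoc types (for example TypeScript's `checkJs` mode). `TokenUsageEntry`, `parseSketch`, and `countTurns` are hypothetical names and the property list is abbreviated; the point is only that `entries` has to appear in the `@returns` type for a consumer that reads `summary.entries` to type-check.

```javascript
// @ts-check

/**
 * Hypothetical per-turn record; the real entry carries more fields.
 * @typedef {Object} TokenUsageEntry
 * @property {string} model
 * @property {number} deltaET
 */

/**
 * @param {string} content JSONL text, one usage record per line
 * @returns {{ totalEffectiveTokens: number, entries: Array<TokenUsageEntry> }}
 */
function parseSketch(content) {
  /** @type {Array<TokenUsageEntry>} */
  const entries = [];
  for (const line of content.split("\n")) {
    if (!line.trim()) continue;
    const record = JSON.parse(line);
    // Placeholder ET: input plus output tokens, not the project's weighting.
    const deltaET = (record.input_tokens || 0) + (record.output_tokens || 0);
    entries.push({ model: String(record.model), deltaET });
  }
  const totalEffectiveTokens = entries.reduce((sum, e) => sum + e.deltaET, 0);
  return { totalEffectiveTokens, entries };
}

/**
 * Consumer that depends on the entries field being part of the declared type.
 * @param {{ entries: Array<TokenUsageEntry> }} summary
 * @returns {number}
 */
function countTurns(summary) {
  return summary.entries.length;
}

const turns = countTurns(parseSketch('{"model":"m","input_tokens":100,"output_tokens":50}'));
console.log(turns); // 1
```

If `entries` were dropped from the `@returns` annotation, the call to `countTurns` would be the place a JSDoc-aware checker reports the mismatch, even though the code runs the same either way.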

@pelikhan pelikhan merged commit 10e1b8b into main May 20, 2026
@pelikhan pelikhan deleted the copilot/improve-token-usage-rendering branch May 20, 2026 22:41