feat(token-usage): per-turn rows with ΔET and compounded ET in step summary by Copilot · Pull Request #33628 · github/gh-aw

Copilot · 2026-05-20T20:43:44Z

The Token Usage step summary showed a single aggregated row per model, hiding per-turn progression. This replaces it with one row per API call in chronological order, showing both the per-turn delta ET and the running compounded ET.

Changes

`parse_mcp_gateway_log.cjs`

parseTokenUsageJsonl — adds entries[] to the returned summary; each entry records model, token counts, durationMs, and deltaET (per-turn ET via computeEffectiveTokens). byModel aggregates preserved for backward compat.
generateTokenUsageSummary — new table schema:

#	Model	Input	Output	Cache Read	Cache Write	ΔET	ET	Duration
1	gpt-…	23,192	285	0	0	1.2K	1.2K	5.6s
2	gpt-…	23,533	123	0	0	625	1.8K	3.4s
Total		sum	…	…	…		ET	dur

- ΔET — effective tokens for that turn
- ET — compounded running total through that turn
- Totals row, ● footer, and ET weight disclosure retained
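
To make the two columns concrete, here is a minimal sketch of how ΔET and the running ET relate, assuming each entry already carries a numeric `deltaET`. The helper name `toRows` and the sample numbers are hypothetical and are not taken from `parse_mcp_gateway_log.cjs`; the values were picked to resemble the sample table above.

```javascript
// Hypothetical helper: pairs each turn's ΔET with the running ET total.
// Assumes every entry already has a numeric `deltaET`.
function toRows(entries) {
  let runningET = 0;
  return entries.map((entry, i) => {
    runningET += entry.deltaET;
    return { turn: i + 1, model: entry.model, deltaET: entry.deltaET, et: runningET };
  });
}

// Illustrative values only.
const rows = toRows([
  { model: "model-a", deltaET: 1200 },
  { model: "model-a", deltaET: 625 },
]);

// One line per turn: turn number, ΔET, running ET.
console.log(rows.map(r => `${r.turn} | ${r.deltaET} | ${r.et}`).join("\n"));
```

In this sketch the ET of the last row is the sum of all ΔET values, which is the invariant the new tests check against `totalEffectiveTokens`.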

`parse_mcp_gateway_log.test.cjs`

Updated header assertion, chronological-order test (replaces sort-by-tokens test), and ΔET/ET column checks
Added tests: entries[] population order, deltaET computation, and sum-of-deltas == totalEffectiveTokens

… step summary

- parseTokenUsageJsonl: populate entries[] array with per-turn deltaET
- generateTokenUsageSummary: show one row per turn (# | Model | tokens | ΔET | ET | Duration) in chronological order, plus aggregate totals row
- Update/add tests for entries array, deltaET computation, new header, and chronological ordering

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

github-actions · 2026-05-20T21:36:24Z

✅ Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #33628 does not have the 'implementation' label and has 0 new lines of code in business logic directories (threshold: 100).

github-actions · 2026-05-20T21:36:25Z

🧪 Test Quality Sentinel completed test quality analysis.

github-actions · 2026-05-20T21:36:26Z

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

github-actions · 2026-05-20T21:36:32Z

✅ PR Code Quality Reviewer completed the code quality review.

Copilot

Pull request overview

Updates the Token Usage step summary to show one row per API call (turn) in chronological order, including per-turn effective tokens (ΔET) and a running compounded ET total, while keeping model-level aggregates for compatibility.

Changes:

Extend parseTokenUsageJsonl to record per-turn entries[] including durationMs and computed deltaET.
Rework generateTokenUsageSummary to render a per-turn table with ΔET and running ET, plus a totals row and existing footer disclosures.
Update/add tests to validate entries[] ordering and presence of the new columns.

Show a summary per file

File	Description
actions/setup/js/parse_mcp_gateway_log.cjs	Adds per-turn `entries[]` and updates token usage markdown summary to per-turn rows with ΔET and running ET.
actions/setup/js/parse_mcp_gateway_log.test.cjs	Adjusts tests for the new table schema and adds coverage for `entries[]` and per-turn `deltaET`.

Copilot's findings

Files reviewed: 2/2 changed files
Comments generated: 2

+  const entries = summary.entries || [];
+  let compoundedET = 0;
+  for (let i = 0; i < entries.length; i++) {
+    const entry = entries[i];
+    const deltaET = Math.round(entry.deltaET || 0);
+    compoundedET += deltaET;
    lines.push(
-      `| ${model} | ${usage.inputTokens.toLocaleString()} | ${usage.outputTokens.toLocaleString()} | ${usage.cacheReadTokens.toLocaleString()} | ${usage.cacheWriteTokens.toLocaleString()} | ${et} | ${usage.requests} | ${formatDurationMs(usage.durationMs)} |`
+      `| ${i + 1} | ${entry.model} | ${entry.inputTokens.toLocaleString()} | ${entry.outputTokens.toLocaleString()} | ${entry.cacheReadTokens.toLocaleString()} | ${entry.cacheWriteTokens.toLocaleString()} | ${formatET(deltaET)} | ${formatET(compoundedET)} | ${formatDurationMs(entry.durationMs)} |`


+      expect(md).toContain("**Total**");
+      // The last entry's compounded ET equals totalEffectiveTokens so must appear in the table
+      const totalRounded = Math.round(summary.totalEffectiveTokens);
+      expect(totalRounded).toBeGreaterThan(0);


github-actions

Code Quality Review

✅ Approved — Clean refactoring with excellent test coverage

Summary

This PR successfully transforms token usage reporting from aggregated model summaries to chronological per-turn rows showing both delta (ΔET) and compounded effective tokens (ET). The implementation is well-structured and maintains backward compatibility.

What Was Reviewed

Logic correctness: ✅ Entry ordering, deltaET calculation, compounding logic all correct
Test coverage: ✅ Comprehensive tests for new entries array, deltaET computation, and sum validation
Backward compatibility: ✅ byModel aggregates preserved
Code clarity: ✅ JSDoc updated, variable names clear

Observations (Non-blocking)

Minor architectural note

The compoundedET running total is calculated during rendering in generateTokenUsageSummary(). This works correctly but couples calculation with view formatting. Consider computing this during parsing if the compounded sequence becomes a first-class concept used elsewhere.

This is fine for display-only use. No action needed unless compounded ET is used in other calculations.

Test Quality

✅ Excellent additions:

populates per-turn entries array in order — validates chronological preservation
computes deltaET for each entry — ensures per-turn ET calculation
sum of entry deltaET equals totalEffectiveTokens — critical invariant check
compounded ET equals sum of per-turn delta ET values — validates rendering math

Verdict

No issues found. The code is production-ready with strong test coverage and clear intent. Nice work on the progressive disclosure improvement for token usage visibility.

🔎 Code quality review by PR Code Quality Reviewer · ● 1.6M

github-actions · 2026-05-20T21:40:40Z

✅ Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #33628 does not have the 'implementation' label and has 0 new lines of code in default business logic directories (threshold: 100).

github-actions · 2026-05-20T21:41:11Z

✅ PR Code Quality Reviewer completed the code quality review.

github-actions

Skills-Based Review 🧠

Applied /tdd and /zoom-out based on this feature enhancement that changes token usage rendering from aggregated-by-model to per-turn chronological display.

Key Themes

/tdd findings:

Missing edge case tests: Empty entries array, single entry (no compounding), large entry counts
Test-implementation gap: The "compounded ET equals sum" test doesn't verify the actual rendered markdown values
Magic numbers: Floating-point precision 5 in toBeCloseTo() is undocumented

/zoom-out findings:

Separation of concerns: Compounding arithmetic happens during rendering rather than in data preparation
Visual consistency: Empty cells in the Totals row break table scannability
JSDoc completeness: Type annotation for entries[] could be more complete

Positive Highlights ✅

Backward compatibility: The byModel field is preserved for existing consumers
Clear naming: deltaET vs ET makes the distinction between per-turn and compounded values immediately obvious
Chronological ordering: This change unlocks per-turn analysis — excellent product decision
Test names are descriptive: Tests read as specifications ("populates per-turn entries array in order")

Verdict

This is a solid enhancement with good test coverage. The comments above suggest improvements to edge case handling and separation of concerns, but none are blocking. The chronological rendering is a meaningful UX improvement for understanding token consumption patterns.

The issues raised are about making the code more maintainable and resilient to future changes, not about correctness of the current implementation.

Skills applied: /tdd emphasizes test-driven development and comprehensive edge case coverage; /zoom-out provides architectural perspective on code organization and maintainability.

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · ● 4.7M

github-actions · 2026-05-20T21:41:41Z

+      const summary = parseTokenUsageJsonl(lines.join("\n"));
+      const sumDelta = summary.entries.reduce((acc, e) => acc + e.deltaET, 0);
+      expect(sumDelta).toBeCloseTo(summary.totalEffectiveTokens, 5);
+    });


[/tdd] Missing edge case test for empty entries array. The new chronological rendering should handle zero entries gracefully.

Consider adding:

test("handles empty entries array", () => {
  const summary = { totalRequests: 0, entries: [] };
  const md = generateTokenUsageSummary(summary);
  expect(md).toBe("");
});

test("renders single entry without compounding confusion", () => {
  const content = JSON.stringify({ model: "m", input_tokens: 100, output_tokens: 50, cache_read_tokens: 0, cache_write_tokens: 0, duration_ms: 100 });
  const summary = parseTokenUsageJsonl(content);
  const md = generateTokenUsageSummary(summary);
  // First and only row: ΔET should equal ET
  expect(summary.entries[0].deltaET).toBe(summary.totalEffectiveTokens);
});

Edge cases are where compounding logic breaks in practice.

github-actions · 2026-05-20T21:41:42Z

+      expect(summary.entries[0].deltaET).toBeGreaterThan(0);
+    });
+
+    test("sum of entry deltaET equals totalEffectiveTokens", () => {


[/tdd] The precision value 5 in toBeCloseTo() should be documented or derived from the actual ET calculation precision.

Why 5 decimal places? Is this the known precision of computeEffectiveTokens()? Consider:

// Document the precision contract
const ET_PRECISION = 5; // matches computeEffectiveTokens rounding
expect(sumDelta).toBeCloseTo(summary.totalEffectiveTokens, ET_PRECISION);

Magic numbers in tests reduce future maintainability — if the ET calculation changes precision, this test should fail clearly.
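
For context on what the `5` means: Jest and Vitest document `toBeCloseTo(expected, numDigits)` as passing when the absolute difference is below `10 ** -numDigits / 2`, so `5` tolerates a difference of less than 0.000005. The sketch below re-implements that check as a plain function. `isCloseTo` is a hypothetical name used for illustration and is not part of the test suite.

```javascript
// Hypothetical re-implementation of the toBeCloseTo pass condition.
// numDigits defaults to 2, matching the matcher's documented default.
function isCloseTo(received, expected, numDigits = 2) {
  return Math.abs(expected - received) < 10 ** -numDigits / 2;
}

// Floating-point noise from summation is far below the tolerance.
console.log(isCloseTo(0.1 + 0.2, 0.3, 5));

// A difference of about 0.00001 exceeds the 0.000005 tolerance.
console.log(isCloseTo(1.00001, 1, 5));
```

Read this way, the precision argument expresses how much accumulated floating-point error the sum of per-turn values is allowed to carry relative to the total.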

github-actions · 2026-05-20T21:41:42Z

+      `| ${i + 1} | ${entry.model} | ${entry.inputTokens.toLocaleString()} | ${entry.outputTokens.toLocaleString()} | ${entry.cacheReadTokens.toLocaleString()} | ${entry.cacheWriteTokens.toLocaleString()} | ${formatET(deltaET)} | ${formatET(compoundedET)} | ${formatDurationMs(entry.durationMs)} |`
    );
  }



[/zoom-out] The Totals row has empty cells (| **Total** | |) which breaks the visual rhythm of the table.

Consider using semantic placeholders for clarity:

lines.push(
  `| **Total** | — | **${summary.totalInputTokens.toLocaleString()}** | ... | — | **${totalET}** | **${formatDurationMs(summary.totalDurationMs)}** |`
);

Alternatively, use Σ (sum symbol) in the ΔET column to indicate "sum of deltas = total ET". Empty cells reduce scannability when users are trying to understand the table structure.

github-actions · 2026-05-20T21:41:42Z

      expect(md).toContain("●");
      expect(md).not.toContain("Cache efficiency");
    });
+


[/tdd] This test name says "compounded ET equals sum of per-turn delta ET values" but the test body only checks toBeGreaterThan(0) — it doesn't actually verify the compounding arithmetic.

Consider parsing the markdown and verifying the actual ET column progression:

test("compounded ET equals sum of per-turn delta ET values", () => {
  const lines = [
    JSON.stringify({ model: "m", input_tokens: 100, output_tokens: 50, cache_read_tokens: 0, cache_write_tokens: 0, duration_ms: 100 }),
    JSON.stringify({ model: "m", input_tokens: 200, output_tokens: 100, cache_read_tokens: 0, cache_write_tokens: 0, duration_ms: 200 }),
  ];
  const summary = parseTokenUsageJsonl(lines.join("\n"));

  // Verify the arithmetic: row 2 ET = row 1 deltaET + row 2 deltaET
  const expectedET1 = Math.round(summary.entries[0].deltaET);
  const expectedET2 = Math.round(summary.entries[0].deltaET + summary.entries[1].deltaET);

  const md = generateTokenUsageSummary(summary);
  expect(md).toContain(formatET(expectedET1)); // First row ET
  expect(md).toContain(formatET(expectedET2)); // Second row compounded ET
});

The test should verify what the user sees in the table, not just internal state.
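
One way to assert on what is rendered, sketched under the assumption that the summary uses the nine-column pipe table from this PR: split each data row into cells and read the ET column by index. `parseTableRows` is a hypothetical test helper, and the sample markdown (including the duration text) is made up for illustration.

```javascript
// Hypothetical test helper: turns markdown pipe-table text into arrays of cell strings.
// Assumes the first two pipe-prefixed lines are the header and the separator row.
function parseTableRows(md) {
  return md
    .split("\n")
    .filter(line => line.trim().startsWith("|"))
    .map(line =>
      line
        .trim()
        .replace(/^\|/, "")
        .replace(/\|$/, "")
        .split("|")
        .map(cell => cell.trim())
    )
    .slice(2);
}

// Made-up sample in the same column layout as the new table.
const sample = [
  "| # | Model | Input | Output | Cache Read | Cache Write | ΔET | ET | Duration |",
  "|--:|-------|------:|-------:|-----------:|------------:|----:|---:|---------:|",
  "| 1 | m | 100 | 50 | 0 | 0 | 300 | 300 | 100ms |",
  "| 2 | m | 200 | 100 | 0 | 0 | 600 | 900 | 200ms |",
].join("\n");

// Column index 7 is ET in this layout; index 6 is ΔET.
const etColumn = parseTableRows(sample).map(cells => cells[7]);
console.log(etColumn);
```

Asserting on a specific cell avoids the false positives that `toContain` can produce when the same formatted number appears in more than one column.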

github-actions · 2026-05-20T21:41:42Z

-    const et = formatET(Math.round(usage.effectiveTokens || 0));
+  lines.push("| # | Model | Input | Output | Cache Read | Cache Write | ΔET | ET | Duration |");
+  lines.push("|--:|-------|------:|-------:|-----------:|------------:|----:|---:|---------:|");
+


[/zoom-out] The compoundedET variable is mutated during rendering, coupling display logic with arithmetic. While this works (markdown is generated once), it violates separation of concerns.

Consider pre-computing compounded values in the data structure:

// After computing deltaET for each entry:
let runningTotal = 0;
for (const entry of summary.entries) {
  runningTotal += Math.round(entry.deltaET || 0);
  entry.compoundedET = runningTotal;
}

// Then rendering becomes pure:
for (let i = 0; i < entries.length; i++) {
  const entry = entries[i];
  lines.push(
    `| ${i + 1} | ${entry.model} | ... | ${formatET(entry.deltaET)} | ${formatET(entry.compoundedET)} | ...`
  );
}

This makes the rendering logic stateless and easier to test independently.

github-actions · 2026-05-20T21:42:42Z

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

github-actions

✅ Test Quality Sentinel: 50/100. All tests verify behavioral contracts (excellent!), but attention needed: (1) Test inflation at 5:1 ratio exceeds 2:1 threshold, (2) Zero error/edge case coverage across all 6 new tests. Consider adding tests for malformed inputs, missing fields, and boundary conditions to improve robustness.

github-actions

Code Quality Review Summary

This PR introduces a valuable improvement to token usage reporting by showing per-turn progression with ΔET and compounded ET columns. The implementation is well-tested and follows good coding practices.

The existing review comments (7 total) comprehensively cover all significant issues I identified during my review. Rather than duplicate feedback, I recommend addressing those comments, particularly:

Critical (Must Fix)

Rounding drift (line 156 in parse_mcp_gateway_log.cjs) — Accumulate unrounded deltaET values and round only for display to maintain consistency with totalEffectiveTokens

Important (Should Fix)

Weak test assertion (line 1477) — Test doesn't verify actual compounded ET arithmetic
Missing edge cases (line 1399) — Add tests for empty entries and single entry scenarios
Separation of concerns (line 148) — Pre-compute compounded values in data structure rather than mutating during rendering

Nice to Have

Magic number documentation (line 1391) — Document the precision contract for toBeCloseTo()
Table formatting (line 159) — Use semantic placeholders instead of empty cells

Overall: The PR is well-executed. Once the rounding drift issue is addressed, this will be a solid addition to the codebase.
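
The rounding drift can be shown with small numbers. This sketch assumes fractional `deltaET` values; the inputs are invented for illustration and are not from the PR's test data.

```javascript
// Invented per-turn values, each below 0.5 so that rounding loses them.
const deltas = [0.4, 0.4, 0.4];

// Rounding each delta before accumulating (the flagged pattern).
let roundedThenSummed = 0;
for (const d of deltas) {
  roundedThenSummed += Math.round(d);
}

// Accumulating raw values and rounding only for display.
let raw = 0;
const displayed = deltas.map(d => {
  raw += d;
  return Math.round(raw);
});

const total = Math.round(deltas.reduce((acc, d) => acc + d, 0));

console.log(roundedThenSummed); // running total built from rounded deltas
console.log(displayed);         // running totals rounded at display time
console.log(total);             // rounded overall total
```

With these inputs the first approach ends at 0 while the rounded overall total is 1, so the last ET cell and the Total row would disagree. Rounding only at display time keeps the final running value equal to the rounded total.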

🔎 Code quality review by PR Code Quality Reviewer · ● 2.4M

github-actions · 2026-05-20T21:45:54Z

🧪 Test Quality Sentinel completed test quality analysis.

github-actions

Skills-Based Review 🧠

Applied /tdd and /zoom-out based on the feature addition with table rendering refactor in this PR.

Key Themes

Test Coverage (/tdd):

✅ Good foundation: tests added for new entries[] array, chronological ordering, and delta ET computation
⚠️ Weak assertion in "compounded ET equals sum" test — validates presence but not correctness
⚠️ Missing edge case: empty entries[] array (common when MCP gateway log is absent)

Architecture (/zoom-out):

✅ Backward compatibility preserved: byModel aggregates still populated
✅ Clear JSDoc updates explaining the new chronological rendering
⚠️ Performance: separate loop for deltaET computation could be merged into initial parsing
⚠️ Naming: compoundedET vs. totalEffectiveTokens — same concept, different names
⚠️ UX: empty Model cell in Total row may appear as a rendering bug

Positive Highlights

✅ Excellent user value: per-turn progression is much more useful than aggregated-by-model view
✅ Clean separation: parsing logic remains distinct from rendering logic
✅ Test names are descriptive and read as specifications
✅ toBeCloseTo used correctly for floating-point comparisons

Verdict

No blocking issues — this is solid work. The suggestions are optimizations and test hardening for future maintainability. Approving with optional improvements.

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · ● 4.7M

github-actions · 2026-05-20T21:48:35Z

+      const md = generateTokenUsageSummary(summary);
+      // Total row and last data row ET should both show the overall total ET
+      expect(md).toContain("**Total**");
+      // The last entry's compounded ET equals totalEffectiveTokens so must appear in the table


[/tdd] This test assertion is too weak — it only verifies totalRounded > 0 but doesn't validate that the compounded ET in the table actually matches expectations.

Consider asserting the actual compounded values:

const delta1 = summary.entries[0].deltaET;
const delta2 = summary.entries[1].deltaET;
const expectedCompounded = Math.round(delta1 + delta2);
expect(expectedCompounded).toBe(Math.round(summary.totalEffectiveTokens));

// Also verify the compounded value appears in the markdown output
const formatETExpected = formatET(expectedCompounded);
expect(md).toContain(formatETExpected);

Without this, a bug in the compounding logic could pass the test.

github-actions · 2026-05-20T21:48:35Z

  }

  const totalET = formatET(Math.round(summary.totalEffectiveTokens || 0));
  lines.push(


[/zoom-out] The empty cell in the Total row for the Model column may confuse users.

Consider either:

Adding explicit text: | **Total** | **(all models)** | ...

Or a comment explaining why it's empty

The | **Total** | | pattern creates a visually ambiguous table cell that might appear as a rendering bug.

github-actions · 2026-05-20T21:48:35Z

@@ -1378,7 +1408,7 @@ not-json
    test("renders header and table columns", () => {


[/tdd] Missing edge case test: what happens when summary.entries is empty?

Consider adding:

test("handles empty entries array gracefully", () => {
  const summary = parseTokenUsageJsonl("");
  const md = generateTokenUsageSummary(summary);
  expect(md).toBe(""); // Should return empty string for no data
});

Empty data is a common real-world scenario (e.g., MCP gateway log doesn't exist or contains no token usage).

github-actions · 2026-05-20T21:48:35Z

+  lines.push("| # | Model | Input | Output | Cache Read | Cache Write | ΔET | ET | Duration |");
+  lines.push("|--:|-------|------:|-------:|-----------:|------------:|----:|---:|---------:|");
+
+  const entries = summary.entries || [];


[/zoom-out] Naming inconsistency: the local variable compoundedET (line 149) represents the same concept as summary.totalEffectiveTokens (line 123), just accumulated progressively.

For clarity, consider renaming to runningET or cumulativeET to emphasize that it's a running total, not a final "compounded" value. This makes the relationship between the loop variable and the final totalEffectiveTokens more obvious.

github-actions · 2026-05-20T21:48:35Z

  }
  summary.totalEffectiveTokens = totalEffectiveTokens;

+  // Compute per-turn delta ET


[/zoom-out] Performance opportunity: the deltaET computation could be done during the initial loop (line 106) rather than in a separate pass.

// Inside the main loop after line 106:
const deltaET = computeEffectiveTokens(model, inputTokens, outputTokens, cacheReadTokens, cacheWriteTokens);
summary.entries.push({ model, inputTokens, outputTokens, cacheReadTokens, cacheWriteTokens, durationMs, deltaET });

This eliminates a full second iteration over entries[], which could matter for long-running workflows with many API calls.
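
A minimal single-pass sketch, under stated assumptions: the ET function is passed in as `computeET` because the real weights used by `computeEffectiveTokens` are not shown in this thread, and `parseTurns` is a hypothetical name. Skipping lines that are not valid JSON, defaulting missing numeric fields to 0, and the `"unknown"` model fallback are also assumptions of this sketch.

```javascript
// Hypothetical single-pass parser: builds entries[] and the ET total together.
function parseTurns(jsonl, computeET) {
  const entries = [];
  let totalEffectiveTokens = 0;

  for (const line of jsonl.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed) continue;

    let record;
    try {
      record = JSON.parse(trimmed);
    } catch {
      continue; // assumption: malformed lines are skipped, not fatal
    }
    if (record === null || typeof record !== "object") continue;

    const entry = {
      model: record.model || "unknown",
      inputTokens: record.input_tokens || 0,
      outputTokens: record.output_tokens || 0,
      cacheReadTokens: record.cache_read_tokens || 0,
      cacheWriteTokens: record.cache_write_tokens || 0,
      durationMs: record.duration_ms || 0,
    };
    // deltaET is computed in the same iteration that creates the entry.
    entry.deltaET = computeET(entry);
    totalEffectiveTokens += entry.deltaET;
    entries.push(entry);
  }

  return { entries, totalEffectiveTokens };
}

// Stand-in ET function for the demo only; NOT the real weighting.
const demoET = e => e.inputTokens + e.outputTokens;

const result = parseTurns('not-json\n{"model":"m","input_tokens":100,"output_tokens":50}', demoET);
console.log(result.entries.length, result.totalEffectiveTokens);
```

Because the total is accumulated from the same unrounded values stored on each entry, the sum-of-deltas invariant holds by construction in this sketch.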

github-actions · 2026-05-20T21:52:56Z

🧪 Test Quality Sentinel Report

Test Quality Score: 43/100

🔶 Needs improvement

Metric	Value
New/modified tests analyzed	6
✅ Design tests (behavioral contracts)	5 (83%)
⚠️ Implementation tests (low value)	1 (17%)
Tests with error/edge cases	0 (0%)
Duplicate test clusters	0
Test inflation detected	Yes (5:1 ratio)
🚨 Coding-guideline violations	0

Test Classification Details

View detailed per-test classification (6 tests)

Test	File	Classification	Issues Detected
`populates per-turn entries array in order`	`actions/setup/js/parse_mcp_gateway_log.test.cjs:1370`	✅ Design	None - verifies ordering contract
`computes deltaET for each entry`	`actions/setup/js/parse_mcp_gateway_log.test.cjs:1384`	✅ Design	None - verifies field computation
`sum of entry deltaET equals totalEffectiveTokens`	`actions/setup/js/parse_mcp_gateway_log.test.cjs:1391`	✅ Design	None - verifies mathematical invariant
`renders rows in chronological (input) order`	`actions/setup/js/parse_mcp_gateway_log.test.cjs:1428`	✅ Design	None - enforces behavioral change
`includes ΔET and ET columns in table`	`actions/setup/js/parse_mcp_gateway_log.test.cjs:1440`	⚠️ Implementation	Only checks header text, not values
`compounded ET equals sum of per-turn delta ET values`	`actions/setup/js/parse_mcp_gateway_log.test.cjs:1465`	✅ Design	None - verifies cumulative calculation

Flagged Tests — Requires Review

⚠️ `includes ΔET and ET columns in table` (`actions/setup/js/parse_mcp_gateway_log.test.cjs:1440`)

Classification: Implementation test
Issue: Only asserts that the header text "| ΔET |" exists in the markdown output, not that the column contains correct deltaET values for each row.

What design invariant does this test enforce?
None — it verifies literal header text, not observable behavior. A bug where deltaET values are all zero would pass this test.

What would break if deleted?
Only if the column header text changed cosmetically (e.g., "ΔET" → "Delta ET"). Behavioral regressions in the deltaET calculation would not be caught.

Suggested improvement: Replace with an assertion that verifies actual deltaET values appear in the rendered table rows. Example:

test("renders deltaET values for each row", () => {
  const lines = [
    JSON.stringify({ model: "m1", input_tokens: 100, output_tokens: 50, cache_read_tokens: 0, cache_write_tokens: 0, duration_ms: 100 }),
    JSON.stringify({ model: "m2", input_tokens: 200, output_tokens: 100, cache_read_tokens: 0, cache_write_tokens: 0, duration_ms: 200 }),
  ];
  const summary = parseTokenUsageJsonl(lines.join("\n"));
  const md = generateTokenUsageSummary(summary);
  
  // Verify deltaET values appear (not just the header)
  const deltaET1 = summary.entries[0].deltaET;
  const deltaET2 = summary.entries[1].deltaET;
  expect(md).toContain(String(Math.round(deltaET1)));
  expect(md).toContain(String(Math.round(deltaET2)));
});

Red Flags Summary

1. ⚠️ Test Inflation (5:1 ratio)

Production file: +27 lines, -18 lines (net +9)
Test file: +54 lines, -9 lines (net +45)
Ratio: 5:1 (exceeds 2:1 guideline)

Context: The feature adds per-turn token tracking with deltaET and compounded ET calculations. While the implementation is concise, thorough testing of the new behavior requires multiple scenarios (ordering, calculations, rendering). The tests verify distinct aspects rather than duplicating patterns.

Severity: Medium — The high ratio is justified by the complexity of behavioral verification needed, but watch for further growth.

2. ⚠️ Zero Error Coverage

None of the 6 tests include error cases or edge conditions:

❌ No tests for malformed JSONL input (invalid JSON, missing required fields)
❌ No tests for empty entries array
❌ No tests for zero or negative token values
❌ No tests for division-by-zero scenarios in deltaET calculation
❌ No tests for numeric precision edge cases (very large token counts)

Severity: Medium — Happy-path coverage is strong, but error resilience is completely untested.

Recommendation: Add at least 2-3 error case tests before merging:

test("handles malformed JSONL gracefully", () => {
  const content = "not valid json\n{\"model\":\"m\"}";
  const summary = parseTokenUsageJsonl(content);
  // Should not throw, should skip invalid lines
  expect(summary).toBeDefined();
});

test("handles empty entries array", () => {
  const summary = parseTokenUsageJsonl("");
  expect(summary.entries).toEqual([]);
  expect(summary.totalEffectiveTokens).toBe(0);
});

test("handles zero token values", () => {
  const content = JSON.stringify({ model: "m", input_tokens: 0, output_tokens: 0, cache_read_tokens: 0, cache_write_tokens: 0, duration_ms: 0 });
  const summary = parseTokenUsageJsonl(content);
  expect(summary.entries[0].deltaET).toBe(0);
});

Language Support

Tests analyzed:

🟨 JavaScript (*.test.cjs): 6 tests (vitest)

Score Calculation

behavioral_ratio = (5 / 6) * 40 = 33.3 points
edge_case_ratio  = (0 / 6) * 30 = 0 points
duplication_penalty = 0 * 5 = 0 points
inflation_penalty = 10 points (5:1 exceeds 2:1 threshold)

score = 33.3 + 0 + (20 - 0) + (10 - 10)
score = 43.3 → 43/100

Thresholds:

✅ Score ≥ 80: Excellent
⚠️ Score 60-79: Acceptable
🔶 Score 40-59: Needs improvement
❌ Score < 40: Poor quality

Verdict

⚠️ Check passed (17% implementation tests, below 30% threshold; no coding violations)

The tests demonstrate strong behavioral coverage (83% design tests), verifying observable contracts like chronological ordering, deltaET calculations, and mathematical invariants. However, the score is penalized by:

Zero error coverage — All 6 tests are happy-path only
Test inflation — 5:1 ratio exceeds the 2:1 guideline
One low-value test — Header text checking without value validation

Recommendation: Add error case tests for malformed input, empty arrays, and boundary conditions to improve resilience before merging.

📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

Assert on observable outputs, return values, or state changes
Cover error paths and boundary conditions
Would catch a behavioral regression if deleted
Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

Assert on internal function calls (mocking internals)
Only test the happy path with typical inputs
Break during legitimate refactoring even when behavior is correct
Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.

🧪 Test quality analysis by Test Quality Sentinel · ● 5.7M · ◷

github-actions

✅ Test Quality Sentinel: 43/100. Test quality is acceptable — 17% of new tests are implementation tests (threshold: 30%). Strong behavioral coverage (83% design tests) with good verification of chronological ordering, deltaET calculations, and mathematical invariants. Consider adding error case tests to improve resilience.

pelikhan · 2026-05-20T22:27:30Z

@copilot lint js see https://github.com/github/gh-aw/actions/runs/26191345722/job/77060347892?pr=33628

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot · 2026-05-20T22:33:11Z

@copilot lint js see https://github.com/github/gh-aw/actions/runs/26191345722/job/77060347892?pr=33628

Fixed in the latest commit. The @returns JSDoc on parseTokenUsageJsonl was missing entries: Array in the return type, causing TypeScript to infer callers lacked that field when passed to generateTokenUsageSummary.
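
As an illustration of the kind of annotation involved, the sketch below declares a per-turn entry shape with JSDoc so that a type check over `.cjs` files can see `entries` on the return type. The typedef name and the stub function are hypothetical, and the return shape is reduced to two fields. The property names are the ones used in the review suggestions earlier in this thread.

```javascript
/**
 * Hypothetical typedef for one per-turn entry.
 * @typedef {Object} TokenUsageEntry
 * @property {string} model
 * @property {number} inputTokens
 * @property {number} outputTokens
 * @property {number} cacheReadTokens
 * @property {number} cacheWriteTokens
 * @property {number} durationMs
 * @property {number} deltaET
 */

/**
 * Hypothetical stub that shows only the return-type annotation.
 * @param {string} content - JSONL text (unused in this stub)
 * @returns {{ entries: TokenUsageEntry[], totalEffectiveTokens: number }}
 */
function parseStub(content) {
  return { entries: [], totalEffectiveTokens: 0 };
}

console.log(Array.isArray(parseStub("").entries));
```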

Copilot AI assigned Copilot and pelikhan May 20, 2026

Copilot finished work on behalf of pelikhan May 20, 2026 20:45

Copilot created this pull request from a session on behalf of pelikhan May 20, 2026 20:45 View session

Copilot AI requested a review from pelikhan May 20, 2026 20:45

pelikhan marked this pull request as ready for review May 20, 2026 21:34

Copilot AI review requested due to automatic review settings May 20, 2026 21:35

Copilot started reviewing on behalf of pelikhan May 20, 2026 21:36 View session

Copilot AI reviewed May 20, 2026

github-actions Bot mentioned this pull request May 20, 2026

[aw] No-Op Runs #32279

Closed

github-actions Bot approved these changes May 20, 2026

github-actions Bot reviewed May 20, 2026

github-actions Bot reviewed May 20, 2026

github-actions Bot added the ai-inspected label May 20, 2026

github-actions Bot approved these changes May 20, 2026

Copilot started work on behalf of pelikhan May 20, 2026 22:27 View session

fix(token-usage): add entries to @returns JSDoc to fix TS type error

9c71c61

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot finished work on behalf of pelikhan May 20, 2026 22:34

pelikhan merged commit 10e1b8b into main May 20, 2026

pelikhan deleted the copilot/improve-token-usage-rendering branch May 20, 2026 22:41

github-actions Bot mentioned this pull request May 21, 2026

[docs] Update documentation for features from 2026-05-21 #33745

Merged
