Skip to content

Update token usage step summary to report AI credits#37374

Merged
pelikhan merged 2 commits into
mainfrom
copilot/update-token-usage-section
Jun 6, 2026
Merged

Update token usage step summary to report AI credits#37374
pelikhan merged 2 commits into
mainfrom
copilot/update-token-usage-section

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Jun 6, 2026

The JavaScript-generated token usage section now lands in GITHUB_STEP_SUMMARY and presents cost in AI credits rather than effective tokens. The summary table drops the effective-token columns and updates its legend to match the current pricing model.

  • Step summary wiring

    • Append the token usage section directly to GITHUB_STEP_SUMMARY when the path is available.
    • Render the section with a compact ### heading and collapsible <details> block so it reads cleanly in workflow runs.
  • Cost model terminology

    • Replace summary-facing “effective tokens” cost language with “AI credits”.
    • Update the legend to define per-request and running AI credit totals using the new model.
  • Token usage table

    • Remove the ΔET and ET columns.
    • Keep per-request and cumulative cost visibility via ΔAI Credits and AI Credits.
  • Coverage

    • Extend the targeted JS tests to verify:
      • summary content is written to GITHUB_STEP_SUMMARY
      • the updated AI credits table shape
      • the new legend wording
### Token Usage

<details>
<summary>Per-request AI credits and token totals</summary>

| # | Alias | Input | Output | Cache Read | Cache Write | ΔAI Credits | AI Credits | Duration |
|--:|-------|------:|-------:|-----------:|------------:|-------------:|-----------:|---------:|
</details>

Copilot AI and others added 2 commits June 6, 2026 18:02
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title Update token usage step summary to use AI credits Update token usage step summary to report AI credits Jun 6, 2026
Copilot AI requested a review from pelikhan June 6, 2026 18:05
@pelikhan pelikhan marked this pull request as ready for review June 6, 2026 18:06
Copilot AI review requested due to automatic review settings June 6, 2026 18:06
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 6, 2026

Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #37374 does not have the 'implementation' label (has_implementation_label=false) and has 0 new lines of code in business logic directories (default_business_additions=0, threshold=100). No custom .design-gate.yml config present. Neither enforcement condition is met.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 6, 2026

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 6, 2026

⚠️ PR Code Quality Reviewer failed during code quality review.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the GitHub Actions step summary token-usage reporting to write directly to GITHUB_STEP_SUMMARY (when available) and to present cost using the “AI credits” pricing model, including a simplified table layout and updated legend text.

Changes:

  • Adds helpers to build and append a token-usage section to GITHUB_STEP_SUMMARY, with a compact ### heading and <details> wrapper.
  • Updates token usage markdown to remove effective-token columns and rename cost columns to AI credits, including a new legend.
  • Extends/adjusts JS tests to validate step-summary wiring, the new table shape, and updated wording.
Show a summary per file
File Description
actions/setup/js/parse_token_usage.cjs Adds step-summary section builder + writer that appends to GITHUB_STEP_SUMMARY or falls back to the Actions summary API.
actions/setup/js/parse_token_usage.test.cjs Updates summary assertions and adds coverage for the GITHUB_STEP_SUMMARY append path and section wrapper.
actions/setup/js/parse_mcp_gateway_log.cjs Updates token usage markdown generation to AI credits columns and adds an AI credits legend.
actions/setup/js/parse_mcp_gateway_log.test.cjs Adjusts expectations to match the new AI credits-focused table/legend output.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

  • Files reviewed: 4/4 changed files
  • Comments generated: 3

Comment on lines +101 to +103
function buildStepSummarySection(title, markdown) {
return `### ${title}\n\n<details>\n<summary>Per-request AI credits and token totals</summary>\n\n${markdown}</details>\n\n`;
}
Comment on lines +165 to +166
* Renders one row per request in chronological order with per-request AI credits,
* a running AI credits total, followed by an aggregate totals row and legend.
);

lines.push("");
lines.push("Legend: `Alias` shows the model shorthand used in the table. `ΔAI Credits` is the per-request cost, and `AI Credits` is the running total computed with the current AI credits pricing model.");
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 6, 2026

🧪 Test Quality Sentinel completed test quality analysis.

@github-actions github-actions Bot mentioned this pull request Jun 6, 2026
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skills-Based Review 🧠

Applied /tdd and /zoom-out — approving with minor suggestions on test coverage and naming.

📋 Key Themes & Highlights

Key Themes

  • Test naming drift: One test in parse_mcp_gateway_log.test.cjs kept its old ET-column name after the ET columns were removed; its assertions no longer match what the name implies.
  • Direct unit test gap: appendStepSummarySection is exported but only tested through main(); both its code paths (file write vs. core.summary fallback) merit a focused unit test.
  • Assertion completeness: Two addRaw assertions both match the same single call; adding toHaveBeenCalledTimes(1) would guard against accidental double-writes.

Non-diff note

The module-level JSDoc on line 11 of parse_token_usage.cjs still reads "via core.summary.addDetails" — worth updating to reflect the new direct file-append + addRaw path.

Positive Highlights

  • ✅ Clean extraction of buildStepSummarySection / appendStepSummarySection separates formatting from I/O concerns
  • ✅ New end-to-end GITHUB_STEP_SUMMARY integration test is thorough and uses a real temp file
  • ✅ Terminology rename ("effective tokens" → "AI credits") is consistent across code, comments, and tests
  • ✅ Column removal is clean — formatET import dropped, compoundedET variable removed, table header/total-row aligned correctly

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · sonnet46 1.2M · 366.1 AIC · ⌖ 13.4 AIC

expect(md).not.toContain("effective token");
});

test("compounded ET equals sum of per-turn delta ET values", () => {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] Stale test name — the ET columns were removed, so this test no longer checks ET values in the markdown.

The current assertions only verify **Total** exists and that totalEffectiveTokens > 0 on the summary object — neither of which is specific to the old ET-column behaviour. The name implies a table-content invariant that is no longer asserted.

💡 Suggested rename

Rename to something that reflects what is actually being asserted:

test("totalEffectiveTokens is positive (AIC-only table)", () => {

Also remove or update the inline comment at line 1697 — // The last entry's compounded ET equals totalEffectiveTokens so must appear in the table — since ET no longer appears in the table.

extractRequestId,
readDedupedTokenUsage,
getSummaryTitle,
buildStepSummarySection,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] appendStepSummarySection is exported but not imported or directly unit-tested.

buildStepSummarySection correctly gets its own unit test (line ~546), but the companion function appendStepSummarySection is only exercised indirectly through main(). Since it has two distinct code paths — direct file write vs. core.summary fallback — it's worth adding a focused test for each branch.

💡 Suggested approach

Import the function:

const {
  buildStepSummarySection,
  appendStepSummarySection,
  ...
} = require("./parse_token_usage.cjs");

Then add:

test("appendStepSummarySection writes to GITHUB_STEP_SUMMARY when set", async () => {
  const summaryPath = path.join(tmpDir, "summary.md");
  process.env.GITHUB_STEP_SUMMARY = summaryPath;
  await appendStepSummarySection("Title", "| col |");
  expect(fs.readFileSync(summaryPath, "utf8")).toContain("### Title");
});

test("appendStepSummarySection falls back to core.summary when path is empty", async () => {
  process.env.GITHUB_STEP_SUMMARY = "";
  await appendStepSummarySection("Title", "| col |");
  expect(mockCore.summary.addRaw).toHaveBeenCalled();
  expect(mockCore.summary.write).toHaveBeenCalled();
});

@@ -159,7 +175,8 @@ describe("parse_token_usage", () => {

await main();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] Both assertions match the same single addRaw call, so neither guards against an accidental double-write regression.

Since appendStepSummarySection calls addRaw exactly once with the full section, adding a toHaveBeenCalledTimes(1) check would catch any future refactor that accidentally splits the section into multiple calls.

💡 Suggested addition
expect(mockCore.summary.addRaw).toHaveBeenCalledTimes(1);
expect(mockCore.summary.addRaw).toHaveBeenCalledWith(
  expect.stringContaining("### Token Usage"),
  true
);

The second assertion alone is sufficient to verify content; the toHaveBeenCalledTimes(1) closes the gap.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 6, 2026

🧪 Test Quality Sentinel Report

⚠️ Test Quality Score: 73/100 — Acceptable

Analyzed 12 test(s) across 2 JavaScript test files: 12 design tests, 0 implementation tests, 0 guideline violations. Test inflation detected in one file.

📊 Metrics & Test Classification (12 tests analyzed)
Metric Value
New/modified tests analyzed 12
✅ Design tests (behavioral contracts) 12 (100%)
⚠️ Implementation tests (low value) 0 (0%)
Tests with error/edge cases 5 (42%)
Duplicate test clusters 0
Test inflation detected ✅ Yes — parse_token_usage.test.cjs (+71 test lines / +32 production lines ≈ 2.22:1)
🚨 Coding-guideline violations 0

Test Classification Details

Test File Classification Issues Detected
renders header and table columns parse_mcp_gateway_log.test.cjs ✅ Design Happy path; column-name assertions verify observable output
does not include effective token columns in table parse_mcp_gateway_log.test.cjs ✅ Design Negative assertions verify removed columns stay absent
includes AI credits columns in table header parse_mcp_gateway_log.test.cjs ✅ Design Positive + negative assertions; verifies renamed columns
includes an AI credits legend parse_mcp_gateway_log.test.cjs ✅ Design Verifies legend section content and absence of old terminology
does not include cache efficiency or effective token wording parse_mcp_gateway_log.test.cjs ✅ Design Edge-case: high-cache-read scenario; negative assertions
appends token usage summary to step summary parse_token_usage.test.cjs ✅ Design Updated to verify addRaw heading + content; happy path
uses custom title from env var parse_token_usage.test.cjs ✅ Design Tests custom env var path; no error case
appends token usage section to GITHUB_STEP_SUMMARY when configured parse_token_usage.test.cjs ✅ Design New. Tests alternative code path (env var set); reads back actual file content; asserts addRaw not called
renders model aliases in step summary parse_token_usage.test.cjs ✅ Design Updated to addRaw; multi-model scenario
renders multiple models in step summary parse_token_usage.test.cjs ✅ Design Updated to addRaw; multi-model scenario
renders three models in step summary parse_token_usage.test.cjs ✅ Design Updated to addRaw; three-model scenario
buildStepSummarySection wraps markdown in a heading and details block parse_token_usage.test.cjs ✅ Design New. Verifies structure of formatting helper output

Language Support

Tests analyzed:

  • 🐹 Go (*_test.go): 0 tests — no Go test files changed
  • 🟨 JavaScript (*.test.cjs, *.test.js): 12 tests (vitest)
⚠️ Flagged Tests — Requires Review (2 issues)

⚠️ Test inflation in parse_token_usage.test.cjs

Classification: Test inflation (soft flag — not a hard failure)
Issue: 71 lines added to the test file vs. 32 lines added to the production file — a 2.22:1 ratio, exceeding the 2:1 threshold. The bulk of the growth comes from the new GITHUB_STEP_SUMMARY test, which includes extensive vi.fn() setup boilerplate.
What would break if deleted? The new code path (writing to the file-system step summary when GITHUB_STEP_SUMMARY is set) would have no test coverage.
Suggested improvement: Consider extracting the fs-mock setup into a shared helper to reduce per-test boilerplate and bring the ratio down while preserving coverage.


i️ No error-path tests added

Issue: None of the 12 changed tests include error-path coverage (.toThrow(), .toThrowError(), .rejects). The new GITHUB_STEP_SUMMARY code path has no test for what happens when fs.appendFileSync throws (e.g., unwritable path).
Suggested improvement: Add a test case for the error path in GITHUB_STEP_SUMMARY handling — e.g., fs.appendFileSync throws and the summary falls back or surfaces the error.

Verdict

Check passed. 0% of new tests are implementation tests (threshold: 30%). No coding-guideline violations detected.

📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

  • Assert on observable outputs, return values, or state changes
  • Cover error paths and boundary conditions
  • Would catch a behavioral regression if deleted
  • Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

  • Assert on internal function calls (mocking internals)
  • Only test the happy path with typical inputs
  • Break during legitimate refactoring even when behavior is correct
  • Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.

References: §27069914142

🧪 Test quality analysis by Test Quality Sentinel · sonnet46 1.3M · 403.3 AIC · ⌖ 19.7 AIC ·

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Test Quality Sentinel: 73/100. Test quality is acceptable — 0% of new tests are implementation tests (threshold: 30%). All 12 analyzed tests verify observable behavioral contracts. Minor suggestions: consider error-path coverage for the new GITHUB_STEP_SUMMARY code path.

@pelikhan pelikhan merged commit 0361c4c into main Jun 6, 2026
58 of 65 checks passed
@pelikhan pelikhan deleted the copilot/update-token-usage-section branch June 6, 2026 18:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants