Update token usage step summary to report AI credits by Copilot · Pull Request #37374 · github/gh-aw

Copilot · 2026-06-06T18:04:58Z

The JavaScript-generated token usage section now lands in GITHUB_STEP_SUMMARY and presents cost in AI credits rather than effective tokens. The summary table drops the effective-token columns and updates its legend to match the current pricing model.

Step summary wiring
- Append the token usage section directly to GITHUB_STEP_SUMMARY when the path is available.
- Render the section with a compact ### heading and collapsible <details> block so it reads cleanly in workflow runs.
Cost model terminology
- Replace summary-facing “effective tokens” cost language with “AI credits”.
- Update the legend to define per-request and running AI credit totals using the new model.
Token usage table
- Remove the ΔET and ET columns.
- Keep per-request and cumulative cost visibility via ΔAI Credits and AI Credits.
Coverage
- Extend the targeted JS tests to verify:
  - summary content is written to GITHUB_STEP_SUMMARY
  - the updated AI credits table shape
  - the new legend wording

### Token Usage

<details>
<summary>Per-request AI credits and token totals</summary>

| # | Alias | Input | Output | Cache Read | Cache Write | ΔAI Credits | AI Credits | Duration |
|--:|-------|------:|-------:|-----------:|------------:|-------------:|-----------:|---------:|
</details>

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

github-actions · 2026-06-06T18:08:52Z

✅ Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #37374 does not have the 'implementation' label (has_implementation_label=false) and has 0 new lines of code in business logic directories (default_business_additions=0, threshold=100). No custom .design-gate.yml config present. Neither enforcement condition is met.

github-actions · 2026-06-06T18:08:52Z

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

github-actions · 2026-06-06T18:09:00Z

⚠️ PR Code Quality Reviewer failed during code quality review.

Copilot

Pull request overview

This PR updates the GitHub Actions step summary token-usage reporting to write directly to GITHUB_STEP_SUMMARY (when available) and to present cost using the “AI credits” pricing model, including a simplified table layout and updated legend text.

Changes:

Adds helpers to build and append a token-usage section to GITHUB_STEP_SUMMARY, with a compact ### heading and <details> wrapper.
Updates token usage markdown to remove effective-token columns and rename cost columns to AI credits, including a new legend.
Extends/adjusts JS tests to validate step-summary wiring, the new table shape, and updated wording.

Show a summary per file

File	Description
actions/setup/js/parse_token_usage.cjs	Adds step-summary section builder + writer that appends to `GITHUB_STEP_SUMMARY` or falls back to the Actions summary API.
actions/setup/js/parse_token_usage.test.cjs	Updates summary assertions and adds coverage for the `GITHUB_STEP_SUMMARY` append path and section wrapper.
actions/setup/js/parse_mcp_gateway_log.cjs	Updates token usage markdown generation to AI credits columns and adds an AI credits legend.
actions/setup/js/parse_mcp_gateway_log.test.cjs	Adjusts expectations to match the new AI credits-focused table/legend output.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Files reviewed: 4/4 changed files
Comments generated: 3

+function buildStepSummarySection(title, markdown) {
+  return `### ${title}\n\n<details>\n<summary>Per-request AI credits and token totals</summary>\n\n${markdown}</details>\n\n`;
+}


+ * Renders one row per request in chronological order with per-request AI credits,
+ * a running AI credits total, followed by an aggregate totals row and legend.


  );

+  lines.push("");
+  lines.push("Legend: `Alias` shows the model shorthand used in the table. `ΔAI Credits` is the per-request cost, and `AI Credits` is the running total computed with the current AI credits pricing model.");


github-actions · 2026-06-06T18:09:39Z

🧪 Test Quality Sentinel completed test quality analysis.

github-actions

Skills-Based Review 🧠

Applied /tdd and /zoom-out — approving with minor suggestions on test coverage and naming.

📋 Key Themes & Highlights

Key Themes

Test naming drift: One test in parse_mcp_gateway_log.test.cjs kept its old ET-column name after the ET columns were removed; its assertions no longer match what the name implies.
Direct unit test gap: appendStepSummarySection is exported but only tested through main(); both its code paths (file write vs. core.summary fallback) merit a focused unit test.
Assertion completeness: Two addRaw assertions both match the same single call; adding toHaveBeenCalledTimes(1) would guard against accidental double-writes.

Non-diff note

The module-level JSDoc on line 11 of parse_token_usage.cjs still reads "via core.summary.addDetails" — worth updating to reflect the new direct file-append + addRaw path.

Positive Highlights

✅ Clean extraction of buildStepSummarySection / appendStepSummarySection separates formatting from I/O concerns
✅ New end-to-end GITHUB_STEP_SUMMARY integration test is thorough and uses a real temp file
✅ Terminology rename ("effective tokens" → "AI credits") is consistent across code, comments, and tests
✅ Column removal is clean — formatET import dropped, compoundedET variable removed, table header/total-row aligned correctly

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · sonnet46 1.2M · 366.1 AIC · ⌖ 13.4 AIC

github-actions · 2026-06-06T18:21:20Z

+      expect(md).not.toContain("effective token");
    });

    test("compounded ET equals sum of per-turn delta ET values", () => {


[/tdd] Stale test name — the ET columns were removed, so this test no longer checks ET values in the markdown.

The current assertions only verify **Total** exists and that totalEffectiveTokens > 0 on the summary object — neither of which is specific to the old ET-column behaviour. The name implies a table-content invariant that is no longer asserted.

💡 Suggested rename

Rename to something that reflects what is actually being asserted:

test("totalEffectiveTokens is positive (AIC-only table)", () => {

Also remove or update the inline comment at line 1697 — // The last entry's compounded ET equals totalEffectiveTokens so must appear in the table — since ET no longer appears in the table.

github-actions · 2026-06-06T18:21:21Z

+  extractRequestId,
+  readDedupedTokenUsage,
+  getSummaryTitle,
+  buildStepSummarySection,


[/tdd] appendStepSummarySection is exported but not imported or directly unit-tested.

buildStepSummarySection correctly gets its own unit test (line ~546), but the companion function appendStepSummarySection is only exercised indirectly through main(). Since it has two distinct code paths — direct file write vs. core.summary fallback — it's worth adding a focused test for each branch.

💡 Suggested approach

Import the function:

const { buildStepSummarySection, appendStepSummarySection, ... } = require("./parse_token_usage.cjs");

Then add:

test("appendStepSummarySection writes to GITHUB_STEP_SUMMARY when set", async () => { const summaryPath = path.join(tmpDir, "summary.md"); process.env.GITHUB_STEP_SUMMARY = summaryPath; await appendStepSummarySection("Title", "| col |"); expect(fs.readFileSync(summaryPath, "utf8")).toContain("### Title"); }); test("appendStepSummarySection falls back to core.summary when path is empty", async () => { process.env.GITHUB_STEP_SUMMARY = ""; await appendStepSummarySection("Title", "| col |"); expect(mockCore.summary.addRaw).toHaveBeenCalled(); expect(mockCore.summary.write).toHaveBeenCalled(); });

github-actions · 2026-06-06T18:21:21Z

@@ -159,7 +175,8 @@ describe("parse_token_usage", () => {

      await main();



[/tdd] Both assertions match the same single addRaw call, so neither guards against an accidental double-write regression.

Since appendStepSummarySection calls addRaw exactly once with the full section, adding a toHaveBeenCalledTimes(1) check would catch any future refactor that accidentally splits the section into multiple calls.

💡 Suggested addition

expect(mockCore.summary.addRaw).toHaveBeenCalledTimes(1); expect(mockCore.summary.addRaw).toHaveBeenCalledWith( expect.stringContaining("### Token Usage"), true );

The second assertion alone is sufficient to verify content; the toHaveBeenCalledTimes(1) closes the gap.

github-actions · 2026-06-06T18:22:49Z

🧪 Test Quality Sentinel Report

⚠️ Test Quality Score: 73/100 — Acceptable

Analyzed 12 test(s) across 2 JavaScript test files: 12 design tests, 0 implementation tests, 0 guideline violations. Test inflation detected in one file.

📊 Metrics & Test Classification (12 tests analyzed)

Metric	Value
New/modified tests analyzed	12
✅ Design tests (behavioral contracts)	12 (100%)
⚠️ Implementation tests (low value)	0 (0%)
Tests with error/edge cases	5 (42%)
Duplicate test clusters	0
Test inflation detected	✅ Yes — `parse_token_usage.test.cjs` (+71 test lines / +32 production lines ≈ 2.22:1)
🚨 Coding-guideline violations	0

Test Classification Details

Test	File	Classification	Issues Detected
`renders header and table columns`	`parse_mcp_gateway_log.test.cjs`	✅ Design	Happy path; column-name assertions verify observable output
`does not include effective token columns in table`	`parse_mcp_gateway_log.test.cjs`	✅ Design	Negative assertions verify removed columns stay absent
`includes AI credits columns in table header`	`parse_mcp_gateway_log.test.cjs`	✅ Design	Positive + negative assertions; verifies renamed columns
`includes an AI credits legend`	`parse_mcp_gateway_log.test.cjs`	✅ Design	Verifies legend section content and absence of old terminology
`does not include cache efficiency or effective token wording`	`parse_mcp_gateway_log.test.cjs`	✅ Design	Edge-case: high-cache-read scenario; negative assertions
`appends token usage summary to step summary`	`parse_token_usage.test.cjs`	✅ Design	Updated to verify `addRaw` heading + content; happy path
`uses custom title from env var`	`parse_token_usage.test.cjs`	✅ Design	Tests custom env var path; no error case
`appends token usage section to GITHUB_STEP_SUMMARY when configured`	`parse_token_usage.test.cjs`	✅ Design	New. Tests alternative code path (env var set); reads back actual file content; asserts `addRaw` not called
`renders model aliases in step summary`	`parse_token_usage.test.cjs`	✅ Design	Updated to `addRaw`; multi-model scenario
`renders multiple models in step summary`	`parse_token_usage.test.cjs`	✅ Design	Updated to `addRaw`; multi-model scenario
`renders three models in step summary`	`parse_token_usage.test.cjs`	✅ Design	Updated to `addRaw`; three-model scenario
`buildStepSummarySection wraps markdown in a heading and details block`	`parse_token_usage.test.cjs`	✅ Design	New. Verifies structure of formatting helper output

Language Support

Tests analyzed:

🐹 Go (*_test.go): 0 tests — no Go test files changed
🟨 JavaScript (*.test.cjs, *.test.js): 12 tests (vitest)

⚠️ Flagged Tests — Requires Review (2 issues)

⚠️ Test inflation in `parse_token_usage.test.cjs`

Classification: Test inflation (soft flag — not a hard failure)
Issue: 71 lines added to the test file vs. 32 lines added to the production file — a 2.22:1 ratio, exceeding the 2:1 threshold. The bulk of the growth comes from the new GITHUB_STEP_SUMMARY test, which includes extensive vi.fn() setup boilerplate.
What would break if deleted? The new code path (writing to the file-system step summary when GITHUB_STEP_SUMMARY is set) would have no test coverage.
Suggested improvement: Consider extracting the fs-mock setup into a shared helper to reduce per-test boilerplate and bring the ratio down while preserving coverage.

i️ No error-path tests added

Issue: None of the 12 changed tests include error-path coverage (.toThrow(), .toThrowError(), .rejects). The new GITHUB_STEP_SUMMARY code path has no test for what happens when fs.appendFileSync throws (e.g., unwritable path).
Suggested improvement: Add a test case for the error path in GITHUB_STEP_SUMMARY handling — e.g., fs.appendFileSync throws and the summary falls back or surfaces the error.

Verdict

✅ Check passed. 0% of new tests are implementation tests (threshold: 30%). No coding-guideline violations detected.

📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

Assert on observable outputs, return values, or state changes
Cover error paths and boundary conditions
Would catch a behavioral regression if deleted
Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

Assert on internal function calls (mocking internals)
Only test the happy path with typical inputs
Break during legitimate refactoring even when behavior is correct
Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.

References: §27069914142

🧪 Test quality analysis by Test Quality Sentinel · sonnet46 1.3M · 403.3 AIC · ⌖ 19.7 AIC · ◷

github-actions

✅ Test Quality Sentinel: 73/100. Test quality is acceptable — 0% of new tests are implementation tests (threshold: 30%). All 12 analyzed tests verify observable behavioral contracts. Minor suggestions: consider error-path coverage for the new GITHUB_STEP_SUMMARY code path.

Copilot AI and others added 2 commits June 6, 2026 18:02

Update token usage summary wording

a087ecd

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Ensure token usage summary reaches step summary

217bc03

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI assigned Copilot and pelikhan Jun 6, 2026

Copilot AI changed the title ~~Update token usage step summary to use AI credits~~ Update token usage step summary to report AI credits Jun 6, 2026

Copilot created this pull request from a session on behalf of pelikhan June 6, 2026 18:05 View session

Copilot finished work on behalf of pelikhan June 6, 2026 18:05

Copilot AI requested a review from pelikhan June 6, 2026 18:05

pelikhan marked this pull request as ready for review June 6, 2026 18:06

Copilot AI review requested due to automatic review settings June 6, 2026 18:06

Copilot started reviewing on behalf of pelikhan June 6, 2026 18:07 View session

Copilot AI reviewed Jun 6, 2026

View reviewed changes

github-actions Bot mentioned this pull request Jun 6, 2026

[aw] No-Op Runs #37180

Closed

github-actions Bot approved these changes Jun 6, 2026

View reviewed changes

pelikhan merged commit 0361c4c into main Jun 6, 2026
58 of 65 checks passed

pelikhan deleted the copilot/update-token-usage-section branch June 6, 2026 18:26

		* Renders one row per request in chronological order with per-request AI credits,
		* a running AI credits total, followed by an aggregate totals row and legend.

		@@ -159,7 +175,8 @@ describe("parse_token_usage", () => {

		await main();

Conversation

Copilot AI commented Jun 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Jun 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Jun 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Jun 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Copilot's findings

Uh oh!

github-actions Bot commented Jun 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Skills-Based Review 🧠

Key Themes

Non-diff note

Positive Highlights

Uh oh!

github-actions Bot Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Jun 6, 2026

🧪 Test Quality Sentinel Report

Test Classification Details

Language Support

⚠️ Test inflation in parse_token_usage.test.cjs

i️ No error-path tests added

Verdict

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Copilot AI commented Jun 6, 2026 •

edited

Loading

github-actions Bot commented Jun 6, 2026 •

edited

Loading

github-actions Bot commented Jun 6, 2026 •

edited

Loading

github-actions Bot commented Jun 6, 2026 •

edited

Loading

github-actions Bot commented Jun 6, 2026 •

edited

Loading

⚠️ Test inflation in `parse_token_usage.test.cjs`