Add structured diagnostics to the daily workflow ET guardrail by Copilot · Pull Request #36164 · github/gh-aw

Copilot · 2026-06-01T01:29:54Z

The daily effective-token guardrail did not emit enough runtime detail to explain what it queried or why a run was counted or skipped. This change adds structured logs around workflow run discovery, artifact selection/download, and ET accumulation so workflow output is debuggable without re-running blindly.

What changed
- Added a small structured logging helper for the daily ET guardrail with a stable [daily-workflow-et] prefix.
- Logged the resolved execution context up front: workflow, actor, threshold, and API budget.
- Logged workflow run pagination and the run IDs returned for each query page.
- Logged candidate run selection before artifact inspection, including truncation due to rate-limit headroom.
- Logged artifact enumeration, chosen artifact ID/name, download location, resolved token-usage file, and computed ET per run.
- Logged ET accumulation state after each counted run and the final inspection summary.
Guardrails in the logging
- Made log serialization defensive so diagnostics do not fail the guardrail path.
- Tightened malformed-artifact handling to avoid ambiguous logs when artifact metadata is incomplete.
- Kept the new output additive; guardrail decisions and summary behavior remain unchanged.
Tests
- Updated the JS test coverage to assert the structured log message format without relying on brittle string ordering.

Example

logDailyGuardrail("Updated current ET state", {
  runId: run.id,
  runEffectiveTokens,
  currentEffectiveTokens: totalEffectiveTokens,
  threshold,
  countedRunIds: countedRuns.map(item => item.id),
});

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot

Pull request overview

This PR improves debuggability of the daily effective-token (ET) workflow guardrail by adding consistently prefixed, structured diagnostics across workflow-run discovery, artifact selection/download, and per-run ET accumulation, plus a test to validate the log format.

Changes:

Added formatDailyGuardrailLogMessage / logDailyGuardrail helpers and emitted structured log lines at key inspection steps.
Hardened artifact selection by skipping artifacts missing required metadata (e.g., missing id) and logging the reason.
Updated JS tests to assert the structured log format in a stable way (prefix + JSON payload).

Show a summary per file

File	Description
actions/setup/js/check_daily_effective_workflow_guardrail.cjs	Adds structured diagnostic logging around run pagination, artifact discovery/download, and ET accumulation.
actions/setup/js/check_daily_effective_workflow_guardrail.test.cjs	Adds coverage for the structured log message formatter.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Files reviewed: 2/2 changed files
Comments generated: 3

+  const artifactSummaries = artifacts.map(item => ({ id: item?.id ?? null, name: item?.name || "" }));
+  logDailyGuardrail("Listed workflow artifacts", {
+    runId,
+    artifactCount: artifacts.length,
+    artifacts: artifactSummaries,
+  });


+    logDailyGuardrail("Received workflow runs page", {
+      page,
+      runCount: runs.length,
+      runIds: runs.map(run => run?.id).filter(Boolean),
+    });


+  logDailyGuardrail("Prepared candidate workflow runs for artifact inspection", {
+    candidateRunsCount: candidateRuns.length,
+    candidateRunIds: candidateRuns.map(run => run.id),
+    maxInspectableRuns,
+    truncatedByRateLimit,
+  });


github-actions · 2026-06-01T03:01:54Z

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

github-actions · 2026-06-01T03:01:56Z

✅ Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #36164 does not have the 'implementation' label (has_implementation_label=false) and has 0 new lines of code in business logic directories (default_business_additions=0, threshold=100). Neither enforcement condition is met.

github-actions · 2026-06-01T03:01:59Z

🧪 Test Quality Sentinel completed test quality analysis.

github-actions · 2026-06-01T03:02:00Z

✅ PR Code Quality Reviewer completed the code quality review.

github-actions

🔎 Code quality review by PR Code Quality Reviewer · sonnet46 754.1K

github-actions · 2026-06-01T03:05:25Z

+      logDailyGuardrail("Updated current ET state", {
+        runId: run.id,
+        runEffectiveTokens,
+        currentEffectiveTokens: totalEffectiveTokens,


O(n2) log output inside the per-run loop: countedRunIds: countedRuns.map(item => item.id) remaps the full (and ever-growing) countedRuns array on every iteration, producing quadratic work and log volume as counted runs accumulate.

💡 Suggested fix

Log only the most recently added run ID, not the full list on each iteration:

logDailyGuardrail("Updated current ET state", { runId: run.id, runEffectiveTokens, currentEffectiveTokens: totalEffectiveTokens, threshold, countedRunsCount: countedRuns.length, // just the count });

The full countedRunIds list is already emitted once at the end in the "Completed ET inspection window" log, so repeating it on every iteration adds no diagnostic value and compounds the log-volume concern already flagged in existing review threads.

github-actions · 2026-06-01T03:06:47Z

🧪 Test Quality Sentinel Report

✅ Test Quality Score: 100/100 — Excellent

Analyzed 7 test(s): 7 design, 0 implementation, 0 guideline violation(s).

📊 Metrics & Test Classification (7 tests analyzed)

Metric	Value
New/modified tests analyzed	7
✅ Design tests (behavioral contracts)	7 (100%)
⚠️ Implementation tests (low value)	0 (0%)
Tests with error/edge cases	6 (86%)
Duplicate test clusters	0
Test inflation detected	No (16 test lines added vs 116 production lines — ratio 0.14)
🚨 Coding-guideline violations	0

Test Classification Details

Test	File	Classification	Issues Detected
`skips workflow_call, repository_dispatch, and workflow_dispatch with aw_context`	`actions/setup/js/check_daily_effective_workflow_guardrail.test.cjs`	✅ Design	—
`matches both firewall-audit-logs and unified agent artifacts`	`actions/setup/js/check_daily_effective_workflow_guardrail.test.cjs`	✅ Design	—
`sums effective tokens from explicit token-usage entries`	`actions/setup/js/check_daily_effective_workflow_guardrail.test.cjs`	✅ Design	—
`computes aggregate ET statistics for prior runs`	`actions/setup/js/check_daily_effective_workflow_guardrail.test.cjs`	✅ Design	—
`caps inspection when GitHub API rate limit headroom is low`	`actions/setup/js/check_daily_effective_workflow_guardrail.test.cjs`	✅ Design	—
`formats structured daily ET log messages`	`actions/setup/js/check_daily_effective_workflow_guardrail.test.cjs`	✅ Design	—
`renders a daily ET details summary with stats and prior runs`	`actions/setup/js/check_daily_effective_workflow_guardrail.test.cjs`	✅ Design	—

Language Support

Tests analyzed:

🐹 Go (*_test.go): 0 tests
🟨 JavaScript (*.test.cjs, *.test.js): 7 tests (vitest)

Verdict

✅ Check passed. 0% of new tests are implementation tests (threshold: 30%). All 7 tests verify observable behavioral contracts with strong edge-case and boundary coverage.

📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

Assert on observable outputs, return values, or state changes
Cover error paths and boundary conditions
Would catch a behavioral regression if deleted
Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

Assert on internal function calls (mocking internals)
Only test the happy path with typical inputs
Break during legitimate refactoring even when behavior is correct
Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.

🧪 Test quality analysis by Test Quality Sentinel · sonnet46 1.1M · ◷

github-actions

✅ Test Quality Sentinel: 100/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%).

github-actions

Skills-Based Review 🧠

Applied /tdd and /zoom-out — requesting changes on test-coverage gaps for the two bundled behaviour changes.

📋 Key Themes & Highlights

Key Themes

Behaviour changes without tests: Two silent-return guards (item?.name && and !artifact.id) alter existing control flow but have no test coverage. These are the blocking items.
Log verbosity at scale: runIds per-page (up to 100 IDs × multiple pages) and countedRunIds per-iteration (growing array) could generate unexpectedly large log output in high-volume repos.
Error context lost in catch: The formatter's catch block discards the original serialisation error, making future debugging harder.

Positive Highlights

✅ Stable [daily-workflow-et] prefix makes log lines grep-friendly and easy to filter.
✅ Defensive try/catch around JSON.stringify is exactly right — diagnostics must not break the guardrail path.
✅ formatDailyGuardrailLogMessage is correctly exported and unit-tested; the test's JSON.parse assertion is a clean way to verify structured output without brittle string matching.
✅ Structured log at the final "Completed ET inspection window" gives a complete picture in one place.

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · sonnet46 1.8M

github-actions · 2026-06-01T03:09:03Z

+  }
+  if (!artifact.id) {
+    logDailyGuardrail("Skipping guardrail artifact without an id", {
+      runId,


[/tdd] The new artifact.id null-guard is a behaviour change with no test coverage — an artifact whose name matches but whose id is falsy will silently yield 0 ET instead of proceeding to downloadArtifact.

💡 Suggested test

Add a unit test that stubs listArtifacts to return [{ id: undefined, name: "guardrail-artifact" }] and asserts that getRunEffectiveTokens returns 0 and emits the "Skipping guardrail artifact without an id" log message.

This pins both the guard and the new structured log emitted on that path.

github-actions · 2026-06-01T03:09:03Z

+  });

-  const artifact = artifacts.find(item => matchesGuardrailArtifactName(item.name));
+  const artifact = artifacts.find(item => item?.name && matchesGuardrailArtifactName(item.name));


[/tdd] Changing the predicate from matchesGuardrailArtifactName(item.name) to item?.name && matchesGuardrailArtifactName(item.name) is a behaviour change bundled into logging work — artifacts with a falsy name are now silently skipped. No test covers this path.

💡 Suggested test

Add a case where listArtifacts returns [{ id: 1, name: "" }, { id: 2, name: "guardrail-artifact" }] and assert only the second artifact is inspected. This prevents the guard from regressing silently.

github-actions · 2026-06-01T03:09:03Z

+        runId: run.id,
+        runEffectiveTokens,
+        currentEffectiveTokens: totalEffectiveTokens,
+        threshold,


[/zoom-out] countedRunIds is re-serialised on every iteration of the inspection loop. With up to maxInspectableRuns runs, the emitted log volume grows O(n2) — the final log line could list dozens of IDs already reported in earlier iterations.

💡 Suggested fix

Log only the newly-added run ID and the current count rather than the full accumulated array each time:

logDailyGuardrail("Updated current ET state", { runId: run.id, runEffectiveTokens, currentEffectiveTokens: totalEffectiveTokens, threshold, countedRunCount: countedRuns.length, // count, not the full array });

The complete list is already captured in the final "Completed ET inspection window" log, so the per-iteration array is redundant.

github-actions · 2026-06-01T03:09:03Z

+    serializedDetails = JSON.stringify(details);
+  } catch {
+    serializedDetails = JSON.stringify({ error: "failed to serialize log details" });
+  }


[/improve-codebase-architecture] The catch block in formatDailyGuardrailLogMessage swallows the original serialisation error completely, making it hard to know why serialisation failed during debugging.

💡 Suggested improvement

} catch (err) { serializedDetails = JSON.stringify({ error: "failed to serialize log details", reason: err instanceof Error ? err.message : String(err), }); }

Adding reason preserves the intent of defensive logging while still surfacing enough context to diagnose circular-reference or BigInt serialisation failures.

github-actions · 2026-06-01T03:09:03Z

+    logDailyGuardrail("Received workflow runs page", {
+      page,
+      runCount: runs.length,
+      runIds: runs.map(run => run?.id).filter(Boolean),


[/zoom-out] runIds: runs.map(run => run?.id).filter(Boolean) can produce an array of up to 100 elements per paginated page. With MAX_WORKFLOW_RUN_PAGES pages this could generate megabytes of log output in a busy repo, and the IDs are already captured in the subsequent "Prepared candidate workflow runs" log.

💡 Suggested fix

Replace the full ID array with a count, matching the style used in nearby logs:

logDailyGuardrail("Received workflow runs page", { page, runCount: runs.length, // runIds omitted — captured in candidateRunIds after filtering });

If per-page IDs are genuinely useful, consider capping the array: runs.slice(0, 20).map(r => r?.id).

github-actions · 2026-06-01T03:09:03Z

+      currentEffectiveTokens: 789,
+    });
+    expect(exports.formatDailyGuardrailLogMessage("Completed ET inspection window")).toBe("[daily-workflow-et] Completed ET inspection window");
+  });


[/tdd] logDailyGuardrail is not exported and therefore cannot be directly unit-tested. The two new guard branches (item?.name && and !artifact.id) are also not covered. The single new test only verifies the formatter, leaving the runtime paths dark.

💡 Suggestions

Export logDailyGuardrail (or extract its core.info call behind a seam) so tests can assert it was called with the right message.

Add tests for the guard branches in a getRunEffectiveTokens-level test: one where artifact.name is "" and one where artifact.id is undefined, each asserting the function returns 0.

Split the two formatDailyGuardrailLogMessage assertions into separate it() blocks — one for the details path, one for the no-details path — so failure messages are unambiguous.

Copilot AI and others added 7 commits June 1, 2026 00:13

Add daily ET guardrail logging

a8e35e1

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Format guardrail logging test

444d873

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Harden guardrail logging test

5aeda1f

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Resolve guardrail logging review nits

c47ca96

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Harden daily ET logging helpers

9f4fccc

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Tighten guardrail artifact checks

7e37cb8

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Polish guardrail log fields

95f8426

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI assigned Copilot and pelikhan Jun 1, 2026

Copilot created this pull request from a session on behalf of pelikhan June 1, 2026 01:29 View session

pelikhan marked this pull request as ready for review June 1, 2026 01:30

Copilot AI review requested due to automatic review settings June 1, 2026 01:30

Copilot started reviewing on behalf of pelikhan June 1, 2026 01:30 View session

Copilot AI reviewed Jun 1, 2026

View reviewed changes

github-actions Bot mentioned this pull request Jun 1, 2026

[Contribution Check Report] Contribution Check — 2026-06-01 #36165

Open

pelikhan merged commit 4f17e34 into main Jun 1, 2026
21 of 27 checks passed

pelikhan deleted the copilot/add-logging-for-query-insights branch June 1, 2026 03:01

github-actions Bot mentioned this pull request Jun 1, 2026

[aw] No-Op Runs #36156

Open

github-actions Bot reviewed Jun 1, 2026

View reviewed changes

github-actions Bot approved these changes Jun 1, 2026

View reviewed changes

github-actions Bot requested changes Jun 1, 2026

View reviewed changes

github-actions Bot mentioned this pull request Jun 1, 2026

[blog] Weekly blog post – 2026-06-01 #36178

Merged

Conversation

Copilot AI commented Jun 1, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Copilot's findings

Uh oh!

Uh oh!

github-actions Bot commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Jun 1, 2026

🧪 Test Quality Sentinel Report

Test Classification Details

Language Support

Verdict

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Skills-Based Review 🧠

Key Themes

Positive Highlights

Uh oh!

github-actions Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

github-actions Bot commented Jun 1, 2026 •

edited

Loading

github-actions Bot commented Jun 1, 2026 •

edited

Loading

github-actions Bot commented Jun 1, 2026 •

edited

Loading

github-actions Bot commented Jun 1, 2026 •

edited

Loading