Skip to content

Add structured diagnostics to the daily workflow ET guardrail#36164

Merged
pelikhan merged 7 commits into
mainfrom
copilot/add-logging-for-query-insights
Jun 1, 2026
Merged

Add structured diagnostics to the daily workflow ET guardrail#36164
pelikhan merged 7 commits into
mainfrom
copilot/add-logging-for-query-insights

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Jun 1, 2026

The daily effective-token guardrail did not emit enough runtime detail to explain what it queried or why a run was counted or skipped. This change adds structured logs around workflow run discovery, artifact selection/download, and ET accumulation so workflow output is debuggable without re-running blindly.

  • What changed

    • Added a small structured logging helper for the daily ET guardrail with a stable [daily-workflow-et] prefix.
    • Logged the resolved execution context up front: workflow, actor, threshold, and API budget.
    • Logged workflow run pagination and the run IDs returned for each query page.
    • Logged candidate run selection before artifact inspection, including truncation due to rate-limit headroom.
    • Logged artifact enumeration, chosen artifact ID/name, download location, resolved token-usage file, and computed ET per run.
    • Logged ET accumulation state after each counted run and the final inspection summary.
  • Guardrails in the logging

    • Made log serialization defensive so diagnostics do not fail the guardrail path.
    • Tightened malformed-artifact handling to avoid ambiguous logs when artifact metadata is incomplete.
    • Kept the new output additive; guardrail decisions and summary behavior remain unchanged.
  • Tests

    • Updated the JS test coverage to assert the structured log message format without relying on brittle string ordering.
  • Example

    logDailyGuardrail("Updated current ET state", {
      runId: run.id,
      runEffectiveTokens,
      currentEffectiveTokens: totalEffectiveTokens,
      threshold,
      countedRunIds: countedRuns.map(item => item.id),
    });

Copilot AI and others added 7 commits June 1, 2026 00:13
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
@pelikhan pelikhan marked this pull request as ready for review June 1, 2026 01:30
Copilot AI review requested due to automatic review settings June 1, 2026 01:30
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves debuggability of the daily effective-token (ET) workflow guardrail by adding consistently prefixed, structured diagnostics across workflow-run discovery, artifact selection/download, and per-run ET accumulation, plus a test to validate the log format.

Changes:

  • Added formatDailyGuardrailLogMessage / logDailyGuardrail helpers and emitted structured log lines at key inspection steps.
  • Hardened artifact selection by skipping artifacts missing required metadata (e.g., missing id) and logging the reason.
  • Updated JS tests to assert the structured log format in a stable way (prefix + JSON payload).
Show a summary per file
File Description
actions/setup/js/check_daily_effective_workflow_guardrail.cjs Adds structured diagnostic logging around run pagination, artifact discovery/download, and ET accumulation.
actions/setup/js/check_daily_effective_workflow_guardrail.test.cjs Adds coverage for the structured log message formatter.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

  • Files reviewed: 2/2 changed files
  • Comments generated: 3

Comment on lines +98 to +103
const artifactSummaries = artifacts.map(item => ({ id: item?.id ?? null, name: item?.name || "" }));
logDailyGuardrail("Listed workflow artifacts", {
runId,
artifactCount: artifacts.length,
artifacts: artifactSummaries,
});
Comment on lines +360 to +364
logDailyGuardrail("Received workflow runs page", {
page,
runCount: runs.length,
runIds: runs.map(run => run?.id).filter(Boolean),
});
Comment on lines +387 to +392
logDailyGuardrail("Prepared candidate workflow runs for artifact inspection", {
candidateRunsCount: candidateRuns.length,
candidateRunIds: candidateRuns.map(run => run.id),
maxInspectableRuns,
truncatedByRateLimit,
});
@pelikhan pelikhan merged commit 4f17e34 into main Jun 1, 2026
21 of 27 checks passed
@pelikhan pelikhan deleted the copilot/add-logging-for-query-insights branch June 1, 2026 03:01
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 1, 2026

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 1, 2026

Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #36164 does not have the 'implementation' label (has_implementation_label=false) and has 0 new lines of code in business logic directories (default_business_additions=0, threshold=100). Neither enforcement condition is met.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 1, 2026

🧪 Test Quality Sentinel completed test quality analysis.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 1, 2026

PR Code Quality Reviewer completed the code quality review.

@github-actions github-actions Bot mentioned this pull request Jun 1, 2026
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔎 Code quality review by PR Code Quality Reviewer · sonnet46 754.1K

logDailyGuardrail("Updated current ET state", {
runId: run.id,
runEffectiveTokens,
currentEffectiveTokens: totalEffectiveTokens,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

O(n2) log output inside the per-run loop: countedRunIds: countedRuns.map(item => item.id) remaps the full (and ever-growing) countedRuns array on every iteration, producing quadratic work and log volume as counted runs accumulate.

💡 Suggested fix

Log only the most recently added run ID, not the full list on each iteration:

logDailyGuardrail("Updated current ET state", {
  runId: run.id,
  runEffectiveTokens,
  currentEffectiveTokens: totalEffectiveTokens,
  threshold,
  countedRunsCount: countedRuns.length,  // just the count
});

The full countedRunIds list is already emitted once at the end in the "Completed ET inspection window" log, so repeating it on every iteration adds no diagnostic value and compounds the log-volume concern already flagged in existing review threads.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 1, 2026

🧪 Test Quality Sentinel Report

Test Quality Score: 100/100 — Excellent

Analyzed 7 test(s): 7 design, 0 implementation, 0 guideline violation(s).

📊 Metrics & Test Classification (7 tests analyzed)
Metric Value
New/modified tests analyzed 7
✅ Design tests (behavioral contracts) 7 (100%)
⚠️ Implementation tests (low value) 0 (0%)
Tests with error/edge cases 6 (86%)
Duplicate test clusters 0
Test inflation detected No (16 test lines added vs 116 production lines — ratio 0.14)
🚨 Coding-guideline violations 0

Test Classification Details

Test File Classification Issues Detected
skips workflow_call, repository_dispatch, and workflow_dispatch with aw_context actions/setup/js/check_daily_effective_workflow_guardrail.test.cjs ✅ Design
matches both firewall-audit-logs and unified agent artifacts actions/setup/js/check_daily_effective_workflow_guardrail.test.cjs ✅ Design
sums effective tokens from explicit token-usage entries actions/setup/js/check_daily_effective_workflow_guardrail.test.cjs ✅ Design
computes aggregate ET statistics for prior runs actions/setup/js/check_daily_effective_workflow_guardrail.test.cjs ✅ Design
caps inspection when GitHub API rate limit headroom is low actions/setup/js/check_daily_effective_workflow_guardrail.test.cjs ✅ Design
formats structured daily ET log messages actions/setup/js/check_daily_effective_workflow_guardrail.test.cjs ✅ Design
renders a daily ET details summary with stats and prior runs actions/setup/js/check_daily_effective_workflow_guardrail.test.cjs ✅ Design

Language Support

Tests analyzed:

  • 🐹 Go (*_test.go): 0 tests
  • 🟨 JavaScript (*.test.cjs, *.test.js): 7 tests (vitest)

Verdict

Check passed. 0% of new tests are implementation tests (threshold: 30%). All 7 tests verify observable behavioral contracts with strong edge-case and boundary coverage.

📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

  • Assert on observable outputs, return values, or state changes
  • Cover error paths and boundary conditions
  • Would catch a behavioral regression if deleted
  • Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

  • Assert on internal function calls (mocking internals)
  • Only test the happy path with typical inputs
  • Break during legitimate refactoring even when behavior is correct
  • Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.

🧪 Test quality analysis by Test Quality Sentinel · sonnet46 1.1M ·

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Test Quality Sentinel: 100/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%).

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skills-Based Review 🧠

Applied /tdd and /zoom-out — requesting changes on test-coverage gaps for the two bundled behaviour changes.

📋 Key Themes & Highlights

Key Themes

  • Behaviour changes without tests: Two silent-return guards (item?.name && and !artifact.id) alter existing control flow but have no test coverage. These are the blocking items.
  • Log verbosity at scale: runIds per-page (up to 100 IDs × multiple pages) and countedRunIds per-iteration (growing array) could generate unexpectedly large log output in high-volume repos.
  • Error context lost in catch: The formatter's catch block discards the original serialisation error, making future debugging harder.

Positive Highlights

  • ✅ Stable [daily-workflow-et] prefix makes log lines grep-friendly and easy to filter.
  • ✅ Defensive try/catch around JSON.stringify is exactly right — diagnostics must not break the guardrail path.
  • formatDailyGuardrailLogMessage is correctly exported and unit-tested; the test's JSON.parse assertion is a clean way to verify structured output without brittle string matching.
  • ✅ Structured log at the final "Completed ET inspection window" gives a complete picture in one place.

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · sonnet46 1.8M

}
if (!artifact.id) {
logDailyGuardrail("Skipping guardrail artifact without an id", {
runId,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] The new artifact.id null-guard is a behaviour change with no test coverage — an artifact whose name matches but whose id is falsy will silently yield 0 ET instead of proceeding to downloadArtifact.

💡 Suggested test

Add a unit test that stubs listArtifacts to return [{ id: undefined, name: "guardrail-artifact" }] and asserts that getRunEffectiveTokens returns 0 and emits the "Skipping guardrail artifact without an id" log message.

This pins both the guard and the new structured log emitted on that path.

});

const artifact = artifacts.find(item => matchesGuardrailArtifactName(item.name));
const artifact = artifacts.find(item => item?.name && matchesGuardrailArtifactName(item.name));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] Changing the predicate from matchesGuardrailArtifactName(item.name) to item?.name && matchesGuardrailArtifactName(item.name) is a behaviour change bundled into logging work — artifacts with a falsy name are now silently skipped. No test covers this path.

💡 Suggested test

Add a case where listArtifacts returns [{ id: 1, name: "" }, { id: 2, name: "guardrail-artifact" }] and assert only the second artifact is inspected. This prevents the guard from regressing silently.

runId: run.id,
runEffectiveTokens,
currentEffectiveTokens: totalEffectiveTokens,
threshold,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/zoom-out] countedRunIds is re-serialised on every iteration of the inspection loop. With up to maxInspectableRuns runs, the emitted log volume grows O(n2) — the final log line could list dozens of IDs already reported in earlier iterations.

💡 Suggested fix

Log only the newly-added run ID and the current count rather than the full accumulated array each time:

logDailyGuardrail("Updated current ET state", {
  runId: run.id,
  runEffectiveTokens,
  currentEffectiveTokens: totalEffectiveTokens,
  threshold,
  countedRunCount: countedRuns.length,  // count, not the full array
});

The complete list is already captured in the final "Completed ET inspection window" log, so the per-iteration array is redundant.

serializedDetails = JSON.stringify(details);
} catch {
serializedDetails = JSON.stringify({ error: "failed to serialize log details" });
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/improve-codebase-architecture] The catch block in formatDailyGuardrailLogMessage swallows the original serialisation error completely, making it hard to know why serialisation failed during debugging.

💡 Suggested improvement
} catch (err) {
  serializedDetails = JSON.stringify({
    error: "failed to serialize log details",
    reason: err instanceof Error ? err.message : String(err),
  });
}

Adding reason preserves the intent of defensive logging while still surfacing enough context to diagnose circular-reference or BigInt serialisation failures.

logDailyGuardrail("Received workflow runs page", {
page,
runCount: runs.length,
runIds: runs.map(run => run?.id).filter(Boolean),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/zoom-out] runIds: runs.map(run => run?.id).filter(Boolean) can produce an array of up to 100 elements per paginated page. With MAX_WORKFLOW_RUN_PAGES pages this could generate megabytes of log output in a busy repo, and the IDs are already captured in the subsequent "Prepared candidate workflow runs" log.

💡 Suggested fix

Replace the full ID array with a count, matching the style used in nearby logs:

logDailyGuardrail("Received workflow runs page", {
  page,
  runCount: runs.length,
  // runIds omitted — captured in candidateRunIds after filtering
});

If per-page IDs are genuinely useful, consider capping the array: runs.slice(0, 20).map(r => r?.id).

currentEffectiveTokens: 789,
});
expect(exports.formatDailyGuardrailLogMessage("Completed ET inspection window")).toBe("[daily-workflow-et] Completed ET inspection window");
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] logDailyGuardrail is not exported and therefore cannot be directly unit-tested. The two new guard branches (item?.name && and !artifact.id) are also not covered. The single new test only verifies the formatter, leaving the runtime paths dark.

💡 Suggestions
  1. Export logDailyGuardrail (or extract its core.info call behind a seam) so tests can assert it was called with the right message.
  2. Add tests for the guard branches in a getRunEffectiveTokens-level test: one where artifact.name is "" and one where artifact.id is undefined, each asserting the function returns 0.
  3. Split the two formatDailyGuardrailLogMessage assertions into separate it() blocks — one for the details path, one for the no-details path — so failure messages are unambiguous.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants