Skip to content

fix: daily ET guardrail step never fails activation job; improve conclusion reporting#36497

Merged
pelikhan merged 5 commits into
mainfrom
copilot/review-js-check-daily-tokens
Jun 2, 2026
Merged

fix: daily ET guardrail step never fails activation job; improve conclusion reporting#36497
pelikhan merged 5 commits into
mainfrom
copilot/review-js-check-daily-tokens

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Jun 2, 2026

The check_daily_effective_workflow_guardrail step had no top-level error handling, so any transient GitHub API failure would throw an unhandled exception, fail the activation job, and leave daily_effective_workflow_exceeded=false — letting the agent run against a broken guardrail. Separately, agent_failure_comment.md was missing {daily_effective_workflow_exceeded_context}, so repeat-failure comments had no ET guidance, and the exceeded template lacked actionable help.

Changes

  • check_daily_effective_workflow_guardrail.cjs — wrap all GitHub API calls after the initial guard checks in a try/catch; on error, emit a warning and return with daily_effective_workflow_exceeded=false so the step always exits cleanly. Also documents the bypass-on-error contract in the JSDoc.

  • daily_effective_workflow_exceeded.md — replace the single-paragraph template with three <details> progressive-disclosure sections: How to raise the daily limit, What the guardrail is, and How to disable it, each with concrete frontmatter snippets and CLI commands.

  • agent_failure_comment.md — add the missing {daily_effective_workflow_exceeded_context} placeholder after {stale_lock_file_failed_context} so repeat-failure comments include ET guidance, not just newly created issue bodies.

  • Tests — add a main() error-recovery test asserting the step resolves without throwing when API calls fail; update template assertions to cover the three new <details> headings.

Copilot AI and others added 2 commits June 2, 2026 17:20
…prove conclusion reporting

- Wrap all GitHub API calls in check_daily_effective_workflow_guardrail.cjs in a
  top-level try-catch so transient API errors never fail the activation job step.
  On error the step logs a warning and exits cleanly with the default
  daily_effective_workflow_exceeded=false output, letting the agent run rather than
  producing a confusing activation failure.
- Enhance daily_effective_workflow_exceeded.md with progressive disclosure:
  three <details> sections covering how to raise the daily limit, what the
  guardrail is, and how to disable it entirely.
- Add {daily_effective_workflow_exceeded_context} placeholder to
  agent_failure_comment.md so repeat-failure comments include the ET context
  (previously the context was only present in new-issue bodies).
- Update check_daily_effective_workflow_guardrail.test.cjs with a test asserting
  main() resolves (does not throw) when GitHub API calls fail.
- Update handle_agent_failure_daily_effective_workflow.test.cjs to assert the
  three progressive-disclosure sections are present in the rendered template.

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title fix: ensure daily ET guardrail step never fails activation job and improve conclusion reporting fix: daily ET guardrail step never fails activation job; improve conclusion reporting Jun 2, 2026
Copilot AI requested a review from pelikhan June 2, 2026 17:22
@pelikhan pelikhan marked this pull request as ready for review June 2, 2026 17:24
Copilot AI review requested due to automatic review settings June 2, 2026 17:24
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens the daily per-workflow effective-token (ET) guardrail step so transient GitHub API failures no longer fail the activation job, and improves the user-facing messaging when the guardrail is exceeded (including ensuring repeat-failure comments include the same guidance).

Changes:

  • Wrap the guardrail’s GitHub API interactions in a top-level try/catch so the activation job step exits cleanly on API errors and preserves the safe default (daily_effective_workflow_exceeded=false).
  • Expand the “daily ET guardrail exceeded” markdown into actionable, progressive-disclosure guidance with concrete frontmatter and CLI commands.
  • Add coverage for the new markdown structure and for the guardrail’s error-recovery behavior; include the guardrail context placeholder in repeat-failure comments.
Show a summary per file
File Description
actions/setup/js/check_daily_effective_workflow_guardrail.cjs Adds top-level error recovery around GitHub API calls and documents the bypass-on-error contract.
actions/setup/md/daily_effective_workflow_exceeded.md Replaces the single-paragraph message with <details> sections explaining raise/disable/what-it-is.
actions/setup/md/agent_failure_comment.md Adds {daily_effective_workflow_exceeded_context} so repeat-failure comments include ET guidance.
actions/setup/js/handle_agent_failure_daily_effective_workflow.test.cjs Updates assertions to cover the new progressive-disclosure headings/content.
actions/setup/js/check_daily_effective_workflow_guardrail.test.cjs Adds a test ensuring main() resolves (doesn’t throw) when GitHub API calls fail.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

  • Files reviewed: 5/5 changed files
  • Comments generated: 1

Comment on lines +322 to +325
const workflowID = process.env.GH_AW_WORKFLOW_ID || "";
const workflowName = process.env.GH_AW_WORKFLOW_NAME || workflowID || "workflow";
const runUrl = process.env.GH_AW_RUN_URL || currentRun.data.html_url || "";
const actorLogin = process.env.GITHUB_TRIGGERING_ACTOR || currentRun.data.triggering_actor?.login || currentRun.data.actor?.login || process.env.GITHUB_ACTOR || "";
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 2, 2026

PR Code Quality Reviewer completed the code quality review.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 2, 2026

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 2, 2026

🧪 Test Quality Sentinel completed test quality analysis.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 2, 2026

Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #36497 does not have the 'implementation' label (has_implementation_label=false) and has 0 new lines of code in business logic directories (requires_adr_by_default_volume=false, threshold=100). Neither enforcement condition is met.

@github-actions github-actions Bot mentioned this pull request Jun 2, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 2, 2026

🧪 Test Quality Sentinel Report

Test Quality Score: 91/100 — Excellent

Analyzed 10 test(s): 10 design (behavioral contracts), 0 implementation, 0 guideline violations.

📊 Metrics & Test Classification (10 tests analyzed)
Metric Value
New/modified tests analyzed 10
✅ Design tests (behavioral contracts) 10 (100%)
⚠️ Implementation tests (low value) 0 (0%)
Tests with error/edge cases 7 (70%)
Duplicate test clusters 0
Test inflation detected No (test +57 lines / prod +157 lines = 0.36 ratio)
🚨 Coding-guideline violations 0

Test Classification Details

Test File Classification Issues Detected
skips workflow_call, repository_dispatch, and workflow_dispatch with aw_context check_daily_effective_workflow_guardrail.test.cjs ✅ Design Covers both true and false return paths
matches both firewall-audit-logs and unified agent artifacts check_daily_effective_workflow_guardrail.test.cjs ✅ Design Verifies both matching and non-matching artifact names
sums effective tokens from explicit token-usage entries check_daily_effective_workflow_guardrail.test.cjs ✅ Design Asserts on aggregated return value
computes aggregate ET statistics for prior runs check_daily_effective_workflow_guardrail.test.cjs ✅ Design Verifies full stats object shape
caps inspection when GitHub API rate limit headroom is low check_daily_effective_workflow_guardrail.test.cjs ✅ Design Tests boundary values (110 vs 120 headroom)
formats structured daily ET log messages check_daily_effective_workflow_guardrail.test.cjs ✅ Design Verifies both with-context and no-context output
renders a daily ET details summary with stats and prior runs check_daily_effective_workflow_guardrail.test.cjs ✅ Design Exercises truncatedByRateLimit edge case
main() does not fail the step when GitHub API calls throw check_daily_effective_workflow_guardrail.test.cjs ✅ Design Core error-resilience contract: step must not throw; output stays "false"; warning is emitted
renders the daily workflow ET guardrail context when exceeded handle_agent_failure_daily_effective_workflow.test.cjs ✅ Design Verifies rendered markdown content when guardrail triggers
returns empty string when the guardrail did not trigger handle_agent_failure_daily_effective_workflow.test.cjs ✅ Design Edge case: no-op when exceeded = false

Language Support

Tests analyzed:

  • 🐹 Go (*_test.go): 0 tests
  • 🟨 JavaScript (*.test.cjs): 10 tests (vitest)

Verdict

Check passed. 0% of new tests are implementation tests (threshold: 30%). The new test for main() error resilience directly validates the behavioral contract described in the PR title ("guardrail step never fails activation job").

📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

  • Assert on observable outputs, return values, or state changes
  • Cover error paths and boundary conditions
  • Would catch a behavioral regression if deleted
  • Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

  • Assert on internal function calls (mocking internals)
  • Only test the happy path with typical inputs
  • Break during legitimate refactoring even when behavior is correct
  • Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.

🧪 Test quality analysis by Test Quality Sentinel · sonnet46 1.2M ·

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Test Quality Sentinel: 91/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%). All 10 new tests verify behavioral contracts, including the critical error-resilience contract for main() that directly corresponds to the fix in this PR.

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skills-Based Review 🧠

Applied /diagnose and /tdd — approving with minor suggestions on test structure and observability.

📋 Key Themes & Highlights

Key Themes

  • Test assertions in try block: The new regression test puts its expect() calls inside the try, so if the first assertion throws, subsequent ones are skipped silently. Minor structural fix recommended.
  • Missing "exceeded=true" regression test: The big refactor (wrapping all of main() in a single outer try/catch) isn't validated from the happy path — a future logic bug could silently prevent daily_effective_workflow_exceeded from being set to "true" without a test catching it.
  • Permanent vs. transient bypass: The outer catch correctly swallows all errors by design, but persistent configuration errors (bad token, missing permission) will bypass the guardrail on every run with only a job-log warning as signal.

Positive Highlights

  • ✅ Root cause correctly identified: absence of outer try/catch was the real issue, not any individual API call
  • ✅ The safe-bypass contract (fail → false, not fail → job failure) is clearly documented in both the JSDoc and inline comments
  • ✅ The agent_failure_comment.md one-liner fix is exactly right — no overengineering
  • daily_effective_workflow_exceeded.md upgrade is excellent: progressive disclosure, concrete YAML snippets, caution callout for the disable path
  • ✅ Template regression tests updated in lockstep with the template changes

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · sonnet46 1.7M


try {
// Should resolve without throwing even though the API calls throw
await expect(exports.main()).resolves.toBeUndefined();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] Assertions inside the try block mean that if the first expect throws, later assertions are silently skipped, reducing test confidence.

💡 Suggested refactor

Move the await expect(...) call outside the try/finally (or keep the try/finally only for cleanup), so Jest sees all assertion failures independently:

let threw = false;
try {
  await exports.main();
} catch (err) {
  threw = true;
} finally {
  delete global.core;
  delete global.github;
  delete global.context;
  delete process.env.GH_AW_MAX_DAILY_EFFECTIVE_TOKENS;
  delete process.env.GH_AW_GITHUB_TOKEN;
}
expect(threw).toBe(false);
expect(coreOutputs["daily_effective_workflow_exceeded"]).toBe("false");
expect(coreWarnings.some(w => /unexpected error.*skipped/i.test(w))).toBe(true);

This way all three assertions are always evaluated regardless of which one fails.

delete process.env.GH_AW_GITHUB_TOKEN;
}
});
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] The refactor wraps the entire success path in the outer try/catch too, but there's no regression test verifying that a normal execution where the threshold is exceeded still sets daily_effective_workflow_exceeded to "true". A logic bug inside the try block (e.g., an early return or swallowed error) could silently prevent the output from being set without any test catching it.

💡 What to add

Add a companion test that mocks a successful API response returning runs whose total ET exceeds the threshold, and asserts:

it("main() sets daily_effective_workflow_exceeded=true when threshold is exceeded", async () => {
  // ... mock APIs returning one run with runEffectiveTokens > threshold ...
  await exports.main();
  expect(coreOutputs["daily_effective_workflow_exceeded"]).toBe("true");
});

This guards against the outer catch accidentally consuming an exception that would have set the output.

core.info(`Daily workflow ET guardrail not exceeded (${totalEffectiveTokens}/${threshold}).`);
return;
core.warning(`Daily workflow ET guardrail exceeded for ${workflowName}: ${totalEffectiveTokens}/${threshold}.`);
} catch (error) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/diagnose] The outer catch treats all errors — including permanent ones like missing permissions or misconfigured tokens — the same as transient network blips. A persistent configuration error will silently bypass the guardrail on every run, with only a core.warning in the job log as the signal.

💡 Suggestion

The current design is explicitly documented as intentional (safe bypass > broken activation), so this is not a blocker. However, consider adding a core.error() call alongside the warning for non-retryable error classes, or emitting a core.summary line so the bypass is visible in the Actions run summary rather than only in the raw log:

} catch (error) {
  core.warning(`Daily workflow ET guardrail encountered an unexpected error and will be skipped: ${getErrorMessage(error)}`);
  // Make the bypass visible in the run summary so operators notice persistent failures
  await core.summary.addRaw(`> [!WARNING]\n> ET guardrail skipped due to error: ${getErrorMessage(error)}`).write().catch(() => {});
}

This preserves the safe-bypass contract while making a stuck guardrail discoverable without digging into logs.

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The design is sound — wrapping GitHub API calls in a try-catch so that transient failures never block the activation job is the right call, and the comments explaining the "safe bypass" intent are clear. The inline comments flag the most actionable issues; no blocking correctness bugs found.

### Review themes

Test mock completeness (medium)mockCore is missing error/debug/notice stubs. A missing stub causes a TypeError that gets silently caught by the very try-catch under test, producing a warning that matches the regex and making the test pass vacuously. See inline comment on line 133.

Global state cleanup (medium) — The finally block uses delete rather than save-and-restore, which is fragile if beforeEach ever starts populating these globals. See inline comment on line 172.

Observability gap (low)daily_effective_workflow_exceeded=false is ambiguous: it could mean "under budget" or "evaluation errored and was skipped". A dedicated daily_effective_workflow_guardrail_error output would make cost audits and dashboards significantly more reliable. See inline comment on catch block.

Coverage gap (low) — There is no test for the case where an exception is thrown after core.setOutput("daily_effective_workflow_exceeded", "true") has already been set (i.e., appendDailyEffectiveWorkflowSummary throws on the exceeded path). The behavior is actually correct — exceeded stays "true" and the warning fires — but this scenario is worth an explicit test since it represents the most operationally significant partial-state.

🔎 Code quality review by PR Code Quality Reviewer · sonnet46 327.1K

// The step should catch the error and NOT rethrow it, keeping daily_effective_workflow_exceeded at "false".
const coreOutputs = {};
const coreWarnings = [];
const mockCore = {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incomplete mockCore stub may cause vacuously-passing tests. mockCore is missing error, debug, and notice methods. If any production code path inside the try block calls one of them, the TypeError: core.xxx is not a function will be caught by the outer catch and produce a warning like "Daily workflow ET guardrail encountered an unexpected error and will be skipped: core.error is not a function" — which matches the regex /unexpected error.*skipped/i. The test then passes while hiding the actual regression.

💡 Suggested fix

Add no-op stubs for the missing methods:

const mockCore = {
  setOutput: (key, value) => { coreOutputs[key] = value; },
  info: () => {},
  warning: msg => coreWarnings.push(msg),
  error: () => {},      // prevent silent swallow of TypeError
  debug: () => {},
  notice: () => {},
};

This ensures that any accidental call to these methods causes a type-safe no-op rather than a silently-caught TypeError that fools the regex assertion.

// A warning must be emitted describing the error
expect(coreWarnings.some(w => /unexpected error.*skipped/i.test(w))).toBe(true);
} finally {
delete global.core;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Global state deleted rather than restored — fragile isolation. delete global.core permanently removes the property rather than restoring whatever was there before. If a future beforeEach or test-helper ever pre-populates these globals, this finally block will silently trash them for subsequent tests, causing confusing cross-test contamination.

💡 Suggested fix

Save and restore:

const savedCore = global.core;
const savedGithub = global.github;
const savedContext = global.context;

global.core = mockCore;
global.github = mockGithub;
global.context = mockContext;

try {
  // assertions...
} finally {
  if (savedCore === undefined) delete global.core; else global.core = savedCore;
  if (savedGithub === undefined) delete global.github; else global.github = savedGithub;
  if (savedContext === undefined) delete global.context; else global.context = savedContext;
  delete process.env.GH_AW_MAX_DAILY_EFFECTIVE_TOKENS;
  delete process.env.GH_AW_GITHUB_TOKEN;
}

Alternatively, use afterEach for cleanup so Vitest handles the lifecycle correctly.

// Treat any unexpected error as a non-blocking skip so the step never fails the
// activation job. The output stays at the default "false", allowing the agent to
// run. The guardrail is effectively bypassed for this invocation.
core.warning(`Daily workflow ET guardrail encountered an unexpected error and will be skipped: ${getErrorMessage(error)}`);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silent bypass is indistinguishable from a clean pass in workflow outputs. When the catch fires, daily_effective_workflow_exceeded stays at "false" — the same value it has when the guardrail genuinely evaluated to "not exceeded". Any downstream observability (dashboards, audits) that reads this output cannot tell whether the guardrail passed or failed silently. The only signal is a transient ::warning:: in the step log that is easily missed.

💡 Suggested improvement

Consider adding a dedicated output that makes the skip explicit:

} catch (error) {
  core.warning(`Daily workflow ET guardrail encountered an unexpected error and will be skipped: ${getErrorMessage(error)}`);
  core.setOutput("daily_effective_workflow_guardrail_error", "true"); // new
}

Downstream steps and dashboards can then distinguish three states:

  • exceeded=true → guardrail fired, block the agent
  • exceeded=false, guardrail_error=false → evaluated cleanly, under budget
  • exceeded=false, guardrail_error=true → evaluation failed, agent allowed as safe fallback

This is especially important for post-incident cost audits where "did the guardrail skip silently?" is a key question.

@pelikhan
Copy link
Copy Markdown
Collaborator

pelikhan commented Jun 2, 2026

@copilot review comments and review comments. apply copilot-review skill

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
@pelikhan
Copy link
Copy Markdown
Collaborator

pelikhan commented Jun 2, 2026

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

Copilot AI commented Jun 2, 2026

@copilot lint js and fix https://github.com/github/gh-aw/actions/runs/26838723127/job/79139596655?pr=36497

Fixed in f7e508b. I formatted actions/setup/js/check_daily_effective_workflow_guardrail.test.cjs with Prettier and re-ran npm run lint:cjs and the targeted guardrail test successfully.

@pelikhan pelikhan merged commit 220e424 into main Jun 2, 2026
12 checks passed
@pelikhan pelikhan deleted the copilot/review-js-check-daily-tokens branch June 2, 2026 18:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants