feat: detect max-ai-credits guardrail exhaustion from firewall audit log by Copilot · Pull Request #37936 · github/gh-aw

Copilot · 2026-06-08T20:55:12Z

The conclusion job had no dedicated detection path for when the aw-harness aborts a session due to max-ai-credits budget exhaustion — distinct from the existing AWF proxy rate-limit rejection. Detection must come from the firewall audit log, not agent stderr.

Spec (`specs/aw-harness.md`)

§6.1.1: Adds max-ai-credits as a normative harness.budget key (was only in the non-normative example)
§11.2.2: Requires the harness to append a budget_exceeded audit entry to /tmp/gh-aw/sandbox/firewall/audit/log.jsonl with max_ai_credits_exceeded: true on hard-limit abort — uniform detection path regardless of which budget key is active
T-AW-003, §11.4 safeguard #1, traceability matrix: updated to match
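As a rough illustration of the §11.2.2 requirement, a harness-side writer could append one JSONL line per hard-limit abort. This is a minimal sketch, not the harness code: `appendBudgetExceededEntry` is a hypothetical helper, and the `event` key name is an assumption (the PR description only names the `budget_exceeded` entry, `reason: "hard_limit"`, `forced_termination: true`, and `max_ai_credits_exceeded: true`).

```javascript
const fs = require("fs");
const path = require("path");

// Path named in §11.2.2. It is a parameter default so the sketch can run elsewhere.
const DEFAULT_AUDIT_LOG = "/tmp/gh-aw/sandbox/firewall/audit/log.jsonl";

/**
 * Append one budget_exceeded entry as a single JSONL line.
 * Hypothetical helper: the "event" key name is an assumption.
 */
function appendBudgetExceededEntry(logPath = DEFAULT_AUDIT_LOG) {
  const entry = {
    event: "budget_exceeded", // assumed key name for the event type
    reason: "hard_limit",
    forced_termination: true,
    max_ai_credits_exceeded: true,
  };
  // Create the audit directory if it does not exist yet.
  fs.mkdirSync(path.dirname(logPath), { recursive: true });
  // One JSON object per line keeps the log parseable line by line.
  fs.appendFileSync(logPath, JSON.stringify(entry) + "\n");
  return entry;
}
```

Writing the same flag regardless of which budget key triggered the abort is what lets the conclusion job use a single detection path.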

`ai_credits_context.cjs`

parseMaxAICreditsExceededFromAuditEntry(): detects max_ai_credits_exceeded: true field (snake_case and camelCase) or a budget_exceeded event with reason: "hard_limit" + forced_termination: true; top-level fields only to avoid false positives from nested provider responses
parseMaxAICreditsExceededFromAuditLog(): single-pass with content-guard fast-path
resolveAICreditsFailureState() now returns maxAICreditsExceeded
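The detection rules above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `ai_credits_context.cjs`: the function names are hypothetical, and reading the event name from an `event` or `type` key is an assumption.

```javascript
const fs = require("fs");

/** True when one parsed audit entry signals a max-ai-credits hard-limit abort. */
function entrySignalsMaxAICreditsExceeded(entry) {
  if (entry === null || typeof entry !== "object" || Array.isArray(entry)) return false;
  // Explicit flag, snake_case or camelCase, checked at the top level only
  // so that nested provider responses cannot trigger a false positive.
  if (entry.max_ai_credits_exceeded === true || entry.maxAiCreditsExceeded === true) return true;
  // Event form. Assumption: the event name is stored under "event" or "type".
  const eventName = entry.event ?? entry.type;
  return eventName === "budget_exceeded" && entry.reason === "hard_limit" && entry.forced_termination === true;
}

/** Scan JSONL text in one pass, skipping all parsing when no marker appears. */
function logTextSignalsMaxAICreditsExceeded(text) {
  // Content guard: a cheap substring test before any JSON.parse call.
  if (!/max_ai_credits_exceeded|maxAiCreditsExceeded|budget_exceeded/.test(text)) return false;
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed) continue;
    try {
      if (entrySignalsMaxAICreditsExceeded(JSON.parse(trimmed))) return true;
    } catch {
      // Malformed line: ignore it and keep scanning.
    }
  }
  return false;
}

/** File wrapper: a missing or unreadable log is treated as "no signal". */
function logFileSignalsMaxAICreditsExceeded(logPath) {
  try {
    return logTextSignalsMaxAICreditsExceeded(fs.readFileSync(logPath, "utf8"));
  } catch {
    return false;
  }
}
```

The content guard keeps the common case cheap: a log with no budget signal is rejected after one regular-expression test, without parsing any line.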

`handle_agent_failure.cjs`

maxAICreditsExceeded wired into: skip-failure guard, buildFailureMatchCategories() ("max_ai_credits_exceeded" dedup key), both issue and comment template contexts
Both aiCreditsRateLimitError (proxy rejection) and maxAICreditsExceeded (harness abort) are OR-ed when building the display context, so both detection paths render through the unified ai_credits_rate_limit_error.md template
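A small sketch of how the two flags might combine. The function name and return shape are hypothetical, and the `"ai_credits_rate_limit_error"` category string is an assumption; only the `"max_ai_credits_exceeded"` dedup key is named in this description.

```javascript
/**
 * Combine both detection paths into one display decision.
 * Hypothetical helper, not the code in handle_agent_failure.cjs.
 */
function summarizeAICreditsFailure({ aiCreditsRateLimitError = false, maxAICreditsExceeded = false } = {}) {
  const categories = [];
  // Assumed category name for the proxy-rejection path.
  if (aiCreditsRateLimitError) categories.push("ai_credits_rate_limit_error");
  // Dedup key named in the PR description for the harness-abort path.
  if (maxAICreditsExceeded) categories.push("max_ai_credits_exceeded");
  return {
    // Either signal renders the shared ai_credits_rate_limit_error.md template.
    showAICreditsGuardrailMessage: aiCreditsRateLimitError || maxAICreditsExceeded,
    categories,
  };
}
```

Keeping the category keys separate while sharing one template means the two failure modes stay distinguishable for deduplication even though the user sees the same message.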

Templates

ai_credits_rate_limit_error.md updated with generic language ("hit the configured max-ai-credits guardrail") covering both proxy rejection and harness abort, and now includes a tip to increase the max-ai-credits limit

Tests

14 new tests in ai_credits_context.test.cjs (field detection, event detection, negative cases, multi-entry log) and 4 in handle_agent_failure_max_ai_credits_exceeded.test.cjs (metrics table, missing data, no-overage, disabled flag).

…ll log

- Update aw-harness.md spec: add max-ai-credits as valid harness.budget key (§6.1.1), require writing budget_exceeded audit entry to firewall log (§11.2.2), update T-AW-003 and §11.4 safeguard #1
- Add parseMaxAICreditsExceededFromAuditEntry/parseMaxAICreditsExceededFromAuditLog to ai_credits_context.cjs; update parseAuditLogCombined and resolveAICreditsFailureState to include maxAICreditsExceeded flag
- Add buildMaxAICreditsExceededContext to handle_agent_failure.cjs; wire flag into failure categories, skip-condition, and both comment/issue template contexts
- Add max_ai_credits_exceeded.md message template
- Add {max_ai_credits_exceeded_context} to agent_failure_issue.md and agent_failure_comment.md
- Add ai_credits_context.test.cjs (14 tests) and handle_agent_failure_max_ai_credits_exceeded.test.cjs (4 tests)

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

pelikhan · 2026-06-08T20:59:09Z

@copilot

look for existing message template for "max-ai-credits" and unify
apply pr-finisher skill

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

…ror template Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot · 2026-06-08T21:25:02Z

> @copilot
>
> look for existing message template for "max-ai-credits" and unify
>
> apply pr-finisher skill

Done in the latest commit. The separate max_ai_credits_exceeded.md template is removed and unified with the existing ai_credits_rate_limit_error.md:

ai_credits_rate_limit_error.md updated to use generic language ("hit the configured max-ai-credits guardrail") covering both proxy rejection and harness abort, and gained the "Increase the limit" tip from the old template
buildMaxAICreditsExceededContext() removed; both aiCreditsRateLimitError and maxAICreditsExceeded are OR-ed when building the single context block
{max_ai_credits_exceeded_context} slot removed from agent_failure_issue.md and agent_failure_comment.md
Tests updated accordingly — all 338 JS test files pass

Copilot

Pull request overview

This PR adds a dedicated detection path for aw-harness “max-ai-credits” budget exhaustion by reading the firewall audit log (instead of agent stderr), and wires that signal into the conclusion job’s failure handling and messaging.

Changes:

Extend the harness spec to make harness.budget.max-ai-credits normative and require a budget_exceeded firewall audit log entry on hard-limit abort.
Add firewall audit-log parsing for max_ai_credits_exceeded / budget_exceeded hard-limit signals and surface a maxAICreditsExceeded failure state.
Wire the new failure state into handle_agent_failure and update the existing AI-credits guardrail template text; add targeted tests.


| File | Description |
| --- | --- |
| specs/aw-harness.md | Updates normative budget keys and specifies firewall-audit-based detection requirements for hard-limit budget aborts. |
| actions/setup/md/ai_credits_rate_limit_error.md | Generalizes the user-facing message to cover both proxy rejection and harness abort, plus adds a configuration tip. |
| actions/setup/js/ai_credits_context.cjs | Adds `max_ai_credits_exceeded` detection (explicit field or `budget_exceeded` hard-limit event) and exposes it via `resolveAICreditsFailureState()`. |
| actions/setup/js/handle_agent_failure.cjs | Threads `maxAICreditsExceeded` into skip-logic, categorization, and AI-credits display context. |
| actions/setup/js/ai_credits_context.test.cjs | Adds unit tests covering positive/negative detection cases and multi-entry logs. |
| actions/setup/js/handle_agent_failure_max_ai_credits_exceeded.test.cjs | Adds tests verifying the rendered AI-credits guardrail messaging when the new condition is present. |

Copilot's findings

Files reviewed: 6/6 changed files
Comments generated: 2

- When effective tokens reach the **hard limit** (`max-effective-tokens`), the cost-tracker extension **MUST** abort the session immediately by invoking the session's abort API. The harness **MUST NOT** allow additional turns to proceed after the hard limit is reached.
+- When the budget metric reaches the **soft limit** (default: 80% of the configured limit), the cost-tracker extension **MUST** inject a steering message via `session.steer()` informing the agent that it is approaching the budget and **SHOULD** conclude its work soon.
+- When the budget metric reaches the **hard limit**, the cost-tracker extension **MUST** abort the session immediately by invoking the session's abort API. The harness **MUST NOT** allow additional turns to proceed after the hard limit is reached.
 - Upon hard-limit abort, the harness **MUST** emit a `budget_exceeded` JSONL event to stderr containing the final cumulative token count and the configured limit.
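The soft-limit and hard-limit rules in the hunk above can be sketched as a pure decision function. This is an illustration only: `evaluateBudget` and its return values are hypothetical names, and the actual cost-tracker extension would call `session.steer()` or the session's abort API based on the result.

```javascript
/**
 * Decide what the cost tracker should do for the current budget metric.
 * Returns "abort" at or above the hard limit, "steer" at or above the
 * soft limit (default 80% of the limit), and "ok" otherwise.
 * Hypothetical helper, not part of the harness.
 */
function evaluateBudget(used, limit, softFraction = 0.8) {
  // No positive limit configured: nothing to enforce.
  if (!(limit > 0)) return "ok";
  // Hard limit: the session must be aborted, no further turns allowed.
  if (used >= limit) return "abort";
  // Soft limit: inject a steering message so the agent wraps up.
  if (used >= limit * softFraction) return "steer";
  return "ok";
}

// Usage with made-up numbers for a limit of 100 credits.
for (const used of [50, 85, 100]) {
  console.log(used, evaluateBudget(used, 100));
}
```

Because the function takes a generic metric and limit, the same check applies whether the active budget key is `max-effective-tokens` or `max-ai-credits`.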


pelikhan · 2026-06-08T21:34:10Z

@copilot apply pr finisher skill

github-actions · 2026-06-08T21:42:18Z

🧠 Matt Pocock Skills Reviewer failed during the skills-based review.

github-actions · 2026-06-08T21:42:29Z

⚠️ PR Code Quality Reviewer failed during code quality review.

github-actions · 2026-06-08T21:42:34Z

🧪 Test Quality Sentinel completed test quality analysis.

github-actions · 2026-06-08T21:42:53Z

✅ Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #37936 does not have the 'implementation' label (has_implementation_label=false) and has 0 new lines of code in business logic directories (default_business_additions=0, threshold=100). Neither Condition A nor Condition B is met.

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot · 2026-06-08T21:51:37Z

> @copilot apply pr finisher skill

Applied the pr-finisher pass and pushed 806099f with the follow-up fixes:

removed the unused parseMaxAICreditsExceededFromAuditLog import in handle_agent_failure.cjs
updated §11.2.2 wording to be metric-agnostic (final cumulative budget metric value)

Also re-ran local validation (make fmt, make lint, make test-unit, make test, make recompile).

github-actions · 2026-06-08T21:53:27Z

🧪 Test Quality Sentinel Report

✅ Test Quality Score: 82/100 — Excellent

Analyzed 18 test(s) across 2 new JavaScript (vitest) test files: 18 design tests (behavioral contracts), 0 implementation tests, 0 coding-guideline violations.

📊 Metrics & Test Classification (18 tests analyzed)

| Metric | Value |
| --- | --- |
| New/modified tests analyzed | 18 |
| ✅ Design tests (behavioral contracts) | 18 (100%) |
| ⚠️ Implementation tests (low value) | 0 (0%) |
| Tests with error/edge cases | 13 (72%) |
| Duplicate test clusters | 0 |
| Test inflation detected | ⚠️ Yes — both test files exceed 2:1 ratio vs production lines added |
| 🚨 Coding-guideline violations | 0 |

Test Classification Details

| Test | File | Classification | Notes |
| --- | --- | --- | --- |
| `detects max_ai_credits_exceeded: true field` | `ai_credits_context.test.cjs` | ✅ Design | Happy path — verifies snake_case field detection |
| `detects camelCase maxAiCreditsExceeded: true field` | `ai_credits_context.test.cjs` | ✅ Design | Happy path — verifies camelCase variant |
| `detects budget_exceeded event with hard_limit reason and forced_termination` | `ai_credits_context.test.cjs` | ✅ Design | Happy path — event-based signal |
| `does not detect budget_exceeded event without forced_termination` | `ai_credits_context.test.cjs` | ✅ Design | Edge case — required field absent |
| `does not detect budget_exceeded event without hard_limit reason` | `ai_credits_context.test.cjs` | ✅ Design | Edge case — wrong reason value |
| `returns false when no matching signal is present` | `ai_credits_context.test.cjs` | ✅ Design | Edge case — no relevant signals in log |
| `returns false for missing audit log` | `ai_credits_context.test.cjs` | ✅ Design | Edge case — error path (no file) |
| `returns false for empty audit log` | `ai_credits_context.test.cjs` | ✅ Design | Edge case — empty file input |
| `detects signal anywhere in a multi-entry log` | `ai_credits_context.test.cjs` | ✅ Design | Edge case — signal not on first line |
| `does not detect ai_credits_rate_limit_error as max_ai_credits_exceeded` | `ai_credits_context.test.cjs` | ✅ Design | Edge case — boundary between two signal types |
| `returns maxAICreditsExceeded: true when budget_exceeded event is present` | `ai_credits_context.test.cjs` | ✅ Design | Happy path — state object field |
| `returns maxAICreditsExceeded: false when no signal is present` | `ai_credits_context.test.cjs` | ✅ Design | Edge case — absent signal |
| `maxAICreditsExceeded is independent of aiCreditsRateLimitError` | `ai_credits_context.test.cjs` | ✅ Design | Edge case — flag independence boundary |
| `both flags can be true simultaneously if both signals are present` | `ai_credits_context.test.cjs` | ✅ Design | Edge case — combined-state boundary |
| `shows budget exhaustion message with usage, limit, and overage details` | `handle_agent_failure_max_ai_credits_exceeded.test.cjs` | ✅ Design | Happy path — template rendering (9 assertions) |
| `shows message without metrics rows when no credit data is available` | `handle_agent_failure_max_ai_credits_exceeded.test.cjs` | ✅ Design | Edge case — empty string inputs |
| `does not show overage row when usage does not exceed limit` | `handle_agent_failure_max_ai_credits_exceeded.test.cjs` | ✅ Design | Edge case — usage ≤ limit boundary |
| `returns empty string when max_ai_credits_exceeded is false` | `handle_agent_failure_max_ai_credits_exceeded.test.cjs` | ✅ Design | Edge case — false-flag early return |

Language Support

Tests analyzed:

🐹 Go (*_test.go): 0 tests — no Go test files changed
🟨 JavaScript (*.test.cjs, *.test.js): 18 tests (vitest)

ℹ️ Test Inflation Notice (non-blocking)

Both new test files exceed the 2:1 lines-added ratio vs their corresponding production files:

| Test file | Test lines added | Production lines added | Ratio |
| --- | --- | --- | --- |
| `ai_credits_context.test.cjs` | 134 | 53 (`ai_credits_context.cjs`) | 2.53:1 |
| `handle_agent_failure_max_ai_credits_exceeded.test.cjs` | 58 | 9 (`handle_agent_failure.cjs`) | 6.44:1 |
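For readers who want to check the ratios, the arithmetic is test lines added divided by production lines added, compared against the 2:1 threshold. The snippet below is only a worked calculation using the figures reported above; `inflationRatio` is a hypothetical name, not the Sentinel's code.

```javascript
// Lines-added figures copied from the table above.
const rows = [
  { test: "ai_credits_context.test.cjs", testLines: 134, prodLines: 53 },
  { test: "handle_agent_failure_max_ai_credits_exceeded.test.cjs", testLines: 58, prodLines: 9 },
];

// The 2:1 ratio named in the notice.
const INFLATION_THRESHOLD = 2;

/** Ratio of test lines added to production lines added. */
function inflationRatio({ testLines, prodLines }) {
  return testLines / prodLines;
}

for (const row of rows) {
  const ratio = inflationRatio(row);
  console.log(`${row.test}: ${ratio.toFixed(2)}:1 exceeds=${ratio > INFLATION_THRESHOLD}`);
}
```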

Context: The handle_agent_failure.cjs diff is small (9 lines) because it only wires a call to buildAICreditsRateLimitErrorContext; the heavy logic lives in ai_credits_context.cjs. The test file comprehensively covers the template rendering contract. The inflation penalty (−10 pts) is applied mechanically per the rubric, but the tests themselves are high-quality behavioral contracts — not filler.

Suggestion: No action required. The tests justify their size. If desired, future PRs could add JSONL parse-error tests (e.g., malformed JSON lines) to cover production-code defensive paths not yet exercised by any .toThrow/.rejects assertions.

Verdict

✅ Check passed. 0% of new tests are implementation tests (threshold: 30%). No coding-guideline violations detected.

📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

Assert on observable outputs, return values, or state changes
Cover error paths and boundary conditions
Would catch a behavioral regression if deleted
Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

Assert on internal function calls (mocking internals)
Only test the happy path with typical inputs
Break during legitimate refactoring even when behavior is correct
Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.

References: §27168015427

🧪 Test quality analysis by Test Quality Sentinel · 283.7 AIC · ⌖ 28.2 AIC · ◷

github-actions

✅ Test Quality Sentinel: 82/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%). All 18 new tests are behavioral design tests with strong edge-case coverage. No coding-guideline violations detected.

pelikhan · 2026-06-08T21:53:52Z

@copilot lint js and typecheck

https://github.com/github/gh-aw/actions/runs/27169324203/job/80204457725?pr=37936

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot · 2026-06-08T22:02:52Z

> @copilot lint js and typecheck
>
> https://github.com/github/gh-aw/actions/runs/27169324203/job/80204457725?pr=37936

Addressed in b599c54. I pulled the failing CJS job log, fixed the JS typecheck error in actions/setup/js/ai_credits_context.cjs, and re-ran both checks locally:

make lint-cjs ✅
cd actions/setup/js && npm run typecheck ✅

I also ran the targeted tests for this area:

npm run test:js -- ai_credits_context.test.cjs handle_agent_failure_max_ai_credits_exceeded.test.cjs ✅

github-actions · 2026-06-08T22:03:44Z

@copilot review all comments and address unresolved review feedback.

Generated by 👨‍🍳 PR Sous Chef · 54.6 AIC · ⌖ 0.987 AIC · ⊞ 17.1K · ◷

github-actions · 2026-06-08T22:03:45Z

Please summarize the remaining blockers and rerun the requested lint/typecheck checks after the review fix.

Generated by 👨‍🍳 PR Sous Chef · 54.6 AIC · ⌖ 0.987 AIC · ⊞ 17.1K · ◷

Copilot AI assigned Copilot and pelikhan Jun 8, 2026

Copilot AI changed the title ~~feat: implement MaxAI credit exceeded guardrail detection from firewall log~~ feat: detect max-ai-credits guardrail exhaustion from firewall audit log Jun 8, 2026

Copilot finished work on behalf of pelikhan June 8, 2026 20:55

Copilot created this pull request from a session on behalf of pelikhan June 8, 2026 20:55

Copilot AI requested a review from pelikhan June 8, 2026 20:55

Copilot started work on behalf of pelikhan June 8, 2026 20:59

Copilot AI and others added 2 commits June 8, 2026 21:23

Update; rm -rf /

52718d7

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

refactor: unify max_ai_credits_exceeded into ai_credits_rate_limit_er…

35e966d

…ror template Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot finished work on behalf of pelikhan June 8, 2026 21:25

pelikhan marked this pull request as ready for review June 8, 2026 21:27

Copilot AI review requested due to automatic review settings June 8, 2026 21:27

Copilot started reviewing on behalf of pelikhan June 8, 2026 21:27

Copilot AI reviewed Jun 8, 2026

Copilot started work on behalf of pelikhan June 8, 2026 21:34

github-actions Bot mentioned this pull request Jun 8, 2026

[aw] No-Op Runs #37899

Closed

fix: address review nits for max-ai-credits follow-up

806099f

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot finished work on behalf of pelikhan June 8, 2026 21:52

Merge branch 'main' into copilot/implement-maxai-credit-guardrail

254d4b2

github-actions Bot approved these changes Jun 8, 2026

Copilot started work on behalf of pelikhan June 8, 2026 21:54

fix: resolve js typecheck error in ai credits audit parser

b599c54

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot finished work on behalf of pelikhan June 8, 2026 22:03

pelikhan merged commit b80f577 into main Jun 8, 2026
12 checks passed

pelikhan deleted the copilot/implement-maxai-credit-guardrail branch June 8, 2026 22:15
