Stop harness retries on AI-credit exhaustion and AWF proxy blocking signals by Copilot · Pull Request #38018 · github/gh-aw

Copilot · 2026-06-09T04:02:29Z

Harness retries were treating AI-credit guardrail failures and AWF API proxy blocking as retryable partial failures, causing wasted attempts and noisy failure paths. This change classifies those conditions as terminal so harnesses fail fast instead of retrying.

What changed
- Added a shared detector in actions/setup/js/harness_retry_guard.cjs for:
  - AI-credit exhaustion signals (including max_ai_credits_exceeded / AI-credit rate-limit variants)
  - AWF API proxy request-blocking signals (including DIFC_FILTERED)
- Integrated this detector into retry loops in:
  - claude_harness.cjs
  - copilot_harness.cjs
  - codex_harness.cjs
- On detection, harnesses now stop retrying immediately and log a guard-specific non-retry reason.
- Updated guard-stop log suffixes to provider-neutral wording (non-retryable guard condition) across all three harnesses.
- Expanded guard detector tests to cover additional AI-credit and API-proxy signal variants plus edge cases (non-string input and combined signals), and updated the test module import pattern for Vitest ESM execution.
- Aligned detector JSDoc typing with runtime behavior (output accepts non-string values) and reformatted pattern arrays for readability.

Why this is scoped this way
- The logic is centralized to keep retry policy consistent across all harnesses and avoid drift in error-pattern handling.

Example

const nonRetryableGuard = detectNonRetryableHarnessGuard(result.output);
if (nonRetryableGuard.aiCreditsExceeded || nonRetryableGuard.awfAPIProxyBlockingRequests) {
  log("...not retrying (non-retryable guard condition)");
  break;
}
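For context, here is a minimal sketch of how the shared detector could be structured. The two pattern arrays, the function name, and the `typeof` guard are copied from the diff excerpts quoted in the review comments on this PR. Matching with `Array.prototype.some` is an assumption, since the matching code itself is not shown on this page.

```javascript
"use strict";

// Patterns copied from the diff excerpts quoted in this PR's review comments.
const AI_CREDITS_EXCEEDED_PATTERNS = [
  /\bmax[\s_-]*ai[\s_-]*credits[\s_-]*exceeded\b/i,
  /\bai[\s_-]*credits[\s_-]*rate[\s_-]*limit[\s_-]*error\b/i,
  /ai[\s_-]*credits?.*(?:rate[\s-]*limit|limit exceeded|budget exceeded|exceeded)/i,
];

const AWF_API_PROXY_BLOCKING_REQUESTS_PATTERNS = [
  /\bawf\b.*\bapi[\s_-]*proxy\b.*\bblocking requests\b/i,
  /\bapi[\s_-]*proxy\b.*\bblocking requests\b/i,
  /\bapi[\s_-]*proxy\b.*\bblocked requests?\b/i,
  /\bDIFC_FILTERED\b/,
];

/**
 * Detect retry guard conditions that should stop harness retries immediately.
 * @param {unknown} output
 * @returns {{ aiCreditsExceeded: boolean, awfAPIProxyBlockingRequests: boolean }}
 */
function detectNonRetryableHarnessGuard(output) {
  // Non-string input is treated as empty output, so both flags come back false.
  const safeOutput = typeof output === "string" ? output : "";
  // Assumption: a flag is set when any pattern in its array matches.
  return {
    aiCreditsExceeded: AI_CREDITS_EXCEEDED_PATTERNS.some(pattern => pattern.test(safeOutput)),
    awfAPIProxyBlockingRequests: AWF_API_PROXY_BLOCKING_REQUESTS_PATTERNS.some(pattern => pattern.test(safeOutput)),
  };
}
```

None of the patterns carry the `g` flag, so repeated `test` calls do not depend on `lastIndex` state.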

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot

Pull request overview

This PR updates the JavaScript agent harness retry logic to fail fast (stop retrying) when failures are caused by non-retryable guardrail conditions, specifically AI-credit exhaustion and AWF API proxy request blocking, reducing wasted attempts and noisy retry paths.

Changes:

Added a shared detector (detectNonRetryableHarnessGuard) to identify AI-credit exhaustion and AWF proxy-blocking signals.
Integrated the detector into the retry loops for Claude, Codex, and Copilot harnesses to break out early on these terminal conditions.
Added unit tests for the new guard detector.

Show a summary per file

File	Description
actions/setup/js/harness_retry_guard.cjs	New shared regex-based detector for non-retryable “guard” output markers.
actions/setup/js/harness_retry_guard.test.cjs	New unit tests validating detection of AI-credit and AWF proxy-block signals.
actions/setup/js/claude_harness.cjs	Stops retrying when the new detector identifies a terminal guard condition.
actions/setup/js/codex_harness.cjs	Stops retrying when the new detector identifies a terminal guard condition.
actions/setup/js/copilot_harness.cjs	Stops retrying when the new detector identifies a terminal guard condition.
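The early exit summarized above can be sketched as a small retry loop. This is an illustration only: `runWithGuardedRetries`, `runAttempt`, and the shape of the attempt result are hypothetical names chosen for this sketch, and the real harnesses are likely asynchronous and considerably larger. The detector and logger are injected so the block stands on its own.

```javascript
"use strict";

/**
 * Hypothetical retry loop that stops as soon as a guard condition is detected.
 * @param {object} options
 * @param {number} options.maxAttempts
 * @param {(attempt: number) => { exitCode: number, output: string }} options.runAttempt
 * @param {(output: unknown) => { aiCreditsExceeded: boolean, awfAPIProxyBlockingRequests: boolean }} options.detectGuard
 * @param {(message: string) => void} options.log
 */
function runWithGuardedRetries({ maxAttempts, runAttempt, detectGuard, log }) {
  let result = { exitCode: 1, output: "" };
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    result = runAttempt(attempt);
    if (result.exitCode === 0) break; // success, nothing to retry

    const guard = detectGuard(result.output);
    if (guard.aiCreditsExceeded || guard.awfAPIProxyBlockingRequests) {
      // Terminal condition: another attempt would hit the same guard.
      log(`attempt ${attempt + 1}: not retrying (non-retryable guard condition)`);
      break;
    }
    log(`attempt ${attempt + 1}: failed, retrying if attempts remain`);
  }
  return result;
}
```

With a guard signal in the first attempt's output the loop runs exactly once; an ordinary failure still consumes all `maxAttempts`.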

Copilot's findings

Files reviewed: 5/5 changed files
Comments generated: 5

+import { describe, expect, it } from "vitest";
+
+const { detectNonRetryableHarnessGuard } = require("./harness_retry_guard.cjs");
+


+/**
+ * Detect retry guard conditions that should stop harness retries immediately.
+ * @param {string} output
+ * @returns {{ aiCreditsExceeded: boolean, awfAPIProxyBlockingRequests: boolean }}
+ */
+function detectNonRetryableHarnessGuard(output) {
+  const safeOutput = typeof output === "string" ? output : "";


+          const reasons = [];
+          if (nonRetryableGuard.aiCreditsExceeded) reasons.push("AI credits budget exceeded");
+          if (nonRetryableGuard.awfAPIProxyBlockingRequests) reasons.push("AWF API proxy is blocking requests");
+          log(`attempt ${attempt + 1}: ${reasons.join(" and ")} — not retrying (persistent AWF guard condition)`);


+      const reasons = [];
+      if (nonRetryableGuard.aiCreditsExceeded) reasons.push("AI credits budget exceeded");
+      if (nonRetryableGuard.awfAPIProxyBlockingRequests) reasons.push("AWF API proxy is blocking requests");
+      log(`attempt ${attempt + 1}: ${reasons.join(" and ")} — not retrying (persistent AWF guard condition)`);


+      const reasons = [];
+      if (nonRetryableGuard.aiCreditsExceeded) reasons.push("AI credits budget exceeded");
+      if (nonRetryableGuard.awfAPIProxyBlockingRequests) reasons.push("AWF API proxy is blocking requests");
+      log(`attempt ${attempt + 1}: ${reasons.join(" and ")} — not retrying (persistent AWF guard condition)`);


github-actions · 2026-06-09T04:36:03Z

🧪 Test Quality Sentinel completed test quality analysis.

github-actions · 2026-06-09T04:36:26Z

✅ Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #38018 does not have the implementation label and has 0 new lines of code in business logic directories (below the 100-line threshold).

github-actions · 2026-06-09T04:36:32Z

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

github-actions · 2026-06-09T04:36:54Z

✅ PR Code Quality Reviewer completed the code quality review.

github-actions

Skills-Based Review 🧠

Applied /tdd and /diagnose — commenting with non-blocking suggestions focused on test coverage and diagnosability. The centralised guard design is solid.

📋 Key Themes & Highlights

Key Themes

Test coverage gaps: 7 regex patterns across two signal types, but only 3 are exercised by the positive test cases. The untested patterns (second/third AI-credits variants, second/third AWF proxy variants) could drift silently.
Pattern readability: Both constant arrays are single long lines, making future audits and diffs difficult.
Misleading log label: "persistent AWF guard condition" is used even for AI-credits exhaustion, which is an AI-provider signal — could send an operator in the wrong direction during incident diagnosis.

Positive Highlights

✅ Excellent centralisation — one shared module vs. three independently drifting copies.
✅ Defensive typeof output === "string" ? output : "" guard handles unexpected input gracefully.
✅ Good additive-only approach — zero deletions means no regression risk to existing retry behaviour.
✅ Tests exist for the new module, following established codebase conventions.
✅ PR description clearly explains scope, rationale, and an example — makes the intent reviewable.

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 296.5 AIC · ⌖ 13.6 AIC

github-actions · 2026-06-09T04:44:15Z

+  it("detects AI credits exceeded markers", () => {
+    const result = detectNonRetryableHarnessGuard("error: max_ai_credits_exceeded=true");
+    expect(result.aiCreditsExceeded).toBe(true);
+    expect(result.awfAPIProxyBlockingRequests).toBe(false);


[/tdd] Only the first AI-credits pattern (max_ai_credits_exceeded) is exercised. The second and third patterns are untested, so a regression in those regexes would go undetected.

💡 Add tests for the other two patterns

it("detects ai_credits_rate_limit_error signal", () => {
  const result = detectNonRetryableHarnessGuard("error: ai_credits_rate_limit_error=true");
  expect(result.aiCreditsExceeded).toBe(true);
});

it("detects broad ai-credits-budget-exceeded signal", () => {
  const result = detectNonRetryableHarnessGuard("ai credits budget exceeded");
  expect(result.aiCreditsExceeded).toBe(true);
});

Each regex in AI_CREDITS_EXCEEDED_PATTERNS represents a distinct signal variant. One test per pattern ensures the full detection surface is validated.

github-actions · 2026-06-09T04:44:15Z

+
+  it("detects AWF API proxy blocking request markers", () => {
+    const result = detectNonRetryableHarnessGuard("awf api proxy is blocking requests for this run");
+    expect(result.aiCreditsExceeded).toBe(false);


[/tdd] Only the first AWF proxy pattern (full phrase with "awf") is tested in the positive direction; the second (api-proxy.*blocking requests) and third (api-proxy.*blocked requests) patterns have no dedicated test.

💡 Cover the remaining AWF proxy patterns

it("detects api-proxy blocking requests without awf prefix", () => {
  const result = detectNonRetryableHarnessGuard("api-proxy is blocking requests");
  expect(result.awfAPIProxyBlockingRequests).toBe(true);
});

it("detects api-proxy blocked requests variant", () => {
  const result = detectNonRetryableHarnessGuard("api proxy blocked request");
  expect(result.awfAPIProxyBlockingRequests).toBe(true);
});

Driving one test per pattern (rather than one test per signal type) keeps each regex independently verifiable and makes failures self-diagnosing.
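One way to keep each regex independently verifiable is a table that pairs every pattern with its own sample input. The sketch below does that in plain JavaScript without a test runner. The patterns are copied from the diff, the samples come from the existing and suggested tests in this thread, and `findUnmatchedSamples` is a hypothetical helper name.

```javascript
"use strict";

// One row per pattern; each sample is checked against its own pattern only.
const PATTERN_SAMPLES = [
  { pattern: /\bmax[\s_-]*ai[\s_-]*credits[\s_-]*exceeded\b/i, sample: "error: max_ai_credits_exceeded=true" },
  { pattern: /\bai[\s_-]*credits[\s_-]*rate[\s_-]*limit[\s_-]*error\b/i, sample: "error: ai_credits_rate_limit_error=true" },
  { pattern: /ai[\s_-]*credits?.*(?:rate[\s-]*limit|limit exceeded|budget exceeded|exceeded)/i, sample: "ai credits budget exceeded" },
  { pattern: /\bawf\b.*\bapi[\s_-]*proxy\b.*\bblocking requests\b/i, sample: "awf api proxy is blocking requests for this run" },
  { pattern: /\bapi[\s_-]*proxy\b.*\bblocking requests\b/i, sample: "api-proxy is blocking requests" },
  { pattern: /\bapi[\s_-]*proxy\b.*\bblocked requests?\b/i, sample: "api proxy blocked request" },
  { pattern: /\bDIFC_FILTERED\b/, sample: "DIFC_FILTERED" },
];

/**
 * Return the samples whose paired pattern fails to match (hypothetical helper).
 * An empty array means every pattern is exercised by its own sample.
 */
function findUnmatchedSamples(rows) {
  return rows.filter(({ pattern, sample }) => !pattern.test(sample)).map(({ sample }) => sample);
}

const unmatched = findUnmatchedSamples(PATTERN_SAMPLES);
```

In the Vitest suite the same table could feed `it.each`, so a regression in one regex is reported under that pattern's own test name.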

github-actions · 2026-06-09T04:44:15Z

+    expect(result.aiCreditsExceeded).toBe(false);
+    expect(result.awfAPIProxyBlockingRequests).toBe(false);
+  });
+});


[/tdd] No test covers non-string input (e.g. null, undefined, an object) or combined signals where both flags are true. These are the boundary conditions most likely to surface a real regression.

💡 Add boundary tests

it("returns false for null input", () => {
  const result = detectNonRetryableHarnessGuard(null);
  expect(result.aiCreditsExceeded).toBe(false);
  expect(result.awfAPIProxyBlockingRequests).toBe(false);
});

it("returns both flags true when output contains both signals", () => {
  const combined = "max_ai_credits_exceeded=true; DIFC_FILTERED";
  const result = detectNonRetryableHarnessGuard(combined);
  expect(result.aiCreditsExceeded).toBe(true);
  expect(result.awfAPIProxyBlockingRequests).toBe(true);
});

The combined-signals test also validates that the log message in each harness correctly emits both reasons separated by " and ".

github-actions · 2026-06-09T04:44:15Z

+
+"use strict";
+
+const AI_CREDITS_EXCEEDED_PATTERNS = [/\bmax[\s_-]*ai[\s_-]*credits[\s_-]*exceeded\b/i, /\bai[\s_-]*credits[\s_-]*rate[\s_-]*limit[\s_-]*error\b/i, /ai[\s_-]*credits?.*(?:rate[\s-]*limit|limit exceeded|budget exceeded|exceeded)/i];


[/diagnose] All three AI-credits patterns are on one long line, making them very hard to review for correctness and nearly impossible to diff in future PRs.

💡 One pattern per line

const AI_CREDITS_EXCEEDED_PATTERNS = [
  /\bmax[\s_-]*ai[\s_-]*credits[\s_-]*exceeded\b/i,
  /\bai[\s_-]*credits[\s_-]*rate[\s_-]*limit[\s_-]*error\b/i,
  /ai[\s_-]*credits?.*(?:rate[\s-]*limit|limit exceeded|budget exceeded|exceeded)/i,
];

Note also that the third pattern lacks a leading \b word-boundary unlike the first two — worth confirming that is intentional (it broadens the match surface).
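To make the effect of the missing `\b` concrete, the sketch below compares the third pattern as written with a hypothetical stricter variant that adds a leading word boundary. The variant and the sample strings are illustrative assumptions, not part of the PR.

```javascript
"use strict";

// Third AI-credits pattern exactly as it appears in the diff.
const withoutBoundary = /ai[\s_-]*credits?.*(?:rate[\s-]*limit|limit exceeded|budget exceeded|exceeded)/i;
// Hypothetical stricter variant with a leading \b, for comparison only.
const withBoundary = /\bai[\s_-]*credits?.*(?:rate[\s-]*limit|limit exceeded|budget exceeded|exceeded)/i;

const samples = [
  "ai credits budget exceeded", // intended signal
  "Dubai credits limit exceeded", // "ai" is the tail of another word
  "max_ai_credits_exceeded", // "_" is a word character, so there is no \b before "ai"
];

const comparison = samples.map(sample => ({
  sample,
  withoutBoundary: withoutBoundary.test(sample),
  withBoundary: withBoundary.test(sample),
}));
// comparison[0] -> both true
// comparison[1] -> withoutBoundary true, withBoundary false
// comparison[2] -> withoutBoundary true, withBoundary false
```

The last row shows the trade-off: adding `\b` would rule out accidental matches such as the second sample, but it would also stop this pattern from matching inside snake_case identifiers, which the first pattern already covers.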

github-actions · 2026-06-09T04:44:15Z

+
+const AI_CREDITS_EXCEEDED_PATTERNS = [/\bmax[\s_-]*ai[\s_-]*credits[\s_-]*exceeded\b/i, /\bai[\s_-]*credits[\s_-]*rate[\s_-]*limit[\s_-]*error\b/i, /ai[\s_-]*credits?.*(?:rate[\s-]*limit|limit exceeded|budget exceeded|exceeded)/i];
+
+const AWF_API_PROXY_BLOCKING_REQUESTS_PATTERNS = [/\bawf\b.*\bapi[\s_-]*proxy\b.*\bblocking requests\b/i, /\bapi[\s_-]*proxy\b.*\bblocking requests\b/i, /\bapi[\s_-]*proxy\b.*\bblocked requests?\b/i, /\bDIFC_FILTERED\b/];


[/diagnose] Same readability issue: all four AWF proxy patterns are on one line. Splitting to one-per-line makes each pattern independently reviewable and auditable.

💡 Suggested formatting

const AWF_API_PROXY_BLOCKING_REQUESTS_PATTERNS = [
  /\bawf\b.*\bapi[\s_-]*proxy\b.*\bblocking requests\b/i,
  /\bapi[\s_-]*proxy\b.*\bblocking requests\b/i,
  /\bapi[\s_-]*proxy\b.*\bblocked requests?\b/i,
  /\bDIFC_FILTERED\b/,
];

github-actions · 2026-06-09T04:44:15Z

+      const reasons = [];
+      if (nonRetryableGuard.aiCreditsExceeded) reasons.push("AI credits budget exceeded");
+      if (nonRetryableGuard.awfAPIProxyBlockingRequests) reasons.push("AWF API proxy is blocking requests");
+      log(`attempt ${attempt + 1}: ${reasons.join(" and ")} — not retrying (persistent AWF guard condition)`);


[/diagnose] The log message always says "persistent AWF guard condition" even when only aiCreditsExceeded is true — that is an AI-provider condition, not an AWF condition. A reader diagnosing a failure would be pointed in the wrong direction.

💡 Use a more accurate label

Consider a neutral label such as "non-retryable guard condition" or check which flag triggered and use a specific term:

log(`attempt ${attempt + 1}: ${reasons.join(" and ")} — not retrying (non-retryable guard condition)`);

The same change applies to the equivalent lines in codex_harness.cjs (line 434) and copilot_harness.cjs (line 746).
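Since the same log line appears in all three harnesses, one option is to build it in a single helper so the wording cannot drift between files. The sketch below assumes a hypothetical `formatGuardStopMessage` function; the reason strings and the neutral suffix follow the excerpts and suggestion above, while the plain hyphen separator is a choice made for this sketch.

```javascript
"use strict";

/**
 * Hypothetical helper: build the guard-stop log line from the detector result.
 * @param {number} attempt zero-based attempt index, as in the harness loops
 * @param {{ aiCreditsExceeded: boolean, awfAPIProxyBlockingRequests: boolean }} guard
 * @returns {string | null} the log line, or null when no guard condition is set
 */
function formatGuardStopMessage(attempt, guard) {
  const reasons = [];
  if (guard.aiCreditsExceeded) reasons.push("AI credits budget exceeded");
  if (guard.awfAPIProxyBlockingRequests) reasons.push("AWF API proxy is blocking requests");
  if (reasons.length === 0) return null;
  return `attempt ${attempt + 1}: ${reasons.join(" and ")} - not retrying (non-retryable guard condition)`;
}
```

Each harness would then call the helper and log its return value, which also gives the " and " join a single place to be tested.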

github-actions

✅ No blocking issues

Pure formatting pass: collapses two multi-line regex arrays to single-line form. Both resulting lines (231 and 221 chars) fit the project's `printWidth: 240` Prettier config. No functional, logical, or interface changes.

🔎 Code quality review by PR Code Quality Reviewer · ⌖ 12.8 AIC

github-actions · 2026-06-09T04:47:36Z

🧪 Test Quality Sentinel Report

✅ Test Quality Score: 100/100 — Excellent

Analyzed 4 test(s) in 1 newly added JavaScript test file: 4 design tests, 0 implementation tests, 0 guideline violations.

📊 Metrics & Test Classification (4 tests analyzed)

Metric	Value
New/modified tests analyzed	4
✅ Design tests (behavioral contracts)	4 (100%)
⚠️ Implementation tests (low value)	0 (0%)
Tests with error/edge cases	4 (100%)
Duplicate test clusters	0
Test inflation detected	No (ratio ≈ 1.04×)
🚨 Coding-guideline violations	0

Test Classification Details

Test	File	Classification	Issues Detected
`detects AI credits exceeded markers`	`actions/setup/js/harness_retry_guard.test.cjs`	✅ Design	None — asserts positive match AND negative cross-check
`detects AWF API proxy blocking request markers`	`actions/setup/js/harness_retry_guard.test.cjs`	✅ Design	None — asserts positive match AND negative cross-check
`detects DIFC filtered proxy block markers`	`actions/setup/js/harness_retry_guard.test.cjs`	✅ Design	None — tests a distinct JSON-embedded pattern
`returns false when output has no guard markers`	`actions/setup/js/harness_retry_guard.test.cjs`	✅ Design	None — explicit negative/no-match path

Language Support

Tests analyzed:

🟨 JavaScript (*.test.cjs): 4 tests (vitest)

Verdict

✅ Check passed. 0% of new tests are implementation tests (threshold: 30%). No mocks, no guideline violations, all 4 tests assert directly on the module's observable boolean return values.

💡 Optional Hardening Suggestions

The current suite covers the happy path and a clean negative case with excellent density. Consider adding these for deeper confidence in the regex library:

Null/undefined input guard: detectNonRetryableHarnessGuard(null) — the production code already handles this, but a test would lock in that contract.
Empty string: detectNonRetryableHarnessGuard("") — ensures no pattern accidentally matches the empty string.
Mixed signals: A string containing both an AI-credit marker and a proxy-blocking marker — confirms both flags can be true simultaneously.

Three modified production files (claude_harness.cjs, codex_harness.cjs, copilot_harness.cjs) have no corresponding test changes in this PR. If the integration path through those files exercises the new guard, a future PR could add integration-level tests to close that gap.

📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

Assert on observable outputs, return values, or state changes
Cover error paths and boundary conditions
Would catch a behavioral regression if deleted
Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

Assert on internal function calls (mocking internals)
Only test the happy path with typical inputs
Break during legitimate refactoring even when behavior is correct
Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.

References: §27183534573

🧪 Test quality analysis by Test Quality Sentinel · 310.2 AIC · ⌖ 27.2 AIC · ◷

github-actions

✅ Test Quality Sentinel: 100/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%). All 4 tests are behavioral design tests with no mocks, no guideline violations, and a healthy test-to-production line ratio of ~1.04×.

pelikhan · 2026-06-09T04:48:41Z

@copilot Run pr-finisher skill

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot · 2026-06-09T05:03:03Z

@copilot Run pr-finisher skill

Done. I ran the pr-finisher pass, addressed the actionable review feedback, and pushed commit e8c2645 with the fixes and validation updates.

Copilot AI and others added 2 commits June 9, 2026 03:57

Add shared non-retry harness guard

4e09c62

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Format shared harness retry guard

9ca6fa8

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI assigned Copilot and pelikhan Jun 9, 2026

Copilot AI changed the title ~~Stop harness retries when AI credits/proxy guards trigger~~ Stop harness retries on AI-credit exhaustion and AWF proxy blocking signals Jun 9, 2026

Copilot finished work on behalf of pelikhan June 9, 2026 04:03

Copilot created this pull request from a session on behalf of pelikhan June 9, 2026 04:03 View session

Copilot AI requested a review from pelikhan June 9, 2026 04:03

pelikhan marked this pull request as ready for review June 9, 2026 04:17

Copilot AI review requested due to automatic review settings June 9, 2026 04:17

Copilot started reviewing on behalf of pelikhan June 9, 2026 04:18 View session

Copilot AI reviewed Jun 9, 2026

github-actions Bot mentioned this pull request Jun 9, 2026

[aw] No-Op Runs #37963

Open

github-actions Bot reviewed Jun 9, 2026

github-actions Bot approved these changes Jun 9, 2026

Copilot started work on behalf of pelikhan June 9, 2026 04:48 View session

Address guardrail review feedback in harness retry logic

e8c2645

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot finished work on behalf of pelikhan June 9, 2026 05:03

pelikhan merged commit 420bdb5 into main Jun 9, 2026
20 checks passed

pelikhan deleted the copilot/update-harness-ai-credit-detection branch June 9, 2026 05:10

github-actions Bot mentioned this pull request Jun 9, 2026

Impact Efficiency Report - 2026-06-09 #38040

Open
