Stop harness retries on AI-credit exhaustion and AWF proxy blocking signals#38018
Conversation
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR updates the JavaScript agent harness retry logic to fail fast (stop retrying) when failures are caused by non-retryable guardrail conditions, specifically AI-credit exhaustion and AWF API proxy request blocking, reducing wasted attempts and noisy retry paths.
Changes:
- Added a shared detector (
detectNonRetryableHarnessGuard) to identify AI-credit exhaustion and AWF proxy-blocking signals. - Integrated the detector into the retry loops for Claude, Codex, and Copilot harnesses to break out early on these terminal conditions.
- Added unit tests for the new guard detector.
Show a summary per file
| File | Description |
|---|---|
| actions/setup/js/harness_retry_guard.cjs | New shared regex-based detector for non-retryable “guard” output markers. |
| actions/setup/js/harness_retry_guard.test.cjs | New unit tests validating detection of AI-credit and AWF proxy-block signals. |
| actions/setup/js/claude_harness.cjs | Stops retrying when the new detector identifies a terminal guard condition. |
| actions/setup/js/codex_harness.cjs | Stops retrying when the new detector identifies a terminal guard condition. |
| actions/setup/js/copilot_harness.cjs | Stops retrying when the new detector identifies a terminal guard condition. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 5/5 changed files
- Comments generated: 5
| import { describe, expect, it } from "vitest"; | ||
|
|
||
| const { detectNonRetryableHarnessGuard } = require("./harness_retry_guard.cjs"); | ||
|
|
| /** | ||
| * Detect retry guard conditions that should stop harness retries immediately. | ||
| * @param {string} output | ||
| * @returns {{ aiCreditsExceeded: boolean, awfAPIProxyBlockingRequests: boolean }} | ||
| */ | ||
| function detectNonRetryableHarnessGuard(output) { | ||
| const safeOutput = typeof output === "string" ? output : ""; |
| const reasons = []; | ||
| if (nonRetryableGuard.aiCreditsExceeded) reasons.push("AI credits budget exceeded"); | ||
| if (nonRetryableGuard.awfAPIProxyBlockingRequests) reasons.push("AWF API proxy is blocking requests"); | ||
| log(`attempt ${attempt + 1}: ${reasons.join(" and ")} — not retrying (persistent AWF guard condition)`); |
| const reasons = []; | ||
| if (nonRetryableGuard.aiCreditsExceeded) reasons.push("AI credits budget exceeded"); | ||
| if (nonRetryableGuard.awfAPIProxyBlockingRequests) reasons.push("AWF API proxy is blocking requests"); | ||
| log(`attempt ${attempt + 1}: ${reasons.join(" and ")} — not retrying (persistent AWF guard condition)`); |
| const reasons = []; | ||
| if (nonRetryableGuard.aiCreditsExceeded) reasons.push("AI credits budget exceeded"); | ||
| if (nonRetryableGuard.awfAPIProxyBlockingRequests) reasons.push("AWF API proxy is blocking requests"); | ||
| log(`attempt ${attempt + 1}: ${reasons.join(" and ")} — not retrying (persistent AWF guard condition)`); |
|
🧪 Test Quality Sentinel completed test quality analysis. |
|
✅ Design Decision Gate 🏗️ completed the design decision gate check. No ADR enforcement needed: PR #38018 does not have the implementation label and has 0 new lines of code in business logic directories (below the 100-line threshold). |
|
🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅ |
|
✅ PR Code Quality Reviewer completed the code quality review. |
There was a problem hiding this comment.
Skills-Based Review 🧠
Applied /tdd and /diagnose — commenting with non-blocking suggestions focused on test coverage and diagnosability. The centralised guard design is solid.
📋 Key Themes & Highlights
Key Themes
- Test coverage gaps: 7 regex patterns across two signal types, but only 3 are exercised by the positive test cases. The untested patterns (second/third AI-credits variants, second/third AWF proxy variants) could drift silently.
- Pattern readability: Both constant arrays are single long lines, making future audits and diffs difficult.
- Misleading log label:
"persistent AWF guard condition"is used even for AI-credits exhaustion, which is an AI-provider signal — could send an operator in the wrong direction during incident diagnosis.
Positive Highlights
- ✅ Excellent centralisation — one shared module vs. three independently drifting copies.
- ✅ Defensive
typeof output === "string" ? output : ""guard handles unexpected input gracefully. - ✅ Good additive-only approach — zero deletions means no regression risk to existing retry behaviour.
- ✅ Tests exist for the new module, following established codebase conventions.
- ✅ PR description clearly explains scope, rationale, and an example — makes the intent reviewable.
🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 296.5 AIC · ⌖ 13.6 AIC
| it("detects AI credits exceeded markers", () => { | ||
| const result = detectNonRetryableHarnessGuard("error: max_ai_credits_exceeded=true"); | ||
| expect(result.aiCreditsExceeded).toBe(true); | ||
| expect(result.awfAPIProxyBlockingRequests).toBe(false); |
There was a problem hiding this comment.
[/tdd] Only the first AI-credits pattern (max_ai_credits_exceeded) is exercised. The second and third patterns are untested, so a regression in those regexes would go undetected.
💡 Add tests for the other two patterns
it("detects ai_credits_rate_limit_error signal", () => {
const result = detectNonRetryableHarnessGuard("error: ai_credits_rate_limit_error=true");
expect(result.aiCreditsExceeded).toBe(true);
});
it("detects broad ai-credits-budget-exceeded signal", () => {
const result = detectNonRetryableHarnessGuard("ai credits budget exceeded");
expect(result.aiCreditsExceeded).toBe(true);
});Each regex in AI_CREDITS_EXCEEDED_PATTERNS represents a distinct signal variant. One test per pattern ensures the full detection surface is validated.
|
|
||
| it("detects AWF API proxy blocking request markers", () => { | ||
| const result = detectNonRetryableHarnessGuard("awf api proxy is blocking requests for this run"); | ||
| expect(result.aiCreditsExceeded).toBe(false); |
There was a problem hiding this comment.
[/tdd] Only the first AWF proxy pattern (full phrase with "awf") is tested in the positive direction; the second (api-proxy.*blocking requests) and third (api-proxy.*blocked requests) patterns have no dedicated test.
💡 Cover the remaining AWF proxy patterns
it("detects api-proxy blocking requests without awf prefix", () => {
const result = detectNonRetryableHarnessGuard("api-proxy is blocking requests");
expect(result.awfAPIProxyBlockingRequests).toBe(true);
});
it("detects api-proxy blocked requests variant", () => {
const result = detectNonRetryableHarnessGuard("api proxy blocked request");
expect(result.awfAPIProxyBlockingRequests).toBe(true);
});Driving one test per pattern (rather than one test per signal type) keeps each regex independently verifiable and makes failures self-diagnosing.
| expect(result.aiCreditsExceeded).toBe(false); | ||
| expect(result.awfAPIProxyBlockingRequests).toBe(false); | ||
| }); | ||
| }); |
There was a problem hiding this comment.
[/tdd] No test covers non-string input (e.g. null, undefined, an object) or combined signals where both flags are true. These are the boundary conditions most likely to surface a real regression.
💡 Add boundary tests
it("returns false for null input", () => {
const result = detectNonRetryableHarnessGuard(null);
expect(result.aiCreditsExceeded).toBe(false);
expect(result.awfAPIProxyBlockingRequests).toBe(false);
});
it("returns both flags true when output contains both signals", () => {
const combined = "max_ai_credits_exceeded=true; DIFC_FILTERED";
const result = detectNonRetryableHarnessGuard(combined);
expect(result.aiCreditsExceeded).toBe(true);
expect(result.awfAPIProxyBlockingRequests).toBe(true);
});The combined-signals test also validates that the log message in each harness correctly emits both reasons separated by " and ".
|
|
||
| "use strict"; | ||
|
|
||
| const AI_CREDITS_EXCEEDED_PATTERNS = [/\bmax[\s_-]*ai[\s_-]*credits[\s_-]*exceeded\b/i, /\bai[\s_-]*credits[\s_-]*rate[\s_-]*limit[\s_-]*error\b/i, /ai[\s_-]*credits?.*(?:rate[\s-]*limit|limit exceeded|budget exceeded|exceeded)/i]; |
There was a problem hiding this comment.
[/diagnose] All three AI-credits patterns are on one long line, making them very hard to review for correctness and nearly impossible to diff in future PRs.
💡 One pattern per line
const AI_CREDITS_EXCEEDED_PATTERNS = [
/\bmax[\s_-]*ai[\s_-]*credits[\s_-]*exceeded\b/i,
/\bai[\s_-]*credits[\s_-]*rate[\s_-]*limit[\s_-]*error\b/i,
/ai[\s_-]*credits?.*(?:rate[\s-]*limit|limit exceeded|budget exceeded|exceeded)/i,
];Note also that the third pattern lacks a leading \b word-boundary unlike the first two — worth confirming that is intentional (it broadens the match surface).
|
|
||
| const AI_CREDITS_EXCEEDED_PATTERNS = [/\bmax[\s_-]*ai[\s_-]*credits[\s_-]*exceeded\b/i, /\bai[\s_-]*credits[\s_-]*rate[\s_-]*limit[\s_-]*error\b/i, /ai[\s_-]*credits?.*(?:rate[\s-]*limit|limit exceeded|budget exceeded|exceeded)/i]; | ||
|
|
||
| const AWF_API_PROXY_BLOCKING_REQUESTS_PATTERNS = [/\bawf\b.*\bapi[\s_-]*proxy\b.*\bblocking requests\b/i, /\bapi[\s_-]*proxy\b.*\bblocking requests\b/i, /\bapi[\s_-]*proxy\b.*\bblocked requests?\b/i, /\bDIFC_FILTERED\b/]; |
There was a problem hiding this comment.
[/diagnose] Same readability issue: all four AWF proxy patterns are on one line. Splitting to one-per-line makes each pattern independently reviewable and auditable.
💡 Suggested formatting
const AWF_API_PROXY_BLOCKING_REQUESTS_PATTERNS = [
/\bawf\b.*\bapi[\s_-]*proxy\b.*\bblocking requests\b/i,
/\bapi[\s_-]*proxy\b.*\bblocking requests\b/i,
/\bapi[\s_-]*proxy\b.*\bblocked requests?\b/i,
/\bDIFC_FILTERED\b/,
];| const reasons = []; | ||
| if (nonRetryableGuard.aiCreditsExceeded) reasons.push("AI credits budget exceeded"); | ||
| if (nonRetryableGuard.awfAPIProxyBlockingRequests) reasons.push("AWF API proxy is blocking requests"); | ||
| log(`attempt ${attempt + 1}: ${reasons.join(" and ")} — not retrying (persistent AWF guard condition)`); |
There was a problem hiding this comment.
[/diagnose] The log message always says "persistent AWF guard condition" even when only aiCreditsExceeded is true — that is an AI-provider condition, not an AWF condition. A reader diagnosing a failure would be pointed in the wrong direction.
💡 Use a more accurate label
Consider a neutral label such as "non-retryable guard condition" or check which flag triggered and use a specific term:
log(`attempt ${attempt + 1}: ${reasons.join(" and ")} — not retrying (non-retryable guard condition)`);The same change applies to the equivalent lines in codex_harness.cjs (line 434) and copilot_harness.cjs (line 746).
There was a problem hiding this comment.
✅ No blocking issues
Pure formatting pass — collapses two multi-line regex arrays to single-line form. Both result lines (231 and 221 chars) fit the projects printWidth: 240` Prettier config. No functional, logical, or interface changes.
🔎 Code quality review by PR Code Quality Reviewer · ⌖ 12.8 AIC
🧪 Test Quality Sentinel Report✅ Test Quality Score: 100/100 — Excellent
📊 Metrics & Test Classification (4 tests analyzed)
Test Classification Details
Language SupportTests analyzed:
Verdict
💡 Optional Hardening SuggestionsThe current suite covers the happy path and a clean negative case with excellent density. Consider adding these for deeper confidence in the regex library:
Three modified production files ( 📖 Understanding Test ClassificationsDesign Tests (High Value) verify what the system does:
Implementation Tests (Low Value) verify how the system does it:
Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators. References: §27183534573
|
There was a problem hiding this comment.
✅ Test Quality Sentinel: 100/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%). All 4 tests are behavioral design tests with no mocks, no guideline violations, and a healthy test-to-production line ratio of ~1.04×.
|
@copilot Run pr-finisher skill |
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Done. I ran the pr-finisher pass, addressed the actionable review feedback, and pushed commit |
Harness retries were treating AI-credit guardrail failures and AWF API proxy blocking as retryable partial failures, causing wasted attempts and noisy failure paths. This change classifies those conditions as terminal so harnesses fail fast instead of retrying.
What changed
actions/setup/js/harness_retry_guard.cjsfor:max_ai_credits_exceeded/ AI-credit rate-limit variants)DIFC_FILTERED)claude_harness.cjscopilot_harness.cjscodex_harness.cjsnon-retryable guard condition) across all three harnesses.outputaccepts non-string values) and reformatted pattern arrays for readability.Why this is scoped this way
Example