Skip to content

Stop harness retries on AI-credit exhaustion and AWF proxy blocking signals#38018

Merged
pelikhan merged 3 commits into
mainfrom
copilot/update-harness-ai-credit-detection
Jun 9, 2026
Merged

Stop harness retries on AI-credit exhaustion and AWF proxy blocking signals#38018
pelikhan merged 3 commits into
mainfrom
copilot/update-harness-ai-credit-detection

Conversation

Copilot AI commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Harness retries were treating AI-credit guardrail failures and AWF API proxy blocking as retryable partial failures, causing wasted attempts and noisy failure paths. This change classifies those conditions as terminal so harnesses fail fast instead of retrying.

  • What changed

    • Added a shared detector in actions/setup/js/harness_retry_guard.cjs for:
      • AI-credit exhaustion signals (including max_ai_credits_exceeded / AI-credit rate-limit variants)
      • AWF API proxy request-blocking signals (including DIFC_FILTERED)
    • Integrated this detector into retry loops in:
      • claude_harness.cjs
      • copilot_harness.cjs
      • codex_harness.cjs
    • On detection, harnesses now stop retrying immediately and log a guard-specific non-retry reason.
    • Updated guard-stop log suffixes to provider-neutral wording (non-retryable guard condition) across all three harnesses.
    • Expanded guard detector tests to cover additional AI-credit and API-proxy signal variants plus edge cases (non-string input and combined signals), and updated the test module import pattern for Vitest ESM execution.
    • Aligned detector JSDoc typing with runtime behavior (output accepts non-string values) and reformatted pattern arrays for readability.
  • Why this is scoped this way

    • The logic is centralized to keep retry policy consistent across all harnesses and avoid drift in error-pattern handling.
  • Example

const nonRetryableGuard = detectNonRetryableHarnessGuard(result.output);
if (nonRetryableGuard.aiCreditsExceeded || nonRetryableGuard.awfAPIProxyBlockingRequests) {
  log("...not retrying (non-retryable guard condition)");
  break;
}

Copilot AI and others added 2 commits June 9, 2026 03:57
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title Stop harness retries when AI credits/proxy guards trigger Stop harness retries on AI-credit exhaustion and AWF proxy blocking signals Jun 9, 2026
Copilot AI requested a review from pelikhan June 9, 2026 04:03
@pelikhan pelikhan marked this pull request as ready for review June 9, 2026 04:17
Copilot AI review requested due to automatic review settings June 9, 2026 04:17

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the JavaScript agent harness retry logic to fail fast (stop retrying) when failures are caused by non-retryable guardrail conditions, specifically AI-credit exhaustion and AWF API proxy request blocking, reducing wasted attempts and noisy retry paths.

Changes:

  • Added a shared detector (detectNonRetryableHarnessGuard) to identify AI-credit exhaustion and AWF proxy-blocking signals.
  • Integrated the detector into the retry loops for Claude, Codex, and Copilot harnesses to break out early on these terminal conditions.
  • Added unit tests for the new guard detector.
Show a summary per file
File Description
actions/setup/js/harness_retry_guard.cjs New shared regex-based detector for non-retryable “guard” output markers.
actions/setup/js/harness_retry_guard.test.cjs New unit tests validating detection of AI-credit and AWF proxy-block signals.
actions/setup/js/claude_harness.cjs Stops retrying when the new detector identifies a terminal guard condition.
actions/setup/js/codex_harness.cjs Stops retrying when the new detector identifies a terminal guard condition.
actions/setup/js/copilot_harness.cjs Stops retrying when the new detector identifies a terminal guard condition.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

  • Files reviewed: 5/5 changed files
  • Comments generated: 5

Comment on lines +1 to +4
import { describe, expect, it } from "vitest";

const { detectNonRetryableHarnessGuard } = require("./harness_retry_guard.cjs");

Comment on lines +9 to +15
/**
* Detect retry guard conditions that should stop harness retries immediately.
* @param {string} output
* @returns {{ aiCreditsExceeded: boolean, awfAPIProxyBlockingRequests: boolean }}
*/
function detectNonRetryableHarnessGuard(output) {
const safeOutput = typeof output === "string" ? output : "";
Comment thread actions/setup/js/copilot_harness.cjs Outdated
const reasons = [];
if (nonRetryableGuard.aiCreditsExceeded) reasons.push("AI credits budget exceeded");
if (nonRetryableGuard.awfAPIProxyBlockingRequests) reasons.push("AWF API proxy is blocking requests");
log(`attempt ${attempt + 1}: ${reasons.join(" and ")} — not retrying (persistent AWF guard condition)`);
Comment thread actions/setup/js/codex_harness.cjs Outdated
const reasons = [];
if (nonRetryableGuard.aiCreditsExceeded) reasons.push("AI credits budget exceeded");
if (nonRetryableGuard.awfAPIProxyBlockingRequests) reasons.push("AWF API proxy is blocking requests");
log(`attempt ${attempt + 1}: ${reasons.join(" and ")} — not retrying (persistent AWF guard condition)`);
Comment thread actions/setup/js/claude_harness.cjs Outdated
const reasons = [];
if (nonRetryableGuard.aiCreditsExceeded) reasons.push("AI credits budget exceeded");
if (nonRetryableGuard.awfAPIProxyBlockingRequests) reasons.push("AWF API proxy is blocking requests");
log(`attempt ${attempt + 1}: ${reasons.join(" and ")} — not retrying (persistent AWF guard condition)`);
@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

🧪 Test Quality Sentinel completed test quality analysis.

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #38018 does not have the implementation label and has 0 new lines of code in business logic directories (below the 100-line threshold).

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

PR Code Quality Reviewer completed the code quality review.

@github-actions github-actions Bot mentioned this pull request Jun 9, 2026

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skills-Based Review 🧠

Applied /tdd and /diagnose — commenting with non-blocking suggestions focused on test coverage and diagnosability. The centralised guard design is solid.

📋 Key Themes & Highlights

Key Themes

  • Test coverage gaps: 7 regex patterns across two signal types, but only 3 are exercised by the positive test cases. The untested patterns (second/third AI-credits variants, second/third AWF proxy variants) could drift silently.
  • Pattern readability: Both constant arrays are single long lines, making future audits and diffs difficult.
  • Misleading log label: "persistent AWF guard condition" is used even for AI-credits exhaustion, which is an AI-provider signal — could send an operator in the wrong direction during incident diagnosis.

Positive Highlights

  • ✅ Excellent centralisation — one shared module vs. three independently drifting copies.
  • ✅ Defensive typeof output === "string" ? output : "" guard handles unexpected input gracefully.
  • ✅ Good additive-only approach — zero deletions means no regression risk to existing retry behaviour.
  • ✅ Tests exist for the new module, following established codebase conventions.
  • ✅ PR description clearly explains scope, rationale, and an example — makes the intent reviewable.

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 296.5 AIC · ⌖ 13.6 AIC

it("detects AI credits exceeded markers", () => {
const result = detectNonRetryableHarnessGuard("error: max_ai_credits_exceeded=true");
expect(result.aiCreditsExceeded).toBe(true);
expect(result.awfAPIProxyBlockingRequests).toBe(false);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] Only the first AI-credits pattern (max_ai_credits_exceeded) is exercised. The second and third patterns are untested, so a regression in those regexes would go undetected.

💡 Add tests for the other two patterns
it("detects ai_credits_rate_limit_error signal", () => {
  const result = detectNonRetryableHarnessGuard("error: ai_credits_rate_limit_error=true");
  expect(result.aiCreditsExceeded).toBe(true);
});

it("detects broad ai-credits-budget-exceeded signal", () => {
  const result = detectNonRetryableHarnessGuard("ai credits budget exceeded");
  expect(result.aiCreditsExceeded).toBe(true);
});

Each regex in AI_CREDITS_EXCEEDED_PATTERNS represents a distinct signal variant. One test per pattern ensures the full detection surface is validated.


it("detects AWF API proxy blocking request markers", () => {
const result = detectNonRetryableHarnessGuard("awf api proxy is blocking requests for this run");
expect(result.aiCreditsExceeded).toBe(false);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] Only the first AWF proxy pattern (full phrase with "awf") is tested in the positive direction; the second (api-proxy.*blocking requests) and third (api-proxy.*blocked requests) patterns have no dedicated test.

💡 Cover the remaining AWF proxy patterns
it("detects api-proxy blocking requests without awf prefix", () => {
  const result = detectNonRetryableHarnessGuard("api-proxy is blocking requests");
  expect(result.awfAPIProxyBlockingRequests).toBe(true);
});

it("detects api-proxy blocked requests variant", () => {
  const result = detectNonRetryableHarnessGuard("api proxy blocked request");
  expect(result.awfAPIProxyBlockingRequests).toBe(true);
});

Driving one test per pattern (rather than one test per signal type) keeps each regex independently verifiable and makes failures self-diagnosing.

expect(result.aiCreditsExceeded).toBe(false);
expect(result.awfAPIProxyBlockingRequests).toBe(false);
});
});

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] No test covers non-string input (e.g. null, undefined, an object) or combined signals where both flags are true. These are the boundary conditions most likely to surface a real regression.

💡 Add boundary tests
it("returns false for null input", () => {
  const result = detectNonRetryableHarnessGuard(null);
  expect(result.aiCreditsExceeded).toBe(false);
  expect(result.awfAPIProxyBlockingRequests).toBe(false);
});

it("returns both flags true when output contains both signals", () => {
  const combined = "max_ai_credits_exceeded=true; DIFC_FILTERED";
  const result = detectNonRetryableHarnessGuard(combined);
  expect(result.aiCreditsExceeded).toBe(true);
  expect(result.awfAPIProxyBlockingRequests).toBe(true);
});

The combined-signals test also validates that the log message in each harness correctly emits both reasons separated by " and ".


"use strict";

const AI_CREDITS_EXCEEDED_PATTERNS = [/\bmax[\s_-]*ai[\s_-]*credits[\s_-]*exceeded\b/i, /\bai[\s_-]*credits[\s_-]*rate[\s_-]*limit[\s_-]*error\b/i, /ai[\s_-]*credits?.*(?:rate[\s-]*limit|limit exceeded|budget exceeded|exceeded)/i];

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/diagnose] All three AI-credits patterns are on one long line, making them very hard to review for correctness and nearly impossible to diff in future PRs.

💡 One pattern per line
const AI_CREDITS_EXCEEDED_PATTERNS = [
  /\bmax[\s_-]*ai[\s_-]*credits[\s_-]*exceeded\b/i,
  /\bai[\s_-]*credits[\s_-]*rate[\s_-]*limit[\s_-]*error\b/i,
  /ai[\s_-]*credits?.*(?:rate[\s-]*limit|limit exceeded|budget exceeded|exceeded)/i,
];

Note also that the third pattern lacks a leading \b word-boundary unlike the first two — worth confirming that is intentional (it broadens the match surface).


const AI_CREDITS_EXCEEDED_PATTERNS = [/\bmax[\s_-]*ai[\s_-]*credits[\s_-]*exceeded\b/i, /\bai[\s_-]*credits[\s_-]*rate[\s_-]*limit[\s_-]*error\b/i, /ai[\s_-]*credits?.*(?:rate[\s-]*limit|limit exceeded|budget exceeded|exceeded)/i];

const AWF_API_PROXY_BLOCKING_REQUESTS_PATTERNS = [/\bawf\b.*\bapi[\s_-]*proxy\b.*\bblocking requests\b/i, /\bapi[\s_-]*proxy\b.*\bblocking requests\b/i, /\bapi[\s_-]*proxy\b.*\bblocked requests?\b/i, /\bDIFC_FILTERED\b/];

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/diagnose] Same readability issue: all four AWF proxy patterns are on one line. Splitting to one-per-line makes each pattern independently reviewable and auditable.

💡 Suggested formatting
const AWF_API_PROXY_BLOCKING_REQUESTS_PATTERNS = [
  /\bawf\b.*\bapi[\s_-]*proxy\b.*\bblocking requests\b/i,
  /\bapi[\s_-]*proxy\b.*\bblocking requests\b/i,
  /\bapi[\s_-]*proxy\b.*\bblocked requests?\b/i,
  /\bDIFC_FILTERED\b/,
];

Comment thread actions/setup/js/claude_harness.cjs Outdated
const reasons = [];
if (nonRetryableGuard.aiCreditsExceeded) reasons.push("AI credits budget exceeded");
if (nonRetryableGuard.awfAPIProxyBlockingRequests) reasons.push("AWF API proxy is blocking requests");
log(`attempt ${attempt + 1}: ${reasons.join(" and ")} — not retrying (persistent AWF guard condition)`);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/diagnose] The log message always says "persistent AWF guard condition" even when only aiCreditsExceeded is true — that is an AI-provider condition, not an AWF condition. A reader diagnosing a failure would be pointed in the wrong direction.

💡 Use a more accurate label

Consider a neutral label such as "non-retryable guard condition" or check which flag triggered and use a specific term:

log(`attempt ${attempt + 1}: ${reasons.join(" and ")} — not retrying (non-retryable guard condition)`);

The same change applies to the equivalent lines in codex_harness.cjs (line 434) and copilot_harness.cjs (line 746).

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ No blocking issues

Pure formatting pass — collapses two multi-line regex arrays to single-line form. Both result lines (231 and 221 chars) fit the projects printWidth: 240` Prettier config. No functional, logical, or interface changes.

🔎 Code quality review by PR Code Quality Reviewer · ⌖ 12.8 AIC

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

🧪 Test Quality Sentinel Report

Test Quality Score: 100/100 — Excellent

Analyzed 4 test(s) in 1 newly added JavaScript test file: 4 design tests, 0 implementation tests, 0 guideline violations.

📊 Metrics & Test Classification (4 tests analyzed)
Metric Value
New/modified tests analyzed 4
✅ Design tests (behavioral contracts) 4 (100%)
⚠️ Implementation tests (low value) 0 (0%)
Tests with error/edge cases 4 (100%)
Duplicate test clusters 0
Test inflation detected No (ratio ≈ 1.04×)
🚨 Coding-guideline violations 0

Test Classification Details

Test File Classification Issues Detected
detects AI credits exceeded markers actions/setup/js/harness_retry_guard.test.cjs ✅ Design None — asserts positive match AND negative cross-check
detects AWF API proxy blocking request markers actions/setup/js/harness_retry_guard.test.cjs ✅ Design None — asserts positive match AND negative cross-check
detects DIFC filtered proxy block markers actions/setup/js/harness_retry_guard.test.cjs ✅ Design None — tests a distinct JSON-embedded pattern
returns false when output has no guard markers actions/setup/js/harness_retry_guard.test.cjs ✅ Design None — explicit negative/no-match path

Language Support

Tests analyzed:

  • 🟨 JavaScript (*.test.cjs): 4 tests (vitest)

Verdict

Check passed. 0% of new tests are implementation tests (threshold: 30%). No mocks, no guideline violations, all 4 tests assert directly on the module's observable boolean return values.

💡 Optional Hardening Suggestions

The current suite covers the happy path and a clean negative case with excellent density. Consider adding these for deeper confidence in the regex library:

  • Null/undefined input guard: detectNonRetryableHarnessGuard(null) — the production code already handles this, but a test would lock in that contract.
  • Empty string: detectNonRetryableHarnessGuard("") — ensures no pattern accidentally matches the empty string.
  • Mixed signals: A string containing both an AI-credit marker and a proxy-blocking marker — confirms both flags can be true simultaneously.

Three modified production files (claude_harness.cjs, codex_harness.cjs, copilot_harness.cjs) have no corresponding test changes in this PR. If the integration path through those files exercises the new guard, a future PR could add integration-level tests to close that gap.

📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

  • Assert on observable outputs, return values, or state changes
  • Cover error paths and boundary conditions
  • Would catch a behavioral regression if deleted
  • Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

  • Assert on internal function calls (mocking internals)
  • Only test the happy path with typical inputs
  • Break during legitimate refactoring even when behavior is correct
  • Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.

References: §27183534573

🧪 Test quality analysis by Test Quality Sentinel · 310.2 AIC · ⌖ 27.2 AIC ·

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Test Quality Sentinel: 100/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%). All 4 tests are behavioral design tests with no mocks, no guideline violations, and a healthy test-to-production line ratio of ~1.04×.

@pelikhan

pelikhan commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator

@copilot Run pr-finisher skill

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI commented Jun 9, 2026

Copy link
Copy Markdown
Contributor Author

@copilot Run pr-finisher skill

Done. I ran the pr-finisher pass, addressed the actionable review feedback, and pushed commit e8c2645 with the fixes and validation updates.

@pelikhan pelikhan merged commit 420bdb5 into main Jun 9, 2026
20 checks passed
@pelikhan pelikhan deleted the copilot/update-harness-ai-credit-detection branch June 9, 2026 05:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants