
feat(lib): define agent failure taxonomy with structured error codes #238

Merged
snipcodeit merged 1 commit into main from issue/229-define-agent-failure-taxonomy-with-struc
Mar 6, 2026

@snipcodeit
Owner

Summary

  • Created lib/agent-errors.cjs with a formal taxonomy of 5 GSD agent failure modes: timeout, malformed-output, partial-completion, hallucination, and permission-denied
  • Each failure type carries a structured error code (AGENT_ERR_*), severity level (critical/high/medium/low), human-readable description, and recommended recovery action
  • Includes AgentFailureError class extending MgwError, plus classifyAgentFailure() for pattern-based and context-based classification
  • Complements existing lib/retry.cjs pipeline retry infrastructure with agent-specific failure handling

Closes #229

Milestone Context

  • Milestone: v8 — Agent Reliability & Failure Recovery
  • Phase: 44 — Agent Failure Taxonomy & Diagnostic Logging
  • Issue: 1 of 9 in milestone

Changes

lib/

File                  Action  Description
lib/agent-errors.cjs  Added   Agent failure taxonomy module (384 lines)

Module exports:

  • AGENT_FAILURE_TYPES — frozen object with 5 failure type definitions
  • SEVERITY_LEVELS — ordered severity enum with numeric weights
  • AgentFailureError — Error class extending MgwError with agentType, failureType, artifacts fields
  • classifyAgentFailure(error, context) — pattern + context-based classification
  • getRecoveryAction(failureType) — recovery action lookup
  • isRetryable(failureType) — retry eligibility check
  • compareSeverity(a, b) — severity comparison utility
  • getFailureByCode(code) — reverse lookup by error code
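The PR does not show how SEVERITY_LEVELS or compareSeverity() are defined. As a minimal sketch, assuming weights from 4 (critical) down to 1 (low) and a comparator that returns a positive number when its first argument is more severe, they might look like this. Both the weights and the sign convention are assumptions, not the module's confirmed behavior.

```javascript
'use strict';

// Hypothetical weights: the PR only says SEVERITY_LEVELS is an ordered
// enum with numeric weights; the actual numbers are not shown.
const SEVERITY_LEVELS = Object.freeze({
  critical: 4,
  high: 3,
  medium: 2,
  low: 1,
});

// Own-property lookup so names like 'toString' are not treated as levels.
function weightOf(level) {
  if (!Object.prototype.hasOwnProperty.call(SEVERITY_LEVELS, level)) {
    throw new RangeError(`Unknown severity: ${level}`);
  }
  return SEVERITY_LEVELS[level];
}

// Assumed convention: positive when `a` is more severe than `b`,
// negative when less severe, 0 when equal.
function compareSeverity(a, b) {
  return weightOf(a) - weightOf(b);
}

// Usage: order observed failures most-severe-first
// (severities as listed in the failure type summary).
const observed = [
  { type: 'partial-completion', severity: 'medium' },
  { type: 'hallucination', severity: 'critical' },
  { type: 'timeout', severity: 'high' },
];
const ordered = [...observed].sort((x, y) => compareSeverity(y.severity, x.severity));
console.log(ordered.map((f) => f.type).join(', '));
// → hallucination, timeout, partial-completion
```

Swapping the arguments inside the sort callback is what turns an ascending comparator into a most-severe-first ordering.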

Failure type summary:

Code                          Name                Severity  Retryable
AGENT_ERR_TIMEOUT             timeout             high      yes
AGENT_ERR_MALFORMED_OUTPUT    malformed-output    high      yes
AGENT_ERR_PARTIAL_COMPLETION  partial-completion  medium    yes
AGENT_ERR_HALLUCINATION       hallucination       critical  no
AGENT_ERR_PERMISSION_DENIED   permission-denied   high      no
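A minimal sketch of how this table could be encoded as a frozen AGENT_FAILURE_TYPES object, together with getFailureByCode(), isRetryable() and getRecoveryAction() lookups. The codes, names, severities and retryability come from the table; keying entries by failure name, the description and recovery strings, and the helper bodies are assumptions for illustration, not the module's source.

```javascript
'use strict';

// Codes, names, severities and retryability follow the failure type summary.
// The description/recovery strings are hypothetical placeholders.
const AGENT_FAILURE_TYPES = Object.freeze({
  timeout: Object.freeze({
    code: 'AGENT_ERR_TIMEOUT', name: 'timeout', severity: 'high', retryable: true,
    description: 'Agent did not finish within its time or turn budget',
    recovery: 'Retry, optionally with a larger budget',
  }),
  'malformed-output': Object.freeze({
    code: 'AGENT_ERR_MALFORMED_OUTPUT', name: 'malformed-output', severity: 'high', retryable: true,
    description: 'Agent output could not be parsed',
    recovery: 'Retry with stricter output instructions',
  }),
  'partial-completion': Object.freeze({
    code: 'AGENT_ERR_PARTIAL_COMPLETION', name: 'partial-completion', severity: 'medium', retryable: true,
    description: 'Agent finished only some of its tasks',
    recovery: 'Retry the remaining tasks',
  }),
  hallucination: Object.freeze({
    code: 'AGENT_ERR_HALLUCINATION', name: 'hallucination', severity: 'critical', retryable: false,
    description: 'Agent reported artifacts or results that do not exist',
    recovery: 'Stop and escalate for human review',
  }),
  'permission-denied': Object.freeze({
    code: 'AGENT_ERR_PERMISSION_DENIED', name: 'permission-denied', severity: 'high', retryable: false,
    description: 'Agent was blocked from a required file or tool',
    recovery: 'Fix permissions before re-running',
  }),
});

// Own-property lookup so keys like 'toString' are not mistaken for types.
function lookup(failureType) {
  return Object.prototype.hasOwnProperty.call(AGENT_FAILURE_TYPES, failureType)
    ? AGENT_FAILURE_TYPES[failureType]
    : null;
}

// Reverse lookup: error code -> failure definition, or null.
function getFailureByCode(code) {
  return Object.values(AGENT_FAILURE_TYPES).find((t) => t.code === code) || null;
}

// Unknown types are treated as not retryable.
function isRetryable(failureType) {
  const t = lookup(failureType);
  return t !== null && t.retryable;
}

// Returns { action, retryable, severity }, or null for unknown types.
function getRecoveryAction(failureType) {
  const t = lookup(failureType);
  return t && { action: t.recovery, retryable: t.retryable, severity: t.severity };
}

console.log(getFailureByCode('AGENT_ERR_HALLUCINATION').name); // hallucination
console.log(isRetryable('timeout'), isRetryable('permission-denied')); // true false
console.log(getRecoveryAction('partial-completion').severity); // medium
```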

Test Plan

  • node -e "require('./lib/agent-errors.cjs')" loads without errors
  • All 5 failure types present with complete field sets (code, name, severity, description, recovery, retryable)
  • AgentFailureError extends MgwError (instanceof check)
  • classifyAgentFailure() correctly classifies all 5 failure patterns via message matching
  • classifyAgentFailure() handles context-based classification (artifact mismatch, partial tasks)
  • classifyAgentFailure() returns null for unrecognized errors (fallback to retry.cjs)
  • getRecoveryAction() returns valid action/retryable/severity for each type
  • isRetryable() returns true for timeout/malformed-output/partial-completion, false for hallucination/permission-denied
  • No new external dependencies introduced
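The context-based path can be sketched as follows. The context field names (claimedArtifacts, existingArtifacts, tasksCompleted, tasksTotal) and the classifyFromContext() helper are hypothetical; the PR only says context covers artifact mismatch and partial tasks. Mapping a missing claimed artifact to hallucination mirrors the "Artifact missing" case in the classification tests.

```javascript
'use strict';

// Hypothetical context shape; the real field names are not shown in the PR.
function classifyFromContext(context = {}) {
  const claimed = context.claimedArtifacts || [];
  const existing = new Set(context.existingArtifacts || []);
  const missing = claimed.filter((path) => !existing.has(path));
  if (missing.length > 0) {
    // The agent reported artifacts that are not actually present.
    return { type: 'hallucination', missing };
  }

  const { tasksCompleted, tasksTotal } = context;
  if (Number.isInteger(tasksTotal) && Number.isInteger(tasksCompleted) && tasksCompleted < tasksTotal) {
    return { type: 'partial-completion', tasksCompleted, tasksTotal };
  }

  // Nothing in the context points at a specific agent failure.
  return null;
}

console.log(classifyFromContext({ claimedArtifacts: ['PLAN.md'], existingArtifacts: [] }).type); // hallucination
console.log(classifyFromContext({ tasksCompleted: 2, tasksTotal: 5 }).type); // partial-completion
console.log(classifyFromContext({})); // null
```

Checking artifacts first means the critical hallucination result wins when both signals are present; that ordering is a choice of this sketch.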

Add lib/agent-errors.cjs with a formal taxonomy of 5 GSD agent failure
modes: timeout, malformed-output, partial-completion, hallucination, and
permission-denied. Each failure type has a structured error code
(AGENT_ERR_*), severity level, description, and recommended recovery
action. Includes AgentFailureError class extending MgwError,
classifyAgentFailure() for pattern-based and context-based classification,
getRecoveryAction() for recovery lookup, and isRetryable() for retry
eligibility checks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@snipcodeit
Owner Author

Testing Procedures

Quick Verification

# Module loads without errors
node -e "const m = require('./lib/agent-errors.cjs'); console.log('OK:', Object.keys(m.AGENT_FAILURE_TYPES).length, 'failure types');"

# All exports present
node -e "
const m = require('./lib/agent-errors.cjs');
const names = ['AGENT_FAILURE_TYPES', 'SEVERITY_LEVELS', 'AgentFailureError', 'classifyAgentFailure', 'getRecoveryAction', 'isRetryable', 'compareSeverity', 'getFailureByCode'];
names.forEach(name => console.log(name + ':', typeof m[name]));
"

Classification Tests

node -e "
const { classifyAgentFailure } = require('./lib/agent-errors.cjs');
const tests = [
  ['Agent timed out after 50 turns', 'timeout'],
  ['Invalid JSON in agent response', 'malformed-output'],
  ['Agent completed 2 of 5 tasks', 'partial-completion'],
  ['Artifact missing: PLAN.md not found', 'hallucination'],
  ['Permission denied writing to /src', 'permission-denied'],
  ['Random unknown error', null],
];
tests.forEach(([msg, expected]) => {
  const r = classifyAgentFailure({ message: msg });
  const actual = r ? r.type : null;
  console.log(actual === expected ? 'PASS' : 'FAIL', msg.substring(0, 40), '→', actual);
});
"
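For readers wondering how message matching might work, here is a self-contained sketch using ordered regex rules that classifies the six sample messages the same way the test expects. classifyByPattern() and every pattern are hypothetical; the real rules in lib/agent-errors.cjs are not shown in the PR and may differ.

```javascript
'use strict';

// Hypothetical patterns, tuned only to the sample messages above.
// Order matters: the first matching rule wins.
const PATTERNS = [
  ['timeout', /\btim(?:ed out|eout)\b/i],
  ['malformed-output', /\b(?:invalid|malformed)\s+(?:json|output|response)\b/i],
  ['partial-completion', /\bcompleted\s+\d+\s+of\s+\d+\b/i],
  ['hallucination', /\bartifacts?\s+missing\b|\bhallucinat/i],
  ['permission-denied', /\bpermission denied\b|\bEACCES\b|\bEPERM\b/i],
];

function classifyByPattern(error) {
  const message = String((error && error.message) || '');
  for (const [type, pattern] of PATTERNS) {
    if (pattern.test(message)) return { type };
  }
  // Unrecognized: return null so the caller can fall back to lib/retry.cjs.
  return null;
}

const samples = [
  'Agent timed out after 50 turns',
  'Invalid JSON in agent response',
  'Agent completed 2 of 5 tasks',
  'Artifact missing: PLAN.md not found',
  'Permission denied writing to /src',
  'Random unknown error',
];
for (const msg of samples) {
  const r = classifyByPattern({ message: msg });
  console.log(msg, '->', r ? r.type : null);
}
```

Returning null instead of a generic type keeps unrecognized errors on the existing retry.cjs path, which matches the fallback described in the test plan.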

Error Class Tests

node -e "
const { AgentFailureError } = require('./lib/agent-errors.cjs');
const { MgwError } = require('./lib/errors.cjs');
const err = new AgentFailureError('test', { failureType: 'timeout', agentType: 'gsd-executor' });
console.log('instanceof MgwError:', err instanceof MgwError);
console.log('code:', err.code);
console.log('severity:', err.getSeverity());
console.log('retryable:', err.isRetryable());
"
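A self-contained sketch of an AgentFailureError shaped like the one exercised above: a subclass that resolves its code, severity and retryability from the failure type. Because lib/errors.cjs is not shown, MgwError here is a hypothetical stand-in that only records a code, and throwing on an unknown failureType is a choice of this sketch rather than confirmed module behavior.

```javascript
'use strict';

// Hypothetical stand-in for MgwError from lib/errors.cjs.
class MgwError extends Error {
  constructor(message, { code } = {}) {
    super(message);
    this.name = this.constructor.name;
    this.code = code;
  }
}

// Codes, severities and retryability from the failure type summary.
const TYPES = {
  timeout: { code: 'AGENT_ERR_TIMEOUT', severity: 'high', retryable: true },
  'malformed-output': { code: 'AGENT_ERR_MALFORMED_OUTPUT', severity: 'high', retryable: true },
  'partial-completion': { code: 'AGENT_ERR_PARTIAL_COMPLETION', severity: 'medium', retryable: true },
  hallucination: { code: 'AGENT_ERR_HALLUCINATION', severity: 'critical', retryable: false },
  'permission-denied': { code: 'AGENT_ERR_PERMISSION_DENIED', severity: 'high', retryable: false },
};

class AgentFailureError extends MgwError {
  constructor(message, { failureType, agentType, artifacts = [] } = {}) {
    // Resolve the definition before super(); `this` is not touched here.
    const def = Object.prototype.hasOwnProperty.call(TYPES, failureType) ? TYPES[failureType] : null;
    if (!def) throw new TypeError(`Unknown failure type: ${failureType}`);
    super(message, { code: def.code });
    this.failureType = failureType;
    this.agentType = agentType;
    this.artifacts = artifacts;
  }

  getSeverity() {
    return TYPES[this.failureType].severity;
  }

  isRetryable() {
    return TYPES[this.failureType].retryable;
  }
}

const err = new AgentFailureError('test', { failureType: 'timeout', agentType: 'gsd-executor' });
console.log(err instanceof MgwError, err.code, err.getSeverity(), err.isRetryable());
// → true AGENT_ERR_TIMEOUT high true
```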


Labels

core Changes to core library

Development

Linked issue: Define agent failure taxonomy with structured error codes (#229)

1 participant