
@htekdev/agent-harness

Lightweight TypeScript library that wraps any LLM agent with guardrails, safety nets, observability, and human-in-the-loop checkpoints. Built around the CNCF's four pillars of platform control.

Why?

Building AI agents is the easy part. The hard engineering challenge is controlling them at scale — enforcing budgets, blocking dangerous output, retrying failures, and knowing when to escalate to a human.

agent-harness gives you that control layer in ~50 lines of config.

Quick Start

npm install @htekdev/agent-harness

import { AgentHarness } from '@htekdev/agent-harness';

const harness = new AgentHarness();

const result = await harness.run(
  async () => ({
    output: 'The capital of France is Paris.',
    confidence: 0.95,
  }),
  { input: 'What is the capital of France?' },
);

console.log(result.status);  // 'completed'
console.log(result.output);  // 'The capital of France is Paris.'
console.log(result.metrics); // { tokens: 0, cost: 0, durationMs: 1, retries: 0, ... }

Features

  • Multi-provider LLM support — OpenAI, Anthropic, GitHub Models, Copilot, or any custom provider
  • Context compaction — automatic conversation history management with sliding-window, summarize, and hybrid strategies
  • Guardrails — budget caps, timeouts, tool restrictions, output filtering via blocked patterns
  • Safety nets — retry with exponential backoff, fallback responses, transient-error classification
  • Human-in-the-loop — conditional escalation based on confidence scores or custom predicates
  • Enforcement hooks — pre/post policy validation, declarative compliance rules, audit logging
  • Observability — lifecycle hooks (beforeRun, afterRun, onError, onEscalate, onCompaction), structured metrics

Providers

All providers implement the LLMProvider interface:

interface LLMProvider {
  readonly name: string;
  chatCompletion(
    messages: ChatMessage[],
    options?: ChatCompletionOptions,
  ): Promise<ChatCompletion>;
}

| Provider | Factory Function | Config Interface | Auth |
|---|---|---|---|
| OpenAI | createOpenAIProvider(config) | OpenAIProviderConfig | apiKey or OPENAI_API_KEY env |
| Anthropic | createAnthropicProvider(config) | AnthropicProviderConfig | apiKey or ANTHROPIC_API_KEY env |
| GitHub Models | createGitHubModelsProvider(config) | GitHubModelsProviderConfig | token, apiKey, or GITHUB_TOKEN env |
| Copilot | createCopilotProvider(config?) | CopilotProviderConfig | Auto-resolved (see below) |

OpenAI

import { AgentHarness, createOpenAIProvider } from '@htekdev/agent-harness';

const provider = createOpenAIProvider({
  model: 'gpt-4o',
  apiKey: process.env.OPENAI_API_KEY,
});

const harness = new AgentHarness({
  provider,
  guardrails: {
    maxCostPerRun: 0.05,
    maxDurationMs: 15_000,
    blockedPatterns: [/DROP TABLE/i, /DELETE FROM/i],
    maxOutputTokens: 2000,
  },
  safetyNets: {
    maxRetries: 2,
    fallbackResponse: 'I was unable to process your request safely.',
  },
});

const result = await harness.run(
  async (ctx) => {
    const completion = await provider.chatCompletion([
      { role: 'system', content: 'You are a helpful code reviewer.' },
      { role: 'user', content: ctx.input.input },
    ]);
    return { output: completion.content, confidence: 0.9 };
  },
  { input: 'Review this SQL query: SELECT * FROM users WHERE active = true' },
);

Anthropic

import { createAnthropicProvider } from '@htekdev/agent-harness';

const provider = createAnthropicProvider({
  model: 'claude-sonnet-4-20250514',
  // apiKey defaults to ANTHROPIC_API_KEY env
});

GitHub Models

import { createGitHubModelsProvider } from '@htekdev/agent-harness';

const provider = createGitHubModelsProvider({
  model: 'gpt-4o',
  // token defaults to GITHUB_TOKEN env
  organization: 'my-org', // optional, for org-attributed usage
});

Copilot (Zero-Config)

The Copilot provider resolves credentials automatically via a 6-source chain:

  1. COPILOT_GITHUB_TOKEN env
  2. GH_TOKEN env
  3. Copilot CLI keychain (Windows)
  4. Copilot SDK config files (hosts.json / apps.json)
  5. gh auth token (GitHub CLI)
  6. GITHUB_TOKEN env

import { createCopilotProvider } from '@htekdev/agent-harness';

// Zero config — credentials auto-resolved
const provider = createCopilotProvider();

// Or explicit
const explicitProvider = createCopilotProvider({
  model: 'gpt-4o',
  token: 'ghp_...',
});

Custom Provider

Implement the LLMProvider interface to use any backend:

import type { LLMProvider } from '@htekdev/agent-harness';

const myProvider: LLMProvider = {
  name: 'my-provider',
  async chatCompletion(messages, options) {
    // Call your LLM backend
    return {
      content: '...',
      model: 'my-model',
      usage: { promptTokens: 0, completionTokens: 0, totalTokens: 0 },
      finishReason: 'stop',
    };
  },
};

The Four Pillars

The harness config maps directly to the CNCF's four pillars of platform control:

const harness = new AgentHarness({
  // PILLAR 1: Golden Paths — pre-approved configuration
  model: 'gpt-4o',

  // PILLAR 2: Guardrails — hard policy enforcement
  guardrails: {
    maxCostPerRun: 0.10,
    maxDurationMs: 30_000,
    allowedTools: ['search', 'calculator'],
    blockedPatterns: [/DROP\s+TABLE/i, /DELETE\s+FROM/i, /rm\s+-rf/i],
    maxOutputTokens: 2000,
  },

  // PILLAR 3: Safety Nets — error recovery
  safetyNets: {
    maxRetries: 3,
    retryDelayMs: 1000,
    retryBackoffMultiplier: 2,
    fallbackResponse: 'Agent could not complete this task safely. A human will review.',
    onError: (err, ctx) => {
      console.error(`[safety-net] Attempt ${ctx.attempt} failed: ${err.message}`);
    },
  },

  // PILLAR 4: Manual Review — human-in-the-loop
  review: {
    requireApproval: (result) => (result.confidence ?? 1) < 0.7,
    onEscalate: async (result) => {
      console.log('[review] Escalated:', result.output.substring(0, 100));
    },
    approvalTimeoutMs: 60_000,
  },
});

| Pillar | Config Section | What It Controls |
|---|---|---|
| Golden Paths | model, provider, maxTokens | Pre-approved models and defaults |
| Guardrails | guardrails | Budget caps, timeouts, tool restrictions, blocked patterns |
| Safety Nets | safetyNets | Retry logic, backoff, fallback responses |
| Manual Review | review | Confidence-based escalation, approval timeouts |

API Reference

new AgentHarness(config?)

Creates a harness instance. All config fields are optional.

interface HarnessConfig {
  provider?: LLMProvider;
  model?: string;
  maxTokens?: number;
  guardrails?: GuardrailConfig;
  safetyNets?: SafetyNetConfig;
  review?: ReviewConfig;
  enforcement?: EnforcementConfig;
  compaction?: CompactionConfig;
  hooks?: HookConfig;
}

harness.run(agentFn, input)

Executes an agent function through the full harness lifecycle:

Input → Pre-Enforcement → Pre-Run Guardrails → Context Compaction → Agent Execution (with timeout + retry) → Post-Run Guardrails → Post-Enforcement → Review Check → Output

const result = await harness.run<AgentOutput>(agentFn, input);

Parameters:

  • agentFn: (context: RunContext) => Promise<AgentOutput> — The agent function to wrap. Receives a RunContext with the input, attempt number, and accumulated metrics.
  • input: RunInput — The input containing the prompt string, optional message history, and metadata.

interface RunInput {
  input: string;
  messages?: ChatMessage[];
  metadata?: Record<string, unknown>;
}

Returns: Promise<HarnessResult>

interface HarnessResult {
  output: string;
  status: 'completed' | 'escalated' | 'rejected' | 'failed' | 'timeout';
  metrics: RunMetrics;
  violations?: string[];
  escalationReason?: string;
  error?: Error;
}
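
A typical caller branches on status. A minimal sketch (the handling logic is illustrative):

const result = await harness.run(agentFn, { input: 'Summarize the incident report.' });

switch (result.status) {
  case 'completed':
    console.log(result.output);
    break;
  case 'escalated':
    console.warn(`Needs human review: ${result.escalationReason}`);
    break;
  case 'rejected':
    console.warn(`Guardrail violations: ${result.violations?.join(', ')}`);
    break;
  case 'failed':
  case 'timeout':
    console.error(result.error?.message ?? 'run did not complete');
    break;
}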

RunMetrics

interface RunMetrics {
  tokens: number;         // Total tokens consumed across all attempts
  cost: number;           // Estimated total cost (USD)
  durationMs: number;     // Wall-clock duration
  retries: number;        // Number of retry attempts
  compactionEvents: number;
  tokensSaved: number;    // Tokens reclaimed by compaction
}

GuardrailConfig

interface GuardrailConfig {
  maxCostPerRun?: number;      // Max USD per run
  maxDurationMs?: number;      // Max wall-clock ms
  allowedTools?: string[];     // Tool allowlist
  blockedPatterns?: RegExp[];  // Output rejection patterns
  maxOutputTokens?: number;    // Max output tokens
}
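
When output matches a blocked pattern, the run is rejected rather than returned. A minimal sketch (the exact violation strings are implementation-defined):

const guarded = new AgentHarness({
  guardrails: { blockedPatterns: [/DROP\s+TABLE/i] },
});

const result = await guarded.run(
  async () => ({ output: 'DROP TABLE users;', confidence: 0.99 }),
  { input: 'Clean up old records' },
);

console.log(result.status);     // 'rejected'
console.log(result.violations); // details of the matched pattern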

SafetyNetConfig

interface SafetyNetConfig {
  maxRetries?: number;              // Default: 3
  retryDelayMs?: number;            // Default: 1000
  retryBackoffMultiplier?: number;  // Default: 2
  fallbackResponse?: string;
  onError?: (error: Error, context: RunContext) => void | Promise<void>;
}
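
With the defaults, a run that keeps failing waits 1,000 ms, 2,000 ms, and 4,000 ms between its three retry attempts (each delay is the previous one multiplied by the backoff multiplier); if every attempt fails and fallbackResponse is set, that string is used as the output instead of surfacing an error.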

ReviewConfig

interface ReviewConfig {
  requireApproval?: (result: AgentOutput) => boolean | Promise<boolean>;
  onEscalate?: (result: AgentOutput, context: RunContext) => void | Promise<void>;
  approvalTimeoutMs?: number;
}

EnforcementConfig

interface EnforcementConfig {
  preEnforce?: (input: RunInput) => EnforcementResult | Promise<EnforcementResult>;
  postEnforce?: (result: AgentOutput, context: RunContext) => EnforcementResult | Promise<EnforcementResult>;
  complianceRules?: ComplianceRule[];
}

interface ComplianceRule {
  name: string;
  description?: string;
  check: (output: string) => boolean;  // true = compliant
}
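
Since ComplianceRule is fully specified above, rules can be declared inline. A minimal sketch (the rule names and checks are illustrative):

const harness = new AgentHarness({
  enforcement: {
    complianceRules: [
      {
        name: 'no-email-addresses',
        description: 'Output must not leak email addresses',
        check: (output) => !/[\w.+-]+@[\w-]+\.\w+/.test(output), // true = compliant
      },
      {
        name: 'bounded-length',
        check: (output) => output.length <= 8_000,
      },
    ],
  },
});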

HookConfig

interface HookConfig {
  beforeRun?: (input: RunInput) => void | Promise<void>;
  afterRun?: (result: HarnessResult) => void | Promise<void>;
  onError?: (error: Error, context: RunContext) => void | Promise<void>;
  onEscalate?: (result: AgentOutput, context: RunContext) => void | Promise<void>;
  onCompaction?: (event: CompactionEvent) => void | Promise<void>;
}
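
Routing the lifecycle hooks to a logger is enough for basic observability. A minimal sketch (the log format is illustrative; CompactionEvent's fields aren't documented here, so the event is logged whole):

const harness = new AgentHarness({
  hooks: {
    beforeRun: (input) => console.log(`[harness] start: ${input.input.slice(0, 60)}`),
    afterRun: (result) =>
      console.log(`[harness] ${result.status} in ${result.metrics.durationMs}ms ($${result.metrics.cost.toFixed(4)})`),
    onError: (error, ctx) => console.error(`[harness] attempt ${ctx.attempt}: ${error.message}`),
    onCompaction: (event) => console.log('[harness] context compacted', event),
  },
});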

Context Compaction

When conversations grow long, the harness can automatically compact the message history to stay within the model's context budget.

interface CompactionConfig {
  strategy?: CompactionStrategy;
  maxContextTokens?: number;       // Default: 128,000
  compactionThreshold?: number;    // Default: 0.75 (fraction of max)
  preserveSystemPrompt?: boolean;
  preserveLastN?: number;          // Default: 4
  summaryModel?: string;           // Model for summary-based compaction
}
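
A sketch of a compaction setup for a smaller context window (this assumes the strategy names in the table below are the string values of CompactionStrategy):

const harness = new AgentHarness({
  compaction: {
    strategy: 'hybrid',
    maxContextTokens: 32_000,     // budget for a smaller model
    compactionThreshold: 0.8,     // compact once 80% of the budget is used
    preserveSystemPrompt: true,
    preserveLastN: 6,
    summaryModel: 'gpt-4o-mini',  // cheaper model for summarize-based compaction
  },
});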

Strategies

| Strategy | Description | Cost |
|---|---|---|
| none | No compaction; messages are never trimmed | Free |
| sliding-window | Keeps system prompt + last N messages, drops the middle | Free |
| summarize | Uses an LLM to summarize dropped messages into a single summary message | 1 LLM call |
| hybrid | Auto-selects: sliding-window when slightly over budget, summarize when >90% of budget | Adaptive |

The hybrid strategy (default) automatically picks the cheapest effective approach: sliding-window when the context is slightly over the threshold, and summarization when it's significantly over (>90% of the token budget).
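
With the defaults above (maxContextTokens: 128,000, compactionThreshold: 0.75), compaction first triggers once the history exceeds 96,000 tokens; beyond 115,200 tokens (90% of the budget), hybrid switches from sliding-window to summarization.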

Exported Utilities

Beyond the core AgentHarness class, the library exports granular building blocks:

// Guardrails
export { checkPreRunGuardrails, checkPostRunGuardrails, createTimeoutPromise, GuardrailViolation };

// Safety nets
export { executeWithRetry, applyFallback, isRetryableError };

// Review
export { checkReviewRequired, escalateForReview, ReviewTimeoutError };

// Enforcement
export { runPreEnforcement, runPostEnforcement, runComplianceRules, EnforcementError };

// Compaction
export { compactMessages, shouldCompact, countTokens, countMessageTokens };

// Metrics
export { MetricsCollector, estimateCost, MODEL_PRICING };

// Providers
export { createOpenAIProvider, createCopilotProvider, resolveCredential, ProviderError };
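
These can also be used without a harness. A minimal sketch, assuming countTokens takes a string and isRetryableError takes an Error (neither signature is documented above):

import { countTokens, isRetryableError } from '@htekdev/agent-harness';

console.log(countTokens('Explain exponential backoff.')); // rough token estimate for budgeting

try {
  // ... call a provider directly ...
} catch (err) {
  if (err instanceof Error && isRetryableError(err)) {
    // transient failure: a retry with backoff is reasonable
  }
}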

Testing

The project has four test suites:

# All tests
npm test

# Individual suites
npm run test:unit          # Unit tests
npm run test:integration   # Integration tests
npm run test:blackbox      # Black-box tests
npm run test:evals         # Evaluation tests

# Watch mode
npm run test:watch

# Coverage
npm run test:coverage

Requirements

  • Node.js ≥ 18.0.0
  • TypeScript ≥ 5.7

License

MIT — see LICENSE.
