
@htekdev/agent-harness

Lightweight TypeScript library that wraps any LLM agent with guardrails, safety nets, observability, and human-in-the-loop checkpoints. Built around the CNCF's four pillars of platform control.

Why?

Building AI agents is the easy part. The hard engineering challenge is controlling them at scale — enforcing budgets, blocking dangerous output, retrying failures, and knowing when to escalate to a human.

agent-harness gives you that control layer in ~50 lines of config.

Quick Start

npm install @htekdev/agent-harness

import { AgentHarness } from '@htekdev/agent-harness';

const harness = new AgentHarness();

const result = await harness.run(
  async () => ({
    output: 'The capital of France is Paris.',
    confidence: 0.95,
  }),
  { input: 'What is the capital of France?' },
);

console.log(result.status);  // 'completed'
console.log(result.output);  // 'The capital of France is Paris.'
console.log(result.metrics); // { tokens: 0, cost: 0, durationMs: 1, retries: 0, ... }

Features

  • Multi-provider LLM support — OpenAI, Anthropic, GitHub Models, Copilot, or any custom provider
  • Context compaction — automatic conversation history management with sliding-window, summarize, and hybrid strategies
  • Guardrails — budget caps, timeouts, tool restrictions, output filtering via blocked patterns
  • Safety nets — retry with exponential backoff, fallback responses, transient-error classification
  • Human-in-the-loop — conditional escalation based on confidence scores or custom predicates
  • Enforcement hooks — pre/post policy validation, declarative compliance rules, audit logging
  • Observability — lifecycle hooks (beforeRun, afterRun, onError, onEscalate, onCompaction), structured metrics

Providers

All providers implement the LLMProvider interface:

interface LLMProvider {
  readonly name: string;
  chatCompletion(
    messages: ChatMessage[],
    options?: ChatCompletionOptions,
  ): Promise<ChatCompletion>;
}

| Provider | Factory Function | Config Interface | Auth |
|---|---|---|---|
| OpenAI | createOpenAIProvider(config) | OpenAIProviderConfig | apiKey or OPENAI_API_KEY env |
| Anthropic | createAnthropicProvider(config) | AnthropicProviderConfig | apiKey or ANTHROPIC_API_KEY env |
| GitHub Models | createGitHubModelsProvider(config) | GitHubModelsProviderConfig | token, apiKey, or GITHUB_TOKEN env |
| Copilot | createCopilotProvider(config?) | CopilotProviderConfig | Auto-resolved (see below) |

OpenAI

import { AgentHarness, createOpenAIProvider } from '@htekdev/agent-harness';

const provider = createOpenAIProvider({
  model: 'gpt-4o',
  apiKey: process.env.OPENAI_API_KEY,
});

const harness = new AgentHarness({
  provider,
  guardrails: {
    maxCostPerRun: 0.05,
    maxDurationMs: 15_000,
    blockedPatterns: [/DROP TABLE/i, /DELETE FROM/i],
    maxOutputTokens: 2000,
  },
  safetyNets: {
    maxRetries: 2,
    fallbackResponse: 'I was unable to process your request safely.',
  },
});

const result = await harness.run(
  async (ctx) => {
    const completion = await provider.chatCompletion([
      { role: 'system', content: 'You are a helpful code reviewer.' },
      { role: 'user', content: ctx.input.input },
    ]);
    return { output: completion.content, confidence: 0.9 };
  },
  { input: 'Review this SQL query: SELECT * FROM users WHERE active = true' },
);

Anthropic

import { createAnthropicProvider } from '@htekdev/agent-harness';

const provider = createAnthropicProvider({
  model: 'claude-sonnet-4-20250514',
  // apiKey defaults to ANTHROPIC_API_KEY env
});

GitHub Models

import { createGitHubModelsProvider } from '@htekdev/agent-harness';

const provider = createGitHubModelsProvider({
  model: 'gpt-4o',
  // token defaults to GITHUB_TOKEN env
  organization: 'my-org', // optional, for org-attributed usage
});

Copilot (Zero-Config)

The Copilot provider resolves credentials automatically via a 6-source chain:

  1. COPILOT_GITHUB_TOKEN env
  2. GH_TOKEN env
  3. Copilot CLI keychain (Windows)
  4. Copilot SDK config files (hosts.json / apps.json)
  5. gh auth token (GitHub CLI)
  6. GITHUB_TOKEN env

import { createCopilotProvider } from '@htekdev/agent-harness';

// Zero config — credentials auto-resolved
const provider = createCopilotProvider();

// Or explicit
const explicitProvider = createCopilotProvider({
  model: 'gpt-4o',
  token: 'ghp_...',
});

Custom Provider

Implement the LLMProvider interface to use any backend:

import type { LLMProvider } from '@htekdev/agent-harness';

const myProvider: LLMProvider = {
  name: 'my-provider',
  async chatCompletion(messages, options) {
    // Call your LLM backend
    return {
      content: '...',
      model: 'my-model',
      usage: { promptTokens: 0, completionTokens: 0, totalTokens: 0 },
      finishReason: 'stop',
    };
  },
};

The Four Pillars

The harness config maps directly to the CNCF's four pillars of platform control:

const harness = new AgentHarness({
  // PILLAR 1: Golden Paths — pre-approved configuration
  model: 'gpt-4o',

  // PILLAR 2: Guardrails — hard policy enforcement
  guardrails: {
    maxCostPerRun: 0.10,
    maxDurationMs: 30_000,
    allowedTools: ['search', 'calculator'],
    blockedPatterns: [/DROP\s+TABLE/i, /DELETE\s+FROM/i, /rm\s+-rf/i],
    maxOutputTokens: 2000,
  },

  // PILLAR 3: Safety Nets — error recovery
  safetyNets: {
    maxRetries: 3,
    retryDelayMs: 1000,
    retryBackoffMultiplier: 2,
    fallbackResponse: 'Agent could not complete this task safely. A human will review.',
    onError: (err, ctx) => {
      console.error(`[safety-net] Attempt ${ctx.attempt} failed: ${err.message}`);
    },
  },

  // PILLAR 4: Manual Review — human-in-the-loop
  review: {
    requireApproval: (result) => (result.confidence ?? 1) < 0.7,
    onEscalate: async (result) => {
      console.log('[review] Escalated:', result.output.substring(0, 100));
    },
    approvalTimeoutMs: 60_000,
  },
});

| Pillar | Config Section | What It Controls |
|---|---|---|
| Golden Paths | model, provider, maxTokens | Pre-approved models and defaults |
| Guardrails | guardrails | Budget caps, timeouts, tool restrictions, blocked patterns |
| Safety Nets | safetyNets | Retry logic, backoff, fallback responses |
| Manual Review | review | Confidence-based escalation, approval timeouts |

API Reference

new AgentHarness(config?)

Creates a harness instance. All config fields are optional.

interface HarnessConfig {
  provider?: LLMProvider;
  model?: string;
  maxTokens?: number;
  guardrails?: GuardrailConfig;
  safetyNets?: SafetyNetConfig;
  review?: ReviewConfig;
  enforcement?: EnforcementConfig;
  compaction?: CompactionConfig;
  hooks?: HookConfig;
}

harness.run(agentFn, input)

Executes an agent function through the full harness lifecycle:

Input → Pre-Enforcement → Pre-Run Guardrails → Context Compaction → Agent Execution (with timeout + retry) → Post-Run Guardrails → Post-Enforcement → Review Check → Output

const result = await harness.run<AgentOutput>(agentFn, input);

Parameters:

  • agentFn: (context: RunContext) => Promise<AgentOutput> — The agent function to wrap. Receives a RunContext with the input, attempt number, and accumulated metrics.
  • input: RunInput — The input containing the prompt string, optional message history, and metadata.

interface RunInput {
  input: string;
  messages?: ChatMessage[];
  metadata?: Record<string, unknown>;
}

Returns: Promise<HarnessResult>

interface HarnessResult {
  output: string;
  status: 'completed' | 'escalated' | 'rejected' | 'failed' | 'timeout';
  metrics: RunMetrics;
  violations?: string[];
  escalationReason?: string;
  error?: Error;
}
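
A typical caller branches on status. A minimal sketch (the handling logic is illustrative):

const result = await harness.run(agentFn, { input: 'Summarize the incident report.' });

switch (result.status) {
  case 'completed':
    console.log(result.output);
    break;
  case 'escalated':
    console.warn(`Needs human review: ${result.escalationReason}`);
    break;
  case 'rejected':
    console.warn(`Guardrail violations: ${result.violations?.join(', ')}`);
    break;
  case 'failed':
  case 'timeout':
    console.error(result.error?.message ?? 'run did not complete');
    break;
}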

RunMetrics

interface RunMetrics {
  tokens: number;         // Total tokens consumed across all attempts
  cost: number;           // Estimated total cost (USD)
  durationMs: number;     // Wall-clock duration
  retries: number;        // Number of retry attempts
  compactionEvents: number;
  tokensSaved: number;    // Tokens reclaimed by compaction
}

GuardrailConfig

interface GuardrailConfig {
  maxCostPerRun?: number;      // Max USD per run
  maxDurationMs?: number;      // Max wall-clock ms
  allowedTools?: string[];     // Tool allowlist
  blockedPatterns?: RegExp[];  // Output rejection patterns
  maxOutputTokens?: number;    // Max output tokens
}
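
When output matches a blocked pattern, the run is rejected rather than returned. A minimal sketch (the exact violation strings are implementation-defined):

const guarded = new AgentHarness({
  guardrails: { blockedPatterns: [/DROP\s+TABLE/i] },
});

const result = await guarded.run(
  async () => ({ output: 'DROP TABLE users;', confidence: 0.99 }),
  { input: 'Clean up old records' },
);

console.log(result.status);     // 'rejected'
console.log(result.violations); // details of the matched pattern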

SafetyNetConfig

interface SafetyNetConfig {
  maxRetries?: number;              // Default: 3
  retryDelayMs?: number;            // Default: 1000
  retryBackoffMultiplier?: number;  // Default: 2
  fallbackResponse?: string;
  onError?: (error: Error, context: RunContext) => void | Promise<void>;
}
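
With the defaults, a run that keeps failing waits 1,000 ms, 2,000 ms, and 4,000 ms between its three retry attempts (each delay is the previous one multiplied by the backoff multiplier); if every attempt fails and fallbackResponse is set, that string is used as the output instead of surfacing an error.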

ReviewConfig

interface ReviewConfig {
  requireApproval?: (result: AgentOutput) => boolean | Promise<boolean>;
  onEscalate?: (result: AgentOutput, context: RunContext) => void | Promise<void>;
  approvalTimeoutMs?: number;
}

EnforcementConfig

interface EnforcementConfig {
  preEnforce?: (input: RunInput) => EnforcementResult | Promise<EnforcementResult>;
  postEnforce?: (result: AgentOutput, context: RunContext) => EnforcementResult | Promise<EnforcementResult>;
  complianceRules?: ComplianceRule[];
}

interface ComplianceRule {
  name: string;
  description?: string;
  check: (output: string) => boolean;  // true = compliant
}
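
Since ComplianceRule is fully specified above, rules can be declared inline. A minimal sketch (the rule names and checks are illustrative):

const harness = new AgentHarness({
  enforcement: {
    complianceRules: [
      {
        name: 'no-email-addresses',
        description: 'Output must not leak email addresses',
        check: (output) => !/[\w.+-]+@[\w-]+\.\w+/.test(output), // true = compliant
      },
      {
        name: 'bounded-length',
        check: (output) => output.length <= 8_000,
      },
    ],
  },
});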

HookConfig

interface HookConfig {
  beforeRun?: (input: RunInput) => void | Promise<void>;
  afterRun?: (result: HarnessResult) => void | Promise<void>;
  onError?: (error: Error, context: RunContext) => void | Promise<void>;
  onEscalate?: (result: AgentOutput, context: RunContext) => void | Promise<void>;
  onCompaction?: (event: CompactionEvent) => void | Promise<void>;
}
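
Routing the lifecycle hooks to a logger is enough for basic observability. A minimal sketch (the log format is illustrative; CompactionEvent's fields aren't documented here, so the event is logged whole):

const harness = new AgentHarness({
  hooks: {
    beforeRun: (input) => console.log(`[harness] start: ${input.input.slice(0, 60)}`),
    afterRun: (result) =>
      console.log(`[harness] ${result.status} in ${result.metrics.durationMs}ms ($${result.metrics.cost.toFixed(4)})`),
    onError: (error, ctx) => console.error(`[harness] attempt ${ctx.attempt}: ${error.message}`),
    onCompaction: (event) => console.log('[harness] context compacted', event),
  },
});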

Context Compaction

When conversations grow long, the harness can automatically compact the message history to stay within the model's context budget.

interface CompactionConfig {
  strategy?: CompactionStrategy;
  maxContextTokens?: number;       // Default: 128,000
  compactionThreshold?: number;    // Default: 0.75 (fraction of max)
  preserveSystemPrompt?: boolean;
  preserveLastN?: number;          // Default: 4
  summaryModel?: string;           // Model for summary-based compaction
}
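
A sketch of a compaction setup for a smaller context window (this assumes the strategy names in the table below are the string values of CompactionStrategy):

const harness = new AgentHarness({
  compaction: {
    strategy: 'hybrid',
    maxContextTokens: 32_000,     // budget for a smaller model
    compactionThreshold: 0.8,     // compact once 80% of the budget is used
    preserveSystemPrompt: true,
    preserveLastN: 6,
    summaryModel: 'gpt-4o-mini',  // cheaper model for summarize-based compaction
  },
});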

Strategies

| Strategy | Description | Cost |
|---|---|---|
| none | No compaction; messages are never trimmed | Free |
| sliding-window | Keeps system prompt + last N messages, drops the middle | Free |
| summarize | Uses an LLM to summarize dropped messages into a single summary message | 1 LLM call |
| hybrid | Auto-selects: sliding-window when slightly over budget, summarize when >90% of budget | Adaptive |

The hybrid strategy (default) automatically picks the cheapest effective approach: sliding-window when the context is slightly over the threshold, and summarization when it's significantly over (>90% of the token budget).
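
With the defaults above (maxContextTokens: 128,000, compactionThreshold: 0.75), compaction first triggers once the history exceeds 96,000 tokens; beyond 115,200 tokens (90% of the budget), hybrid switches from sliding-window to summarization.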

Exported Utilities

Beyond the core AgentHarness class, the library exports granular building blocks:

// Guardrails
export { checkPreRunGuardrails, checkPostRunGuardrails, createTimeoutPromise, GuardrailViolation };

// Safety nets
export { executeWithRetry, applyFallback, isRetryableError };

// Review
export { checkReviewRequired, escalateForReview, ReviewTimeoutError };

// Enforcement
export { runPreEnforcement, runPostEnforcement, runComplianceRules, EnforcementError };

// Compaction
export { compactMessages, shouldCompact, countTokens, countMessageTokens };

// Metrics
export { MetricsCollector, estimateCost, MODEL_PRICING };

// Providers
export { createOpenAIProvider, createCopilotProvider, resolveCredential, ProviderError };
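
These can also be used without a harness. A minimal sketch, assuming countTokens takes a string and isRetryableError takes an Error (neither signature is documented above):

import { countTokens, isRetryableError } from '@htekdev/agent-harness';

console.log(countTokens('Explain exponential backoff.')); // rough token estimate for budgeting

try {
  // ... call a provider directly ...
} catch (err) {
  if (err instanceof Error && isRetryableError(err)) {
    // transient failure: a retry with backoff is reasonable
  }
}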

Testing

The project has four test suites:

# All tests
npm test

# Individual suites
npm run test:unit          # Unit tests
npm run test:integration   # Integration tests
npm run test:blackbox      # Black-box tests
npm run test:evals         # Evaluation tests

# Watch mode
npm run test:watch

# Coverage
npm run test:coverage

Requirements

  • Node.js ≥ 18.0.0
  • TypeScript ≥ 5.7

License

MIT — see LICENSE.
