# agent-harness

Lightweight TypeScript library that wraps any LLM agent with guardrails, safety nets, observability, and human-in-the-loop checkpoints. Built around the CNCF's four pillars of platform control.
Building AI agents is the easy part. The hard engineering challenge is controlling them at scale — enforcing budgets, blocking dangerous output, retrying failures, and knowing when to escalate to a human.
agent-harness gives you that control layer in ~50 lines of config.
## Installation

```bash
npm install @htekdev/agent-harness
```

## Quick Start

```ts
import { AgentHarness } from '@htekdev/agent-harness';

const harness = new AgentHarness();

const result = await harness.run(
  async () => ({
    output: 'The capital of France is Paris.',
    confidence: 0.95,
  }),
  { input: 'What is the capital of France?' },
);

console.log(result.status);  // 'completed'
console.log(result.output);  // 'The capital of France is Paris.'
console.log(result.metrics); // { tokens: 0, cost: 0, durationMs: 1, retries: 0, ... }
```

## Features

- Multi-provider LLM support — OpenAI, Anthropic, GitHub Models, Copilot, or any custom provider
- Context compaction — automatic conversation history management with sliding-window, summarize, and hybrid strategies
- Guardrails — budget caps, timeouts, tool restrictions, output filtering via blocked patterns
- Safety nets — retry with exponential backoff, fallback responses, transient-error classification
- Human-in-the-loop — conditional escalation based on confidence scores or custom predicates
- Enforcement hooks — pre/post policy validation, declarative compliance rules, audit logging
- Observability — lifecycle hooks (`beforeRun`, `afterRun`, `onError`, `onEscalate`, `onCompaction`), structured metrics

## Providers

All providers implement the `LLMProvider` interface:

```ts
interface LLMProvider {
  readonly name: string;
  chatCompletion(
    messages: ChatMessage[],
    options?: ChatCompletionOptions,
  ): Promise<ChatCompletion>;
}
```

| Provider | Factory Function | Config Interface | Auth |
|---|---|---|---|
| OpenAI | `createOpenAIProvider(config)` | `OpenAIProviderConfig` | `apiKey` or `OPENAI_API_KEY` env |
| Anthropic | `createAnthropicProvider(config)` | `AnthropicProviderConfig` | `apiKey` or `ANTHROPIC_API_KEY` env |
| GitHub Models | `createGitHubModelsProvider(config)` | `GitHubModelsProviderConfig` | `token`, `apiKey`, or `GITHUB_TOKEN` env |
| Copilot | `createCopilotProvider(config?)` | `CopilotProviderConfig` | Auto-resolved (see below) |

### OpenAI

```ts
import { AgentHarness, createOpenAIProvider } from '@htekdev/agent-harness';

const provider = createOpenAIProvider({
  model: 'gpt-4o',
  apiKey: process.env.OPENAI_API_KEY,
});

const harness = new AgentHarness({
  provider,
  guardrails: {
    maxCostPerRun: 0.05,
    maxDurationMs: 15_000,
    blockedPatterns: [/DROP TABLE/i, /DELETE FROM/i],
    maxOutputTokens: 2000,
  },
  safetyNets: {
    maxRetries: 2,
    fallbackResponse: 'I was unable to process your request safely.',
  },
});

const result = await harness.run(
  async (ctx) => {
    const completion = await provider.chatCompletion([
      { role: 'system', content: 'You are a helpful code reviewer.' },
      { role: 'user', content: ctx.input.input },
    ]);
    return { output: completion.content, confidence: 0.9 };
  },
  { input: 'Review this SQL query: SELECT * FROM users WHERE active = true' },
);
```

### Anthropic

```ts
import { createAnthropicProvider } from '@htekdev/agent-harness';

const provider = createAnthropicProvider({
  model: 'claude-sonnet-4-20250514',
  // apiKey defaults to ANTHROPIC_API_KEY env
});
```

### GitHub Models

```ts
import { createGitHubModelsProvider } from '@htekdev/agent-harness';

const provider = createGitHubModelsProvider({
  model: 'gpt-4o',
  // token defaults to GITHUB_TOKEN env
  organization: 'my-org', // optional, for org-attributed usage
});
```

### Copilot

The Copilot provider resolves credentials automatically via a 6-source chain:
1. `COPILOT_GITHUB_TOKEN` env
2. `GH_TOKEN` env
3. Copilot CLI keychain (Windows)
4. Copilot SDK config files (`hosts.json` / `apps.json`)
5. `gh auth token` (GitHub CLI)
6. `GITHUB_TOKEN` env

```ts
import { createCopilotProvider } from '@htekdev/agent-harness';

// Zero config — credentials auto-resolved
const provider = createCopilotProvider();

// Or fully explicit
const explicitProvider = createCopilotProvider({
  model: 'gpt-4o',
  token: 'ghp_...',
});
```

### Custom Providers

Implement the `LLMProvider` interface to use any backend:

```ts
import type { LLMProvider } from '@htekdev/agent-harness';

const myProvider: LLMProvider = {
  name: 'my-provider',
  async chatCompletion(messages, options) {
    // Call your LLM backend
    return {
      content: '...',
      model: 'my-model',
      usage: { promptTokens: 0, completionTokens: 0, totalTokens: 0 },
      finishReason: 'stop',
    };
  },
};
```
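
A custom provider then plugs into the harness like any built-in one. A minimal sketch, where `'my-model'` is an illustrative model name:

```ts
const harness = new AgentHarness({
  provider: myProvider, // any object satisfying LLMProvider
  model: 'my-model',
});
```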

## The Four Pillars

The harness config maps directly to the CNCF's four pillars of platform control:

```ts
const harness = new AgentHarness({
  // PILLAR 1: Golden Paths — pre-approved configuration
  model: 'gpt-4o',

  // PILLAR 2: Guardrails — hard policy enforcement
  guardrails: {
    maxCostPerRun: 0.10,
    maxDurationMs: 30_000,
    allowedTools: ['search', 'calculator'],
    blockedPatterns: [/DROP\s+TABLE/i, /DELETE\s+FROM/i, /rm\s+-rf/i],
    maxOutputTokens: 2000,
  },

  // PILLAR 3: Safety Nets — error recovery
  safetyNets: {
    maxRetries: 3,
    retryDelayMs: 1000,
    retryBackoffMultiplier: 2,
    fallbackResponse: 'Agent could not complete this task safely. A human will review.',
    onError: (err, ctx) => {
      console.error(`[safety-net] Attempt ${ctx.attempt} failed: ${err.message}`);
    },
  },

  // PILLAR 4: Manual Review — human-in-the-loop
  review: {
    requireApproval: (result) => (result.confidence ?? 1) < 0.7,
    onEscalate: async (result) => {
      console.log('[review] Escalated:', result.output.substring(0, 100));
    },
    approvalTimeoutMs: 60_000,
  },
});
```

| Pillar | Config Section | What It Controls |
|---|---|---|
| Golden Paths | `model`, `provider`, `maxTokens` | Pre-approved models and defaults |
| Guardrails | `guardrails` | Budget caps, timeouts, tool restrictions, blocked patterns |
| Safety Nets | `safetyNets` | Retry logic, backoff, fallback responses |
| Manual Review | `review` | Confidence-based escalation, approval timeouts |

## API

### `new AgentHarness(config?)`

Creates a harness instance. All config fields are optional.

```ts
interface HarnessConfig {
  provider?: LLMProvider;
  model?: string;
  maxTokens?: number;
  guardrails?: GuardrailConfig;
  safetyNets?: SafetyNetConfig;
  review?: ReviewConfig;
  enforcement?: EnforcementConfig;
  compaction?: CompactionConfig;
  hooks?: HookConfig;
}
```

### `harness.run(agentFn, input)`

Executes an agent function through the full harness lifecycle:

```
Input → Pre-Enforcement → Pre-Run Guardrails → Context Compaction → Agent Execution (with timeout + retry) → Post-Run Guardrails → Post-Enforcement → Review Check → Output
```

```ts
const result = await harness.run<AgentOutput>(agentFn, input);
```

Parameters:

- `agentFn: (context: RunContext) => Promise<AgentOutput>` — The agent function to wrap. Receives a `RunContext` with the input, attempt number, and accumulated metrics.
- `input: RunInput` — The input containing the prompt string, optional message history, and metadata.

```ts
interface RunInput {
  input: string;
  messages?: ChatMessage[];
  metadata?: Record<string, unknown>;
}
```
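
For example, a call that carries prior conversation history and request metadata (all values illustrative):

```ts
const result = await harness.run(agentFn, {
  input: 'Summarize the discussion so far.',
  messages: [
    { role: 'user', content: 'What does the harness do?' },
    { role: 'assistant', content: 'It wraps agents with guardrails and safety nets.' },
  ],
  metadata: { requestId: 'req-123' }, // hypothetical tracking field
});
```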

Returns: `Promise<HarnessResult>`

```ts
interface HarnessResult {
  output: string;
  status: 'completed' | 'escalated' | 'rejected' | 'failed' | 'timeout';
  metrics: RunMetrics;
  violations?: string[];
  escalationReason?: string;
  error?: Error;
}
```
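
Callers typically branch on `status`; a sketch using only the fields above:

```ts
switch (result.status) {
  case 'completed':
    console.log(result.output);
    break;
  case 'rejected':
    console.warn('Guardrail violations:', result.violations);
    break;
  case 'escalated':
    console.warn('Escalated for review:', result.escalationReason);
    break;
  case 'failed':
  case 'timeout':
    console.error('Run failed:', result.error?.message);
    break;
}
```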

#### RunMetrics

```ts
interface RunMetrics {
  tokens: number;           // Total tokens consumed across all attempts
  cost: number;             // Estimated total cost (USD)
  durationMs: number;       // Wall-clock duration
  retries: number;          // Number of retry attempts
  compactionEvents: number;
  tokensSaved: number;      // Tokens reclaimed by compaction
}
```

#### GuardrailConfig

```ts
interface GuardrailConfig {
  maxCostPerRun?: number;     // Max USD per run
  maxDurationMs?: number;     // Max wall-clock ms
  allowedTools?: string[];    // Tool allowlist
  blockedPatterns?: RegExp[]; // Output rejection patterns
  maxOutputTokens?: number;   // Max output tokens
}
```

#### SafetyNetConfig

```ts
interface SafetyNetConfig {
  maxRetries?: number;             // Default: 3
  retryDelayMs?: number;           // Default: 1000
  retryBackoffMultiplier?: number; // Default: 2
  fallbackResponse?: string;
  onError?: (error: Error, context: RunContext) => void | Promise<void>;
}
```

#### ReviewConfig

```ts
interface ReviewConfig {
  requireApproval?: (result: AgentOutput) => boolean | Promise<boolean>;
  onEscalate?: (result: AgentOutput, context: RunContext) => void | Promise<void>;
  approvalTimeoutMs?: number;
}
```

#### EnforcementConfig

```ts
interface EnforcementConfig {
  preEnforce?: (input: RunInput) => EnforcementResult | Promise<EnforcementResult>;
  postEnforce?: (result: AgentOutput, context: RunContext) => EnforcementResult | Promise<EnforcementResult>;
  complianceRules?: ComplianceRule[];
}

interface ComplianceRule {
  name: string;
  description?: string;
  check: (output: string) => boolean; // true = compliant
}
```
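
A minimal enforcement sketch. The exact shape of `EnforcementResult` isn't shown above, so this assumes it carries an `allowed` flag and an optional `reason`:

```ts
const harness = new AgentHarness({
  enforcement: {
    // Block prompts that ask for credentials before the agent runs.
    // NOTE: { allowed, reason } is an assumed EnforcementResult shape.
    preEnforce: (input) =>
      /password|api[\s_-]?key/i.test(input.input)
        ? { allowed: false, reason: 'credential request' }
        : { allowed: true },
    complianceRules: [
      {
        name: 'no-email-pii',
        description: 'Output must not contain email addresses',
        check: (output) => !/[\w.+-]+@[\w-]+\.[\w.-]+/.test(output), // true = compliant
      },
    ],
  },
});
```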

#### HookConfig

```ts
interface HookConfig {
  beforeRun?: (input: RunInput) => void | Promise<void>;
  afterRun?: (result: HarnessResult) => void | Promise<void>;
  onError?: (error: Error, context: RunContext) => void | Promise<void>;
  onEscalate?: (result: AgentOutput, context: RunContext) => void | Promise<void>;
  onCompaction?: (event: CompactionEvent) => void | Promise<void>;
}
```
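
For example, a basic logging setup wired through the hooks (signatures as above; `CompactionEvent` fields aren't documented here, so that hook only notes the event):

```ts
const harness = new AgentHarness({
  hooks: {
    beforeRun: (input) => console.log('[harness] start:', input.input.slice(0, 80)),
    afterRun: (result) =>
      console.log(
        `[harness] ${result.status} in ${result.metrics.durationMs}ms`,
        `(${result.metrics.tokens} tokens, ~$${result.metrics.cost.toFixed(4)})`,
      ),
    onError: (error, ctx) =>
      console.error(`[harness] attempt ${ctx.attempt} failed:`, error.message),
    onCompaction: () => console.log('[harness] conversation history compacted'),
  },
});
```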

## Context Compaction

When conversations grow long, the harness can automatically compact the message history to stay within the model's context budget.

```ts
interface CompactionConfig {
  strategy?: CompactionStrategy;
  maxContextTokens?: number;    // Default: 128,000
  compactionThreshold?: number; // Default: 0.75 (fraction of max)
  preserveSystemPrompt?: boolean;
  preserveLastN?: number;       // Default: 4
  summaryModel?: string;        // Model for summary-based compaction
}
```

| Strategy | Description | Cost |
|---|---|---|
| `none` | No compaction — messages are never trimmed | Free |
| `sliding-window` | Keeps system prompt + last N messages, drops the middle | Free |
| `summarize` | Uses an LLM to summarize dropped messages into a single summary message | 1 LLM call |
| `hybrid` | Auto-selects: sliding-window when slightly over budget, summarize when >90% of budget | Adaptive |
The hybrid strategy (default) automatically picks the cheapest effective approach: sliding-window when the context is slightly over the threshold, and summarization when it's significantly over (>90% of the token budget).
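
Spelled out as config, this might look like the sketch below. Numeric values are the documented defaults; the strategy strings are assumed to match the table above, and `'gpt-4o-mini'` is an illustrative choice of summary model:

```ts
const harness = new AgentHarness({
  compaction: {
    strategy: 'hybrid',          // default: adaptive sliding-window/summarize
    maxContextTokens: 128_000,   // default token budget
    compactionThreshold: 0.75,   // compact once 75% of the budget is used
    preserveSystemPrompt: true,
    preserveLastN: 4,            // always keep the last 4 messages
    summaryModel: 'gpt-4o-mini', // illustrative: a cheaper model for summaries
  },
});
```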

## Modular Exports

Beyond the core `AgentHarness` class, the library exports granular building blocks:

```ts
// Guardrails
export { checkPreRunGuardrails, checkPostRunGuardrails, createTimeoutPromise, GuardrailViolation };
// Safety nets
export { executeWithRetry, applyFallback, isRetryableError };
// Review
export { checkReviewRequired, escalateForReview, ReviewTimeoutError };
// Enforcement
export { runPreEnforcement, runPostEnforcement, runComplianceRules, EnforcementError };
// Compaction
export { compactMessages, shouldCompact, countTokens, countMessageTokens };
// Metrics
export { MetricsCollector, estimateCost, MODEL_PRICING };
// Providers
export { createOpenAIProvider, createCopilotProvider, resolveCredential, ProviderError };
```

## Testing

The project has four test suites:

```bash
# All tests
npm test
# Individual suites
npm run test:unit # Unit tests
npm run test:integration # Integration tests
npm run test:blackbox # Black-box tests
npm run test:evals # Evaluation tests
# Watch mode
npm run test:watch
# Coverage
npm run test:coverage
```

## Requirements

- Node.js ≥ 18.0.0
- TypeScript ≥ 5.7

## License

MIT — see LICENSE.