A minimal agent SDK for TypeScript. Three primitives, zero opinions on your LLM provider.
npm install @elfenlabs/cog
import { createContext, createTool, createOpenAIProvider, runAgent } from '@elfenlabs/cog'
// Define a tool
const getWeather = createTool({
id: 'get_weather',
description: 'Get the current weather for a city',
schema: {
city: { type: 'string', description: 'The city name' },
},
execute: async (args) => {
const { city } = args as { city: string }
return { city, temp: 22, condition: 'sunny' }
},
})
// Create context and provider
const ctx = createContext()
ctx.push("What's the weather in Tokyo?")
const provider = createOpenAIProvider('https://api.openai.com', 'gpt-4o', {
apiKey: process.env.OPENAI_API_KEY,
})
// Run the agent
const result = await runAgent({
ctx,
provider,
instruction: 'You are a helpful assistant. Use tools when needed.',
tools: [getWeather],
})
console.log(result.response) // "The weather in Tokyo is 22°C and sunny."
console.log(result.steps) // 2
console.log(result.usage) // { promptTokens, completionTokens, totalTokens }
| Primitive | What it is |
|---|---|
| Context | Append-only message chain. You push messages in, the agent loop reads them out. |
| Tool | Schema + execute function. The agent calls tools automatically based on model output. |
| Agent | The loop. Calls the provider, executes tool calls, repeats until the model responds with text. |
An ordered Message[] chain. Push strings (they become user messages) or full Message objects.
import { createContext } from '@elfenlabs/cog'
const ctx = createContext()
// Strings become user messages
ctx.push('What is 2 + 2?')
// Full messages for other roles
ctx.push({ role: 'system', content: 'You are a math tutor.' })
// Read messages
ctx.messages // readonly Message[]
// Serialize / restore
const snapshot = ctx.serialize()
const restored = createContext({ from: snapshot })
type Message = {
role: 'system' | 'user' | 'assistant' | 'tool'
content: string
reasoning?: string // chain-of-thought from reasoning models
toolCallId?: string // links tool results back to the call
toolCalls?: ToolCallRequest[] // tool calls requested by the model
pinned?: boolean // protected from eviction (see Context Window Management)
}
A tool is an id, a description, a schema, and an execute function.
import { createTool } from '@elfenlabs/cog'
const calculator = createTool({
id: 'calculator',
description: 'Evaluate a math expression',
schema: {
expression: { type: 'string', description: 'The expression to evaluate', required: true },
},
execute: async (args) => {
const { expression } = args as { expression: string }
// eval is for demo purposes only – never evaluate untrusted input in production
return { result: eval(expression) }
},
})
// The .spec property gives you the wire format for provider APIs
calculator.spec // { name, description, parameters }
type ToolParameter = {
type: 'string' | 'number' | 'boolean' | 'object' | 'array'
description: string
required?: boolean // default: true
properties?: Record<string, ToolParameter> // for 'object'
items?: ToolParameter // for 'array'
enum?: string[] // for 'string'
}
runAgent calls the provider in a loop, executing tool calls until the model responds with text only.
import { runAgent } from '@elfenlabs/cog'
const result = await runAgent({
ctx, // Context – the conversation so far
provider, // Provider – any LLM backend
instruction: '...', // system prompt (prepended to every call)
tools: [tool1, tool2], // available tools
// Limits
maxSteps: 50, // default: 50
signal: abortController.signal,
// Context window management (see below)
maxContextTokens: 100_000,
evictionStrategy: new SlidingWindowStrategy(),
tokenCounter: (text) => text.length / 4, // default
// Streaming callbacks
onThinkingStart: () => {},
onThinking: (chunk) => {}, // reasoning tokens (dim/hidden)
onThinkingEnd: () => {},
onOutputStart: () => {},
onOutput: (chunk) => {}, // content tokens (visible)
onOutputEnd: () => {},
// Tool lifecycle hooks
onBeforeToolCall: async (tool, args) => {
// return false to block the call
},
onAfterToolCall: (tool, args, result) => {},
})
result.response // final text response
result.steps // number of provider calls made
result.usage // { promptTokens, completionTokens, totalTokens }
system prompt + ctx.messages → provider
  ├─ tool calls?      YES → execute tools → push results → loop back to provider
  ├─ text content?    YES → push assistant message → return
  └─ reasoning only?  YES → push reasoning → loop
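In code, the loop looks roughly like this. This is a simplified sketch, not the library source; the structural types below (including the id / name / args fields on tool calls) are assumptions for illustration only:

// Simplified sketch of the agent loop – not the library implementation
type LoopTool = { spec: { name: string }; execute: (args: unknown) => Promise<unknown> }
type LoopCall = { id: string; name: string; args: unknown }
type LoopCtx = { messages: readonly any[]; push: (msg: any) => void }
type LoopProvider = {
  generate: (p: { messages: any[]; tools?: any[] }) => Promise<{
    content?: string
    reasoning?: string
    toolCalls?: LoopCall[]
  }>
}

async function agentLoop(ctx: LoopCtx, provider: LoopProvider, instruction: string, tools: LoopTool[], maxSteps = 50) {
  for (let step = 0; step < maxSteps; step++) {
    const result = await provider.generate({
      messages: [{ role: 'system', content: instruction }, ...ctx.messages],
      tools: tools.map((t) => t.spec),
    })
    if (result.toolCalls?.length) {
      // Tool calls requested: record them, run each tool, push the results, loop again
      ctx.push({ role: 'assistant', content: result.content ?? '', toolCalls: result.toolCalls })
      for (const call of result.toolCalls) {
        const tool = tools.find((t) => t.spec.name === call.name)
        const output = tool ? await tool.execute(call.args) : { error: `unknown tool: ${call.name}` }
        ctx.push({ role: 'tool', content: JSON.stringify(output), toolCallId: call.id })
      }
      continue
    }
    if (result.content) {
      // Plain text answer: push it and stop
      ctx.push({ role: 'assistant', content: result.content })
      return result.content
    }
    // Reasoning only: keep it in context and try again
    ctx.push({ role: 'assistant', content: '', reasoning: result.reasoning })
  }
  throw new Error('max steps exceeded')
}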
The Provider interface is a single method. Implement it for any LLM backend.
interface Provider {
generate(params: {
messages: Message[]
tools?: ToolSpec[]
signal?: AbortSignal
stream?: StreamCallbacks
}): Promise<GenerateResult>
}
type GenerateResult = {
content?: string
reasoning?: string
toolCalls?: ToolCallRequest[]
usage?: Usage
}
type StreamCallbacks = {
onReasoning?: (chunk: string) => void
onContent?: (chunk: string) => void
}
Works with OpenAI, vLLM, OpenRouter, Ollama, LiteLLM, and any other OpenAI-compatible API. Supports SSE streaming, including reasoning_content from reasoning models.
import { createOpenAIProvider } from '@elfenlabs/cog'
// OpenAI
const openai = createOpenAIProvider('https://api.openai.com', 'gpt-4o', {
apiKey: process.env.OPENAI_API_KEY,
})
// Local vLLM
const vllm = createOpenAIProvider('http://localhost:8000', 'my-model')
// OpenRouter
const openrouter = createOpenAIProvider('https://openrouter.ai/api', 'anthropic/claude-sonnet-4.5', {
apiKey: process.env.OPENROUTER_API_KEY,
temperature: 0.2,
})
Agents are just functions. Wrap runAgent inside a tool to create sub-agents with isolated context.
const searchOrders = createTool({
id: 'search_orders',
description: 'Search through paginated orders to find a match',
schema: {
query: { type: 'string', description: 'What to search for' },
},
execute: async (args) => {
const { query } = args as { query: string }
// Sub-agent gets its own isolated context
const subCtx = createContext()
subCtx.push(`Find: ${query}`)
const fetchPage = createTool({
id: 'fetch_page',
description: 'Fetch a page of orders',
schema: { page: { type: 'number', description: 'Page number' } },
execute: async (a) => api.getOrders((a as { page: number }).page),
})
const result = await runAgent({
ctx: subCtx,
provider,
instruction: 'Search through pages until you find the item or exhaust all pages.',
tools: [fetchPage],
maxSteps: 20,
})
// Only the final answer bubbles up – no pagination noise in parent context
return result.response
},
})
// Parent agent uses the sub-agent as a regular tool
const result = await runAgent({
ctx: createContext(),
provider,
instruction: 'Use search_orders to look up order information.',
tools: [searchOrders],
})
LLM APIs have context limits. When the conversation exceeds the limit, some providers return a 400 error; others silently truncate from the beginning, evicting your system prompt first. Cog prevents this with automatic compaction.
Pass an evictionStrategy to runAgent and Cog will compact the context before every generate() call:
import { runAgent, SlidingWindowStrategy } from '@elfenlabs/cog'
const result = await runAgent({
ctx,
provider,
instruction: 'You are a helpful assistant.',
tools: [myTool],
maxContextTokens: 100_000, // token budget (default: 100k)
evictionStrategy: new SlidingWindowStrategy(), // enable auto-compaction
})
SlidingWindowStrategy evicts the oldest non-pinned messages first. Tool call groups (assistant message + tool results) are always evicted as a unit to maintain structural integrity.
Pin critical messages to protect them from eviction:
const ctx = createContext()
ctx.push({ role: 'user', content: 'Project spec: build a CLI tool that...' })
ctx.pin(-1) // protect from eviction (-1 = last pushed)
ctx.push({ role: 'user', content: 'Also, here are the requirements...' })
ctx.pin(-1)
// Later, if needed:
ctx.unpin(0) // remove protection
Pinned messages are never evicted. The system prompt (passed as instruction) and tool definitions are always protected automatically; they're budgeted as fixed costs.
The default token estimator uses text.length / 4 (~3.5–4 chars per token for English). For precise counting:
import { get_encoding } from 'tiktoken'
const enc = get_encoding('cl100k_base')
const result = await runAgent({
// ...
tokenCounter: (text) => enc.encode(text).length,
})
Strategies can also be called directly – by the host app, a tool, or any caller:
import { SlidingWindowStrategy } from '@elfenlabs/cog'
const strategy = new SlidingWindowStrategy()
const tokenCounter = (text: string) => text.length / 4
const maxTokens = 100_000
// Proactive compaction at 50% to fight context rot
strategy.compact(ctx, maxTokens * 0.5, tokenCounter)
Implement the EvictionStrategy interface for custom behavior:
import type { Context, EvictionStrategy, TokenCounter } from '@elfenlabs/cog'
class SummarizingStrategy implements EvictionStrategy {
compact(ctx: Context, budgetTokens: number, tokenCounter: TokenCounter): void {
// Your logic: summarize old messages, evict, push summary, etc.
}
}
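A sketch of one possible implementation. Two loud assumptions: removal goes through a hypothetical ctx.evict(index) (Cog's actual Context mutation API may differ), and the "summary" here is a plain-text digest rather than an LLM call:

import type { Context, EvictionStrategy, TokenCounter } from '@elfenlabs/cog'

class DigestStrategy implements EvictionStrategy {
  compact(ctx: Context, budgetTokens: number, tokenCounter: TokenCounter): void {
    const total = () => ctx.messages.reduce((sum, m) => sum + tokenCounter(m.content), 0)
    const dropped: string[] = []
    let i = 0
    // Walk oldest-first, skipping pinned messages, until the context fits the budget
    while (total() > budgetTokens && i < ctx.messages.length) {
      const msg = ctx.messages[i]
      if (msg.pinned) {
        i++
        continue
      }
      dropped.push(`${msg.role}: ${msg.content.slice(0, 80)}`)
      ;(ctx as any).evict(i) // hypothetical removal API – adapt to the real Context interface
    }
    if (dropped.length > 0) {
      // Preserve a trace of what was removed as a single compact message
      ctx.push({ role: 'system', content: `Earlier conversation (compacted):\n${dropped.join('\n')}` })
    }
  }
}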
import { MaxStepsError, AgentAbortError, ContextBudgetError } from '@elfenlabs/cog'
try {
await runAgent({ ctx, provider, instruction: '...', tools, maxSteps: 10 })
} catch (err) {
if (err instanceof MaxStepsError) {
// Agent exceeded step limit
}
if (err instanceof AgentAbortError) {
// AbortSignal was triggered
}
if (err instanceof ContextBudgetError) {
// Fixed context (system prompt + tools + pinned) exceeds maxContextTokens
}
}
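AgentAbortError pairs naturally with a deadline. A sketch using the standard AbortSignal.timeout (Node 17.3+); any AbortController works the same way:

import { runAgent, AgentAbortError } from '@elfenlabs/cog'

try {
  await runAgent({
    ctx,
    provider,
    instruction: 'You are a helpful assistant.',
    tools,
    signal: AbortSignal.timeout(30_000), // give up after 30 seconds
  })
} catch (err) {
  if (err instanceof AgentAbortError) {
    console.warn('Agent run timed out')
  } else {
    throw err
  }
}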
Unknown tool calls and tool execution errors are automatically caught and fed back to the model as tool messages, letting it recover gracefully.
MIT