msm-agent is a portable AI agent runtime. Write one file describing who your agent is, run one command, and get a production-ready AI agent with an HTTP API, WhatsApp integration, semantic memory, and a self-improving feedback loop — no framework knowledge required.
npm install msm-agent
The agent is the hands: it receives events, asks the brain what to do, executes tools, feeds results back, and delivers responses. The brain (LLM) only decides; it never executes. This separation is what makes the runtime safe, testable, and independently deployable.
A product manager writes support-agent.md in 10 minutes.
A developer runs docker run -e AGENT_FILE=/agent/support-agent.md -v ./support-agent.md:/agent/support-agent.md:ro msm-agent.
Done.
- The Agent Definition File
- Quick Start
- Architecture
- How It Works — The Execution Loop
- The 5 Adapter Interfaces
- Brain Integration
- Production Adapters
- Equipment — Connected External Systems
- Skills — Reusable In-Process Tool Packs
- Pre-Processing Gates
- Quality Scoring and Self-Improvement
- Arabic-Native Routing
- Sovereign Deployment — Zero Cloud
- Deeper Evolving Layer — Signal Decay & Contradiction Detection
- Streaming Responses (SSE)
- Episodic Memory
- Distributed Session Locking
- Jobs and Missions
- MCP Server
- Running as a Microservice — full guide →
- HTTP API Reference — full reference →
- Ops Dashboard — details →
- Configuration Reference — full options →
- Guard System — reference →
- Testing
- License
An agent is defined in a single .md file. No YAML, no code, no configuration objects. The runtime parses the file and compiles it into a validated configuration.
# Support Agent
Domain: E-commerce customer support
Language: Arabic and English
## Persona
Name: Nour
Style: warm, direct, solution-focused
## Capabilities
- answer product questions
- check order status
- create support tickets
- escalate billing disputes to human
## Brain
provider: openai
model: gpt-4o-mini
## Limits
maxIterations: 6
confidenceThreshold: 0.7
costCapPerTask: 0.05
## Hours
Timezone: Asia/Qatar
Mon-Fri: 09:00-18:00
Sat: 10:00-14:00
Message: We are currently closed. We'll respond first thing in the morning.
## Skills
- booking
- payments
## Equipment
connectors:
  - type: shopify
    operations: [orders.list, customers.get]
    access: read
    endpoint: ${SHOPIFY_ENDPOINT}
    credentials:
      type: api_key
      value: ${SHOPIFY_API_KEY}
dedicatedTools: [generate_quote, escalate_to_human]
The runtime compiles this into your agent. Every section is optional. You can start with just a name, a persona, and a brain, and add capabilities incrementally.
If you prefer a more structured, machine-queryable format, the same definition works as an IntentText .it file. The runtime auto-detects the format.
# Write your agent definition (see section 1)
cat > support-agent.md << 'EOF'
# Support Agent
Domain: Customer support
Language: English
## Persona
Name: Alex
Style: helpful and direct
## Brain
provider: openai
model: gpt-4o-mini
EOF
# Run
docker run \
-e AGENT_FILE=/agent/support-agent.md \
-e OPENAI_API_KEY=sk-... \
-v ./support-agent.md:/agent/support-agent.md:ro \
-p 3000:3000 \
msm-agent
# Talk to it
curl -X POST http://localhost:3000/chat \
-H "Content-Type: application/json" \
-d '{"message": "What are your business hours?"}'
import {
createAgent,
loadAgent,
buildBrain,
InMemoryAdapter,
MockToolAdapter,
ManualEventAdapter,
ConsoleDeliveryAdapter,
} from "msm-agent";
// Load the definition file
const def = await loadAgent("./support-agent.md");
// Create the agent
const agent = createAgent({
brain: buildBrain(def), // reads OPENAI_API_KEY from env
memory: new InMemoryAdapter(),
tools: new MockToolAdapter(),
events: new ManualEventAdapter(),
delivery: new ConsoleDeliveryAdapter(),
config: def.config,
});
// Handle an event
const outcome = await agent.handleEvent({
type: "user_message",
sessionId: "session-1",
text: "What is the status of my order?",
modality: "text",
});
console.log(outcome.type); // "response" | "clarification" | "escalated" | ...
If you use msm-ai as your brain (the 6-layer prompt pipeline):
import { createAgent } from "msm-agent";
import { wrapMSM } from "msm-agent/bridge/msm";
import { createPipeline } from "msm-ai";
const pipeline = await createPipeline("./support.yaml");
const brain = wrapMSM(pipeline);
// adapters: the memory, tools, events, delivery, and config from the previous example
const agent = createAgent({ brain, ...adapters });
┌───────────────────────────────────────────────────────────────────┐
│ AGENT DEFINITION FILE (support-agent.md or support-agent.it) │
│ │
│ Persona · Capabilities · Brain · Limits · Hours · │
│ Skills · Equipment · Memory rules │
└─────────────────────────────┬─────────────────────────────────────┘
│ loadAgent()
▼
┌───────────────────────────────────────────────────────────────────┐
│ msm-agent runtime │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Pre-Processing Gates (zero LLM cost) │ │
│ │ Acknowledgement gate · Business hours gate │ │
│ └──────────────────────────┬──────────────────────────────────┘ │
│ │ │
│ ┌──────────────────────────▼──────────────────────────────────┐ │
│ │ Execution Loop │ │
│ │ │ │
│ │ event → context builder → brain → guards → dispatch: │ │
│ │ respond / escalate / clarify / delegate → deliver │ │
│ │ use_tool → validate → dedup → execute → loop │ │
│ │ │ │
│ │ + session mutex (prevents race conditions) │ │
│ │ + pre-hook (fast-intent short-circuit) │ │
│ │ + plan tracking (create / advance / replan / freestyle) │ │
│ │ + control bus (kill / pause / disable per iteration) │ │
│ │ + tool dedup (same call → cached result) │ │
│ │ + strict tool validation (abort on bad reasoning) │ │
│ │ + flush gate (buffered async writes) │ │
│ └──────────────────────────┬──────────────────────────────────┘ │
│ │ │
│ ┌──────────────────────────▼──────────────────────────────────┐ │
│ │ Quality Scoring (zero LLM cost) │ │
│ │ scoreOutcome() → resolution · efficiency · error rate │ │
│ └──────────────────────────┬──────────────────────────────────┘ │
│ │ │
│ ┌──────────────────────────▼──────────────────────────────────┐ │
│ │ Evolving Layer │ │
│ │ postOutcome() writes · preReason() injects hints │ │
│ │ refreshStrategies() computes improvement notes │ │
│ └──────────────────────────┬──────────────────────────────────┘ │
└─────────────────────────────┼─────────────────────────────────────┘
│
┌───────────────────┼───────────────────┐
▼ ▼ ▼
┌───────────────┐ ┌────────────────┐ ┌────────────────────┐
│ MemoryAdapter │ │ ToolAdapter │ │ ControlBusAdapter │
│ SQLite/Mongo/ │ │ Equipment / │ │ Redis / in-memory │
│ Postgres/Neo4j│ │ Skills / Mock │ └────────────────────┘
└───────────────┘ └────────────────┘
▲ ▲
┌───────┴────────┐ ┌───────┴─────────┐
│ EventAdapter │ │ DeliveryAdapter │
│ WhatsApp / │ │ WhatsApp / │
│ BullMQ / │ │ Console / │
│ Manual │ │ Custom │
└────────────────┘ └─────────────────┘
│
┌────────┴────────┐
│ Brain │
│ OpenAI · │
│ Anthropic · │
│ Ollama · │
│ MSM Pipeline │
└─────────────────┘
The runtime sits between your event sources and your brain. It provides everything except the LLM call and your business logic — guards, planning, memory, tool execution, delivery, observability, and self-improvement all ship out of the box.
Every incoming event goes through this sequence:
0. [Session Lock] Acquire per-session mutex — prevents two events
from the same user running concurrently.
1. [Gates] Zero-LLM pre-processing checks:
- Acknowledgement: "ok", "thanks", "👍", "تمام"
→ suppressed (no brain call, no delivery)
- Business hours: outside configured schedule
→ canned closed-message (no brain call)
2. [Pre-Hook] Optional fast-intent gate — return an outcome directly
for trivial inputs (greetings, FAQs) to skip the loop.
3. [Control Bus] Per-iteration kill/pause check. Stops immediately
if the task was killed or tenant is paused.
4. [Typing] Send typing indicator via DeliveryAdapter (optional).
5. [Context] Build brain input:
- Conversation history (compacted if long)
- Task state: status, plan progress, recent failures
- Semantic memory: MemoryAdapter.search()
- Available tools catalog
- Equipment block (connected external systems)
- Evolving hints: [strategy] and [past approach] notes
- Tool results from previous iterations
6. [Brain] Call brain → orchestration decision.
7. [Plan] If brain returned a multi-step plan, track it.
8. [Guards] Evaluate all guard conditions:
- Confidence gate (below threshold → clarify)
- Iteration / cost / time budgets (hard limits)
- Repetition guard (3+ same tool → advisory signal)
- Dead-end guard (4+ failures across 2+ tools → advisory)
9. [Dispatch] Route on brain's action:
respond / complete → record → deliver → DONE
escalate → record → deliver → DONE
clarify → record → deliver → DONE
delegate → record → deliver → DONE
use_tool → continue to step 10
10. [Tool Pipeline] For tool calls:
a. Check if tool is disabled (control bus)
b. Check rate limit
c. Dedup check (same tool + same params → return cached)
d. Validate parameters
e. Human approval (if tool.requiresApproval = true)
f. Execute
g. Record step to memory
11. [Plan Advance] On success → advance plan step.
On failure → replan (up to maxReplans) → freestyle.
12. [Loop] Go to step 3 with tool result in context.
13. [Quality] After terminal outcome: scoreOutcome() computes
resolution, efficiency, error rate, and flags.
14. [Evolving] postOutcome() writes structured learning event to memory.
Flags feed into strategy notes for future runs.
If the loop exhausts maxIterations without a terminal action, the runtime force-responds with the last available text rather than hanging.
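The dispatch-and-loop core of steps 6 to 12 can be illustrated with a minimal TypeScript sketch. It is not the msm-agent implementation: Decision, SketchBrain, ToolFn, and runLoopSketch are hypothetical names; guards, plans, and the control bus are omitted; and it assumes that "last available text" means the most recent tool result. It keeps only the respond/use_tool dispatch, the dedup cache from step 10c, and the forced response when maxIterations runs out.

```typescript
// Hypothetical shapes for illustration only; not the msm-agent types.
type Decision =
  | { action: "respond"; text: string }
  | { action: "use_tool"; tool: string; params: Record<string, unknown> };

interface SketchBrain {
  decide(context: string[]): Promise<Decision>;
}

type ToolFn = (params: Record<string, unknown>) => Promise<string>;

interface LoopResult {
  text: string;
  iterations: number;
  forced: boolean; // true when the iteration budget ran out
}

async function runLoopSketch(
  brain: SketchBrain,
  tools: Partial<Record<string, ToolFn>>,
  userText: string,
  maxIterations: number,
): Promise<LoopResult> {
  const context: string[] = [`user: ${userText}`];
  // Step 10c: the same tool with the same params returns the cached result.
  const cache = new Map<string, string>();
  let lastText = "";

  for (let i = 1; i <= maxIterations; i++) {
    const decision = await brain.decide(context); // step 6
    if (decision.action === "respond") {
      return { text: decision.text, iterations: i, forced: false }; // step 9
    }
    const key = `${decision.tool}:${JSON.stringify(decision.params)}`;
    let result = cache.get(key);
    if (result === undefined) {
      const tool = tools[decision.tool];
      result = tool
        ? await tool(decision.params) // step 10f
        : `error: unknown tool ${decision.tool}`;
      cache.set(key, result);
    }
    lastText = result;
    // Step 12: loop again with the tool result in context.
    context.push(`tool ${decision.tool}: ${result}`);
  }
  // Budget exhausted: force a response instead of hanging.
  return { text: lastText, iterations: maxIterations, forced: true };
}
```

In the real runtime the control-bus check (step 3) and the guards (step 8) would sit between the brain call and the dispatch; they are left out here to keep the control flow visible.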
The runtime provides the loop. You provide 5 adapters that connect it to your infrastructure.
| Adapter | Purpose | Built-in options |
|---|---|---|
| MemoryAdapter | Conversation history, task state, semantic search | InMemoryAdapter, SQLiteMemoryAdapter, PostgresMemoryAdapter, MongoMemoryAdapter, Neo4jMemoryAdapter |
| ToolAdapter | Execute domain actions; mark tools requiresApproval to pause for human sign-off | Your implementation, or EquipmentToolAdapter, SkillsToolAdapter |
| EventAdapter | Receive work from webhooks, queues, or manual calls | BullMQEventAdapter (durable), simple HTTP handler |
| DeliveryAdapter | Send responses to the user's channel | WhatsAppDeliveryAdapter, your implementation |
| ControlBusAdapter | Kill tasks, pause tenants, disable tools at runtime | RedisControlBus (production), InMemoryControlBus (dev) |
Each adapter has a dummy implementation for tests (DummyMemoryAdapter, etc.) — no external services required.
→ Full interface specs, code examples, and production wiring in docs/INTEGRATION-GUIDE.md
The runtime ships built-in LLM brains for OpenAI, Anthropic, and Ollama. buildBrain(def) auto-selects based on your agent definition:
import { buildBrain, loadAgent } from "msm-agent";
const def = await loadAgent("./support-agent.md");
const brain = buildBrain(def); // reads OPENAI_API_KEY / ANTHROPIC_API_KEY / OLLAMA_ENDPOINT

| Provider | provider: value | Env var |
|---|---|---|
| OpenAI | openai | OPENAI_API_KEY |
| Anthropic | anthropic | ANTHROPIC_API_KEY |
| Ollama | ollama | OLLAMA_ENDPOINT |
| Azure OpenAI | openai | OPENAI_BASE_URL |
For the msm-ai 6-layer prompt pipeline, wrap it with wrapMSM() from msm-agent/bridge/msm. Any object with a run(input): Promise<BrainPayload> method also works as a custom brain.
→ Full examples and custom brain spec in docs/INTEGRATION-GUIDE.md
The CLI selects adapters automatically from environment variables. For embedded use, import them directly from "msm-agent".
| Adapter | Activate via | Peer dep | Best for |
|---|---|---|---|
| InMemoryAdapter | default | none | Tests, prototypes |
| SQLiteMemoryAdapter | MEMORY_PATH=/data/agent.db | none (Node.js 22+) | Dev, single-container |
| PostgresMemoryAdapter | DATABASE_URL=postgresql://... | pnpm add postgres | Production, SQL workloads |
| MongoMemoryAdapter | DATABASE_URL=mongodb://... | pnpm add mongodb | Production, Atlas Vector Search |
| Neo4jMemoryAdapter | NEO4J_URL=bolt://... | pnpm add neo4j-driver | Graph-enriched semantic search |
| RedisControlBus | REDIS_URL=redis://... | pnpm add ioredis | Multi-instance control bus |
| BullMQEventAdapter | manual / pnpm add bullmq | pnpm add bullmq ioredis | Durable queue, cron, retries |
| WhatsAppDeliveryAdapter | WHATSAPP_GATEWAY_URL=... | none | WhatsApp delivery via HTTP gateway |
Neo4j wraps any primary adapter as a graph enrichment layer. Failed BullMQ jobs retry 3× with exponential back-off.
→ Full setup details, connect patterns, and Neo4j stacking in docs/INTEGRATION-GUIDE.md
Equipment lets you connect external APIs (CRM systems, booking platforms, e-commerce stores) directly from the agent definition file. No code changes required — credentials are resolved from environment variables at load time.
## Equipment
connectors:
  - type: shopify
    operations: [orders.list, orders.get, customers.get]
    access: read
    endpoint: ${SHOPIFY_ENDPOINT}
    credentials:
      type: api_key
      value: ${SHOPIFY_API_KEY}
  - type: fresha
    operations: [bookings.list, bookings.create, bookings.update]
    access: readwrite
    endpoint: ${FRESHA_ENDPOINT}
    credentials:
      type: bearer
      value: ${FRESHA_TOKEN}
dedicatedTools: [generate_quote, escalate_to_human]
When the agent has equipment, the runtime automatically injects an EQUIPMENT block into every brain call so the LLM knows exactly which systems it has access to:
EQUIPMENT (connected systems):
- shopify: orders.list, orders.get, customers.get [read]
- fresha: bookings.list, bookings.create, bookings.update [readwrite]
DEDICATED TOOLS: generate_quote, escalate_to_human
A connector is a ~50-line module mapping API operations to tool definitions:
import { ConnectorRegistry } from "msm-agent";
ConnectorRegistry.register("shopify", (config) => ({
tools: [
{
name: "orders.list",
description: "List recent Shopify orders",
execute: async (args) => {
const response = await fetch(`${config.endpoint}/orders.json`, {
headers: { "X-Shopify-Access-Token": config.credentials.value },
});
return { status: "ok", result: await response.json() };
},
},
],
}));
Once registered, any agent definition that lists type: shopify in its equipment block automatically gets these tools.
import { createAgent, EquipmentToolAdapter, loadAgent } from "msm-agent";
const def = await loadAgent("./agent.md");
// baseToolAdapter and rest are placeholders for your own tool adapter and remaining adapters
const tools = EquipmentToolAdapter.create(def.equipment, baseToolAdapter);
const agent = createAgent({ tools, ...rest });
Skills are named bundles of tools that live inside your process: no external API calls, no credentials. They are the right choice for shared business logic that multiple agents reuse.
## Skills
- booking
- payments
- knowledge
import { SkillRegistry } from "msm-agent";
SkillRegistry.register("booking", (options) => [
{
name: "booking_check_availability",
description: "Check available slots for a service",
parameters: {
serviceId: { type: "string", required: true },
date: { type: "string" },
},
execute: async (args) => {
// calendar: placeholder for your own booking backend (not part of msm-agent)
const slots = await calendar.getSlots(args.serviceId, args.date);
return { status: "ok", result: { slots } };
},
},
{
name: "booking_create",
description: "Create a booking",
parameters: { serviceId: { type: "string" }, slotId: { type: "string" } },
execute: async (args) => {
const booking = await calendar.book(args);
return { status: "ok", result: { booking } };
},
},
]);

| | Equipment (Connectors) | Skills |
|---|---|---|
| Needs credentials | Yes (API key, bearer token, etc.) | No |
| External API | Yes | No, runs in-process |
| Defined in | .md ## Equipment block | .md ## Skills list |
| Registry | ConnectorRegistry | SkillRegistry |
| Adapter | EquipmentToolAdapter | SkillToolAdapter |
Gates are zero-LLM filters that run before the brain loop. They handle common patterns cheaply, saving a full LLM call each time they fire.
Acknowledgement gate: suppresses pure acknowledgements such as "ok", "thanks", "got it", "👍", "تمام" ("okay"), "شكرا" ("thanks"), and similar. No LLM call is made and nothing is delivered.
Business hours gate: returns a configurable canned message outside working hours. No LLM call.
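Both gates are cheap, deterministic checks. The TypeScript sketch below shows one plausible way to write them; isAcknowledgement, isOpenAt, and the ACKS word list are hypothetical and not msm-agent exports. The hours check uses the standard Intl.DateTimeFormat API to read the weekday and wall-clock time in the configured timezone, and assumes schedule keys in the Mon-Fri / Sat form used by the ## Hours section.

```typescript
// Hypothetical acknowledgement list; "تمام" ~ "okay", "شكرا" ~ "thanks".
const ACKS = new Set(["ok", "okay", "thanks", "thank you", "got it", "👍", "تمام", "شكرا"]);

function isAcknowledgement(text: string): boolean {
  // Normalize: trim, lowercase, and drop trailing "." or "!".
  const normalized = text.trim().toLowerCase().replace(/[.!]+$/, "");
  return ACKS.has(normalized);
}

const DAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"];

// "Mon-Fri" -> ["Mon", "Tue", "Wed", "Thu", "Fri"]; "Sat" -> ["Sat"]. No wrap-around.
function expandDays(spec: string): string[] {
  const [start, end] = spec.split("-");
  if (end === undefined) return [start];
  return DAYS.slice(DAYS.indexOf(start), DAYS.indexOf(end) + 1);
}

// "09:00" -> 540 minutes after midnight.
function toMinutes(hhmm: string): number {
  const [h, m] = hhmm.split(":").map(Number);
  return h * 60 + m;
}

function isOpenAt(date: Date, timeZone: string, schedule: Record<string, string>): boolean {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    weekday: "short",
    hour: "2-digit",
    minute: "2-digit",
    hour12: false,
  }).formatToParts(date);
  const get = (type: string) => parts.find((p) => p.type === type)?.value ?? "";
  const weekday = get("weekday");
  // Some engines render midnight as "24" when hour12 is false, so wrap it to 0.
  const now = (Number(get("hour")) % 24) * 60 + Number(get("minute"));

  for (const [days, range] of Object.entries(schedule)) {
    if (!expandDays(days).includes(weekday)) continue;
    const [open, close] = range.split("-").map(toMinutes);
    if (now >= open && now < close) return true;
  }
  return false;
}
```

For example, 2024-01-01T07:00:00Z is 10:00 on a Monday in Qatar (UTC+3), so isOpenAt returns true for it with the schedule shown below.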
## Hours
Timezone: Asia/Qatar
Mon-Fri: 09:00-18:00
Sat: 10:00-14:00
Message: We are currently closed. We will respond first thing when we open.
Both gates are activated by the CLI when the corresponding sections are present in the definition file. For embedded use:
import { createAgent } from "msm-agent";
const agent = createAgent({
  gates: {
    acknowledgement: true,
    businessHours: {
      timezone: "Asia/Qatar",
      schedule: { "Mon-Fri": "09:00-18:00", Sat: "10:00-14:00" },
      closedMessage: "We are closed. Open Mon–Fri 9am–6pm, Sat 10am–2pm.",
    },
  },
  ...adapters, // memory, tools, events, delivery, config
});
The runtime measures the quality of every task outcome without any LLM calls. These measurements feed an automatic self-improvement loop.
After each task, scoreOutcome() computes three dimensions from the LoopOutcome:
| Dimension | Signal | Range |
|---|---|---|
| resolution | Did the task reach a response (vs. error or escalation)? | 0–1 |
| efficiency | How many tool calls were needed? (fewer is better) | 0–1 |
| errorRate | Fraction of tool calls that succeeded (1.0 = no failed calls) | 0–1 |
When a dimension falls below its threshold, a flag is raised:
| Flag | Trigger |
|---|---|
| failed_resolution | resolution < 0.5 |
| slow_response | efficiency < 0.5 (more than 5 tool calls) |
| high_error_rate | more than 30% of tool calls failed |
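The flag rules in this table translate directly into threshold checks. The sketch below is illustrative only: deriveFlagsSketch and its input shape are hypothetical, and it computes the failure share from raw tool-call counts rather than relying on how scoreOutcome() defines errorRate internally.

```typescript
type QualityFlag = "failed_resolution" | "slow_response" | "high_error_rate";

// Hypothetical input shape; scoreOutcome() derives its values from a LoopOutcome.
interface ScoreInput {
  resolution: number; // 0-1
  efficiency: number; // 0-1
  toolCalls: number;
  failedToolCalls: number;
}

function deriveFlagsSketch(s: ScoreInput): QualityFlag[] {
  const flags: QualityFlag[] = [];
  if (s.resolution < 0.5) flags.push("failed_resolution");
  if (s.efficiency < 0.5) flags.push("slow_response");
  // More than 30% of tool calls failed; a task with no tool calls raises no error flag.
  if (s.toolCalls > 0 && s.failedToolCalls / s.toolCalls > 0.3) {
    flags.push("high_error_rate");
  }
  return flags;
}
```

Note the strict comparisons: exactly 30% failed calls, or a resolution of exactly 0.5, does not raise a flag.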
import { scoreOutcome } from "msm-agent";
const score = scoreOutcome(outcome);
// e.g. { resolution: 0.7, efficiency: 0.9, errorRate: 1.0, flags: [] }
The evolving layer connects quality scores to actual behavior improvement. It uses the existing memory adapter: no new database, no ML pipeline.
Every task:
preReason() → inject strategy notes + past approach hints into brain context
postOutcome() → write quality flags and outcome to memory
(on startup in assist mode):
refreshStrategies() → analyze recent quality flags, write improvement notes
Three modes:
| Mode | Learning | Hint injection | Purpose |
|---|---|---|---|
| off | none | none | Default: total silence |
| shadow | writes to memory | none | Safe observation: collect data without influencing |
| assist | writes to memory | injects hints | Full loop: learns and applies |
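The three modes differ only in how far the loop goes. The sketch below models that with hypothetical names (EvolvingMode, STRATEGY_NOTES, refreshStrategiesSketch, preReasonSketch, and a minCount of 3 are all assumptions, and the notes are abbreviated stand-ins, not the real FLAG_STRATEGIES text). It illustrates counting recent flags into [strategy] notes and injecting them only in assist mode; the memory writes that shadow and assist both perform are not shown.

```typescript
type EvolvingMode = "off" | "shadow" | "assist";

// Hypothetical, abbreviated flag -> note map in the spirit of FLAG_STRATEGIES.
const STRATEGY_NOTES: Record<string, string> = {
  failed_resolution: "Ask clarifying questions when the user's intent is ambiguous.",
  slow_response: "Prioritize direct tool calls over multi-step planning.",
  high_error_rate: "Verify tool parameters carefully before execution.",
};

// refreshStrategies-style step: flags seen at least minCount times become notes.
function refreshStrategiesSketch(recentFlags: string[], minCount = 3): string[] {
  const counts = new Map<string, number>();
  for (const flag of recentFlags) counts.set(flag, (counts.get(flag) ?? 0) + 1);
  const notes: string[] = [];
  for (const [flag, count] of counts) {
    const note = STRATEGY_NOTES[flag];
    if (note !== undefined && count >= minCount) notes.push(`[strategy] ${note}`);
  }
  return notes;
}

// preReason-style step: only assist mode changes what the brain sees.
function preReasonSketch(mode: EvolvingMode, notes: string[], context: string[]): string[] {
  return mode === "assist" ? [...notes, ...context] : context;
}
```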
How hints work: In assist mode, preReason() retrieves strategy notes from memory and injects them at the top of the brain's context. For example, after several failed_resolution events, the agent's context will include:
[strategy] Ask clarifying questions when the user's intent is ambiguous.
Break compound requests into individual steps before proceeding.
FLAG_STRATEGIES maps each quality flag to an actionable improvement note:
import { FLAG_STRATEGIES } from "msm-agent";
FLAG_STRATEGIES.failed_resolution;
// → "Ask clarifying questions when the user's intent is ambiguous..."
FLAG_STRATEGIES.slow_response;
// → "Prioritize direct tool calls over multi-step planning..."
FLAG_STRATEGIES.high_error_rate;
// → "Verify tool parameters carefully before execution..."
Enable via environment variable:
EVOLVING_MODE=shadow # observe and collect (safe starting point)
EVOLVING_MODE=assist # observe, collect, and inject improvement hints
The evolving layer requires a memory adapter that implements search() and store() (SQLite, Postgres, or MongoDB). Without these, it silently degrades to a no-op.
When language: arabic (or ar) is declared in the ## Brain section of the agent definition, the runtime automatically routes Arabic user input through an Arabic-capable model. No code changes required.
## Brain
provider: ollama
model: phi4-mini
language: arabic
How it works:
- The brain factory builds a RoutingBrain wrapping two sub-brains.
- Before each request, detectLanguage(input) runs a Unicode character-set heuristic: if more than 30% of non-whitespace characters fall in the Arabic block (U+0600–U+06FF), the input is classified as Arabic.
- Arabic input routes to the Arabic-capable model; English and other input routes to the primary model.
- Both sub-brains implement the same Brain interface, so the rest of the runtime is unaware of the routing.
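The heuristic in the second bullet fits in a few lines. detectLanguageSketch below is a hypothetical reimplementation for illustration, not the library's detectLanguage: it counts code points (so an emoji counts as one character) and returns "en" for anything not classified as Arabic.

```typescript
// Sketch of the >30% Arabic-block heuristic; not the msm-agent source.
function detectLanguageSketch(text: string): "ar" | "en" {
  let total = 0;
  let arabic = 0;
  // for...of iterates by code point rather than UTF-16 unit.
  for (const ch of text) {
    if (/\s/.test(ch)) continue; // ignore whitespace
    total++;
    const cp = ch.codePointAt(0) ?? 0;
    if (cp >= 0x0600 && cp <= 0x06ff) arabic++;
  }
  // Nothing but whitespace: default to the primary (non-Arabic) route.
  if (total === 0) return "en";
  return arabic / total > 0.3 ? "ar" : "en";
}
```

With this rule "ok تمام" routes to Arabic (4 of 6 characters), while "Order 12345 تمام" stays on the primary model (4 of 14).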
Environment variables:
| Variable | Default | Purpose |
|---|---|---|
| ARABIC_OLLAMA_MODEL | jais | Ollama model for Arabic input |
| ARABIC_OPENAI_MODEL | none | OpenAI model override for Arabic (falls back to primary) |
| ARABIC_ANTHROPIC_MODEL | none | Anthropic model override for Arabic (falls back to primary) |
Language values accepted in ## Brain:
| Value | Behaviour |
|---|---|
| arabic / ar | Arabic input → Arabic model; others → primary |
| auto | Same as arabic; falls back to primary if no Arabic model env var is set |
| english / en | No routing (same as omitting the field) |
| omitted | No routing (existing behaviour) |
import { detectLanguage, RoutingBrain } from "msm-agent";
// Detect language of a string:
detectLanguage("مرحباً كيف حالك"); // → "ar"
detectLanguage("Hello there"); // → "en"
// Use RoutingBrain directly in programmatic mode:
const router = new RoutingBrain(primaryBrain, arabicBrain);
The language detector runs in under 1 ms with no API call and no ML model, so it is safe to call on every request.
For government, healthcare, and legal deployments that cannot use cloud LLMs, msm-agent supports a sovereign mode that enforces local-only processing.
# Zero API keys. Zero cloud. Fully air-gapped.
docker run \
-e AGENT_FILE=/agent/inquiry-agent.md \
-e SOVEREIGN=true \
-e OLLAMA_ENDPOINT=http://ollama:11434 \
-v ./inquiry-agent.md:/agent/inquiry-agent.md:ro \
-v agent-data:/data \
-p 3000:3000 \
msm-agent
What SOVEREIGN=true does:
- Validates at startup: if OPENAI_API_KEY or ANTHROPIC_API_KEY is present in the environment, the process exits with an error. This prevents accidental credential exposure.
- Defaults the brain provider to Ollama: if the agent definition has no ## Brain section (or uses a cloud provider), it is overridden to provider: ollama, model: phi4-mini.
- Defaults storage to SQLite: if neither DATABASE_URL nor MEMORY_PATH is set, MEMORY_PATH defaults to /data/agent.db. No external database is required.
- Logs a sovereign banner at startup: "Sovereign mode: all processing is local — no cloud credentials loaded."
- Adds sovereign: true to the /health response for readiness probe confirmation.
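The first three rules amount to a small pure function over the environment and the ## Brain section. The sketch below is hypothetical (applySovereignMode, BrainSection, and SovereignSettings are illustrative names, not msm-agent exports), and keeping a local provider's own model with a phi4-mini fallback is an assumption the list above does not spell out.

```typescript
// Hypothetical sketch of the sovereign startup rules; not msm-agent source.
interface BrainSection {
  provider?: string;
  model?: string;
}

interface SovereignSettings {
  provider: string;
  model: string;
  memoryPath?: string; // undefined means an external DATABASE_URL is used instead
}

const CLOUD_KEYS = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY"];
const CLOUD_PROVIDERS = new Set(["openai", "anthropic"]);

function applySovereignMode(
  env: Record<string, string | undefined>,
  brain: BrainSection,
): SovereignSettings {
  // Rule 1: refuse to start if any cloud credential is present.
  const leaked = CLOUD_KEYS.filter((key) => Boolean(env[key]));
  if (leaked.length > 0) {
    throw new Error(`SOVEREIGN=true but cloud credentials are set: ${leaked.join(", ")}`);
  }
  // Rule 2: a missing or cloud provider becomes local Ollama with phi4-mini.
  const requested = brain.provider;
  const provider = requested && !CLOUD_PROVIDERS.has(requested) ? requested : "ollama";
  // Assumption: a local provider keeps its own model, falling back to phi4-mini.
  const model = provider === requested ? (brain.model ?? "phi4-mini") : "phi4-mini";
  // Rule 3: with no storage configured, default to SQLite on the /data volume.
  const hasStorage = Boolean(env.DATABASE_URL || env.MEMORY_PATH);
  const memoryPath = hasStorage ? env.MEMORY_PATH : "/data/agent.db";
  return { provider, model, memoryPath };
}
```

Calling something like this with process.env at boot and letting the thrown error propagate gives the "exit with an error" behaviour described in the first bullet.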
curl http://localhost:3000/health
# → { "status": "ok", "sovereign": true, "provider": "ollama", ... }
Recommended agent definition for sovereign deployments:
# Government Inquiry Agent
Domain: Citizen services
Language: Arabic and English
## Brain
provider: ollama
model: phi4-mini
language: arabic
## Capabilities
- answer public service inquiries
- explain application procedures
- escalate complex cases to a human officer
## Rules
- never fabricate policy details
- respond in the same language as the user
- escalate when confidence < 70%
Air-gap checklist:
- Ollama running in the same private network (no external calls)
- SQLite volume mounted at /data (or Postgres on private infrastructure)
- No OPENAI_API_KEY / ANTHROPIC_API_KEY in the environment
- SOVEREIGN=true set (the runtime validates the above on startup)
- /health returns "sovereign": true; use it as a liveness probe
Phase 14 introduced automatic strategy notes (flag frequency → improvement hints). Phase 17 adds three mechanisms that make the learning layer reliable at scale:
Signal decay. Strategy notes lose relevance over time. computeDecayScore() assigns each note a score based on how often and how recently it was supported by quality events:
decayScore = supportingEventCount / (daysSinceLastEvidence + 1)
× recencyWeight (1.0 if < 7 days, 0.5 if < 30, 0.1 otherwise)
Notes with decayScore < 0.1 are pruned by consolidate(). An agent running for months will retain only the strategy notes backed by recent evidence.
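The decay formula and the 0.1 pruning threshold can be written out directly. decayScoreSketch and pruneNotesSketch below are hypothetical helpers that mirror the formula above; the real computeDecayScore() may take different arguments.

```typescript
// Mirrors: supportingEventCount / (daysSinceLastEvidence + 1) × recencyWeight
function decayScoreSketch(supportingEventCount: number, daysSinceLastEvidence: number): number {
  const recencyWeight =
    daysSinceLastEvidence < 7 ? 1.0 : daysSinceLastEvidence < 30 ? 0.5 : 0.1;
  return (supportingEventCount / (daysSinceLastEvidence + 1)) * recencyWeight;
}

interface StrategyNote {
  text: string;
  supportingEventCount: number;
  daysSinceLastEvidence: number;
}

// Keep only notes whose decay score reaches the pruning threshold.
function pruneNotesSketch(notes: StrategyNote[], threshold = 0.1): StrategyNote[] {
  return notes.filter(
    (n) => decayScoreSketch(n.supportingEventCount, n.daysSinceLastEvidence) >= threshold,
  );
}
```

Worked through: 3 supporting events last seen 2 days ago score 3 / 3 × 1.0 = 1.0 (kept); 5 events last seen 20 days ago score 5 / 21 × 0.5 ≈ 0.119 (kept); 10 events last seen 60 days ago score 10 / 61 × 0.1 ≈ 0.016 (pruned).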
Contradiction detection. When the flag-counting system produces contradictory advice (e.g., "ask clarifying questions" vs. "respond directly"), both notes would otherwise be injected into the prompt and confuse the agent. consolidate() detects these pairs and removes the note with the lower decay score.
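Matching on keyword pairs is one plausible way to detect such contradictions. The sketch below is hypothetical: PAIRS_SKETCH holds a single illustrative pair and is not the real CONTRADICTION_PAIRS list, and when two contradictory notes have equal decay scores it keeps the earlier one.

```typescript
// Hypothetical keyword pair; the real CONTRADICTION_PAIRS list may differ.
const PAIRS_SKETCH: Array<[string, string]> = [["clarifying question", "respond directly"]];

function areContradictorySketch(a: string, b: string): boolean {
  const x = a.toLowerCase();
  const y = b.toLowerCase();
  return PAIRS_SKETCH.some(
    ([p, q]) => (x.includes(p) && y.includes(q)) || (x.includes(q) && y.includes(p)),
  );
}

interface ScoredNote {
  text: string;
  decayScore: number;
}

// For every contradictory pair, drop the note with the lower decay score.
function resolveContradictionsSketch(notes: ScoredNote[]): ScoredNote[] {
  const dropped = new Set<number>();
  for (let i = 0; i < notes.length; i++) {
    for (let j = i + 1; j < notes.length; j++) {
      if (dropped.has(i) || dropped.has(j)) continue;
      if (!areContradictorySketch(notes[i].text, notes[j].text)) continue;
      // Ties drop the later note.
      dropped.add(notes[i].decayScore < notes[j].decayScore ? i : j);
    }
  }
  return notes.filter((_, idx) => !dropped.has(idx));
}
```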
import { areContradictory, CONTRADICTION_PAIRS } from "msm-agent";
areContradictory(
"Ask clarifying questions when intent is ambiguous.",
"Respond directly without asking extra questions.",
);
// → true: the lower-scored note will be removed on consolidation
Task-complexity weighting. A failed_resolution on a 6-tool, 10-iteration task is a stronger signal than one on a simple FAQ lookup. computeTaskWeight() scales the flag count contribution accordingly:
weight = 1 + log(toolCount + 1) + (maxIterations / actualIterations)
Set quality.weight on the QualityScore before calling postOutcome() to activate weighted counting in refreshStrategies().
import { consolidateStrategies } from "msm-agent";
// Or via the MemoryEvolvingAdapter:
const report = await evolvingAdapter.consolidate(memory);
// { pruned: 2, contradictionsResolved: 1, consolidatedAt: "2025-..." }
Run consolidation periodically (e.g., nightly, alongside refreshStrategies() on startup in assist mode) to keep the evolving layer clean.
The /v1/event and /chat endpoints support Server-Sent Events. Add Accept: text/event-stream to the request and the agent streams tokens to the client as they arrive, so the first token typically appears in under a second instead of after the full response is ready.
curl -N http://localhost:3000/chat \
-H "Content-Type: application/json" \
-H "Accept: text/event-stream" \
-d '{"message": "What is our refund policy?"}'
Stream event format:
data: {"type":"delta","text":"Our"}
data: {"type":"delta","text":" refund policy"}
data: {"type":"delta","text":" allows..."}
data: {"type":"done","sessionId":"sess_abc","outcome":{"type":"response",...}}
| Event type | Payload | When |
|---|---|---|
| delta | { type, text } | Each token chunk from the brain |
| done | { type, sessionId?, outcome } | Full LoopOutcome when the loop completes |
| error | { type, error } | If the loop throws |
Works on /v1/event and /chat. Token streaming requires an OpenAI or Anthropic brain (both implement Brain.stream()); with a brain that does not support streaming, the endpoint falls back to a normal JSON response.
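On the client side, the stream only needs to be split on blank lines, with each data: line parsed as JSON. parseSseChunk below is a hypothetical helper, not an msm-agent export; its event types follow the table above, and it keeps any incomplete trailing event in rest so it can be completed by the next network chunk.

```typescript
type StreamEvent =
  | { type: "delta"; text: string }
  | { type: "done"; sessionId?: string; outcome: unknown }
  | { type: "error"; error: string };

// Split buffered SSE text into complete events; return the unparsed remainder.
function parseSseChunk(buffer: string): { events: StreamEvent[]; rest: string } {
  const events: StreamEvent[] = [];
  // SSE allows CRLF line endings; normalize before splitting on blank lines.
  const blocks = buffer.replace(/\r\n/g, "\n").split("\n\n");
  const rest = blocks.pop() ?? ""; // the last block may still be incomplete
  for (const block of blocks) {
    for (const line of block.split("\n")) {
      if (line.startsWith("data:")) {
        events.push(JSON.parse(line.slice(5).trim()) as StreamEvent);
      }
    }
  }
  return { events, rest };
}
```

In a Node.js 18+ client you would append each decoded chunk of the fetch response body to a buffer, call parseSseChunk(buffer), handle the returned events, and carry rest into the next call.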
Programmatic usage:
const outcome = await agent.streamEvent(event, (delta) => {
process.stdout.write(delta);
});
Episodic memory lets the agent learn from past interactions using semantic search instead of keyword matching.
How it works:
- memory.store() calls optionally embed and index each memory entry into a separate Qdrant collection ({agentName}_episodic).
- On every new turn, memory.search() retrieves the most semantically similar past interactions.
- Retrieved memories are injected into the brain's system prompt automatically.
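"Most semantically similar" means ranking stored entries by vector similarity to the new turn. The sketch below does this in memory with cosine similarity over hypothetical pre-computed embeddings; in msm-agent the embedding and nearest-neighbour search are delegated to the embedding provider and Qdrant, so EpisodeSketch, cosine, and topKEpisodes only illustrate the ranking step.

```typescript
interface EpisodeSketch {
  text: string;
  vector: number[]; // embedding from some embedding model (assumed precomputed)
}

// Cosine similarity; 0 when either vector has zero length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return the k stored episodes most similar to the query embedding.
function topKEpisodes(query: number[], episodes: EpisodeSketch[], k: number): EpisodeSketch[] {
  return episodes
    .map((e) => ({ e, score: cosine(query, e.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((s) => s.e);
}
```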
Enable via CLI:
# Same Qdrant instance used for KB — episodic index uses a separate collection
QDRANT_URL=http://localhost:6333 \
EMBED_PROVIDER=openai \
OPENAI_API_KEY=... \
node dist/server/cli.js
Programmatic opt-in:
import {
EpisodicMemoryAdapter,
SqliteMemoryAdapter,
QdrantKnowledgeAdapter,
} from "msm-agent";
const memory = new EpisodicMemoryAdapter(
  new SqliteMemoryAdapter({ path: "./agent.db" }),
  // optional: an instance from QdrantKnowledgeAdapter.create(...); enables semantic search
  qdrantKnowledgeAdapter,
);
Without Qdrant, episodic memory falls back to standard LIKE-based keyword search (backward compatible).
By default, createAgent() uses an in-process mutex to serialize events per session. For multi-instance deployments (multiple Node processes or containers), replace it with the Redis distributed lock to prevent race conditions on shared session state.
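An in-process per-session mutex can be built by chaining promises per session key. The sketch below is hypothetical (SessionMutexSketch is not the InProcessLockAdapter source) and only covers the single-process case: its Map is invisible to other processes, which is exactly why a distributed lock is needed once you scale out.

```typescript
// Hypothetical per-session mutex; not the InProcessLockAdapter source.
class SessionMutexSketch {
  // Tail of the chain of pending work for each session.
  private tails = new Map<string, Promise<void>>();

  // Run fn only after every earlier call for the same session has finished.
  async runExclusive<T>(sessionId: string, fn: () => Promise<T>): Promise<T> {
    const previous = this.tails.get(sessionId) ?? Promise.resolve();
    let release: () => void = () => {};
    const current = new Promise<void>((resolve) => {
      release = () => resolve();
    });
    const tail = previous.then(() => current);
    this.tails.set(sessionId, tail);

    await previous; // wait for the event that was in flight before this one
    try {
      return await fn();
    } finally {
      release();
      // Nobody queued behind us: drop the entry so the map does not grow forever.
      if (this.tails.get(sessionId) === tail) this.tails.delete(sessionId);
    }
  }
}
```

Events for different sessions still run concurrently, because each session has its own chain.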
import { createAgent, RedisDistributedLock } from "msm-agent";
const agent = createAgent({
lock: new RedisDistributedLock({
host: process.env.REDIS_HOST,
port: 6379,
}),
...adapters,
});
Via CLI:
REDIS_URL=redis://localhost:6379 node dist/server/cli.js
# → RedisDistributedLock activated automatically when REDIS_URL is set
How it works: it uses Redis SET NX PX (atomic) with an auto-extend heartbeat. If a second event arrives for the same session while one is in flight, it either queues (within the TTL) or returns a 409 Conflict. This prevents duplicate task creation and memory corruption under load.
| Adapter | Use case |
|---|---|
| InProcessLockAdapter | Single instance (default) |
| RedisDistributedLock | Multi-instance / horizontal scaling |
For long-running stateful workflows that span multiple interactions or run on a schedule, use the Jobs API.
ENABLE_JOBS=true # activates the Jobs adapter and HTTP routes
curl -X POST http://localhost:3000/jobs \
-H "Content-Type: application/json" \
-d '{
"sessionId": "user-123",
"name": "Monthly inventory audit",
"budget": { "maxSteps": 50, "maxDurationMs": 3600000 }
}'
# { "jobId": "jbm_a1b2c3", "status": "running" }
POST /jobs → creates job, status: "running"
POST /v1/event → each event on the session increments job step count
terminal outcomes (response, escalated) → "waiting"
budget exceeded → "failed" (HTTP 402)
POST /jobs/:id/cancel → job marked "cancelled"
GET /jobs/:id → job state, step count, elapsed duration
GET /jobs → list all jobs (filterable by status, sessionId)
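The lifecycle above is a small state machine. The reducer below is a hypothetical sketch: JobSketch, applyJobEvent, and cancelJob are illustrative names, and two details are assumptions not stated above, namely that a new event moves a "waiting" job back to "running" and that "failed" and "cancelled" are final.

```typescript
type JobStatus = "running" | "waiting" | "failed" | "cancelled";

interface JobSketch {
  status: JobStatus;
  steps: number;
  startedAt: number; // epoch ms
  budget: { maxSteps: number; maxDurationMs: number };
}

// Apply one processed event, identified by its outcome type, to a job.
function applyJobEvent(job: JobSketch, outcomeType: string, now: number): JobSketch {
  if (job.status === "failed" || job.status === "cancelled") return job; // assumed final
  const steps = job.steps + 1;
  const overBudget =
    steps > job.budget.maxSteps || now - job.startedAt > job.budget.maxDurationMs;
  if (overBudget) return { ...job, steps, status: "failed" }; // surfaced as HTTP 402
  const terminal = outcomeType === "response" || outcomeType === "escalated";
  return { ...job, steps, status: terminal ? "waiting" : "running" };
}

function cancelJob(job: JobSketch): JobSketch {
  return job.status === "failed" ? job : { ...job, status: "cancelled" };
}
```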
InMemoryJobAdapter is used by default when ENABLE_JOBS=true. For persistence, set MEMORY_PATH alongside ENABLE_JOBS=true to use SQLiteJobAdapter (same database file as the memory adapter, zero extra dependencies).
Expose the agent as an MCP (Model Context Protocol) server so any MCP client — Claude Desktop, Cursor, custom AI tools — can call it as a tool provider.
ENABLE_MCP=true # stdio transport (CLI / IDE)
ENABLE_MCP=true MCP_TRANSPORT=http # HTTP transport (server deployments)
MCP_PORT=3001 # HTTP transport port (default: 3001)

| Tool | Description |
|---|---|
| agent_chat | Send a message and get a response (auto-generates sessionId) |
| agent_approve_task | Approve or deny a pending tool requiring human approval |
| agent_search_memory | Full-text search of the agent's semantic memory |

| Resource | Description |
|---|---|
| session://{sessionId} | Conversation transcript for a session |
| agent://definition | Agent identity and capabilities |
import { createMcpServer } from "msm-agent/server";
const mcp = await createMcpServer(agent, def, {
transport: "http",
port: 3001,
memory,
});
// later:
await mcp.stop();
The CLI boots an HTTP server from any .md or .it definition file. Adapters are wired automatically from environment variables, so no code changes are needed.
# Single agent (in-memory, local dev)
AGENT_FILE=./agent.md OPENAI_API_KEY=sk-... node dist/server/cli.js
# Single agent (full production)
AGENT_FILE=./agent.md DATABASE_URL=postgresql://... REDIS_URL=redis://... node dist/server/cli.js
# Multi-agent hub — comma-separated definition files
AGENT_FILES=./feasibility.md,./legal.md,./hr.md \
DATABASE_URL=mongodb://... REDIS_URL=redis://... node dist/server/cli.js
Progression: In-memory → SQLite → Postgres/Mongo → add Redis + BullMQ + EVOLVING_MODE=shadow.
→ Docker Compose, all environment variables, and deployment guide in docs/DEPLOYMENT.md
Security notice: The HTTP server has no built-in TLS. In any deployment beyond local dev, you must place it behind an HTTPS-terminating reverse proxy (nginx, Caddy, an AWS ALB, etc.). For multi-user or public deployments, also add an authentication gateway in front; the built-in API_KEY / Basic Auth options are a last line of defence, not a substitute for transport-layer security.
Run multiple agents in a single process with shared infrastructure (MongoDB, Redis, Qdrant, BullMQ). Each agent routes by URL — no extra service, no duplicate connections.
import {
  createAgent,
  createAgentHub,
  MongoMemoryAdapter,
  RedisControlBus,
} from "msm-agent";
import { createAgentServer } from "msm-agent/server";
// Placeholders: feasibilityBrain, legalTools, feasDef, etc. are your own
// per-agent brains, tool adapters, and loaded definitions.
// Shared adapters: instantiate once
const memory = await MongoMemoryAdapter.connect(process.env.DATABASE_URL);
const controlBus = await RedisControlBus.connect(process.env.REDIS_URL);
const hub = createAgentHub({
  feasibility: createAgent({ brain: feasibilityBrain, memory, tools: feasibilityTools /* , events, delivery, config */ }),
  legal: createAgent({ brain: legalBrain, memory, tools: legalTools /* , ... */ }),
  hr: createAgent({ brain: hrBrain, memory, tools: hrTools /* , ... */ }),
});
// Hub-aware server: routes /agents/:name/* automatically
const server = createAgentServer(hub, { feasibility: feasDef, legal: legalDef, hr: hrDef }, {
  port: 3000,
  memory,
  controlBus,
});
await server.start();
// → POST /agents/feasibility/event
// → POST /agents/legal/event
// → POST /agents/hr/event
Session namespacing: prefix session IDs with the agent name to prevent memory bleed when agents share a MemoryAdapter:
feasibility::sess_abc ← feasibility agent session
legal::sess_abc ← separate legal session, same suffix
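Whether the hub adds this prefix for you is not stated here, so the helpers below are a hypothetical sketch of applying the agent::session convention explicitly before calling a shared MemoryAdapter. namespacedSessionId and parseNamespacedSessionId are illustrative names, not msm-agent exports.

```typescript
const SEP = "::";

// ("legal", "sess_abc") -> "legal::sess_abc"
function namespacedSessionId(agentName: string, sessionId: string): string {
  if (agentName.includes(SEP)) throw new Error(`agent name must not contain "${SEP}"`);
  return `${agentName}${SEP}${sessionId}`;
}

// "legal::sess_abc" -> { agentName: "legal", sessionId: "sess_abc" }; null if not namespaced.
function parseNamespacedSessionId(id: string): { agentName: string; sessionId: string } | null {
  const i = id.indexOf(SEP);
  if (i <= 0) return null;
  return { agentName: id.slice(0, i), sessionId: id.slice(i + SEP.length) };
}
```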
Single-agent mode:

| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Agent identity and readiness |
| `/v1/event` | POST | Process any `AgentEvent` (stateful sessions) |
| `/chat` | POST | Stateless single-turn (demo / testing) |
| `/session/:id` | GET | Conversation history + active task |
| `/task/approve` | POST | Resume a paused approval task |
| `/webhook/whatsapp` | POST | Inbound WhatsApp (HMAC-SHA256 verified) |
| `/jobs/*` | various | Jobs CRUD (requires `ENABLE_JOBS=true`) |
| `/admin/*` | various | Control bus + memory search (password-gated) |
| `/dashboard` | GET | Ops panel UI (requires `DASHBOARD_PASSWORD`) |
Hub mode (v0.3.0), URL-based routing:

| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Status of all registered agents |
| `/agents` | GET | List registered agent names |
| `/agents/:name/health` | GET | Individual agent identity |
| `/agents/:name/event` | POST | Route event to named agent (stateful) |
| `/agents/:name/chat` | POST | Stateless single-turn for named agent |
| `/agents/:name/session/:id` | GET | Session state for named agent |
| `/agents/:name/task/approve` | POST | Approval callback for named agent |
→ Full request/response examples in docs/DEPLOYMENT.md
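Both route tables map the same small set of actions onto paths. Hub mode mostly adds an `/agents/:name` prefix, with one documented exception: the stateful event route is `/v1/event` in single-agent mode but `/agents/:name/event` in hub mode. The following is a minimal client-side sketch of that mapping. The `endpointPath` helper, `Action` type, and `EndpointOptions` are hypothetical; only the paths come from the tables. Request and response bodies are out of scope here.

```typescript
// Hypothetical action names; the paths they map to come from the route tables.
type Action = "health" | "event" | "chat" | "session" | "approve";

interface EndpointOptions {
  agent?: string;     // set in hub mode; omit for single-agent mode
  sessionId?: string; // required for the "session" action
}

function endpointPath(action: Action, opts: EndpointOptions = {}): string {
  // Hub mode prefixes every per-agent route with /agents/:name.
  const prefix = opts.agent ? `/agents/${encodeURIComponent(opts.agent)}` : "";
  switch (action) {
    case "health":
      return `${prefix}/health`;
    case "event":
      // Single-agent mode versions the event route; hub mode does not.
      return opts.agent ? `${prefix}/event` : "/v1/event";
    case "chat":
      return `${prefix}/chat`;
    case "approve":
      return `${prefix}/task/approve`;
    case "session":
      if (!opts.sessionId) throw new Error('the "session" action needs a sessionId');
      return `${prefix}/session/${encodeURIComponent(opts.sessionId)}`;
  }
}

console.log(endpointPath("event"));                     // /v1/event
console.log(endpointPath("event", { agent: "legal" })); // /agents/legal/event
// e.g. fetch(new URL(endpointPath("chat", { agent: "hr" }), "http://localhost:3000"), { method: "POST", ... })
```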
Every agent can be equipped with a vector knowledge base (KB) backed by Qdrant. Each agent gets its own collection, and the adapter has no SDK dependency: it calls Qdrant's REST API directly.
At deploy time, index your documents:

```ts
import { QdrantKnowledgeAdapter } from "msm-agent";

const kb = QdrantKnowledgeAdapter.create({
  url: process.env.QDRANT_URL, // e.g. http://localhost:6333
  collection: "support_kb",
  embedProvider: "gemini", // gemini | openai | ollama
  embedApiKey: process.env.GEMINI_API_KEY,
});

// fullPolicyText and shippingText hold your document contents as plain strings.
await kb.indexDocument("doc-001", "Refund Policy", fullPolicyText);
await kb.indexDocument("doc-002", "Shipping FAQ", shippingText);
// → chunks content (3000 chars / 500 overlap), embeds, upserts to Qdrant
```

At runtime, attach to any agent:
```ts
import { createAgent } from "msm-agent";

const agent = createAgent({ brain, memory, tools, ..., knowledge: kb });
// → on every loop iteration, the top-5 KB hits are injected into the brain prompt:
//   "Knowledge base results:
//    - [Refund Policy] (relevance 87%) We offer 30-day refunds for..."
```

Via CLI (automatic wiring):
```bash
QDRANT_URL=http://localhost:6333 \
QDRANT_COLLECTION=support_kb \
EMBED_PROVIDER=gemini \
GEMINI_API_KEY=... \
AGENT_FILE=./support-agent.md \
node dist/server/cli.js
```

Embedding providers:
| Provider | Key required | Default model |
|---|---|---|
| `gemini` | `GEMINI_API_KEY` | `text-embedding-004` (768-dim) |
| `openai` | `OPENAI_API_KEY` | `text-embedding-3-small` (768-dim) |
| `ollama` | none (local) | `nomic-embed-text` (768-dim) |
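The relevance percentage in the injected prompt excerpt is a similarity score between the query embedding and each stored chunk. The following is a minimal sketch that assumes cosine similarity and the top-5 cutoff described above. Cosine is a common choice for these models, but this README does not state the adapter's Qdrant distance metric. In the real adapter the ranking happens inside Qdrant; this sketch ranks in memory only to show what the score means. `Hit`, `cosineSimilarity`, and `formatKnowledge` are hypothetical names. The dimension check matters because vectors from different providers or sizes cannot be meaningfully compared. A collection should therefore be indexed and queried with the same provider.

```typescript
// Hypothetical hit shape: a stored chunk plus its embedding.
interface Hit {
  title: string;
  text: string;
  vector: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Cosine is undefined for zero vectors; treat them as unrelated.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank hits, keep the top K, and format them like the prompt excerpt above.
// The excerpt length (maxChars) is an assumption.
function formatKnowledge(query: number[], hits: Hit[], topK = 5, maxChars = 200): string {
  const lines = hits
    .map((hit) => ({ hit, score: cosineSimilarity(query, hit.vector) }))
    .sort((x, y) => y.score - x.score) // most relevant first
    .slice(0, topK)
    .map(({ hit, score }) =>
      `- [${hit.title}] (relevance ${Math.round(score * 100)}%) ${hit.text.slice(0, maxChars)}`);
  return ["Knowledge base results:", ...lines].join("\n");
}

console.log(formatKnowledge([1, 0], [
  { title: "Shipping FAQ", text: "Orders ship within 2 days.", vector: [0, 1] },
  { title: "Refund Policy", text: "We offer 30-day refunds.", vector: [1, 0] },
]));
// Knowledge base results:
// - [Refund Policy] (relevance 100%) We offer 30-day refunds.
// - [Shipping FAQ] (relevance 0%) Orders ship within 2 days.
```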
Hub mode: each agent automatically gets its own collection (`<agentName>_kb`):

```bash
QDRANT_URL=http://localhost:6333 \
EMBED_PROVIDER=openai \
OPENAI_API_KEY=... \
AGENT_FILES=./feasibility.md,./legal.md \
node dist/server/cli.js
# → feasibility_kb collection + legal_kb collection
```

Smart chunking: Documents are split at sentence and paragraph boundaries, with configurable overlap to prevent context loss at chunk edges. Chunking is text-only and has no external dependency.
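The following sketch illustrates boundary-aware chunking with overlap. It makes two assumptions. First, a "sentence boundary" is sentence-ending punctuation followed by whitespace, with the Arabic question mark ؟ counted as sentence-ending punctuation. Second, the overlap consists of whole trailing sentences that fit within the overlap budget. The 3000/500 defaults come from the indexing example. `splitSegments`, `joinedLength`, and `chunkText` are hypothetical names, and the package's actual splitting rules may differ.

```typescript
// Assumption: a sentence ends at . ! ? or ؟ followed by whitespace.
function splitSegments(text: string): string[] {
  return text
    .replace(/([.!?؟])\s+/g, "$1\n\n") // turn sentence ends into paragraph breaks
    .split(/\n\s*\n/)                 // then split on blank lines
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Length of segments joined with single spaces.
function joinedLength(parts: string[]): number {
  return parts.reduce((n, p) => n + p.length, 0) + Math.max(parts.length - 1, 0);
}

function chunkText(text: string, maxChars = 3000, overlap = 500): string[] {
  if (overlap < 0 || overlap >= maxChars) throw new Error("need 0 <= overlap < maxChars");
  // Hard-split any single segment longer than maxChars.
  const segments = splitSegments(text).flatMap((s) => {
    const pieces: string[] = [];
    for (let i = 0; i < s.length; i += maxChars) pieces.push(s.slice(i, i + maxChars));
    return pieces;
  });

  const chunks: string[] = [];
  let current: string[] = [];
  for (const seg of segments) {
    if (current.length > 0 && joinedLength([...current, seg]) > maxChars) {
      chunks.push(current.join(" "));
      // Carry whole trailing segments, up to `overlap` chars, into the next chunk.
      let carried: string[] = [];
      for (let j = current.length - 1; j >= 0; j--) {
        const candidate = [current[j], ...carried];
        if (joinedLength(candidate) > overlap) break;
        carried = candidate;
      }
      // Drop the overlap when keeping it would push the new chunk past maxChars.
      current = joinedLength([...carried, seg]) <= maxChars ? carried : [];
    }
    current.push(seg);
  }
  if (current.length > 0) chunks.push(current.join(" "));
  return chunks;
}

console.log(chunkText("First sentence. Second sentence. Third sentence.", 35, 16));
// two chunks; "Second sentence." is carried into the second chunk as overlap
```

Carrying whole sentences rather than a raw 500-character tail keeps chunk edges readable for the embedding model. The trade-off is that the actual overlap can be shorter than the budget.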
When DASHBOARD_PASSWORD is set, a built-in ops panel is available at GET /dashboard. Panels: pending approvals, control bus commands, memory search, session inspector. No external CDN or build step.
→ docs/DEPLOYMENT.md#3-ops-dashboard
Key createAgent() options: brain, memory, tools, events, delivery, plus controlBus, evolving, gates, preHook, compactHistory, costExtractor, onIteration, onGuard, onPlanCreated, onFatalError, onInjectionDetected.
Loop config defaults: maxIterations: 6, maxReplans: 2, confidenceThreshold: 0.6, toolDedup: true, costCapPerTask: 0 (unlimited), timeoutMs: 0 (unlimited), maxToolCallsPerTask: 0 (unlimited).
→ Full options, LoopOutcome types in docs/DEPLOYMENT.md#4-configuration-reference
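Think of the loop defaults as a plain object that per-agent overrides are merged into, for example the `## Limits` section of the example agent file. The following is a minimal sketch. `LoopConfig`, `LOOP_DEFAULTS`, and `resolveLoopConfig` are hypothetical names, not the package's exported types. The 0..1 range check on `confidenceThreshold` is also an assumption.

```typescript
// Hypothetical shape mirroring the documented loop defaults.
interface LoopConfig {
  maxIterations: number;
  maxReplans: number;
  confidenceThreshold: number;
  toolDedup: boolean;
  costCapPerTask: number;      // 0 = unlimited
  timeoutMs: number;           // 0 = unlimited
  maxToolCallsPerTask: number; // 0 = unlimited
}

const LOOP_DEFAULTS: Readonly<LoopConfig> = {
  maxIterations: 6,
  maxReplans: 2,
  confidenceThreshold: 0.6,
  toolDedup: true,
  costCapPerTask: 0,
  timeoutMs: 0,
  maxToolCallsPerTask: 0,
};

function resolveLoopConfig(overrides: Partial<LoopConfig> = {}): LoopConfig {
  // Ignore keys explicitly set to undefined so they don't erase defaults.
  const defined = Object.fromEntries(
    Object.entries(overrides).filter(([, v]) => v !== undefined),
  ) as Partial<LoopConfig>;
  const cfg: LoopConfig = { ...LOOP_DEFAULTS, ...defined };
  // Assumed valid range for confidence scores.
  if (cfg.confidenceThreshold < 0 || cfg.confidenceThreshold > 1) {
    throw new Error("confidenceThreshold must be between 0 and 1");
  }
  return cfg;
}

// Values from the example agent file's ## Limits section.
const cfg = resolveLoopConfig({ maxIterations: 6, confidenceThreshold: 0.7, costCapPerTask: 0.05 });
console.log(cfg.maxReplans, cfg.costCapPerTask, cfg.timeoutMs); // 2 0.05 0
```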
Hard guards abort execution (iteration budget, cost cap, timeout, confidence gate, task killed, tenant paused, rate limited, tool disabled). Soft guards emit advisory signals to onGuard (repetition, dead-end).
→ docs/DEPLOYMENT.md#5-guard-system
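The hard/soft split can be modelled as a single check that returns at most one abort reason plus any advisory signals. The following is a minimal sketch under stated assumptions. `checkGuards`, `LoopState`, `GuardLimits`, and `GuardVerdict` are hypothetical. The comparison rules are guesses, for example whether reaching a cap or exceeding it aborts, and whether low confidence stops the loop. Only four of the eight hard guards and one of the two soft guards are modelled. The others (task killed, tenant paused, rate limited, tool disabled, dead-end) depend on control-bus and runtime state that this README does not describe.

```typescript
interface LoopState {
  iteration: number;         // iterations already completed
  costSoFar: number;         // accumulated cost for this task
  elapsedMs: number;
  confidence: number;        // brain's latest self-reported confidence (assumed 0..1)
  recentToolCalls: string[]; // e.g. serialized "tool:args" keys (assumed)
}

interface GuardLimits {
  maxIterations: number;
  costCapPerTask: number; // 0 = unlimited
  timeoutMs: number;      // 0 = unlimited
  confidenceThreshold: number;
}

type HardGuard = "iteration_budget" | "cost_cap" | "timeout" | "confidence_gate";
type SoftSignal = "repetition";

interface GuardVerdict {
  abort?: HardGuard;     // set when a hard guard fires
  signals: SoftSignal[]; // advisory only; would be passed to onGuard
}

function checkGuards(s: LoopState, l: GuardLimits): GuardVerdict {
  const capped = (limit: number) => limit > 0; // 0 disables a cap
  const signals: SoftSignal[] = [];

  // Soft guard (assumed rule): the same tool call twice in a row.
  const n = s.recentToolCalls.length;
  if (n >= 2 && s.recentToolCalls[n - 1] === s.recentToolCalls[n - 2]) {
    signals.push("repetition");
  }

  // Hard guards, checked in a fixed order; the first one that fires wins.
  if (s.iteration >= l.maxIterations) return { abort: "iteration_budget", signals };
  if (capped(l.costCapPerTask) && s.costSoFar > l.costCapPerTask) return { abort: "cost_cap", signals };
  if (capped(l.timeoutMs) && s.elapsedMs > l.timeoutMs) return { abort: "timeout", signals };
  // Assumed semantics: confidence below the threshold stops the loop.
  if (s.confidence < l.confidenceThreshold) return { abort: "confidence_gate", signals };
  return { signals };
}

const verdict = checkGuards(
  { iteration: 2, costSoFar: 0.06, elapsedMs: 1200, confidence: 0.9, recentToolCalls: ["lookup:42", "lookup:42"] },
  { maxIterations: 6, costCapPerTask: 0.05, timeoutMs: 0, confidenceThreshold: 0.7 },
);
console.log(verdict); // { abort: 'cost_cap', signals: [ 'repetition' ] }
```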
```bash
pnpm test
```

The suite has 337 tests. All of them use the included dummy adapters, so no external services are required. The test suite covers:

- Core loop, session mutex, plan tracking, tool dedup, flush gate
- All 5 guard types
- Memory adapters (in-memory)
- Control bus commands
- Definition file parsing (`.md` and `.it`)
- Brain system prompt generation
- WhatsApp event + delivery adapters
- Equipment connector registry and tool adapter
- Skills registry and tool adapter
- Pre-processing gates (acknowledgement + business hours)
- Quality scoring (`scoreOutcome`, `FLAG_STRATEGIES`)
- Evolving layer (`preReason`, `postOutcome`, `refreshStrategies`)
- Arabic-native routing (`detectLanguage`, `RoutingBrain`, `BrainSchema.language`)
- Sovereign deployment (`/health` sovereign field, startup validation logic)
- Deeper evolving layer (`computeDecayScore`, `areContradictory`, `consolidateStrategies`)
- Jobs lifecycle (create, list, cancel, budget enforcement)
- MCP server tool and resource exposure
- Context builder, output sanitization, input guard
MIT
- Integration Guide — adapter specs, brain wiring, production setup, full example
- Deployment Reference — CLI, Docker, HTTP API, config options, guard reference
- Production Readiness & Ownership Boundary — parity matrix, what to build yourself