The README headline name-drops "Runtime Tool Forging" without explaining
what it is or how it works. A new section right after Install adds the
explainer:
- Runtime Tool Forging: agent writes a TypeScript function mid-decision,
judge LLM approves it, V8 isolate sandbox runs it (128 MB heap, 10s
wall clock, no fs/network/eval/dynamic-import). Approved tools land
in a discoverable index for reuse, so cost flattens after the first
forges.
- Optional HEXACO Personality: opt-in 6-dimensional trait vector. Most
deployments never touch it; the runtime stays personality-neutral by
default. When passed, biases retrieval / decision routing / tool
selection at the kernel level (not in a prompt).
- "Why emergent": memory + RTF + (optional) HEXACO compose to produce
behavior the prompt did not specify and the developer did not
predict. Each capability is documented and configurable; the
surprises come from how they compose.
Replaces vague headline name-drop with a concrete mechanics walkthrough.
Two capabilities that distinguish AgentOS from chat-completion wrappers and from frameworks that hard-code an agent's affordances at startup.
### Runtime Tool Forging
**An agent can write itself a new tool at runtime, get the tool reviewed, and run it inside a sandbox, all in one turn.** That tool then becomes available for the rest of the session and for future agents in the same runtime.
The mechanics, in order:
1. **Detect the gap.** Mid-decision, an agent notices that the next step needs a function it doesn't have. (Example from a Mars Genesis run: a security officer agent decides it needs `compute_resource_allocation_under_drought_constraint(state) → priorityList` to make a defensible recommendation.)
2. **Forge.** The agent writes a TypeScript function. A Zod schema describes the function's input and output; the LLM generates the function body from the agent's stated intent.
3. **Judge.** A separate LLM call (the "judge") reads the forged function alongside the agent's stated intent and approves or rejects it. If the code does not match the stated intent, the judge rejects it.
4. **Sandbox-execute.** Approved functions run inside a [V8 isolate](https://github.com/laverdet/isolated-vm) with a 128 MB heap and a 10-second wall clock. No filesystem, no network, no `eval`, no dynamic import. The sandbox is the load-bearing security boundary.
5. **Catalog and reuse.** Approved tools land in a discoverable tool index, and future turns invoke them via `call_forged_tool(name, args)`. A reuse costs tens of tokens, while a forge costs a full LLM call's worth of tokens, so cost flattens after the first few turns of a long-running session.
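A minimal sketch of the forge → judge → catalog → reuse loop, using the drought example from step 1. Everything in it is hypothetical and is not the AgentOS API: `ForgedToolIndex`, `callForgedTool` (standing in for `call_forged_tool`), the plain predicate validators (standing in for Zod schemas), the `prioritize` body, and `toyJudge`, a keyword screen standing in for the separate LLM judge call. It also skips the sandbox entirely: the forged function is called directly, so this sketch provides no isolation and must not run untrusted code.

```typescript
// Hypothetical sketch, NOT the AgentOS API: forge -> judge -> catalog -> reuse.

// Stand-in for a Zod schema: a predicate over unknown input or output.
type Validator = (value: unknown) => boolean;

interface ForgedTool {
  name: string;
  intent: string; // the agent's stated intent, shown to the judge
  source: string; // source text the judge reviews
  input: Validator;
  output: Validator;
  fn: (args: unknown) => unknown; // called directly below: no sandbox in this sketch
}

type JudgeVerdict = { approved: boolean; reason: string };
type Judge = (intent: string, source: string) => JudgeVerdict;

class ForgedToolIndex {
  private readonly tools = new Map<string, ForgedTool>();

  // Judge step: only approved tools are cataloged.
  forge(tool: ForgedTool, judge: Judge): JudgeVerdict {
    const verdict = judge(tool.intent, tool.source);
    if (verdict.approved) this.tools.set(tool.name, tool);
    return verdict;
  }

  // Discoverable index of approved tool names.
  list(): string[] {
    return [...this.tools.keys()];
  }

  // Reuse by name, mirroring call_forged_tool(name, args). AgentOS runs forged code
  // in a V8 isolate (128 MB heap, 10 s wall clock); this sketch gives NO isolation.
  callForgedTool(name: string, args: unknown): unknown {
    const tool = this.tools.get(name);
    if (!tool) throw new Error(`unknown forged tool: ${name}`);
    if (!tool.input(args)) throw new Error(`invalid input for ${name}`);
    const result = tool.fn(args);
    if (!tool.output(result)) throw new Error(`invalid output from ${name}`);
    return result;
  }
}

// Hypothetical shapes and body for the drought example in step 1.
interface Resource { name: string; reserve: number; dailyUse: number }
interface ColonyState { resources: Resource[] }

// Rank resources by days of reserve left (reserve / dailyUse), scarcest first.
function prioritize(args: unknown): string[] {
  const { resources } = args as ColonyState;
  return [...resources]
    .sort((a, b) => a.reserve / a.dailyUse - b.reserve / b.dailyUse)
    .map((r) => r.name);
}

const isColonyState: Validator = (v) =>
  typeof v === "object" &&
  v !== null &&
  Array.isArray((v as ColonyState).resources) &&
  (v as ColonyState).resources.every(
    (r) => typeof r.name === "string" && r.reserve >= 0 && r.dailyUse > 0,
  );

const isStringArray: Validator = (v) =>
  Array.isArray(v) && v.every((x) => typeof x === "string");

// Toy judge: a keyword screen standing in for the separate LLM judge call,
// which in AgentOS compares the code against the stated intent.
const toyJudge: Judge = (intent, source) => {
  if (!intent.trim()) return { approved: false, reason: "no stated intent" };
  const banned = ["eval(", "import(", "require(", "fetch(", "Function("];
  const hit = banned.find((b) => source.includes(b));
  return hit
    ? { approved: false, reason: `uses ${hit}` }
    : { approved: true, reason: "no banned calls found" };
};

const index = new ForgedToolIndex();
const toolName = "compute_resource_allocation_under_drought_constraint";
const verdict = index.forge(
  {
    name: toolName,
    intent: "Rank colony resources by days of reserve remaining.",
    source: prioritize.toString(),
    input: isColonyState,
    output: isStringArray,
    fn: prioritize,
  },
  toyJudge,
);

// water: 3 days, oxygen: 20 days, food: 2 days
const ranking = index.callForgedTool(toolName, {
  resources: [
    { name: "water", reserve: 30, dailyUse: 10 },
    { name: "oxygen", reserve: 100, dailyUse: 5 },
    { name: "food", reserve: 40, dailyUse: 20 },
  ],
}); // ["food", "water", "oxygen"]
```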
In practice this is the difference between "the agent can do what we wrote handlers for" and "the agent can extend its own capability surface when the task warrants it." See the [emergent-tools post](https://agentos.sh/en/blog/emergent-tools-hexaco-leaders) for the live walkthrough.
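The cost claim in step 5 can be made concrete with a back-of-the-envelope model, assuming a one-time forge cost F (agent generation plus judge review) and a per-call reuse cost R. After n uses the average cost per use is F/n + R, which falls toward R. The `ToolCost` type and function names are hypothetical, and the token counts are placeholders, not AgentOS measurements.

```typescript
// Back-of-the-envelope model of forge vs. reuse cost. The token counts are
// hypothetical placeholders, not measured AgentOS numbers.
interface ToolCost {
  forgeTokens: number; // one-time: agent writes the tool, judge reviews it
  reuseTokens: number; // per call_forged_tool invocation
}

// Total tokens spent on one forged tool after `uses` invocations.
function cumulativeTokens(c: ToolCost, uses: number): number {
  return c.forgeTokens + uses * c.reuseTokens;
}

// Average per use = forgeTokens / uses + reuseTokens, which falls toward reuseTokens.
function averageTokensPerUse(c: ToolCost, uses: number): number {
  if (!Number.isInteger(uses) || uses < 1) {
    throw new RangeError("uses must be a positive integer");
  }
  return cumulativeTokens(c, uses) / uses;
}

const placeholder: ToolCost = { forgeTokens: 3000, reuseTokens: 30 };
for (const n of [1, 10, 100]) {
  console.log(n, averageTokensPerUse(placeholder, n));
}
// prints: 1 3030, then 10 330, then 100 60
```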
### Optional HEXACO Personality
**HEXACO traits are opt-in. Most AgentOS deployments never touch personality and remain personality-neutral.** The runtime is fully functional with or without a personality vector.
When you do pass a personality vector, the runtime treats it as a structured signal that biases retrieval, decision routing, and tool selection. Same agent, same prompt, same tool set: a high-Openness leader and a high-Conscientiousness leader produce measurably different outcomes because the kernel weights memories and tools differently for each trait profile.
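One way a trait vector could bias retrieval without touching the prompt, sketched under stated assumptions: each memory carries a base relevance score plus hypothetical per-trait affinities, and the personality tilts that score multiplicatively by how far each trait sits from 0.5. With no vector, or an all-0.5 vector, the ranking is unchanged, matching the personality-neutral default. The formula, the `strength` parameter, the memory names, and all function names are illustrative, not the AgentOS kernel.

```typescript
// Illustrative only: the scoring formula, affinities, and names are assumptions.
const TRAITS = [
  "honestyHumility", "emotionality", "extraversion",
  "agreeableness", "conscientiousness", "openness",
] as const;
type Trait = (typeof TRAITS)[number];
type Personality = Record<Trait, number>; // each factor assumed in [0, 1]

interface Memory {
  id: string;
  relevance: number; // base score, e.g. from embedding similarity
  affinity: Partial<Record<Trait, number>>; // hypothetical per-trait pull
}

// No personality: return the base score unchanged (personality-neutral default).
// Otherwise tilt it by how far each trait sits from the 0.5 midpoint.
function biasedScore(m: Memory, p?: Personality, strength = 0.5): number {
  if (!p) return m.relevance;
  let tilt = 0;
  for (const t of TRAITS) tilt += (p[t] - 0.5) * (m.affinity[t] ?? 0);
  return m.relevance * (1 + strength * tilt);
}

function rankMemories(memories: Memory[], p?: Personality): string[] {
  return [...memories]
    .sort((a, b) => biasedScore(b, p) - biasedScore(a, p))
    .map((m) => m.id);
}

const midpoint: Personality = {
  honestyHumility: 0.5, emotionality: 0.5, extraversion: 0.5,
  agreeableness: 0.5, conscientiousness: 0.5, openness: 0.5,
};
const highOpenness: Personality = { ...midpoint, openness: 0.9, conscientiousness: 0.3 };
const highConscientiousness: Personality = { ...midpoint, openness: 0.3, conscientiousness: 0.9 };

const memories: Memory[] = [
  { id: "untested-greenhouse-design", relevance: 0.8, affinity: { openness: 1 } },
  { id: "standard-rationing-protocol", relevance: 0.82, affinity: { conscientiousness: 1 } },
];

console.log(rankMemories(memories)); // rationing protocol first (0.82 > 0.8)
console.log(rankMemories(memories, highOpenness)); // greenhouse first (0.96 vs 0.738)
console.log(rankMemories(memories, highConscientiousness)); // rationing first (0.984 vs 0.72)
```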
HEXACO covers six factors ([Honesty-Humility, Emotionality, Extraversion, Agreeableness, Conscientiousness, Openness](https://hexaco.org/)). The personality vector is editable, inspectable, and removable on consent. It is implemented in the kernel, not in a prompt: prompt-only personality dissolves under pressure, while kernel-encoded personality survives.
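A hypothetical shape for the six-factor vector and the edit, inspect, and remove operations described above. The `PersonalityStore` class, its method names, and the [0, 1] range per factor are assumptions for illustration; how consent is collected is outside this sketch.

```typescript
// Hypothetical shape and operations, not the AgentOS API.
const FACTORS = [
  "honestyHumility", "emotionality", "extraversion",
  "agreeableness", "conscientiousness", "openness",
] as const;
type Factor = (typeof FACTORS)[number];
type Personality = Record<Factor, number>;

// Reject anything outside the assumed [0, 1] range for every factor.
function validated(p: Personality): Personality {
  for (const f of FACTORS) {
    const v = p[f];
    if (!(typeof v === "number" && v >= 0 && v <= 1)) {
      throw new RangeError(`${f} must be a number in [0, 1], got ${v}`);
    }
  }
  return { ...p };
}

class PersonalityStore {
  private readonly vectors = new Map<string, Personality>();

  set(agentId: string, p: Personality): void {
    this.vectors.set(agentId, validated(p));
  }

  // Editable: patch some factors, re-validate the whole vector before storing.
  edit(agentId: string, patch: Partial<Personality>): Personality {
    const current = this.vectors.get(agentId);
    if (!current) throw new Error(`no personality for ${agentId}`);
    const next = validated({ ...current, ...patch });
    this.vectors.set(agentId, next);
    return { ...next };
  }

  // Inspectable: return a copy so callers cannot mutate stored state.
  inspect(agentId: string): Personality | undefined {
    const p = this.vectors.get(agentId);
    return p ? { ...p } : undefined;
  }

  // Removable: after removal the agent runs personality-neutral.
  remove(agentId: string): boolean {
    return this.vectors.delete(agentId);
  }
}

const store = new PersonalityStore();
store.set("leader-1", {
  honestyHumility: 0.6, emotionality: 0.4, extraversion: 0.5,
  agreeableness: 0.7, conscientiousness: 0.8, openness: 0.3,
});
store.edit("leader-1", { openness: 0.9 });
console.log(store.inspect("leader-1")?.openness); // 0.9
store.remove("leader-1");
console.log(store.inspect("leader-1")); // undefined
```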
### Why "emergent"
Memory + Runtime Tool Forging + (optional) HEXACO produce behavior that the prompt did not specify and the developer did not predict. In a Paracosm Mars Genesis run, two leaders with the same starting state, the same agent roster, and the same seed diverge by turn six: one because its personality biased its specialists toward different memories, the other because a forged tool from turn two became the obvious next move on turn five.
Nothing about that emergence is mystical. It's the combination of (a) durable memory that survives across turns, (b) a tool surface that can grow within a session, and (c) optional personality biasing the choices among them. Each capability is documented and configurable; the surprises come from how they compose.
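A toy illustration of (a), (b), and (c) composing, assuming a next-action score equal to memory support plus a personality tilt. It is not a reproduction of the Paracosm run: the `Leader` and `Action` types, the action names, the weights, and the numbers are all made up to show the mechanism. The same starting state diverges under a different personality, and a tool forged mid-session becomes the top choice once memory records that it paid off.

```typescript
// Toy model, NOT a reproduction of the Paracosm run: actions, weights, and
// scores are made up to show how (a), (b), and (c) compose.
const TRAITS = ["openness", "conscientiousness"] as const; // subset, for brevity
type Trait = (typeof TRAITS)[number];

interface Action {
  name: string;
  appeal: Partial<Record<Trait, number>>; // which traits favor this action
}

interface Leader {
  memory: Map<string, number>; // (a) durable memory: how well each action paid off
  tools: Action[]; // (b) tool surface, can grow mid-session
  personality?: Partial<Record<Trait, number>>; // (c) optional; missing traits = 0.5
}

// Score = memory support + personality tilt away from the 0.5 midpoint.
function score(a: Action, l: Leader): number {
  let tilt = 0;
  for (const t of TRAITS) {
    tilt += ((l.personality?.[t] ?? 0.5) - 0.5) * (a.appeal[t] ?? 0);
  }
  return (l.memory.get(a.name) ?? 0) + tilt;
}

// Pick the highest-scoring action; ties keep the earlier action.
function chooseNext(l: Leader): string {
  return l.tools.reduce((best, a) => (score(a, l) > score(best, l) ? a : best)).name;
}

// Same starting state for every leader; only the personality differs.
function startingLeader(personality?: Leader["personality"]): Leader {
  return {
    memory: new Map([["ration-water", 0.2], ["scout-new-site", 0.1]]),
    tools: [
      { name: "ration-water", appeal: { conscientiousness: 1 } },
      { name: "scout-new-site", appeal: { openness: 1 } },
    ],
    personality,
  };
}

const neutralLeader = startingLeader();
const openLeader = startingLeader({ openness: 0.9 });
console.log(chooseNext(neutralLeader)); // ration-water
console.log(chooseNext(openLeader)); // scout-new-site

// Turn 2: the neutral leader forges a tool; suppose it pays off over the next
// turns, so by turn 5 memory records strong support for it.
neutralLeader.tools.push({ name: "drought-allocator", appeal: {} });
neutralLeader.memory.set("drought-allocator", 0.6);
console.log(chooseNext(neutralLeader)); // drought-allocator
```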
---
## Memory Benchmarks at Matched Reader
Same `gpt-4o` reader, same dataset, same `gpt-4o-2024-08-06` judge across every row. Cross-provider configurations are excluded because they cannot be reproduced from public methodology disclosures.