…ssion.commitTurn

Restructure the public API into symmetric create*/use* layers:
- createAgent / createAgentPool (Layer 3: scoped, returns result)
- useAgent / useAgentPool (Layer 2: resource, branches alive)

useAgentPool provides Subscription<AgentEvent, AgentPoolResult> via internal Channel with close-value pattern (Effection collections guide). Tick loop runs in spawn() concurrent with Subscription consumption. createAgentPool drains inside withSharedRoot body, forwards to broadcast. Recursion composes via Effection context inheritance.

useAgent delegates to useAgentPool N=1, manages root via ensure(). Schema→grammar compiled via jsonSchemaToGrammar, set on root before fork. createAgent wraps in scoped().

Session.commitTurn(query, response) handles warm/cold internally; replaces promoteTrunk/appendTurn duplication.

Research policy scoped per query (fresh instance per handleQuery) to prevent time budget expiry across multi-turn sessions.

Pressure dynamics unchanged: four-phase tick loop, trailing stop, staggered reporting, pruneOnReport, recovery extraction all preserved.

Error boundary in spawned tick loop catches decode failures, closes channel with partial results instead of crashing.

BREAKING: Delete generate.ts, run-agents.ts, spawn-agents.ts.
…ngs eval
- SettledTool.args: store tc.arguments instead of callId; guards parse JSON args
- Recovery: try/catch handles prefill overflow; preserve pressure_critical drops
- PoolTaskSpec: add optional systemPrompt for per-task rendering
- Harness: render filtered sibling lists (exclude own questions); restore tool descriptions in eta templates
- web_research: include in toolCtx via recursiveOpts
- Findings eval: split into conflicts (trigger grounding) and observations (cross-agent analysis)
- Bridge: cap maxTurns at 2; rename root.md to fallback.md
- ResearchTask type replaces PlanQuestion with descriptions and intent
- plan.md → plan.eta with dimension decomposition language
- web-search: no reranking in explore mode, min(provider score, entailment) in exploit mode
- Tool descriptions cleaned for web_search and fetch_page
- UseAgentOpts.maxTurns flows through to useAgentPool
- PoolTaskSpec.systemPrompt wired for per-task prompt rendering
- Nudge trace events carry actual message instead of hardcoded reason
- Dedup guards parse JSON args correctly for fetch URL and search query
- Research/corpus templates: question→task terminology, siblingTasks
- Findings eval prompt tightened against false-positive conflicts
- Bridge bug: r.tasks → acc.tasks (prior discoveries preserved in 3+ sources)
- root.md → fallback.md
… TUI streaming

Research pipeline:
- Sequential task spine via reduce combinator: each task forks warm from a shared queryRoot, findings prefilled as user+assistant turns between tasks via extendSpine helper. Subsequent tasks inherit prior findings through KV attention.
- plan.eta rewritten for chain-shaped plans: task 1 = landscape discovery, tasks 2+ explicitly reference prior findings.
- web-worker.eta threads taskIndex for spine-awareness on tasks 1+ and current date so search queries anchor on the current year.
- synthesize.eta restructured: holistic analysis → narrative-arc report with sections that advance/qualify/challenge the answer. Form-flexible body (prose, ### subsections, tables, counterpoints). ## Sources footer.
- report.eta → recovery.eta (clarifies role vs the report tool).
- bridge.eta deleted.

Agent observation + terminal tool protection:
- Agent.observe(ctx): per-token partial parse via parseChatOutput(isPartial: true) to detect which tool the agent is generating. Format-agnostic across all model families llama.cpp supports. Latches on first detection.
- Agent.finalize(ctx): strict parse at isStop, replaces standalone parseChatOutput call in the pool's PRODUCE phase.
- Agent.currentTool: read-only getter populated by observe, available to shouldExit, TUI, tracing, or any future consumer.
- DefaultAgentPolicy.shouldExit: agents mid-generation of the terminal tool are protected from time-budget kill. Only KV pressure.critical can force a kill on an agent actively writing its report.
- DefaultAgentPolicyOpts.terminalTool: threads the terminal tool name into the policy for the shouldExit guard.

SDK contrastive-decode primitives:
- Branch.setLogits(Float32Array): write companion to getLogits.
- BranchStore.mergeLogits(dst, experts, alpha): pure-CPU additive merge of experts' logit snapshots into dst's. For DExperts-style contrastive decoding across research branches.
- Session.prefillAligned(content, experts): batched alignment prefill across trunk + experts in a single store.prefill dispatch.
- NAPI surface: _branchSetLogits, _storeMergeLogits.

Agent pool coordination:
- ToolContext.peerHistory: sibling agents' tool histories exposed to tools for cross-agent duplicate detection.
- WebSearchTool + FetchPageTool: reject queries/URLs already issued by peers via peerHistory check.
- DefaultAgentPolicyOpts.shouldExplore: two-axis thresholds (context fraction + time fraction) replacing single exploreThreshold number.
- PlanTool.maxItems: 10 → 6.

TUI + example cleanup:
- PageStream abstraction for vertical token streaming (synth phase).
- Tree glyph helpers (shared/tui/tree.ts).
- agent-view.ts rewrite: dead sub-agent tracking removed, data-driven argSummaries + resultPreviews tables replace six-alias OR chains and if/else-if forest.
- main.ts: node:util.parseArgs replaces fragile flagIndices parser.
- Harness: runResearchTask + extendSpine + startTimer helpers extracted. Stale telemetry fields removed. enableThinking unset.

Deps:
- @lloyal-labs/lloyal.node: ^2.0.5 → ^2.1.0 (root + rig peer dep).
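
As a rough illustration of the additive merge described for BranchStore.mergeLogits, the sketch below adds alpha times each expert's logits into dst on the CPU. The exact formula, the name `mergeLogitsSketch`, and the use of raw `Float32Array` values instead of branch handles are assumptions made here; the commit message does not spell out the arithmetic.

```typescript
// Hypothetical stand-in for the merge; the real API takes branches, not raw arrays.
function mergeLogitsSketch(
  dst: Float32Array,
  experts: readonly Float32Array[],
  alpha: number,
): void {
  // Every snapshot must cover the same vocabulary as dst.
  for (const expert of experts) {
    if (expert.length !== dst.length) {
      throw new RangeError("expert and dst logits must have the same length");
    }
  }
  for (let i = 0; i < dst.length; i++) {
    let sum = 0;
    for (const expert of experts) sum += expert[i];
    dst[i] += alpha * sum; // assumed form: dst += alpha * (sum of expert logits)
  }
}

const dstLogits = new Float32Array([1, 2]);
mergeLogitsSketch(
  dstLogits,
  [new Float32Array([0.5, -1]), new Float32Array([0.5, 1])],
  0.5,
);
```

With these inputs the first entry is pushed up by both experts while the second is left unchanged, because the two experts cancel each other out.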
Pull request overview
This PR introduces a “KV spine” deep-research workflow and updates the agents/rig/sdk layers to support new orchestration primitives (single-agent + pool APIs, recursive delegation, cross-agent tool dedup), along with SDK hooks for logits snapshot manipulation.
Changes:
- Add new Agents APIs (useAgent/createAgent, createAgentPool, reduce) and refactor pool execution to stream events via a Subscription.
- Add Rig delegation + cross-agent dedup in web_search/fetch_page, and update planning output to structured ResearchTasks.
- Add SDK internals for logits snapshot set/merge and new session helper methods used by the spine workflow + new deep-research example CLI/harness/TUI.
Reviewed changes
Copilot reviewed 63 out of 65 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| packages/sdk/test/MockSessionContext.ts | Adds mock no-ops for new SessionContext internal methods. |
| packages/sdk/src/types.ts | Extends SessionContext with _branchSetLogits and _storeMergeLogits. |
| packages/sdk/src/Session.ts | Imports Branch as a value; adds commitTurn and prefillAligned. |
| packages/sdk/src/BranchStore.ts | Adds mergeLogits() wrapper over _storeMergeLogits. |
| packages/sdk/src/Branch.ts | Adds setLogits() API for overwriting cached logits snapshot. |
| packages/sdk/package.json | Bumps SDK version to 1.6.0. |
| packages/rig/src/tools/web-search.ts | Adds optional provider score, cross-agent dedup, explore/exploit rerank behavior changes. |
| packages/rig/src/tools/plan.ts | Switches to createAgent + Eta templating; introduces ResearchTask + adapter. |
| packages/rig/src/tools/index.ts | Exports new DelegateTool, taskToContent, ResearchTask. |
| packages/rig/src/tools/fetch-page.ts | Adds cross-agent URL dedup + trims description. |
| packages/rig/src/tools/delegate.ts | New recursive delegation tool built on createAgentPool, with entailment + echo gating. |
| packages/rig/src/sources/web.ts | Formatting/typing cleanups in buffering fetch tool wrapper. |
| packages/rig/src/index.ts | Re-exports DelegateTool, taskToContent, and new types. |
| packages/rig/package.json | Bumps rig version + updates peer dep on @lloyal-labs/lloyal.node. |
| packages/agents/test/spawn-agents.test.ts | Updates tests for new explore threshold configuration shape. |
| packages/agents/test/helpers/mock-branch.ts | Adds async iterator support for Agent iteration tests. |
| packages/agents/test/agent-pool.test.ts | Updates tests to drain the new Subscription result flow; adjusts expectations. |
| packages/agents/test/AgentPolicy.test.ts | Updates tool-history argument encoding; adjusts policy guard expectations. |
| packages/agents/test/Agent.test.ts | Adds tests for Agent async iterator state accumulation. |
| packages/agents/src/use-agent.ts | New single-agent wrapper built on useAgentPool. |
| packages/agents/src/types.ts | Adds peerHistory to ToolContext; includes agent instance in AgentResult. |
| packages/agents/src/trace-writer.ts | Makes JSONL trace buffer size configurable. |
| packages/agents/src/trace-types.ts | Extends pool:agentNudge trace event shape. |
| packages/agents/src/spawn-agents.ts | Removes legacy spawnAgents implementation. |
| packages/agents/src/source.ts | Minor formatting adjustment. |
| packages/agents/src/run-agents.ts | Removes legacy runAgents wrapper. |
| packages/agents/src/index.ts | Replaces legacy exports with new APIs and re-exports new combinator. |
| packages/agents/src/generate.ts | Removes legacy prepare/generate implementation. |
| packages/agents/src/create-agent-pool.ts | New createAgentPool() wrapper that drains events and returns AgentPoolResult. |
| packages/agents/src/combinators.ts | Adds reduce() helper for sequential Effection folds. |
| packages/agents/src/agent-pool.ts | Refactors useAgentPool to return a Subscription and adds cross-agent history plumbing. |
| packages/agents/src/AgentPolicy.ts | Updates dedup guards to parse JSON args; adds explore thresholds by axes; adds terminal-tool protection. |
| packages/agents/src/Agent.ts | Adds partial parsing (observe), finalize, async iterator, and new state fields. |
| packages/agents/package.json | Bumps agents version to 1.6.0. |
| package.json | Updates devDependency on @lloyal-labs/lloyal.node. |
| examples/supervisor/harness.ts | Migrates to new Agents APIs (createAgent, createAgentPool), removes legacy report pass. |
| examples/shared/tui/types.ts | Adds new stream regions for vertical token streaming. |
| examples/shared/tui/tree.ts | New shared tree glyph vocabulary for TUIs. |
| examples/shared/tui/page-stream.ts | New vertical “page stream” renderer for live token output. |
| examples/shared/tui/index.ts | Re-exports new TUI helpers and simplifies exports. |
| examples/shared/tui/agent-view.ts | Major refactor for streaming output + new tool call/result formatting. |
| examples/reflection/harness.ts | Migrates to createAgentPool, removes legacy report pass + findings field usage. |
| examples/react-agent/harness.ts | Migrates to createAgentPool, removes legacy report pass + findings field usage. |
| examples/deep-research/tui.ts | New spine-oriented workflow event presentation + streaming synthesis support. |
| examples/deep-research/prompts/web-worker.eta | New web worker prompt template for spine tasks. |
| examples/deep-research/prompts/verify.eta | New verify prompt template. |
| examples/deep-research/prompts/synthesize.eta | New synthesis prompt template enforcing source-grounding. |
| examples/deep-research/prompts/recovery.eta | New recovery prompt template. |
| examples/deep-research/prompts/plan.eta | New chain-shaped planning prompt template. |
| examples/deep-research/prompts/findings-eval.eta | New findings eval prompt template. |
| examples/deep-research/prompts/fallback.eta | New fallback prompt template. |
| examples/deep-research/prompts/eval.eta | New eval prompt template. |
| examples/deep-research/prompts/corpus-worker.eta | New corpus worker prompt template. |
| examples/deep-research/main.ts | New Deep Research CLI entry point (web/corpus/both). |
| examples/deep-research/harness.ts | New spine-based harness implementing plan → sequential task spine → synth → verify/eval → commit. |
| examples/deep-research-web/tasks/web-research.eta | Removes legacy deep-research-web task templates. |
| examples/deep-research-web/tasks/verify.md | Removes legacy deep-research-web task templates. |
| examples/deep-research-web/tasks/synthesize.eta | Removes legacy deep-research-web task templates. |
| examples/deep-research-web/tasks/plan.md | Removes legacy deep-research-web task templates. |
| examples/deep-research-web/tasks/findings-eval.md | Removes legacy deep-research-web task templates. |
| examples/deep-research-web/tasks/eval.md | Removes legacy deep-research-web task templates. |
| examples/deep-research-web/tasks/corpus-research.eta | Removes legacy deep-research-web task templates. |
| examples/deep-research-web/tasks/bridge.md | Removes legacy deep-research-web task templates. |
| examples/deep-research-web/main.ts | Removes legacy deep-research-web CLI. |
| examples/deep-research-web/harness.ts | Removes legacy deep-research-web harness implementation. |
    export function useAgentPool(opts: AgentPoolOptions): Operation<Subscription<AgentEvent, AgentPoolResult>> {
      return resource(function*(provide) {
        const ctx: SessionContext = yield* Ctx.expect();
        const store: BranchStore = yield* Store.expect();
        const events: Channel<AgentEvent, void> = yield* Events.expect();
        const poolChannel = createChannel<AgentEvent, AgentPoolResult>();
This is a breaking API change: useAgentPool now returns a Subscription<AgentEvent, AgentPoolResult> instead of an AgentPoolResult, and the public docs in this comment block still reference runAgents (which was removed from exports). If useAgentPool is part of the public API, consider keeping the old signature (e.g., introduce a new useAgentPoolEvents() that returns a Subscription) or bump the package major version and update the docstring accordingly.
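
To make the shape of the new return type concrete, here is a minimal sketch of the close-value pattern involved: a stream yields events and hands back a final result when it closes. It uses a plain TypeScript generator as a stand-in for Effection's Subscription, and the `AgentEvent` and `AgentPoolResult` shapes, `fakePool`, and `drain` are simplified assumptions, not the package's real types or helpers.

```typescript
// Hypothetical, simplified shapes; the real AgentEvent and AgentPoolResult are richer.
type AgentEvent = { type: "token"; agentId: number; text: string };
type AgentPoolResult = { totalTokens: number };

// Stand-in for a subscription: yields events, then returns the result as its close value.
function* fakePool(): Generator<AgentEvent, AgentPoolResult, void> {
  yield { type: "token", agentId: 0, text: "Hello" };
  yield { type: "token", agentId: 1, text: "world" };
  return { totalTokens: 2 };
}

// Drain by hand: a for...of loop would discard the generator's return value.
function drain(
  sub: Iterator<AgentEvent, AgentPoolResult>,
  onEvent: (e: AgentEvent) => void,
): AgentPoolResult {
  let next = sub.next();
  while (!next.done) {
    onEvent(next.value);
    next = sub.next();
  }
  return next.value; // the close value
}

const seenTexts: string[] = [];
const poolResult = drain(fakePool(), (e) => seenTexts.push(e.text));
```

A caller that previously received the result directly now has to run a loop like `drain` to get it, which is why the comment treats the signature change as breaking.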
    export { useAgent, createAgent } from './use-agent';
    export type { UseAgentOpts } from './use-agent';
    export { createAgentPool } from './create-agent-pool';
    export type { CreateAgentPoolOpts, PoolTaskSpec } from './create-agent-pool';
    export { diverge } from './diverge';
    export { useAgentPool, ContextPressure } from './agent-pool';
    export { runAgents } from './run-agents';
    export { createToolkit } from './toolkit';
This module removed previously exported surface area (generate, prepare, runAgents, spawnAgents, and their associated types) while only bumping minor version. If these were public APIs, this is a semver-breaking change; consider re-exporting compatibility wrappers (possibly deprecated) or bumping the major version to avoid silently breaking downstream consumers.
    const timings: OpTiming[] = [
      { label: "Plan", tokens: plan.tokenCount, detail: intent, timeMs: plan.timeMs },
      { label: "Research", tokens: researchTotalTokens, detail: `${researchTotalToolCalls} tools`, timeMs: researchTimeMs },
      { label: "Synthesize", tokens: researchTotalTokens, detail: "spine fork", timeMs: synthTimeMs },
      { label: "Eval", tokens: evalAgent.tokenCount, detail: `converged: ${evalConverged ? "yes" : "no"}`, timeMs: evalTimeMs },
In the stats table, the "Synthesize" step is reporting tokens: researchTotalTokens, which is the research-task token sum, not the synthesis pool's token usage. This will mislead users/operators about where time/tokens are spent. Track and report synthesis token count separately (e.g., from the synthesis pool result).
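
One way to keep the rows honest is to build each row from the result of the phase it describes. The sketch below assumes a hypothetical `PhaseResult` shape and `buildTimings` helper; only the `OpTiming` fields are taken from the snippet above.

```typescript
// OpTiming fields mirror the snippet under review.
interface OpTiming {
  label: string;
  tokens: number;
  detail: string;
  timeMs: number;
}

// Hypothetical per-phase result; the real harness may expose these numbers differently.
interface PhaseResult {
  tokenCount: number;
  timeMs: number;
}

// Each row reads tokens and time from its own phase, so Synthesize cannot reuse Research's sum.
function buildTimings(research: PhaseResult, synth: PhaseResult): OpTiming[] {
  return [
    { label: "Research", tokens: research.tokenCount, detail: "tasks", timeMs: research.timeMs },
    { label: "Synthesize", tokens: synth.tokenCount, detail: "spine fork", timeMs: synth.timeMs },
  ];
}

const timingRows = buildTimings(
  { tokenCount: 4200, timeMs: 9000 },
  { tokenCount: 800, timeMs: 2500 },
);
```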
    // Warm path priority: explicit parent > session trunk > cold
    const warmParent = opts.parent ?? opts.session?.trunk ?? undefined;

    return yield* withSharedRoot(
      { systemPrompt: opts.systemPrompt, tools: toolkit.toolsJson, parent: warmParent },
      function* (root) {
        const sub = yield* useAgentPool({
          tasks: opts.tasks.map((t) => ({
            systemPrompt: t.systemPrompt ?? opts.systemPrompt,
            content: t.content,
            tools: toolkit.toolsJson,
            parent: root,
createAgentPool passes parent: warmParent into withSharedRoot, whose warm-path implementation prefills a turn-separator into the shared root. useAgentPool's per-agent setup then also prefixes each agent suffix with getTurnSeparator(), so warm pools will end up with two separators before each agent prompt. This can introduce an extra empty turn in the chat template; consider ensuring only one layer (shared root or agent suffix) inserts the separator on the warm path.
        maxItems: 6,
      },
The JSON-schema grammar limit is hard-coded to maxItems: 6, but the tool is configured via maxQuestions. This makes the grammar constraint disagree with the runtime slicing and can silently cap plans below the configured limit. Use this._maxQuestions (or derive from opts) for maxItems so the grammar enforces the same bound you later apply in parsing.
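
A minimal sketch of the suggested fix follows: the schema's maxItems and the parser's slice both read the same configured limit. The `PlanToolOpts` shape, the `tasks`/`description` schema layout, the default of 6, and both helper names are assumptions for illustration; only `maxItems` and `maxQuestions` come from the comment.

```typescript
// Hypothetical options shape; maxQuestions is the configured limit named in the comment.
interface PlanToolOpts {
  maxQuestions?: number;
}

const DEFAULT_MAX_QUESTIONS = 6; // assumed default, matching the hard-coded value

// Build the schema so the grammar bound follows the configured limit.
function buildPlanSchema(opts: PlanToolOpts = {}) {
  const limit = opts.maxQuestions ?? DEFAULT_MAX_QUESTIONS;
  return {
    type: "object",
    properties: {
      tasks: {
        type: "array",
        items: {
          type: "object",
          properties: { description: { type: "string" } },
          required: ["description"],
        },
        minItems: 1,
        maxItems: limit,
      },
    },
    required: ["tasks"],
  };
}

// Apply the same bound when slicing the parsed output.
function parsePlan(raw: string, opts: PlanToolOpts = {}): string[] {
  const limit = opts.maxQuestions ?? DEFAULT_MAX_QUESTIONS;
  const parsed = JSON.parse(raw) as { tasks: { description: string }[] };
  return parsed.tasks.slice(0, limit).map((t) => t.description);
}
```

Deriving both bounds from one value means raising `maxQuestions` changes the grammar and the parser together.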
    yield* ensure(() => { if (!root.disposed) root.pruneSubtreeSync(); });

    const prefillTokens = warmParent ? ctx.getTurnSeparator() : sharedTokens;
    yield* call(() => root.prefill(prefillTokens));
On the warm-parent path, useAgent prefills getTurnSeparator() into root, but useAgentPool's setupAgent() also prefixes each task prompt with getTurnSeparator(). That results in two separators back-to-back whenever opts.parent / opts.session.trunk is used, which can introduce an extra empty turn in the model's chat template. Consider removing the root-level separator prefill here (or making setupAgent omit the separator when forking from a root that already inserted it).
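
The second option, having agent setup skip the separator when the root already ends with one, could look roughly like the sketch below. Tokens are modeled as plain numbers, and `endsWithTokens` and `agentSuffix` are hypothetical helpers; whether the real root branch exposes its trailing tokens this way is an assumption.

```typescript
// True when `tokens` ends with the non-empty sequence `suffix`.
function endsWithTokens(tokens: readonly number[], suffix: readonly number[]): boolean {
  if (suffix.length === 0 || suffix.length > tokens.length) return false;
  const offset = tokens.length - suffix.length;
  return suffix.every((t, i) => tokens[offset + i] === t);
}

// Hypothetical helper: prepend the turn separator only if the parent has not already added it.
function agentSuffix(
  parentTail: readonly number[],
  separator: readonly number[],
  prompt: readonly number[],
): number[] {
  return endsWithTokens(parentTail, separator)
    ? [...prompt]
    : [...separator, ...prompt];
}

// Warm root that already prefilled the separator (99): no second separator is added.
const warmSuffix = agentSuffix([1, 2, 99], [99], [5, 6]);
// Root without a trailing separator: the separator is inserted once.
const coldSuffix = agentSuffix([1, 2], [99], [5, 6]);
```

Making the insertion idempotent keeps the fix in one place, so neither layer needs to know which path created the root.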
    observe(ctx: SessionContext): void {
      if (this._toolObserved) return;
      this._parsed = ctx.parseChatOutput(this._rawOutput, this.fmt.format, {
        reasoningFormat: this.fmt.reasoningFormat,
        generationPrompt: this.fmt.generationPrompt,
observe() calls ctx.parseChatOutput() on every token until a tool call is detected. For long free-text generations (or when no tool call happens), this can become very expensive (re-parsing an ever-growing buffer) and may dominate runtime. Consider gating this (e.g., only call observe when pressure/time is near an exit condition, or after detecting a cheap sentinel substring), or add a backoff so partial parsing is not attempted on every single token.
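
The backoff idea can be sketched as a small gate that allows a partial parse only after the buffer has grown by a minimum amount since the last attempt. `PartialParseGate` and the threshold of 32 characters are hypothetical choices for illustration, not part of the Agent class.

```typescript
// Hypothetical gate: permits a partial parse only every `minGrowth` new characters.
class PartialParseGate {
  private lastParsedLength = 0;
  private readonly minGrowth: number;

  constructor(minGrowth = 32) {
    this.minGrowth = Math.max(1, minGrowth);
  }

  // Returns true when the caller should re-parse the buffer.
  shouldParse(bufferLength: number): boolean {
    if (bufferLength - this.lastParsedLength < this.minGrowth) return false;
    this.lastParsedLength = bufferLength;
    return true;
  }
}

// Simulate 100 single-character tokens and count how often a parse would run.
const gate = new PartialParseGate(32);
let parseCalls = 0;
let rawOutput = "";
for (let i = 0; i < 100; i++) {
  rawOutput += "x";
  if (gate.shouldParse(rawOutput.length)) parseCalls++;
}
```

The trade-off is latency: tool detection can lag by up to `minGrowth` characters, which matters if the terminal-tool protection in shouldExit depends on early detection.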
    constructor(fd: number, opts?: { bufferSize?: number }) {
      this._fd = fd;
      this._bufferSize = opts?.bufferSize ?? 1;
    }

    nextId(): TraceId { return this._nextId++; }

    write(event: TraceEvent): void {
      this._buffer.push(JSON.stringify(event));
      if (this._buffer.length >= 64) this.flush();
      if (this._buffer.length >= this._bufferSize) this.flush();
JsonlTraceWriter's docstring says it buffers up to 64 events before flushing, but the implementation defaults bufferSize to 1, which forces a synchronous writeSync per event and can severely degrade performance when tracing is enabled. Consider restoring the default to 64 (or updating the documentation and ensuring callers explicitly opt into unbuffered mode).
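
For comparison, here is a minimal sketch of a writer whose default matches the documented 64-event buffer, with unbuffered mode as an explicit opt-in. `BufferedJsonlWriter` and its `sink` callback are hypothetical stand-ins so the example runs without a file descriptor; the real class writes with writeSync.

```typescript
// Hypothetical writer: batches JSONL lines and hands each batch to a sink callback.
class BufferedJsonlWriter {
  private buffer: string[] = [];
  private readonly bufferSize: number;
  private readonly sink: (chunk: string) => void;

  constructor(sink: (chunk: string) => void, opts?: { bufferSize?: number }) {
    this.sink = sink;
    // Default to 64 events per flush; callers pass bufferSize: 1 to opt into unbuffered mode.
    this.bufferSize = Math.max(1, opts?.bufferSize ?? 64);
  }

  write(event: unknown): void {
    this.buffer.push(JSON.stringify(event));
    if (this.buffer.length >= this.bufferSize) this.flush();
  }

  flush(): void {
    if (this.buffer.length === 0) return;
    this.sink(this.buffer.join("\n") + "\n");
    this.buffer = [];
  }
}

// 130 events with the default buffer: two automatic flushes plus one final flush.
const traceChunks: string[] = [];
const traceWriter = new BufferedJsonlWriter((c) => traceChunks.push(c));
for (let i = 0; i < 130; i++) traceWriter.write({ id: i });
traceWriter.flush();
```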
    intent,
    planTokens: plan.tokenCount,
    agentTokens: researchTotalTokens,
    synthTokens: answer.length,
synthTokens: answer.length is character count, not token count, so the complete payload mixes units (tokens vs chars). Use the synthesis agent's .tokenCount / pool .totalTokens (or explicitly rename this field to synthChars) to keep telemetry consistent.
    - synthTokens: answer.length,
    + synthChars: answer.length,
     * @throws If logits length does not match n_vocab
     */
    setLogits(logits: Float32Array): void {
      this._ensureNotDisposed();
      this._ctx._branchSetLogits(this._handle, logits);
Branch.setLogits() docs say it throws when logits.length != n_vocab, but the method currently does no validation and just forwards to _branchSetLogits. Since SessionContext exposes vocabSize, add a length check here (or update the JSDoc to match actual behavior) so callers get a deterministic, SDK-level error instead of relying on backend-specific behavior.
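
The length check itself is small. The sketch below shows it as a standalone function, assuming the vocabulary size is available as a number; `assertLogitsLength` is a hypothetical helper name, and the error type and message are illustrative choices.

```typescript
// Hypothetical guard to run before forwarding logits to the native layer.
function assertLogitsLength(logits: Float32Array, vocabSize: number): void {
  if (logits.length !== vocabSize) {
    throw new RangeError(
      `setLogits: expected ${vocabSize} logits (n_vocab), got ${logits.length}`,
    );
  }
}

// Matching length passes silently.
assertLogitsLength(new Float32Array(8), 8);

// Mismatched length fails with a deterministic, SDK-level error.
let lengthError = "";
try {
  assertLogitsLength(new Float32Array(7), 8);
} catch (e) {
  lengthError = e instanceof RangeError ? e.message : "";
}
```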