Feat/spine by lloyal-research · Pull Request #9 · lloyal-ai/sdk

lloyal-research · 2026-04-16T23:53:12Z

No description provided.

…ssion.commitTurn

Restructure the public API into symmetric create*/use* layers:
- createAgent / createAgentPool (Layer 3: scoped, returns result)
- useAgent / useAgentPool (Layer 2: resource, branches alive)

useAgentPool provides Subscription<AgentEvent, AgentPoolResult> via an internal Channel with the close-value pattern (Effection collections guide). Tick loop runs in spawn() concurrently with Subscription consumption. createAgentPool drains inside the withSharedRoot body and forwards to broadcast. Recursion composes via Effection context inheritance.

useAgent delegates to useAgentPool with N=1 and manages the root via ensure(). Schema→grammar compiled via jsonSchemaToGrammar, set on root before fork. createAgent wraps in scoped().

Session.commitTurn(query, response) handles warm/cold internally, replacing the promoteTrunk/appendTurn duplication.

Research policy scoped per query (fresh instance per handleQuery) to prevent time budget expiry across multi-turn sessions.

Pressure dynamics unchanged: four-phase tick loop, trailing stop, staggered reporting, pruneOnReport, recovery extraction all preserved.

Error boundary in spawned tick loop catches decode failures and closes the channel with partial results instead of crashing.

BREAKING: Delete generate.ts, run-agents.ts, spawn-agents.ts.
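The close-value pattern above (events while the pool runs, an AgentPoolResult when it closes) can be illustrated without Effection. In this minimal sketch a plain generator stands in for the Subscription: its yields are the events and its return value plays the role of the channel's close value. The event/result shapes and the `fakePool`/`drain` helpers are hypothetical simplifications, not the package's API.

```typescript
// Hypothetical, simplified shapes (the real AgentEvent / AgentPoolResult are richer).
type AgentEvent =
  | { type: "token"; agentId: number; text: string }
  | { type: "done"; agentId: number };

interface AgentPoolResult {
  totalTokens: number;
}

// Stand-in for the pool: yields events while running, then "closes" with a result.
function* fakePool(): Generator<AgentEvent, AgentPoolResult, undefined> {
  yield { type: "token", agentId: 0, text: "Hel" };
  yield { type: "token", agentId: 0, text: "lo" };
  yield { type: "done", agentId: 0 };
  return { totalTokens: 2 };
}

// Layer-3 style drain: forward every event, then hand back the close value.
function drain<E, R>(sub: Iterator<E, R, undefined>, forward: (e: E) => void): R {
  for (;;) {
    const next = sub.next();
    if (next.done) return next.value; // close value
    forward(next.value); // ordinary event
  }
}

const text: string[] = [];
const result = drain(fakePool(), (e) => {
  if (e.type === "token") text.push(e.text);
});
console.log(text.join(""), result.totalTokens); // Hello 2
```

This mirrors the split between the two layers: a `use*` consumer iterates the events itself, while a `create*` wrapper drains them and returns only the result.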

…ngs eval
- SettledTool.args: store tc.arguments instead of callId; guards parse JSON args
- Recovery: try/catch handles prefill overflow; preserve pressure_critical drops
- PoolTaskSpec: add optional systemPrompt for per-task rendering
- Harness: render filtered sibling lists (exclude own questions); restore tool descriptions in eta templates
- web_research: include in toolCtx via recursiveOpts
- Findings eval: split into conflicts (trigger grounding) and observations (cross-agent analysis)
- Bridge: cap maxTurns at 2; rename root.md to fallback.md
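A dedup guard that parses the stored JSON arguments (rather than comparing call IDs) might look like the sketch below. The `SettledCall` shape, the `url`/`query` argument names, and the query normalization are assumptions for illustration, not the package's actual guard.

```typescript
// Hypothetical, simplified record of a finished tool call: args holds the raw JSON string.
interface SettledCall {
  name: string;
  args: string; // e.g. '{"url":"https://example.com"}'
}

// Derive a dedup key from a call's JSON arguments; malformed JSON yields no key.
function dedupKey(name: string, args: string): string | undefined {
  let parsed: unknown;
  try {
    parsed = JSON.parse(args);
  } catch {
    return undefined;
  }
  if (typeof parsed !== "object" || parsed === null) return undefined;
  const rec = parsed as Record<string, unknown>;
  if (name === "fetch_page" && typeof rec.url === "string") return `fetch:${rec.url}`;
  if (name === "web_search" && typeof rec.query === "string") {
    return `search:${rec.query.trim().toLowerCase()}`; // assumed normalization
  }
  return undefined;
}

// True if an equivalent fetch/search already appears in the history.
function isDuplicate(history: SettledCall[], name: string, args: string): boolean {
  const key = dedupKey(name, args);
  if (key === undefined) return false;
  return history.some((h) => dedupKey(h.name, h.args) === key);
}
```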

- ResearchTask type replaces PlanQuestion with descriptions and intent
- plan.md → plan.eta with dimension decomposition language
- web-search: no reranking in explore mode, min(provider score, entailment) in exploit mode
- Tool descriptions cleaned for web_search and fetch_page
- UseAgentOpts.maxTurns flows through to useAgentPool
- PoolTaskSpec.systemPrompt wired for per-task prompt rendering
- Nudge trace events carry actual message instead of hardcoded reason
- Dedup guards parse JSON args correctly for fetch URL and search query
- Research/corpus templates: question→task terminology, siblingTasks
- Findings eval prompt tightened against false-positive conflicts
- Bridge bug: r.tasks → acc.tasks (prior discoveries preserved in 3+ sources)
- root.md → fallback.md
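The explore/exploit rule for web_search can be sketched as follows. The `SearchResult` shape and the [0, 1] score ranges are assumptions; falling back to the entailment score when the provider supplies none is also an assumption, since the provider score is optional.

```typescript
// Hypothetical result shape; providerScore is optional, as in the PR.
interface SearchResult {
  url: string;
  providerScore?: number; // assumed in [0, 1]
  entailment: number; // assumed in [0, 1], from an entailment scorer
}

type Mode = "explore" | "exploit";

function rank(results: SearchResult[], mode: Mode): SearchResult[] {
  // Explore: no reranking, keep the provider's order.
  if (mode === "explore") return results.slice();
  // Exploit: a result must score well on both signals, so take the minimum.
  const score = (r: SearchResult): number =>
    r.providerScore === undefined ? r.entailment : Math.min(r.providerScore, r.entailment);
  return results.slice().sort((a, b) => score(b) - score(a));
}
```

Taking the minimum means a page the provider ranks highly but the entailment scorer rejects cannot float to the top.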

… TUI streaming

Research pipeline:
- Sequential task spine via reduce combinator: each task forks warm from a shared queryRoot, findings prefilled as user+assistant turns between tasks via extendSpine helper. Subsequent tasks inherit prior findings through KV attention.
- plan.eta rewritten for chain-shaped plans: task 1 = landscape discovery, tasks 2+ explicitly reference prior findings.
- web-worker.eta threads taskIndex (for spine-awareness on tasks 1+) and the current date, so search queries anchor on the current year.
- synthesize.eta restructured: holistic analysis → narrative-arc report with sections that advance/qualify/challenge the answer. Form-flexible body (prose, ### subsections, tables, counterpoints). ## Sources footer.
- report.eta → recovery.eta (clarifies role vs the report tool).
- bridge.eta deleted.

Agent observation + terminal tool protection:
- Agent.observe(ctx): per-token partial parse via parseChatOutput(isPartial: true) to detect which tool the agent is generating. Format-agnostic across all model families llama.cpp supports. Latches on first detection.
- Agent.finalize(ctx): strict parse at isStop, replaces standalone parseChatOutput call in the pool's PRODUCE phase.
- Agent.currentTool: read-only getter populated by observe, available to shouldExit, TUI, tracing, or any future consumer.
- DefaultAgentPolicy.shouldExit: agents mid-generation of the terminal tool are protected from time-budget kill. Only KV pressure.critical can force a kill on an agent actively writing its report.
- DefaultAgentPolicyOpts.terminalTool: threads the terminal tool name into the policy for the shouldExit guard.

SDK contrastive-decode primitives:
- Branch.setLogits(Float32Array): write companion to getLogits.
- BranchStore.mergeLogits(dst, experts, alpha): pure-CPU additive merge of experts' logit snapshots into dst's. For DExperts-style contrastive decoding across research branches.
- Session.prefillAligned(content, experts): batched alignment prefill across trunk + experts in a single store.prefill dispatch.
- NAPI surface: _branchSetLogits, _storeMergeLogits.

Agent pool coordination:
- ToolContext.peerHistory: sibling agents' tool histories exposed to tools for cross-agent duplicate detection.
- WebSearchTool + FetchPageTool: reject queries/URLs already issued by peers via peerHistory check.
- DefaultAgentPolicyOpts.shouldExplore: two-axis thresholds (context fraction + time fraction) replacing single exploreThreshold number.
- PlanTool.maxItems: 10 → 6.

TUI + example cleanup:
- PageStream abstraction for vertical token streaming (synth phase).
- Tree glyph helpers (shared/tui/tree.ts).
- agent-view.ts rewrite: dead sub-agent tracking removed, data-driven argSummaries + resultPreviews tables replace six-alias OR chains and if/else-if forest.
- main.ts: node:util.parseArgs replaces fragile flagIndices parser.
- Harness: runResearchTask + extendSpine + startTimer helpers extracted. Stale telemetry fields removed. enableThinking unset.

Deps:
- @lloyal-labs/lloyal.node: ^2.0.5 → ^2.1.0 (root + rig peer dep).
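The terminal-tool guard in shouldExit reduces to a small decision rule: KV `pressure.critical` always wins, and an agent whose latched `currentTool` is the terminal tool is exempt from the time-budget kill. The `ExitInput` shape below is hypothetical and far simpler than what `DefaultAgentPolicy.shouldExit` actually receives.

```typescript
// Hypothetical, flattened inputs to the exit decision.
interface ExitInput {
  currentTool?: string; // latched by Agent.observe() once a tool call is detected
  timeExpired: boolean; // time budget exhausted
  pressureCritical: boolean; // KV pressure.critical
}

function shouldExit(input: ExitInput, terminalTool: string): boolean {
  // KV pressure is the only condition that can kill an agent writing its report.
  if (input.pressureCritical) return true;
  // Mid-generation of the terminal tool: protected from the time-budget kill.
  if (input.currentTool === terminalTool) return false;
  // Everyone else is subject to the time budget.
  return input.timeExpired;
}
```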

Copilot

Pull request overview

This PR introduces a “KV spine” deep-research workflow and updates the agents/rig/sdk layers to support new orchestration primitives (single-agent + pool APIs, recursive delegation, cross-agent tool dedup), along with SDK hooks for logits snapshot manipulation.

Changes:

- Add new Agents APIs (useAgent/createAgent, createAgentPool, reduce) and refactor pool execution to stream events via a Subscription.
- Add Rig delegation + cross-agent dedup in web_search/fetch_page, and update planning output to structured ResearchTasks.
- Add SDK internals for logits snapshot set/merge and new session helper methods used by the spine workflow, plus a new deep-research example CLI/harness/TUI.

Reviewed changes

Copilot reviewed 63 out of 65 changed files in this pull request and generated 10 comments.


File	Description
packages/sdk/test/MockSessionContext.ts	Adds mock no-ops for new SessionContext internal methods.
packages/sdk/src/types.ts	Extends `SessionContext` with `_branchSetLogits` and `_storeMergeLogits`.
packages/sdk/src/Session.ts	Imports `Branch` as a value; adds `commitTurn` and `prefillAligned`.
packages/sdk/src/BranchStore.ts	Adds `mergeLogits()` wrapper over `_storeMergeLogits`.
packages/sdk/src/Branch.ts	Adds `setLogits()` API for overwriting cached logits snapshot.
packages/sdk/package.json	Bumps SDK version to 1.6.0.
packages/rig/src/tools/web-search.ts	Adds optional provider score, cross-agent dedup, explore/exploit rerank behavior changes.
packages/rig/src/tools/plan.ts	Switches to `createAgent` + Eta templating; introduces `ResearchTask` + adapter.
packages/rig/src/tools/index.ts	Exports new `DelegateTool`, `taskToContent`, `ResearchTask`.
packages/rig/src/tools/fetch-page.ts	Adds cross-agent URL dedup + trims description.
packages/rig/src/tools/delegate.ts	New recursive delegation tool built on `createAgentPool`, with entailment + echo gating.
packages/rig/src/sources/web.ts	Formatting/typing cleanups in buffering fetch tool wrapper.
packages/rig/src/index.ts	Re-exports `DelegateTool`, `taskToContent`, and new types.
packages/rig/package.json	Bumps rig version + updates peer dep on `@lloyal-labs/lloyal.node`.
packages/agents/test/spawn-agents.test.ts	Updates tests for new explore threshold configuration shape.
packages/agents/test/helpers/mock-branch.ts	Adds async iterator support for Agent iteration tests.
packages/agents/test/agent-pool.test.ts	Updates tests to drain the new `Subscription` result flow; adjusts expectations.
packages/agents/test/AgentPolicy.test.ts	Updates tool-history argument encoding; adjusts policy guard expectations.
packages/agents/test/Agent.test.ts	Adds tests for Agent async iterator state accumulation.
packages/agents/src/use-agent.ts	New single-agent wrapper built on `useAgentPool`.
packages/agents/src/types.ts	Adds `peerHistory` to `ToolContext`; includes `agent` instance in `AgentResult`.
packages/agents/src/trace-writer.ts	Makes JSONL trace buffer size configurable.
packages/agents/src/trace-types.ts	Extends `pool:agentNudge` trace event shape.
packages/agents/src/spawn-agents.ts	Removes legacy `spawnAgents` implementation.
packages/agents/src/source.ts	Minor formatting adjustment.
packages/agents/src/run-agents.ts	Removes legacy `runAgents` wrapper.
packages/agents/src/index.ts	Replaces legacy exports with new APIs and re-exports new combinator.
packages/agents/src/generate.ts	Removes legacy `prepare`/`generate` implementation.
packages/agents/src/create-agent-pool.ts	New `createAgentPool()` wrapper that drains events and returns `AgentPoolResult`.
packages/agents/src/combinators.ts	Adds `reduce()` helper for sequential Effection folds.
packages/agents/src/agent-pool.ts	Refactors `useAgentPool` to return a `Subscription` and adds cross-agent history plumbing.
packages/agents/src/AgentPolicy.ts	Updates dedup guards to parse JSON args; adds explore thresholds by axes; adds terminal-tool protection.
packages/agents/src/Agent.ts	Adds partial parsing (`observe`), `finalize`, async iterator, and new state fields.
packages/agents/package.json	Bumps agents version to 1.6.0.
package.json	Updates devDependency on `@lloyal-labs/lloyal.node`.
examples/supervisor/harness.ts	Migrates to new Agents APIs (`createAgent`, `createAgentPool`), removes legacy report pass.
examples/shared/tui/types.ts	Adds new stream regions for vertical token streaming.
examples/shared/tui/tree.ts	New shared tree glyph vocabulary for TUIs.
examples/shared/tui/page-stream.ts	New vertical “page stream” renderer for live token output.
examples/shared/tui/index.ts	Re-exports new TUI helpers and simplifies exports.
examples/shared/tui/agent-view.ts	Major refactor for streaming output + new tool call/result formatting.
examples/reflection/harness.ts	Migrates to `createAgentPool`, removes legacy report pass + findings field usage.
examples/react-agent/harness.ts	Migrates to `createAgentPool`, removes legacy report pass + findings field usage.
examples/deep-research/tui.ts	New spine-oriented workflow event presentation + streaming synthesis support.
examples/deep-research/prompts/web-worker.eta	New web worker prompt template for spine tasks.
examples/deep-research/prompts/verify.eta	New verify prompt template.
examples/deep-research/prompts/synthesize.eta	New synthesis prompt template enforcing source-grounding.
examples/deep-research/prompts/recovery.eta	New recovery prompt template.
examples/deep-research/prompts/plan.eta	New chain-shaped planning prompt template.
examples/deep-research/prompts/findings-eval.eta	New findings eval prompt template.
examples/deep-research/prompts/fallback.eta	New fallback prompt template.
examples/deep-research/prompts/eval.eta	New eval prompt template.
examples/deep-research/prompts/corpus-worker.eta	New corpus worker prompt template.
examples/deep-research/main.ts	New Deep Research CLI entry point (web/corpus/both).
examples/deep-research/harness.ts	New spine-based harness implementing plan → sequential task spine → synth → verify/eval → commit.
examples/deep-research-web/tasks/web-research.eta	Removes legacy deep-research-web task templates.
examples/deep-research-web/tasks/verify.md	Removes legacy deep-research-web task templates.
examples/deep-research-web/tasks/synthesize.eta	Removes legacy deep-research-web task templates.
examples/deep-research-web/tasks/plan.md	Removes legacy deep-research-web task templates.
examples/deep-research-web/tasks/findings-eval.md	Removes legacy deep-research-web task templates.
examples/deep-research-web/tasks/eval.md	Removes legacy deep-research-web task templates.
examples/deep-research-web/tasks/corpus-research.eta	Removes legacy deep-research-web task templates.
examples/deep-research-web/tasks/bridge.md	Removes legacy deep-research-web task templates.
examples/deep-research-web/main.ts	Removes legacy deep-research-web CLI.
examples/deep-research-web/harness.ts	Removes legacy deep-research-web harness implementation.
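The `reduce()` helper in combinators.ts is described as a sequential Effection fold. A dependency-free analogue with plain generators shows the spine-shaped data flow: each step receives the accumulator built by all earlier steps. `run`, the accumulator shape, and the task strings are illustrative assumptions; in the real package each step would be an Effection `Operation`.

```typescript
// Sequential fold: step i+1 starts only after step i has returned its accumulator.
function* reduce<T, A>(
  items: readonly T[],
  initial: A,
  step: (acc: A, item: T, index: number) => Generator<unknown, A, unknown>,
): Generator<unknown, A, unknown> {
  let acc = initial;
  let i = 0;
  for (const item of items) {
    acc = yield* step(acc, item, i++);
  }
  return acc;
}

// Hypothetical minimal driver: runs a generator to completion, ignoring yielded values.
function run<R>(op: Generator<unknown, R, unknown>): R {
  for (;;) {
    const next = op.next();
    if (next.done) return next.value;
  }
}

const tasks = ["landscape", "deep-dive A", "deep-dive B"];
const spine = run(
  reduce(tasks, { findings: [] as string[] }, function* (acc, task) {
    // Read prior findings from the accumulator, not from the previous step's result alone.
    const finding = `${task} (after ${acc.findings.length} prior)`;
    return { findings: [...acc.findings, finding] };
  }),
);
console.log(spine.findings);
```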


Copilot · 2026-04-19T02:16:06Z

+export function useAgentPool(opts: AgentPoolOptions): Operation<Subscription<AgentEvent, AgentPoolResult>> {
  return resource(function*(provide) {
    const ctx: SessionContext = yield* Ctx.expect();
    const store: BranchStore = yield* Store.expect();
-    const events: Channel<AgentEvent, void> = yield* Events.expect();
+    const poolChannel = createChannel<AgentEvent, AgentPoolResult>();


This is a breaking API change: useAgentPool now returns a Subscription<AgentEvent, AgentPoolResult> instead of an AgentPoolResult, and the public docs in this comment block still reference runAgents (which was removed from exports). If useAgentPool is part of the public API, consider keeping the old signature (e.g., introduce a new useAgentPoolEvents() that returns a Subscription) or bump the package major version and update the docstring accordingly.

Copilot · 2026-04-19T02:16:06Z

+export { useAgent, createAgent } from './use-agent';
+export type { UseAgentOpts } from './use-agent';
+export { createAgentPool } from './create-agent-pool';
+export type { CreateAgentPoolOpts, PoolTaskSpec } from './create-agent-pool';
 export { diverge } from './diverge';
 export { useAgentPool, ContextPressure } from './agent-pool';
-export { runAgents } from './run-agents';
 export { createToolkit } from './toolkit';


This module removed previously exported surface area (generate, prepare, runAgents, spawnAgents, and their associated types) while only bumping the minor version. If these were public APIs, this is a semver-breaking change; consider re-exporting compatibility wrappers (possibly deprecated) or bumping the major version to avoid silently breaking downstream consumers.

Copilot · 2026-04-19T02:16:07Z

+  const timings: OpTiming[] = [
+    { label: "Plan",       tokens: plan.tokenCount,     detail: intent,                                 timeMs: plan.timeMs },
+    { label: "Research",   tokens: researchTotalTokens, detail: `${researchTotalToolCalls} tools`,      timeMs: researchTimeMs },
+    { label: "Synthesize", tokens: researchTotalTokens, detail: "spine fork",                            timeMs: synthTimeMs },
+    { label: "Eval",       tokens: evalAgent.tokenCount, detail: `converged: ${evalConverged ? "yes" : "no"}`, timeMs: evalTimeMs },


In the stats table, the "Synthesize" step is reporting tokens: researchTotalTokens, which is the research-task token sum, not the synthesis pool's token usage. This will mislead users/operators about where time/tokens are spent. Track and report synthesis token count separately (e.g., from the synthesis pool result).

Copilot · 2026-04-19T02:16:07Z

+  // Warm path priority: explicit parent > session trunk > cold
+  const warmParent = opts.parent ?? opts.session?.trunk ?? undefined;
+
+  return yield* withSharedRoot(
+    { systemPrompt: opts.systemPrompt, tools: toolkit.toolsJson, parent: warmParent },
+    function* (root) {
+      const sub = yield* useAgentPool({
+        tasks: opts.tasks.map((t) => ({
+          systemPrompt: t.systemPrompt ?? opts.systemPrompt,
+          content: t.content,
+          tools: toolkit.toolsJson,
+          parent: root,


createAgentPool passes parent: warmParent into withSharedRoot, whose warm-path implementation prefills a turn-separator into the shared root. useAgentPool's per-agent setup then also prefixes each agent suffix with getTurnSeparator(), so warm pools will end up with two separators before each agent prompt. This can introduce an extra empty turn in the chat template; consider ensuring only one layer (shared root or agent suffix) inserts the separator on the warm path.

Copilot · 2026-04-19T02:16:07Z

+          maxItems: 6,
        },


The JSON-schema grammar limit is hard-coded to maxItems: 6, but the tool is configured via maxQuestions. This makes the grammar constraint disagree with the runtime slicing and can silently cap plans below the configured limit. Use this._maxQuestions (or derive from opts) for maxItems so the grammar enforces the same bound you later apply in parsing.
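Deriving both bounds from one value keeps the grammar and the parser in agreement. This sketch simplifies tasks to strings (the PR's ResearchTask carries a description and intent); `planSchema` and `parsePlan` are hypothetical helpers, not the PlanTool's actual code.

```typescript
// Hypothetical, simplified plan schema.
interface PlanSchema {
  type: "object";
  properties: {
    tasks: { type: "array"; items: { type: "string" }; maxItems: number };
  };
  required: string[];
}

// The grammar bound comes from the configured limit instead of a hard-coded 6.
function planSchema(maxQuestions: number): PlanSchema {
  return {
    type: "object",
    properties: {
      tasks: { type: "array", items: { type: "string" }, maxItems: maxQuestions },
    },
    required: ["tasks"],
  };
}

// The parser slices with the same bound, so grammar and runtime cannot disagree.
function parsePlan(json: string, maxQuestions: number): string[] {
  const parsed = JSON.parse(json) as { tasks?: unknown };
  if (!Array.isArray(parsed.tasks)) return [];
  return parsed.tasks.filter((t): t is string => typeof t === "string").slice(0, maxQuestions);
}
```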

Copilot · 2026-04-19T02:16:07Z

+    yield* ensure(() => { if (!root.disposed) root.pruneSubtreeSync(); });
+
+    const prefillTokens = warmParent ? ctx.getTurnSeparator() : sharedTokens;
+    yield* call(() => root.prefill(prefillTokens));


On the warm-parent path, useAgent prefills getTurnSeparator() into root, but useAgentPool's setupAgent() also prefixes each task prompt with getTurnSeparator(). That results in two separators back-to-back whenever opts.parent / opts.session.trunk is used, which can introduce an extra empty turn in the model's chat template. Consider removing the root-level separator prefill here (or making setupAgent omit the separator when forking from a root that already inserted it).
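One way to guarantee a single separator is to have the agent-level setup check whether the root it forks from already ends with the separator tokens. The token arrays and `agentPromptTokens` are hypothetical; an explicit flag set by whichever layer prefilled the separator would be a sturdier alternative to inspecting tokens.

```typescript
// Hypothetical helper: tokens an agent should prefill after forking from a root whose
// last tokens are `rootTail`. Token IDs are plain numbers for illustration.
function agentPromptTokens(rootTail: number[], separator: number[], prompt: number[]): number[] {
  const rootEndsWithSeparator =
    separator.length > 0 &&
    rootTail.length >= separator.length &&
    separator.every((tok, i) => rootTail[rootTail.length - separator.length + i] === tok);
  // Emit the separator exactly once across the root/agent boundary.
  return rootEndsWithSeparator ? [...prompt] : [...separator, ...prompt];
}
```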

Copilot · 2026-04-19T02:16:08Z

+  observe(ctx: SessionContext): void {
+    if (this._toolObserved) return;
+    this._parsed = ctx.parseChatOutput(this._rawOutput, this.fmt.format, {
+      reasoningFormat: this.fmt.reasoningFormat,
+      generationPrompt: this.fmt.generationPrompt,


observe() calls ctx.parseChatOutput() on every token until a tool call is detected. For long free-text generations (or when no tool call happens), this can become very expensive (re-parsing an ever-growing buffer) and may dominate runtime. Consider gating this (e.g., only call observe when pressure/time is near an exit condition, or after detecting a cheap sentinel substring), or add a backoff so partial parsing is not attempted on every single token.
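A cheap-sentinel gate plus a character backoff could bound the cost of partial parsing. In the sketch, `parse` stands in for `ctx.parseChatOutput(..., { isPartial: true })`; `ToolObserver`, the sentinel strings, and the `minNewChars` default are assumptions. The trade-off: sentinels are model-format specific, which gives up some of the format-agnosticism that motivated parsing on every token.

```typescript
type PartialParser = (raw: string) => string | undefined;

class ToolObserver {
  private raw = "";
  private lastParsedLength = 0;
  private sawSentinel = false;
  private tool: string | undefined = undefined;
  private readonly sentinels: string[];
  private readonly maxSentinelLength: number;
  private readonly parse: PartialParser;
  private readonly minNewChars: number;
  parseCalls = 0;

  constructor(sentinels: string[], parse: PartialParser, minNewChars = 32) {
    this.sentinels = sentinels;
    this.maxSentinelLength = Math.max(0, ...sentinels.map((s) => s.length));
    this.parse = parse;
    this.minNewChars = minNewChars;
  }

  get currentTool(): string | undefined {
    return this.tool;
  }

  push(text: string): void {
    this.raw += text; // keep the full buffer for the strict parse at stop
    if (this.tool !== undefined) return; // latched on first detection
    if (!this.sawSentinel) {
      // Scan only the new text plus enough overlap to catch a sentinel split across tokens.
      const start = Math.max(0, this.raw.length - text.length - this.maxSentinelLength + 1);
      const tail = this.raw.slice(start);
      this.sawSentinel = this.sentinels.some((s) => tail.includes(s));
      if (!this.sawSentinel) return;
    }
    // Backoff: re-parse only after enough new characters have arrived.
    if (this.raw.length - this.lastParsedLength < this.minNewChars) return;
    this.lastParsedLength = this.raw.length;
    this.parseCalls++;
    this.tool = this.parse(this.raw);
  }
}
```

Since `finalize()` still runs a strict parse at stop, a delayed detection should only narrow the window in which terminal-tool protection applies, not lose the tool call.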

Copilot · 2026-04-19T02:16:08Z

+  constructor(fd: number, opts?: { bufferSize?: number }) {
+    this._fd = fd;
+    this._bufferSize = opts?.bufferSize ?? 1;
+  }

  nextId(): TraceId { return this._nextId++; }

  write(event: TraceEvent): void {
    this._buffer.push(JSON.stringify(event));
-    if (this._buffer.length >= 64) this.flush();
+    if (this._buffer.length >= this._bufferSize) this.flush();


JsonlTraceWriter's docstring says it buffers up to 64 events before flushing, but the implementation defaults bufferSize to 1, which forces a synchronous writeSync per event and can severely degrade performance when tracing is enabled. Consider restoring the default to 64 (or updating the documentation and ensuring callers explicitly opt into unbuffered mode).
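A writer with the documented default restored might look like this. The sink callback replaces the real `fs.writeSync(fd, ...)` call so the sketch is testable; `JsonlBufferedWriter` is a hypothetical name.

```typescript
class JsonlBufferedWriter {
  private buffer: string[] = [];
  private readonly bufferSize: number;
  private readonly sink: (chunk: string) => void;

  constructor(sink: (chunk: string) => void, opts?: { bufferSize?: number }) {
    this.sink = sink;
    // Default to 64 events per flush; callers opt into unbuffered mode with bufferSize: 1.
    this.bufferSize = Math.max(1, opts?.bufferSize ?? 64);
  }

  write(event: unknown): void {
    this.buffer.push(JSON.stringify(event));
    if (this.buffer.length >= this.bufferSize) this.flush();
  }

  // One sink call (one syscall in the real writer) per batch of JSONL lines.
  flush(): void {
    if (this.buffer.length === 0) return;
    this.sink(this.buffer.join("\n") + "\n");
    this.buffer = [];
  }
}
```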

Copilot · 2026-04-19T02:16:08Z

+      intent,
+      planTokens: plan.tokenCount,
+      agentTokens: researchTotalTokens,
+      synthTokens: answer.length,


synthTokens: answer.length is character count, not token count, so the complete payload mixes units (tokens vs chars). Use the synthesis agent's .tokenCount / pool .totalTokens (or explicitly rename this field to synthChars) to keep telemetry consistent.

Suggested change

-      synthTokens: answer.length,
+      synthChars: answer.length,

Copilot · 2026-04-19T02:16:08Z

+   * @throws If logits length does not match n_vocab
+   */
+  setLogits(logits: Float32Array): void {
+    this._ensureNotDisposed();
+    this._ctx._branchSetLogits(this._handle, logits);


Branch.setLogits() docs say it throws when logits.length != n_vocab, but the method currently does no validation and just forwards to _branchSetLogits. Since SessionContext exposes vocabSize, add a length check here (or update the JSDoc to match actual behavior) so callers get a deterministic, SDK-level error instead of relying on backend-specific behavior.
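An SDK-level length check before crossing into the native layer could look like this. `LogitsBackend` and `BranchSketch` are hypothetical stand-ins for `SessionContext` and `Branch`; `vocabSize` is the property the comment says `SessionContext` exposes.

```typescript
// Hypothetical minimal backend surface.
interface LogitsBackend {
  readonly vocabSize: number;
  _branchSetLogits(handle: number, logits: Float32Array): void;
}

class BranchSketch {
  private readonly ctx: LogitsBackend;
  private readonly handle: number;
  private disposed = false;

  constructor(ctx: LogitsBackend, handle: number) {
    this.ctx = ctx;
    this.handle = handle;
  }

  dispose(): void {
    this.disposed = true;
  }

  /** @throws RangeError if logits.length !== n_vocab */
  setLogits(logits: Float32Array): void {
    if (this.disposed) throw new Error("Branch is disposed");
    // Validate in the SDK so the error does not depend on backend behavior.
    if (logits.length !== this.ctx.vocabSize) {
      throw new RangeError(
        `setLogits: expected ${this.ctx.vocabSize} logits (n_vocab), got ${logits.length}`,
      );
    }
    this.ctx._branchSetLogits(this.handle, logits);
  }
}
```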

lloyal-research added 7 commits April 8, 2026 10:49

WIP

d21ea7a

rename example to deep-research

09dcd6a

refactor(dx): drop create prefix

8471dd4

lloyal-research requested a review from Copilot April 19, 2026 02:09

Copilot started reviewing on behalf of lloyal-research April 19, 2026 02:09

Copilot AI reviewed Apr 19, 2026


refactor(dx): address PR comments

66b8a4e

lloyal-research added the skip-gpu-tests label Apr 19, 2026

lloyal-research merged commit b29c771 into main Apr 19, 2026
6 of 9 checks passed

lloyal-research merged 8 commits into main from feat/spine


2 participants
