feat: replace orchestrator/query with synthesis agents#39

Merged
charlie83Gs merged 89 commits into main from synthesis-agents-refactor
Mar 27, 2026

Conversation

@charlie83Gs
Contributor

Summary

Major architectural refactoring that removes the deprecated orchestrator, query, and conversation systems and replaces them with a new synthesis-based research flow.

Removed (~10,880 lines):

  • Legacy orchestrator agents (wave planner, scope planner, sub-explorer)
  • Query mode (worker-query, QueryAgent, query tools)
  • Conversation system (worker-conversations, follow-up turns, resynthesis)
  • Chat/query/answer frontend UI (ChatPanel, QueryBar, ConversationHistory, etc.)
  • Renamed worker-orchestrator → worker-bottomup

Added (~3,645 lines):

  • worker-synthesis — new service with:
    • SynthesizerAgent (extends BaseAgent) with 8 navigation tools matching the MCP synthesizer pattern: search_graph, search_facts, get_node, get_edges, get_facts, get_dimensions, get_fact_sources, get_node_paths
    • SuperSynthesizerAgent for multi-scope meta-synthesis (reconnaissance → parallel dispatch → combine)
    • Document processing pipeline (sentence splitting, embedding, node text linking, fact embedding linking)
    • System prompts adapted from the reference MCP agent with 4-phase investigation strategy
  • Data model — SynthesisSentence, SentenceFact, SentenceNodeLink, SynthesisChild tables + migrations; supersynthesis NodeType; Visibility enum; visibility/creator_id on Node
  • API endpoints — 9 routes for synthesis CRUD (create, list, get document, get sentence facts, get nodes, delete, update visibility)
  • Frontend document view — /syntheses list page, /syntheses/[id] detail page with sentence-level fact panel, node list, sub-synthesis list

Test plan

  • Run just test-all to verify no backend regressions from cleanup
  • Run cd frontend && pnpm lint && pnpm type-check && pnpm test for frontend
  • Run uv sync --all-packages to verify workspace resolution with new package
  • Verify just worker starts without import errors
  • Test synthesis creation via POST /api/v1/syntheses
  • Test synthesis document retrieval via GET /api/v1/syntheses/{id}
  • Test super-synthesis creation via POST /api/v1/super-syntheses
  • Verify frontend /syntheses page renders
  • Verify frontend /syntheses/[id] document view renders with sentence interaction

🤖 Generated with Claude Code

charlie83Gs and others added 30 commits March 25, 2026 15:33
Phase 0A: Delete legacy orchestrator agents directory
- Move ScopePlan, ScopeBriefing, WaveAccumulator to bottom_up/state.py
- Move scout_impl to bottom_up/scout.py
- Move wave_planner to bottom_up/wave_planner.py
- Delete agents/ and prompts/ directories
- Delete agent-related tests

Phase 0B: Delete query mode and conversations
- Delete worker-query service entirely
- Delete worker-conversations service entirely
- Remove from worker-all imports and workflow registrations
- Remove conversations router and endpoints from API
- Remove ConversationState from kt-agents-core
- Remove synthesize_answer_impl (unused after query removal)
- Remove conversation tests from test_api.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Delete components/chat/ (ChatPanel, ChatInput, ChatMessage, etc.)
- Delete components/answer/ (AnswerView, ConfidenceIndicator, etc.)
- Delete components/query/ (QueryBar, QueryBudgetControls, QuickActions)
- Delete useConversation hook and conversation/[id] page
- Delete ResearchNodeDialog (depended on conversation flow)
- Inline ConfidenceIndicator as Badge in DimensionsTab and ConvergenceTab
- Replace home page query UI with simple landing page
- Rename sidebar "Query" to "Home"
- Update research components to not navigate to /conversation/

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Rename service directory and Python module
- Update all internal imports (kt_worker_orchestrator -> kt_worker_bottomup)
- Update pyproject.toml package name and dependencies across workspace
- Update docker-compose.yml, Dockerfile, justfile, helm charts, CI workflows
- Remove worker-query and worker-conversations from docker-compose, CI, helm
- Fix docstring references in worker-nodes composite pipeline

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 1: Data model changes for synthesis/supersynthesis features

- Add supersynthesis NodeType and Visibility enum to kt-config
- Add ALL_SYNTHESES_ID and ALL_SUPERSYNTHESES_ID default parents
- Add visibility and creator_id columns to Node model (graph-db)
- Add visibility and creator_id columns to WriteNode model (write-db)
- Add SynthesisSentence model (sentences in a synthesis document)
- Add SentenceFact model (sentence-to-fact links by embedding distance)
- Add SentenceNodeLink model (sentence-to-node links by text match)
- Add SynthesisChild model (supersynthesis-to-synthesis links)
- Add graph-db migration (zzae) for new tables and columns
- Add write-db migration (w030) for visibility/creator_id on write_nodes
- Add SynthesisDocumentRepository with full CRUD operations
- Add SynthesizerInput/Output and SuperSynthesizerInput/Output to kt-hatchet

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
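The Phase 1 visibility additions can be sketched as follows. This is an illustrative assumption, not the actual kt-config definition: the enum member values and the helper function are invented for the example, and only the names Visibility, visibility, and creator_id come from the commit above.

```python
from enum import Enum

class Visibility(str, Enum):
    """Sketch of the Visibility enum added to kt-config (values assumed)."""
    PRIVATE = "private"
    PUBLIC = "public"

def node_visible_to(visibility: Visibility, creator_id: str, user_id: str) -> bool:
    """Hypothetical check combining the new visibility/creator_id columns:
    a node is readable if it is public or the requester created it."""
    return visibility == Visibility.PUBLIC or creator_id == user_id
```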
Phase 2: Create the worker-synthesis service

New service: services/worker-synthesis/
- SynthesizerAgent (extends BaseAgent) with 8 navigation tools:
  search_graph, search_facts, get_node, get_edges, get_facts,
  get_dimensions, get_fact_sources, get_node_paths
- finish_synthesis tool for document submission
- System prompt adapted from MCP synthesizer agent with 4-phase
  investigation strategy and core principles
- Document processing pipeline: split sentences, embed, link nodes
  by text match, link facts by embedding similarity
- Hatchet workflow (synthesizer_wf) that creates synthesis node,
  runs agent, and processes the document
- Registered in worker-all and justfile

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 3: SuperSynthesizer for multi-scope meta-synthesis

- SuperSynthesizerAgent (extends BaseAgent) with tools:
  read_synthesis, get_synthesis_nodes, search_graph, get_node,
  finish_super_synthesis
- System prompt adapted from super-synthesize skill with
  cross-pollination, thematic reorganization, meta-pattern detection
- 3-task Hatchet workflow (super_synthesizer_wf):
  1. reconnaissance: LLM plans 3-7 thematic scopes from graph search
  2. run_sub_syntheses: dispatches synthesizer_wf in parallel
  3. combine: runs SuperSynthesizerAgent to produce meta-synthesis
- Registered in worker-all and worker-synthesis __main__

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 4: Backend API endpoints

- POST /api/v1/syntheses — dispatch synthesizer_wf
- POST /api/v1/super-syntheses — dispatch super_synthesizer_wf
- GET /api/v1/syntheses — list syntheses (paginated, visibility filter)
- GET /api/v1/syntheses/{id} — full document with sentences, fact counts,
  node links, sub-syntheses (for supersynthesis)
- GET /api/v1/syntheses/{id}/sentences/{pos}/facts — facts for sentence
  grouped by source
- GET /api/v1/syntheses/{id}/nodes — all referenced nodes
- DELETE /api/v1/syntheses/{id} — delete synthesis
- PATCH /api/v1/syntheses/{id} — update visibility

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 5: Frontend synthesis document viewer

New pages:
- /syntheses — list all syntheses with create button
- /syntheses/[id] — full document view

New components:
- SynthesisDocument — main document renderer with sentence-level
  interactivity (hover for fact count, click to open fact panel)
- SynthesisFactPanel — right panel showing facts grouped by source
  for selected sentence
- SynthesisNodeList — bottom section with all referenced nodes
- SubSynthesisList — child syntheses for supersynthesis docs
- CreateSynthesisDialog — form for creating new synthesis

Updates:
- Add synthesis TypeScript types to types/index.ts
- Add synthesis API methods to lib/api.ts
- Add "Syntheses" to sidebar navigation
- Update home page with syntheses card

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 6: Integration

- Add worker-synthesis Dockerfile
- Add worker-synthesis to docker-compose.yml
- Add worker-synthesis to CI build matrix
- Update CLAUDE.md comprehensively:
  - Update project description (document-based, not chat-based)
  - Replace worker-orchestrator/query/conversations with
    worker-bottomup and worker-synthesis
  - Update workflow documentation (synthesizer_wf, super_synthesizer_wf)
  - Update agent architecture (SynthesizerAgent, SuperSynthesizerAgent)
  - Add synthesis/supersynthesis node types
  - Update import names, scopes, code location tables
  - Update research flow description
  - Remove conversation/query references throughout

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Regenerate uv.lock after removing worker-query/worker-conversations
  packages (uv sync --frozen was failing on deleted distributions)
- Remove stale kt-worker-query and kt-worker-conversations from
  api/pyproject.toml dependencies and sources
- Add kt-worker-synthesis to api/pyproject.toml
- Fix frontend lint errors:
  - Replace <a> tags with <Link> from next/link (page.tsx,
    syntheses/[id]/page.tsx, ResearchBuildProgress.tsx)
  - Fix setState-in-effect pattern in syntheses/[id]/page.tsx
    (use useCallback + separate fetch function)
  - Remove unused Search import from Sidebar.tsx
  - Remove unused researchDialogOpen state and Research button
    from NodeDetailPanel.tsx
  - Remove unused result variable in SourceUploadForm.tsx

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Auto-fix 16 ruff import sorting issues across synthesis service
- Add 'supersynthesis' to expected node types in test_sanity.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Run ruff format on 9 files that had formatting issues
- Clean up leftover empty dirs from deleted services
  (worker-conversations, worker-query, worker-orchestrator)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…chet parents

Hatchet SDK v1 expects parent task references as function objects,
not string names. Fix parents= and ctx.task_output() calls in
super_synthesizer_wf.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The zzae migration was pointing to zzad as its parent, but
ggg8h9i0j1k2 and hhh9i0j1k2l3 were added after zzad on main,
creating a branch with two heads. Fix by setting zzae's
down_revision to hhh9i0j1k2l3 (the actual latest head).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… dialog

- Center list and detail pages with mx-auto + px-4
- Add synthesis type selector (Synthesis vs Super-Synthesis) to the
  create dialog with visual card-style toggle
- Regular synthesis shows exploration budget control
- Super-synthesis shows explanation of automatic scope planning
- Both types share topic and visibility inputs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The request() function already prepends API_PREFIX (/api/v1), so the
synthesis API functions should use bare paths like /syntheses, not
/api/v1/syntheses (which resulted in /api/v1/api/v1/syntheses -> 404).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
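The double-prefix bug above reduces to a one-line reproduction. The helper name is hypothetical (the real request() lives in the frontend's lib/api.ts); the sketch only shows why an already-prefixed path yields /api/v1/api/v1/…:

```python
API_PREFIX = "/api/v1"

def request_url(path: str) -> str:
    # The shared request() helper always prepends API_PREFIX.
    return API_PREFIX + path

# Wrong: passing an already-prefixed path doubles the prefix (-> 404).
assert request_url("/api/v1/syntheses") == "/api/v1/api/v1/syntheses"
# Right: API functions pass bare paths.
assert request_url("/syntheses") == "/api/v1/syntheses"
```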
Use hatchet.runs.aio_create(workflow_name=...) instead of importing
worker modules directly. This follows best practice — the API should
not depend on worker packages. Workers communicate via Hatchet only.

- Remove kt-worker-synthesis from api pyproject.toml dependencies
- Use hatchet.runs.aio_create("synthesizer_wf", input={...})
- Use hatchet.runs.aio_create("super_synthesizer_wf", input={...})

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…imports

Add dispatch_workflow() and run_workflow() helpers to kt_hatchet.client
that dispatch workflows by name string via hatchet.runs.aio_create().

Refactor all API endpoints to use these helpers instead of importing
worker modules directly. This eliminates cross-service dependencies
between the API and worker packages (except kt-worker-ingest which
is still imported for pipeline processing functions).

Files changed:
- kt_hatchet/client.py: add dispatch_workflow(), run_workflow()
- syntheses.py: already using dispatch_workflow
- graph_builder.py: auto_build_task
- edges.py: enrich_edge_task
- sources.py: reingest_source_wf (sync wait via run_workflow)
- nodes.py: rebuild_node_task, node_pipeline_wf, regenerate_composite_task
- seeds.py: build_composite_task (sync), node_pipeline_wf
- research.py: ingest_confirm_wf, ingest_decompose_wf, ingest_build_wf,
  bottom_up_prepare_wf, agent_select_wf
- api/pyproject.toml: remove kt-worker-bottomup, kt-worker-nodes deps

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- dispatch_workflow() now raises RuntimeError with a clear message
  when Hatchet rejects the request (e.g. no active worker)
- Synthesis endpoints catch RuntimeError and return 503 with
  actionable error message ("Ensure the worker is running")
- run_workflow() reuses dispatch_workflow() to avoid duplication

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The runs.aio_create() v1 REST API returns 500 errors in the current
Hatchet version. Switch dispatch_workflow() and run_workflow() to use
admin.aio_run_workflow() instead, which is the same gRPC-based method
that Workflow.aio_run_no_wait() uses internally and works correctly.

Key difference: admin.aio_run_workflow takes JSON-serialized input
(string) instead of a dict, and uses the gRPC v0 client.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…d bug

- Set max_tokens=32000 for both SynthesizerAgent and SuperSynthesizerAgent
  (default was 1000, causing the agent to hang when generating documents)
- Add get_model_kwargs() hook to BaseAgent for subclass overrides
- Fix search_facts: EmbeddingService has embed_batch(), not embed()

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
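The get_model_kwargs() hook described above might look roughly like this. The class bodies are a minimal sketch, not the real BaseAgent; only the hook name, the 1000-token default, and the 32000 override come from the commit:

```python
class BaseAgent:
    def get_model_kwargs(self) -> dict:
        # Default model kwargs; subclasses override this hook to adjust limits.
        return {"max_tokens": 1000}

class SynthesizerAgent(BaseAgent):
    def get_model_kwargs(self) -> dict:
        # Full documents need far more than the 1000-token default,
        # which made the agent appear to hang mid-generation.
        return {**super().get_model_kwargs(), "max_tokens": 32000}
```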
GraphEngine.create_node() doesn't accept node_id or definition.
Use create_node() then set_node_definition() separately, matching
the pattern used by the existing node pipeline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the 4 synthesis tables (synthesis_sentences, sentence_facts,
sentence_node_links, synthesis_children) with a JSON document stored
in the WriteNode.metadata_ field under "synthesis_document".

This follows the project's dual-database architecture:
- Pipeline writes JSON to write-db metadata via GraphEngine
- Sync worker propagates metadata to graph-db
- API reads from graph-db Node.metadata_

Changes:
- Rewrite document_processing.py to return JSON dict instead of DB records
- Update workflows to store doc in WriteNode.metadata_
- Update API to read from node metadata
- Remove synthesis table models and repository
- Simplify zzae migration to only add visibility/creator_id columns
- Update frontend for flat fact responses

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Keep the synthesis document JSON in metadata_ (no separate tables),
but strip it from network responses in _build_node_response(). This
means wiki, node lists, and search results don't transfer the ~50KB
document blob. Only the /syntheses/{id} endpoint reads and returns it.

- Add _strip_synthesis_doc() to nodes.py — filters metadata before response
- Remove SynthesisDocument and WriteSynthesisDocument models (not needed)
- Keep the JSON-in-metadata approach: simple, follows write-db -> sync pattern

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
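The _strip_synthesis_doc() filter is conceptually simple; a minimal sketch, assuming the blob lives under a "synthesis_document" metadata key as described above:

```python
SYNTHESIS_DOC_KEY = "synthesis_document"

def strip_synthesis_doc(metadata: dict) -> dict:
    """Drop the ~50KB document blob before building a network response,
    so wiki, node lists, and search results stay lightweight."""
    return {k: v for k, v in metadata.items() if k != SYNTHESIS_DOC_KEY}
```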
Clicking a sentence in the synthesis document now shows a right panel
with:
- The sentence text (stripped of markdown links)
- Related nodes (clickable links to /nodes/{id}) resolved from the
  referenced_nodes lookup
- Closest facts (clickable links to /facts/{id}) with embedding
  similarity percentage

All data comes from the already-loaded document response — no extra
API calls needed. The badge on each sentence shows counts like "2n 5f"
(2 nodes, 5 facts).

Changes:
- Add fact_links (fact_id + distance) to SynthesisSentenceResponse
  in both API and frontend types
- Rewrite SynthesisDocument to show inline detail panel
- Also parse fact links [text](/facts/uuid) in sentence text
- Remove SynthesisFactPanel (no longer needed)
- Remove getSentenceFacts API call (data is already in the response)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Initial document load now only includes sentence text, fact_count,
and node_ids (lightweight). Fact links (fact_id + distance) are
fetched on demand when a sentence is clicked via the
/syntheses/{id}/sentences/{position}/facts endpoint.

This gives a more responsive initial experience — the document
renders immediately, and fact details load only when needed.

- Remove fact_links from SynthesisSentenceResponse (API + types)
- Restore getSentenceFacts API call for lazy loading
- Show loading spinner while fetching facts
- Node list still loads instantly (already in the document response)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add node_ids filter to QdrantFactRepository.search_similar() using
MatchAny on the node_ids payload field. The synthesis pipeline now
only searches facts linked to nodes the agent visited, keeping
results relevant and improving search performance.

- Add node_ids parameter to search_similar() and _build_filter()
- Import MatchAny from qdrant_client.models
- Fix argument order in link_facts_by_embedding call
- Fix r.id -> r.fact_id for FactSearchResult

Note: lefthook pyright check fails on pre-existing errors in
base.py, engine.py, results.py — not from this change.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace plain text sentence rendering with react-markdown that renders
the full definition with proper headings, paragraphs, bold, links,
and lists. Sentence-level interactivity is overlaid via custom markdown
components that match text nodes to sentences from the processed
document data.

- Use react-markdown with custom p, li, a, h1-h3 components
- InteractiveText component wraps text nodes with click handlers
  and badges when they match a sentence with facts/nodes
- Internal links (/nodes/*, /facts/*) render as Next.js Links
- Sentence matching uses first 80 chars as key for fuzzy matching
- Preserves lazy-loading of fact links on sentence click

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace per-text-node sentence matching (which broke on inline links/bold)
with paragraph-level matching. Each paragraph is matched to the sentences
it contains by comparing stripped plain text.

- buildParagraphSentenceMap strips markdown formatting from both
  paragraph and sentence text before matching
- Entire paragraphs are clickable, showing aggregated fact/node counts
- Right panel shows all sentences in the clicked paragraph
- Each sentence is expandable to see its nodes and lazy-loaded facts
- Links within paragraphs remain clickable (stopPropagation)
- Headings render properly without losing content

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
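The paragraph-level matching above can be sketched in a few lines. This is a simplified Python analogue of the TypeScript buildParagraphSentenceMap (the markdown stripping is deliberately crude and the function names are this sketch's own):

```python
import re

def strip_md(text: str) -> str:
    """Crudely remove inline links and emphasis markers, keeping plain text."""
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)  # [text](url) -> text
    return re.sub(r"[*_`#]", "", text).strip()

def build_paragraph_sentence_map(paragraphs: list[str],
                                 sentences: list[str]) -> dict[int, list[int]]:
    """Map each paragraph index to the sentence indices it contains,
    comparing stripped plain text on both sides."""
    mapping: dict[int, list[int]] = {}
    for pi, para in enumerate(paragraphs):
        plain = strip_md(para)
        for si, sent in enumerate(sentences):
            if strip_md(sent) in plain:
                mapping.setdefault(pi, []).append(si)
    return mapping
```

Matching whole paragraphs instead of individual text nodes is what makes inline links and bold runs harmless: they disappear during stripping rather than splitting a sentence across nodes.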
… only

- Headings (h1, h2, h3) are now clickable like paragraphs, showing
  a badge with node count when they match sentences with linked nodes
- Right panel simplified: shows just the list of unique related nodes
  for the clicked section (no sentence cards, no fact loading)
- Removed unused fact loading state and imports (Loader2, FileText,
  SentenceFactLink, getSentenceFacts)
- All section types (p, h1-h3, li) use shared getSectionInfo helper

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
charlie83Gs and others added 23 commits March 27, 2026 08:10
Move the paragraph preview out of the scrollable area into a sticky
shrink-0 section above the nodes/facts. The paragraph stays visible
at the top of the dialog while scrolling through evidence below,
making it easy to compare the text to its linked facts.

- Subtle background (stone-50/50) to visually separate from content
- max-h-24 with overflow-y-auto for long paragraphs
- Show up to 400 chars (was 300)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Downloads an HTML file styled for print-to-PDF with:

Document body:
- Serif typography (Georgia) with academic styling
- Node references as blue superscript numbers [1]
- Fact citations as brown superscript numbers [1] inserted after
  the relevant sentence

Evidence Citations section (page break):
- Disclaimer about LLM fact extraction (atomic facts, pronoun
  replacement, minor differences from source text)
- Each fact: numbered, with type badge, full content, author,
  source title, and source URL

Annex: Node Definitions (page break):
- Each referenced node: numbered, concept name, type, definition
  text (truncated to 800 chars)

Workflow:
1. Loads full document from API
2. Lazy-loads fact details for all sentences with facts
3. Fetches node definitions from the nodes API
4. Generates structured HTML with print-optimized CSS
5. Downloads as .html file (user can print to PDF)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New API endpoint: GET /api/v1/prompts (public, no auth)
Returns all LLM system prompts used in the knowledge pipeline,
grouped by stage: Synthesis, Fact Decomposition, Node Pipeline,
Composite Nodes.

Each prompt includes: id, name, stage, purpose, and full text.
Loaded via lazy imports from worker packages (try/except for
graceful degradation if a package isn't installed).

Export update:
- Fetches prompts from the API during export
- Adds "Prompt Transparency" section after Node Annexes
- Groups prompts by pipeline stage with explanatory intro
- Shows exact system instructions in monospace code blocks
- Truncates prompts >3000 chars with pointer to API endpoint
- Intro explains why prompt transparency matters for research

API dependencies:
- Add kt-worker-synthesis and kt-worker-nodes to api pyproject.toml
  (needed for prompt string imports only, not workflow dispatch)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Section order: Body → Annexes (definitions) → Evidence → Prompts
Facts moved to after annexes since they are much longer.

Definitions:
- Show full definition text (was truncated to 800 chars)
- Render definitions as HTML via markdownToHTML

Links:
- Node references [N] now link to #node-N in annexes section
- Fact citations [N] now link to #fact-N in evidence section
- Remaining markdown links converted to <a> tags
- Anchor IDs added to annex and fact entries
- Styled: no underline by default, underline on hover

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
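One plausible shape for the link rewriting above, sketched with regexes; the real export numbers references as [N] and anchors them as #node-N / #fact-N, whereas this illustration anchors by the id found in the markdown link (an assumption for simplicity):

```python
import re

def link_citations(text: str) -> str:
    """Turn internal markdown links into in-document anchor links
    pointing at the annex and evidence sections."""
    text = re.sub(r"\[([^\]]+)\]\(/nodes/([\w-]+)\)", r'<a href="#node-\2">\1</a>', text)
    text = re.sub(r"\[([^\]]+)\]\(/facts/([\w-]+)\)", r'<a href="#fact-\2">\1</a>', text)
    return text
```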
Prompts change over time — truncating reduces transparency.
Remove the 3000-char limit and max-height/overflow CSS so
the full prompt text renders completely in the export.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The agent makes multiple tool calls per node visit (search, get_node,
get_facts, get_edges, get_dimensions), each taking 2 LangGraph steps.
The old formula (budget * 6 + 20) was too tight — budget 15 gave 110
which hit the limit.

New formula: budget * 20 + 50, minimum 200.
- Budget 15 → 350 (was 110)
- Budget 20 → 450 (was 140)
- Super-synth with 5 sub-syntheses → 150 (was 60)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Users can now select previous synthesis documents to include in
a super-synthesis, leveraging prior research alongside new scoped
investigations.

Changes:
- Add existing_synthesis_ids to SuperSynthesizerInput (Hatchet model)
- Add to CreateSuperSynthesisRequest (API + frontend types)
- API passes existing_synthesis_ids to workflow dispatch
- Workflow appends existing IDs to the sub-synthesis list before
  the combine step
- Frontend: super-synthesis dialog loads existing syntheses with
  checkbox selection, shows count of selected items

The super-synthesizer agent reads all sub-syntheses (new + existing)
and produces a combined meta-synthesis.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
With 8 tools and thorough investigation, the agent easily needs
40-60 LangGraph steps per node visited. Budget 15 now gets 900,
budget 20 gets 1200. Minimum 1000 for any run.

- Synthesizer: budget * 60, min 1000
- Super-synthesizer combine: sub_count * 60, min 1000

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When the user includes existing syntheses in a super-synthesis,
the reconnaissance task now reads their definitions and includes
summaries in the LLM scope planning prompt. The LLM is instructed
to design scopes that COMPLEMENT the existing research, not
duplicate it.

- Read existing synthesis definitions (first 500 chars each)
- Include them under "Existing Research — DO NOT overlap"
- Prompt explicitly says to not duplicate covered topics

Only user-selected syntheses are shown, not all syntheses.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Users can now set the exact number of scopes (sub-investigations)
for a super-synthesis. Set to 0 (default) to let the LLM decide
(3-7), or set a specific number to enforce it.

- Add scope_count to SuperSynthesizerInput, API request, and types
- Reconnaissance prompt uses exact count when provided
- Frontend: number input in super-synthesis dialog with explanation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
System prompt updates:
- New section "CRITICAL: Use Your FULL Exploration Budget" at the top
  of the investigation strategy
- Explicit budget allocation: 20% discovery, 50% neighbor exploration,
  25% deep evidence, 5% writing
- Strong emphasis on get_edges() → visit neighbors pattern
- "Do NOT start writing until you have used most of your budget"

Agent behavior:
- post_llm_hook now detects early finish attempts (<60% budget used)
  and redirects the agent to explore neighbors instead
- If agent tries to stop without tool calls and <70% budget used,
  nudges it to keep exploring with specific instructions
- Explicitly suggests get_edges() and neighbor visiting in nudges

This addresses the pattern where agents were only exploring 50% of
their budget before writing, producing shallow syntheses.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
List page:
- Sub-syntheses that belong to a supersynthesis are nested below
  their parent with a left border indent
- Top-level only shows standalone syntheses and supersyntheses
- Compact card style for nested children
- Supersynthesis cards show "X sub-syntheses" count

Dialog:
- Default scope count changed from 0 to 5

API:
- SynthesisListItem now includes sub_synthesis_ids

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The early-finish prevention (<60% budget) was causing loops where
the agent tried to finish, got blocked, tried again, got blocked...
burning through the recursion limit without making progress.

Changes:
- Recursion limit: budget * 100, minimum 2000 (was budget * 60, min 1000)
  Budget 15 → 2000, Budget 20 → 2000
- Remove finish_synthesis blocking — let the agent write when ready
- Reduce nudge threshold from 70% to 50% — only nudge when clearly
  underexplored, not when the agent has done decent investigation
- System prompt already strongly encourages full budget usage

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
QdrantFactRepository.search_similar() returns FactSearchResult objects
with fact_id/score/fact_type attributes, not raw Qdrant points with
id/payload. Every search_facts call was failing with:
  'FactSearchResult' object has no attribute 'id'

This caused agents to retry search_facts repeatedly, wasting
recursion steps without getting results.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
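The attribute mismatch is easy to reproduce. A sketch of the result type (fields from the commit; the formatter is illustrative):

```python
from dataclasses import dataclass

@dataclass
class FactSearchResult:
    """Repository search results expose fact_id/score/fact_type,
    not the raw Qdrant point's id/payload."""
    fact_id: str
    score: float
    fact_type: str

def format_results(results: list[FactSearchResult]) -> list[str]:
    # The bug: accessing r.id here raised AttributeError on every call.
    return [f"[{r.fact_type}] (score={r.score:.2f}) {r.fact_id}" for r in results]
```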
BaseAgent.agent_node now logs before each LLM call:
  [synthesizer] LLM call: 45/60 messages, ~12k tokens
  (trimmed count / total count, approximate token estimate)

SynthesizerAgent.check_budget_exhaustion now logs on every iteration:
  [synthesizer] budget: 8/15 visited (7 remaining), messages: 45

This helps diagnose:
- Token growth rate per tool call
- Whether messages are being trimmed
- Budget consumption patterns
- Potential nudge loops (many iterations with no budget change)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
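The two log lines quoted above imply formatters roughly like these (function names are this sketch's own; the output format matches the commit's examples):

```python
def llm_call_log(agent: str, trimmed: int, total: int, approx_tokens: int) -> str:
    """Per-LLM-call diagnostic: trimmed/total message counts and a rough token estimate."""
    return f"[{agent}] LLM call: {trimmed}/{total} messages, ~{approx_tokens // 1000}k tokens"

def budget_log(agent: str, visited: int, budget: int, messages: int) -> str:
    """Per-iteration budget diagnostic from check_budget_exhaustion."""
    return (f"[{agent}] budget: {visited}/{budget} visited "
            f"({budget - visited} remaining), messages: {messages}")
```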
Root cause: check_budget_exhaustion and post_llm_hook were causing
infinite loops:

1. check_budget_exhaustion returned a HumanMessage nudge at
   remaining <= 3. With route_nudges_to_agent=True, this routed
   back to agent_node, which called check_budget_exhaustion again,
   which returned another nudge → infinite loop burning 2 messages
   per iteration.

2. post_llm_hook nudged when the LLM responded without tool calls.
   If the LLM kept responding with text (no tools), each nudge
   routed back → another text response → another nudge → loop.

Fixes:
- check_budget_exhaustion: only nudge at budget exhaustion, and
  skip if the last message is already our nudge text. Removed the
  remaining<=3 nudge entirely.
- post_llm_hook: count recent consecutive HumanMessages. If 2+
  nudges already in the last 6 messages, stop nudging and let the
  agent end naturally.

This explains why 12 sub-syntheses hit recursion limit 2000 with
only 9/12 nodes visited — the loops were consuming ~1900 of the
2000 steps with zero progress.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The old query agent (worker-query) handled nudges correctly:
1. route_nudges_to_agent = False — nudges go to END, not back
   to agent, preventing infinite loops
2. Send-once pattern: scan message history for the nudge text
   before sending, so it's never sent twice

Applied to both SynthesizerAgent and SuperSynthesizerAgent:
- route_nudges_to_agent = False (was True, causing loops)
- check_budget_exhaustion: scans for 'BUDGET EXHAUSTED' in
  message history before sending nudge (send-once pattern)
- post_llm_hook: simplified since nudges now go to END

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
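The send-once pattern borrowed from the old query agent reduces to a history scan before sending. A simplified sketch (message history flattened to strings; the real check inspects LangGraph message objects):

```python
BUDGET_NUDGE = "BUDGET EXHAUSTED"

def should_send_nudge(history: list[str], visited: int, budget: int) -> bool:
    """Nudge only at budget exhaustion, and never twice: scanning the
    full history for the nudge text guarantees the send-once property."""
    if visited < budget:
        return False
    return not any(BUDGET_NUDGE in msg for msg in history)
```

Combined with route_nudges_to_agent = False, even a missed nudge cannot loop: the nudge routes to END instead of back into agent_node.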
Without nudge loops, actual step usage is predictable:
~5 tool calls per node × 2 steps each = ~10 steps/node + overhead.

New formula: budget * 30, minimum 500.
- Budget 12 → 500
- Budget 15 → 500
- Budget 20 → 600

Was budget * 100, min 2000 — that masked the nudge loop bug.
Now if the limit is hit, it's a real problem worth investigating.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
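The final recursion-limit formula, exactly as stated above (only the function name is this sketch's own):

```python
def recursion_limit(budget: int) -> int:
    """~5 tool calls per node x 2 LangGraph steps each = ~10 steps/node
    plus overhead, so budget * 30 with a floor of 500."""
    return max(budget * 30, 500)
```

Keeping the limit tight on purpose: if it is ever hit again, that signals a real problem rather than masking a nudge loop.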
The search_facts bug fix (b8d4b79) accidentally removed fact content
and node_ids from the output — the agent could only see fact IDs
and types, losing the ability to read what facts say and which
nodes they connect.

Now fetches fact content from the DB and node links from NodeFact
after the Qdrant similarity search. Output shows:
- [claim] (score=0.85) First 200 chars of fact content...
  nodes: [uuid1, uuid2]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two export buttons in the synthesis header:
- HTML: downloads the formatted document as an .html file
  (single continuous document, good for archival)
- PDF: opens the document in a new window and triggers the
  browser print dialog (Save as PDF option)

Both generate the same HTML with all sections (body, annexes,
evidence citations, prompt transparency).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove unused score_map variable in search_facts tool
- Remove unused FileText and X imports from SynthesisDocument
- Remove unused useState import from NodeDetailPanel
- Auto-format 11 files with ruff format

All ruff check/format and frontend lint pass cleanly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
React 19 types ReactElement.props as {} not any, so .children
doesn't exist. Use any cast for this runtime introspection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@charlie83Gs charlie83Gs merged commit e10059a into main Mar 27, 2026
16 checks passed
@charlie83Gs charlie83Gs deleted the synthesis-agents-refactor branch March 27, 2026 16:47
charlie83Gs added a commit that referenced this pull request Mar 27, 2026
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
charlie83Gs added a commit that referenced this pull request Mar 27, 2026
… removal

The conversations module was removed in PR #39 but research.py and
export.py still imported _conversation_to_response from it, causing a
500 error on POST /api/v1/research/bottom-up/prepare.

Inline the helper functions in research.py and update export.py to
import from research instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>