feat: query planning + multi-hop decomposition (Phase 2.1 + 2.2) #17
Adds pkg/retrieval/plan.go: one LLM call before retrieval that returns a structured Plan (intent, entities, expected_doc_areas, is_multi_hop, sub_questions). Cached on a per-(query, model) basis in an in-process LRU (default 128 entries) so repeat questions don't burn budget. Reuses the runSelectionWithRetry pattern from single_pass.go: persistent JSON-parse failures degrade gracefully to a nil plan + nil error so the caller continues with the original query. Transport errors still bubble. The planning prompt biases conservatively on is_multi_hop — only flags queries that genuinely need decomposition into distinct sub-retrieval passes. The decomposer further self-corrects an is_multi_hop=true with empty sub_questions back to false at parse time.
Adds pkg/retrieval/decompose.go: when a Plan has IsMultiHop=true and non-empty SubQuestions, runs the wrapped Strategy once per sub-question and returns the union of selected IDs in stable first-seen order. Each sub-question is a tighter prompt than the compound original — the selection LLM gets one thing to reason about instead of a multi-part question. Fall-through is transparent: nil plan, IsMultiHop=false, or empty SubQuestions → delegate to Strategy.Select with the original query unchanged. Callers can wire the decomposer unconditionally. Aggregates Usage across sub-questions when the wrapped Strategy implements CostStrategy. Non-CostStrategy fall-back works too (Usage is zero in that case; selection behaviour is identical). Error on any sub-question short-circuits and returns the partial Usage so retrieval bugs aren't silently swallowed by the multi-hop loop.
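The union-in-first-seen-order and fall-through rules can be sketched as below. This is an illustrative sketch only: the Strategy interface here is a simplified stand-in (the real signature is not shown in this description), Usage aggregation is omitted, and selectDecomposed and mapStrategy are hypothetical names.

```go
package main

import "fmt"

// Strategy is a simplified stand-in for the PR's interface; the real method
// signature (context, usage, and so on) is an unknown here.
type Strategy interface {
	Select(query string) ([]string, error)
}

// Plan carries only the two fields the decomposer consults.
type Plan struct {
	IsMultiHop   bool
	SubQuestions []string
}

// selectDecomposed (hypothetical) runs s once per sub-question and returns
// the union of selected IDs in stable first-seen order. A nil plan, a
// non-multi-hop plan, or an empty sub-question list falls through to a
// single Select on the original query.
func selectDecomposed(s Strategy, plan *Plan, query string) ([]string, error) {
	if plan == nil || !plan.IsMultiHop || len(plan.SubQuestions) == 0 {
		return s.Select(query)
	}
	seen := make(map[string]struct{})
	var out []string
	for _, sq := range plan.SubQuestions {
		ids, err := s.Select(sq)
		if err != nil {
			return nil, err // short-circuit on the first failing sub-question
		}
		for _, id := range ids {
			if _, dup := seen[id]; dup {
				continue
			}
			seen[id] = struct{}{}
			out = append(out, id)
		}
	}
	return out, nil
}

// mapStrategy is a toy Strategy that looks the query up in a map.
type mapStrategy map[string][]string

func (m mapStrategy) Select(q string) ([]string, error) { return m[q], nil }

func main() {
	s := mapStrategy{
		"what is A?": {"d1", "d2"},
		"what is B?": {"d2", "d3"},
	}
	plan := &Plan{IsMultiHop: true, SubQuestions: []string{"what is A?", "what is B?"}}
	ids, err := selectDecomposed(s, plan, "how do A and B differ?")
	if err != nil {
		panic(err)
	}
	fmt.Println(ids) // [d1 d2 d3]
}
```

Because the fall-through path is a plain delegation, a wrapper of this shape can be installed unconditionally, as the description suggests.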
Server-side opt-in for Phase 2.1 + 2.2. New PlanningBlock under retrieval (enabled, model, cache_size, decompose; env: VLE_RETRIEVAL_PLANNING_*). Default disabled at both config and per-request levels, so existing callers see no behaviour change.

Wiring:
- api.Deps gains Planner + Planning fields. The body-level enable_planning field (pointer-bool to disambiguate absent from explicit-false) overrides the config block.
- handleQuery / handleAnswer route through a small set of helpers (runPlanner, runSelection, runSelectionWithUsage) that fold the Planner output into selection. Multi-hop plans go through a Decomposer wrapping the active Strategy when planning.decompose is true.
- /v1/answer's synthesis prompt grows a short "Planner notes" block (intent, entities, expected doc areas, sub-questions) when a plan is present, so the model reasons with the same understanding the retrieval pipeline used.
- Both endpoints surface the sanitised Plan in the response under "plan" (omitempty) when planning ran.
- cmd/engine instantiates a Planner whenever LLM is configured, so per-request opt-in still works even with planning.enabled=false.
- Planner transport errors are LOGGED but not propagated: a planner blip cannot 500 an otherwise-working retrieval request.

OpenAPI: Plan schema added; QueryRequest/AnswerRequest get enable_planning; QueryResponse/AnswerResponse get plan ($ref Plan, omitempty).

config.example.yaml gets a documented retrieval.planning block.

Tests: planning defaults + env-override coverage added to pkg/config/config_test.go. All existing tests still pass.
Caution: Review failed. The pull request is closed.
Walkthrough
This PR adds a query planning system: opt-in, LLM-based query planning that runs before retrieval. Clients can request structured plans (intent, entities, document areas, multi-hop sub-questions), which drive selection via optional multi-hop decomposition and inform answer synthesis. The planner caches per (query, model) with deduplication for concurrent identical queries, and the decomposer aggregates retrieval results across sub-questions in stable first-seen order.
Estimated code review effort: 4 (Complex), ~60 minutes
Summary
- `pkg/retrieval/plan.go`: one short LLM call before retrieval that returns a structured `Plan` (intent, entities, expected document areas, multi-hop flag, sub-questions). Cached in a per-process LRU keyed on (query, model) so repeat questions don't burn budget; default capacity 128.
- `pkg/retrieval/decompose.go`: when the plan is multi-hop, runs the wrapped Strategy once per sub-question and unions the per-sub-question selections in stable first-seen order. Strategy-agnostic: composes on top of single-pass, chunked-tree, agentic, or the cached wrapper. Falls through transparently when the plan is missing, non-multi-hop, or has no sub-questions.
- Opt-in on `/v1/query` and `/v1/answer` via a new `enable_planning` request-body field (per-request override) and a `retrieval.planning` config block (server-side default). The synthesis prompt grows a "Planner notes" section when a plan is present so synthesis sees the same structured understanding retrieval used. Responses surface the sanitised plan under a top-level `plan` key (omitempty).

Design rationale
- `is_multi_hop` is conservative by design. The prompt biases toward false: a single question that mentions two things is not multi-hop; a compound question that requires combining two distinct retrievals is. Over-firing here would double LLM cost without quality wins. The parser also self-corrects `is_multi_hop=true` with empty `sub_questions` back to false.
- A `sync.Mutex` serialises writes for the same key so concurrent identical queries fold to one LLM call, and the underlying `cache.LRU` is itself mutex-guarded for atomicity. Cache failures (e.g. zero-capacity LRU) are silent: the next call simply re-issues the LLM call. Plans are returned as defensive copies so a caller mutating `Entities` or `SubQuestions` can't corrupt the cached entry.
- Persistent JSON-parse failures degrade to a nil plan (the `runSelectionWithRetry` pattern in `single_pass.go`). Transport errors are logged but not propagated to the HTTP layer: a planner blip should not 500 an otherwise-working retrieval request.
- `cmd/engine` instantiates a Planner whenever an LLM is configured, so per-request opt-in still works even with `planning.enabled: false`.

Risk envelope
- Default disabled at both config (`retrieval.planning.enabled: false`) and per-request (`enable_planning` absent) levels. Existing callers see no behaviour change, no extra LLM calls, no extra latency.
- … `Strategy`/`CostStrategy` interface.

Test plan
- `go build ./...` clean
- `go vet ./...` clean
- `go test ./...` all green (planner: 11 tests, decomposer: 9 tests, config: planning defaults + env override)
- `Retrieval.Planning` defaults (Enabled=false, CacheSize=128, Decompose=true), env overrides (`VLE_RETRIEVAL_PLANNING_*`)
- OpenAPI: `Plan` schema added; both Query/Answer request schemas grow `enable_planning`; both response schemas grow `plan` ($ref, omitempty)

Opt-in instructions
Per-request:
Server-wide (`config.yaml`):

Or env:
DO NOT MERGE — review only.
Summary by CodeRabbit
Release Notes
New Features
- `enable_planning` option

Configuration
- `retrieval.planning` configuration block with settings to enable planning, customize cache behavior, and control multi-hop decomposition