feat: query planning + multi-hop decomposition (Phase 2.1 + 2.2) by hallelx2 · Pull Request #17 · hallelx2/vectorless-engine

hallelx2 · 2026-05-27T01:01:06Z

Summary

Adds a Phase 2.1 query planner (pkg/retrieval/plan.go): one short LLM call before retrieval that returns a structured Plan (intent, entities, expected document areas, multi-hop flag, sub-questions). Cached in a per-process LRU keyed on (query, model) so repeat questions don't burn budget; default capacity 128.
Adds a Phase 2.2 multi-hop decomposer (pkg/retrieval/decompose.go): when the plan is multi-hop, runs the wrapped Strategy once per sub-question and unions the per-sub-question selections in stable first-seen order. Strategy-agnostic — composes on top of single-pass, chunked-tree, agentic, or the cached wrapper. Falls through transparently when the plan is missing, non-multi-hop, or has no sub-questions.
Wires both into /v1/query and /v1/answer via a new enable_planning request-body field (per-request override) and a retrieval.planning config block (server-side default). The synthesis prompt grows a "Planner notes" section when a plan is present so synthesis sees the same structured understanding retrieval used. Responses surface the sanitised plan under a top-level plan key (omitempty).

Design rationale

Planning is its own short LLM call. Folding planner intent into the selection prompt is left for Phase 2.5; today the selection model still sees the original query unchanged. This keeps the change additive — a regression in the planner cannot degrade selection quality.
is_multi_hop is conservative by design. The prompt biases toward false: a single question that mentions two things is not multi-hop; a compound question that requires combining two distinct retrievals is. Over-firing here would double LLM cost without quality wins. The parser also self-corrects is_multi_hop=true with empty sub_questions back to false.
Cache miss must never block on a writer race. A sync.Mutex serialises writes for the same key so concurrent identical queries fold into one LLM call, and the underlying cache.LRU is itself mutex-guarded for atomicity. Cache failures (e.g. a zero-capacity LRU) are silent: the next call simply re-issues the LLM call. Plans are returned as defensive copies so a caller mutating Entities or SubQuestions can't corrupt the cached entry.
Planner failures degrade gracefully. Persistent JSON-parse failures → nil plan, retrieval continues with the original query (same pattern as runSelectionWithRetry in single_pass.go). Transport errors are returned by the planner but only logged by the handlers, not propagated to the HTTP layer: a planner blip should not 500 an otherwise-working retrieval request.
Per-request opt-in beats config. The body field is a pointer-bool so we can distinguish "absent" (fall back to config) from "explicit false" (force off). The Planner is instantiated whenever an LLM is configured, so opt-in callers work even with planning.enabled: false.
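
The caching rules above can be sketched roughly as follows. This is a simplified illustration, not the code in plan.go: a plain map stands in for cache.LRU (so there is no eviction), one mutex covers both the lookup and the miss path, and the names planCache, cacheKey, and getOrPlan are hypothetical.

```go
package main

import "sync"

// Plan mirrors some of the fields named in the PR description; the real
// struct in pkg/retrieval/plan.go may differ.
type Plan struct {
	Intent       string
	Entities     []string
	IsMultiHop   bool
	SubQuestions []string
}

// clone returns a deep copy so callers cannot mutate a cached entry.
func (p *Plan) clone() *Plan {
	if p == nil {
		return nil
	}
	c := *p
	c.Entities = append([]string(nil), p.Entities...)
	c.SubQuestions = append([]string(nil), p.SubQuestions...)
	return &c
}

// cacheKey is a hypothetical (query, model) key.
type cacheKey struct{ query, model string }

// planCache is a stand-in for the planner cache: a map without eviction,
// guarded by a single mutex.
type planCache struct {
	mu      sync.Mutex
	entries map[cacheKey]*Plan
}

func newPlanCache() *planCache {
	return &planCache{entries: make(map[cacheKey]*Plan)}
}

// getOrPlan returns a copy of the cached plan, or calls plan once to
// produce it. Holding the mutex across the call folds concurrent identical
// queries into one LLM call; in this sketch it also serialises misses for
// different keys, which a real implementation may avoid.
func (c *planCache) getOrPlan(query, model string, plan func() (*Plan, error)) (*Plan, error) {
	c.mu.Lock()
	defer c.mu.Unlock()

	k := cacheKey{query, model}
	if p, ok := c.entries[k]; ok {
		return p.clone(), nil
	}
	p, err := plan()
	if err != nil || p == nil {
		return nil, err // failures are not cached; the next call retries
	}
	c.entries[k] = p.clone()
	return p.clone(), nil
}
```

Storing a clone and returning a clone means neither the producer of the plan nor any later reader holds a pointer into the cache.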

Risk envelope

Default disabled at both config (retrieval.planning.enabled: false) and per-request (enable_planning absent) levels. Existing callers see no behaviour change, no extra LLM calls, no extra latency.
Planner errors do not surface. Transport / parse failures → continue with original query.
Decomposer short-circuits on sub-question errors so retrieval bugs aren't silently masked by the multi-hop loop.
No changes to existing strategies. Planner and decomposer are pure additions composing on top of the existing Strategy / CostStrategy interface.
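
A minimal sketch of the "planner errors do not surface" rule, assuming a handler-side helper shaped like the one below. The helper name runPlanner appears in this PR, but its signature here, the planFunc type, and errPlannerBlip are hypothetical stand-ins for illustration.

```go
package main

import (
	"errors"
	"log"
)

// Plan is reduced to one field for this sketch.
type Plan struct{ Intent string }

// planFunc is a hypothetical stand-in for the Planner call.
type planFunc func(query string) (*Plan, error)

// errPlannerBlip is an example transport error used to exercise the helper.
var errPlannerBlip = errors.New("planner: transport error")

// runPlanner returns a plan only when planning is enabled and the planner
// succeeds. Any failure is logged and turned into a nil plan, so the caller
// continues with the original query instead of failing the request.
func runPlanner(enabled bool, plan planFunc, query string) *Plan {
	if !enabled || plan == nil || query == "" {
		return nil
	}
	p, err := plan(query)
	if err != nil {
		log.Printf("planner failed, continuing without a plan: %v", err)
		return nil
	}
	return p
}
```

With this shape, a disabled flag never invokes the planner at all, which matches the "no extra LLM calls, no extra latency" claim for existing callers.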

Test plan

go build ./... clean
go vet ./... clean
go test ./... all green (planner: 11 tests, decomposer: 9 tests, config: planning defaults + env override)
Planner: cache hit/miss, concurrent same-query dedup, retry-on-bad-JSON degrades to nil plan, retry-then-success, transport-error propagation, empty-query no-op, nil-planner safety, defensive cache copy
Decomposer: nil/non-multi-hop/empty-subs fall-through, per-sub-question dispatch order, union dedup with overlap, error short-circuit with partial usage, non-CostStrategy compatibility, end-to-end Planner+Decomposer over real SinglePass
Config: Retrieval.Planning defaults (Enabled=false, CacheSize=128, Decompose=true), env overrides (VLE_RETRIEVAL_PLANNING_*)
OpenAPI: Plan schema added; both Query/Answer request schemas grow enable_planning; both response schemas grow plan ($ref, omitempty)
Manual end-to-end with a live LLM (deferred to staging — opt-in flag means risk of regression is bounded)

Opt-in instructions

Per-request:

POST /v1/answer
{"document_id":"...", "query":"...", "enable_planning": true}
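
For illustration, the pointer-bool override could be resolved along these lines. The struct is a hypothetical subset of the request body, and the signature of planningEnabled is an assumption; only the field name enable_planning and the absent-versus-explicit-false behaviour come from this PR.

```go
package main

import "encoding/json"

// answerRequest is a hypothetical subset of the /v1/answer request body.
type answerRequest struct {
	DocumentID     string `json:"document_id"`
	Query          string `json:"query"`
	EnablePlanning *bool  `json:"enable_planning,omitempty"`
}

// decodeAnswerRequest parses a JSON request body.
func decodeAnswerRequest(body []byte) (answerRequest, error) {
	var req answerRequest
	err := json.Unmarshal(body, &req)
	return req, err
}

// planningEnabled resolves the effective setting: an explicit per-request
// value wins; an absent field (nil pointer) falls back to the config default.
func planningEnabled(override *bool, configDefault bool) bool {
	if override != nil {
		return *override
	}
	return configDefault
}
```

A plain bool field could not tell "enable_planning": false apart from a body that omits the field, which is why the pointer is needed.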

Server-wide (config.yaml):

retrieval:
  planning:
    enabled: true
    model: "gemini-2.0-flash"   # cheap/fast model for the short planning call
    cache_size: 128
    decompose: true

Or env:

VLE_RETRIEVAL_PLANNING_ENABLED=true
VLE_RETRIEVAL_PLANNING_MODEL=gemini-2.0-flash
VLE_RETRIEVAL_PLANNING_DECOMPOSE=true

DO NOT MERGE — review only.

Summary by CodeRabbit

Release Notes

New Features
- Optional query planning: Enable per-request structured plan generation before retrieval via new enable_planning option
- Multi-hop query decomposition for complex queries that break down into sub-questions
- Plan details now included in API responses when planning is enabled
Configuration
- New retrieval.planning configuration block with settings to enable planning, customize cache behavior, and control multi-hop decomposition

Adds pkg/retrieval/plan.go: one LLM call before retrieval that returns a structured Plan (intent, entities, expected_doc_areas, is_multi_hop, sub_questions). Cached on a per-(query, model) basis in an in-process LRU (default 128 entries) so repeat questions don't burn budget. Reuses the runSelectionWithRetry pattern from single_pass.go: persistent JSON-parse failures degrade gracefully to a nil plan + nil error so the caller continues with the original query. Transport errors still bubble up to the caller. The planning prompt biases conservatively on is_multi_hop: it only flags queries that genuinely need decomposition into distinct sub-retrieval passes. The parser further self-corrects an is_multi_hop=true with empty sub_questions back to false at parse time.
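
As a rough sketch of the parse-time self-correction described above: the JSON field names follow the commit message, but the function name parsePlan and the "take the outermost braces" approach to tolerating surrounding prose or code fences are assumptions, not necessarily how ParsePlan is written.

```go
package main

import (
	"encoding/json"
	"errors"
	"strings"
)

// Plan uses the JSON field names listed in the commit message.
type Plan struct {
	Intent           string   `json:"intent"`
	Entities         []string `json:"entities"`
	ExpectedDocAreas []string `json:"expected_doc_areas"`
	IsMultiHop       bool     `json:"is_multi_hop"`
	SubQuestions     []string `json:"sub_questions"`
}

// parsePlan decodes the first-to-last brace span of raw, so text or code
// fences around the JSON object are ignored.
func parsePlan(raw string) (*Plan, error) {
	start := strings.Index(raw, "{")
	end := strings.LastIndex(raw, "}")
	if start < 0 || end < start {
		return nil, errors.New("no JSON object in planner output")
	}
	var p Plan
	if err := json.Unmarshal([]byte(raw[start:end+1]), &p); err != nil {
		return nil, err
	}
	// Self-correct: a multi-hop flag without sub-questions cannot be
	// decomposed, so treat the plan as single-hop.
	if p.IsMultiHop && len(p.SubQuestions) == 0 {
		p.IsMultiHop = false
	}
	return &p, nil
}
```

A parse error from this function is what a retry loop would count toward the "persistent JSON-parse failure" case.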

Adds pkg/retrieval/decompose.go: when a Plan has IsMultiHop=true and non-empty SubQuestions, runs the wrapped Strategy once per sub-question and returns the union of selected IDs in stable first-seen order. Each sub-question is a tighter prompt than the compound original — the selection LLM gets one thing to reason about instead of a multi-part question. Fall-through is transparent: nil plan, IsMultiHop=false, or empty SubQuestions → delegate to Strategy.Select with the original query unchanged. Callers can wire the decomposer unconditionally. Aggregates Usage across sub-questions when the wrapped Strategy implements CostStrategy. Non-CostStrategy fall-back works too (Usage is zero in that case; selection behaviour is identical). Error on any sub-question short-circuits and returns the partial Usage so retrieval bugs aren't silently swallowed by the multi-hop loop.
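
The decomposition loop can be sketched as below. The interfaces are deliberately simplified and hypothetical: the real Strategy presumably takes a context and document arguments, and the method name SelectWithUsage, the Usage fields, and the StrategyFunc adapter are invented for this example.

```go
package main

// Usage is a hypothetical token-usage record.
type Usage struct{ InputTokens, OutputTokens int }

// Strategy is a simplified stand-in for the retrieval Strategy interface.
type Strategy interface {
	Select(query string) ([]string, error)
}

// CostStrategy is a stand-in for strategies that also report usage.
type CostStrategy interface {
	Strategy
	SelectWithUsage(query string) ([]string, Usage, error)
}

// StrategyFunc adapts a plain function to Strategy.
type StrategyFunc func(query string) ([]string, error)

func (f StrategyFunc) Select(query string) ([]string, error) { return f(query) }

// Plan carries only the fields the decomposer reads.
type Plan struct {
	IsMultiHop   bool
	SubQuestions []string
}

// Decomposer wraps a Strategy and fans out over sub-questions.
type Decomposer struct{ Inner Strategy }

// selectOne calls the wrapped strategy, reporting usage when available.
func (d Decomposer) selectOne(query string) ([]string, Usage, error) {
	if cs, ok := d.Inner.(CostStrategy); ok {
		return cs.SelectWithUsage(query)
	}
	ids, err := d.Inner.Select(query)
	return ids, Usage{}, err
}

// Select falls through for nil, non-multi-hop, or empty plans. Otherwise it
// runs one selection per sub-question and unions IDs in first-seen order.
func (d Decomposer) Select(query string, plan *Plan) ([]string, Usage, error) {
	if plan == nil || !plan.IsMultiHop || len(plan.SubQuestions) == 0 {
		return d.selectOne(query)
	}
	var total Usage
	var out []string
	seen := make(map[string]bool)
	for _, sq := range plan.SubQuestions {
		ids, u, err := d.selectOne(sq)
		total.InputTokens += u.InputTokens
		total.OutputTokens += u.OutputTokens
		if err != nil {
			return nil, total, err // short-circuit, keeping partial usage
		}
		for _, id := range ids {
			if !seen[id] {
				seen[id] = true
				out = append(out, id)
			}
		}
	}
	return out, total, nil
}
```

Because the fall-through path calls the wrapped strategy with the original query, wrapping a strategy in this type changes nothing for single-hop plans.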

Server-side opt-in for Phase 2.1 + 2.2. New PlanningBlock under retrieval (enabled, model, cache_size, decompose; env: VLE_RETRIEVAL_PLANNING_*). Default disabled at both config and per-request levels, so existing callers see no behaviour change.

Wiring:
- api.Deps gains Planner + Planning fields. The body-level enable_planning field (pointer-bool to disambiguate absent from explicit-false) overrides the config block.
- handleQuery / handleAnswer route through a small set of helpers (runPlanner, runSelection, runSelectionWithUsage) that fold the Planner output into selection. Multi-hop plans go through a Decomposer wrapping the active Strategy when planning.decompose is true.
- /v1/answer's synthesis prompt grows a short "Planner notes" block (intent, entities, expected doc areas, sub-questions) when a plan is present, so the model reasons with the same understanding the retrieval pipeline used.
- Both endpoints surface the sanitised Plan in the response under "plan" (omitempty) when planning ran.
- cmd/engine instantiates a Planner whenever LLM is configured, so per-request opt-in still works even with planning.enabled=false.
- Planner transport errors are LOGGED but not propagated: a planner blip cannot 500 an otherwise-working retrieval request.

OpenAPI: Plan schema added; QueryRequest/AnswerRequest get enable_planning; QueryResponse/AnswerResponse get plan ($ref Plan, omitempty).

config.example.yaml gets a documented retrieval.planning block.

Tests: planning defaults + env-override coverage added to pkg/config/config_test.go. All existing tests still pass.
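
To make the "Planner notes" block concrete, here is one way such a prompt section might be rendered. The exact wording and layout used by writePlanHints are not shown in this PR, so the function name planHints and the output format below are purely illustrative assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// Plan holds the fields the commit message says appear in the notes.
type Plan struct {
	Intent           string
	Entities         []string
	ExpectedDocAreas []string
	SubQuestions     []string
}

// planHints renders a short "Planner notes" section for the synthesis
// prompt. It returns an empty string when no plan is present, so the prompt
// is unchanged for requests that did not opt in.
func planHints(p *Plan) string {
	if p == nil {
		return ""
	}
	var b strings.Builder
	b.WriteString("Planner notes:\n")
	if p.Intent != "" {
		fmt.Fprintf(&b, "- intent: %s\n", p.Intent)
	}
	if len(p.Entities) > 0 {
		fmt.Fprintf(&b, "- entities: %s\n", strings.Join(p.Entities, ", "))
	}
	if len(p.ExpectedDocAreas) > 0 {
		fmt.Fprintf(&b, "- expected doc areas: %s\n", strings.Join(p.ExpectedDocAreas, ", "))
	}
	for i, sq := range p.SubQuestions {
		fmt.Fprintf(&b, "- sub-question %d: %s\n", i+1, sq)
	}
	return b.String()
}
```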

coderabbitai · 2026-05-27T01:01:19Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8432ae0c-bb78-4ae6-a427-ca7bb44e8f14

📥 Commits

Reviewing files that changed from the base of the PR and between d92db83 and bb32d4e.

📒 Files selected for processing (10)

cmd/engine/main.go
config.example.yaml
internal/api/server.go
openapi.yaml
pkg/config/config.go
pkg/config/config_test.go
pkg/retrieval/decompose.go
pkg/retrieval/decompose_test.go
pkg/retrieval/plan.go
pkg/retrieval/plan_test.go

📝 Walkthrough

This PR adds an opt-in, LLM-based query planning step that runs before retrieval. Clients can request structured plans (intent, entities, document areas, multi-hop sub-questions), which drive selection via optional multi-hop decomposition and inform answer synthesis. The planner caches per (query, model) with deduplication for concurrent identical queries, and the decomposer aggregates retrieval results across sub-questions in stable first-seen order.

Changes

Query Planning Feature

| Layer / File(s) | Summary |
| --- | --- |
| Configuration, data models, and API contracts: `pkg/config/config.go`, `config.example.yaml`, `openapi.yaml` | Introduces `PlanningBlock` with enabled/model/cache_size/decompose fields, `Plan` struct with intent/entities/document areas/multi-hop sub-questions, and extends QueryRequest/QueryResponse/AnswerRequest/AnswerResponse with `enable_planning` boolean and `plan` field in responses. |
| Planner LLM integration and caching: `pkg/retrieval/plan.go`, `pkg/retrieval/plan_test.go` | Implements `Planner` with LRU cache per (query, model), per-planner mutex for concurrent deduplication, JSON-mode LLM requests with retry on parse failures, robust `ParsePlan` that tolerates code fences/prose, and defensive clones on cache hit. Tests validate happy path, cache hits/misses, concurrency deduplication, retry behavior, transport errors, empty queries, and cache immutability. |
| Multi-hop decomposition for plan sub-questions: `pkg/retrieval/decompose.go`, `pkg/retrieval/decompose_test.go` | Introduces `Decomposer` wrapping a `Strategy`, executing it per `Plan.SubQuestions` with stable SectionID deduplication and Usage aggregation; falls through to single strategy call when plan is nil/non-multihop/empty. Tests verify fallthrough, per-sub-question dispatch, union deduplication, error short-circuiting, non-cost-strategy behavior, nil-strategy defense, and end-to-end planner+decomposer wiring. |
| API endpoint wiring for planning and selection: `internal/api/server.go` | Updates `Deps` with `Planner` and `Planning` config, adds `enable_planning?: *bool` to request parsing, replaces direct `Strategy.Select` calls with plan-aware `runSelection` and `runSelectionWithUsage` that optionally decompose multi-hop plans, includes plan in responses, passes plan to `synthesiseAnswer` for prompt augmentation, and provides helper functions (`planningEnabled`, `runPlanner`, `runSelection`, `runSelectionWithUsage`, `shouldDecompose`, `writePlanHints`). |
| Application initialization and dependency injection: `cmd/engine/main.go` | Initializes `Planner` in `run()` with configured/default planning model, sets cache size from config, logs planning enablement, and wires `Planner` and `Planning` config into `api.Deps`. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 A planner arrives with a hop and a gleam,
Breaking queries to sub-questions, a clever retrieval scheme,
Cached plans speed the way, LLM calls unified,
Multi-hop decomposition—the search dream's amplified!

sourcery-ai

Sorry @hallelx2, you have reached your weekly rate limit of 500000 diff characters.

Please try again later or upgrade to continue using Sourcery

hallelx2 added 3 commits May 27, 2026 01:52

Copilot AI review requested due to automatic review settings May 27, 2026 01:01

Copilot started reviewing on behalf of hallelx2 May 27, 2026 01:01

hallelx2 merged commit 54ae0d5 into main May 27, 2026
4 of 8 checks passed

hallelx2 deleted the feat/plan-and-decompose branch May 27, 2026 01:02

sourcery-ai Bot reviewed May 27, 2026

hallelx2 review requested due to automatic review settings May 27, 2026 01:24
