pkg/retrieval: add agentic-navigation strategy (Phase 1.4) by hallelx2 · Pull Request #13 · hallelx2/vectorless-engine

hallelx2 · 2026-05-26T21:54:07Z

Summary

Adds a third retrieval strategy, agentic, alongside single-pass and chunked-tree. Instead of feeding the whole tree to the model in one shot or fanning out across slices in parallel, the agentic strategy lets the model navigate the tree iteratively: it emits a JSON action each turn (outline, expand, read, done) and we feed the observation back as the next user turn until the model picks a final set of section IDs or hits MaxHops.

This trades sequential latency for the ability to:

handle trees that don't fit any single context window (no MaxTokensPerCall ceiling)
read full bodies on demand, so a narrow query doesn't pay for the whole tree
adapt traversal depth per-query (a query that's obviously about chapter 3 doesn't burn tokens reading chapters 1-2)

What changed

pkg/retrieval/agentic.go — new AgenticStrategy implementing both Strategy and CostStrategy. Includes ContentFetcher interface (storage adapter), tolerant ParseAction JSON decoder, and the full navigation loop with graceful degradation on malformed actions.
pkg/retrieval/strategy.go — Result now carries HopsTaken int (json hops_taken,omitempty). Existing strategies set it to 1.
pkg/config/config.go — new RetrievalConfig.Agentic{ MaxHops int; Model string } block; default MaxHops = 6; env overrides VLE_RETRIEVAL_AGENTIC_MAX_HOPS and VLE_RETRIEVAL_AGENTIC_MODEL; Validate() accepts "agentic" and rejects negative MaxHops.
cmd/engine/main.go + cmd/server/main.go — buildStrategy now takes the storage.Storage and adds an "agentic" case. A tiny storageFetcher adapter satisfies ContentFetcher by reading the storage object into a []byte.

Tests

pkg/retrieval/agentic_test.go adds a scriptedLLM (turn-by-turn canned replies) and covers:

TestAgenticHappyPath — expand → read → done; asserts ids, hops, reasoning, and that the read observation actually carries body content
TestAgenticExpandExpandDone — covers the plan's manual trace (two-level navigation without read)
TestAgenticHopCap — model never emits done; strategy stops at MaxHops with empty selection
TestAgenticBadJSONGraceful — every reply is prose, never JSON; returns nil error + empty selection
TestAgenticFiltersUnknownPicks — fabricated IDs dropped, picks deduplicated
TestAgenticEmptyTree — early-return guard, 0 LLM calls
TestAgenticReadFallbackWhenNoContent — read on an internal section falls back to its summary
TestParseAction — code-fence + prose-before tolerance, validates required action field

Full suite: go test ./pkg/retrieval/... passes.

Tradeoffs

Latency vs quality: the loop is N sequential LLM calls (1 outline + Ks expands + Ks reads + 1 done). On Sonnet that's about N × 1-3s. For small documents, single-pass is faster. For large ones where chunked-tree would have to read everything in parallel anyway, agentic typically wins on both quality and total tokens.
Context window vs hops: unlike chunked-tree which needs MaxTokensPerCall, agentic just needs enough room for one observation + the running message history. We bound this implicitly via MaxHops (default 6); each read is the heaviest observation.
Native tool calling: llmgate v0.2.0 declares ToolDef/ToolCall but provider adapters don't populate ToolCalls yet. We use a JSON-action text protocol instead so the strategy works across all current providers. The surface stays the same once llmgate wires native tool calling — ParseAction becomes an internal fast-path, and the loop body is unchanged.
Failure modes: a model that won't emit valid JSON degrades gracefully (empty selection, MaxHops capped, no 500), matching the existing pattern in runSelectionWithRetry. A model that fabricates IDs has them dropped by filterToTreeIDs.

Summary by Sourcery

Add an agentic, tool-like retrieval strategy that incrementally navigates large document trees, hardens existing retrieval strategies against malformed LLM JSON, and enhances PDF parsing and summarization to improve retrieval quality on complex filings and long sections.

New Features:

Introduce agentic navigation retrieval strategy that iteratively explores document trees via outline/expand/read/done actions and tracks hops taken

Bug Fixes:

Make single-pass and chunked-tree retrieval strategies robust to non-JSON LLM responses by retrying and degrading gracefully to empty selections instead of surfacing errors

Enhancements:

Improve PDF parser heading detection by using bold font heuristics, handling letter-spaced text, and relaxing heading length limits
Automatically split oversized leaf sections into smaller, titled sub-sections to improve downstream summarization and retrieval quality
Refine section summarization prompts to produce retrieval-optimized, longer single-sentence summaries with richer concrete details
Track the number of LLM turns (hops) taken by retrieval strategies and expose it in retrieval results

Tests:

Add comprehensive tests for the agentic retrieval strategy, including happy paths, hop caps, malformed JSON handling, ID filtering, and empty trees
Add tests covering PDF leaf chunking behavior and chunk title derivation
Extend retrieval tests to cover graceful handling of non-JSON responses in single-pass selection

Summary by CodeRabbit

New Features
- Added agentic retrieval strategy enabling intelligent multi-turn LLM interactions to navigate and retrieve content from document hierarchies.
- Configurable agentic mode via environment variables and YAML with adjustable maximum hops and model selection.
API Enhancements
- Response objects now include hop count metrics showing the number of LLM turns consumed during retrieval operations.

A new Strategy/CostStrategy implementation that lets the LLM explore the tree iteratively via JSON action turns (outline, expand, read, done) instead of being handed the whole view at once. Useful for trees too large to fit any single context window or where read-on- demand keeps token cost low for narrow queries. Result gains a HopsTaken field so callers can distinguish single-shot strategies from iterative ones; single-pass and chunked-tree set it to 1. Wiring: - config.RetrievalConfig.Agentic{ MaxHops, Model }; defaults to 6 hops, falls back to budget.ModelName when Model is empty - VLE_RETRIEVAL_AGENTIC_MAX_HOPS / VLE_RETRIEVAL_AGENTIC_MODEL env overrides - buildStrategy in cmd/engine + cmd/server now takes the storage so the agentic strategy can satisfy 'read' via a small ContentFetcher adapter Protocol: JSON-action text protocol rather than llmgate's Tools field; llmgate v0.2.0 declares ToolDef/ToolCall as scaffolding but provider adapters don't yet populate ToolCalls. The strategy stays forward-compatible — when llmgate wires native tool calling the surface here doesn't change. Tests use a scriptedLLM that returns canned per-turn replies and cover: happy-path expand/read/done, expand/expand/done (matches the plan's manual trace), MaxHops cap, graceful degradation on persistent non-JSON output, fabricated-ID filtering, empty tree guard, summary fallback for 'read' on internal nodes, and the tolerant ParseAction decoder.

coderabbitai · 2026-05-26T21:54:13Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4663fa82-2971-4416-b01a-8076745a4de1

📥 Commits

Reviewing files that changed from the base of the PR and between 2a672cb and 4f7b5d8.

📒 Files selected for processing (8)

cmd/engine/main.go
cmd/server/main.go
pkg/config/config.go
pkg/retrieval/agentic.go
pkg/retrieval/agentic_test.go
pkg/retrieval/chunked_tree.go
pkg/retrieval/single_pass.go
pkg/retrieval/strategy.go

📝 Walkthrough

Walkthrough

This PR introduces an LLM-powered agentic retrieval mode that enables interactive document navigation. It updates the Result schema to track hop counts, adds configuration support for the new strategy, implements a multi-turn agent loop with action parsing and observation rendering, provides comprehensive test coverage, and integrates the new strategy into engine and server initialization paths.

Changes

Agentic Retrieval Strategy Introduction

Layer / File(s)	Summary
Result schema and hop tracking `pkg/retrieval/strategy.go`	Result struct gains JSON serialization tags (`selected_ids`, `reasoning`, `model_used`, `usage`) and a new `HopsTaken` field to report LLM turn counts to API clients.
Agentic configuration schema and validation `pkg/config/config.go`	RetrievalConfig adds an Agentic block with MaxHops (default 6) and optional Model overrides; environment variables `VLE_RETRIEVAL_AGENTIC_MAX_HOPS` and `VLE_RETRIEVAL_AGENTIC_MODEL` are supported; Config.Validate() accepts `agentic` as a retrieval strategy and enforces non-negative hop counts.
Existing strategies adapted for HopsTaken `pkg/retrieval/single_pass.go`, `pkg/retrieval/chunked_tree.go`	SinglePass and ChunkedTree now populate Result.HopsTaken=1 to indicate a single LLM pass.
Agentic strategy implementation `pkg/retrieval/agentic.go`	Implements ContentFetcher interface, AgenticStrategy struct, and multi-turn SelectWithCost loop that builds initial prompts, calls LLM.Complete iteratively, parses JSON actions (outline/expand/read/done), renders observations, fetches section content on-demand, and deduplicates/filters final picks against tree IDs. Includes action parsing, prompt helpers (renderOutline, renderExpand, renderRead), and observation formatting.
Agentic strategy tests `pkg/retrieval/agentic_test.go`	Provides test doubles (scriptedLLM, mapFetcher) and a fixed retrieval tree fixture; tests happy path (expand→read→done), minimal navigation (expand→expand→done), hop capping, graceful non-JSON handling, ID filtering/deduplication, empty tree early return, and read fallback to summaries. Includes unit tests for ParseAction covering JSON, code-fenced JSON, embedded JSON, and error cases.
Engine and server integration `cmd/engine/main.go`, `cmd/server/main.go`	Extends buildStrategy functions to accept storage dependency and add storageFetcher adapters that implement ContentFetcher by fetching from storage.Storage and materializing readers into byte slices. Both entry points now pass configured storage to strategy builders, enabling agentic mode to fetch content at runtime.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

A rabbit hops through documents deep,
With LLM turns that count and keep,
Content fetched in agentic dance,
Expanding trees at every glance,
Done actions seal the search with care—
One strategy to rule them all, fair! 🐰

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/agentic-retrieval

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

sourcery-ai · 2026-05-26T21:54:16Z

Reviewer's Guide

Adds an agentic, tool-like navigation retrieval strategy, improves PDF parsing for SEC-style filings, hardens selection JSON parsing with retries, and adjusts summarization and config to support the new strategy and better retrieval quality.

File-Level Changes

Change	Details	Files
Introduce agentic, multi-hop retrieval strategy with JSON action protocol and content fetching, including comprehensive tests.	Add AgenticStrategy implementing Strategy and CostStrategy using iterative outline/expand/read/done JSON actions. Define ContentFetcher and storage-backed adapters to lazily fetch section bodies for read actions. Track hops taken and filter/normalize picked IDs, with graceful degradation on malformed actions. Add scriptedLLM and map-based fetcher tests covering happy paths, hop caps, bad JSON, ID filtering, empty trees, and read fallbacks. Expose tolerant ParseAction JSON decoder accepting fenced or prose-wrapped actions.	`pkg/retrieval/agentic.go` `pkg/retrieval/agentic_test.go`
Make selection strategies resilient to non-JSON LLM responses while tracking usage and hop counts.	Refactor single-pass and chunked-tree selection to share runSelectionWithRetry with configurable retries. Ensure selection parse failures degrade to empty selections with logging instead of surfacing 500s. Include HopsTaken field in retrieval.Result and set it for single-pass and chunked-tree strategies. Add tests to validate graceful handling of non-JSON single-pass replies and correct usage accounting.	`pkg/retrieval/single_pass.go` `pkg/retrieval/chunked_tree.go` `pkg/retrieval/strategy.go` `pkg/retrieval/retrieval_test.go`
Extend retrieval configuration and engine wiring to support the agentic strategy with MaxHops and model overrides.	Add AgenticBlock to RetrievalConfig with MaxHops and Model, defaulting MaxHops and validating non-negative values. Wire environment overrides for agentic max hops and model via VLE_RETRIEVAL_AGENTIC_MAX_HOPS and VLE_RETRIEVAL_AGENTIC_MODEL. Update engine and server buildStrategy to accept storage and construct AgenticStrategy with storage-backed ContentFetcher. Ensure caching and default strategy behavior remain compatible when agentic is disabled.	`pkg/config/config.go` `cmd/engine/main.go` `cmd/server/main.go`
Improve PDF parsing for SEC filings by detecting bold headings and splitting oversized leaf sections into titled chunks with letter-spacing fixes.	Treat bold rows at body font size as headings by inferring a bold level and integrating it into heading detection. Detect letter-spaced text like 'U N I T E D S T A T E S' and collapse it back to normal words. Introduce chunkOversizedLeaves to split long leaf sections by word boundaries into smaller titled subsections, with helper functions for splitting and deriving titles. Track boldness at the glyph level via isBoldFont and collapseLetterSpacing, and relax heading word-count limits to handle long filing headings. Add tests ensuring oversized leaves are chunked, small sections remain intact, recursion behaves correctly, and chunk titles derive from colon headers or fall back.	`pkg/parser/pdf.go` `pkg/parser/chunk_test.go`
Retarget section summarization prompts toward retrieval quality and adjust token limits.	Increase MaxTokens for section summaries to allow richer, more detailed retrieval-oriented sentences. Rewrite user and system prompts to emphasize concrete topics, entities, identifiers, and key items and a single ≤60-word sentence. Specialize prompts for research and medical profiles while keeping retrieval-focused guidance consistent.	`pkg/ingest/ingest.go`

Tips and commands

Interacting with Sourcery

Trigger a new review: Comment @sourcery-ai review on the pull request.
Continue discussions: Reply directly to Sourcery's review comments.
Generate a GitHub issue from a review comment: Ask Sourcery to create an
issue from a review comment by replying to it. You can also reply to a
review comment with @sourcery-ai issue to create an issue from it.
Generate a pull request title: Write @sourcery-ai anywhere in the pull
request title to generate a title at any time. You can also comment
@sourcery-ai title on the pull request to (re-)generate the title at any time.
Generate a pull request summary: Write @sourcery-ai summary anywhere in
the pull request body to generate a PR summary at any time exactly where you
want it. You can also comment @sourcery-ai summary on the pull request to
(re-)generate the summary at any time.
Generate reviewer's guide: Comment @sourcery-ai guide on the pull
request to (re-)generate the reviewer's guide at any time.
Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
pull request to resolve all Sourcery comments. Useful if you've already
addressed all the comments and don't want to see them anymore.
Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
request to dismiss all existing Sourcery reviews. Especially useful if you
want to start fresh with a new review - don't forget to comment
@sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

Enable or disable review features such as the Sourcery-generated pull request
summary, the reviewer's guide, and others.
Change the review language.
Add, remove or edit custom review instructions.
Adjust other review settings.

Getting Help

Contact our support team for questions or feedback.
Visit our documentation for detailed guides and information.
Keep in touch with the Sourcery team by following us on X/Twitter, LinkedIn or GitHub.

sourcery-ai

Hey - I've found 4 issues, and left some high level feedback:

The new agentic strategy currently ignores the ContextBudget token limits (it hardcodes MaxTokens: 1024 and never checks budget.Available()), which can allow retrieval calls to exceed configured budgets; consider plumbing the budget through and capping per-hop tokens accordingly.
Both single_pass.go and agentic.go use log.Printf inside the pkg/retrieval package, which makes logging behavior harder to control from the host application; consider either using the existing structured logger pattern (e.g., slog) or accepting a logger interface so callers can control logging.
The storageFetcher adapter is duplicated in both cmd/engine/main.go and cmd/server/main.go; factoring it into a shared helper (e.g., in a small internal package) would reduce duplication and keep the agentic wiring consistent between binaries.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- The new agentic strategy currently ignores the `ContextBudget` token limits (it hardcodes `MaxTokens: 1024` and never checks `budget.Available()`), which can allow retrieval calls to exceed configured budgets; consider plumbing the budget through and capping per-hop tokens accordingly.
- Both `single_pass.go` and `agentic.go` use `log.Printf` inside the `pkg/retrieval` package, which makes logging behavior harder to control from the host application; consider either using the existing structured logger pattern (e.g., `slog`) or accepting a logger interface so callers can control logging.
- The `storageFetcher` adapter is duplicated in both `cmd/engine/main.go` and `cmd/server/main.go`; factoring it into a shared helper (e.g., in a small internal package) would reduce duplication and keep the agentic wiring consistent between binaries.

## Individual Comments

### Comment 1
<location path="pkg/parser/pdf.go" line_range="286-295" />
<code_context>
+
+// splitContentByWords breaks a long string into pieces near target size at
+// word boundaries. The last piece may be smaller; pieces are never midword.
+func splitContentByWords(s string, target int) []string {
+	s = strings.TrimSpace(s)
+	if target < 200 {
+		target = 200
+	}
+	slack := target / 4
+	if len(s) <= target+slack {
+		return []string{s}
+	}
+	var chunks []string
+	for len(s) > 0 {
+		if len(s) <= target+slack {
+			chunks = append(chunks, strings.TrimSpace(s))
+			break
+		}
+		upper := target + slack
+		if upper > len(s) {
+			upper = len(s)
+		}
+		cut := strings.LastIndex(s[:upper], " ")
+		if cut < target/2 {
+			cut = upper // no good break: hard-cut at upper bound
</code_context>
<issue_to_address>
**issue (bug_risk):** Avoid slicing strings on byte indices to prevent breaking multi-byte runes in non-ASCII text

Because `upper` is calculated in bytes and used directly in `s[:upper]`, any case where `cut = upper` risks slicing through the middle of a multi-byte UTF-8 rune, producing invalid UTF-8 and corrupting non-ASCII content. To avoid this, compute the split position using rune indices (e.g., iterate over `[]rune(s)` or range over the string) and then convert back to a byte index, or otherwise ensure the cut only occurs at valid rune boundaries when no suitable space is found.
</issue_to_address>

### Comment 2
<location path="cmd/engine/main.go" line_range="270-276" />
<code_context>
 	}
 }

+// storageFetcher adapts a storage.Storage to retrieval.ContentFetcher.
+// The agentic strategy reads section bodies one at a time, so we
+// materialize the full reader contents into a []byte here rather than
+// streaming — section bodies are typically a few KB.
+type storageFetcher struct{ s storage.Storage }
+
+func (sf storageFetcher) Get(ctx context.Context, ref string) ([]byte, error) {
+	rc, _, err := sf.s.Get(ctx, ref)
+	if err != nil {
</code_context>
<issue_to_address>
**suggestion:** Factor storageFetcher into a shared helper to avoid duplication between engine and server binaries

The `storageFetcher` type and its `Get` method are duplicated here and in `cmd/server/main.go`. To avoid divergence if retrieval semantics change (compression, limits, metrics, etc.), extract this adapter into a shared package (or the retrieval package) and use it from both binaries.

Suggested implementation:

```golang
	fetcher := retrieval.NewStorageFetcher(store)
	strategy := buildStrategy(cfg.Retrieval, llmClient, fetcher)

```

To fully implement the refactor and remove duplication between the engine and server binaries, you should also:

1. Extract the `storageFetcher` adapter into a shared helper, for example in the `retrieval` package:
   - Add something like:
     ```go
     // In package retrieval
     type storageFetcher struct{ s storage.Storage }

     func NewStorageFetcher(s storage.Storage) ContentFetcher {
         return storageFetcher{s: s}
     }

     func (sf storageFetcher) Get(ctx context.Context, ref string) ([]byte, error) {
         rc, _, err := sf.s.Get(ctx, ref)
         if err != nil {
             return nil, err
         }
         defer rc.Close()

         data, err := io.ReadAll(rc)
         if err != nil {
             return nil, err
         }
         return data, nil
     }
     ```
   - Adjust types/imports to match the existing `retrieval` and `storage` packages.

2. In `cmd/engine/main.go`, remove the now-duplicated local `storageFetcher` type and its `Get` method entirely.

3. In `cmd/server/main.go`, replace its local `storageFetcher` implementation and usage with the shared helper:
   - Delete the duplicate type/method.
   - Change any call sites to use `retrieval.NewStorageFetcher(store)` (or whatever package name you choose).

4. Ensure both binaries import the package where `NewStorageFetcher` lives and that `buildStrategy` (and any other consumers) accept a `retrieval.ContentFetcher` rather than `storage.Storage` directly, if they don't already.
</issue_to_address>

### Comment 3
<location path="pkg/retrieval/retrieval_test.go" line_range="129" />
<code_context>
+// When the model returns prose without any JSON (Gemini's occasional JSON-mode
+// blip), the strategy must retry and then degrade gracefully — empty selection
+// with no error — instead of bubbling the parse failure up as a 500.
+func TestSinglePassGracefulOnNonJSON(t *testing.T) {
+	tr := buildTree()
+	m := &mockLLM{reply: "The most relevant section is the one about debt securities."}
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test where a non-JSON first reply is followed by a valid JSON reply to verify the retry path succeeds, not only the graceful-failure path.

This currently only covers the "persistent parse failure" case (all attempts non-JSON) and validates the graceful empty-selection behavior. Please also add a test where the first response is prose and a subsequent response is valid JSON (e.g. `reply[0] = "The most relevant..."`, `reply[1] = valid JSON`) to confirm `runSelectionWithRetry` successfully recovers, returning parsed IDs and aggregated usage instead of always degrading to an empty selection when the initial response is non-JSON.

Suggested implementation:

```golang
	}
	// 1 initial attempt + 2 retries = 3 LLM calls, all counted in usage.
}

// When the model first returns prose (non-JSON) and then returns valid JSON on
// retry, the strategy should successfully recover — returning parsed IDs and
// aggregated usage instead of degrading to an empty selection.
func TestSinglePassRecoverOnNonJSONThenJSON(t *testing.T) {
	tr := buildTree()
	m := &mockLLM{
		// First attempt: non-JSON prose (JSON-mode blip).
		// Second attempt: valid JSON response with selected IDs.
		reply: []string{
			"The most relevant section is the one about debt securities.",
			`{"selected_ids":["id-1","id-2"]}`,
		},
	}
	s := retrieval.NewSinglePass(m)

	res, err := s.SelectWithCost(context.Background(), tr, "q", retrieval.ContextBudget{MaxTokens: 1000})
	if err != nil {
		t.Fatalf("want nil error when JSON parse succeeds on retry, got %v", err)
	}
	if len(res.SelectedIDs) == 0 {
		t.Fatalf("want non-empty selection when JSON parse succeeds on retry, got %v", res.SelectedIDs)
	}
}

```

To make this test compile and correctly exercise the retry path, you will also need to update the `mockLLM` implementation in this file:

1. Change `mockLLM` to support multiple replies, e.g.:
   - Add fields like `replies []string` and `call int`.
   - Keep the existing `reply string` field for backward compatibility.
2. In the `Complete`/`Generate` method (whatever `mockLLM` implements for the LLM interface), return:
   - `replies[call]` if `len(replies) > 0` and `call < len(replies)`, then increment `call`.
   - Otherwise fall back to the existing single `reply` behavior.
3. Optionally expose a `calls int` counter if you want to assert the number of retries in this or other tests.

Example shape of the struct adaptation (adjust names/signatures to match your actual code):

```go
type mockLLM struct {
    // existing fields...
    reply   string      // existing single-reply field; keep for old tests
    replies []string    // new: sequence of replies for successive calls
    call    int         // new: current call index
}

func (m *mockLLM) Complete(ctx context.Context, prompt string, opts ...Option) (string, Usage, error) {
    var r string
    if len(m.replies) > 0 && m.call < len(m.replies) {
        r = m.replies[m.call]
        m.call++
    } else {
        r = m.reply
    }
    // existing logic that wraps r into the appropriate return type / usage
}
```

Adjust the method name, return types, and usage accounting to match the existing `mockLLM` implementation and interfaces used by `retrieval.NewSinglePass`.
</issue_to_address>

### Comment 4
<location path="pkg/parser/chunk_test.go" line_range="8" />
<code_context>
+	"testing"
+)
+
+func TestChunkOversizedLeavesSplits(t *testing.T) {
+	// 12 words per "sentence", 5 sentences ~ 60-65 words, ~360 chars; we want
+	// >2400 chars so build it from a longer paragraph + a colon-terminated header.
</code_context>
<issue_to_address>
**suggestion (testing):** Add focused tests for splitContentByWords edge cases (minimal target size, no spaces) to validate chunk boundaries.

Current tests cover `chunkOversizedLeaves` and title derivation, but `splitContentByWords`’s edge branches are only exercised indirectly. Please add direct unit tests for:

- Very small `target` values (e.g. 0 or 50) to verify the `target < 200` clamping and resulting chunk shapes.
- Long input without spaces to cover the “no good break → hard cut at upper bound” path and ensure it doesn’t loop or mis-split.

This will better protect future changes to the chunking logic and word-boundary handling.

Suggested implementation:

```golang
package parser

import (
	"strings"
	"testing"
)

// Test that very small targets are clamped (e.g. to 200) and that the
// resulting chunking behavior is consistent for all "too small" values.
func TestSplitContentByWords_ClampsSmallTarget(t *testing.T) {
	content := strings.Repeat("alpha beta gamma delta ", 30) // plenty of words

	// These targets should all fall under the internal clamp threshold (e.g. < 200)
	chunksTarget0 := splitContentByWords(content, 0)
	chunksTarget50 := splitContentByWords(content, 50)
	chunksTarget199 := splitContentByWords(content, 199)

	// This target should be at or above the clamp threshold and act as the baseline.
	chunksTarget200 := splitContentByWords(content, 200)

	if len(chunksTarget0) == 0 || len(chunksTarget50) == 0 || len(chunksTarget199) == 0 || len(chunksTarget200) == 0 {
		t.Fatalf("expected non-empty chunk lists for all targets; got 0 in one of them")
	}

	// All targets below the clamp threshold should behave identically to the clamped value.
	if len(chunksTarget0) != len(chunksTarget200) {
		t.Errorf("expected same number of chunks for target 0 and 200; got %d vs %d", len(chunksTarget0), len(chunksTarget200))
	}
	if len(chunksTarget50) != len(chunksTarget200) {
		t.Errorf("expected same number of chunks for target 50 and 200; got %d vs %d", len(chunksTarget50), len(chunksTarget200))
	}
	if len(chunksTarget199) != len(chunksTarget200) {
		t.Errorf("expected same number of chunks for target 199 and 200; got %d vs %d", len(chunksTarget199), len(chunksTarget200))
	}

	// Also assert that concatenating all chunks yields the original content, so we
	// don't silently lose or duplicate data when clamping.
	joinAndCompare := func(label string, chunks []string) {
		got := strings.Join(chunks, "")
		if got != content {
			t.Errorf("%s: reconstructed content mismatch; len(got)=%d, len(want)=%d", label, len(got), len(content))
		}
	}
	joinAndCompare("target=0", chunksTarget0)
	joinAndCompare("target=50", chunksTarget50)
	joinAndCompare("target=199", chunksTarget199)
	joinAndCompare("target=200", chunksTarget200)
}

// Test that inputs without spaces take the "no good break → hard cut at upper bound"
// path and that we don't loop or mis-split (no empty chunks, full coverage).
func TestSplitContentByWords_NoSpacesHardCut(t *testing.T) {
	// Long string with no spaces to force the hard-cut logic.
	content := strings.Repeat("X", 2500)
	target := 240 // a typical chunk size that should exercise the upper-bound cut

	chunks := splitContentByWords(content, target)
	if len(chunks) == 0 {
		t.Fatalf("expected at least one chunk for long content without spaces")
	}

	totalLen := 0
	for i, ch := range chunks {
		if ch == "" {
			t.Fatalf("chunk %d is empty; expected hard-cut logic to avoid empty chunks", i)
		}
		totalLen += len(ch)

		// No chunk should exceed some reasonable multiple of target; for a hard cut
		// implementation we expect at most target*2, and generally ≤ target.
		if len(ch) > target*2 {
			t.Errorf("chunk %d too large: got len=%d, want <= %d", i, len(ch), target*2)
		}
	}

	if totalLen != len(content) {
		t.Errorf("reconstructed length mismatch: got %d, want %d", totalLen, len(content))
	}
}

```

These tests assume the helper has signature:

```go
func splitContentByWords(content string, target int) []string
```

If your actual signature differs (e.g. additional parameters or different return type), adjust the test calls and assertions accordingly. If you prefer to keep all tests in `pkg/parser/chunk_test.go`, you can move these two test functions into that file instead.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

sourcery-ai · 2026-05-26T21:55:58Z

+func splitContentByWords(s string, target int) []string {
+	s = strings.TrimSpace(s)
+	if target < 200 {
+		target = 200
+	}
+	slack := target / 4
+	if len(s) <= target+slack {
+		return []string{s}
+	}
+	var chunks []string


issue (bug_risk): Avoid slicing strings on byte indices to prevent breaking multi-byte runes in non-ASCII text

Because upper is calculated in bytes and used directly in s[:upper], any case where cut = upper risks slicing through the middle of a multi-byte UTF-8 rune, producing invalid UTF-8 and corrupting non-ASCII content. To avoid this, compute the split position using rune indices (e.g., iterate over []rune(s) or range over the string) and then convert back to a byte index, or otherwise ensure the cut only occurs at valid rune boundaries when no suitable space is found.

sourcery-ai · 2026-05-26T21:55:58Z

+// storageFetcher adapts a storage.Storage to retrieval.ContentFetcher.
+// The agentic strategy reads section bodies one at a time, so we
+// materialize the full reader contents into a []byte here rather than
+// streaming — section bodies are typically a few KB.
+type storageFetcher struct{ s storage.Storage }
+
+func (sf storageFetcher) Get(ctx context.Context, ref string) ([]byte, error) {


suggestion: Factor storageFetcher into a shared helper to avoid duplication between engine and server binaries

The storageFetcher type and its Get method are duplicated here and in cmd/server/main.go. To avoid divergence if retrieval semantics change (compression, limits, metrics, etc.), extract this adapter into a shared package (or the retrieval package) and use it from both binaries.

Suggested implementation:

fetcher := retrieval.NewStorageFetcher(store) strategy := buildStrategy(cfg.Retrieval, llmClient, fetcher)

To fully implement the refactor and remove duplication between the engine and server binaries, you should also:

Extract the storageFetcher adapter into a shared helper, for example in the retrieval package:

Add something like:
// In package retrieval type storageFetcher struct{ s storage.Storage } func NewStorageFetcher(s storage.Storage) ContentFetcher { return storageFetcher{s: s} } func (sf storageFetcher) Get(ctx context.Context, ref string) ([]byte, error) { rc, _, err := sf.s.Get(ctx, ref) if err != nil { return nil, err } defer rc.Close() data, err := io.ReadAll(rc) if err != nil { return nil, err } return data, nil }

Adjust types/imports to match the existing retrieval and storage packages.

In cmd/engine/main.go, remove the now-duplicated local storageFetcher type and its Get method entirely.

In cmd/server/main.go, replace its local storageFetcher implementation and usage with the shared helper:

Delete the duplicate type/method.

Change any call sites to use retrieval.NewStorageFetcher(store) (or whatever package name you choose).

Ensure both binaries import the package where NewStorageFetcher lives and that buildStrategy (and any other consumers) accept a retrieval.ContentFetcher rather than storage.Storage directly, if they don't already.

sourcery-ai · 2026-05-26T21:55:59Z

+// When the model returns prose without any JSON (Gemini's occasional JSON-mode
+// blip), the strategy must retry and then degrade gracefully — empty selection
+// with no error — instead of bubbling the parse failure up as a 500.
+func TestSinglePassGracefulOnNonJSON(t *testing.T) {


suggestion (testing): Add a test where a non-JSON first reply is followed by a valid JSON reply to verify the retry path succeeds, not only the graceful-failure path.

This currently only covers the "persistent parse failure" case (all attempts non-JSON) and validates the graceful empty-selection behavior. Please also add a test where the first response is prose and a subsequent response is valid JSON (e.g. reply[0] = "The most relevant...", reply[1] = valid JSON) to confirm runSelectionWithRetry successfully recovers, returning parsed IDs and aggregated usage instead of always degrading to an empty selection when the initial response is non-JSON.

Suggested implementation:

} // 1 initial attempt + 2 retries = 3 LLM calls, all counted in usage. } // When the model first returns prose (non-JSON) and then returns valid JSON on // retry, the strategy should successfully recover — returning parsed IDs and // aggregated usage instead of degrading to an empty selection. func TestSinglePassRecoverOnNonJSONThenJSON(t *testing.T) { tr := buildTree() m := &mockLLM{ // First attempt: non-JSON prose (JSON-mode blip). // Second attempt: valid JSON response with selected IDs. reply: []string{ "The most relevant section is the one about debt securities.", `{"selected_ids":["id-1","id-2"]}`, }, } s := retrieval.NewSinglePass(m) res, err := s.SelectWithCost(context.Background(), tr, "q", retrieval.ContextBudget{MaxTokens: 1000}) if err != nil { t.Fatalf("want nil error when JSON parse succeeds on retry, got %v", err) } if len(res.SelectedIDs) == 0 { t.Fatalf("want non-empty selection when JSON parse succeeds on retry, got %v", res.SelectedIDs) } }

To make this test compile and correctly exercise the retry path, you will also need to update the mockLLM implementation in this file:

Change mockLLM to support multiple replies, e.g.:

Add fields like replies []string and call int.

Keep the existing reply string field for backward compatibility.

In the Complete/Generate method (whatever mockLLM implements for the LLM interface), return:

replies[call] if len(replies) > 0 and call < len(replies), then increment call.

Otherwise fall back to the existing single reply behavior.

Optionally expose a calls int counter if you want to assert the number of retries in this or other tests.

Example shape of the struct adaptation (adjust names/signatures to match your actual code):

type mockLLM struct { // existing fields... reply string // existing single-reply field; keep for old tests replies []string // new: sequence of replies for successive calls call int // new: current call index } func (m *mockLLM) Complete(ctx context.Context, prompt string, opts ...Option) (string, Usage, error) { var r string if len(m.replies) > 0 && m.call < len(m.replies) { r = m.replies[m.call] m.call++ } else { r = m.reply } // existing logic that wraps r into the appropriate return type / usage }

Adjust the method name, return types, and usage accounting to match the existing mockLLM implementation and interfaces used by retrieval.NewSinglePass.

sourcery-ai · 2026-05-26T21:55:59Z

+	"testing"
+)
+
+func TestChunkOversizedLeavesSplits(t *testing.T) {


suggestion (testing): Add focused tests for splitContentByWords edge cases (minimal target size, no spaces) to validate chunk boundaries.

Current tests cover chunkOversizedLeaves and title derivation, but splitContentByWords’s edge branches are only exercised indirectly. Please add direct unit tests for:

Very small target values (e.g. 0 or 50) to verify the target < 200 clamping and resulting chunk shapes.

Long input without spaces to cover the “no good break → hard cut at upper bound” path and ensure it doesn’t loop or mis-split.

This will better protect future changes to the chunking logic and word-boundary handling.

Suggested implementation:

package parser import ( "strings" "testing" ) // Test that very small targets are clamped (e.g. to 200) and that the // resulting chunking behavior is consistent for all "too small" values. func TestSplitContentByWords_ClampsSmallTarget(t *testing.T) { content := strings.Repeat("alpha beta gamma delta ", 30) // plenty of words // These targets should all fall under the internal clamp threshold (e.g. < 200) chunksTarget0 := splitContentByWords(content, 0) chunksTarget50 := splitContentByWords(content, 50) chunksTarget199 := splitContentByWords(content, 199) // This target should be at or above the clamp threshold and act as the baseline. chunksTarget200 := splitContentByWords(content, 200) if len(chunksTarget0) == 0 || len(chunksTarget50) == 0 || len(chunksTarget199) == 0 || len(chunksTarget200) == 0 { t.Fatalf("expected non-empty chunk lists for all targets; got 0 in one of them") } // All targets below the clamp threshold should behave identically to the clamped value. if len(chunksTarget0) != len(chunksTarget200) { t.Errorf("expected same number of chunks for target 0 and 200; got %d vs %d", len(chunksTarget0), len(chunksTarget200)) } if len(chunksTarget50) != len(chunksTarget200) { t.Errorf("expected same number of chunks for target 50 and 200; got %d vs %d", len(chunksTarget50), len(chunksTarget200)) } if len(chunksTarget199) != len(chunksTarget200) { t.Errorf("expected same number of chunks for target 199 and 200; got %d vs %d", len(chunksTarget199), len(chunksTarget200)) } // Also assert that concatenating all chunks yields the original content, so we // don't silently lose or duplicate data when clamping. joinAndCompare := func(label string, chunks []string) { got := strings.Join(chunks, "") if got != content { t.Errorf("%s: reconstructed content mismatch; len(got)=%d, len(want)=%d", label, len(got), len(content)) } } joinAndCompare("target=0", chunksTarget0) joinAndCompare("target=50", chunksTarget50) joinAndCompare("target=199", chunksTarget199) joinAndCompare("target=200", chunksTarget200) } // Test that inputs without spaces take the "no good break → hard cut at upper bound" // path and that we don't loop or mis-split (no empty chunks, full coverage). func TestSplitContentByWords_NoSpacesHardCut(t *testing.T) { // Long string with no spaces to force the hard-cut logic. content := strings.Repeat("X", 2500) target := 240 // a typical chunk size that should exercise the upper-bound cut chunks := splitContentByWords(content, target) if len(chunks) == 0 { t.Fatalf("expected at least one chunk for long content without spaces") } totalLen := 0 for i, ch := range chunks { if ch == "" { t.Fatalf("chunk %d is empty; expected hard-cut logic to avoid empty chunks", i) } totalLen += len(ch) // No chunk should exceed some reasonable multiple of target; for a hard cut // implementation we expect at most target*2, and generally ≤ target. if len(ch) > target*2 { t.Errorf("chunk %d too large: got len=%d, want <= %d", i, len(ch), target*2) } } if totalLen != len(content) { t.Errorf("reconstructed length mismatch: got %d, want %d", totalLen, len(content)) } }

These tests assume the helper has signature:

func splitContentByWords(content string, target int) []string

If your actual signature differs (e.g. additional parameters or different return type), adjust the test calls and assertions accordingly. If you prefer to keep all tests in pkg/parser/chunk_test.go, you can move these two test functions into that file instead.

Copilot

Pull request overview

This PR expands the retrieval subsystem with a new iterative “agentic-navigation” strategy and related plumbing, while also improving parse-time document structure (PDF heading detection + oversized-leaf chunking) and ingest-time section summaries to boost downstream retrieval quality.

Changes:

Add agentic retrieval strategy with an outline/expand/read/done JSON-action loop, plus tests for hop caps, malformed JSON, ID filtering, and read fallbacks.
Harden single-pass and chunked-tree selection calls against non-JSON LLM outputs via retry + graceful degradation; add Result.HopsTaken for turn accounting.
Improve PDF parsing for SEC-style formatting (bold-as-heading, letter-spacing collapse) and split oversized leaf sections; update summary prompts to be retrieval-optimized; add config + wiring for the new strategy.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 8 comments.

Show a summary per file

File	Description
pkg/retrieval/agentic.go	Implements the multi-hop agentic navigation loop, JSON action parsing, and read/expand/outline observations.
pkg/retrieval/agentic_test.go	Adds deterministic turn-by-turn tests covering agentic behavior and edge cases.
pkg/retrieval/single_pass.go	Adds selection retry+degrade behavior on JSON parse failures and usage aggregation.
pkg/retrieval/chunked_tree.go	Reuses the shared selection retry+degrade helper and sets `HopsTaken`.
pkg/retrieval/strategy.go	Extends `Result` with JSON tags and a new `HopsTaken` field.
pkg/retrieval/retrieval_test.go	Adds coverage for graceful handling of persistent non-JSON selection output.
pkg/parser/pdf.go	Improves heading detection (bold rows, longer headings), collapses letter-spaced text, and chunks oversized leaf sections.
pkg/parser/chunk_test.go	Adds tests for oversized-leaf chunking and chunk title derivation.
pkg/ingest/ingest.go	Updates summarization prompts to produce retrieval-optimized summaries and increases token cap.
pkg/config/config.go	Adds `RetrievalConfig.Agentic` block, env overrides, and validation for the agentic strategy.
cmd/server/main.go	Wires `agentic` strategy and adds a `storageFetcher` adapter for `read` actions.
cmd/engine/main.go	Same wiring + `storageFetcher` adapter for the standalone engine binary.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+	for hop := 0; hop < maxHops; hop++ {
+		req := llmgate.Request{
+			Model:       model,
+			Messages:    msgs,
+			MaxTokens:   1024,
+			Temperature: 0,
+		}


+			// Graceful degradation: a malformed action doesn't 500 the
+			// query — we ask the model to retry once with a stronger
+			// instruction. If the next turn still misparses, we abort
+			// and return whatever the model has picked so far (often
+			// nothing). Matches the runSelectionWithRetry pattern from
+			// single_pass.go.
+			log.Printf("retrieval: agentic hop %d action parse failed: %v", hop+1, parseErr)


+	// Ran out of hops without a `done` action. Return whatever IDs the
+	// model proposed in the last action (if any) plus the hop count so
+	// the caller can see the cap was hit.
+	log.Printf("retrieval: agentic strategy hit max_hops=%d without done; returning %d ids", maxHops, len(finalIDs))


+func (a *AgenticStrategy) renderRead(ctx context.Context, t *tree.Tree, id tree.SectionID) (string, bool) {
+	sec := t.FindByID(id)
+	if sec == nil {


+	data, err := a.Fetcher.Get(ctx, sec.ContentRef)
+	if err != nil {
+		return fmt.Sprintf("error reading section %s: %v", sec.ID, err), true
+	}
+	body := string(data)
+	body = strings.TrimSpace(body)
+	if body == "" && sec.Summary != "" {
+		return fmt.Sprintf("section %s body was empty; summary:\n%s", sec.ID, sec.Summary), true
+	}
+	return fmt.Sprintf("section %s (%s):\n%s", sec.ID, sec.Title, body), true


+	// HopsTaken is the number of LLM turns the strategy issued to reach the
+	// final selection. Single-shot strategies set this to 1; iterative
+	// strategies (e.g. agentic) set it to the number of tool-using turns
+	// actually consumed, including the terminal "done" turn.


+// storageFetcher adapts a storage.Storage to retrieval.ContentFetcher.
+// The agentic strategy reads section bodies one at a time, so we
+// materialize the full reader contents into a []byte here rather than
+// streaming — section bodies are typically a few KB.
+type storageFetcher struct{ s storage.Storage }
+
+func (sf storageFetcher) Get(ctx context.Context, ref string) ([]byte, error) {
+	rc, _, err := sf.s.Get(ctx, ref)
+	if err != nil {
+		return nil, err
+	}
+	defer rc.Close()
+	return io.ReadAll(rc)
+}


+// storageFetcher adapts a storage.Storage to retrieval.ContentFetcher.
+// The agentic strategy reads section bodies one at a time, so we
+// materialize the full reader contents into a []byte here rather than
+// streaming — section bodies are typically a few KB.
+type storageFetcher struct{ s storage.Storage }
+
+func (sf storageFetcher) Get(ctx context.Context, ref string) ([]byte, error) {
+	rc, _, err := sf.s.Get(ctx, ref)
+	if err != nil {
+		return nil, err
+	}
+	defer rc.Close()
+	return io.ReadAll(rc)
+}


Copilot AI review requested due to automatic review settings May 26, 2026 21:54

Copilot started reviewing on behalf of hallelx2 May 26, 2026 21:54 View session

sourcery-ai Bot reviewed May 26, 2026

View reviewed changes

Copilot AI reviewed May 26, 2026

View reviewed changes

Merge branch 'main' into feat/agentic-retrieval

4f7b5d8

hallelx2 merged commit 5b31b9d into main May 26, 2026
5 of 8 checks passed

hallelx2 deleted the feat/agentic-retrieval branch May 26, 2026 23:53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

pkg/retrieval: add agentic-navigation strategy (Phase 1.4)#13

pkg/retrieval: add agentic-navigation strategy (Phase 1.4)#13
hallelx2 merged 2 commits into
mainfrom
feat/agentic-retrieval

hallelx2 commented May 26, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 26, 2026 •

edited

Loading

Review failed

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

sourcery-ai Bot commented May 26, 2026 •

edited

Loading

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

sourcery-ai Bot left a comment

Uh oh!

sourcery-ai Bot May 26, 2026

Uh oh!

sourcery-ai Bot May 26, 2026

Uh oh!

sourcery-ai Bot May 26, 2026

Uh oh!

sourcery-ai Bot May 26, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

hallelx2 commented May 26, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What changed

Tests

Tradeoffs

Summary by Sourcery

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

sourcery-ai Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviewer's Guide

File-Level Changes

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

sourcery-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

sourcery-ai Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

sourcery-ai Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

sourcery-ai Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

sourcery-ai Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

hallelx2 commented May 26, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 26, 2026 •

edited

Loading

sourcery-ai Bot commented May 26, 2026 •

edited

Loading