Skip to content

pkg/retrieval: add agentic-navigation strategy (Phase 1.4)#13

Merged
hallelx2 merged 2 commits into
mainfrom
feat/agentic-retrieval
May 26, 2026
Merged

pkg/retrieval: add agentic-navigation strategy (Phase 1.4)#13
hallelx2 merged 2 commits into
mainfrom
feat/agentic-retrieval

Conversation

@hallelx2
Copy link
Copy Markdown
Owner

@hallelx2 hallelx2 commented May 26, 2026

Summary

Adds a third retrieval strategy, agentic, alongside single-pass and chunked-tree. Instead of feeding the whole tree to the model in one shot or fanning out across slices in parallel, the agentic strategy lets the model navigate the tree iteratively: it emits a JSON action each turn (outline, expand, read, done) and we feed the observation back as the next user turn until the model picks a final set of section IDs or hits MaxHops.

This trades sequential latency for the ability to:

  • handle trees that don't fit any single context window (no MaxTokensPerCall ceiling)
  • read full bodies on demand, so a narrow query doesn't pay for the whole tree
  • adapt traversal depth per-query (a query that's obviously about chapter 3 doesn't burn tokens reading chapters 1-2)

What changed

  • pkg/retrieval/agentic.go — new AgenticStrategy implementing both Strategy and CostStrategy. Includes ContentFetcher interface (storage adapter), tolerant ParseAction JSON decoder, and the full navigation loop with graceful degradation on malformed actions.
  • pkg/retrieval/strategy.goResult now carries HopsTaken int (json hops_taken,omitempty). Existing strategies set it to 1.
  • pkg/config/config.go — new RetrievalConfig.Agentic{ MaxHops int; Model string } block; default MaxHops = 6; env overrides VLE_RETRIEVAL_AGENTIC_MAX_HOPS and VLE_RETRIEVAL_AGENTIC_MODEL; Validate() accepts "agentic" and rejects negative MaxHops.
  • cmd/engine/main.go + cmd/server/main.gobuildStrategy now takes the storage.Storage and adds an "agentic" case. A tiny storageFetcher adapter satisfies ContentFetcher by reading the storage object into a []byte.

Tests

pkg/retrieval/agentic_test.go adds a scriptedLLM (turn-by-turn canned replies) and covers:

  • TestAgenticHappyPath — expand → read → done; asserts ids, hops, reasoning, and that the read observation actually carries body content
  • TestAgenticExpandExpandDone — covers the plan's manual trace (two-level navigation without read)
  • TestAgenticHopCap — model never emits done; strategy stops at MaxHops with empty selection
  • TestAgenticBadJSONGraceful — every reply is prose, never JSON; returns nil error + empty selection
  • TestAgenticFiltersUnknownPicks — fabricated IDs dropped, picks deduplicated
  • TestAgenticEmptyTree — early-return guard, 0 LLM calls
  • TestAgenticReadFallbackWhenNoContentread on an internal section falls back to its summary
  • TestParseAction — code-fence + prose-before tolerance, validates required action field

Full suite: go test ./pkg/retrieval/... passes.

Tradeoffs

  • Latency vs quality: the loop is N sequential LLM calls (1 outline + Ks expands + Ks reads + 1 done). On Sonnet that's about N × 1-3s. For small documents, single-pass is faster. For large ones where chunked-tree would have to read everything in parallel anyway, agentic typically wins on both quality and total tokens.
  • Context window vs hops: unlike chunked-tree which needs MaxTokensPerCall, agentic just needs enough room for one observation + the running message history. We bound this implicitly via MaxHops (default 6); each read is the heaviest observation.
  • Native tool calling: llmgate v0.2.0 declares ToolDef/ToolCall but provider adapters don't populate ToolCalls yet. We use a JSON-action text protocol instead so the strategy works across all current providers. The surface stays the same once llmgate wires native tool calling — ParseAction becomes an internal fast-path, and the loop body is unchanged.
  • Failure modes: a model that won't emit valid JSON degrades gracefully (empty selection, MaxHops capped, no 500), matching the existing pattern in runSelectionWithRetry. A model that fabricates IDs has them dropped by filterToTreeIDs.

Summary by Sourcery

Add an agentic, tool-like retrieval strategy that incrementally navigates large document trees, hardens existing retrieval strategies against malformed LLM JSON, and enhances PDF parsing and summarization to improve retrieval quality on complex filings and long sections.

New Features:

  • Introduce agentic navigation retrieval strategy that iteratively explores document trees via outline/expand/read/done actions and tracks hops taken

Bug Fixes:

  • Make single-pass and chunked-tree retrieval strategies robust to non-JSON LLM responses by retrying and degrading gracefully to empty selections instead of surfacing errors

Enhancements:

  • Improve PDF parser heading detection by using bold font heuristics, handling letter-spaced text, and relaxing heading length limits
  • Automatically split oversized leaf sections into smaller, titled sub-sections to improve downstream summarization and retrieval quality
  • Refine section summarization prompts to produce retrieval-optimized, longer single-sentence summaries with richer concrete details
  • Track the number of LLM turns (hops) taken by retrieval strategies and expose it in retrieval results

Tests:

  • Add comprehensive tests for the agentic retrieval strategy, including happy paths, hop caps, malformed JSON handling, ID filtering, and empty trees
  • Add tests covering PDF leaf chunking behavior and chunk title derivation
  • Extend retrieval tests to cover graceful handling of non-JSON responses in single-pass selection

Summary by CodeRabbit

  • New Features

    • Added agentic retrieval strategy enabling intelligent multi-turn LLM interactions to navigate and retrieve content from document hierarchies.
    • Configurable agentic mode via environment variables and YAML with adjustable maximum hops and model selection.
  • API Enhancements

    • Response objects now include hop count metrics showing the number of LLM turns consumed during retrieval operations.

A new Strategy/CostStrategy implementation that lets the LLM explore
the tree iteratively via JSON action turns (outline, expand, read,
done) instead of being handed the whole view at once. Useful for
trees too large to fit any single context window or where read-on-
demand keeps token cost low for narrow queries.

Result gains a HopsTaken field so callers can distinguish single-shot
strategies from iterative ones; single-pass and chunked-tree set it
to 1.

Wiring:
- config.RetrievalConfig.Agentic{ MaxHops, Model }; defaults to 6
  hops, falls back to budget.ModelName when Model is empty
- VLE_RETRIEVAL_AGENTIC_MAX_HOPS / VLE_RETRIEVAL_AGENTIC_MODEL env
  overrides
- buildStrategy in cmd/engine + cmd/server now takes the storage so
  the agentic strategy can satisfy 'read' via a small ContentFetcher
  adapter

Protocol: JSON-action text protocol rather than llmgate's Tools
field; llmgate v0.2.0 declares ToolDef/ToolCall as scaffolding but
provider adapters don't yet populate ToolCalls. The strategy stays
forward-compatible — when llmgate wires native tool calling the
surface here doesn't change.

Tests use a scriptedLLM that returns canned per-turn replies and
cover: happy-path expand/read/done, expand/expand/done (matches the
plan's manual trace), MaxHops cap, graceful degradation on
persistent non-JSON output, fabricated-ID filtering, empty tree
guard, summary fallback for 'read' on internal nodes, and the
tolerant ParseAction decoder.
Copilot AI review requested due to automatic review settings May 26, 2026 21:54
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 26, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4663fa82-2971-4416-b01a-8076745a4de1

📥 Commits

Reviewing files that changed from the base of the PR and between 2a672cb and 4f7b5d8.

📒 Files selected for processing (8)
  • cmd/engine/main.go
  • cmd/server/main.go
  • pkg/config/config.go
  • pkg/retrieval/agentic.go
  • pkg/retrieval/agentic_test.go
  • pkg/retrieval/chunked_tree.go
  • pkg/retrieval/single_pass.go
  • pkg/retrieval/strategy.go

📝 Walkthrough

Walkthrough

This PR introduces an LLM-powered agentic retrieval mode that enables interactive document navigation. It updates the Result schema to track hop counts, adds configuration support for the new strategy, implements a multi-turn agent loop with action parsing and observation rendering, provides comprehensive test coverage, and integrates the new strategy into engine and server initialization paths.

Changes

Agentic Retrieval Strategy Introduction

Layer / File(s) Summary
Result schema and hop tracking
pkg/retrieval/strategy.go
Result struct gains JSON serialization tags (selected_ids, reasoning, model_used, usage) and a new HopsTaken field to report LLM turn counts to API clients.
Agentic configuration schema and validation
pkg/config/config.go
RetrievalConfig adds an Agentic block with MaxHops (default 6) and optional Model overrides; environment variables VLE_RETRIEVAL_AGENTIC_MAX_HOPS and VLE_RETRIEVAL_AGENTIC_MODEL are supported; Config.Validate() accepts agentic as a retrieval strategy and enforces non-negative hop counts.
Existing strategies adapted for HopsTaken
pkg/retrieval/single_pass.go, pkg/retrieval/chunked_tree.go
SinglePass and ChunkedTree now populate Result.HopsTaken=1 to indicate a single LLM pass.
Agentic strategy implementation
pkg/retrieval/agentic.go
Implements ContentFetcher interface, AgenticStrategy struct, and multi-turn SelectWithCost loop that builds initial prompts, calls LLM.Complete iteratively, parses JSON actions (outline/expand/read/done), renders observations, fetches section content on-demand, and deduplicates/filters final picks against tree IDs. Includes action parsing, prompt helpers (renderOutline, renderExpand, renderRead), and observation formatting.
Agentic strategy tests
pkg/retrieval/agentic_test.go
Provides test doubles (scriptedLLM, mapFetcher) and a fixed retrieval tree fixture; tests happy path (expand→read→done), minimal navigation (expand→expand→done), hop capping, graceful non-JSON handling, ID filtering/deduplication, empty tree early return, and read fallback to summaries. Includes unit tests for ParseAction covering JSON, code-fenced JSON, embedded JSON, and error cases.
Engine and server integration
cmd/engine/main.go, cmd/server/main.go
Extends buildStrategy functions to accept storage dependency and add storageFetcher adapters that implement ContentFetcher by fetching from storage.Storage and materializing readers into byte slices. Both entry points now pass configured storage to strategy builders, enabling agentic mode to fetch content at runtime.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

A rabbit hops through documents deep,
With LLM turns that count and keep,
Content fetched in agentic dance,
Expanding trees at every glance,
Done actions seal the search with care—
One strategy to rule them all, fair! 🐰

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/agentic-retrieval

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@sourcery-ai
Copy link
Copy Markdown

sourcery-ai Bot commented May 26, 2026

Reviewer's Guide

Adds an agentic, tool-like navigation retrieval strategy, improves PDF parsing for SEC-style filings, hardens selection JSON parsing with retries, and adjusts summarization and config to support the new strategy and better retrieval quality.

File-Level Changes

Change Details Files
Introduce agentic, multi-hop retrieval strategy with JSON action protocol and content fetching, including comprehensive tests.
  • Add AgenticStrategy implementing Strategy and CostStrategy using iterative outline/expand/read/done JSON actions.
  • Define ContentFetcher and storage-backed adapters to lazily fetch section bodies for read actions.
  • Track hops taken and filter/normalize picked IDs, with graceful degradation on malformed actions.
  • Add scriptedLLM and map-based fetcher tests covering happy paths, hop caps, bad JSON, ID filtering, empty trees, and read fallbacks.
  • Expose tolerant ParseAction JSON decoder accepting fenced or prose-wrapped actions.
pkg/retrieval/agentic.go
pkg/retrieval/agentic_test.go
Make selection strategies resilient to non-JSON LLM responses while tracking usage and hop counts.
  • Refactor single-pass and chunked-tree selection to share runSelectionWithRetry with configurable retries.
  • Ensure selection parse failures degrade to empty selections with logging instead of surfacing 500s.
  • Include HopsTaken field in retrieval.Result and set it for single-pass and chunked-tree strategies.
  • Add tests to validate graceful handling of non-JSON single-pass replies and correct usage accounting.
pkg/retrieval/single_pass.go
pkg/retrieval/chunked_tree.go
pkg/retrieval/strategy.go
pkg/retrieval/retrieval_test.go
Extend retrieval configuration and engine wiring to support the agentic strategy with MaxHops and model overrides.
  • Add AgenticBlock to RetrievalConfig with MaxHops and Model, defaulting MaxHops and validating non-negative values.
  • Wire environment overrides for agentic max hops and model via VLE_RETRIEVAL_AGENTIC_MAX_HOPS and VLE_RETRIEVAL_AGENTIC_MODEL.
  • Update engine and server buildStrategy to accept storage and construct AgenticStrategy with storage-backed ContentFetcher.
  • Ensure caching and default strategy behavior remain compatible when agentic is disabled.
pkg/config/config.go
cmd/engine/main.go
cmd/server/main.go
Improve PDF parsing for SEC filings by detecting bold headings and splitting oversized leaf sections into titled chunks with letter-spacing fixes.
  • Treat bold rows at body font size as headings by inferring a bold level and integrating it into heading detection.
  • Detect letter-spaced text like 'U N I T E D S T A T E S' and collapse it back to normal words.
  • Introduce chunkOversizedLeaves to split long leaf sections by word boundaries into smaller titled subsections, with helper functions for splitting and deriving titles.
  • Track boldness at the glyph level via isBoldFont and collapseLetterSpacing, and relax heading word-count limits to handle long filing headings.
  • Add tests ensuring oversized leaves are chunked, small sections remain intact, recursion behaves correctly, and chunk titles derive from colon headers or fall back.
pkg/parser/pdf.go
pkg/parser/chunk_test.go
Retarget section summarization prompts toward retrieval quality and adjust token limits.
  • Increase MaxTokens for section summaries to allow richer, more detailed retrieval-oriented sentences.
  • Rewrite user and system prompts to emphasize concrete topics, entities, identifiers, and key items and a single ≤60-word sentence.
  • Specialize prompts for research and medical profiles while keeping retrieval-focused guidance consistent.
pkg/ingest/ingest.go

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Copy Markdown

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 4 issues, and left some high level feedback:

  • The new agentic strategy currently ignores the ContextBudget token limits (it hardcodes MaxTokens: 1024 and never checks budget.Available()), which can allow retrieval calls to exceed configured budgets; consider plumbing the budget through and capping per-hop tokens accordingly.
  • Both single_pass.go and agentic.go use log.Printf inside the pkg/retrieval package, which makes logging behavior harder to control from the host application; consider either using the existing structured logger pattern (e.g., slog) or accepting a logger interface so callers can control logging.
  • The storageFetcher adapter is duplicated in both cmd/engine/main.go and cmd/server/main.go; factoring it into a shared helper (e.g., in a small internal package) would reduce duplication and keep the agentic wiring consistent between binaries.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The new agentic strategy currently ignores the `ContextBudget` token limits (it hardcodes `MaxTokens: 1024` and never checks `budget.Available()`), which can allow retrieval calls to exceed configured budgets; consider plumbing the budget through and capping per-hop tokens accordingly.
- Both `single_pass.go` and `agentic.go` use `log.Printf` inside the `pkg/retrieval` package, which makes logging behavior harder to control from the host application; consider either using the existing structured logger pattern (e.g., `slog`) or accepting a logger interface so callers can control logging.
- The `storageFetcher` adapter is duplicated in both `cmd/engine/main.go` and `cmd/server/main.go`; factoring it into a shared helper (e.g., in a small internal package) would reduce duplication and keep the agentic wiring consistent between binaries.

## Individual Comments

### Comment 1
<location path="pkg/parser/pdf.go" line_range="286-295" />
<code_context>
+
+// splitContentByWords breaks a long string into pieces near target size at
+// word boundaries. The last piece may be smaller; pieces are never midword.
+func splitContentByWords(s string, target int) []string {
+	s = strings.TrimSpace(s)
+	if target < 200 {
+		target = 200
+	}
+	slack := target / 4
+	if len(s) <= target+slack {
+		return []string{s}
+	}
+	var chunks []string
+	for len(s) > 0 {
+		if len(s) <= target+slack {
+			chunks = append(chunks, strings.TrimSpace(s))
+			break
+		}
+		upper := target + slack
+		if upper > len(s) {
+			upper = len(s)
+		}
+		cut := strings.LastIndex(s[:upper], " ")
+		if cut < target/2 {
+			cut = upper // no good break: hard-cut at upper bound
</code_context>
<issue_to_address>
**issue (bug_risk):** Avoid slicing strings on byte indices to prevent breaking multi-byte runes in non-ASCII text

Because `upper` is calculated in bytes and used directly in `s[:upper]`, any case where `cut = upper` risks slicing through the middle of a multi-byte UTF-8 rune, producing invalid UTF-8 and corrupting non-ASCII content. To avoid this, compute the split position using rune indices (e.g., iterate over `[]rune(s)` or range over the string) and then convert back to a byte index, or otherwise ensure the cut only occurs at valid rune boundaries when no suitable space is found.
</issue_to_address>

### Comment 2
<location path="cmd/engine/main.go" line_range="270-276" />
<code_context>
 	}
 }

+// storageFetcher adapts a storage.Storage to retrieval.ContentFetcher.
+// The agentic strategy reads section bodies one at a time, so we
+// materialize the full reader contents into a []byte here rather than
+// streaming — section bodies are typically a few KB.
+type storageFetcher struct{ s storage.Storage }
+
+func (sf storageFetcher) Get(ctx context.Context, ref string) ([]byte, error) {
+	rc, _, err := sf.s.Get(ctx, ref)
+	if err != nil {
</code_context>
<issue_to_address>
**suggestion:** Factor storageFetcher into a shared helper to avoid duplication between engine and server binaries

The `storageFetcher` type and its `Get` method are duplicated here and in `cmd/server/main.go`. To avoid divergence if retrieval semantics change (compression, limits, metrics, etc.), extract this adapter into a shared package (or the retrieval package) and use it from both binaries.

Suggested implementation:

```golang
	fetcher := retrieval.NewStorageFetcher(store)
	strategy := buildStrategy(cfg.Retrieval, llmClient, fetcher)

```

To fully implement the refactor and remove duplication between the engine and server binaries, you should also:

1. Extract the `storageFetcher` adapter into a shared helper, for example in the `retrieval` package:
   - Add something like:
     ```go
     // In package retrieval
     type storageFetcher struct{ s storage.Storage }

     func NewStorageFetcher(s storage.Storage) ContentFetcher {
         return storageFetcher{s: s}
     }

     func (sf storageFetcher) Get(ctx context.Context, ref string) ([]byte, error) {
         rc, _, err := sf.s.Get(ctx, ref)
         if err != nil {
             return nil, err
         }
         defer rc.Close()

         data, err := io.ReadAll(rc)
         if err != nil {
             return nil, err
         }
         return data, nil
     }
     ```
   - Adjust types/imports to match the existing `retrieval` and `storage` packages.

2. In `cmd/engine/main.go`, remove the now-duplicated local `storageFetcher` type and its `Get` method entirely.

3. In `cmd/server/main.go`, replace its local `storageFetcher` implementation and usage with the shared helper:
   - Delete the duplicate type/method.
   - Change any call sites to use `retrieval.NewStorageFetcher(store)` (or whatever package name you choose).

4. Ensure both binaries import the package where `NewStorageFetcher` lives and that `buildStrategy` (and any other consumers) accept a `retrieval.ContentFetcher` rather than `storage.Storage` directly, if they don't already.
</issue_to_address>

### Comment 3
<location path="pkg/retrieval/retrieval_test.go" line_range="129" />
<code_context>
+// When the model returns prose without any JSON (Gemini's occasional JSON-mode
+// blip), the strategy must retry and then degrade gracefully — empty selection
+// with no error — instead of bubbling the parse failure up as a 500.
+func TestSinglePassGracefulOnNonJSON(t *testing.T) {
+	tr := buildTree()
+	m := &mockLLM{reply: "The most relevant section is the one about debt securities."}
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test where a non-JSON first reply is followed by a valid JSON reply to verify the retry path succeeds, not only the graceful-failure path.

This currently only covers the "persistent parse failure" case (all attempts non-JSON) and validates the graceful empty-selection behavior. Please also add a test where the first response is prose and a subsequent response is valid JSON (e.g. `reply[0] = "The most relevant..."`, `reply[1] = valid JSON`) to confirm `runSelectionWithRetry` successfully recovers, returning parsed IDs and aggregated usage instead of always degrading to an empty selection when the initial response is non-JSON.

Suggested implementation:

```golang
	}
	// 1 initial attempt + 2 retries = 3 LLM calls, all counted in usage.
}

// When the model first returns prose (non-JSON) and then returns valid JSON on
// retry, the strategy should successfully recover — returning parsed IDs and
// aggregated usage instead of degrading to an empty selection.
func TestSinglePassRecoverOnNonJSONThenJSON(t *testing.T) {
	tr := buildTree()
	m := &mockLLM{
		// First attempt: non-JSON prose (JSON-mode blip).
		// Second attempt: valid JSON response with selected IDs.
		reply: []string{
			"The most relevant section is the one about debt securities.",
			`{"selected_ids":["id-1","id-2"]}`,
		},
	}
	s := retrieval.NewSinglePass(m)

	res, err := s.SelectWithCost(context.Background(), tr, "q", retrieval.ContextBudget{MaxTokens: 1000})
	if err != nil {
		t.Fatalf("want nil error when JSON parse succeeds on retry, got %v", err)
	}
	if len(res.SelectedIDs) == 0 {
		t.Fatalf("want non-empty selection when JSON parse succeeds on retry, got %v", res.SelectedIDs)
	}
}

```

To make this test compile and correctly exercise the retry path, you will also need to update the `mockLLM` implementation in this file:

1. Change `mockLLM` to support multiple replies, e.g.:
   - Add fields like `replies []string` and `call int`.
   - Keep the existing `reply string` field for backward compatibility.
2. In the `Complete`/`Generate` method (whatever `mockLLM` implements for the LLM interface), return:
   - `replies[call]` if `len(replies) > 0` and `call < len(replies)`, then increment `call`.
   - Otherwise fall back to the existing single `reply` behavior.
3. Optionally expose a `calls int` counter if you want to assert the number of retries in this or other tests.

Example shape of the struct adaptation (adjust names/signatures to match your actual code):

```go
type mockLLM struct {
    // existing fields...
    reply   string      // existing single-reply field; keep for old tests
    replies []string    // new: sequence of replies for successive calls
    call    int         // new: current call index
}

func (m *mockLLM) Complete(ctx context.Context, prompt string, opts ...Option) (string, Usage, error) {
    var r string
    if len(m.replies) > 0 && m.call < len(m.replies) {
        r = m.replies[m.call]
        m.call++
    } else {
        r = m.reply
    }
    // existing logic that wraps r into the appropriate return type / usage
}
```

Adjust the method name, return types, and usage accounting to match the existing `mockLLM` implementation and interfaces used by `retrieval.NewSinglePass`.
</issue_to_address>

### Comment 4
<location path="pkg/parser/chunk_test.go" line_range="8" />
<code_context>
+	"testing"
+)
+
+func TestChunkOversizedLeavesSplits(t *testing.T) {
+	// 12 words per "sentence", 5 sentences ~ 60-65 words, ~360 chars; we want
+	// >2400 chars so build it from a longer paragraph + a colon-terminated header.
</code_context>
<issue_to_address>
**suggestion (testing):** Add focused tests for splitContentByWords edge cases (minimal target size, no spaces) to validate chunk boundaries.

Current tests cover `chunkOversizedLeaves` and title derivation, but `splitContentByWords`’s edge branches are only exercised indirectly. Please add direct unit tests for:

- Very small `target` values (e.g. 0 or 50) to verify the `target < 200` clamping and resulting chunk shapes.
- Long input without spaces to cover the “no good break → hard cut at upper bound” path and ensure it doesn’t loop or mis-split.

This will better protect future changes to the chunking logic and word-boundary handling.

Suggested implementation:

```golang
package parser

import (
	"strings"
	"testing"
)

// Test that very small targets are clamped (e.g. to 200) and that the
// resulting chunking behavior is consistent for all "too small" values.
func TestSplitContentByWords_ClampsSmallTarget(t *testing.T) {
	content := strings.Repeat("alpha beta gamma delta ", 30) // plenty of words

	// These targets should all fall under the internal clamp threshold (e.g. < 200)
	chunksTarget0 := splitContentByWords(content, 0)
	chunksTarget50 := splitContentByWords(content, 50)
	chunksTarget199 := splitContentByWords(content, 199)

	// This target should be at or above the clamp threshold and act as the baseline.
	chunksTarget200 := splitContentByWords(content, 200)

	if len(chunksTarget0) == 0 || len(chunksTarget50) == 0 || len(chunksTarget199) == 0 || len(chunksTarget200) == 0 {
		t.Fatalf("expected non-empty chunk lists for all targets; got 0 in one of them")
	}

	// All targets below the clamp threshold should behave identically to the clamped value.
	if len(chunksTarget0) != len(chunksTarget200) {
		t.Errorf("expected same number of chunks for target 0 and 200; got %d vs %d", len(chunksTarget0), len(chunksTarget200))
	}
	if len(chunksTarget50) != len(chunksTarget200) {
		t.Errorf("expected same number of chunks for target 50 and 200; got %d vs %d", len(chunksTarget50), len(chunksTarget200))
	}
	if len(chunksTarget199) != len(chunksTarget200) {
		t.Errorf("expected same number of chunks for target 199 and 200; got %d vs %d", len(chunksTarget199), len(chunksTarget200))
	}

	// Also assert that concatenating all chunks yields the original content, so we
	// don't silently lose or duplicate data when clamping.
	joinAndCompare := func(label string, chunks []string) {
		got := strings.Join(chunks, "")
		if got != content {
			t.Errorf("%s: reconstructed content mismatch; len(got)=%d, len(want)=%d", label, len(got), len(content))
		}
	}
	joinAndCompare("target=0", chunksTarget0)
	joinAndCompare("target=50", chunksTarget50)
	joinAndCompare("target=199", chunksTarget199)
	joinAndCompare("target=200", chunksTarget200)
}

// Test that inputs without spaces take the "no good break → hard cut at upper bound"
// path and that we don't loop or mis-split (no empty chunks, full coverage).
func TestSplitContentByWords_NoSpacesHardCut(t *testing.T) {
	// Long string with no spaces to force the hard-cut logic.
	content := strings.Repeat("X", 2500)
	target := 240 // a typical chunk size that should exercise the upper-bound cut

	chunks := splitContentByWords(content, target)
	if len(chunks) == 0 {
		t.Fatalf("expected at least one chunk for long content without spaces")
	}

	totalLen := 0
	for i, ch := range chunks {
		if ch == "" {
			t.Fatalf("chunk %d is empty; expected hard-cut logic to avoid empty chunks", i)
		}
		totalLen += len(ch)

		// No chunk should exceed some reasonable multiple of target; for a hard cut
		// implementation we expect at most target*2, and generally ≤ target.
		if len(ch) > target*2 {
			t.Errorf("chunk %d too large: got len=%d, want <= %d", i, len(ch), target*2)
		}
	}

	if totalLen != len(content) {
		t.Errorf("reconstructed length mismatch: got %d, want %d", totalLen, len(content))
	}
}

```

These tests assume the helper has signature:

```go
func splitContentByWords(content string, target int) []string
```

If your actual signature differs (e.g. additional parameters or different return type), adjust the test calls and assertions accordingly. If you prefer to keep all tests in `pkg/parser/chunk_test.go`, you can move these two test functions into that file instead.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread pkg/parser/pdf.go
Comment on lines +286 to +295
func splitContentByWords(s string, target int) []string {
s = strings.TrimSpace(s)
if target < 200 {
target = 200
}
slack := target / 4
if len(s) <= target+slack {
return []string{s}
}
var chunks []string
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): Avoid slicing strings on byte indices to prevent breaking multi-byte runes in non-ASCII text

Because upper is calculated in bytes and used directly in s[:upper], any case where cut = upper risks slicing through the middle of a multi-byte UTF-8 rune, producing invalid UTF-8 and corrupting non-ASCII content. To avoid this, compute the split position using rune indices (e.g., iterate over []rune(s) or range over the string) and then convert back to a byte index, or otherwise ensure the cut only occurs at valid rune boundaries when no suitable space is found.

Comment thread cmd/engine/main.go
Comment on lines +270 to +276
// storageFetcher adapts a storage.Storage to retrieval.ContentFetcher.
// The agentic strategy reads section bodies one at a time, so we
// materialize the full reader contents into a []byte here rather than
// streaming — section bodies are typically a few KB.
type storageFetcher struct{ s storage.Storage }

func (sf storageFetcher) Get(ctx context.Context, ref string) ([]byte, error) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Factor storageFetcher into a shared helper to avoid duplication between engine and server binaries

The storageFetcher type and its Get method are duplicated here and in cmd/server/main.go. To avoid divergence if retrieval semantics change (compression, limits, metrics, etc.), extract this adapter into a shared package (or the retrieval package) and use it from both binaries.

Suggested implementation:

	fetcher := retrieval.NewStorageFetcher(store)
	strategy := buildStrategy(cfg.Retrieval, llmClient, fetcher)

To fully implement the refactor and remove duplication between the engine and server binaries, you should also:

  1. Extract the storageFetcher adapter into a shared helper, for example in the retrieval package:

    • Add something like:
      // In package retrieval
      type storageFetcher struct{ s storage.Storage }
      
      func NewStorageFetcher(s storage.Storage) ContentFetcher {
          return storageFetcher{s: s}
      }
      
      func (sf storageFetcher) Get(ctx context.Context, ref string) ([]byte, error) {
          rc, _, err := sf.s.Get(ctx, ref)
          if err != nil {
              return nil, err
          }
          defer rc.Close()
      
          data, err := io.ReadAll(rc)
          if err != nil {
              return nil, err
          }
          return data, nil
      }
    • Adjust types/imports to match the existing retrieval and storage packages.
  2. In cmd/engine/main.go, remove the now-duplicated local storageFetcher type and its Get method entirely.

  3. In cmd/server/main.go, replace its local storageFetcher implementation and usage with the shared helper:

    • Delete the duplicate type/method.
    • Change any call sites to use retrieval.NewStorageFetcher(store) (or whatever package name you choose).
  4. Ensure both binaries import the package where NewStorageFetcher lives and that buildStrategy (and any other consumers) accept a retrieval.ContentFetcher rather than storage.Storage directly, if they don't already.

// When the model returns prose without any JSON (Gemini's occasional JSON-mode
// blip), the strategy must retry and then degrade gracefully — empty selection
// with no error — instead of bubbling the parse failure up as a 500.
func TestSinglePassGracefulOnNonJSON(t *testing.T) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (testing): Add a test where a non-JSON first reply is followed by a valid JSON reply to verify the retry path succeeds, not only the graceful-failure path.

This currently only covers the "persistent parse failure" case (all attempts non-JSON) and validates the graceful empty-selection behavior. Please also add a test where the first response is prose and a subsequent response is valid JSON (e.g. reply[0] = "The most relevant...", reply[1] = valid JSON) to confirm runSelectionWithRetry successfully recovers, returning parsed IDs and aggregated usage instead of always degrading to an empty selection when the initial response is non-JSON.

Suggested implementation:

	}
	// 1 initial attempt + 2 retries = 3 LLM calls, all counted in usage.
}

// When the model first returns prose (non-JSON) and then returns valid JSON on
// retry, the strategy should successfully recover — returning parsed IDs and
// aggregated usage instead of degrading to an empty selection.
func TestSinglePassRecoverOnNonJSONThenJSON(t *testing.T) {
	tr := buildTree()
	m := &mockLLM{
		// First attempt: non-JSON prose (JSON-mode blip).
		// Second attempt: valid JSON response with selected IDs.
		reply: []string{
			"The most relevant section is the one about debt securities.",
			`{"selected_ids":["id-1","id-2"]}`,
		},
	}
	s := retrieval.NewSinglePass(m)

	res, err := s.SelectWithCost(context.Background(), tr, "q", retrieval.ContextBudget{MaxTokens: 1000})
	if err != nil {
		t.Fatalf("want nil error when JSON parse succeeds on retry, got %v", err)
	}
	if len(res.SelectedIDs) == 0 {
		t.Fatalf("want non-empty selection when JSON parse succeeds on retry, got %v", res.SelectedIDs)
	}
}

To make this test compile and correctly exercise the retry path, you will also need to update the mockLLM implementation in this file:

  1. Change mockLLM to support multiple replies, e.g.:
    • Add fields like replies []string and call int.
    • Keep the existing reply string field for backward compatibility.
  2. In the Complete/Generate method (whatever mockLLM implements for the LLM interface), return:
    • replies[call] if len(replies) > 0 and call < len(replies), then increment call.
    • Otherwise fall back to the existing single reply behavior.
  3. Optionally expose a calls int counter if you want to assert the number of retries in this or other tests.

Example shape of the struct adaptation (adjust names/signatures to match your actual code):

type mockLLM struct {
    // existing fields...
    reply   string      // existing single-reply field; keep for old tests
    replies []string    // new: sequence of replies for successive calls
    call    int         // new: current call index
}

func (m *mockLLM) Complete(ctx context.Context, prompt string, opts ...Option) (string, Usage, error) {
    var r string
    if len(m.replies) > 0 && m.call < len(m.replies) {
        r = m.replies[m.call]
        m.call++
    } else {
        r = m.reply
    }
    // existing logic that wraps r into the appropriate return type / usage
}

Adjust the method name, return types, and usage accounting to match the existing mockLLM implementation and interfaces used by retrieval.NewSinglePass.

Comment thread pkg/parser/chunk_test.go
"testing"
)

func TestChunkOversizedLeavesSplits(t *testing.T) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (testing): Add focused tests for splitContentByWords edge cases (minimal target size, no spaces) to validate chunk boundaries.

Current tests cover chunkOversizedLeaves and title derivation, but splitContentByWords’s edge branches are only exercised indirectly. Please add direct unit tests for:

  • Very small target values (e.g. 0 or 50) to verify the target < 200 clamping and resulting chunk shapes.
  • Long input without spaces to cover the “no good break → hard cut at upper bound” path and ensure it doesn’t loop or mis-split.

This will better protect future changes to the chunking logic and word-boundary handling.

Suggested implementation:

package parser

import (
	"strings"
	"testing"
)

// Test that very small targets are clamped (e.g. to 200) and that the
// resulting chunking behavior is consistent for all "too small" values.
func TestSplitContentByWords_ClampsSmallTarget(t *testing.T) {
	content := strings.Repeat("alpha beta gamma delta ", 30) // plenty of words

	// These targets should all fall under the internal clamp threshold (e.g. < 200)
	chunksTarget0 := splitContentByWords(content, 0)
	chunksTarget50 := splitContentByWords(content, 50)
	chunksTarget199 := splitContentByWords(content, 199)

	// This target should be at or above the clamp threshold and act as the baseline.
	chunksTarget200 := splitContentByWords(content, 200)

	if len(chunksTarget0) == 0 || len(chunksTarget50) == 0 || len(chunksTarget199) == 0 || len(chunksTarget200) == 0 {
		t.Fatalf("expected non-empty chunk lists for all targets; got 0 in one of them")
	}

	// All targets below the clamp threshold should behave identically to the clamped value.
	if len(chunksTarget0) != len(chunksTarget200) {
		t.Errorf("expected same number of chunks for target 0 and 200; got %d vs %d", len(chunksTarget0), len(chunksTarget200))
	}
	if len(chunksTarget50) != len(chunksTarget200) {
		t.Errorf("expected same number of chunks for target 50 and 200; got %d vs %d", len(chunksTarget50), len(chunksTarget200))
	}
	if len(chunksTarget199) != len(chunksTarget200) {
		t.Errorf("expected same number of chunks for target 199 and 200; got %d vs %d", len(chunksTarget199), len(chunksTarget200))
	}

	// Also assert that concatenating all chunks yields the original content, so we
	// don't silently lose or duplicate data when clamping.
	joinAndCompare := func(label string, chunks []string) {
		got := strings.Join(chunks, "")
		if got != content {
			t.Errorf("%s: reconstructed content mismatch; len(got)=%d, len(want)=%d", label, len(got), len(content))
		}
	}
	joinAndCompare("target=0", chunksTarget0)
	joinAndCompare("target=50", chunksTarget50)
	joinAndCompare("target=199", chunksTarget199)
	joinAndCompare("target=200", chunksTarget200)
}

// Test that inputs without spaces take the "no good break → hard cut at upper bound"
// path and that we don't loop or mis-split (no empty chunks, full coverage).
func TestSplitContentByWords_NoSpacesHardCut(t *testing.T) {
	// Long string with no spaces to force the hard-cut logic.
	content := strings.Repeat("X", 2500)
	target := 240 // a typical chunk size that should exercise the upper-bound cut

	chunks := splitContentByWords(content, target)
	if len(chunks) == 0 {
		t.Fatalf("expected at least one chunk for long content without spaces")
	}

	totalLen := 0
	for i, ch := range chunks {
		if ch == "" {
			t.Fatalf("chunk %d is empty; expected hard-cut logic to avoid empty chunks", i)
		}
		totalLen += len(ch)

		// No chunk should exceed some reasonable multiple of target; for a hard cut
		// implementation we expect at most target*2, and generally ≤ target.
		if len(ch) > target*2 {
			t.Errorf("chunk %d too large: got len=%d, want <= %d", i, len(ch), target*2)
		}
	}

	if totalLen != len(content) {
		t.Errorf("reconstructed length mismatch: got %d, want %d", totalLen, len(content))
	}
}

These tests assume the helper has signature:

func splitContentByWords(content string, target int) []string

If your actual signature differs (e.g. additional parameters or different return type), adjust the test calls and assertions accordingly. If you prefer to keep all tests in pkg/parser/chunk_test.go, you can move these two test functions into that file instead.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR expands the retrieval subsystem with a new iterative “agentic-navigation” strategy and related plumbing, while also improving parse-time document structure (PDF heading detection + oversized-leaf chunking) and ingest-time section summaries to boost downstream retrieval quality.

Changes:

  • Add agentic retrieval strategy with an outline/expand/read/done JSON-action loop, plus tests for hop caps, malformed JSON, ID filtering, and read fallbacks.
  • Harden single-pass and chunked-tree selection calls against non-JSON LLM outputs via retry + graceful degradation; add Result.HopsTaken for turn accounting.
  • Improve PDF parsing for SEC-style formatting (bold-as-heading, letter-spacing collapse) and split oversized leaf sections; update summary prompts to be retrieval-optimized; add config + wiring for the new strategy.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
pkg/retrieval/agentic.go Implements the multi-hop agentic navigation loop, JSON action parsing, and read/expand/outline observations.
pkg/retrieval/agentic_test.go Adds deterministic turn-by-turn tests covering agentic behavior and edge cases.
pkg/retrieval/single_pass.go Adds selection retry+degrade behavior on JSON parse failures and usage aggregation.
pkg/retrieval/chunked_tree.go Reuses the shared selection retry+degrade helper and sets HopsTaken.
pkg/retrieval/strategy.go Extends Result with JSON tags and a new HopsTaken field.
pkg/retrieval/retrieval_test.go Adds coverage for graceful handling of persistent non-JSON selection output.
pkg/parser/pdf.go Improves heading detection (bold rows, longer headings), collapses letter-spaced text, and chunks oversized leaf sections.
pkg/parser/chunk_test.go Adds tests for oversized-leaf chunking and chunk title derivation.
pkg/ingest/ingest.go Updates summarization prompts to produce retrieval-optimized summaries and increases token cap.
pkg/config/config.go Adds RetrievalConfig.Agentic block, env overrides, and validation for the agentic strategy.
cmd/server/main.go Wires agentic strategy and adds a storageFetcher adapter for read actions.
cmd/engine/main.go Same wiring + storageFetcher adapter for the standalone engine binary.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pkg/retrieval/agentic.go
Comment on lines +134 to +140
for hop := 0; hop < maxHops; hop++ {
req := llmgate.Request{
Model: model,
Messages: msgs,
MaxTokens: 1024,
Temperature: 0,
}
Comment thread pkg/retrieval/agentic.go
Comment on lines +163 to +169
// Graceful degradation: a malformed action doesn't 500 the
// query — we ask the model to retry once with a stronger
// instruction. If the next turn still misparses, we abort
// and return whatever the model has picked so far (often
// nothing). Matches the runSelectionWithRetry pattern from
// single_pass.go.
log.Printf("retrieval: agentic hop %d action parse failed: %v", hop+1, parseErr)
Comment thread pkg/retrieval/agentic.go
Comment on lines +236 to +239
// Ran out of hops without a `done` action. Return whatever IDs the
// model proposed in the last action (if any) plus the hop count so
// the caller can see the cap was hit.
log.Printf("retrieval: agentic strategy hit max_hops=%d without done; returning %d ids", maxHops, len(finalIDs))
Comment thread pkg/retrieval/agentic.go
Comment on lines +275 to +277
func (a *AgenticStrategy) renderRead(ctx context.Context, t *tree.Tree, id tree.SectionID) (string, bool) {
sec := t.FindByID(id)
if sec == nil {
Comment thread pkg/retrieval/agentic.go
Comment on lines +288 to +297
data, err := a.Fetcher.Get(ctx, sec.ContentRef)
if err != nil {
return fmt.Sprintf("error reading section %s: %v", sec.ID, err), true
}
body := string(data)
body = strings.TrimSpace(body)
if body == "" && sec.Summary != "" {
return fmt.Sprintf("section %s body was empty; summary:\n%s", sec.ID, sec.Summary), true
}
return fmt.Sprintf("section %s (%s):\n%s", sec.ID, sec.Title, body), true
Comment thread pkg/retrieval/strategy.go
Comment on lines +68 to +71
// HopsTaken is the number of LLM turns the strategy issued to reach the
// final selection. Single-shot strategies set this to 1; iterative
// strategies (e.g. agentic) set it to the number of tool-using turns
// actually consumed, including the terminal "done" turn.
Comment thread cmd/server/main.go
Comment on lines +346 to +359
// storageFetcher adapts a storage.Storage to retrieval.ContentFetcher.
// The agentic strategy reads section bodies one at a time, so we
// materialize the full reader contents into a []byte here rather than
// streaming — section bodies are typically a few KB.
type storageFetcher struct{ s storage.Storage }

func (sf storageFetcher) Get(ctx context.Context, ref string) ([]byte, error) {
rc, _, err := sf.s.Get(ctx, ref)
if err != nil {
return nil, err
}
defer rc.Close()
return io.ReadAll(rc)
}
Comment thread cmd/engine/main.go
Comment on lines +270 to +283
// storageFetcher adapts a storage.Storage to retrieval.ContentFetcher.
// The agentic strategy reads section bodies one at a time, so we
// materialize the full reader contents into a []byte here rather than
// streaming — section bodies are typically a few KB.
type storageFetcher struct{ s storage.Storage }

func (sf storageFetcher) Get(ctx context.Context, ref string) ([]byte, error) {
rc, _, err := sf.s.Get(ctx, ref)
if err != nil {
return nil, err
}
defer rc.Close()
return io.ReadAll(rc)
}
@hallelx2 hallelx2 merged commit 5b31b9d into main May 26, 2026
5 of 8 checks passed
@hallelx2 hallelx2 deleted the feat/agentic-retrieval branch May 26, 2026 23:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants