pkg/retrieval: add agentic-navigation strategy (Phase 1.4)#13
Conversation
A new Strategy/CostStrategy implementation that lets the LLM explore
the tree iteratively via JSON action turns (outline, expand, read,
done) instead of being handed the whole view at once. Useful for
trees too large to fit any single context window or where read-on-
demand keeps token cost low for narrow queries.
Result gains a HopsTaken field so callers can distinguish single-shot
strategies from iterative ones; single-pass and chunked-tree set it
to 1.
Wiring:
- config.RetrievalConfig.Agentic{ MaxHops, Model }; defaults to 6
hops, falls back to budget.ModelName when Model is empty
- VLE_RETRIEVAL_AGENTIC_MAX_HOPS / VLE_RETRIEVAL_AGENTIC_MODEL env
overrides
- buildStrategy in cmd/engine + cmd/server now takes the storage so
the agentic strategy can satisfy 'read' via a small ContentFetcher
adapter
Protocol: JSON-action text protocol rather than llmgate's Tools
field; llmgate v0.2.0 declares ToolDef/ToolCall as scaffolding but
provider adapters don't yet populate ToolCalls. The strategy stays
forward-compatible — when llmgate wires native tool calling the
surface here doesn't change.
Tests use a scriptedLLM that returns canned per-turn replies and
cover: happy-path expand/read/done, expand/expand/done (matches the
plan's manual trace), MaxHops cap, graceful degradation on
persistent non-JSON output, fabricated-ID filtering, empty tree
guard, summary fallback for 'read' on internal nodes, and the
tolerant ParseAction decoder.
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (8)
📝 WalkthroughWalkthroughThis PR introduces an LLM-powered agentic retrieval mode that enables interactive document navigation. It updates the Result schema to track hop counts, adds configuration support for the new strategy, implements a multi-turn agent loop with action parsing and observation rendering, provides comprehensive test coverage, and integrates the new strategy into engine and server initialization paths. ChangesAgentic Retrieval Strategy Introduction
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Reviewer's GuideAdds an agentic, tool-like navigation retrieval strategy, improves PDF parsing for SEC-style filings, hardens selection JSON parsing with retries, and adjusts summarization and config to support the new strategy and better retrieval quality. File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 4 issues, and left some high level feedback:
- The new agentic strategy currently ignores the
ContextBudgettoken limits (it hardcodesMaxTokens: 1024and never checksbudget.Available()), which can allow retrieval calls to exceed configured budgets; consider plumbing the budget through and capping per-hop tokens accordingly. - Both
single_pass.goandagentic.gouselog.Printfinside thepkg/retrievalpackage, which makes logging behavior harder to control from the host application; consider either using the existing structured logger pattern (e.g.,slog) or accepting a logger interface so callers can control logging. - The
storageFetcheradapter is duplicated in bothcmd/engine/main.goandcmd/server/main.go; factoring it into a shared helper (e.g., in a small internal package) would reduce duplication and keep the agentic wiring consistent between binaries.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The new agentic strategy currently ignores the `ContextBudget` token limits (it hardcodes `MaxTokens: 1024` and never checks `budget.Available()`), which can allow retrieval calls to exceed configured budgets; consider plumbing the budget through and capping per-hop tokens accordingly.
- Both `single_pass.go` and `agentic.go` use `log.Printf` inside the `pkg/retrieval` package, which makes logging behavior harder to control from the host application; consider either using the existing structured logger pattern (e.g., `slog`) or accepting a logger interface so callers can control logging.
- The `storageFetcher` adapter is duplicated in both `cmd/engine/main.go` and `cmd/server/main.go`; factoring it into a shared helper (e.g., in a small internal package) would reduce duplication and keep the agentic wiring consistent between binaries.
## Individual Comments
### Comment 1
<location path="pkg/parser/pdf.go" line_range="286-295" />
<code_context>
+
+// splitContentByWords breaks a long string into pieces near target size at
+// word boundaries. The last piece may be smaller; pieces are never midword.
+func splitContentByWords(s string, target int) []string {
+ s = strings.TrimSpace(s)
+ if target < 200 {
+ target = 200
+ }
+ slack := target / 4
+ if len(s) <= target+slack {
+ return []string{s}
+ }
+ var chunks []string
+ for len(s) > 0 {
+ if len(s) <= target+slack {
+ chunks = append(chunks, strings.TrimSpace(s))
+ break
+ }
+ upper := target + slack
+ if upper > len(s) {
+ upper = len(s)
+ }
+ cut := strings.LastIndex(s[:upper], " ")
+ if cut < target/2 {
+ cut = upper // no good break: hard-cut at upper bound
</code_context>
<issue_to_address>
**issue (bug_risk):** Avoid slicing strings on byte indices to prevent breaking multi-byte runes in non-ASCII text
Because `upper` is calculated in bytes and used directly in `s[:upper]`, any case where `cut = upper` risks slicing through the middle of a multi-byte UTF-8 rune, producing invalid UTF-8 and corrupting non-ASCII content. To avoid this, compute the split position using rune indices (e.g., iterate over `[]rune(s)` or range over the string) and then convert back to a byte index, or otherwise ensure the cut only occurs at valid rune boundaries when no suitable space is found.
</issue_to_address>
### Comment 2
<location path="cmd/engine/main.go" line_range="270-276" />
<code_context>
}
}
+// storageFetcher adapts a storage.Storage to retrieval.ContentFetcher.
+// The agentic strategy reads section bodies one at a time, so we
+// materialize the full reader contents into a []byte here rather than
+// streaming — section bodies are typically a few KB.
+type storageFetcher struct{ s storage.Storage }
+
+func (sf storageFetcher) Get(ctx context.Context, ref string) ([]byte, error) {
+ rc, _, err := sf.s.Get(ctx, ref)
+ if err != nil {
</code_context>
<issue_to_address>
**suggestion:** Factor storageFetcher into a shared helper to avoid duplication between engine and server binaries
The `storageFetcher` type and its `Get` method are duplicated here and in `cmd/server/main.go`. To avoid divergence if retrieval semantics change (compression, limits, metrics, etc.), extract this adapter into a shared package (or the retrieval package) and use it from both binaries.
Suggested implementation:
```golang
fetcher := retrieval.NewStorageFetcher(store)
strategy := buildStrategy(cfg.Retrieval, llmClient, fetcher)
```
To fully implement the refactor and remove duplication between the engine and server binaries, you should also:
1. Extract the `storageFetcher` adapter into a shared helper, for example in the `retrieval` package:
- Add something like:
```go
// In package retrieval
type storageFetcher struct{ s storage.Storage }
func NewStorageFetcher(s storage.Storage) ContentFetcher {
return storageFetcher{s: s}
}
func (sf storageFetcher) Get(ctx context.Context, ref string) ([]byte, error) {
rc, _, err := sf.s.Get(ctx, ref)
if err != nil {
return nil, err
}
defer rc.Close()
data, err := io.ReadAll(rc)
if err != nil {
return nil, err
}
return data, nil
}
```
- Adjust types/imports to match the existing `retrieval` and `storage` packages.
2. In `cmd/engine/main.go`, remove the now-duplicated local `storageFetcher` type and its `Get` method entirely.
3. In `cmd/server/main.go`, replace its local `storageFetcher` implementation and usage with the shared helper:
- Delete the duplicate type/method.
- Change any call sites to use `retrieval.NewStorageFetcher(store)` (or whatever package name you choose).
4. Ensure both binaries import the package where `NewStorageFetcher` lives and that `buildStrategy` (and any other consumers) accept a `retrieval.ContentFetcher` rather than `storage.Storage` directly, if they don't already.
</issue_to_address>
### Comment 3
<location path="pkg/retrieval/retrieval_test.go" line_range="129" />
<code_context>
+// When the model returns prose without any JSON (Gemini's occasional JSON-mode
+// blip), the strategy must retry and then degrade gracefully — empty selection
+// with no error — instead of bubbling the parse failure up as a 500.
+func TestSinglePassGracefulOnNonJSON(t *testing.T) {
+ tr := buildTree()
+ m := &mockLLM{reply: "The most relevant section is the one about debt securities."}
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test where a non-JSON first reply is followed by a valid JSON reply to verify the retry path succeeds, not only the graceful-failure path.
This currently only covers the "persistent parse failure" case (all attempts non-JSON) and validates the graceful empty-selection behavior. Please also add a test where the first response is prose and a subsequent response is valid JSON (e.g. `reply[0] = "The most relevant..."`, `reply[1] = valid JSON`) to confirm `runSelectionWithRetry` successfully recovers, returning parsed IDs and aggregated usage instead of always degrading to an empty selection when the initial response is non-JSON.
Suggested implementation:
```golang
}
// 1 initial attempt + 2 retries = 3 LLM calls, all counted in usage.
}
// When the model first returns prose (non-JSON) and then returns valid JSON on
// retry, the strategy should successfully recover — returning parsed IDs and
// aggregated usage instead of degrading to an empty selection.
func TestSinglePassRecoverOnNonJSONThenJSON(t *testing.T) {
tr := buildTree()
m := &mockLLM{
// First attempt: non-JSON prose (JSON-mode blip).
// Second attempt: valid JSON response with selected IDs.
reply: []string{
"The most relevant section is the one about debt securities.",
`{"selected_ids":["id-1","id-2"]}`,
},
}
s := retrieval.NewSinglePass(m)
res, err := s.SelectWithCost(context.Background(), tr, "q", retrieval.ContextBudget{MaxTokens: 1000})
if err != nil {
t.Fatalf("want nil error when JSON parse succeeds on retry, got %v", err)
}
if len(res.SelectedIDs) == 0 {
t.Fatalf("want non-empty selection when JSON parse succeeds on retry, got %v", res.SelectedIDs)
}
}
```
To make this test compile and correctly exercise the retry path, you will also need to update the `mockLLM` implementation in this file:
1. Change `mockLLM` to support multiple replies, e.g.:
- Add fields like `replies []string` and `call int`.
- Keep the existing `reply string` field for backward compatibility.
2. In the `Complete`/`Generate` method (whatever `mockLLM` implements for the LLM interface), return:
- `replies[call]` if `len(replies) > 0` and `call < len(replies)`, then increment `call`.
- Otherwise fall back to the existing single `reply` behavior.
3. Optionally expose a `calls int` counter if you want to assert the number of retries in this or other tests.
Example shape of the struct adaptation (adjust names/signatures to match your actual code):
```go
type mockLLM struct {
// existing fields...
reply string // existing single-reply field; keep for old tests
replies []string // new: sequence of replies for successive calls
call int // new: current call index
}
func (m *mockLLM) Complete(ctx context.Context, prompt string, opts ...Option) (string, Usage, error) {
var r string
if len(m.replies) > 0 && m.call < len(m.replies) {
r = m.replies[m.call]
m.call++
} else {
r = m.reply
}
// existing logic that wraps r into the appropriate return type / usage
}
```
Adjust the method name, return types, and usage accounting to match the existing `mockLLM` implementation and interfaces used by `retrieval.NewSinglePass`.
</issue_to_address>
### Comment 4
<location path="pkg/parser/chunk_test.go" line_range="8" />
<code_context>
+ "testing"
+)
+
+func TestChunkOversizedLeavesSplits(t *testing.T) {
+ // 12 words per "sentence", 5 sentences ~ 60-65 words, ~360 chars; we want
+ // >2400 chars so build it from a longer paragraph + a colon-terminated header.
</code_context>
<issue_to_address>
**suggestion (testing):** Add focused tests for splitContentByWords edge cases (minimal target size, no spaces) to validate chunk boundaries.
Current tests cover `chunkOversizedLeaves` and title derivation, but `splitContentByWords`’s edge branches are only exercised indirectly. Please add direct unit tests for:
- Very small `target` values (e.g. 0 or 50) to verify the `target < 200` clamping and resulting chunk shapes.
- Long input without spaces to cover the “no good break → hard cut at upper bound” path and ensure it doesn’t loop or mis-split.
This will better protect future changes to the chunking logic and word-boundary handling.
Suggested implementation:
```golang
package parser
import (
"strings"
"testing"
)
// Test that very small targets are clamped (e.g. to 200) and that the
// resulting chunking behavior is consistent for all "too small" values.
func TestSplitContentByWords_ClampsSmallTarget(t *testing.T) {
content := strings.Repeat("alpha beta gamma delta ", 30) // plenty of words
// These targets should all fall under the internal clamp threshold (e.g. < 200)
chunksTarget0 := splitContentByWords(content, 0)
chunksTarget50 := splitContentByWords(content, 50)
chunksTarget199 := splitContentByWords(content, 199)
// This target should be at or above the clamp threshold and act as the baseline.
chunksTarget200 := splitContentByWords(content, 200)
if len(chunksTarget0) == 0 || len(chunksTarget50) == 0 || len(chunksTarget199) == 0 || len(chunksTarget200) == 0 {
t.Fatalf("expected non-empty chunk lists for all targets; got 0 in one of them")
}
// All targets below the clamp threshold should behave identically to the clamped value.
if len(chunksTarget0) != len(chunksTarget200) {
t.Errorf("expected same number of chunks for target 0 and 200; got %d vs %d", len(chunksTarget0), len(chunksTarget200))
}
if len(chunksTarget50) != len(chunksTarget200) {
t.Errorf("expected same number of chunks for target 50 and 200; got %d vs %d", len(chunksTarget50), len(chunksTarget200))
}
if len(chunksTarget199) != len(chunksTarget200) {
t.Errorf("expected same number of chunks for target 199 and 200; got %d vs %d", len(chunksTarget199), len(chunksTarget200))
}
// Also assert that concatenating all chunks yields the original content, so we
// don't silently lose or duplicate data when clamping.
joinAndCompare := func(label string, chunks []string) {
got := strings.Join(chunks, "")
if got != content {
t.Errorf("%s: reconstructed content mismatch; len(got)=%d, len(want)=%d", label, len(got), len(content))
}
}
joinAndCompare("target=0", chunksTarget0)
joinAndCompare("target=50", chunksTarget50)
joinAndCompare("target=199", chunksTarget199)
joinAndCompare("target=200", chunksTarget200)
}
// Test that inputs without spaces take the "no good break → hard cut at upper bound"
// path and that we don't loop or mis-split (no empty chunks, full coverage).
func TestSplitContentByWords_NoSpacesHardCut(t *testing.T) {
// Long string with no spaces to force the hard-cut logic.
content := strings.Repeat("X", 2500)
target := 240 // a typical chunk size that should exercise the upper-bound cut
chunks := splitContentByWords(content, target)
if len(chunks) == 0 {
t.Fatalf("expected at least one chunk for long content without spaces")
}
totalLen := 0
for i, ch := range chunks {
if ch == "" {
t.Fatalf("chunk %d is empty; expected hard-cut logic to avoid empty chunks", i)
}
totalLen += len(ch)
// No chunk should exceed some reasonable multiple of target; for a hard cut
// implementation we expect at most target*2, and generally ≤ target.
if len(ch) > target*2 {
t.Errorf("chunk %d too large: got len=%d, want <= %d", i, len(ch), target*2)
}
}
if totalLen != len(content) {
t.Errorf("reconstructed length mismatch: got %d, want %d", totalLen, len(content))
}
}
```
These tests assume the helper has signature:
```go
func splitContentByWords(content string, target int) []string
```
If your actual signature differs (e.g. additional parameters or different return type), adjust the test calls and assertions accordingly. If you prefer to keep all tests in `pkg/parser/chunk_test.go`, you can move these two test functions into that file instead.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| func splitContentByWords(s string, target int) []string { | ||
| s = strings.TrimSpace(s) | ||
| if target < 200 { | ||
| target = 200 | ||
| } | ||
| slack := target / 4 | ||
| if len(s) <= target+slack { | ||
| return []string{s} | ||
| } | ||
| var chunks []string |
There was a problem hiding this comment.
issue (bug_risk): Avoid slicing strings on byte indices to prevent breaking multi-byte runes in non-ASCII text
Because upper is calculated in bytes and used directly in s[:upper], any case where cut = upper risks slicing through the middle of a multi-byte UTF-8 rune, producing invalid UTF-8 and corrupting non-ASCII content. To avoid this, compute the split position using rune indices (e.g., iterate over []rune(s) or range over the string) and then convert back to a byte index, or otherwise ensure the cut only occurs at valid rune boundaries when no suitable space is found.
| // storageFetcher adapts a storage.Storage to retrieval.ContentFetcher. | ||
| // The agentic strategy reads section bodies one at a time, so we | ||
| // materialize the full reader contents into a []byte here rather than | ||
| // streaming — section bodies are typically a few KB. | ||
| type storageFetcher struct{ s storage.Storage } | ||
|
|
||
| func (sf storageFetcher) Get(ctx context.Context, ref string) ([]byte, error) { |
There was a problem hiding this comment.
suggestion: Factor storageFetcher into a shared helper to avoid duplication between engine and server binaries
The storageFetcher type and its Get method are duplicated here and in cmd/server/main.go. To avoid divergence if retrieval semantics change (compression, limits, metrics, etc.), extract this adapter into a shared package (or the retrieval package) and use it from both binaries.
Suggested implementation:
fetcher := retrieval.NewStorageFetcher(store)
strategy := buildStrategy(cfg.Retrieval, llmClient, fetcher)To fully implement the refactor and remove duplication between the engine and server binaries, you should also:
-
Extract the
storageFetcheradapter into a shared helper, for example in theretrievalpackage:- Add something like:
// In package retrieval type storageFetcher struct{ s storage.Storage } func NewStorageFetcher(s storage.Storage) ContentFetcher { return storageFetcher{s: s} } func (sf storageFetcher) Get(ctx context.Context, ref string) ([]byte, error) { rc, _, err := sf.s.Get(ctx, ref) if err != nil { return nil, err } defer rc.Close() data, err := io.ReadAll(rc) if err != nil { return nil, err } return data, nil }
- Adjust types/imports to match the existing
retrievalandstoragepackages.
- Add something like:
-
In
cmd/engine/main.go, remove the now-duplicated localstorageFetchertype and itsGetmethod entirely. -
In
cmd/server/main.go, replace its localstorageFetcherimplementation and usage with the shared helper:- Delete the duplicate type/method.
- Change any call sites to use
retrieval.NewStorageFetcher(store)(or whatever package name you choose).
-
Ensure both binaries import the package where
NewStorageFetcherlives and thatbuildStrategy(and any other consumers) accept aretrieval.ContentFetcherrather thanstorage.Storagedirectly, if they don't already.
| // When the model returns prose without any JSON (Gemini's occasional JSON-mode | ||
| // blip), the strategy must retry and then degrade gracefully — empty selection | ||
| // with no error — instead of bubbling the parse failure up as a 500. | ||
| func TestSinglePassGracefulOnNonJSON(t *testing.T) { |
There was a problem hiding this comment.
suggestion (testing): Add a test where a non-JSON first reply is followed by a valid JSON reply to verify the retry path succeeds, not only the graceful-failure path.
This currently only covers the "persistent parse failure" case (all attempts non-JSON) and validates the graceful empty-selection behavior. Please also add a test where the first response is prose and a subsequent response is valid JSON (e.g. reply[0] = "The most relevant...", reply[1] = valid JSON) to confirm runSelectionWithRetry successfully recovers, returning parsed IDs and aggregated usage instead of always degrading to an empty selection when the initial response is non-JSON.
Suggested implementation:
}
// 1 initial attempt + 2 retries = 3 LLM calls, all counted in usage.
}
// When the model first returns prose (non-JSON) and then returns valid JSON on
// retry, the strategy should successfully recover — returning parsed IDs and
// aggregated usage instead of degrading to an empty selection.
func TestSinglePassRecoverOnNonJSONThenJSON(t *testing.T) {
tr := buildTree()
m := &mockLLM{
// First attempt: non-JSON prose (JSON-mode blip).
// Second attempt: valid JSON response with selected IDs.
reply: []string{
"The most relevant section is the one about debt securities.",
`{"selected_ids":["id-1","id-2"]}`,
},
}
s := retrieval.NewSinglePass(m)
res, err := s.SelectWithCost(context.Background(), tr, "q", retrieval.ContextBudget{MaxTokens: 1000})
if err != nil {
t.Fatalf("want nil error when JSON parse succeeds on retry, got %v", err)
}
if len(res.SelectedIDs) == 0 {
t.Fatalf("want non-empty selection when JSON parse succeeds on retry, got %v", res.SelectedIDs)
}
}To make this test compile and correctly exercise the retry path, you will also need to update the mockLLM implementation in this file:
- Change
mockLLMto support multiple replies, e.g.:- Add fields like
replies []stringandcall int. - Keep the existing
reply stringfield for backward compatibility.
- Add fields like
- In the
Complete/Generatemethod (whatevermockLLMimplements for the LLM interface), return:replies[call]iflen(replies) > 0andcall < len(replies), then incrementcall.- Otherwise fall back to the existing single
replybehavior.
- Optionally expose a
calls intcounter if you want to assert the number of retries in this or other tests.
Example shape of the struct adaptation (adjust names/signatures to match your actual code):
type mockLLM struct {
// existing fields...
reply string // existing single-reply field; keep for old tests
replies []string // new: sequence of replies for successive calls
call int // new: current call index
}
func (m *mockLLM) Complete(ctx context.Context, prompt string, opts ...Option) (string, Usage, error) {
var r string
if len(m.replies) > 0 && m.call < len(m.replies) {
r = m.replies[m.call]
m.call++
} else {
r = m.reply
}
// existing logic that wraps r into the appropriate return type / usage
}Adjust the method name, return types, and usage accounting to match the existing mockLLM implementation and interfaces used by retrieval.NewSinglePass.
| "testing" | ||
| ) | ||
|
|
||
| func TestChunkOversizedLeavesSplits(t *testing.T) { |
There was a problem hiding this comment.
suggestion (testing): Add focused tests for splitContentByWords edge cases (minimal target size, no spaces) to validate chunk boundaries.
Current tests cover chunkOversizedLeaves and title derivation, but splitContentByWords’s edge branches are only exercised indirectly. Please add direct unit tests for:
- Very small
targetvalues (e.g. 0 or 50) to verify thetarget < 200clamping and resulting chunk shapes. - Long input without spaces to cover the “no good break → hard cut at upper bound” path and ensure it doesn’t loop or mis-split.
This will better protect future changes to the chunking logic and word-boundary handling.
Suggested implementation:
package parser
import (
"strings"
"testing"
)
// Test that very small targets are clamped (e.g. to 200) and that the
// resulting chunking behavior is consistent for all "too small" values.
func TestSplitContentByWords_ClampsSmallTarget(t *testing.T) {
content := strings.Repeat("alpha beta gamma delta ", 30) // plenty of words
// These targets should all fall under the internal clamp threshold (e.g. < 200)
chunksTarget0 := splitContentByWords(content, 0)
chunksTarget50 := splitContentByWords(content, 50)
chunksTarget199 := splitContentByWords(content, 199)
// This target should be at or above the clamp threshold and act as the baseline.
chunksTarget200 := splitContentByWords(content, 200)
if len(chunksTarget0) == 0 || len(chunksTarget50) == 0 || len(chunksTarget199) == 0 || len(chunksTarget200) == 0 {
t.Fatalf("expected non-empty chunk lists for all targets; got 0 in one of them")
}
// All targets below the clamp threshold should behave identically to the clamped value.
if len(chunksTarget0) != len(chunksTarget200) {
t.Errorf("expected same number of chunks for target 0 and 200; got %d vs %d", len(chunksTarget0), len(chunksTarget200))
}
if len(chunksTarget50) != len(chunksTarget200) {
t.Errorf("expected same number of chunks for target 50 and 200; got %d vs %d", len(chunksTarget50), len(chunksTarget200))
}
if len(chunksTarget199) != len(chunksTarget200) {
t.Errorf("expected same number of chunks for target 199 and 200; got %d vs %d", len(chunksTarget199), len(chunksTarget200))
}
// Also assert that concatenating all chunks yields the original content, so we
// don't silently lose or duplicate data when clamping.
joinAndCompare := func(label string, chunks []string) {
got := strings.Join(chunks, "")
if got != content {
t.Errorf("%s: reconstructed content mismatch; len(got)=%d, len(want)=%d", label, len(got), len(content))
}
}
joinAndCompare("target=0", chunksTarget0)
joinAndCompare("target=50", chunksTarget50)
joinAndCompare("target=199", chunksTarget199)
joinAndCompare("target=200", chunksTarget200)
}
// Test that inputs without spaces take the "no good break → hard cut at upper bound"
// path and that we don't loop or mis-split (no empty chunks, full coverage).
func TestSplitContentByWords_NoSpacesHardCut(t *testing.T) {
// Long string with no spaces to force the hard-cut logic.
content := strings.Repeat("X", 2500)
target := 240 // a typical chunk size that should exercise the upper-bound cut
chunks := splitContentByWords(content, target)
if len(chunks) == 0 {
t.Fatalf("expected at least one chunk for long content without spaces")
}
totalLen := 0
for i, ch := range chunks {
if ch == "" {
t.Fatalf("chunk %d is empty; expected hard-cut logic to avoid empty chunks", i)
}
totalLen += len(ch)
// No chunk should exceed some reasonable multiple of target; for a hard cut
// implementation we expect at most target*2, and generally ≤ target.
if len(ch) > target*2 {
t.Errorf("chunk %d too large: got len=%d, want <= %d", i, len(ch), target*2)
}
}
if totalLen != len(content) {
t.Errorf("reconstructed length mismatch: got %d, want %d", totalLen, len(content))
}
}These tests assume the helper has signature:
func splitContentByWords(content string, target int) []stringIf your actual signature differs (e.g. additional parameters or different return type), adjust the test calls and assertions accordingly. If you prefer to keep all tests in pkg/parser/chunk_test.go, you can move these two test functions into that file instead.
There was a problem hiding this comment.
Pull request overview
This PR expands the retrieval subsystem with a new iterative “agentic-navigation” strategy and related plumbing, while also improving parse-time document structure (PDF heading detection + oversized-leaf chunking) and ingest-time section summaries to boost downstream retrieval quality.
Changes:
- Add
agenticretrieval strategy with an outline/expand/read/done JSON-action loop, plus tests for hop caps, malformed JSON, ID filtering, and read fallbacks. - Harden
single-passandchunked-treeselection calls against non-JSON LLM outputs via retry + graceful degradation; addResult.HopsTakenfor turn accounting. - Improve PDF parsing for SEC-style formatting (bold-as-heading, letter-spacing collapse) and split oversized leaf sections; update summary prompts to be retrieval-optimized; add config + wiring for the new strategy.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/retrieval/agentic.go | Implements the multi-hop agentic navigation loop, JSON action parsing, and read/expand/outline observations. |
| pkg/retrieval/agentic_test.go | Adds deterministic turn-by-turn tests covering agentic behavior and edge cases. |
| pkg/retrieval/single_pass.go | Adds selection retry+degrade behavior on JSON parse failures and usage aggregation. |
| pkg/retrieval/chunked_tree.go | Reuses the shared selection retry+degrade helper and sets HopsTaken. |
| pkg/retrieval/strategy.go | Extends Result with JSON tags and a new HopsTaken field. |
| pkg/retrieval/retrieval_test.go | Adds coverage for graceful handling of persistent non-JSON selection output. |
| pkg/parser/pdf.go | Improves heading detection (bold rows, longer headings), collapses letter-spaced text, and chunks oversized leaf sections. |
| pkg/parser/chunk_test.go | Adds tests for oversized-leaf chunking and chunk title derivation. |
| pkg/ingest/ingest.go | Updates summarization prompts to produce retrieval-optimized summaries and increases token cap. |
| pkg/config/config.go | Adds RetrievalConfig.Agentic block, env overrides, and validation for the agentic strategy. |
| cmd/server/main.go | Wires agentic strategy and adds a storageFetcher adapter for read actions. |
| cmd/engine/main.go | Same wiring + storageFetcher adapter for the standalone engine binary. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| for hop := 0; hop < maxHops; hop++ { | ||
| req := llmgate.Request{ | ||
| Model: model, | ||
| Messages: msgs, | ||
| MaxTokens: 1024, | ||
| Temperature: 0, | ||
| } |
| // Graceful degradation: a malformed action doesn't 500 the | ||
| // query — we ask the model to retry once with a stronger | ||
| // instruction. If the next turn still misparses, we abort | ||
| // and return whatever the model has picked so far (often | ||
| // nothing). Matches the runSelectionWithRetry pattern from | ||
| // single_pass.go. | ||
| log.Printf("retrieval: agentic hop %d action parse failed: %v", hop+1, parseErr) |
| // Ran out of hops without a `done` action. Return whatever IDs the | ||
| // model proposed in the last action (if any) plus the hop count so | ||
| // the caller can see the cap was hit. | ||
| log.Printf("retrieval: agentic strategy hit max_hops=%d without done; returning %d ids", maxHops, len(finalIDs)) |
| func (a *AgenticStrategy) renderRead(ctx context.Context, t *tree.Tree, id tree.SectionID) (string, bool) { | ||
| sec := t.FindByID(id) | ||
| if sec == nil { |
| data, err := a.Fetcher.Get(ctx, sec.ContentRef) | ||
| if err != nil { | ||
| return fmt.Sprintf("error reading section %s: %v", sec.ID, err), true | ||
| } | ||
| body := string(data) | ||
| body = strings.TrimSpace(body) | ||
| if body == "" && sec.Summary != "" { | ||
| return fmt.Sprintf("section %s body was empty; summary:\n%s", sec.ID, sec.Summary), true | ||
| } | ||
| return fmt.Sprintf("section %s (%s):\n%s", sec.ID, sec.Title, body), true |
| // HopsTaken is the number of LLM turns the strategy issued to reach the | ||
| // final selection. Single-shot strategies set this to 1; iterative | ||
| // strategies (e.g. agentic) set it to the number of tool-using turns | ||
| // actually consumed, including the terminal "done" turn. |
| // storageFetcher adapts a storage.Storage to retrieval.ContentFetcher. | ||
| // The agentic strategy reads section bodies one at a time, so we | ||
| // materialize the full reader contents into a []byte here rather than | ||
| // streaming — section bodies are typically a few KB. | ||
| type storageFetcher struct{ s storage.Storage } | ||
|
|
||
| func (sf storageFetcher) Get(ctx context.Context, ref string) ([]byte, error) { | ||
| rc, _, err := sf.s.Get(ctx, ref) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
| defer rc.Close() | ||
| return io.ReadAll(rc) | ||
| } |
| // storageFetcher adapts a storage.Storage to retrieval.ContentFetcher. | ||
| // The agentic strategy reads section bodies one at a time, so we | ||
| // materialize the full reader contents into a []byte here rather than | ||
| // streaming — section bodies are typically a few KB. | ||
| type storageFetcher struct{ s storage.Storage } | ||
|
|
||
| func (sf storageFetcher) Get(ctx context.Context, ref string) ([]byte, error) { | ||
| rc, _, err := sf.s.Get(ctx, ref) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
| defer rc.Close() | ||
| return io.ReadAll(rc) | ||
| } |
Summary
Adds a third retrieval strategy,
agentic, alongsidesingle-passandchunked-tree. Instead of feeding the whole tree to the model in one shot or fanning out across slices in parallel, the agentic strategy lets the model navigate the tree iteratively: it emits a JSON action each turn (outline,expand,read,done) and we feed the observation back as the next user turn until the model picks a final set of section IDs or hitsMaxHops.This trades sequential latency for the ability to:
MaxTokensPerCallceiling)What changed
pkg/retrieval/agentic.go— newAgenticStrategyimplementing bothStrategyandCostStrategy. IncludesContentFetcherinterface (storage adapter), tolerantParseActionJSON decoder, and the full navigation loop with graceful degradation on malformed actions.pkg/retrieval/strategy.go—Resultnow carriesHopsTaken int(jsonhops_taken,omitempty). Existing strategies set it to 1.pkg/config/config.go— newRetrievalConfig.Agentic{ MaxHops int; Model string }block; defaultMaxHops = 6; env overridesVLE_RETRIEVAL_AGENTIC_MAX_HOPSandVLE_RETRIEVAL_AGENTIC_MODEL;Validate()accepts"agentic"and rejects negativeMaxHops.cmd/engine/main.go+cmd/server/main.go—buildStrategynow takes thestorage.Storageand adds an"agentic"case. A tinystorageFetcheradapter satisfiesContentFetcherby reading the storage object into a[]byte.Tests
pkg/retrieval/agentic_test.goadds ascriptedLLM(turn-by-turn canned replies) and covers:TestAgenticHappyPath— expand → read → done; asserts ids, hops, reasoning, and that the read observation actually carries body contentTestAgenticExpandExpandDone— covers the plan's manual trace (two-level navigation without read)TestAgenticHopCap— model never emits done; strategy stops at MaxHops with empty selectionTestAgenticBadJSONGraceful— every reply is prose, never JSON; returns nil error + empty selectionTestAgenticFiltersUnknownPicks— fabricated IDs dropped, picks deduplicatedTestAgenticEmptyTree— early-return guard, 0 LLM callsTestAgenticReadFallbackWhenNoContent—readon an internal section falls back to its summaryTestParseAction— code-fence + prose-before tolerance, validates requiredactionfieldFull suite:
go test ./pkg/retrieval/...passes.Tradeoffs
MaxTokensPerCall, agentic just needs enough room for one observation + the running message history. We bound this implicitly via MaxHops (default 6); eachreadis the heaviest observation.ToolDef/ToolCallbut provider adapters don't populateToolCallsyet. We use a JSON-action text protocol instead so the strategy works across all current providers. The surface stays the same once llmgate wires native tool calling —ParseActionbecomes an internal fast-path, and the loop body is unchanged.runSelectionWithRetry. A model that fabricates IDs has them dropped byfilterToTreeIDs.Summary by Sourcery
Add an agentic, tool-like retrieval strategy that incrementally navigates large document trees, hardens existing retrieval strategies against malformed LLM JSON, and enhances PDF parsing and summarization to improve retrieval quality on complex filings and long sections.
New Features:
Bug Fixes:
Enhancements:
Tests:
Summary by CodeRabbit
New Features
API Enhancements