feat(ingest): run HyDE in parallel with summarize by hallelx2 · Pull Request #15 · hallelx2/vectorless-engine

hallelx2 · 2026-05-27T00:21:41Z

Summary

- Summarize and HyDE now run as concurrent goroutines instead of sequential stages. On a 200-section 10-K, this roughly halves total ingest wall time.
- HyDE's user prompt no longer references `s.Summary`, which may not yet be persisted when HyDE runs in parallel. Title plus the first 4K of content carry strictly more signal anyway.
- New `ingest.global_llm_concurrency` knob (default 12) caps total in-flight LLM calls across both stages so the provider's per-tenant rate limit isn't exceeded. Per-stage caps (`summary_concurrency`, `ingest.hyde.concurrency`) still apply.
- Per-stage failure semantics are unchanged: both stages remain non-fatal, and `p.fail` is called only on parse/persist errors.

Rationale (Option A over B)

HyDE only needs Title + Content to produce useful questions: the summary was a 60-word hint derived from the same content the prompt already receives (up to the first 4K). Removing it lets the two stages run fully concurrently with no per-section ordering. Per-section pipelining (Option B) would deliver the same wall-time win at significantly higher orchestration cost.

Test plan

- go build ./... clean
- go vet ./... clean
- go test ./...: all tests pass
- New TestRunParallelStagesInterleaves proves the HyDE goroutine can complete while summarize is still blocked
- New TestGlobalLLMSemaphoreCapsInFlight proves the shared semaphore never lets peak in-flight exceed the configured cap
- New TestHyDEPromptOmitsSummary is a regression guard against reintroducing the prompt dependency on s.Summary
- TestPipelineRunParallelSummarizeAndHyDEIntegration (gated on TEST_DATABASE_URL) runs the full pipeline against the rust-ownership.md fixture and asserts that (a) the doc reaches ready, (b) every section has a summary, (c) every leaf has candidate_questions, and (d) the first HyDE call's timestamp precedes the last summarize call's
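The invariant behind TestGlobalLLMSemaphoreCapsInFlight can be illustrated with a standalone program: many fake LLM calls share one buffered-channel semaphore, and an atomic counter tracks the peak number in flight. `measurePeak` is a hypothetical helper, not the PR's test, and the sleep stands in for LLM latency.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// measurePeak runs `calls` fake LLM calls through a channel semaphore of
// size limit and returns the highest number observed in flight at once.
func measurePeak(limit, calls int) int64 {
	sem := make(chan struct{}, limit) // shared semaphore: one slot per allowed call
	var inFlight, peak int64
	var wg sync.WaitGroup
	for i := 0; i < calls; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot (blocks when full)
			defer func() { <-sem }() // release the slot on return

			n := atomic.AddInt64(&inFlight, 1)
			// Raise peak to n if n is a new maximum (CAS loop).
			for {
				p := atomic.LoadInt64(&peak)
				if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
					break
				}
			}
			time.Sleep(2 * time.Millisecond) // simulated LLM latency
			atomic.AddInt64(&inFlight, -1)
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&peak)
}

func main() {
	fmt.Println("peak in-flight with cap 3:", measurePeak(3, 50))
}
```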

Summary by Sourcery

Run the summarize and HyDE ingest stages in parallel under a configurable global LLM concurrency cap, and update configuration, prompts, and tests to support and validate the new orchestration.

New Features:

- Run summarize and HyDE ingest stages concurrently rather than sequentially to reduce ingest wall time
- Introduce a global LLM concurrency cap shared across summarize and HyDE stages, configurable via YAML and environment variable

Enhancements:

- Adjust HyDE prompt generation to rely on section title and content only, removing the dependency on stored summaries so it can safely run in parallel
- Wire ingest configuration through engine and server binaries to pass the new global LLM concurrency limit into the pipeline

Documentation:

- Document the new global LLM concurrency setting in example engine and server configuration files, including defaults and disabling behavior

Tests:

- Add unit tests covering parallel stage execution semantics, independent error handling, and behavior when HyDE is disabled
- Add tests validating the global LLM semaphore behavior, including respecting caps, cancellation, and no-cap paths
- Add a HyDE prompt regression test ensuring summaries are not included when generating candidate questions
- Add an end-to-end ingest integration test that verifies summaries and candidate questions are populated and stages actually interleave in production-like conditions

Summarize and HyDE now run as concurrent goroutines instead of strictly sequential stages. HyDE's input is (title, content): the section summary was a weak hint and is now omitted from the prompt, which removes the only ordering dependency between the two stages. A new ingest.global_llm_concurrency knob (default 12) caps total in-flight LLM calls across both stages so the provider's per-tenant limit isn't exceeded.

Option A (fully concurrent stages) was chosen over per-section pipelining because HyDE has no hard dependency on summary text: title plus the first 4K of content carry strictly more signal than a 60-word summary derived from that same content.

Test coverage:

- runParallelStages: interleaving proved by blocking summarize while HyDE completes
- global semaphore: peak in-flight never exceeds the cap under load
- cancellation: acquire returns ok=false on a canceled ctx
- prompt regression guard: s.Summary text must not appear in the HyDE user prompt
- integration: gated on TEST_DATABASE_URL; ingests the rust markdown fixture end-to-end, asserts every section has a summary and every leaf has candidate_questions, and verifies the first HyDE call's timestamp precedes the last summarize call's

sourcery-ai · 2026-05-27T00:21:46Z

Reviewer's Guide

Runs the summarize and HyDE ingest stages concurrently while introducing a shared global LLM concurrency cap, updating HyDE prompts to no longer depend on persisted summaries, wiring configuration/CLI, and adding unit/integration tests to validate concurrency and prompt behavior.

Sequence diagram for parallel summarize and HyDE stages with global LLM concurrency

```mermaid
sequenceDiagram
  title Parallel summarize and HyDE with shared global LLM cap
  participant Pipeline
  participant runParallelStages
  participant SummarizeStage as summarize
  participant HyDEStage as generateCandidateQuestions
  participant GlobalLLMSemaphore as globalLLMSem

  Pipeline->>runParallelStages: runParallelStages(ctx, summarizeFn, hydeFn)
  par summarize goroutine
    runParallelStages->>SummarizeStage: summarize(ctx, docID, profile)
    loop per section
      SummarizeStage->>Pipeline: acquireGlobalLLM(ctx)
      alt globalLLMSem enabled
        Pipeline->>GlobalLLMSemaphore: send struct{}
        GlobalLLMSemaphore-->>Pipeline: acquired
      else globalLLMSem disabled
        Pipeline-->>SummarizeStage: no-op
      end
      SummarizeStage->>SummarizeStage: summaryFor(ctx, section, childLines, profile)
      SummarizeStage-->>GlobalLLMSemaphore: release slot
    end
  and HyDE goroutine
    runParallelStages->>HyDEStage: generateCandidateQuestions(ctx, docID, profile)
    loop per leaf section
      HyDEStage->>Pipeline: acquireGlobalLLM(ctx)
      alt globalLLMSem enabled
        Pipeline->>GlobalLLMSemaphore: send struct{}
        GlobalLLMSemaphore-->>Pipeline: acquired
      else globalLLMSem disabled
        Pipeline-->>HyDEStage: no-op
      end
      HyDEStage->>HyDEStage: candidateQuestionsFor(ctx, section, profile)
      HyDEStage-->>GlobalLLMSemaphore: release slot
    end
  end
  runParallelStages-->>Pipeline: summarizeErr, hydeErr
  Pipeline->>Pipeline: log summarizeErr / hydeErr and SetDocumentStatus(StatusReady)
```

File-Level Changes

| Change | Details | Files |
| --- | --- | --- |
| Run summarize and HyDE ingest stages in parallel with preserved non-fatal stage semantics. | Refactored Pipeline.Run to construct summarize and HyDE stage functions and execute them via a new runParallelStages helper using goroutines and a WaitGroup. Ensured HyDE stage is conditionally invoked based on HyDEEnabled and that errors from each stage are captured and logged independently without failing the pipeline. Added timing log for combined summarize+HyDE elapsed time. | `pkg/ingest/ingest.go` |
| Introduce a global LLM concurrency cap shared across summarize and HyDE stages. | Extended Pipeline struct with GlobalLLMConcurrency and a backing globalLLMSem channel plus constructor defaulting/validation logic. Implemented acquireGlobalLLM helper that blocks on the shared semaphore (when enabled) and returns a release function while respecting context cancellation. Wrapped calls to summaryFor and candidateQuestionsFor with acquireGlobalLLM/release to enforce the shared cap in addition to per-stage semaphores. | `pkg/ingest/ingest.go` `pkg/ingest/hyde.go` |
| Expose configuration and CLI wiring for the global LLM concurrency cap. | Added GlobalLLMConcurrency field to IngestConfig with defaults, validation, and environment override (VLE_INGEST_GLOBAL_LLM_CONCURRENCY). Plumbed Ingest.GlobalLLMConcurrency into Pipeline construction in engine and server binaries. Documented the new global_llm_concurrency option in example config files for both engine and server. | `pkg/config/config.go` `cmd/engine/main.go` `cmd/server/main.go` `config.example.yaml` `config.server.example.yaml` |
| Update HyDE prompt construction to remove dependency on section summaries so HyDE can run fully concurrently with summarize. | Modified candidateQuestionsFor to omit s.Summary from the user prompt, relying on title and content only while preserving system prompt and behavior. Updated HyDE stage docstring to explicitly state it is safe to run concurrently with summarize because it no longer depends on summaries. | `pkg/ingest/hyde.go` |
| Add unit and integration tests to validate parallel stage behavior, global concurrency cap, and prompt regression guarantees. | Added runParallelStages unit tests covering interleaving, independent error reporting, and nil HyDE function behavior. Added tests for acquireGlobalLLM behavior with/without a cap and under context cancellation, plus a recording-LLM test proving the shared semaphore never exceeds the configured cap. Introduced a prompt regression test to ensure HyDE prompts never include section summaries. Added an end-to-end ingest Pipeline.Run integration test (gated on TEST_DATABASE_URL) that runs against a rust-ownership.md fixture and asserts document readiness, summaries for all sections, candidate_questions for leaves, and real interleaving between summarize and HyDE LLM calls. | `pkg/ingest/ingest_test.go` `pkg/ingest/integration_test.go` |

Tips and commands

Interacting with Sourcery

Trigger a new review: Comment @sourcery-ai review on the pull request.
Continue discussions: Reply directly to Sourcery's review comments.
Generate a GitHub issue from a review comment: Ask Sourcery to create an
issue from a review comment by replying to it. You can also reply to a
review comment with @sourcery-ai issue to create an issue from it.
Generate a pull request title: Write @sourcery-ai anywhere in the pull
request title to generate a title at any time. You can also comment
@sourcery-ai title on the pull request to (re-)generate the title at any time.
Generate a pull request summary: Write @sourcery-ai summary anywhere in
the pull request body to generate a PR summary at any time exactly where you
want it. You can also comment @sourcery-ai summary on the pull request to
(re-)generate the summary at any time.
Generate reviewer's guide: Comment @sourcery-ai guide on the pull
request to (re-)generate the reviewer's guide at any time.
Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
pull request to resolve all Sourcery comments. Useful if you've already
addressed all the comments and don't want to see them anymore.
Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
request to dismiss all existing Sourcery reviews. Especially useful if you
want to start fresh with a new review - don't forget to comment
@sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

Enable or disable review features such as the Sourcery-generated pull request
summary, the reviewer's guide, and others.
Change the review language.
Add, remove or edit custom review instructions.
Adjust other review settings.

Getting Help

Contact our support team for questions or feedback.
Visit our documentation for detailed guides and information.
Keep in touch with the Sourcery team by following us on X/Twitter, LinkedIn or GitHub.

coderabbitai · 2026-05-27T00:21:52Z

Warning

Review limit reached

@hallelx2, we couldn't start this review because you've reached your PR review rate limit.

sourcery-ai · 2026-05-27T00:23:26Z

Hey - I've found 2 issues, and left some high level feedback:

- runParallelStages writes summarizeErr/hydeErr from separate goroutines without synchronization, which is a data race; consider returning errors via channels or capturing them in local variables guarded by a mutex instead of assigning to outer variables from goroutines.
- The GlobalLLMConcurrency semantics are inconsistent: comments/config say 0 disables the global cap, but NewPipeline currently treats 0 as "set default 12" so there is no way to disable the cap; either adjust the constructor logic to leave 0 as 0 (and keep globalLLMSem nil) or update the comments/config description to match the actual behavior.

Prompt for AI Agents

Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="pkg/ingest/ingest.go" line_range="133-142" />
<code_context>
 	if p.HyDEConcurrency <= 0 {
 		p.HyDEConcurrency = 4
 	}
+	// Default the global cap to a value that comfortably exceeds the
+	// sum of the two default per-stage caps (4 + 4 = 8) while leaving
+	// some headroom — but stays well below typical provider per-tenant
+	// concurrency limits.
+	if p.GlobalLLMConcurrency < 0 {
+		p.GlobalLLMConcurrency = 0
+	}
+	if p.GlobalLLMConcurrency == 0 {
+		p.GlobalLLMConcurrency = 12
+	}
+	if p.GlobalLLMConcurrency > 0 {
+		p.globalLLMSem = make(chan struct{}, p.GlobalLLMConcurrency)
+	}
</code_context>
<issue_to_address>
**issue (bug_risk):** GlobalLLMConcurrency handling conflicts with the documented "0 disables the global cap" semantics

Current behavior treats `GlobalLLMConcurrency == 0` as "use default (12)" and always initializes `globalLLMSem` when the value is > 0, so there is no way to disable the global semaphore despite comments stating that `0` disables the cap.

To align with the documented semantics, you could either:
- Distinguish "unspecified" vs. "explicit zero" (e.g., pointer in config or a sentinel like `-1`), or
- Only apply the default of 12 when constructing a fresh `Pipeline` with the field at its zero value, and otherwise respect an explicit `0` as "disabled".

As written, the code in `Pipeline`, `IngestConfig`, and the example configs documents behavior that this initialization does not implement.
</issue_to_address>
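A sketch of the reviewer's first option (distinguish "unspecified" from "explicit zero" with a pointer), assuming negative values should also mean disabled. `resolveGlobalLLMConcurrency` and `newGlobalSem` are hypothetical helpers; with a `*int` config field, an absent YAML key typically leaves the pointer nil.

```go
package main

import "fmt"

const defaultGlobalLLMConcurrency = 12

// resolveGlobalLLMConcurrency maps a possibly-unset config value to the
// semaphore capacity: nil means "use the default", 0 or negative means
// "disabled", and any positive value is used as-is.
func resolveGlobalLLMConcurrency(v *int) int {
	if v == nil {
		return defaultGlobalLLMConcurrency
	}
	if *v <= 0 {
		return 0
	}
	return *v
}

// newGlobalSem returns nil (cap disabled) when capacity is 0.
func newGlobalSem(capacity int) chan struct{} {
	if capacity <= 0 {
		return nil
	}
	return make(chan struct{}, capacity)
}

func main() {
	zero, six := 0, 6
	fmt.Println(resolveGlobalLLMConcurrency(nil))   // unset
	fmt.Println(resolveGlobalLLMConcurrency(&zero)) // explicit 0
	fmt.Println(resolveGlobalLLMConcurrency(&six))  // explicit 6
	fmt.Println(newGlobalSem(0) == nil)
}
```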

### Comment 2
<location path="pkg/ingest/integration_test.go" line_range="173" />
<code_context>
+	}
+
+	hasChildren := map[tree.SectionID]bool{}
+	for _, s := range sections {
+		if s.ParentID != "" {
+			hasChildren[s.ParentID] = true
</code_context>
<issue_to_address>
**suggestion (testing):** Also assert that non-leaf sections do NOT get candidate_questions to fully exercise the HyDE targeting contract

The integration test confirms that all leaves get `candidate_questions`, but it should also assert the inverse: sections listed in `hasChildren` (internal nodes) have an empty `CandidateQuestions` slice. This will better enforce the contract that HyDE only targets leaves and help catch regressions where questions are written to internal sections.

Suggested implementation:

```golang
	var missingSummary, missingQuestions, unexpectedQuestions []tree.SectionID
	for _, s := range sections {
		if strings.TrimSpace(s.Summary) == "" {
			missingSummary = append(missingSummary, s.ID)
		}
		// HyDE only targets leaves (internal nodes are skipped on purpose).
		// Assert that all leaves have candidate_questions and all internal nodes do not.
		if !hasChildren[s.ID] {
			if len(s.CandidateQuestions) == 0 {
				missingQuestions = append(missingQuestions, s.ID)
			}
		} else if len(s.CandidateQuestions) > 0 {
			unexpectedQuestions = append(unexpectedQuestions, s.ID)
		}

```

To fully enforce the "HyDE only targets leaves" contract, add an assertion after the loop that `len(unexpectedQuestions) == 0`, similar to how `missingQuestions` is currently asserted. For example, fail the test with a helpful message if `unexpectedQuestions` is non-empty, indicating that internal sections (those present in `hasChildren`) incorrectly received `CandidateQuestions`.
</issue_to_address>
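The full leaf/internal check, including the follow-up assertion the reviewer asks for, can be sketched as a pure function that the test calls and then asserts on. `section` and `checkHyDETargets` are hypothetical stand-ins for the test's `tree` types; inside the real test the final report would be `t.Errorf` rather than `fmt.Printf`.

```go
package main

import (
	"fmt"
	"strings"
)

type SectionID string

// section is a hypothetical stand-in for the stored section row.
type section struct {
	ID                 SectionID
	ParentID           SectionID
	Summary            string
	CandidateQuestions []string
}

// checkHyDETargets returns the sections that violate each contract: every
// section needs a summary, leaves need candidate questions, and internal
// nodes must have none.
func checkHyDETargets(sections []section) (missingSummary, missingQuestions, unexpectedQuestions []SectionID) {
	hasChildren := map[SectionID]bool{}
	for _, s := range sections {
		if s.ParentID != "" {
			hasChildren[s.ParentID] = true
		}
	}
	for _, s := range sections {
		if strings.TrimSpace(s.Summary) == "" {
			missingSummary = append(missingSummary, s.ID)
		}
		if !hasChildren[s.ID] { // leaf: HyDE must have run
			if len(s.CandidateQuestions) == 0 {
				missingQuestions = append(missingQuestions, s.ID)
			}
		} else if len(s.CandidateQuestions) > 0 { // internal node: HyDE must skip
			unexpectedQuestions = append(unexpectedQuestions, s.ID)
		}
	}
	return missingSummary, missingQuestions, unexpectedQuestions
}

func main() {
	secs := []section{
		{ID: "root", Summary: "doc", CandidateQuestions: []string{"q?"}},
		{ID: "a", ParentID: "root", Summary: "leaf", CandidateQuestions: []string{"What is ownership?"}},
	}
	_, _, unexpected := checkHyDETargets(secs)
	if len(unexpected) > 0 {
		fmt.Printf("internal sections received candidate_questions: %v\n", unexpected)
	}
}
```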

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot AI review requested due to automatic review settings May 27, 2026 00:21

Copilot started reviewing on behalf of hallelx2 May 27, 2026 00:21

sourcery-ai Bot reviewed May 27, 2026

Copilot AI reviewed May 27, 2026

hallelx2 merged commit 1100390 into main May 27, 2026
6 of 9 checks passed

hallelx2 deleted the feat/hyde-parallel-summarize branch May 27, 2026 00:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(ingest): run HyDE in parallel with summarize#15

feat(ingest): run HyDE in parallel with summarize#15
hallelx2 merged 1 commit into
mainfrom
feat/hyde-parallel-summarize

hallelx2 commented May 27, 2026 •

edited by sourcery-ai Bot

Loading

Uh oh!

sourcery-ai Bot commented May 27, 2026 •

edited

Loading

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

coderabbitai Bot commented May 27, 2026

Review limit reached

Uh oh!

sourcery-ai Bot left a comment

Uh oh!

sourcery-ai Bot May 27, 2026

Uh oh!

sourcery-ai Bot May 27, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

hallelx2 commented May 27, 2026 • edited by sourcery-ai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Rationale (Option A over B)

Test plan

Summary by Sourcery

Uh oh!

sourcery-ai Bot commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviewer's Guide

Sequence diagram for parallel summarize and HyDE stages with global LLM concurrency

File-Level Changes

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

coderabbitai Bot commented May 27, 2026

Review limit reached

Uh oh!

sourcery-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

sourcery-ai Bot May 27, 2026

Choose a reason for hiding this comment

Uh oh!

sourcery-ai Bot May 27, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

hallelx2 commented May 27, 2026 •

edited by sourcery-ai Bot

Loading

sourcery-ai Bot commented May 27, 2026 •

edited

Loading