Skip to content

feat(ingest): run HyDE in parallel with summarize #15

Merged: hallelx2 merged 1 commit into main from feat/hyde-parallel-summarize on May 27, 2026


hallelx2 (Owner) commented May 27, 2026

Summary

  • Summarize and HyDE now run as concurrent goroutines instead of
    sequential stages. On a 200-section 10-K this roughly halves total
    ingest wall time.
  • HyDE's user prompt no longer references s.Summary (which may not
    be persisted yet when HyDE runs in parallel). Title + first 4K of
    content carry strictly more signal anyway.
  • New ingest.global_llm_concurrency knob (default 12) caps total
    in-flight LLM calls across both stages so the provider's per-tenant
    rate limit isn't blown. Per-stage caps (summary_concurrency,
    ingest.hyde.concurrency) still apply.
  • Per-stage failure semantics unchanged: both stages remain
    non-fatal; p.fail is only called on parse / persist errors.

Rationale (Option A over B)

HyDE only needs Title + Content to produce useful questions —
the summary was a 60-word hint derived from the same content the
prompt already gets in full. Removing it lets the two stages run
fully concurrently with no per-section ordering. Per-section
pipelining (Option B) would deliver the same wall-time win at
significantly higher orchestration cost.

Test plan

  • go build ./... clean
  • go vet ./... clean
  • go test ./... — all tests pass
  • New TestRunParallelStagesInterleaves proves the HyDE goroutine
    can complete while summarize is still blocked
  • New TestGlobalLLMSemaphoreCapsInFlight proves the shared
    semaphore never lets peak in-flight exceed the configured cap
  • New TestHyDEPromptOmitsSummary regression guard against
    reintroducing the prompt dependency on s.Summary
  • TestPipelineRunParallelSummarizeAndHyDEIntegration (gated on
    TEST_DATABASE_URL) runs the full pipeline against the
    rust-ownership.md fixture and asserts (a) doc reaches
    ready, (b) every section has a summary, (c) every leaf has
    candidate_questions, (d) first HyDE call's timestamp precedes
    the last summarize call's
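The invariant behind TestGlobalLLMSemaphoreCapsInFlight (per-stage caps plus one shared cap, peak never above the shared cap) can be illustrated with channel semaphores. This standalone sketch is not the PR's test: `simulateStages` and its parameters are hypothetical, and a short sleep stands in for the LLM call.

```go
package main

import (
	"sync"
	"sync/atomic"
	"time"
)

// simulateStages runs `stages` stages of `calls` fake LLM calls each. Every
// call must hold a slot in its own stage's semaphore and in the shared global
// semaphore. It returns the highest number of calls ever in flight at once.
func simulateStages(stages, calls, stageCap, globalCap int) int64 {
	global := make(chan struct{}, globalCap)
	var inFlight, peak int64
	var wg sync.WaitGroup

	for s := 0; s < stages; s++ {
		stageSem := make(chan struct{}, stageCap)
		for c := 0; c < calls; c++ {
			wg.Add(1)
			go func() {
				defer wg.Done()
				stageSem <- struct{}{} // per-stage cap
				defer func() { <-stageSem }()
				global <- struct{}{} // shared cap across all stages
				defer func() { <-global }()

				n := atomic.AddInt64(&inFlight, 1)
				// Record a new peak with a CAS loop so concurrent updates are not lost.
				for {
					p := atomic.LoadInt64(&peak)
					if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
						break
					}
				}
				time.Sleep(2 * time.Millisecond) // stand-in for the LLM call
				atomic.AddInt64(&inFlight, -1)
			}()
		}
	}
	wg.Wait()
	return atomic.LoadInt64(&peak)
}
```

With two stages capped at 4 each, a global cap of 12 never binds (peak is at most 8), which matches the reasoning in the NewPipeline comment; a smaller global cap such as 3 is what the shared semaphore actually enforces.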

Summary by Sourcery

Run the summarize and HyDE ingest stages in parallel under a configurable global LLM concurrency cap, and update configuration, prompts, and tests to support and validate the new orchestration.

New Features:

  • Run summarize and HyDE ingest stages concurrently rather than sequentially to reduce ingest wall time
  • Introduce a global LLM concurrency cap shared across summarize and HyDE stages, configurable via YAML and environment variable

Enhancements:

  • Adjust HyDE prompt generation to rely on section title and content only, removing the dependency on stored summaries so it can safely run in parallel
  • Wire ingest configuration through engine and server binaries to pass the new global LLM concurrency limit into the pipeline

Documentation:

  • Document the new global LLM concurrency setting in example engine and server configuration files, including defaults and disabling behavior

Tests:

  • Add unit tests covering parallel stage execution semantics, independent error handling, and behavior when HyDE is disabled
  • Add tests validating the global LLM semaphore behavior, including respecting caps, cancellation, and no-cap paths
  • Add a HyDE prompt regression test ensuring summaries are not included when generating candidate questions
  • Add an end-to-end ingest integration test that verifies summaries and candidate questions are populated and stages actually interleave in production-like conditions

Summarize and HyDE now run as concurrent goroutines instead of
strictly sequential stages. HyDE's input is (title, content) — the
section summary was a weak hint and is now omitted from the prompt,
which removes the only ordering dependency between the two stages.
A new ingest.global_llm_concurrency knob (default 12) caps total
in-flight LLM calls across both stages so the provider's per-tenant
limit isn't blown.

Option A (fully concurrent stages) was chosen over per-section
pipelining because HyDE has no hard dependency on summary text:
title + the first 4K of content carry strictly more signal than a
60-word summary derived from that same content.

Test coverage:
 - runParallelStages: interleave proved by blocking summarize while
   HyDE completes
 - global semaphore: peak in-flight never exceeds the cap under load
 - cancellation: acquire returns ok=false on a canceled ctx
 - prompt regression guard: s.Summary text must not appear in the
   HyDE user prompt
 - integration: gated on TEST_DATABASE_URL, ingests the rust
   markdown fixture end-to-end, asserts every section has a summary
   and every leaf has candidate_questions, and verifies the first
   HyDE call's timestamp precedes the last summarize call's
Copilot AI review requested due to automatic review settings May 27, 2026 00:21
sourcery-ai (bot) commented May 27, 2026

Reviewer's Guide

Runs the summarize and HyDE ingest stages concurrently while introducing a shared global LLM concurrency cap, updating HyDE prompts to no longer depend on persisted summaries, wiring configuration/CLI, and adding unit/integration tests to validate concurrency and prompt behavior.

Sequence diagram for parallel summarize and HyDE stages with global LLM concurrency

```mermaid
sequenceDiagram
  title Parallel summarize and HyDE with shared global LLM cap
  participant Pipeline
  participant runParallelStages
  participant SummarizeStage as summarize
  participant HyDEStage as generateCandidateQuestions
  participant GlobalLLMSemaphore as globalLLMSem

  Pipeline->>runParallelStages: runParallelStages(ctx, summarizeFn, hydeFn)
  par summarize goroutine
    runParallelStages->>SummarizeStage: summarize(ctx, docID, profile)
    loop per section
      SummarizeStage->>Pipeline: acquireGlobalLLM(ctx)
      alt globalLLMSem enabled
        Pipeline->>GlobalLLMSemaphore: send struct{}
        GlobalLLMSemaphore-->>Pipeline: acquired
      else globalLLMSem disabled
        Pipeline-->>SummarizeStage: no-op
      end
      SummarizeStage->>SummarizeStage: summaryFor(ctx, section, childLines, profile)
      SummarizeStage-->>GlobalLLMSemaphore: release slot
    end
  and HyDE goroutine
    runParallelStages->>HyDEStage: generateCandidateQuestions(ctx, docID, profile)
    loop per leaf section
      HyDEStage->>Pipeline: acquireGlobalLLM(ctx)
      alt globalLLMSem enabled
        Pipeline->>GlobalLLMSemaphore: send struct{}
        GlobalLLMSemaphore-->>Pipeline: acquired
      else globalLLMSem disabled
        Pipeline-->>HyDEStage: no-op
      end
      HyDEStage->>HyDEStage: candidateQuestionsFor(ctx, section, profile)
      HyDEStage-->>GlobalLLMSemaphore: release slot
    end
  end
  runParallelStages-->>Pipeline: summarizeErr, hydeErr
  Pipeline->>Pipeline: log summarizeErr / hydeErr and SetDocumentStatus(StatusReady)
```

File-Level Changes

Run summarize and HyDE ingest stages in parallel with preserved non-fatal stage semantics.
  • Refactored Pipeline.Run to construct summarize and HyDE stage functions and execute them via a new runParallelStages helper using goroutines and a WaitGroup.
  • Ensured HyDE stage is conditionally invoked based on HyDEEnabled and that errors from each stage are captured and logged independently without failing the pipeline.
  • Added timing log for combined summarize+HyDE elapsed time.
  Files: pkg/ingest/ingest.go

Introduce a global LLM concurrency cap shared across summarize and HyDE stages.
  • Extended Pipeline struct with GlobalLLMConcurrency and a backing globalLLMSem channel plus constructor defaulting/validation logic.
  • Implemented acquireGlobalLLM helper that blocks on the shared semaphore (when enabled) and returns a release function while respecting context cancellation.
  • Wrapped calls to summaryFor and candidateQuestionsFor with acquireGlobalLLM/release to enforce the shared cap in addition to per-stage semaphores.
  Files: pkg/ingest/ingest.go, pkg/ingest/hyde.go

Expose configuration and CLI wiring for the global LLM concurrency cap.
  • Added GlobalLLMConcurrency field to IngestConfig with defaults, validation, and environment override (VLE_INGEST_GLOBAL_LLM_CONCURRENCY).
  • Plumbed Ingest.GlobalLLMConcurrency into Pipeline construction in engine and server binaries.
  • Documented the new global_llm_concurrency option in example config files for both engine and server.
  Files: pkg/config/config.go, cmd/engine/main.go, cmd/server/main.go, config.example.yaml, config.server.example.yaml

Update HyDE prompt construction to remove dependency on section summaries so HyDE can run fully concurrently with summarize.
  • Modified candidateQuestionsFor to omit s.Summary from the user prompt, relying on title and content only while preserving system prompt and behavior.
  • Updated HyDE stage docstring to explicitly state it is safe to run concurrently with summarize because it no longer depends on summaries.
  Files: pkg/ingest/hyde.go

Add unit and integration tests to validate parallel stage behavior, global concurrency cap, and prompt regression guarantees.
  • Added runParallelStages unit tests covering interleaving, independent error reporting, and nil HyDE function behavior.
  • Added tests for acquireGlobalLLM behavior with/without a cap and under context cancellation, plus a recording-LLM test proving the shared semaphore never exceeds the configured cap.
  • Introduced a prompt regression test to ensure HyDE prompts never include section summaries.
  • Added an end-to-end ingest Pipeline.Run integration test (gated on TEST_DATABASE_URL) that runs against a rust-ownership.md fixture and asserts document readiness, summaries for all sections, candidate_questions for leaves, and real interleaving between summarize and HyDE LLM calls.
  Files: pkg/ingest/ingest_test.go, pkg/ingest/integration_test.go
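The environment override listed above could look something like this. Only the variable name VLE_INGEST_GLOBAL_LLM_CONCURRENCY comes from the PR; the function names, the choice to ignore unparsable values, and injecting `lookup` (so the sketch is testable without touching the process environment) are assumptions.

```go
package main

import (
	"os"
	"strconv"
	"strings"
)

// envGlobalLLMConcurrency is the variable named in the PR's config changes.
const envGlobalLLMConcurrency = "VLE_INGEST_GLOBAL_LLM_CONCURRENCY"

// applyGlobalLLMEnvOverride returns the YAML value unless the environment
// variable is set to a valid integer, in which case the env value wins.
// lookup has the same signature as os.LookupEnv so tests can stub it.
func applyGlobalLLMEnvOverride(yamlValue int, lookup func(string) (string, bool)) int {
	raw, ok := lookup(envGlobalLLMConcurrency)
	if !ok {
		return yamlValue
	}
	n, err := strconv.Atoi(strings.TrimSpace(raw))
	if err != nil {
		return yamlValue // assumption: ignore malformed values rather than fail
	}
	return n
}

// fromProcessEnv applies the override using the real process environment.
func fromProcessEnv(yamlValue int) int {
	return applyGlobalLLMEnvOverride(yamlValue, os.LookupEnv)
}
```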


coderabbitai (bot) commented May 27, 2026

Warning

Review limit reached

@hallelx2, we couldn't start this review because you've reached your PR review rate limit.

Reviewing files that changed from the base of the PR and between 5b31b9d and 08daf81.

📒 Files selected for processing (9)
  • cmd/engine/main.go
  • cmd/server/main.go
  • config.example.yaml
  • config.server.example.yaml
  • pkg/config/config.go
  • pkg/ingest/hyde.go
  • pkg/ingest/ingest.go
  • pkg/ingest/ingest_test.go
  • pkg/ingest/integration_test.go

sourcery-ai (bot) left a comment

Hey - I've found 2 issues, and left some high level feedback:

  • runParallelStages writes summarizeErr/hydeErr from separate goroutines without synchronization, which is a data race; consider returning errors via channels or capturing them in local variables guarded by a mutex instead of assigning to outer variables from goroutines.
  • The GlobalLLMConcurrency semantics are inconsistent: comments/config say 0 disables the global cap, but NewPipeline currently treats 0 as "set default 12" so there is no way to disable the cap; either adjust the constructor logic to leave 0 as 0 (and keep globalLLMSem nil) or update the comments/config description to match the actual behavior.


Comment thread pkg/ingest/ingest.go, on lines +133 to +142:

```golang
	// Default the global cap to a value that comfortably exceeds the
	// sum of the two default per-stage caps (4 + 4 = 8) while leaving
	// some headroom — but stays well below typical provider per-tenant
	// concurrency limits.
	if p.GlobalLLMConcurrency < 0 {
		p.GlobalLLMConcurrency = 0
	}
	if p.GlobalLLMConcurrency == 0 {
		p.GlobalLLMConcurrency = 12
	}
```

issue (bug_risk): GlobalLLMConcurrency handling conflicts with the documented "0 disables the global cap" semantics

Current behavior treats GlobalLLMConcurrency == 0 as "use default (12)" and always initializes globalLLMSem when the value is > 0, so there is no way to disable the global semaphore despite comments stating that 0 disables the cap.

To align with the documented semantics, you could either:

  • Distinguish "unspecified" vs. "explicit zero" (e.g., pointer in config or a sentinel like -1), or
  • Only apply the default of 12 when constructing a fresh Pipeline with the field at its zero value, and otherwise respect an explicit 0 as "disabled".

As written, the code in Pipeline, IngestConfig, and the example configs documents behavior that this initialization does not implement.
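One way to implement the first option the reviewer lists is a pointer field, so an omitted key and an explicit `0` stay distinguishable. A sketch under that assumption (it relies on the YAML decoder leaving an absent key's pointer nil); `ingestConfig`, `resolveGlobalLLMConcurrency`, `newGlobalSem`, and the default constant are hypothetical and not the merged code.

```go
package main

// defaultGlobalLLMConcurrency mirrors the PR's documented default of 12.
const defaultGlobalLLMConcurrency = 12

// ingestConfig is a hypothetical config struct. A nil pointer means the key
// was not provided, which is distinct from an explicit 0.
type ingestConfig struct {
	GlobalLLMConcurrency *int `yaml:"global_llm_concurrency"`
}

// resolveGlobalLLMConcurrency returns the effective cap: unset -> default,
// 0 or negative -> 0 (global cap disabled), positive -> as given.
func resolveGlobalLLMConcurrency(c ingestConfig) int {
	if c.GlobalLLMConcurrency == nil {
		return defaultGlobalLLMConcurrency
	}
	if v := *c.GlobalLLMConcurrency; v > 0 {
		return v
	}
	return 0
}

// newGlobalSem allocates the semaphore only when the cap is enabled, so a
// disabled cap leaves the channel nil (and acquisition becomes a no-op).
func newGlobalSem(limit int) chan struct{} {
	if limit <= 0 {
		return nil
	}
	return make(chan struct{}, limit)
}
```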

Comment thread pkg/ingest/integration_test.go, on line 173:

```golang
	}

	hasChildren := map[tree.SectionID]bool{}
	for _, s := range sections {
```

suggestion (testing): Also assert that non-leaf sections do NOT get candidate_questions to fully exercise the HyDE targeting contract

The integration test confirms that all leaves get candidate_questions, but it should also assert the inverse: sections listed in hasChildren (internal nodes) have an empty CandidateQuestions slice. This will better enforce the contract that HyDE only targets leaves and help catch regressions where questions are written to internal sections.

Suggested implementation:

```golang
	var missingSummary, missingQuestions, unexpectedQuestions []tree.SectionID
	for _, s := range sections {
		if strings.TrimSpace(s.Summary) == "" {
			missingSummary = append(missingSummary, s.ID)
		}
		// HyDE only targets leaves (internal nodes are skipped on purpose).
		// Assert that all leaves have candidate_questions and all internal nodes do not.
		if !hasChildren[s.ID] {
			if len(s.CandidateQuestions) == 0 {
				missingQuestions = append(missingQuestions, s.ID)
			}
		} else if len(s.CandidateQuestions) > 0 {
			unexpectedQuestions = append(unexpectedQuestions, s.ID)
		}
```

To fully enforce the "HyDE only targets leaves" contract, add an assertion after the loop that len(unexpectedQuestions) == 0, similar to how missingQuestions is currently asserted. For example, fail the test with a helpful message if unexpectedQuestions is non-empty, indicating that internal sections (those present in hasChildren) incorrectly received CandidateQuestions.
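The leaf/internal contract the reviewer describes can be written as a pure function over the section list, which keeps the bookkeeping testable without a database. A sketch assuming a simplified `section` type with string IDs (the real test uses `tree.SectionID` and sections loaded from storage); `checkHyDETargets` is hypothetical.

```go
package main

import "strings"

// section is a simplified stand-in for the stored section rows.
type section struct {
	ID, ParentID       string
	Summary            string
	CandidateQuestions []string
}

// checkHyDETargets returns sections missing a summary, leaves missing
// candidate questions, and internal nodes that unexpectedly have them.
func checkHyDETargets(sections []section) (missingSummary, missingQuestions, unexpectedQuestions []string) {
	hasChildren := map[string]bool{}
	for _, s := range sections {
		if s.ParentID != "" {
			hasChildren[s.ParentID] = true
		}
	}
	for _, s := range sections {
		if strings.TrimSpace(s.Summary) == "" {
			missingSummary = append(missingSummary, s.ID)
		}
		if !hasChildren[s.ID] {
			if len(s.CandidateQuestions) == 0 {
				missingQuestions = append(missingQuestions, s.ID)
			}
		} else if len(s.CandidateQuestions) > 0 {
			unexpectedQuestions = append(unexpectedQuestions, s.ID)
		}
	}
	return missingSummary, missingQuestions, unexpectedQuestions
}
```

The integration test would then fail with a descriptive message whenever any of the three returned slices is non-empty.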

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

hallelx2 merged commit 1100390 into main May 27, 2026
6 of 9 checks passed
hallelx2 deleted the feat/hyde-parallel-summarize branch May 27, 2026 00:30