Add CompactionProgressListener to OnDiskGraphIndexCompactor #7
Merged
Introduce a dedicated CompactionProgressListener functional interface and a compact(Path, CompactionProgressListener) overload so callers can track streaming compaction I/O progress without parsing log messages. The listener is called every ten batches (and at level completion) with (completedBatches, totalBatches) for each graph level processed. The existing compact(Path) method delegates to the new overload with a null listener, preserving full backward compatibility. Motivated by HerdDB issue datastax#530: the index-optimizer's GET /status endpoint showed batches_written=0 / pct_complete=0 throughout multi-minute streaming compaction runs because there was no hook to propagate the per-batch progress counters back to the HTTP status object.
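A minimal sketch of the overload pattern described above, assuming only what the PR text states: the listener has a single `onProgress(long completedBatches, long totalBatches)` method, the old `compact(Path)` delegates with `null`, and notifications fire every ten batches plus at the last batch of a level. `CompactorSketch`, `batchesPerLevel` and the empty batch loop are hypothetical stand-ins for a single graph level, not jvector's actual `OnDiskGraphIndexCompactor` internals.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class CompactorSketch {

    // Mirrors the listener contract described in the PR; declared locally so the
    // sketch compiles without jvector on the classpath.
    @FunctionalInterface
    public interface CompactionProgressListener {
        void onProgress(long completedBatches, long totalBatches);
    }

    // Hypothetical: a real compactor derives batch counts from each graph level.
    private final long batchesPerLevel;

    public CompactorSketch(long batchesPerLevel) {
        this.batchesPerLevel = batchesPerLevel;
    }

    // Legacy entry point: unchanged signature, delegates with a null listener.
    public void compact(Path output) {
        compact(output, null);
    }

    public void compact(Path output, CompactionProgressListener listener) {
        long total = batchesPerLevel;
        for (long completed = 1; completed <= total; completed++) {
            // A real implementation would write batch `completed` to `output` here.
            boolean tenthBatch = completed % 10 == 0;
            boolean lastBatch = completed == total;
            // Null check keeps the legacy compact(Path) path callback-free.
            if (listener != null && (tenthBatch || lastBatch)) {
                listener.onProgress(completed, total);
            }
        }
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        CompactorSketch compactor = new CompactorSketch(25);
        compactor.compact(Paths.get("graph.compacted"));  // no listener, no callbacks
        compactor.compact(Paths.get("graph.compacted"),
                (completed, total) -> events.add(completed + "/" + total));
        System.out.println(events);  // prints [10/25, 20/25, 25/25]
    }
}
```

Testing `completed == total` separately from `completed % 10 == 0` is what guarantees a 100 % notification when the batch count is not a multiple of ten; because the two conditions are OR-ed, a 30-batch level still receives exactly one `30/30` call.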
eolivelli added a commit to eolivelli/herddb that referenced this pull request May 12, 2026
…progress callback (#531)

Fixes #530.

## Root cause

`GET /status` on the index-optimizer always returned `batches_written: 0`, `batches_total: 0`, `pct_complete: 0.00` throughout a streaming compaction because `RemoteSegmentGraphMerger.mergeStreaming()` never forwarded the `batchListener` field to `OnDiskGraphIndexCompactor`. The legacy in-memory path (`mergeLegacy`) already wired the callback to `buildGraph()`; the streaming path had no equivalent plumbing.

## Changes

- **`RemoteSegmentGraphMerger.java`**
  - On entering the `"compacting"` phase, fires an initial `(0, keptCount)` notification so `/status` immediately shows a non-zero denominator, not only after the first batch completes.
  - Creates a `CompactionProgressListener` that delegates to the `batchListener` and passes it to the new `OnDiskGraphIndexCompactor.compact(Path, CompactionProgressListener)` overload (added in `eolivelli/jvector#7`, now on `eolivelli/jvector` main).
  - Introduces a `fireBatchProgress(LongBinaryOperator, long, long)` static helper annotated with `@SuppressFBWarnings` to bridge the JDK's value-returning `LongBinaryOperator` and the typed `void onProgress()` contract without SpotBugs false positives.

## Tests

- **`StreamingCompactionBatchProgressTest`** (new, plain test; no cluster infra required): builds 3 real on-disk segments via `PersistentVectorStore`, enables `VectorIndexCompactor.streamingCompactionEnabled`, runs a full streaming merge, and asserts:
  1. An initial `(0, keptCount=600)` notification fires immediately when the `"compacting"` phase begins (the denominator is non-zero from the first instant).
  2. At least one `(completed ≥ 10, total > 0)` notification arrives from `OnDiskGraphIndexCompactor` (proving the jvector callback actually fires).
  3. A final `(completed == total)` notification marks level completion (the 100 % signal).
  4. `completed` values are monotonically non-decreasing within a level.

🤖 Implemented by the `pr-worker` agent.
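The HerdDB-side wiring described in the commit message can be sketched as follows, assuming `batchListener` is a `java.util.function.LongBinaryOperator` that receives `(completed, total)` and whose return value nobody reads. Only `fireBatchProgress(LongBinaryOperator, long, long)` and the initial `(0, keptCount)` notification come from the commit message; `ProgressBridgeSketch`, `StatusSketch`, `onCompactingPhase` and the `%.2f` rendering of `pct_complete` are hypothetical, and the `@SuppressFBWarnings` annotation is omitted so the sketch needs no SpotBugs dependency.

```java
import java.util.Locale;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongBinaryOperator;

public class ProgressBridgeSketch {

    // Local copy of the listener contract described in the jvector PR.
    @FunctionalInterface
    public interface CompactionProgressListener {
        void onProgress(long completedBatches, long totalBatches);
    }

    // Hypothetical status holder backing GET /status.
    public static final class StatusSketch {
        private final AtomicLong batchesWritten = new AtomicLong();
        private final AtomicLong batchesTotal = new AtomicLong();

        // Shaped like LongBinaryOperator: records the counters and returns a
        // value that the caller ignores.
        public long record(long completed, long total) {
            batchesTotal.set(total);
            batchesWritten.set(completed);
            return completed;
        }

        public long getBatchesWritten() { return batchesWritten.get(); }

        public long getBatchesTotal() { return batchesTotal.get(); }

        // Guards against a zero denominator before any notification arrives.
        public String pctComplete() {
            long total = batchesTotal.get();
            double pct = total == 0 ? 0.0 : 100.0 * batchesWritten.get() / total;
            return String.format(Locale.ROOT, "%.2f", pct);
        }
    }

    // Calls the operator purely for its side effect; the long result is discarded,
    // which is what a return-value-ignored static analysis check would flag.
    public static void fireBatchProgress(LongBinaryOperator op, long completed, long total) {
        if (op != null) {
            op.applyAsLong(completed, total);
        }
    }

    // Entering the "compacting" phase: publish the denominator before any batch
    // finishes, then return a listener that forwards every compactor update.
    public static CompactionProgressListener onCompactingPhase(LongBinaryOperator batchListener, long keptCount) {
        fireBatchProgress(batchListener, 0, keptCount);
        return (completed, total) -> fireBatchProgress(batchListener, completed, total);
    }

    public static void main(String[] args) {
        StatusSketch status = new StatusSketch();
        CompactionProgressListener listener = onCompactingPhase(status::record, 600);
        System.out.println(status.getBatchesTotal() + " " + status.pctComplete()); // 600 0.00
        listener.onProgress(150, 600);
        System.out.println(status.pctComplete()); // 25.00
    }
}
```

Routing every call through one static helper keeps the discarded `applyAsLong` result in a single place, which is presumably why the commit attaches the `@SuppressFBWarnings` annotation to that helper rather than to each call site.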
Summary
- `CompactionProgressListener`: a dedicated `@FunctionalInterface` in the `io.github.jbellis.jvector.graph.disk` package whose single method `onProgress(long completedBatches, long totalBatches)` is called every ten batches (and at the final batch of each level) during streaming compaction.
- A `compact(Path, CompactionProgressListener)` overload on `OnDiskGraphIndexCompactor`; the existing `compact(Path)` delegates with `null`, so the change is fully backward-compatible.
- The listener is invoked from `compactLevels()` → `runBatchesWithBackpressure()`, alongside the existing `% 10` SLF4J log line.
- The listener also fires when `completed == total`, so callers always see a 100 % notification at level completion (previously the final batch was only logged if `total % 10 == 0`).

Motivation
HerdDB issue eolivelli/herddb#530: the index-optimizer's `GET /status` endpoint returned `batches_written: 0 / pct_complete: 0.00` throughout multi-minute streaming compaction runs because there was no callback to propagate the per-batch counters back to the HTTP status object. Parsing the SLF4J log messages from the caller side would have been fragile; a typed interface is the right abstraction.

Test plan
- … (`mvn test`)
- `StreamingCompactionBatchProgressTest` (in PR herddb#530-fix) verifies that `onProgress` is called with increasing `(completed, total)` during a real 3-segment streaming merge

🤖 Implemented by the `pr-worker` agent.