Eliminate float[] and ByteBuffer allocations in compaction inline-record path #16
Merged
…ord path

Allocation profiling of a HerdDB indexing run showed CompactWriter.writeInlineNodeRecord accounting for 24% of alloc-event bytes: a per-neighbor float[] inside ProductQuantization.encodeTo (21%) and a per-record ByteBuffer.allocate via ByteBufferIndexWriter.cloneBuffer (2.4%). This change removes both:

- Add ProductQuantization.encodeTo(vector, scratch, dest) that uses VectorUtil.subInto into a caller-provided buffer when a global centroid is configured. The existing 2-arg overload keeps its allocating behavior. The compactor passes Scratch.tmpVec (per-worker, dimension sized, already dead after retainDiverse returns) as the scratch.
- Plumb the level-0 FileChannel into CompactWriter via setInlineChannel and write the per-thread record buffer directly to disk from the worker. Drops ByteBufferIndexWriter.cloneBuffer and the WriteResult.data field; the consumer in compactLevels is now a no-op since records hit disk inside the worker. FileChannel.write(ByteBuffer, long) is positional and thread-safe.

275 jvector-tests pass; the level-0 byte layout is unchanged so existing compacted indexes remain readable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Profiling a HerdDB indexing workload showed `CompactWriter.writeInlineNodeRecord` accounting for 24% of alloc-event bytes:

- `writeInlineNodeRecord:222`: `pq.encodeTo` → `VectorUtil.sub(vector, globalCentroid)` → fresh `ArrayVectorFloat` (`float[]`) per neighbor
- `writeInlineNodeRecord:248`: `bwriter.cloneBuffer()` → `ByteBuffer.allocate(recordSize)` per record

`ArrayVectorFloat.<init>` was the single largest allocator in the whole profile (20.6% self) and fully attributable to the PQ encode call inside compaction.

This PR removes both allocations on this hot path.
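To illustrate where the per-neighbor allocation comes from, here is a minimal sketch of the centering step using plain `float[]` arrays. It is not jvector's code: the class `CenteringSketch` and both method bodies are hypothetical stand-ins that only mirror the shape of an allocating `sub` versus a `subInto` that writes into a caller-supplied buffer.

```java
import java.util.Arrays;

// Hypothetical stand-in for the centering step; not jvector's VectorUtil.
final class CenteringSketch {
    // Allocating variant: every call creates a new result array.
    static float[] sub(float[] v, float[] centroid) {
        float[] out = new float[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] - centroid[i];
        }
        return out;
    }

    // Non-allocating variant: the caller owns and reuses the scratch array.
    static void subInto(float[] v, float[] centroid, float[] scratch) {
        if (scratch.length < v.length) {
            throw new IllegalArgumentException("scratch is smaller than the vector");
        }
        for (int i = 0; i < v.length; i++) {
            scratch[i] = v[i] - centroid[i];
        }
    }

    public static void main(String[] args) {
        float[] centroid = {1f, 2f, 3f};
        float[][] neighbors = {{2f, 4f, 6f}, {1f, 2f, 3f}};
        float[] scratch = new float[centroid.length]; // allocated once per worker
        for (float[] n : neighbors) {
            subInto(n, centroid, scratch); // no allocation inside the loop
            System.out.println(Arrays.toString(scratch));
        }
    }
}
```

The scratch array is overwritten on each call, so this pattern is only safe when the centered vector is fully consumed before the next neighbor is processed and the buffer is not shared across threads.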
Changes
- `ProductQuantization.encodeTo(vector, scratch, dest)`: new 3-arg overload that, when a global centroid is configured, uses `VectorUtil.subInto` to write the centered vector into a caller-supplied scratch buffer instead of allocating a fresh one. The existing 2-arg overload keeps its allocating behavior (it's the `VectorCompressor.encodeTo` implementation; all other callers stay unchanged).
- `OnDiskGraphIndexCompactor.processBaseNode`: passes `scratch.tmpVec` as the scratch buffer. `tmpVec` is per-worker, dimension-sized, and already dead by the time `retainDiverse` returns, so no extra allocation is needed.
- `CompactWriter.setInlineChannel(FileChannel)` + direct positional writes: the level-0 `FileChannel` is plumbed into `CompactWriter`, and `writeInlineNodeRecord` writes the per-thread record buffer directly to disk via `FileChannel.write(ByteBuffer, long)`. This positional API is thread-safe and lets us drop `ByteBufferIndexWriter.cloneBuffer` and the `WriteResult.data` field; the level-0 consumer in `compactLevels` is now a no-op since records hit disk inside the worker.

The on-disk byte layout at level 0 is unchanged, so existing compacted indexes remain readable.
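For readers unfamiliar with positional writes, the following is a rough sketch of the pattern using only the standard `java.nio` API. It is not the PR's code: `PositionalWriteSketch`, `RECORD_SIZE`, the 8-byte record format, and the striping of ordinals across workers are all assumptions made for illustration. Each worker reuses one buffer and writes its records at offsets computed from the record ordinal, so no buffer copy is handed to a separate writer thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration of per-worker positional writes.
final class PositionalWriteSketch {
    static final int RECORD_SIZE = Long.BYTES; // assumed fixed record size

    // Fill the reusable buffer and write it at the record's own offset.
    // write(ByteBuffer, long) does not touch the channel's current position.
    static void writeRecord(FileChannel ch, ByteBuffer buf, int ordinal, long value)
            throws IOException {
        buf.clear();
        buf.putLong(value);
        buf.flip();
        long pos = (long) ordinal * RECORD_SIZE;
        while (buf.hasRemaining()) {
            pos += ch.write(buf, pos); // loop in case of a partial write
        }
    }

    // Write `records` values from `workers` threads, then read them back in order.
    static long[] writeThenRead(int records, int workers) {
        try {
            Path file = Files.createTempFile("inline-records", ".bin");
            try (FileChannel ch = FileChannel.open(file,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                Thread[] threads = new Thread[workers];
                for (int w = 0; w < workers; w++) {
                    final int worker = w;
                    threads[w] = new Thread(() -> {
                        ByteBuffer buf = ByteBuffer.allocate(RECORD_SIZE); // one per worker
                        try {
                            for (int i = worker; i < records; i += workers) {
                                writeRecord(ch, buf, i, i * 10L);
                            }
                        } catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                    });
                    threads[w].start();
                }
                for (Thread t : threads) {
                    t.join();
                }
                ByteBuffer all = ByteBuffer.allocate(records * RECORD_SIZE);
                long pos = 0;
                while (all.hasRemaining()) {
                    int n = ch.read(all, pos);
                    if (n < 0) break; // end of file
                    pos += n;
                }
                all.flip();
                long[] out = new long[records];
                for (int i = 0; i < records; i++) {
                    out[i] = all.getLong();
                }
                return out;
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        long[] values = writeThenRead(16, 4);
        System.out.println(java.util.Arrays.toString(values));
    }
}
```

Because each record lands at an offset derived from its ordinal, the resulting file does not depend on the order in which workers finish, which is consistent with the byte layout staying unchanged.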
Test plan
- `mvn -pl jvector-tests -am test`: 275 tests pass, 0 failures / 0 errors (2 pre-existing skipped).
- `mvn -pl jvector-tests -am -Dtest='io.github.jbellis.jvector.graph.disk.*Test*,TestProductQuantization,TestPQRetrainer*' test`: focused on the touched code paths (42 tests pass).
- `ArrayVectorFloat.<init>` drops to near zero in the compactor stack and `HeapByteBuffer.<init>` drops the 2.3% attributable to `cloneBuffer`.

Generated with Claude Code