[MMAP] Add in-memory (mmap) stream index mapping. #310

Open
rfsaliev wants to merge 17 commits into main from rfsaliev/mmap-loading

Conversation

Member

@rfsaliev rfsaliev commented Apr 3, 2026

This pull request adds support to the C++ Runtime API for loading vector search indices (FlatIndex and VamanaIndex) from memory-mapped files and memory buffers, in addition to the existing stream-based loading. It introduces new API methods, updates the internal implementations to manage mapped resources, and adds tests that cover every supported storage type.

Major new features:

  • Added map_to_file and map_to_memory static methods to both the FlatIndex and VamanaIndex classes, allowing indices to be loaded directly from memory-mapped files or memory buffers, in addition to streams.
  • Implemented the corresponding logic in the source files, including resource management that keeps memory-mapped streams alive for as long as the index exists.
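
For readers unfamiliar with the mechanism, here is a minimal sketch of mapping a file read-only on a POSIX system. It is not taken from this PR: the `MappedFile` class and its members are hypothetical names, and using `mmap` directly is an assumption about how such a mapping might be obtained.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical RAII wrapper around a read-only POSIX file mapping.
// Not the PR's implementation; names are illustrative only.
class MappedFile {
  public:
    explicit MappedFile(const std::string& path) {
        const int fd = ::open(path.c_str(), O_RDONLY);
        if (fd < 0) {
            throw std::runtime_error("cannot open " + path);
        }
        struct stat st {};
        if (::fstat(fd, &st) != 0 || st.st_size <= 0) {
            ::close(fd);
            throw std::runtime_error("cannot stat or empty file: " + path);
        }
        const std::size_t size = static_cast<std::size_t>(st.st_size);
        void* p = ::mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
        // The mapping stays valid after the descriptor is closed.
        ::close(fd);
        if (p == MAP_FAILED) {
            throw std::runtime_error("mmap failed for " + path);
        }
        data_ = static_cast<const char*>(p);
        size_ = size;
    }

    // Unmap on destruction; any pointer into the mapping dangles afterwards.
    ~MappedFile() {
        if (data_ != nullptr) {
            ::munmap(const_cast<char*>(data_), size_);
        }
    }

    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    const char* data() const { return data_; }
    std::size_t size() const { return size_; }

    // Bounds-checked (in debug builds) access to one mapped byte.
    char at(std::size_t i) const {
        assert(i < size_);
        return data_[i];
    }

  private:
    const char* data_ = nullptr;
    std::size_t size_ = 0;
};
```

Because the bytes are never copied, anything that points into `data()` is only valid while the `MappedFile` object is alive, which is why the mapped resource has to outlive the index built on top of it.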

Internal implementation changes:

  • Added and updated constructors and helper methods in FlatIndexImpl and VamanaIndexImpl to support mapping from streams, and to manage the lifetime of mapped streams via unique_ptr members.
  • Included necessary headers for memory stream support in relevant files.
  • Updated dataset loading logic to enforce that view allocators are only used with memory streams, improving safety and correctness.
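
The lifetime rule described in the list above can be sketched as follows. This is an illustration only: `BackingStore` and `MappedIndexImpl` are hypothetical stand-ins (a `std::vector<char>` plays the role of the mapped stream), not the FlatIndexImpl or VamanaIndexImpl code from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a memory-mapped stream: it owns the bytes.
struct BackingStore {
    BackingStore(std::vector<char> b, bool* flag)
        : bytes(std::move(b)), released(flag) {}
    // Records destruction so the ordering can be observed from outside.
    ~BackingStore() {
        if (released != nullptr) {
            *released = true;
        }
    }
    std::vector<char> bytes;
    bool* released;
};

// Hypothetical index implementation holding a non-owning view of the bytes.
class MappedIndexImpl {
  public:
    explicit MappedIndexImpl(std::unique_ptr<BackingStore> store)
        : store_(std::move(store))
        , data_(store_->bytes.data())
        , size_(store_->bytes.size()) {}

    // Copying would duplicate the view without the owner, so forbid it.
    MappedIndexImpl(const MappedIndexImpl&) = delete;
    MappedIndexImpl& operator=(const MappedIndexImpl&) = delete;

    std::size_t size() const { return size_; }
    char at(std::size_t i) const {
        assert(i < size_);
        return data_[i];
    }

  private:
    // Declared first: members are initialized in declaration order and
    // destroyed in reverse, so the store outlives the view members below.
    std::unique_ptr<BackingStore> store_;
    const char* data_;
    std::size_t size_;
};
```

Tying the owner and the view into one object means a caller cannot release the mapping while the index still points into it; the bytes are freed only when the index itself is destroyed.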

Testing improvements:

  • Added a generic test helper (write_and_map_index) and test cases that verify correct loading and querying of indices from memory-mapped files for all supported storage kinds and index types.

@rfsaliev rfsaliev requested a review from razdoburdin April 3, 2026 13:25
@rfsaliev rfsaliev force-pushed the rfsaliev/mmap-loading branch 2 times, most recently from 1b35ca7 to 9e5588f Compare April 14, 2026 13:32
Comment on lines +98 to +113

// Load from a memory-mapped file.
// The file is expected to be in the format produced by save().
static Status map_to_file(
    VamanaIndex** index, const char* path, MetricType metric, StorageKind storage_kind
) noexcept;

// Load from a memory buffer.
// The buffer is expected to be in the format produced by save().
static Status map_to_memory(
    VamanaIndex** index,
    const void* data,
    size_t size,
    MetricType metric,
    StorageKind storage_kind
) noexcept;
Member Author

@alexanderguzhva, is this API suitable?

Base automatically changed from dev/razdoburdin_streaming to main April 16, 2026 09:55
…ests

- Introduced `memstream.h` and `memstream.cpp` for memory-mapped stream functionality.
- Updated `io.h` and `simple.h` to include memory stream support.
- Enhanced `simple.cpp` and `flat.cpp` tests to validate loading from memory streams.
@rfsaliev rfsaliev force-pushed the rfsaliev/mmap-loading branch from 9e5588f to a5ed391 Compare April 16, 2026 14:25
@rfsaliev rfsaliev marked this pull request as ready for review April 20, 2026 09:54
@rfsaliev rfsaliev requested a review from Copilot April 20, 2026 10:00
Contributor

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds mmap/in-memory stream support for loading indices and datasets (zero-copy “view” loading) across the core library and C++ runtime API, plus tests to validate behavior.

Changes:

  • Introduces mmstream/mmstreambuf, is_memory_stream, and pointer helpers (current_ptr/begin_ptr/end_ptr) plus a MemoryStreamAllocator to enable view-based, zero-copy loading.
  • Updates loading paths (e.g., SimpleData, SimpleGraph, Vamana graph selection) to support view allocators and to enforce memory-stream requirements.
  • Extends C++ runtime API with map_to_file / map_to_memory for Flat and Vamana indices and adds new coverage.
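
As a rough illustration of the zero-copy idea summarized above, the sketch below wraps caller-owned memory in a `std::streambuf` and hands out a pointer into it instead of copying the payload. `MemoryViewBuf`, `PayloadView`, `read_view`, and the one-byte length prefix are all hypothetical; they are not the `mmstreambuf`, `current_ptr`, or `is_memory_stream` definitions from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <stdexcept>
#include <streambuf>

// Hypothetical read-only stream buffer over memory owned by the caller.
class MemoryViewBuf : public std::streambuf {
  public:
    MemoryViewBuf(const char* data, std::size_t size) {
        // The streambuf interface is char*-based; nothing here writes
        // through the pointer, so the const_cast is only for the API.
        char* begin = const_cast<char*>(data);
        setg(begin, begin, begin + size);
    }

    // Address of the next unread byte inside the underlying memory.
    const char* current_ptr() const { return gptr(); }

    std::size_t remaining() const {
        return static_cast<std::size_t>(egptr() - gptr());
    }

    // Advance the read position by n bytes without copying anything.
    void skip(std::size_t n) {
        assert(n <= remaining());
        setg(eback(), gptr() + n, egptr());
    }
};

// Non-owning view of a payload inside the stream's memory.
struct PayloadView {
    const char* data;
    std::size_t size;
};

// Read a one-byte length prefix through the normal istream interface,
// then return a view of the payload that follows it (no copy).
inline PayloadView read_view(std::istream& is) {
    // View loading only makes sense on a memory-backed stream.
    auto* buf = dynamic_cast<MemoryViewBuf*>(is.rdbuf());
    if (buf == nullptr) {
        throw std::invalid_argument("view loading requires a memory stream");
    }
    const int n = is.get();
    if (n == std::istream::traits_type::eof() ||
        static_cast<std::size_t>(n) > buf->remaining()) {
        throw std::runtime_error("truncated buffer");
    }
    PayloadView view{buf->current_ptr(), static_cast<std::size_t>(n)};
    buf->skip(view.size);
    return view;
}
```

The `dynamic_cast` check plays the same role as the memory-stream enforcement mentioned in the summary: an ordinary file or string stream has no stable address to point into, so a view request on it is rejected rather than silently copied.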

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| include/svs/core/io/memstream.h | Adds mmapped streambuf/istream + memory-stream utilities and allocator for zero-copy loads |
| tests/svs/core/io/memstream.cpp | Adds unit tests for mmstream, pointer helpers, and memory-stream detection |
| include/svs/core/data/simple.h | Adds view-only load path from ContextFreeLoadTable + memory-stream enforcement |
| include/svs/core/data/io.h | Skips populate() for view allocators; enforces memory-stream constraint |
| include/svs/core/graph/graph.h | Allows forwarding allocator args during stream load to support view allocators |
| include/svs/orchestrators/vamana.h | Adjusts graph type selection for view-allocated data (mmapped/index views) |
| include/svs/lib/array.h | Generalizes “view allocator” DenseArray specialization to support MemoryStreamAllocator |
| include/svs/quantization/scalar/scalar.h | Fixes min/max init and adjusts SQDataset load APIs + adds mutable get_datum |
| bindings/cpp/include/svs/runtime/flat_index.h | Adds map_to_file / map_to_memory for FlatIndex |
| bindings/cpp/src/flat_index.cpp | Implements FlatIndex mapping APIs |
| bindings/cpp/src/flat_index_impl.h | Holds mapped stream lifetime and adds map_to_stream implementation |
| bindings/cpp/include/svs/runtime/vamana_index.h | Adds map_to_file / map_to_memory for VamanaIndex |
| bindings/cpp/src/vamana_index.cpp | Implements VamanaIndex mapping APIs |
| bindings/cpp/src/vamana_index_impl.h | Holds mapped stream lifetime and adds map_to_stream implementation |
| bindings/cpp/tests/runtime_test.cpp | Adds runtime tests for mapping APIs across storage kinds |
| tests/svs/core/data/simple.cpp | Adds tests for loading SimpleDataView from stringstream + error-path test |
| tests/svs/quantization/scalar/scalar.cpp | Adds SQDataset view-load test from stringstream |
| tests/svs/index/flat/flat.cpp | Adds FlatIndex view-load tests from stringstream and mmapped file |
| tests/svs/index/vamana/index.cpp | Adds Vamana view-load tests from stringstream/mmapped file + SQ view-load test |
| tests/CMakeLists.txt | Registers new memstream test source |

Comment thread include/svs/core/io/memstream.h Outdated
Comment thread include/svs/core/io/memstream.h Outdated
Comment thread include/svs/core/io/memstream.h Outdated
Comment thread include/svs/core/io/memstream.h
Comment thread include/svs/core/data/simple.h Outdated
Comment thread bindings/cpp/src/vamana_index.cpp
Comment thread bindings/cpp/tests/runtime_test.cpp Outdated
Comment thread tests/svs/index/flat/flat.cpp Outdated
@rfsaliev rfsaliev force-pushed the rfsaliev/mmap-loading branch from 748365f to f04b1c2 Compare April 22, 2026 12:43
@rfsaliev rfsaliev requested a review from Copilot April 22, 2026 12:43
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 8 comments.

Comment thread include/svs/core/io/memstream.h
Comment thread include/svs/core/io/memstream.h
Comment thread include/svs/core/io/memstream.h
Comment thread include/svs/core/io/memstream.h
Comment thread include/svs/core/io/memstream.h
Comment thread tests/svs/core/io/memstream.cpp
Comment thread tests/svs/core/io/memstream.cpp
Comment thread bindings/cpp/tests/runtime_test.cpp
rfsaliev and others added 2 commits April 22, 2026 17:24
Member

@ibhati ibhati left a comment

Looks great to me!

@ethanglaser
Member

Please address CI failure before merging

@ethanglaser
Member

ethanglaser commented Apr 23, 2026

And if this is expected to require updates from the private source to work correctly, then we'll need to update https://github.com/intel/ScalableVectorSearch/blob/main/bindings/cpp/CMakeLists.txt#L126 with the LTO shared lib build from the private PR (it can be published as a nightly here).

rfsaliev and others added 2 commits April 24, 2026 10:17
Co-authored-by: Copilot <copilot@github.com>
…ed allocator handling

Co-authored-by: Copilot <copilot@github.com>
rfsaliev and others added 2 commits April 28, 2026 11:19
…get_allocator to 'view' DenseArray

Co-authored-by: Copilot <copilot@github.com>
@rfsaliev rfsaliev requested a review from Copilot April 28, 2026 16:32
4 participants