ntvecdb is a minimal, embedded vector database designed to demonstrate how modern vector databases work under the hood. It supports storing high-dimensional embeddings, approximate nearest-neighbor search using HNSW, persistence to disk, concurrent access, and batch queries.
This project was built as part of a systems-focused exercise. The emphasis is on algorithmic understanding, architecture, and explicit trade-offs, rather than production completeness.
- HNSW index implemented from scratch for scalable ANN search
- Exact (brute-force) search for correctness validation
- Thread-safe engine
- Multiple concurrent readers
- Serialized writers
- Batch search API for bulk query workloads
- Explicit persistence (save/load index to disk)
- Embedded, in-process design (DuckDB/SQLite-style)
- Minimal CLI for demonstration
- No external dependencies
- C++17 compatible compiler (clang or gcc)
- CMake ≥ 3.10
git clone <your-repo-url>
cd ntvecdb
mkdir build && cd build
cmake ..
make
./correctness # HNSW vs brute-force comparison + persistence
./engine_sanity # VecDBEngine API validation
./concurrency_test # Multi-threaded stress test
./benchmark # Performance comparison
./vecdb # Shows usage

ntvecdb is designed as an embedded, in-process vector database, similar in spirit to DuckDB.
Application / CLI → VecDBEngine → HNSWIndex → In-memory graph
- The engine owns the index and enforces all concurrency guarantees.
- The index is kept entirely in memory for fast access.
- Persistence is explicit and opt-in via save/load operations.
- There is no server or IPC layer; concurrency is handled within a single process.
This design keeps the system simple, testable, and focused on the core problem: efficient similarity search.
The HNSW (Hierarchical Navigable Small World) index is a multi-layer graph structure that enables sub-linear approximate nearest-neighbor search.
High-level idea:
- Vectors are stored as nodes in a layered graph
- Upper layers provide long-range navigation
- Lower layers refine local neighborhoods
- Search starts from an entry point at the top layer and greedily descends
This implementation focuses on clarity and correctness rather than aggressive optimization, closely following the ideas from the original HNSW paper.
Reference: Yu. A. Malkov and D. A. Yashunin, "Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018. [arXiv:1603.09320]
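To make the descent concrete, here is a minimal, self-contained sketch of the layered greedy walk. The `Node` layout and function names are assumptions for illustration, not the project's code, and a full HNSW search additionally keeps an efSearch-sized candidate list at the bottom layer instead of a single best node.

```cpp
// Simplified HNSW-style layered greedy descent (illustrative only; the Node
// layout and function names are assumptions, not the project's code).
#include <cstddef>
#include <vector>

struct Node {
    std::vector<float> vec;                   // stored embedding
    std::vector<std::vector<int>> neighbors;  // neighbors[layer] -> node ids
};

// Squared Euclidean distance between two equal-length vectors.
static float dist2(const std::vector<float>& a, const std::vector<float>& b) {
    float d = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float t = a[i] - b[i];
        d += t * t;
    }
    return d;
}

// Start at the entry point on the top layer, greedily move to the closest
// neighbor until no improvement is possible, then drop one layer and repeat.
int greedy_descend(const std::vector<Node>& nodes,
                   const std::vector<float>& query,
                   int entry_point, int top_layer) {
    int current = entry_point;
    float best = dist2(nodes[current].vec, query);
    for (int layer = top_layer; layer >= 0; --layer) {
        bool improved = true;
        while (improved) {
            improved = false;
            if (layer >= static_cast<int>(nodes[current].neighbors.size())) break;
            for (int nb : nodes[current].neighbors[layer]) {
                const float d = dist2(nodes[nb].vec, query);
                if (d < best) {
                    best = d;
                    current = nb;
                    improved = true;
                }
            }
        }
    }
    return current;  // closest node reached by the greedy walk at layer 0
}
```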
- Each vector has:
  - A unique integer ID
  - A fixed-dimension float embedding
- All embeddings in an index share the same dimensionality
- (Future work) Metadata can be attached to vectors for filtered search
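For illustration, a stored record could be as simple as the following struct; the names are assumptions, not taken from the ntvecdb source.

```cpp
// Plausible shape of a stored record (names are illustrative, not taken from
// the ntvecdb source).
#include <cstdint>
#include <vector>

struct VectorRecord {
    std::uint64_t id;           // unique integer ID
    std::vector<float> values;  // fixed-dimension float embedding
};
```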
| Method | Description |
|---|---|
| `create_index(dim, M, efConstruction)` | Create a new HNSW index |
| `insert(id, embedding)` | Insert a vector |
| `search(query, k, efSearch)` | Single query search |
| `batch_search(queries, k, efSearch)` | Bulk query search |
| `save_db(db_name)` | Save index to database root |
| `load_db(db_name)` | Load index from database root |
| `save_index_path(path)` | Save index to absolute path |
| `load_index_path(path)` | Load index from absolute path |
| `set_db_root(root)` | Set database root directory |
| `has_index()` | Check if index is loaded |
batch_search executes multiple queries under a single shared lock, amortizing synchronization overhead. This is useful for bulk query workloads.
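The sketch below shows how the engine might be driven end to end, based only on the method names and parameters listed above; the header name, argument types, and return types are assumptions, not verified against the source.

```cpp
// Usage sketch based only on the method table above. The header name,
// argument types, and return types are assumptions.
#include <vector>
#include "vecdb_engine.h"  // assumed header name

int main() {
    VecDBEngine engine;
    engine.set_db_root("./data");

    // dim = 4, M = 16, efConstruction = 200
    engine.create_index(4, 16, 200);

    engine.insert(1, std::vector<float>{0.1f, 0.2f, 0.3f, 0.4f});
    engine.insert(2, std::vector<float>{0.9f, 0.8f, 0.7f, 0.6f});

    // Single query: top-2 neighbors with efSearch = 50
    auto hits = engine.search(std::vector<float>{0.1f, 0.2f, 0.3f, 0.4f}, 2, 50);

    // Bulk queries executed under one shared lock
    std::vector<std::vector<float>> queries = {
        {0.1f, 0.2f, 0.3f, 0.4f},
        {0.9f, 0.8f, 0.7f, 0.6f},
    };
    auto batch_hits = engine.batch_search(queries, 2, 50);

    engine.save_db("wiki");  // persistence is explicit
    return 0;
}
```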
- Implemented using `std::shared_mutex`
- Multiple concurrent readers (`search`, `batch_search`, `has_index`)
- Exclusive writers (`insert`, `create`, `load`, `save`)
This guarantees:
- No data races
- No partial visibility of index state
- Deterministic behavior under concurrency
- Each write operation is atomic and isolated
- Reads observe either the state before or after a write
- No partial updates are ever visible
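The pattern described above corresponds to standard `std::shared_mutex` usage, sketched here in simplified form (not the actual engine code):

```cpp
// Simplified reader/writer locking with std::shared_mutex, mirroring the
// guarantees described above (not the actual engine code).
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <vector>

class GuardedIndex {
public:
    // Readers take a shared lock: many searches can run in parallel.
    std::size_t size() const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        return vectors_.size();
    }

    // Writers take an exclusive lock: inserts are serialized and block readers,
    // so no reader ever observes a half-applied update.
    void insert(std::vector<float> embedding) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        vectors_.push_back(std::move(embedding));
    }

private:
    mutable std::shared_mutex mutex_;
    std::vector<std::vector<float>> vectors_;
};
```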
- The index lives in memory during execution
- Persistence is explicit via save/load operations
- Index state is serialized to disk as binary files (meta.bin + nodes.bin)
- On startup, the engine can load an existing index
This keeps durability simple and predictable while avoiding the complexity of WAL or recovery protocols.
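As an illustration of the explicit save path, the sketch below serializes a small metadata struct to a binary file. Only the file names meta.bin and nodes.bin come from the description above; the fields and byte layout here are assumptions.

```cpp
// Sketch of explicit binary persistence for index metadata. The fields and
// byte layout are assumptions. Writing raw struct bytes is simple but not
// portable across compilers or architectures.
#include <cstdint>
#include <fstream>
#include <string>

struct IndexMeta {
    std::uint32_t dim;              // embedding dimensionality
    std::uint32_t M;                // HNSW connectivity parameter
    std::uint32_t ef_construction;  // build-time search width
    std::uint64_t num_vectors;      // number of stored vectors
};

bool save_meta(const std::string& path, const IndexMeta& meta) {
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;
    out.write(reinterpret_cast<const char*>(&meta), sizeof(meta));
    return static_cast<bool>(out);
}

bool load_meta(const std::string& path, IndexMeta& meta) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    in.read(reinterpret_cast<char*>(&meta), sizeof(meta));
    return static_cast<bool>(in);
}
```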
A minimal CLI is provided for demonstration and testing.
Example usage:
vecdb create wiki 128
vecdb insert wiki 1 0.1,0.2,0.3,...
vecdb search wiki 0.1,0.2,0.3,... 5 50
vecdb batch-search wiki queries.txt 5 50
vecdb save wiki
vecdb load wiki

The CLI is intentionally lightweight and delegates all logic to the engine.
- `correctness`: Compares HNSW search results against brute-force ground truth. Also validates save/load persistence.
- `engine_sanity`: Tests the VecDBEngine API (create, insert, search, save, and load operations).
- `concurrency_test`: Spawns multiple reader and writer threads to validate thread safety and measure contention overhead.
- `benchmark`: Measures search latency for brute-force vs HNSW across different dataset sizes.
Performance comparison of brute-force search vs HNSW (dim=128, k=10, 100 queries, efSearch=100):
| Vectors | Brute Force (ms) | HNSW (ms) | Speedup |
|---|---|---|---|
| 1,000 | 43.10 | 69.37 | 0.62x |
| 5,000 | 215.41 | 113.35 | 1.90x |
| 10,000 | 432.23 | 125.12 | 3.45x |
At small dataset sizes, HNSW's graph construction overhead exceeds the benefit. As the dataset grows, HNSW's sub-linear search demonstrates clear speedups over linear scan.
(Results measured on an Apple M3; your numbers may vary.)
The current implementation prioritizes read-heavy workloads:
- Multiple readers can execute searches in parallel (`shared_lock`)
- `batch_search` amortizes lock acquisition for bulk queries
- Writers are serialized and block readers during graph updates
This aligns well with typical vector search use cases: embeddings are generated offline (or infrequently), and the system serves many concurrent search requests.
HNSW is not optimized for high-frequency small writes:
- Each insert traverses the graph to find optimal neighbors
- Neighbor connections are updated, potentially affecting multiple nodes
- Writers hold an exclusive lock, blocking all concurrent readers
For workloads with frequent inserts (e.g., real-time embedding ingestion), this becomes a bottleneck.
A common solution in production vector databases (e.g., Milvus, Pinecone) is a dual-index approach:
Writes → Append Buffer (fast) → [Background Merge] → HNSW Index
Reads → query both the buffer and the HNSW index, merge results
- Write path: New vectors append to an in-memory buffer (O(1), no graph update)
- Read path: Query both the buffer (brute-force scan) and the HNSW index, merge results
- Background merge: Periodically flush the buffer into the HNSW index
This would provide:
- Fast writes (no blocking graph traversal)
- Still-fast reads (small buffer scan + HNSW lookup)
- Eventually consistent index state
This extension is not implemented but represents a natural evolution for OLTP-style workloads.
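As a sketch of what that read path could look like, the helper below merges brute-force hits from the append buffer with ANN hits from the index and keeps the k closest; all names and types are illustrative, since the extension is not implemented.

```cpp
// Sketch of the unimplemented dual-index read path: results from a brute-force
// scan of the append buffer and from the HNSW index are merged by distance.
// All names and types are illustrative.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Hit = std::pair<float, std::uint64_t>;  // (distance, vector id)

// Keep the k closest results across both sources.
std::vector<Hit> merge_hits(const std::vector<Hit>& buffer_hits,
                            const std::vector<Hit>& index_hits,
                            std::size_t k) {
    std::vector<Hit> all;
    all.reserve(buffer_hits.size() + index_hits.size());
    all.insert(all.end(), buffer_hits.begin(), buffer_hits.end());
    all.insert(all.end(), index_hits.begin(), index_hits.end());

    std::sort(all.begin(), all.end());  // pairs sort by distance first
    if (all.size() > k) all.resize(k);
    return all;
}
```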
- In-memory only (no memory-mapped storage)
- No delete or update operations
- No metadata filtering (yet)
- No multi-process coordination
- No WAL or crash recovery
- Metadata-aware search
- Incremental persistence / WAL
- Memory-mapped index
- Multi-process server mode (IPC)
- Distributed indexing / sharding
ntvecdb focuses on core vector database concepts:
- scalable ANN search
- concurrency control
- persistence
- clear separation of concerns
The project intentionally avoids overengineering and prioritizes correctness, clarity, and explainability.