ntvecdb is a minimal, embedded vector database designed to demonstrate how modern vector databases work under the hood. It supports storing high-dimensional embeddings, approximate nearest-neighbor search using HNSW, persistence to disk, concurrent access, and batch queries.
This project was built as part of a systems-focused exercise. The emphasis is on algorithmic understanding, architecture, and explicit trade-offs, rather than production completeness.
- HNSW index implemented from scratch for scalable ANN search
- Exact (brute-force) search for correctness validation
- Thread-safe engine
- Multiple concurrent readers
- Serialized writers
- Batch search API for bulk query workloads
- Explicit persistence (save/load index to disk)
- Embedded, in-process design (DuckDB/SQLite-style)
- Minimal CLI for demonstration
- No external dependencies
- C++17 compatible compiler (clang or gcc)
- CMake ≥ 3.10
git clone <your-repo-url>
cd ntvecdb
mkdir build && cd build
cmake ..
make
./correctness # HNSW vs brute-force comparison + persistence
./engine_sanity # VecDBEngine API validation
./concurrency_test # Multi-threaded stress test
./benchmark # Performance comparison
./vecdb # Shows usage

ntvecdb is designed as an embedded, in-process vector database, similar in spirit to DuckDB.
Application / CLI → VecDBEngine → HNSWIndex → In-memory graph
- The engine owns the index and enforces all concurrency guarantees.
- The index is kept entirely in memory for fast access.
- Persistence is explicit and opt-in via save/load operations.
- There is no server or IPC layer; concurrency is handled within a single process.
This design keeps the system simple, testable, and focused on the core problem: efficient similarity search.
The HNSW (Hierarchical Navigable Small World) index is a multi-layer graph structure that enables sub-linear approximate nearest-neighbor search.
High-level idea:
- Vectors are stored as nodes in a layered graph
- Upper layers provide long-range navigation
- Lower layers refine local neighborhoods
- Search starts from an entry point at the top layer and greedily descends
This implementation focuses on clarity and correctness rather than aggressive optimization, closely following the ideas from the original HNSW paper.
Reference: Yu. A. Malkov and D. A. Yashunin, "Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018. [arXiv:1603.09320]
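To make the descent concrete, here is a minimal, self-contained sketch of the layered greedy walk. The `Node` layout and function names are assumptions for illustration, not the project's code, and a full HNSW search additionally keeps an efSearch-sized candidate list at the bottom layer instead of a single best node.

```cpp
// Simplified HNSW-style layered greedy descent (illustrative only; the Node
// layout and function names are assumptions, not the project's code).
#include <cstddef>
#include <vector>

struct Node {
    std::vector<float> vec;                   // stored embedding
    std::vector<std::vector<int>> neighbors;  // neighbors[layer] -> node ids
};

// Squared Euclidean distance between two equal-length vectors.
static float dist2(const std::vector<float>& a, const std::vector<float>& b) {
    float d = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float t = a[i] - b[i];
        d += t * t;
    }
    return d;
}

// Start at the entry point on the top layer, greedily move to the closest
// neighbor until no improvement is possible, then drop one layer and repeat.
int greedy_descend(const std::vector<Node>& nodes,
                   const std::vector<float>& query,
                   int entry_point, int top_layer) {
    int current = entry_point;
    float best = dist2(nodes[current].vec, query);
    for (int layer = top_layer; layer >= 0; --layer) {
        bool improved = true;
        while (improved) {
            improved = false;
            if (layer >= static_cast<int>(nodes[current].neighbors.size())) break;
            for (int nb : nodes[current].neighbors[layer]) {
                const float d = dist2(nodes[nb].vec, query);
                if (d < best) {
                    best = d;
                    current = nb;
                    improved = true;
                }
            }
        }
    }
    return current;  // closest node reached by the greedy walk at layer 0
}
```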
- Each vector has:
  - A unique integer ID
  - A fixed-dimension float embedding
- All embeddings in an index share the same dimensionality
- (Future work) Metadata can be attached to vectors for filtered search
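For illustration, a stored record could be as simple as the following struct; the names are assumptions, not taken from the ntvecdb source.

```cpp
// Plausible shape of a stored record (names are illustrative, not taken from
// the ntvecdb source).
#include <cstdint>
#include <vector>

struct VectorRecord {
    std::uint64_t id;           // unique integer ID
    std::vector<float> values;  // fixed-dimension float embedding
};
```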
| Method | Description |
|---|---|
| `create_index(dim, M, efConstruction)` | Create a new HNSW index |
| `insert(id, embedding)` | Insert a vector |
| `search(query, k, efSearch)` | Single query search |
| `batch_search(queries, k, efSearch)` | Bulk query search |
| `save_db(db_name)` | Save index to database root |
| `load_db(db_name)` | Load index from database root |
| `save_index_path(path)` | Save index to absolute path |
| `load_index_path(path)` | Load index from absolute path |
| `set_db_root(root)` | Set database root directory |
| `has_index()` | Check if index is loaded |
batch_search executes multiple queries under a single shared lock, amortizing synchronization overhead. This is useful for bulk query workloads.
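The sketch below shows how the engine might be driven end to end, based only on the method names and parameters listed above; the header name, argument types, and return types are assumptions, not verified against the source.

```cpp
// Usage sketch based only on the method table above. The header name,
// argument types, and return types are assumptions.
#include <vector>
#include "vecdb_engine.h"  // assumed header name

int main() {
    VecDBEngine engine;
    engine.set_db_root("./data");

    // dim = 4, M = 16, efConstruction = 200
    engine.create_index(4, 16, 200);

    engine.insert(1, std::vector<float>{0.1f, 0.2f, 0.3f, 0.4f});
    engine.insert(2, std::vector<float>{0.9f, 0.8f, 0.7f, 0.6f});

    // Single query: top-2 neighbors with efSearch = 50
    auto hits = engine.search(std::vector<float>{0.1f, 0.2f, 0.3f, 0.4f}, 2, 50);

    // Bulk queries executed under one shared lock
    std::vector<std::vector<float>> queries = {
        {0.1f, 0.2f, 0.3f, 0.4f},
        {0.9f, 0.8f, 0.7f, 0.6f},
    };
    auto batch_hits = engine.batch_search(queries, 2, 50);

    engine.save_db("wiki");  // persistence is explicit
    return 0;
}
```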
- Implemented using `std::shared_mutex`
- Multiple concurrent readers (`search`, `batch_search`, `has_index`)
- Exclusive writers (`insert`, `create`, `load`, `save`)
This guarantees:
- No data races
- No partial visibility of index state
- Deterministic behavior under concurrency
- Each write operation is atomic and isolated
- Reads observe either the state before or after a write
- No partial updates are ever visible
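The pattern described above corresponds to standard `std::shared_mutex` usage, sketched here in simplified form (not the actual engine code):

```cpp
// Simplified reader/writer locking with std::shared_mutex, mirroring the
// guarantees described above (not the actual engine code).
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <vector>

class GuardedIndex {
public:
    // Readers take a shared lock: many searches can run in parallel.
    std::size_t size() const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        return vectors_.size();
    }

    // Writers take an exclusive lock: inserts are serialized and block readers,
    // so no reader ever observes a half-applied update.
    void insert(std::vector<float> embedding) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        vectors_.push_back(std::move(embedding));
    }

private:
    mutable std::shared_mutex mutex_;
    std::vector<std::vector<float>> vectors_;
};
```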
- The index lives in memory during execution
- Persistence is explicit via save/load operations
- Index state is serialized to disk as binary files (meta.bin + nodes.bin)
- On startup, the engine can load an existing index
This keeps durability simple and predictable while avoiding the complexity of WAL or recovery protocols.
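As an illustration of the explicit save path, the sketch below serializes a small metadata struct to a binary file. Only the file names meta.bin and nodes.bin come from the description above; the fields and byte layout here are assumptions.

```cpp
// Sketch of explicit binary persistence for index metadata. The fields and
// byte layout are assumptions. Writing raw struct bytes is simple but not
// portable across compilers or architectures.
#include <cstdint>
#include <fstream>
#include <string>

struct IndexMeta {
    std::uint32_t dim;              // embedding dimensionality
    std::uint32_t M;                // HNSW connectivity parameter
    std::uint32_t ef_construction;  // build-time search width
    std::uint64_t num_vectors;      // number of stored vectors
};

bool save_meta(const std::string& path, const IndexMeta& meta) {
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;
    out.write(reinterpret_cast<const char*>(&meta), sizeof(meta));
    return static_cast<bool>(out);
}

bool load_meta(const std::string& path, IndexMeta& meta) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    in.read(reinterpret_cast<char*>(&meta), sizeof(meta));
    return static_cast<bool>(in);
}
```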
A minimal CLI is provided for demonstration and testing.
Example usage:
vecdb create wiki 128
vecdb insert wiki 1 0.1,0.2,0.3,...
vecdb search wiki 0.1,0.2,0.3,... 5 50
vecdb batch-search wiki queries.txt 5 50
vecdb save wiki
vecdb load wiki

The CLI is intentionally lightweight and delegates all logic to the engine.
- `correctness`: Compares HNSW search results against brute-force ground truth. Also validates save/load persistence.
- `engine_sanity`: Tests the VecDBEngine API (create, insert, search, save, and load operations).
- `concurrency_test`: Spawns multiple reader and writer threads to validate thread safety and measure contention overhead.
- `benchmark`: Measures search latency for brute-force vs HNSW across different dataset sizes.
Performance comparison of brute-force search vs HNSW (dim=128, k=10, 100 queries, efSearch=100):
| Vectors | Brute Force (ms) | HNSW (ms) | Speedup |
|---|---|---|---|
| 1,000 | 43.10 | 69.37 | 0.62x |
| 5,000 | 215.41 | 113.35 | 1.90x |
| 10,000 | 432.23 | 125.12 | 3.45x |
At small dataset sizes, HNSW's graph construction overhead exceeds the benefit. As the dataset grows, HNSW's sub-linear search demonstrates clear speedups over linear scan.
(Results measured on an Apple M3; your numbers may vary.)
The current implementation prioritizes read-heavy workloads:
- Multiple readers can execute searches in parallel (`shared_lock`)
- `batch_search` amortizes lock acquisition for bulk queries
- Writers are serialized and block readers during graph updates
This aligns well with typical vector search use cases: embeddings are generated offline (or infrequently), and the system serves many concurrent search requests.
HNSW is not optimized for high-frequency small writes:
- Each insert traverses the graph to find optimal neighbors
- Neighbor connections are updated, potentially affecting multiple nodes
- Writers hold an exclusive lock, blocking all concurrent readers
For workloads with frequent inserts (e.g., real-time embedding ingestion), this becomes a bottleneck.
A common solution in production vector databases (e.g., Milvus, Pinecone) is a dual-index approach:
Writes → Append Buffer (fast) → [Background Merge] → HNSW Index
Reads → query both the buffer and the HNSW index, merge results
- Write path: New vectors append to an in-memory buffer (O(1), no graph update)
- Read path: Query both the buffer (brute-force scan) and the HNSW index, merge results
- Background merge: Periodically flush the buffer into the HNSW index
This would provide:
- Fast writes (no blocking graph traversal)
- Still-fast reads (small buffer scan + HNSW lookup)
- Eventually consistent index state
This extension is not implemented but represents a natural evolution for OLTP-style workloads.
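As a sketch of what that read path could look like, the helper below merges brute-force hits from the append buffer with ANN hits from the index and keeps the k closest; all names and types are illustrative, since the extension is not implemented.

```cpp
// Sketch of the unimplemented dual-index read path: results from a brute-force
// scan of the append buffer and from the HNSW index are merged by distance.
// All names and types are illustrative.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Hit = std::pair<float, std::uint64_t>;  // (distance, vector id)

// Keep the k closest results across both sources.
std::vector<Hit> merge_hits(const std::vector<Hit>& buffer_hits,
                            const std::vector<Hit>& index_hits,
                            std::size_t k) {
    std::vector<Hit> all;
    all.reserve(buffer_hits.size() + index_hits.size());
    all.insert(all.end(), buffer_hits.begin(), buffer_hits.end());
    all.insert(all.end(), index_hits.begin(), index_hits.end());

    std::sort(all.begin(), all.end());  // pairs sort by distance first
    if (all.size() > k) all.resize(k);
    return all;
}
```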
- In-memory only (no memory-mapped storage)
- No delete or update operations
- No metadata filtering (yet)
- No multi-process coordination
- No WAL or crash recovery
- Metadata-aware search
- Incremental persistence / WAL
- Memory-mapped index
- Multi-process server mode (IPC)
- Distributed indexing / sharding
ntvecdb focuses on core vector database concepts:
- scalable ANN search
- concurrency control
- persistence
- clear separation of concerns
The project intentionally avoids overengineering and prioritizes correctness, clarity, and explainability.