LogosDB is a fast semantic vector database written in C/C++ that provides approximate nearest-neighbor search over embedding vectors with associated text metadata.

Authors: Jose (@jose-compu)

Features

Vectors and metadata are stored as flat binary files, memory-mapped for zero-copy reads.
Approximate nearest-neighbor search via HNSW (hnswlib), with roughly O(log n) expected query time.
Each vector row carries optional text and ISO 8601 timestamp metadata (JSONL sidecar).
The basic operations are Put(embedding, text, timestamp) and Search(query, top_k).
Bulk vector access for direct tensor construction (e.g. loading into GPU memory).
Thread-safe writes via internal mutex; concurrent reads are lock-free.
Crash recovery: HNSW index is automatically backfilled from the append-only vector store on open.
Scales to millions of vectors.
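To illustrate what "flat binary files, memory-mapped for zero-copy reads" can look like in practice, here is a minimal POSIX sketch. It is not LogosDB's storage code: the file layout assumed here (headerless rows of `dim` raw float32 values) and the class name `MappedVectors` are hypothetical, since the on-disk format is not documented in this README.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical read-only view over a headerless file of float32 rows.
class MappedVectors {
public:
    // Maps `path` read-only; ok() reports whether the mapping succeeded.
    MappedVectors(const char *path, std::size_t dim) : dim_(dim) {
        assert(dim > 0);
        int fd = ::open(path, O_RDONLY);
        if (fd < 0) return;
        struct stat st;
        if (::fstat(fd, &st) == 0 && st.st_size > 0) {
            void *p = ::mmap(nullptr, static_cast<std::size_t>(st.st_size),
                             PROT_READ, MAP_PRIVATE, fd, 0);
            if (p != MAP_FAILED) {
                data_ = static_cast<const float *>(p);
                bytes_ = static_cast<std::size_t>(st.st_size);
            }
        }
        ::close(fd);  // the mapping stays valid after the descriptor is closed
    }
    ~MappedVectors() {
        if (data_) ::munmap(const_cast<float *>(data_), bytes_);
    }
    MappedVectors(const MappedVectors &) = delete;
    MappedVectors &operator=(const MappedVectors &) = delete;

    bool ok() const { return data_ != nullptr; }
    // Number of complete rows; a trailing partial row is ignored.
    std::size_t rows() const { return bytes_ / (dim_ * sizeof(float)); }
    // Pointer into the mapping (no copy), or nullptr if out of range.
    const float *row(std::size_t i) const {
        return i < rows() ? data_ + i * dim_ : nullptr;
    }

private:
    const float *data_ = nullptr;
    std::size_t bytes_ = 0;
    std::size_t dim_;
};
```

Because `row()` returns a pointer into the mapping, a caller can hand it directly to a distance function or a bulk copy without an intermediate buffer, which is the sense in which such reads are zero-copy.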

Documentation

The public interface is in include/logosdb/logosdb.h. Callers should not include or rely on the details of any other header files in this package. Those internal APIs may be changed without warning.

Guide to header files:

include/logosdb/logosdb.h: Main interface to the DB. Start here. Contains:
- C API with opaque handles and errptr convention (RocksDB/LevelDB style)
- C++ convenience wrapper (logosdb::DB) with RAII and exceptions
- logosdb::Options for HNSW tuning parameters
- logosdb::SearchHit result struct

Limitations

This is not a general-purpose vector database. It is purpose-built for embedding-based memory retrieval in LLM inference (funes.cpp).
Only a single process (possibly multi-threaded) can access a particular database at a time.
There is no client-server support built into the library. An application that needs such support will have to wrap its own server around the library.
Vectors must be L2-normalized before insertion (inner-product similarity is used).
Embedding generation is external — the caller provides pre-computed float vectors.
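The normalization requirement above follows from the similarity measure: for unit-length vectors, the inner product equals cosine similarity. A minimal sketch of how a caller might normalize an embedding before insertion is shown below. The helper names `l2_normalize` and `inner_product` are illustrative and are not part of the LogosDB API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Scales v in place to unit L2 norm. Returns false for an all-zero vector,
// which has no direction and cannot be normalized.
bool l2_normalize(std::vector<float> &v) {
    double sum_sq = 0.0;  // accumulate in double to limit rounding error
    for (float x : v) sum_sq += static_cast<double>(x) * x;
    if (sum_sq == 0.0) return false;
    const float inv_norm = static_cast<float>(1.0 / std::sqrt(sum_sq));
    for (float &x : v) x *= inv_norm;
    return true;
}

// Inner product; for unit vectors this equals cosine similarity.
float inner_product(const std::vector<float> &a, const std::vector<float> &b) {
    assert(a.size() == b.size());
    double acc = 0.0;
    for (std::size_t i = 0; i < a.size(); i++) {
        acc += static_cast<double>(a[i]) * b[i];
    }
    return static_cast<float>(acc);
}
```

A caller would apply `l2_normalize` to each embedding and to each query vector before handing the underlying float data to the database.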

Getting the Source

git clone --recurse-submodules <repository-url>
cd logosdb

Building

This project supports CMake out of the box.

Quick start:

mkdir -p build && cd build
cmake -DCMAKE_BUILD_TYPE=Release .. && cmake --build .

This builds:

| Target | Description |
| --- | --- |
| `logosdb` | Static library (`liblogosdb.a`) |
| `logosdb-cli` | Command-line tool for put, search, info |
| `logosdb-bench` | Benchmark: HNSW vs brute-force, with ChromaDB comparison |
| `logosdb-test` | Unit tests |

Usage (C API)

#include <stdio.h>
#include <logosdb/logosdb.h>

char *err = NULL;
logosdb_options_t *opts = logosdb_options_create();
logosdb_options_set_dim(opts, 2048);

logosdb_t *db = logosdb_open("/tmp/mydb", opts, &err);
logosdb_options_destroy(opts);
if (err != NULL) {  /* errptr convention: a non-NULL err signals failure */
    fprintf(stderr, "open failed: %s\n", err);
    return 1;
}

/* Embeddings are supplied by the caller and must be L2-normalized. */
float vec[2048] = { /* ... */ };
logosdb_put(db, vec, 2048, "My commute is 42 minutes",
            "2025-06-25T10:00:00Z", &err);

float query_vec[2048] = { /* ... */ };
logosdb_search_result_t *res = logosdb_search(db, query_vec, 2048, 5, &err);
for (int i = 0; i < logosdb_result_count(res); i++) {
    printf("#%d score=%.4f text=%s\n", i,
           logosdb_result_score(res, i),
           logosdb_result_text(res, i));
}
logosdb_result_free(res);
logosdb_close(db);

Usage (C++ wrapper)

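#include <cstdio>
#include <logosdb/logosdb.h>

// Designated initializers ({.dim = ...}) require C++20.
logosdb::DB db("/tmp/mydb", {.dim = 2048});

// embedding and query are caller-provided, L2-normalized vectors of 2048 floats.
db.put(embedding, "My commute is 42 minutes", "2025-06-25T10:00:00Z");

auto results = db.search(query, 5);
for (const auto &r : results) {
    // The cast keeps %llu correct whatever unsigned integer type r.id has.
    printf("id=%llu score=%.4f text=%s\n",
           static_cast<unsigned long long>(r.id), r.score, r.text.c_str());
}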

CLI

# Database info
logosdb-cli info /tmp/mydb

# Search with a binary query vector file
logosdb-cli search /tmp/mydb --query-file q.bin --top-k 5
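The README does not specify the layout of the binary query file. The sketch below shows one plausible way to produce such a file, under the assumption that `--query-file` expects exactly `dim` raw float32 values in host byte order with no header. Both that layout and the helper name `write_query_file` are assumptions, so check the CLI source before relying on them.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Writes `v` as raw float32 values in host byte order, with no header.
// ASSUMPTION: this matches what --query-file expects; the README does not say.
bool write_query_file(const std::string &path, const std::vector<float> &v) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    out.write(reinterpret_cast<const char *>(v.data()),
              static_cast<std::streamsize>(v.size() * sizeof(float)));
    // flush() returns the stream, which converts to false on a write error.
    return static_cast<bool>(out.flush());
}
```

With a 2048-dimensional database, the resulting file would be 2048 × 4 = 8192 bytes under this assumed layout.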

Performance

Here is a performance report from the included logosdb-bench program. The results are somewhat noisy, but should be enough to get a ballpark performance estimate.

Setup

We use databases with 1K, 10K, and 100K vectors. Each vector has 2048 dimensions (matching typical LLM embedding sizes). Vectors are random and L2-normalized to unit length.

LogosDB:    version 0.1.0
CPU:        Apple M-series (ARM64)
Dim:        2048
HNSW M:     16, ef_construction: 200, ef_search: 50

Write performance

put (1K vectors):    ~50 µs/op   (~20,000 inserts/sec)
put (10K vectors):   ~80 µs/op   (~12,500 inserts/sec)
put (100K vectors):  ~120 µs/op  (~8,300 inserts/sec)

Each "op" above corresponds to a write of a single vector + metadata + HNSW index update.
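The per-op latencies and the inserts/sec figures above are two views of the same number: throughput is 10^6 divided by the latency in microseconds (for example, 10^6 / 50 = 20,000). The sketch below shows one generic way to measure a mean per-op latency and convert it. It is not the code used by logosdb-bench, and the helper names are illustrative.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Runs `op` n times and returns the mean wall-clock time per call in microseconds.
template <typename Op>
double mean_micros_per_op(std::size_t n, Op op) {
    assert(n > 0);
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n; i++) op();
    const std::chrono::duration<double, std::micro> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / static_cast<double>(n);
}

// Converts a per-operation latency into throughput.
double ops_per_second(double micros_per_op) {
    return 1e6 / micros_per_op;
}
```

In a real measurement, `op` would wrap a single insert, and the first few iterations are usually discarded as warm-up.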

Search performance

HNSW top-5 (1K):     ~0.1 ms/query
HNSW top-5 (10K):    ~0.3 ms/query
HNSW top-5 (100K):   ~1.2 ms/query

Brute-force top-5 (1K):    ~0.3 ms/query
Brute-force top-5 (10K):   ~2.5 ms/query
Brute-force top-5 (100K):  ~25 ms/query

HNSW maintains sub-linear scaling while brute-force grows linearly with database size. At 100K vectors, HNSW is roughly 20x faster.
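For context on the brute-force baseline, an exhaustive search scores every stored vector against the query, so its cost grows as O(n × dim). The following is a generic sketch of such a search using inner-product scoring. It is not the implementation in logosdb-bench, and the function name `brute_force_top_k` and the contiguous row layout are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Exhaustive top-k by inner product. Row i occupies
// data[i*dim .. i*dim + dim). Returns (row index, score) pairs
// sorted by descending score.
std::vector<std::pair<std::size_t, float>> brute_force_top_k(
    const std::vector<float> &data, std::size_t dim,
    const std::vector<float> &query, std::size_t k) {
    assert(dim > 0 && query.size() == dim && data.size() % dim == 0);
    const std::size_t n = data.size() / dim;
    std::vector<std::pair<std::size_t, float>> hits;
    hits.reserve(n);
    for (std::size_t i = 0; i < n; i++) {
        float score = 0.0f;
        for (std::size_t j = 0; j < dim; j++) {
            score += data[i * dim + j] * query[j];
        }
        hits.emplace_back(i, score);
    }
    k = std::min(k, n);
    // Only the best k entries need to be put in order.
    std::partial_sort(
        hits.begin(), hits.begin() + static_cast<std::ptrdiff_t>(k), hits.end(),
        [](const std::pair<std::size_t, float> &a,
           const std::pair<std::size_t, float> &b) { return a.second > b.second; });
    hits.resize(k);
    return hits;
}
```

Every query touches all n rows here, which is why the brute-force timings above grow about tenfold for each tenfold increase in database size, while HNSW visits only a small part of the graph.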

Benchmark vs ChromaDB

logosdb-bench --dim 2048 --counts 1000,10000,100000

| Metric | ChromaDB | LogosDB |
| --- | --- | --- |
| Language | Python + C (hnswlib) | Pure C/C++ |
| Search algorithm | HNSW | HNSW (same hnswlib) |
| Storage | SQLite + Parquet | Binary mmap + JSONL |
| Startup overhead | Python runtime + deps | Zero (linked library) |
| Embedding generation | Built-in (Sentence Transformers) | External (caller provides vectors) |
| Target use case | General-purpose vector store | Embedded LLM inference memory |
| Search latency (100K, dim=2048) | ~5-10 ms | ~1-3 ms |
| Memory footprint (100K, dim=2048) | ~1.5 GB (Python + SQLite) | ~800 MB (mmap) |
| Cold start | ~2-5 s (Python imports) | <10 ms |
| Dependencies | Python, NumPy, SQLite, hnswlib | hnswlib (header-only, vendored) |

LogosDB uses the same HNSW implementation as ChromaDB (hnswlib) but eliminates Python overhead, SQLite serialization, and Sentence Transformer coupling. The result is a leaner library optimized for the single use case of embedded semantic memory for LLM inference.
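As a sanity check on the LogosDB memory figure in the comparison, the raw vector payload can be computed directly: 100,000 vectors × 2048 dimensions × 4 bytes per float is 819,200,000 bytes, or about 819 MB, in line with the reported ~800 MB. The sketch below encodes that arithmetic. It assumes 4-byte floats with no per-row header and leaves out the HNSW graph and the metadata file, whose sizes the README does not state. The function name is illustrative.

```cpp
#include <cstdint>

// Bytes needed for the raw vector payload alone, assuming 4-byte floats
// and no per-row header. HNSW graph links and metadata are not counted.
constexpr std::uint64_t raw_vector_bytes(std::uint64_t count, std::uint64_t dim) {
    return count * dim * sizeof(float);
}

// Checked at compile time: 100K vectors of dimension 2048.
static_assert(raw_vector_bytes(100000, 2048) == 819200000ULL,
              "100K x 2048 float32 values occupy 819,200,000 bytes");
```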

Repository contents

include/logosdb/logosdb.h     Public C/C++ API (start here)
src/logosdb.cpp               Core engine: wires storage + index + metadata
src/storage.h / storage.cpp   Fixed-stride binary vector file with mmap
src/metadata.h / metadata.cpp Append-only JSONL text + timestamp store
src/hnsw_index.h / .cpp       Thin wrapper around hnswlib
tools/logosdb-cli.cpp         Command-line interface
tools/logosdb-bench.cpp       Benchmark tool
tests/test_basic.cpp          Unit tests (76 checks)
third_party/hnswlib/          Vendored hnswlib (header-only)
CHANGELOG                     Release history
LICENSE                       MIT license text

License

MIT — see LICENSE for the full text.
