LogosDB is a fast semantic vector database written in C/C++ that provides approximate nearest-neighbor search over embedding vectors with associated text metadata.
Authors: Jose (@jose-compu)
- Vectors and metadata are stored as flat binary files, memory-mapped for zero-copy reads.
- Approximate nearest-neighbor search via HNSW (hnswlib), O(log n) query time.
- Each vector row carries optional text and ISO 8601 timestamp metadata (JSONL sidecar).
- The basic operations are `Put(embedding, text, timestamp)` and `Search(query, top_k)`.
- Bulk vector access for direct tensor construction (e.g. loading into GPU memory).
- Thread-safe writes via internal mutex; concurrent reads are lock-free.
- Crash recovery: HNSW index is automatically backfilled from the append-only vector store on open.
- Scales to millions of vectors.
The public interface is in include/logosdb/logosdb.h. Callers should not include or rely on the details of any other header files in this package. Those internal APIs may be changed without warning.
Guide to header files:
- include/logosdb/logosdb.h: Main interface to the DB. Start here. Contains:
  - C API with opaque handles and the `errptr` convention (RocksDB/LevelDB style)
  - C++ convenience wrapper (`logosdb::DB`) with RAII and exceptions
  - `logosdb::Options` for HNSW tuning parameters
  - `logosdb::SearchHit` result struct
- This is not a general-purpose vector database. It is purpose-built for embedding-based memory retrieval in LLM inference (funes.cpp).
- Only a single process (possibly multi-threaded) can access a particular database at a time.
- There is no client-server support built into the library. An application that needs such support will have to wrap its own server around the library.
- Vectors must be L2-normalized before insertion (inner-product similarity is used).
- Embedding generation is external — the caller provides pre-computed float vectors.
This project supports CMake out of the box.
Quick start:
```sh
git clone --recurse-submodules <repository-url>
cd logosdb
mkdir -p build && cd build
cmake -DCMAKE_BUILD_TYPE=Release .. && cmake --build .
```
This builds:
| Target | Description |
|---|---|
| `logosdb` | Static library (`liblogosdb.a`) |
| `logosdb-cli` | Command-line tool for put, search, info |
| `logosdb-bench` | Benchmark: HNSW vs brute-force, with ChromaDB comparison |
| `logosdb-test` | Unit tests |
```c
#include <logosdb/logosdb.h>

char *err = NULL;
logosdb_options_t *opts = logosdb_options_create();
logosdb_options_set_dim(opts, 2048);
logosdb_t *db = logosdb_open("/tmp/mydb", opts, &err);
logosdb_options_destroy(opts);
if (err) { /* open failed; inspect and free err */ }

float vec[2048] = { /* ... */ };
logosdb_put(db, vec, 2048, "My commute is 42 minutes",
            "2025-06-25T10:00:00Z", &err);

logosdb_search_result_t *res = logosdb_search(db, query_vec, 2048, 5, &err);
for (int i = 0; i < logosdb_result_count(res); i++) {
    printf("#%d score=%.4f text=%s\n", i,
           logosdb_result_score(res, i),
           logosdb_result_text(res, i));
}
logosdb_result_free(res);
logosdb_close(db);
```

```cpp
#include <logosdb/logosdb.h>

logosdb::DB db("/tmp/mydb", {.dim = 2048});
db.put(embedding, "My commute is 42 minutes", "2025-06-25T10:00:00Z");

auto results = db.search(query, 5);
for (auto &r : results) {
    printf("id=%llu score=%.4f text=%s\n", r.id, r.score, r.text.c_str());
}
```

```sh
# Database info
logosdb-cli info /tmp/mydb

# Search with a binary query vector file
logosdb-cli search /tmp/mydb --query-file q.bin --top-k 5
```

Here is a performance report from the included logosdb-bench program. The results are somewhat noisy, but should be enough to get a ballpark performance estimate.
We use databases with 1K, 10K, and 100K vectors. Each vector has 2048 dimensions (matching typical LLM embedding sizes). Vectors are L2-normalized random unit vectors.
```text
LogosDB: version 0.1.0
CPU: Apple M-series (ARM64)
Dim: 2048
HNSW M: 16, ef_construction: 200, ef_search: 50

put (1K vectors):   ~50 µs/op   (~20,000 inserts/sec)
put (10K vectors):  ~80 µs/op   (~12,500 inserts/sec)
put (100K vectors): ~120 µs/op  (~8,300 inserts/sec)
```
Each "op" above corresponds to a write of a single vector + metadata + HNSW index update.
```text
HNSW top-5 (1K):          ~0.1 ms/query
HNSW top-5 (10K):         ~0.3 ms/query
HNSW top-5 (100K):        ~1.2 ms/query

Brute-force top-5 (1K):   ~0.3 ms/query
Brute-force top-5 (10K):  ~2.5 ms/query
Brute-force top-5 (100K): ~25 ms/query
```
HNSW maintains sub-linear scaling while brute-force grows linearly with database size. At 100K vectors, HNSW is roughly 20x faster.
```sh
logosdb-bench --dim 2048 --counts 1000,10000,100000
```

| Metric | ChromaDB | LogosDB |
|---|---|---|
| Language | Python + C (hnswlib) | Pure C/C++ |
| Search algorithm | HNSW | HNSW (same hnswlib) |
| Storage | SQLite + Parquet | Binary mmap + JSONL |
| Startup overhead | Python runtime + deps | Zero (linked library) |
| Embedding generation | Built-in (Sentence Transformers) | External (caller provides vectors) |
| Target use case | General-purpose vector store | Embedded LLM inference memory |
| Search latency (100K, dim=2048) | ~5-10 ms | ~1-3 ms |
| Memory footprint (100K, dim=2048) | ~1.5 GB (Python + SQLite) | ~800 MB (mmap) |
| Cold start | ~2-5 s (Python imports) | <10 ms |
| Dependencies | Python, NumPy, SQLite, hnswlib | hnswlib (header-only, vendored) |
LogosDB uses the same HNSW implementation as ChromaDB (hnswlib) but eliminates Python overhead, SQLite serialization, and Sentence Transformer coupling. The result is a leaner library optimized for the single use case of embedded semantic memory for LLM inference.
```text
include/logosdb/logosdb.h      Public C/C++ API (start here)
src/logosdb.cpp                Core engine: wires storage + index + metadata
src/storage.h / storage.cpp    Fixed-stride binary vector file with mmap
src/metadata.h / metadata.cpp  Append-only JSONL text + timestamp store
src/hnsw_index.h / .cpp        Thin wrapper around hnswlib
tools/logosdb-cli.cpp          Command-line interface
tools/logosdb-bench.cpp        Benchmark tool
tests/test_basic.cpp           Unit tests (76 checks)
third_party/hnswlib/           Vendored hnswlib (header-only)
CHANGELOG                      Release history
LICENSE                        MIT license text
```
MIT — see LICENSE for the full text.