MLX-powered vector search: mlx-turbovec-swift #421

joelnishanth · 2026-06-10T20:09:36Z


mlx-turbovec-swift

We built a native Swift implementation of TurboQuant (ICLR 2026) — a data-oblivious vector quantizer for on-device semantic search — using MLX for GPU acceleration.

Repo: https://github.com/offlyn-ai/mlx-turbovec-swift

How MLX is used

TurboVec uses a hybrid MLX + Accelerate architecture:

| Operation | Backend | Why |
| --- | --- | --- |
| Batch vector rotation (`[N, dim] @ R^T`) | MLX GPU matmul | 10×+ faster than per-vector CPU for large batches |
| Batch L2 normalization | MLX GPU | Parallel across vectors |
| TQ+ calibration (quantile fitting) | MLX GPU sort | Column-wise percentile on rotated batch |
| Query rotation (single vector) | MLX GPU matmul | Low-latency search prep |
| LUT-based scoring + top-k | Accelerate SIMD | Bit-packing + sequential memory access is CPU-optimal |
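To illustrate what "column-wise percentile on rotated batch" can look like, here is a minimal CPU-only sketch in plain Swift. It is a generic equal-count quantile fit, not necessarily the TQ+ procedure, and `columnQuantiles` and `binIndex` are hypothetical names that are not part of the mlx-turbovec-swift API.

```swift
// For each dimension, sort the values seen in the calibration batch and read off
// the cut points that split them into `levels` roughly equal-count bins.
func columnQuantiles(_ batch: [[Float]], levels: Int) -> [[Float]] {
    guard let dim = batch.first?.count, levels > 1 else { return [] }
    return (0..<dim).map { j -> [Float] in
        let column = batch.map { $0[j] }.sorted()
        // levels - 1 interior boundaries, picked by rank.
        return (1..<levels).map { k -> Float in
            let idx = min(column.count - 1, k * column.count / levels)
            return column[idx]
        }
    }
}

// Map a value to its bin index given the sorted boundaries of one dimension.
func binIndex(_ value: Float, boundaries: [Float]) -> Int {
    boundaries.filter { value >= $0 }.count
}

// Toy calibration batch: 8 vectors, 2 dimensions (assumed already rotated).
let calib: [[Float]] = [[0.1, -2], [0.4, -1], [0.2, 0], [0.9, 1],
                        [0.3, 2], [0.8, 3], [0.5, 4], [0.7, 5]]
let bounds = columnQuantiles(calib, levels: 4)  // 2-bit: 4 bins, 3 boundaries per column
print(bounds)
print(binIndex(0.45, boundaries: bounds[0]))
```

The sort per column is the expensive part, which is presumably why the table assigns it to the GPU.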

Rotation matrix generation uses LAPACK QR (one-time, deterministic, persistence-compatible). The matrix is then transferred to an MLXArray for all runtime matmuls.
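To make the rotation step concrete, here is a minimal CPU-only sketch in plain Swift, with no MLX or LAPACK involved. As a stand-in for the LAPACK QR step it builds an orthogonal matrix with modified Gram-Schmidt from a seeded generator, then L2-normalizes and rotates a small batch. All names (`SplitMix64`, `makeRotation`, `rotateBatch`) are hypothetical and not part of the library.

```swift
import Foundation

// Small deterministic generator so the same seed always yields the same rotation.
struct SplitMix64: RandomNumberGenerator {
    var state: UInt64
    mutating func next() -> UInt64 {
        state &+= 0x9E3779B97F4A7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58476D1CE4E5B9
        z = (z ^ (z >> 27)) &* 0x94D049BB133111EB
        return z ^ (z >> 31)
    }
}

func dot(_ a: [Double], _ b: [Double]) -> Double {
    zip(a, b).reduce(0) { $0 + $1.0 * $1.1 }
}

func l2Normalize(_ v: [Double]) -> [Double] {
    let norm = dot(v, v).squareRoot()
    return norm > 0 ? v.map { $0 / norm } : v
}

// Orthonormal rows via modified Gram-Schmidt on random rows (stand-in for QR).
func makeRotation(dim: Int, seed: UInt64) -> [[Double]] {
    var rng = SplitMix64(state: seed)
    var rows: [[Double]] = []
    while rows.count < dim {
        var r = (0..<dim).map { _ in Double.random(in: -1...1, using: &rng) }
        for q in rows {
            let proj = dot(r, q)
            for i in 0..<dim { r[i] -= proj * q[i] }
        }
        let norm = dot(r, r).squareRoot()
        if norm > 1e-9 { rows.append(r.map { $0 / norm }) }
    }
    return rows
}

// Equivalent to X @ R^T: output component j is dot(x, row j of R).
func rotateBatch(_ batch: [[Double]], rotation: [[Double]]) -> [[Double]] {
    batch.map { x in rotation.map { row in dot(x, row) } }
}

let R = makeRotation(dim: 8, seed: 42)
let batch = [[1.0, 2, 3, 4, 5, 6, 7, 8], [0.5, -1, 0, 2, 0, 0, 1, -3]].map(l2Normalize)
let rotated = rotateBatch(batch, rotation: R)
// An orthogonal rotation preserves norms and inner products (up to rounding).
print(dot(rotated[0], rotated[0]), dot(batch[0], batch[1]) - dot(rotated[0], rotated[1]))
```

Because inner products survive the rotation, cosine similarity can be estimated in the rotated space, and a seeded matrix means a persisted index can be reloaded with the same rotation.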

// Enable GPU at app startup
MLXBackend.enableGPU()

let index = TurboQuantIndex(dim: 768, bitWidth: .four)
try index.add(embeddings)          // MLX batch rotate + quantize
let hits = try index.search(query: q, k: 10)  // MLX query rotate + CPU LUT search
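For readers unfamiliar with LUT-based scoring, the sketch below shows the general idea in plain Swift: build a per-query lookup table once, then score each packed vector by summing table entries instead of dequantizing. The nibble layout, the uniformly spaced centroids, and the names `pack4`, `buildLUT`, `score`, and `topK` are all assumptions for illustration. The library's actual codebook and Accelerate-based kernel will differ.

```swift
// Assumed layout: two 4-bit codes per byte, low nibble first.
func pack4(_ codes: [UInt8]) -> [UInt8] {
    stride(from: 0, to: codes.count, by: 2).map { i -> UInt8 in
        let lo = codes[i] & 0x0F
        let hi = i + 1 < codes.count ? codes[i + 1] & 0x0F : 0
        return lo | (hi << 4)
    }
}

// lut[j][c] = query[j] * centroids[c]: the contribution of dimension j
// when a stored vector holds code c there.
func buildLUT(query: [Float], centroids: [Float]) -> [[Float]] {
    query.map { q in centroids.map { q * $0 } }
}

// Score one packed vector by summing table lookups.
func score(packed: [UInt8], lut: [[Float]]) -> Float {
    var total: Float = 0
    for j in 0..<lut.count {
        let byte = packed[j / 2]
        let code = Int(j % 2 == 0 ? byte & 0x0F : byte >> 4)
        total += lut[j][code]
    }
    return total
}

// Simple top-k by full sort; fine for a sketch, not for large indexes.
func topK(scores: [Float], k: Int) -> [(index: Int, score: Float)] {
    scores.enumerated()
        .sorted { $0.element > $1.element }
        .prefix(k)
        .map { (index: $0.offset, score: $0.element) }
}

// 16 evenly spaced levels in [-1, 1] (an assumption, not the TurboQuant codebook).
let centroids: [Float] = (0..<16).map { Float($0) / 7.5 - 1 }
let rawCodes: [[UInt8]] = [[15, 0, 0, 15], [0, 15, 15, 0], [15, 15, 15, 15]]
let database = rawCodes.map(pack4)

let lut = buildLUT(query: [1, 0, -1, 0.5], centroids: centroids)
let scores = database.map { score(packed: $0, lut: lut) }
let hits = topK(scores: scores, k: 2)
print(hits.map { $0.index })  // prints [0, 2]
```

The inner loop touches packed bytes sequentially and does one small table read per dimension, which matches the "sequential memory access" argument for keeping this stage on the CPU.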

Compression & performance

768-dim Float32 embeddings compress from 3,072 bytes → 384 bytes (4-bit) or 192 bytes (2-bit). That's 8–16× compression with >90% recall@10 vs brute-force cosine similarity.
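The byte counts follow directly from the bit width. A quick check in Swift (the helper name `packedBytes` is hypothetical, and the figure counts only the packed codes, not any per-vector metadata the file format may store):

```swift
// Bytes needed to store `dim` codes of `bits` bits each, rounded up to whole bytes.
func packedBytes(dim: Int, bits: Int) -> Int {
    (dim * bits + 7) / 8
}

let dim = 768
let raw = dim * MemoryLayout<Float>.size  // 768 * 4 = 3072 bytes

for bits in [4, 2] {
    let packed = packedBytes(dim: dim, bits: bits)
    print("\(bits)-bit: \(packed) bytes, \(raw / packed)x smaller")
}
// prints "4-bit: 384 bytes, 8x smaller"
// prints "2-bit: 192 bytes, 16x smaller"
```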

The benchmark runner outputs structured JSON designed for LLM parsing — hardware info, latency stats, recall, compression ratios.

Use cases

- On-device RAG: pair with MLX Embedders (BGE, EmbeddingGemma) for air-gapped semantic search
- Offline AI apps: no cloud, no network; pure local vector index with .tv/.tvim persistence
- Filtered search: allowlist-filtered top-k without over-fetching
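"Without over-fetching" here presumably means the allowlist is applied during the scan, rather than fetching a larger top-k and discarding disallowed hits afterwards. A minimal plain-Swift sketch of that idea follows; `filteredTopK` is a hypothetical name, not the library's API, and a real index would likely skip scoring disallowed IDs altogether.

```swift
// Keep the k best allowed IDs while scanning, so the result is always full
// whenever at least k allowed candidates exist.
func filteredTopK(scores: [Float], allow: Set<Int>, k: Int) -> [Int] {
    var best: [(id: Int, score: Float)] = []  // kept sorted by descending score
    for (id, s) in scores.enumerated() where allow.contains(id) {
        // Position of the first kept entry that the new score beats.
        let pos = best.firstIndex { $0.score < s } ?? best.count
        if pos < k {
            best.insert((id: id, score: s), at: pos)
            if best.count > k { best.removeLast() }
        }
    }
    return best.map { $0.id }
}

let allScores: [Float] = [0.9, 0.1, 0.8, 0.7, 0.95]
let allowed: Set<Int> = [1, 2, 3]
let result = filteredTopK(scores: allScores, allow: allowed, k: 2)
print(result)  // prints [2, 3]
```

An unfiltered top-2 would return IDs 4 and 0, both outside the allowlist, and post-filtering that list would leave nothing, which is the failure mode this approach avoids.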

Would love feedback from the MLX community on the GPU/CPU split and whether a custom Metal kernel for LUT scoring would be worth pursuing.

— Joel Nishanth · offlyn.AI
