MLX-powered vector search: mlx-turbovec-swift #421
joelnishanth started this conversation in Show and tell
mlx-turbovec-swift
We built a native Swift implementation of TurboQuant (ICLR 2026) — a data-oblivious vector quantizer for on-device semantic search — using MLX for GPU acceleration.
Repo: https://github.com/offlyn-ai/mlx-turbovec-swift
How MLX is used
TurboVec uses a hybrid MLX + Accelerate architecture:
[N, dim] @ R^T)
Rotation matrix generation uses LAPACK QR (one-time, deterministic, persistence-compatible). The matrix is then transferred to an MLXArray for all runtime matmuls.
Compression & performance
768-dim Float32 embeddings compress from 3,072 bytes → 384 bytes (4-bit) or 192 bytes (2-bit). That's 8–16× compression with >90% recall@10 vs brute-force cosine similarity.
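The byte counts above follow from simple arithmetic. Here is a minimal sketch, assuming codes are packed densely and ignoring any per-vector metadata the stored format might add. `packedBytes` is a hypothetical helper for illustration, not part of the mlx-turbovec-swift API:

```swift
// Hypothetical helper: bytes needed to store one quantized vector,
// assuming dense bit packing and no per-vector metadata.
func packedBytes(dim: Int, bitsPerDim: Int) -> Int {
    // Round up so a partial trailing byte is still counted.
    return (dim * bitsPerDim + 7) / 8
}

let dim = 768
let float32Bytes = dim * MemoryLayout<Float>.size  // 768 * 4 = 3072

for bits in [4, 2] {
    let bytes = packedBytes(dim: dim, bitsPerDim: bits)
    let ratio = float32Bytes / bytes
    print("\(bits)-bit: \(bytes) bytes, \(ratio)x smaller than Float32")
}
// 4-bit: 384 bytes, 8x smaller than Float32
// 2-bit: 192 bytes, 16x smaller than Float32
```

If the real format stores extra fields per vector (a norm or scale, for example), the effective ratio would be slightly lower than these figures.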
The benchmark runner outputs structured JSON designed for LLM parsing — hardware info, latency stats, recall, compression ratios.
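For readers unfamiliar with the recall figure, here is a rough sketch of how recall@k against brute-force cosine similarity is commonly computed. The function names are hypothetical and this is not the repo's benchmark code; the `approximate` argument stands in for whatever indices a quantized index returns:

```swift
// Cosine similarity between two equal-length vectors.
func cosine(_ a: [Float], _ b: [Float]) -> Float {
    var dot: Float = 0, na: Float = 0, nb: Float = 0
    for i in 0..<a.count {
        dot += a[i] * b[i]
        na += a[i] * a[i]
        nb += b[i] * b[i]
    }
    let denom = (na * nb).squareRoot()
    return denom > 0 ? dot / denom : 0
}

// Exact search: indices of the k database vectors most similar to the query.
func bruteForceTopK(query: [Float], database: [[Float]], k: Int) -> [Int] {
    let scored = database.indices.map { (index: $0, score: cosine(query, database[$0])) }
    return scored.sorted { $0.score > $1.score }.prefix(k).map { $0.index }
}

// Fraction of the exact top-k that the approximate result also contains.
func recallAtK(approximate: [Int], exact: [Int]) -> Double {
    guard !exact.isEmpty else { return 0 }
    let hits = Set(approximate).intersection(exact).count
    return Double(hits) / Double(exact.count)
}

// Toy data: four 2-dimensional vectors, query pointing along the first axis.
let database: [[Float]] = [[1, 0], [0.9, 0.1], [0, 1], [-1, 0]]
let exact = bruteForceTopK(query: [1, 0], database: database, k: 2)  // [0, 1]
let recall = recallAtK(approximate: [0, 2], exact: exact)            // 0.5
print(recall)
```

In a real benchmark the value would be averaged over many queries, with k = 10 to match the recall@10 figure quoted above.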
Use cases
.tv/.tvim persistence
Would love feedback from the MLX community on the GPU/CPU split and whether a custom Metal kernel for LUT scoring would be worth pursuing.
— Joel Nishanth · offlyn.AI