Release Zaxy 2.2.0 — Evidence-Bounded Vector Search · syndicalt/zaxy

Re-engineered the embedded ANN vector path end to end, with every change backed
by evidence from the internal vector-scale lane
(docs/research/artifacts/ann-2026-06/):
- Query path: HNSW candidates are oversampled and reranked with exact
  float64 scores computed from the resident entity vectors. This fixes the
  measured recall deficit, which came from float32 near-tie flips at the
  shadow table's precision boundary rather than from HNSW search quality
  (strict recall@10 at dim 64 was 0.9062 at 10^4 and 0.8969 at 10^5 before
  the rerank, and 1.0 at 10^5 in both G4 passes after it). Each (session,
  version, dimension) scope now queries its own shadow table directly, with
  no per-query projected graph or predicate scan; those had dominated
  filtered query latency at 10^5 vectors.
- Build path: full index rebuilds bulk-load a fresh generation table with
  COPY, build the HNSW index after the load completes, then swap the new
  generation in atomically. This is ~13x faster at 10^5 vectors (1,180s →
  92s). Small, digest-verified, append-only deltas are instead applied as
  inserts into the live index.
- Frozen-runtime hardening: the pinned Kuzu 0.11.3 (the final upstream
  release) has three defects, each of which is now designed around:
  - COPY FROM an in-memory Arrow table with a fixed-size-list column
    segfaults, so bulk loads round-trip through a parquet tempfile.
  - DROP_VECTOR_INDEX leaves un-checkpointed index metadata (kuzu#6040),
    and mutating a live index in place silently breaks subsequent
    direct-table searches, so rebuilds are drop-free generation swaps
    whose superseded tables are emptied, never dropped.
  - An unbound $param segfaults instead of raising an error, so every
    query runs through a single execution choke point that rejects
    unbound parameters.
- Measurement: lane recall is judged tie-aware (standard ann-benchmarks
  tie handling), with strict identity recall always reported alongside,
  because the hash corpus at dimension 1536 has a measured median of 210
  corpus vectors exactly tied with the true top-10. A more realistic
  gaussian distribution variant gates the high-dimension posture (ANN
  recall@10 0.9844/0.975 strict at 10^4 × dim 1536 across two runs).
- Raised the VECTOR_ANN_EFS default from 200 to 400: the gaussian
  efs sweep at dimension 1536 measured recall@10 of 0.8531 at 200, 0.9875
  at 400, and 1.0 at 800, with ~2ms of added p50 per step.
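
The oversample-and-rerank query path described in the list above can be sketched in a few lines of Python. This is a minimal illustration, not Zaxy's implementation: ann_search is a hypothetical stand-in for the HNSW call, the oversample factor of 4 is an assumed default (the release does not state one), and cosine similarity is assumed as the metric. The to_float32 helper shows how a float32 shadow table can collapse a near-tie that float64 still separates.

```python
import heapq
import math
import struct
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def to_float32(x: float) -> float:
    """Round a float64 to the nearest float32, as a float32 shadow table would store it."""
    return struct.unpack("f", struct.pack("f", x))[0]


def cosine64(a: Vector, b: Vector) -> float:
    """Exact cosine similarity in float64 (Python floats are IEEE 754 doubles)."""
    dot = math.fsum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(math.fsum(x * x for x in a))
    norm_b = math.sqrt(math.fsum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search_with_rerank(
    query: Vector,
    ann_search: Callable[[Vector, int], List[int]],  # hypothetical HNSW call: (query, n) -> ids
    vectors: Dict[int, Vector],  # resident float64 entity vectors, keyed by id
    k: int,
    oversample: int = 4,  # assumed factor; not stated in the release
) -> List[int]:
    # 1. Ask the approximate index for more candidates than we need.
    candidates = set(ann_search(query, k * oversample))
    # 2. Rescore every candidate exactly in float64, so near-ties that float32
    #    rounding collapsed or flipped are ordered correctly again.
    scored = [(cosine64(query, vectors[i]), i) for i in candidates]
    # 3. Keep the k best: highest score first, ties broken by id for determinism.
    best = heapq.nsmallest(k, scored, key=lambda s: (-s[0], s[1]))
    return [i for _, i in best]


vectors = {0: [1.0, 0.0], 1: [0.9, 0.1], 2: [0.0, 1.0], 3: [0.99, 0.01]}
fake_ann = lambda q, n: [2, 1, 3, 0][:n]  # approximate order, slightly wrong
print(search_with_rerank([1.0, 0.0], fake_ann, vectors, k=2, oversample=2))  # [0, 3]
print(to_float32(1.0 + 1e-9) == to_float32(1.0))  # True: below float32 resolution
```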
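
The difference between tie-aware and strict identity recall in the Measurement bullet can be illustrated with a small Python sketch modeled on the ann-benchmarks rule, where a returned neighbor counts if its distance is no worse than the true k-th distance plus a small epsilon. The epsilon of 1e-3 follows the ann-benchmarks default; the release does not say which tolerance the lane uses, and both function names are hypothetical. With an exact tie at rank k, strict recall penalizes an answer that tie-aware recall accepts.

```python
from typing import Sequence


def strict_recall(true_ids: Sequence[int], found_ids: Sequence[int], k: int) -> float:
    """Identity recall: share of the true top-k ids that appear in the returned top-k."""
    return len(set(true_ids[:k]) & set(found_ids[:k])) / k


def tie_aware_recall(
    true_dists: Sequence[float],
    found_dists: Sequence[float],
    k: int,
    epsilon: float = 1e-3,  # ann-benchmarks default; the lane's tolerance is not stated
) -> float:
    """A returned neighbor counts if its distance is within epsilon of the true k-th distance."""
    threshold = sorted(true_dists)[k - 1] + epsilon
    return sum(1 for d in found_dists[:k] if d <= threshold) / k


# Ids 11 and 12 are both at distance 0.2 from the query; ground truth happened to list 11.
true_ids, true_dists = [10, 11], [0.1, 0.2]
found_ids, found_dists = [10, 12], [0.1, 0.2]
print(strict_recall(true_ids, found_ids, k=2))  # 0.5
print(tie_aware_recall(true_dists, found_dists, k=2))  # 1.0
```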
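
A single execution choke point that rejects unbound parameters, as described in the hardening bullet, might look like the following Python sketch. Everything here is hypothetical: run_query and UnboundParameterError are illustrative names, the executor is passed in as a callable (in practice it could wrap a Kuzu connection's execute method), and the $name regex is a simplification that also matches placeholders inside string literals, so it errs toward rejecting a valid query rather than letting an unbound one reach the runtime.

```python
import re
from typing import Any, Callable, Dict, Mapping, Optional

# Cypher-style $name placeholders. Simplification: also matches "$name" inside
# string literals, which errs toward rejecting a query.
_PARAM = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


class UnboundParameterError(ValueError):
    """Raised instead of letting the runtime see a query with an unbound $param."""


def run_query(
    execute: Callable[[str, Dict[str, Any]], Any],
    query: str,
    params: Optional[Mapping[str, Any]] = None,
) -> Any:
    """Single entry point for all queries: reject unbound parameters, then execute."""
    bound = dict(params or {})
    missing = sorted(set(_PARAM.findall(query)) - set(bound))
    if missing:
        names = ", ".join("$" + name for name in missing)
        raise UnboundParameterError(f"unbound query parameters: {names}")
    return execute(query, bound)


def fake_execute(query: str, params: Dict[str, Any]) -> str:
    """Stand-in executor for illustration; a real one would hit the database."""
    return f"ran with {sorted(params)}"


print(run_query(fake_execute, "MATCH (e) WHERE e.id = $id RETURN e", {"id": 7}))
try:
    run_query(fake_execute, "MATCH (e) WHERE e.id = $id RETURN e")
except UnboundParameterError as err:
    print(err)  # unbound query parameters: $id
```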

Lowered the VECTOR_ANN_THRESHOLD default from 1000000 to 100000 and made ANN
engagement two-clause within a new dimension ceiling (gate G4): scopes at
or below VECTOR_ANN_MAX_DIMENSION (default 64) engage ANN once they reach
the count threshold, or when their exact float64 matrix would exceed the
256 MiB vector cache byte budget (above 524,288 rows at dim 64). The count
default is backed by two consecutive lane passes at exactly 10^5 vectors
(dim 64): recall@10 of 1.0 on both metrics, ANN p50 latency at or better
than exact search within the same run, and lower resident bytes. The
ceiling reflects the envelope that evidence actually covers: on the
gaussian corpus at dim 1536 with 50k vectors, HNSW recall@10 was 0.6 at
efs 400 (0.6344 on a rerun; 0.8438 at efs 800, with worse-than-exact
latency), while exact search answered in 22ms p50. High-dimension scopes
therefore stay on exact (or opted-in int8) search regardless of size. Opt
out with VECTOR_ANN_THRESHOLD=1000000 and/or
VECTOR_ANN_BYTE_BUDGET_ENGAGEMENT=false; an explicit
VECTOR_QUANTIZATION=int8 keeps its precedence below the count threshold.
memory_capabilities reports the effective engagement rule. See
docs/migration.md.
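
The engagement rule above can be expressed as a small decision function. This is a sketch of one reading of the release notes, not Zaxy's code: EngagementConfig and search_mode are hypothetical names whose fields mirror the documented settings and defaults, and the int8 branch follows the statement that an explicit int8 choice keeps precedence below the count threshold. The byte check assumes 8 bytes per float64 component, which reproduces the documented boundary (524,288 × 64 × 8 bytes = 256 MiB).

```python
from dataclasses import dataclass

MIB = 1024 * 1024


@dataclass(frozen=True)
class EngagementConfig:
    # Hypothetical config object; field defaults mirror the documented settings.
    ann_threshold: int = 100_000  # VECTOR_ANN_THRESHOLD
    max_dimension: int = 64  # VECTOR_ANN_MAX_DIMENSION
    byte_budget: int = 256 * MIB  # vector cache byte budget
    byte_budget_engagement: bool = True  # VECTOR_ANN_BYTE_BUDGET_ENGAGEMENT
    quantization: str = "none"  # VECTOR_QUANTIZATION; "int8" opts in


def search_mode(rows: int, dim: int, cfg: EngagementConfig = EngagementConfig()) -> str:
    """Return "ann", "int8" or "exact" for a scope of `rows` vectors of size `dim`."""
    opted_int8 = cfg.quantization == "int8"
    # Dimension ceiling: above it, ANN never engages, whatever the row count.
    if dim > cfg.max_dimension:
        return "int8" if opted_int8 else "exact"
    # Clause 1: the count threshold.
    if rows >= cfg.ann_threshold:
        return "ann"
    # Below the count threshold, an explicit int8 choice takes precedence.
    if opted_int8:
        return "int8"
    # Clause 2: the exact float64 matrix (8 bytes per component) would overflow the cache.
    if cfg.byte_budget_engagement and rows * dim * 8 > cfg.byte_budget:
        return "ann"
    return "exact"


print(search_mode(100_000, 64))  # ann: at the count threshold
print(search_mode(10**7, 1536))  # exact: above the dimension ceiling
raised = EngagementConfig(ann_threshold=1_000_000)
print(search_mode(524_288, 64, raised))  # exact: exactly 256 MiB, not above it
print(search_mode(524_289, 64, raised))  # ann: the byte-budget clause is still active
```

In this sketch, raising VECTOR_ANN_THRESHOLD alone still lets the byte-budget clause engage ANN at dim 64 between 524,289 rows and the new threshold, which is consistent with the notes offering both settings as opt-outs.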
