RFC: index-resident metadata prefilter for vchordrq via INCLUDE columns #457

@spenc-r

Background

We run vector search over a ~6.2 M-row corpus of vector(1024)
candidates on PostgreSQL 18 with VectorChord. Most of our production
queries are cohort-shaped: a vector ORDER BY combined with several
mandatory equality / range / bitmask predicates over fixed-width
columns (feed, status, visibility, geo, time bucket, etc.). The
predicates typically cut the working set to 5–30 % of rows, but the
ORDER BY still needs ANN over the survivors.

Stock vchordrq.prefilter (the existing prefilter from #247, thanks!)
already helps by letting us scan the index instead of falling back to
a quicksort. But it's a heap-side prefilter: the IVF scan returns
candidates, the index fetches each candidate's full f32 vector, and
only then is the heap row checked, so every vector read for a rejected
candidate is wasted I/O.

For us this matters most on memory-constrained instances. With a 33 GB
vchord index that doesn't fit in shared_buffers, every wasted vector
fetch is a random page fault on the index, and at the cohort
selectivities we see (~10 %), that means 90 % of the IVF candidates
get read, scored, and discarded.
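
A rough back-of-envelope sketch of that waste follows. It assumes, as a simplification, that IVF candidates pass the cohort predicates at the same rate as the corpus overall. The per-query candidate count is a made-up illustrative figure, not a measurement.

```python
# Back-of-envelope model of wasted vector reads under a heap-side prefilter.
# Assumption: candidates pass the predicates at the corpus-wide selectivity.

DIM = 1024          # vector(1024), as in the corpus described above
BYTES_PER_F32 = 4   # one f32 component


def wasted_vector_bytes(n_candidates: int, selectivity: float) -> int:
    """Bytes of f32 vector data fetched for candidates the heap check rejects."""
    rejected = round(n_candidates * (1.0 - selectivity))
    return rejected * DIM * BYTES_PER_F32


# Hypothetical example: 10,000 candidates per query at ~10 % selectivity.
# 9,000 rejected candidates, each costing a 4 KiB vector read.
print(wasted_vector_bytes(10_000, 0.10))
```

Each rejected candidate costs one 4 KiB vector read that never contributes to the result, which is the I/O the proposal below tries to avoid.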

Proposed feature

Allow INCLUDE columns of bigint type on vchordrq indexes, store their
values in the H0 leaf tuples, and add a candidate-filter callback that
runs before the vector page fetch. Introduce a GUC
vchordrq.metadata_prefilter with three modes: off, reject_only
(definitely-false predicates short-circuit further work), and
covered_skip_heap (lossless predicates can also skip the heap
recheck).

In the IVF scan: for each H0 candidate, evaluate the compiled metadata
predicates against the candidate's INCLUDE values; if any predicate
proves "definitely false", skip the candidate without faulting in its
vector. Otherwise proceed as today (vector fetch + rerank + heap
recheck if the predicate is lossy).
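
The control flow described above can be sketched as follows. This is an illustrative Python model, not the prototype's code: `Verdict`, `Candidate`, `eq`, and `metadata_scan` are hypothetical names, and only the three mode strings come from the proposal.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable, Iterator, Sequence, Tuple


class Verdict(Enum):
    DEFINITELY_FALSE = 0  # INCLUDE values prove the row cannot match
    MAYBE = 1             # lossy: might match, heap recheck still required
    DEFINITELY_TRUE = 2   # lossless: INCLUDE values prove the row matches


@dataclass(frozen=True)
class Candidate:
    tid: int                   # stand-in for the heap tuple id
    include: Tuple[int, ...]   # bigint INCLUDE values stored with the H0 tuple


Predicate = Callable[[Tuple[int, ...]], Verdict]


def eq(col: int, value: int) -> Predicate:
    """Lossless equality on a stored bigint (hypothetical helper)."""
    return lambda inc: (Verdict.DEFINITELY_TRUE if inc[col] == value
                        else Verdict.DEFINITELY_FALSE)


def metadata_scan(candidates: Iterable[Candidate],
                  predicates: Sequence[Predicate],
                  mode: str) -> Iterator[Tuple[int, bool]]:
    """Yield (tid, needs_heap_recheck) for candidates that survive the filter."""
    for cand in candidates:
        rejected = False
        # The heap recheck can only be skipped if every predicate is exact.
        all_exact = mode == "covered_skip_heap" and len(predicates) > 0
        if mode != "off":
            for pred in predicates:
                verdict = pred(cand.include)
                if verdict is Verdict.DEFINITELY_FALSE:
                    rejected = True  # skip before touching the vector page
                    break
                if verdict is not Verdict.DEFINITELY_TRUE:
                    all_exact = False
        if rejected:
            continue
        # The vector fetch and rerank would happen here for survivors.
        yield cand.tid, not all_exact


cands = [Candidate(1, (7,)), Candidate(2, (8,))]
print(list(metadata_scan(cands, [eq(0, 7)], "reject_only")))
```

In this model `reject_only` drops candidate 2 before any vector read but still flags candidate 1 for a heap recheck, while `covered_skip_heap` would clear that flag because the only predicate is exact.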

Prototype + numbers

We have a working prototype in a private fork (it carries some
company-specific tooling and configuration in its history that we can't
publish). It implements all of the above. Happy to share the repo
privately with a maintainer if that would help review; we'd rather not
upstream-PR off that branch directly anyway, since it hardcodes our
domain's column names and would need to be generalized first.

We measured on two boxes in two bottleneck regimes. On both, results
were identical to the no-metadata path on 20 random queries (identity
recall):

Clean attribution test (32 GiB / 4 vCPU, CPU-bound).
Stock 1.0.0 vs the prototype's metadata path on the same hardware,
with vchordrq.prefilter=on, enable_sort=off, and read_stream IO
applied to both runs (the only delta is the metadata feature itself
plus the wide-INCLUDE index):

| config | peak QPS | p95 at peak |
|---|---|---|
| stock 1.0.0, plain index, GUCs tuned | 5.68 (c=8) | 2 324 ms |
| prototype, wide-INCLUDE + metadata_prefilter=reject_only | 6.84 (c=4) | 836 ms |

That's +20 % peak QPS, −64 % p95. The QPS gain is modest because
this box is CPU-bound at c≥4 with only 4 vCPUs. The latency gain is
where the metadata reject earns its keep: sub-second p95 at the
prototype's peak (c=4), while tuned stock is still over 2 seconds at
its own peak (c=8).
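
For anyone checking the arithmetic, the two headline deltas follow directly from the table values. `pct_change` is just a throwaway helper name for this sketch.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


qps = pct_change(5.68, 6.84)   # peak QPS, stock -> prototype
p95 = pct_change(2324, 836)    # p95 at peak in ms, stock -> prototype
print(f"peak QPS {qps:+.1f} %, p95 {p95:+.1f} %")
# prints: peak QPS +20.4 %, p95 -64.0 %
```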

Memory-bound regime (16 vCPU host, 16 GiB cgroup cap). Same
prototype, toggling whether the metadata predicates are emitted:

| config | peak QPS @ c=16 | p95 @ c=16 |
|---|---|---|
| prototype, no metadata predicates emitted | 11.86 | 2 635 ms |
| prototype, wide-metadata reject | 37.79 | 732 ms |

+219 % peak QPS, −72 % p95 under memory pressure. The gain is much
larger here because rejecting a candidate before its f32 vector page
faults in is exactly the right thing to do when shared_buffers can't
hold the index. The 33 GB index on this corpus is dominated by per-row
f32 vector copies on VectorTuple pages. Those copies exist solely so
rerank can avoid a heap fetch, and they're exactly what the metadata
reject lets us avoid reading in the first place.
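
The "dominated by f32 vector copies" claim can be sanity-checked from the figures already given. The sketch below treats "~6.2 M" as exactly 6,200,000 rows and "33 GB" as decimal gigabytes, and ignores tuple headers and page overhead, so it is a lower bound on the vector share rather than a measurement.

```python
ROWS = 6_200_000      # "~6.2 M-row corpus" (approximate)
DIM = 1024            # vector(1024)
BYTES_PER_F32 = 4
INDEX_BYTES = 33e9    # "33 GB" index, read as decimal gigabytes

# Raw payload of one full-precision copy of every vector.
vector_bytes = ROWS * DIM * BYTES_PER_F32
share = vector_bytes / INDEX_BYTES
print(f"{vector_bytes / 1e9:.1f} GB of raw f32 vectors, {share:.0%} of the index")
# prints: 25.4 GB of raw f32 vectors, 77% of the index
```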

A side note from the attribution work: the prototype's non-metadata
improvements (read_stream IO, prefetcher batching, search-loop
refactor) produced essentially zero throughput delta over stock 1.0.0
with the same GUCs applied. The metadata-prefilter feature is the only
novel contribution worth upstreaming.

Open questions for the maintainers

Three decisions are really upstream's to make, and we'd value your
steer before writing the public PR:

  1. How users mark "lossless" vs "lossy" INCLUDE columns.
    Some metadata predicates prove the heap row matches (e.g. a
    stored bitmask passing (col & mask) = mask is exact); some only
    accelerate the reject case (e.g. a 64-bit hash where a non-match
    is conclusive but a match is "probably matches, recheck the heap").
    Lossless predicates let us skip the heap recheck; lossy ones can't.
    The most natural way to surface this in PG would be per-column
    operator classes. We'd define two bigint opclasses,
    bigint_lossless_ops and bigint_lossy_ops, and use them on the
    INCLUDE columns:

    CREATE INDEX … USING vchordrq (vec vector_ip_ops)
      INCLUDE (flags_col bigint_lossless_ops,
               hash_col  bigint_lossy_ops);

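    That keeps the SQL declarative and aims to reuse standard PG
    plumbing. One caveat we'd need to resolve: as far as we know, stock
    PostgreSQL rejects an operator class on an INCLUDE column
    ("including column does not support an operator class"), so the
    syntax above may not be accepted as written. Any pushback on that
    approach, or a different shape you'd prefer?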

  2. Tuple format change. The prototype adds a versioned metadata
    tail to FrozenTuple and AppendableTuple
    (METADATA_TAIL_VERSION = 2). For indexes built without INCLUDE
    columns, attr_count = 0 ⇒ no tail emitted ⇒ existing 1.x indexes
    read identically. Is that backward-compat shape acceptable, or
    would you rather the entire feature be gated behind a new index
    reloption / opclass so the on-disk format is unchanged for existing
    users?

  3. Tests. We'd plan to add ≥6 sqllogictest files in
    tests/vchordrq/ covering eq / IN / range / bitmask / NULL / mixed
    with ORDER BY, plus an identity-recall test against the
    no-metadata path. Anything else you'd want before review?
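
To make the lossless / lossy distinction in question 1 concrete, here is a small illustrative sketch. The function names and the choice of hash (first 8 bytes of SHA-256, read as a signed 64-bit integer to fit a bigint) are assumptions for the example, not what the prototype does. Each predicate returns `(passes, needs_heap_recheck)`.

```python
import hashlib
from typing import Tuple


def bitmask_contains(stored_flags: int, mask: int) -> Tuple[bool, bool]:
    """Lossless: the stored bigint is the real column value, so
    (col & mask) = mask is exact whether it passes or fails."""
    return (stored_flags & mask) == mask, False


def hash64(value: str) -> int:
    """Hypothetical 64-bit hash of a wide value, sized to fit a bigint."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def hash_equals(stored_hash: int, wanted: str) -> Tuple[bool, bool]:
    """Lossy: differing hashes prove inequality, but equal hashes may be
    a collision, so a pass still needs the heap recheck."""
    passes = stored_hash == hash64(wanted)
    return passes, passes


print(bitmask_contains(0b1011, 0b0011))            # passes, no recheck needed
print(hash_equals(hash64("tenant-a"), "tenant-a"))  # passes, recheck needed
```

A reject from either kind of predicate is conclusive. Only a pass from the bitmask style would be eligible for covered_skip_heap.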

If this is on your roadmap or you'd rather take a different shape,
please let us know. We'd rather align early than rebuild later.

Acknowledgements

The prototype was implemented by @SamORichards (cc'd). I (@spenc-r) am
happy to do the upstream-generalization work and own the public PR,
and plan to add Sam as Co-authored-by: on each commit.
