Native hybrid_search for the 6 remaining hybrid-capable backends

## Native `hybrid_search` for the 6 remaining hybrid-capable backends

PR #16 landed `SupportsHybrid` + the universal client-side BM25 + RRF fallback, and wired the 3 backends whose adapters already provision text-side indexing (weaviate, elasticsearch, redis) as `SupportsHybrid`.

The 6 other backends that `data/providers.yaml` marks `hybrid_search: true` still go through the client-side fallback, which is correct but O(N) over the collection. Each needs its own infra wiring before its native path can be turned on. They're listed below in roughly easiest-first order.
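For reference, here is a minimal sketch of the RRF step that the fallback performs and that the native paths would take over. The function name `rrf_fuse` and the constant `k=60` are illustrative assumptions, not the names or defaults used in PR #16.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_k=10):
    """Reciprocal Rank Fusion over several ranked lists of document ids.

    Each document scores sum(1 / (k + rank)) across the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default (assumption).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending; break ties by id so the output is deterministic.
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc_id for doc_id, _ in ordered[:top_k]]


dense_hits = ["a", "b", "c"]      # ids ranked by vector similarity
lexical_hits = ["b", "d", "a"]    # ids ranked by BM25
fused = rrf_fuse([dense_hits, lexical_hits])
```

Documents that appear in both lists (`a`, `b`) outrank those that appear in only one, which is the behavior a native fusion path should reproduce.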

### lancedb
- Add `table.create_fts_index("text")` to `create_collection` (or lazy-create on first hybrid call).
- Define `_lexical_query` calling `table.search(text, query_type="fts")`, or use the unified `table.search(text, query_type="hybrid")` and skip RRF orchestration on this path.

### mongodb (Atlas)
- Auto-create a `$search` (Atlas Search) index alongside the existing `$vectorSearch` index in `create_collection` — analogous to how `_search_index_create` already provisions the vector index.
- Define `_lexical_query` running a `$search` aggregation pipeline.
- Or use the full `$rankFusion` aggregation stage to do dense + lexical + RRF in one round-trip.
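A rough sketch of what the two pipeline shapes could look like as plain Python lists, ready to pass to `collection.aggregate(...)`. The index names (`text_index`, `vector_index`), field paths (`text`, `embedding`), and helper names are assumptions for illustration, and `$rankFusion` is only available on recent MongoDB versions, so check the deployment before relying on it.

```python
def lexical_pipeline(text, *, index="text_index", path="text", limit=10):
    """Atlas Search ($search) pipeline for the lexical side (names are hypothetical)."""
    return [
        {"$search": {"index": index, "text": {"query": text, "path": path}}},
        {"$limit": limit},
    ]


def rank_fusion_pipeline(text, query_vector, *, vector_index="vector_index",
                         text_index="text_index", vector_path="embedding",
                         text_path="text", limit=10, num_candidates=100):
    """Single-round-trip dense + lexical + RRF via $rankFusion."""
    dense = [{
        "$vectorSearch": {
            "index": vector_index,
            "path": vector_path,
            "queryVector": list(query_vector),
            "numCandidates": num_candidates,
            "limit": limit,
        }
    }]
    lexical = lexical_pipeline(text, index=text_index, path=text_path, limit=limit)
    return [
        {"$rankFusion": {"input": {"pipelines": {"dense": dense, "lexical": lexical}}}},
        {"$limit": limit},
    ]


pipeline = rank_fusion_pipeline("red fox", [0.1, 0.2, 0.3], limit=5)
```

Building the pipelines as data keeps them easy to unit-test without a live Atlas cluster.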

### qdrant
- Add a sparse-vector config to `create_collection` and a sparse-vector encoder (e.g. SPLADE, BM25 sparse) at write time.
- Define `_lexical_query` issuing a sparse-vector `query_points` call, OR use `query_points(prefetch=[dense, sparse], query=FusionQuery(fusion=Fusion.RRF))` for native fusion.
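As a sketch of the native-fusion option, this builds the JSON body for Qdrant's points query endpoint as a plain dict, equivalent in shape to the `query_points(prefetch=..., query=FusionQuery(...))` call above. The named-vector names `dense` and `sparse` and the helper name are assumptions; they must match whatever `create_collection` configures.

```python
def hybrid_query_body(dense_vector, sparse_indices, sparse_values, *,
                      dense_name="dense", sparse_name="sparse",
                      limit=10, prefetch_limit=50):
    """Request body for a dense + sparse query fused with RRF (hypothetical helper)."""
    if len(sparse_indices) != len(sparse_values):
        raise ValueError("sparse indices and values must have the same length")
    return {
        # Each prefetch runs independently and feeds candidates into the fusion step.
        "prefetch": [
            {"query": list(dense_vector), "using": dense_name, "limit": prefetch_limit},
            {
                "query": {"indices": list(sparse_indices), "values": list(sparse_values)},
                "using": sparse_name,
                "limit": prefetch_limit,
            },
        ],
        "query": {"fusion": "rrf"},
        "limit": limit,
    }


body = hybrid_query_body([0.1, 0.2], [7, 42], [1.0, 0.5], limit=5)
```

With this path the adapter can skip the client-side RRF orchestration entirely, since fusion happens server-side.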

### milvus (≥ 2.5)
- Add a `BM25` function field to the collection schema in `create_collection`.
- Define `_lexical_query` issuing a `client.search` call against the BM25 field.
- Or use `client.hybrid_search([AnnSearchRequest(...), AnnSearchRequest(...)], ranker=RRFRanker())`.

### pinecone
- Generate sparse vectors at write time (BM25 / SPLADE / pinecone-text).
- Either upsert with both `values` and `sparse_values`, then query with both — or use Pinecone's hosted hybrid endpoint.
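To make the write-time step concrete, here is a toy sparse encoder that produces a record with both `values` and `sparse_values`. It uses hashed raw term counts rather than real BM25 or SPLADE weights, so treat it as a stand-in for whichever encoder is chosen; `sparse_encode` and `make_record` are hypothetical names.

```python
import re
import zlib
from collections import Counter


def sparse_encode(text):
    """Toy sparse encoder: hashed term counts, NOT real BM25/SPLADE weights."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # crc32 gives a stable unsigned 32-bit index per token across processes;
    # colliding tokens simply share an index and their counts add up.
    counts = Counter(zlib.crc32(tok.encode("utf-8")) for tok in tokens)
    indices = sorted(counts)
    return {"indices": indices, "values": [float(counts[i]) for i in indices]}


def make_record(doc_id, dense, text):
    """Upsert record carrying both the dense and the sparse representation."""
    return {
        "id": doc_id,
        "values": list(dense),
        "sparse_values": sparse_encode(text),
        "metadata": {"text": text},
    }


record = make_record("doc-1", [0.1, 0.2], "red red fox")
```

The same encoder has to be applied to the query text at search time so that query and document indices line up.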

### turbopuffer
- Define a BM25 attribute in the namespace schema and populate it at write time.
- Use `query(rank_by=("text", "BM25", text))` for the lexical side, fused via `rank_by` with the dense rank.

### Refs

Follow-up to #16, refs #11. Each backend is its own clean piece of work — feel free to split into separate PRs.

