Add user doc page for SQL vector search #5363
mengweieric wants to merge 38 commits into opensearch-project:feature/vector-search-p0 from
…ch-project#5318)

* [Feature] Add table function relation to SQL grammar for vectorSearch()

  Add table function relation support to the SQL parser:

  - New `tableFunctionRelation` alternative in `relation` grammar rule
  - Named argument syntax: `key=value` (e.g., table='index', field='vec')
  - Alias is required by grammar (FROM func(...) AS alias)
  - AstBuilder emits existing TableFunction + SubqueryAlias AST nodes
  - 3 parser unit tests: basic parse, with WHERE/ORDER BY/LIMIT, alias required

  This is a pure grammar change; there is no execution support yet. Queries will parse successfully but fail at the Analyzer with "unsupported function".

  Signed-off-by: Eric Wei <mengwei.eric@gmail.com>

* Address review feedback on table function relation grammar

  1. Canonicalize argument names at parser boundary: unquoteIdentifier + toLowerCase(Locale.ROOT) in visitTableFunctionRelation so FIELD='x' and `field`='x' both produce argName="field"
  2. Make AS keyword optional (AS? alias) for consistency with tableAsRelation and subqueryAsRelation grammar rules
  3. Strengthen test coverage:
     - Full structural AST assertion for WHERE + ORDER BY + LIMIT (verifies Sort, Limit, Filter nodes, not just toString)
     - Argument reorder test proves names resolve by name not position
     - Case canonicalization test (TABLE= → table=)
     - Alias-without-AS test (FROM func(...) v)

  Signed-off-by: Eric Wei <mengwei.eric@gmail.com>

* Apply spotless formatting

  Signed-off-by: Eric Wei <mengwei.eric@gmail.com>

---------

Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
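The argument-name canonicalization from the review-feedback commit can be illustrated with a minimal sketch. `ArgNameSketch` and `canonicalize` are hypothetical names, and the backtick stripping below is a simplified stand-in for the parser's `unquoteIdentifier`, whose exact behavior is not shown in this PR.

```java
import java.util.Locale;

/** Hypothetical helper for illustration; not the plugin's AstBuilder code. */
final class ArgNameSketch {

  /** Strips one pair of surrounding backticks, then lower-cases with a fixed locale. */
  static String canonicalize(String rawName) {
    String name = rawName;
    if (name.length() >= 2 && name.startsWith("`") && name.endsWith("`")) {
      name = name.substring(1, name.length() - 1);
    }
    // Locale.ROOT avoids locale-specific case mappings (for example the Turkish dotless i).
    return name.toLowerCase(Locale.ROOT);
  }

  public static void main(String[] args) {
    System.out.println(canonicalize("FIELD"));
    System.out.println(canonicalize("`field`"));
  }
}
```

Doing this once at the parser boundary means later stages can look arguments up by a single lower-case name.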
Maps knn_vector fields to ExprCoreType.ARRAY so they appear in DESCRIBE output and can be referenced in projections. This is a visibility shim — not a full vector type. Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
VectorSearchIndex.createScanBuilder() needs to construct an OpenSearchIndexScanBuilder with a custom VectorSearchQueryBuilder delegate. The existing constructor was protected (test-only). Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
Introduces the core execution pipeline for vectorsearch():
- VectorSearchTableFunctionResolver: registers vectorsearch with 4 STRING args
- VectorSearchTableFunctionImplementation: parses named args, vector literal, options string, validates search mode (k/max_distance/min_score)
- VectorSearchIndex: extends OpenSearchIndex with knn query seeding, score tracking, and WrapperQueryBuilder DSL construction
- VectorSearchQueryBuilder: keeps knn in must (scoring) context, WHERE filters in filter (non-scoring) context

Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
Override getFunctions() to expose vectorsearch() table function to the query analysis pipeline. Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
Verifies knn query is placed in scoring (must) context, not wrapped in bool.filter when no WHERE clause is present. Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
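The must/filter placement can be pictured with a small string-based sketch. This is only an illustration: the plugin builds the query through OpenSearch query builders, `KnnBoolShapeSketch` is a hypothetical name, the index field names are made up, and returning the knn clause unwrapped when there is no WHERE clause is an assumption about the exact shape.

```java
/** Hypothetical sketch of the query shape; not the plugin's VectorSearchQueryBuilder. */
final class KnnBoolShapeSketch {

  /**
   * Combines a knn clause with an optional WHERE clause, both given as JSON strings.
   * Assumption: with no WHERE clause the knn query is returned unwrapped.
   */
  static String combine(String knnJson, String whereJson) {
    if (whereJson == null) {
      return knnJson; // a top-level query is already in scoring context
    }
    // knn stays in must (scoring); the WHERE predicate goes in filter (non-scoring).
    return "{\"bool\":{\"must\":[" + knnJson + "],\"filter\":[" + whereJson + "]}}";
  }

  public static void main(String[] args) {
    // Field names "embedding" and "category" are invented for this example.
    String knn = "{\"knn\":{\"embedding\":{\"vector\":[1.0,2.0],\"k\":5}}}";
    String where = "{\"term\":{\"category\":\"shoes\"}}";
    System.out.println(combine(knn, null));
    System.out.println(combine(knn, where));
  }
}
```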
- Add pushDownFilter() unit test asserting knn stays in bool.must (scoring) and WHERE predicate goes to bool.filter (non-scoring)
- Add option key allowlist (k, max_distance, min_score) to reject unknown/unsupported keys before they reach DSL generation
- Add field name validation to reject characters that could corrupt the WrapperQueryBuilder JSON (allows alphanumeric, dots, underscores, hyphens)
- Add named-arg type guard to reject non-NamedArgumentExpression args early with a clear error message

Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
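A field-name check of the kind described above could look roughly like this. The class name, the regular expression, and the exception type are assumptions made for illustration; only the allowed character set (alphanumeric, dots, underscores, hyphens) comes from the commit message.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch; the plugin's actual validation code is not shown in this PR. */
final class FieldNameSketch {

  // Alphanumerics, dots, underscores, and hyphens only, so the name cannot break out of JSON.
  private static final Pattern SAFE_FIELD_NAME = Pattern.compile("[A-Za-z0-9._-]+");

  /** Returns the field name unchanged, or throws if it contains any other character. */
  static String requireSafe(String fieldName) {
    if (fieldName == null || !SAFE_FIELD_NAME.matcher(fieldName).matches()) {
      throw new IllegalArgumentException("Invalid field name: " + fieldName);
    }
    return fieldName;
  }

  public static void main(String[] args) {
    System.out.println(requireSafe("doc.embedding_v2"));
  }
}
```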
Parse k as integer, max_distance and min_score as double before they reach buildKnnQuery(). Rejects non-numeric and non-finite values with clear errors. This closes the residual JSON-injection path through option values without requiring full XContent migration. Also fixes toString() to be consistent with the named-arg guard (no longer blindly casts to NamedArgumentExpression). Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
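A minimal sketch of allowlisted, numerically canonicalized option parsing, assuming a comma-separated `key=value` option string. `OptionParserSketch`, its method names, and the use of `IllegalArgumentException` are hypothetical; the plugin's own parser and exception types may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of option parsing; names and exception types are assumptions. */
final class OptionParserSketch {

  private static final Set<String> ALLOWED_KEYS = Set.of("k", "max_distance", "min_score");

  /** Parses "key=value,key=value" into canonical numeric values. */
  static Map<String, Number> parse(String optionString) {
    Map<String, Number> options = new LinkedHashMap<>();
    for (String segment : optionString.split(",")) {
      String[] parts = segment.split("=", -1);
      if (parts.length != 2 || parts[0].isBlank() || parts[1].isBlank()) {
        throw new IllegalArgumentException("Malformed option segment: '" + segment + "'");
      }
      String key = parts[0].trim();
      if (!ALLOWED_KEYS.contains(key)) {
        throw new IllegalArgumentException("Unknown option key: " + key);
      }
      if (options.containsKey(key)) {
        throw new IllegalArgumentException("Duplicate option key: " + key);
      }
      options.put(key, parseNumber(key, parts[1].trim()));
    }
    return options;
  }

  /** k is parsed as an integer; the other keys as finite doubles. */
  private static Number parseNumber(String key, String value) {
    try {
      if (key.equals("k")) {
        return Integer.valueOf(value);
      }
      double parsed = Double.parseDouble(value);
      if (!Double.isFinite(parsed)) {
        throw new IllegalArgumentException(key + " must be finite, got: " + value);
      }
      return Double.valueOf(parsed);
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException(key + " must be numeric, got: " + value, e);
    }
  }
}
```

Because only parsed numbers are stored, a value such as `5}` never reaches JSON generation as raw text.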
- parseOptions: reject malformed segments and duplicate keys
- parseVector: wrap errors in ExpressionEvaluationException, reject non-finite floats (Infinity, NaN)
- VectorSearchIndex: default requestedTotalSize to k via pushDownLimitToRequestTotal so queries without LIMIT return k results
- Add 5 new tests: malformed option, duplicate key, empty vector, malformed vector component, non-finite vector component

Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
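The vector-literal checks listed above (empty vector, malformed component, non-finite component) might be sketched as follows. `VectorLiteralSketch` is a hypothetical name, the bracketed `[x, y, ...]` form is taken from the quoted `vector='[...]'` examples, and `IllegalArgumentException` stands in for the plugin's `ExpressionEvaluationException`.

```java
/** Hypothetical sketch of vector-literal parsing; not the plugin's parseVector(). */
final class VectorLiteralSketch {

  /** Parses "[1.0, 2.0, 3.0]" into a float array, rejecting empty or non-finite input. */
  static float[] parse(String literal) {
    String trimmed = literal.trim();
    if (trimmed.length() < 2 || !trimmed.startsWith("[") || !trimmed.endsWith("]")) {
      throw new IllegalArgumentException("Vector must look like [x, y, ...]: " + literal);
    }
    String body = trimmed.substring(1, trimmed.length() - 1).trim();
    if (body.isEmpty()) {
      throw new IllegalArgumentException("Vector must not be empty");
    }
    String[] parts = body.split(",", -1);
    float[] vector = new float[parts.length];
    for (int i = 0; i < parts.length; i++) {
      float value;
      try {
        value = Float.parseFloat(parts[i].trim());
      } catch (NumberFormatException e) {
        throw new IllegalArgumentException("Malformed vector component: '" + parts[i] + "'", e);
      }
      // Float.parseFloat accepts "NaN" and "Infinity", so finiteness is checked separately.
      if (!Float.isFinite(value)) {
        throw new IllegalArgumentException("Non-finite vector component: '" + parts[i] + "'");
      }
      vector[i] = value;
    }
    return vector;
  }
}
```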
- validateNamedArgs() now rejects null/empty arg names defensively, closing a potential NPE if the shared table-function path is later wired into PPL
- OpenSearchStorageEngineTest uses contains-check instead of exact collection size assertion
- Add testNullArgNameThrows test

Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
- Remove unused VECTOR_OPTION constant from VectorSearchIndex
- Clarify buildKnnQuery() comment: quoted fallback is for forward compatibility, all P0 values are already canonicalized as numeric
- Rename testMissingSearchModeOptionThrows to testUnknownOptionKeyOnlyThrows to match what it actually tests

Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
- Enforce exactly one of k, max_distance, or min_score
- Validate k is in [1, 10000] range
- Add 6 tests: mutual exclusivity (3 combos), k too small, k too large, k boundary values (1 and 10000)

Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
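The two rules in this commit (exactly one search mode, and `k` within [1, 10000]) can be sketched as below. `SearchModeSketch`, the method name, and the exception type are hypothetical; only the rules themselves come from the commit message.

```java
import java.util.Map;

/** Hypothetical sketch of search-mode validation; names are assumptions. */
final class SearchModeSketch {

  static final int MAX_K = 10_000;
  private static final String[] MODE_KEYS = {"k", "max_distance", "min_score"};

  /** Returns the single search mode present in the options, or throws. */
  static String validate(Map<String, ? extends Number> options) {
    String mode = null;
    for (String key : MODE_KEYS) {
      if (options.containsKey(key)) {
        if (mode != null) {
          throw new IllegalArgumentException(
              "Only one of k, max_distance, min_score is allowed; found " + mode + " and " + key);
        }
        mode = key;
      }
    }
    if (mode == null) {
      throw new IllegalArgumentException("One of k, max_distance, min_score is required");
    }
    if (mode.equals("k")) {
      int k = options.get("k").intValue();
      if (k < 1 || k > MAX_K) {
        throw new IllegalArgumentException("k must be in [1, " + MAX_K + "], got: " + k);
      }
    }
    return mode;
  }
}
```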
VectorSearchQueryBuilder now accepts options map and rejects pushDownLimit when LIMIT exceeds k. Radial modes (max_distance, min_score) have no LIMIT restriction. Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
- Create VectorSearchIndexTest: 7 tests covering buildKnnQueryJson() for top-k, max_distance, min_score, nested fields, multi-element and single-element vectors, numeric option rendering
- Add edge case tests to VectorSearchTableFunctionImplementationTest: NaN vector component, empty option key/value, negative k, NaN for max_distance and min_score (6 new tests)
- Add VectorSearchQueryBuilderTest: min_score radial mode LIMIT, pushDownSort delegation to parent (2 new tests)
- Extract buildKnnQueryJson() as package-private for direct testing

Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
Test too-many (5) and zero arguments paths in VectorSearchTableFunctionResolver to complement existing too-few (2) test. Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
- Cap radial mode (max_distance/min_score) results at maxResultWindow to prevent unbounded result sets
- Reject ORDER BY on non-_score fields and _score ASC in vectorSearch since knn results are naturally sorted by _score DESC
- Add 12 integration tests: 4 _explain DSL shape verification tests and 8 validation error path tests

Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
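The ORDER BY restriction amounts to accepting only `_score DESC` sort keys. A minimal sketch, assuming a simplified `SortKey` holder in place of the engine's sort expressions (both class names are hypothetical):

```java
import java.util.List;

/** Hypothetical sketch of the ORDER BY guard; SortKey stands in for the engine's sort nodes. */
final class SortGuardSketch {

  /** Simplified sort key: a field name plus a direction flag. */
  static final class SortKey {
    final String field;
    final boolean ascending;

    SortKey(String field, boolean ascending) {
      this.field = field;
      this.ascending = ascending;
    }
  }

  /** Accepts only _score DESC; any other field or direction is rejected. */
  static void validate(List<SortKey> sortKeys) {
    for (SortKey key : sortKeys) {
      if (!"_score".equals(key.field)) {
        throw new IllegalArgumentException(
            "vectorSearch only supports ORDER BY _score DESC, got field: " + key.field);
      }
      if (key.ascending) {
        throw new IllegalArgumentException("vectorSearch does not support ORDER BY _score ASC");
      }
    }
  }
}
```

Checking every key, not just the first, is what makes a mixed sort such as `_score DESC, name ASC` fail.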
- Add multi-sort expression test: ORDER BY _score DESC, name ASC correctly rejects the non-_score field (VectorSearchQueryBuilderTest)
- Add case-insensitive argument name lookup test to verify TABLE='x' resolves same as table='x' (Implementation test)
- Add non-numeric option fallback test: verifies string options are quoted in JSON output (VectorSearchIndexTest)
- Add 4 integration tests: ORDER BY _score DESC succeeds, ORDER BY non-score rejects, ORDER BY _score ASC rejects, LIMIT within k succeeds (VectorSearchIT, now 16 tests)

Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
The base OpenSearchIndexScanQueryBuilder.pushDownSort() pushes sort.getCount() as a limit when non-zero. Our override validated _score DESC and returned true, but did not preserve this contract. SQL always sets count=0, so this was not reachable today, but PPL or future callers may set a non-zero count to combine sort+limit in one LogicalSort node. Preserve the behavior defensively. Add focused test: LogicalSort(count=7) with _score DESC verifies the count is pushed down as request size. Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
- Unit test: compound AND predicate survives pushdown into bool.filter
- Integration test: compound WHERE (term + range) produces bool query
- Integration test: radial max_distance with WHERE produces bool query

Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
pushDownSort() called requestBuilder.pushDownLimit() directly, bypassing the LIMIT > k guard in pushDownLimit(). Extract validateLimitWithinK() helper and call it from both paths so the invariant holds when PPL or future callers set a non-zero sort count. Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
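The shared-guard idea can be sketched as follows. The helper name `validateLimitWithinK` comes from the commit message, but its signature, the surrounding class, and the use of a null `k` to represent radial mode are assumptions made for illustration.

```java
import java.util.OptionalInt;

/** Hypothetical sketch: one guard shared by the LIMIT path and the sort-count path. */
final class LimitGuardSketch {

  /**
   * @param k top-k value, or null in radial mode (max_distance / min_score)
   * @param limit the LIMIT or non-zero sort count being pushed down
   * @return the limit, if it is allowed
   */
  static int validateLimitWithinK(Integer k, int limit) {
    if (k != null && limit > k) {
      throw new IllegalArgumentException("LIMIT " + limit + " exceeds k=" + k);
    }
    return limit;
  }

  /** LIMIT path. */
  static int pushDownLimit(Integer k, int limit) {
    return validateLimitWithinK(k, limit);
  }

  /** Sort path: a count of 0 means the sort carries no limit. */
  static OptionalInt pushDownSort(Integer k, int sortCount) {
    if (sortCount == 0) {
      return OptionalInt.empty();
    }
    return OptionalInt.of(validateLimitWithinK(k, sortCount));
  }
}
```

Routing both paths through the same helper keeps the invariant in one place, so a caller that attaches a count to the sort cannot skip the check.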
Move all explainQuery()-based DSL shape tests into a dedicated VectorSearchExplainIT suite. VectorSearchIT now contains only validation and error-path tests. Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
…SearchIndex Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
…on in VectorSearchQueryBuilder Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
…fficient mode Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
…matting Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
…ion; drop subquery workaround; note case-sensitive option keys Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
Is it our plan to convert all the |
dai-chen left a comment:
Just wondering: is this feature experimental or not? I see we mention preview in the doc.
RyanL1997 left a comment:
Hi @mengweieric, thanks for the doc change. I left 2 comments, both carried over from previous PRs you worked on in this feature branch.
k-NN query (``knn.filter``), enabling pre-filtering during the ANN search.
See the `k-NN filtering guide <https://docs.opensearch.org/latest/vector-search/filter-search-knn/efficient-knn-filtering/>`_
for engine and method requirements.
The in-memory fallback described here doesn't match the current implementation. In #5331, VectorSearchQueryBuilder.pushDownFilter() always returns true, so the optimizer always removes the LogicalFilter node from the plan. And FilterQueryBuilder.build() never signals "can't translate" — for expressions without a native Lucene mapping, it builds a ScriptQueryBuilder (painless script evaluated on the OpenSearch side), not an in-memory fallback in the SQL engine. If even the script path fails, it throws — the query fails, it doesn't fall back.
Limitations
===========

The following are not part of the ``vectorSearch()`` preview contract and
Minor coordination note: #5362 (same feature branch, also open) adds active rejection for GROUP BY / aggregations via VectorSearchIndexScanBuilder.pushDownAggregation. If that PR merges alongside this one, "are not validated" would be inaccurate — they'd be actively rejected with a descriptive error. May want to update to "are rejected" or "are not supported" once #5362 lands.
9aa62fc to fa444fe
Summary
Add a user-facing reference page for the `vectorSearch()` SQL table function, covering syntax, arguments, supported option keys (`k`, `max_distance`, `min_score`, `filter_type`), filter placement semantics, scoring/sort/limit rules, and preview limitations.

Changes
`docs/user/dql/vector-search.rst` with Introduction, Description, Arguments, Syntax, five examples (top-k, radial `max_distance`, radial `min_score`, implicit WHERE pushdown, explicit `filter_type=efficient`), Filtering section, Scoring/Sorting/Limits, and Limitations.

Test plan
- `./gradlew spotlessCheck` passes
- `docs/category.json`: reference page only, no runnable doctest (matches `beyond/fulltext.rst` precedent)
- `VectorSearchIT` (quoted `vector='[...]'`, quoted `option='k=5,filter_type=efficient'`)
- `VectorSearchQueryBuilder` behavior: omitted `filter_type` attempts post-style pushdown and falls back to in-memory when the WHERE cannot be translated; explicit `filter_type` (`post` or `efficient`) requires a translatable WHERE or fails
- … `ALLOWED_OPTION_KEYS` set-membership check)