From 7ef526f0ffd2fc34efe43d033b51b3eb45b30cd0 Mon Sep 17 00:00:00 2001 From: Liam Thompson Date: Tue, 18 Mar 2025 18:59:56 +0100 Subject: [PATCH 1/6] [Search] Add ESQL syntax to semantic and hybrid search tutorials --- .../query-filter/languages/querydsl.md | 2 +- solutions/search/hybrid-semantic-text.md | 39 ++++++++++++++++++- .../semantic-search-semantic-text.md | 39 +++++++++++++++++-- 3 files changed, 73 insertions(+), 7 deletions(-) diff --git a/explore-analyze/query-filter/languages/querydsl.md b/explore-analyze/query-filter/languages/querydsl.md index efbcc4c5a9..8ccc2441ca 100644 --- a/explore-analyze/query-filter/languages/querydsl.md +++ b/explore-analyze/query-filter/languages/querydsl.md @@ -7,7 +7,7 @@ mapped_urls: - https://www.elastic.co/guide/en/elasticsearch/reference/current/query-filter-context.html --- -# QueryDSL +# Query DSL $$$filter-context$$$ diff --git a/solutions/search/hybrid-semantic-text.md b/solutions/search/hybrid-semantic-text.md index e6db19a531..f58e3de1b7 100644 --- a/solutions/search/hybrid-semantic-text.md +++ b/solutions/search/hybrid-semantic-text.md @@ -102,7 +102,15 @@ POST _tasks//_cancel ## Perform hybrid search [hybrid-search-perform-search] -After reindexing the data into the `semantic-embeddings` index, you can perform hybrid search by using [reciprocal rank fusion (RRF)](elasticsearch://reference/elasticsearch/rest-apis/reciprocal-rank-fusion.md). RRF is a technique that merges the rankings from both semantic and lexical queries, giving more weight to results that rank high in either search. This ensures that the final results are balanced and relevant. +After reindexing the data into the `semantic-embeddings` index, you can perform hybrid search to combine semantic and lexical search results. Choose between [retrievers](retrievers-overview.md) or [{{esql}}](/explore-analyze/query-filter/languages/esql.md) syntax to execute the query. + +::::{tab-set} +:group: query-type + +:::{tab-item} Retrievers +:sync: retrievers + +This example uses retrievers with [reciprocal rank fusion (RRF)](elasticsearch://reference/elasticsearch/rest-apis/reciprocal-rank-fusion.md). RRF is a technique that merges the rankings from both semantic and lexical queries, giving more weight to results that rank high in either search. This ensures that the final results are balanced and relevant. ```console GET semantic-embeddings/_search @@ -141,7 +149,7 @@ GET semantic-embeddings/_search 4. The `semantic_text` field is used to perform the semantic search. -After performing the hybrid search, the query will return the top 10 documents that match both semantic and lexical search criteria. The results include detailed information about each document: +After performing the hybrid search, the query will return the top 10 documents that match both semantic and lexical search criteria. The results include detailed information about each document. ```console-result { @@ -202,3 +210,30 @@ After performing the hybrid search, the query will return the top 10 documents t } } ``` +::: + +:::{tab-item} ES|QL +:sync: esql + +The ES|QL approach uses a combination of the match operator `:` and the match function `match()` to perform hybrid search. + +```esql +POST /_query?format=txt +{ + "query": """ + FROM semantic-embeddings METADATA _score <1> + | WHERE content: "How to avoid muscle soreness while running?" OR match(semantic_text, "How to avoid muscle soreness while running?", { "boost": 0.75 }) <2> <3> + | SORT _score DESC <4> + | LIMIT 1000 + """ +} +``` +1. The `METADATA _score` clause is used to return the score of each document +2. The [match (`:`) operator](elasticsearch://reference/query-languages/esql/esql-functions-operators.md#esql-search-operators) is used on the `content` field for standard keyword matching +3. Semantic search using the `match()` function on the `semantic_text` field with a boost of `0.75` +4. Sorts by descending score and limits to 1000 results +::: +:::: + + + diff --git a/solutions/search/semantic-search/semantic-search-semantic-text.md b/solutions/search/semantic-search/semantic-search-semantic-text.md index a0cf27513e..48049cd26c 100644 --- a/solutions/search/semantic-search/semantic-search-semantic-text.md +++ b/solutions/search/semantic-search/semantic-search-semantic-text.md @@ -101,15 +101,23 @@ POST _tasks//_cancel ## Semantic search [semantic-text-semantic-search] -After the data set has been enriched with the embeddings, you can query the data using semantic search. Provide the `semantic_text` field name and the query text in a `semantic` query type. The {{infer}} endpoint used to generate the embeddings for the `semantic_text` field will be used to process the query text. +After the data has been indexed with the embeddings, you can query the data using semantic search. Choose between [Query DSL](/explore-analyze/query-filter/languages/querydsl.md) or [{{esql}}](/explore-analyze/query-filter/languages/esql.md) syntax to execute the query. -```console +::::{tab-set} +:group: query-type + +:::{tab-item} Query DSL +:sync: dsl + +The Query DSL approach uses the `semantic` query type with the `semantic_text` field: + +```esql GET semantic-embeddings/_search { "query": { "semantic": { "field": "content", <1> - "query": "How to avoid muscle soreness while running?" <2> + "query": "What causes muscle soreness after running?" <2> } } } @@ -117,9 +125,32 @@ GET semantic-embeddings/_search 1. The `semantic_text` field on which you want to perform the search. 2. The query text. +::: + +:::{tab-item} ES|QL +:sync: esql + +The ES|QL approach uses the [match (`:`) operator](elasticsearch://reference/query-languages/esql/esql-functions-operators.md#esql-search-operators), which automatically detects the `semantic_text` field and performs the search on it. The query uses `METADATA _score` to sort by `_score` in descending order. -As a result, you receive the top 10 documents that are closest in meaning to the query from the `semantic-embedding` index. +```esql +POST /_query?format=txt +{ + "query": """ + FROM semantic-embeddings METADATA _score <1> + | WHERE content: "What causes muscle soreness after running?" <2> + | SORT _score DESC <3> + | LIMIT 1000 <4> + """ +} +``` +1. The `METADATA _score` clause is used to return the score of each document +2. The [match (`:`) operator](elasticsearch://reference/query-languages/esql/esql-functions-operators.md#esql-search-operators) is used on the `content` field for standard keyword matching +3. Sorts by descending score to display the most relevant results first +4. Limits the results to 1000 documents + +::: +:::: ## Further examples and reading [semantic-text-further-examples] From cb9b7a0b01bf48283a1e6a76847c661735e5a4c5 Mon Sep 17 00:00:00 2001 From: Liam Thompson Date: Tue, 18 Mar 2025 19:02:18 +0100 Subject: [PATCH 2/6] Restore original query text --- solutions/search/hybrid-semantic-text.md | 2 +- .../search/semantic-search/semantic-search-semantic-text.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/solutions/search/hybrid-semantic-text.md b/solutions/search/hybrid-semantic-text.md index f58e3de1b7..d6d1e8d6b2 100644 --- a/solutions/search/hybrid-semantic-text.md +++ b/solutions/search/hybrid-semantic-text.md @@ -222,7 +222,7 @@ POST /_query?format=txt { "query": """ FROM semantic-embeddings METADATA _score <1> - | WHERE content: "How to avoid muscle soreness while running?" OR match(semantic_text, "How to avoid muscle soreness while running?", { "boost": 0.75 }) <2> <3> + | WHERE content: "muscle soreness running?" OR match(semantic_text, "How to avoid muscle soreness while running?", { "boost": 0.75 }) <2> <3> | SORT _score DESC <4> | LIMIT 1000 """ diff --git a/solutions/search/semantic-search/semantic-search-semantic-text.md b/solutions/search/semantic-search/semantic-search-semantic-text.md index 48049cd26c..a9a2e1a006 100644 --- a/solutions/search/semantic-search/semantic-search-semantic-text.md +++ b/solutions/search/semantic-search/semantic-search-semantic-text.md @@ -138,7 +138,7 @@ POST /_query?format=txt { "query": """ FROM semantic-embeddings METADATA _score <1> - | WHERE content: "What causes muscle soreness after running?" <2> + | WHERE content: "How to avoid muscle soreness while running?" <2> | SORT _score DESC <3> | LIMIT 1000 <4> """ From a90111b067daab6f769025766bf3f3a7632f3524 Mon Sep 17 00:00:00 2001 From: Liam Thompson Date: Wed, 19 Mar 2025 12:52:07 +0100 Subject: [PATCH 3/6] Fixes per review, formatting --- solutions/search/hybrid-semantic-text.md | 4 ++-- .../search/semantic-search/semantic-search-semantic-text.md | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/solutions/search/hybrid-semantic-text.md b/solutions/search/hybrid-semantic-text.md index d6d1e8d6b2..9a77ef2403 100644 --- a/solutions/search/hybrid-semantic-text.md +++ b/solutions/search/hybrid-semantic-text.md @@ -110,7 +110,7 @@ After reindexing the data into the `semantic-embeddings` index, you can perform :::{tab-item} Retrievers :sync: retrievers -This example uses retrievers with [reciprocal rank fusion (RRF)](elasticsearch://reference/elasticsearch/rest-apis/reciprocal-rank-fusion.md). RRF is a technique that merges the rankings from both semantic and lexical queries, giving more weight to results that rank high in either search. This ensures that the final results are balanced and relevant. +This example uses [retrievers syntax](retrievers-overview.md) with [reciprocal rank fusion (RRF)](elasticsearch://reference/elasticsearch/rest-apis/reciprocal-rank-fusion.md). RRF is a technique that merges the rankings from both semantic and lexical queries, giving more weight to results that rank high in either search. This ensures that the final results are balanced and relevant. ```console GET semantic-embeddings/_search @@ -217,7 +217,7 @@ After performing the hybrid search, the query will return the top 10 documents t The ES|QL approach uses a combination of the match operator `:` and the match function `match()` to perform hybrid search. -```esql +```console POST /_query?format=txt { "query": """ diff --git a/solutions/search/semantic-search/semantic-search-semantic-text.md b/solutions/search/semantic-search/semantic-search-semantic-text.md index a9a2e1a006..7a7c811741 100644 --- a/solutions/search/semantic-search/semantic-search-semantic-text.md +++ b/solutions/search/semantic-search/semantic-search-semantic-text.md @@ -133,7 +133,7 @@ GET semantic-embeddings/_search The ES|QL approach uses the [match (`:`) operator](elasticsearch://reference/query-languages/esql/esql-functions-operators.md#esql-search-operators), which automatically detects the `semantic_text` field and performs the search on it. The query uses `METADATA _score` to sort by `_score` in descending order. -```esql +```console POST /_query?format=txt { "query": """ From 112f534fa201341e58af1cb836cc9a843786238a Mon Sep 17 00:00:00 2001 From: Liam Thompson Date: Wed, 19 Mar 2025 12:52:55 +0100 Subject: [PATCH 4/6] idem --- solutions/search/hybrid-semantic-text.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/solutions/search/hybrid-semantic-text.md b/solutions/search/hybrid-semantic-text.md index 9a77ef2403..a4c82bea35 100644 --- a/solutions/search/hybrid-semantic-text.md +++ b/solutions/search/hybrid-semantic-text.md @@ -110,7 +110,8 @@ After reindexing the data into the `semantic-embeddings` index, you can perform :::{tab-item} Retrievers :sync: retrievers -This example uses [retrievers syntax](retrievers-overview.md) with [reciprocal rank fusion (RRF)](elasticsearch://reference/elasticsearch/rest-apis/reciprocal-rank-fusion.md). RRF is a technique that merges the rankings from both semantic and lexical queries, giving more weight to results that rank high in either search. This ensures that the final results are balanced and relevant. +This example uses [retrievers syntax +](retrievers-overview.md) with [reciprocal rank fusion (RRF)](elasticsearch://reference/elasticsearch/rest-apis/reciprocal-rank-fusion.md). RRF is a technique that merges the rankings from both semantic and lexical queries, giving more weight to results that rank high in either search. This ensures that the final results are balanced and relevant. ```console GET semantic-embeddings/_search @@ -149,7 +150,7 @@ GET semantic-embeddings/_search 4. The `semantic_text` field is used to perform the semantic search. -After performing the hybrid search, the query will return the top 10 documents that match both semantic and lexical search criteria. The results include detailed information about each document. +After performing the hybrid search, the query will return the combined top 10 documents for both semantic and lexical search criteria. The results include detailed information about each document. ```console-result { From 28da9f220d8b6799196712b478cc7181b09f0ef1 Mon Sep 17 00:00:00 2001 From: Liam Thompson Date: Wed, 19 Mar 2025 12:54:37 +0100 Subject: [PATCH 5/6] Fix fat finger typo --- solutions/search/hybrid-semantic-text.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/solutions/search/hybrid-semantic-text.md b/solutions/search/hybrid-semantic-text.md index a4c82bea35..353bc34619 100644 --- a/solutions/search/hybrid-semantic-text.md +++ b/solutions/search/hybrid-semantic-text.md @@ -110,8 +110,7 @@ After reindexing the data into the `semantic-embeddings` index, you can perform :::{tab-item} Retrievers :sync: retrievers -This example uses [retrievers syntax -](retrievers-overview.md) with [reciprocal rank fusion (RRF)](elasticsearch://reference/elasticsearch/rest-apis/reciprocal-rank-fusion.md). RRF is a technique that merges the rankings from both semantic and lexical queries, giving more weight to results that rank high in either search. This ensures that the final results are balanced and relevant. +This example uses [retrievers syntax](retrievers-overview.md) with [reciprocal rank fusion (RRF)](elasticsearch://reference/elasticsearch/rest-apis/reciprocal-rank-fusion.md). RRF is a technique that merges the rankings from both semantic and lexical queries, giving more weight to results that rank high in either search. This ensures that the final results are balanced and relevant. ```console GET semantic-embeddings/_search From 134163246239227c991d3afe7b29c6b46bc7e256 Mon Sep 17 00:00:00 2001 From: Liam Thompson Date: Wed, 19 Mar 2025 13:23:30 +0100 Subject: [PATCH 6/6] Use query DSL in tab title --- solutions/search/hybrid-semantic-text.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/solutions/search/hybrid-semantic-text.md b/solutions/search/hybrid-semantic-text.md index 353bc34619..839d1d1ead 100644 --- a/solutions/search/hybrid-semantic-text.md +++ b/solutions/search/hybrid-semantic-text.md @@ -107,7 +107,7 @@ After reindexing the data into the `semantic-embeddings` index, you can perform ::::{tab-set} :group: query-type -:::{tab-item} Retrievers +:::{tab-item} Query DSL :sync: retrievers This example uses [retrievers syntax](retrievers-overview.md) with [reciprocal rank fusion (RRF)](elasticsearch://reference/elasticsearch/rest-apis/reciprocal-rank-fusion.md). RRF is a technique that merges the rankings from both semantic and lexical queries, giving more weight to results that rank high in either search. This ensures that the final results are balanced and relevant.