diff --git a/docs/reference/connector/docs/connectors-API-tutorial.asciidoc b/docs/reference/connector/docs/connectors-API-tutorial.asciidoc index 2e26e0d2a361d..8e9c7de92128d 100644 --- a/docs/reference/connector/docs/connectors-API-tutorial.asciidoc +++ b/docs/reference/connector/docs/connectors-API-tutorial.asciidoc @@ -84,7 +84,7 @@ Note: With {es} running locally, you will need to pass the username and password .Running API calls **** -You can run API calls using the https://www.elastic.co/guide/en/kibana/master/console-kibana.html[Dev Tools Console] in Kibana, using `curl` in your terminal, or with our programming language clients. +You can run API calls using the https://www.elastic.co/guide/en/kibana/8.x/console-kibana.html[Dev Tools Console] in Kibana, using `curl` in your terminal, or with our programming language clients. Our example widget allows you to copy code examples in both Dev Tools Console syntax and curl syntax. To use curl, you'll need to add authentication headers to your request. @@ -171,9 +171,9 @@ Now it's time for the real fun! We'll set up a connector to create a searchable [discrete#es-connectors-tutorial-api-create-connector] ==== Create a connector -We'll use the https://www.elastic.co/guide/en/elasticsearch/reference/master/create-connector-api.html[Create connector API] to create a PostgreSQL connector instance. +We'll use the https://www.elastic.co/guide/en/elasticsearch/reference/8.x/create-connector-api.html[Create connector API] to create a PostgreSQL connector instance. 
-Run the following API call, using the https://www.elastic.co/guide/en/kibana/master/console-kibana.html[Dev Tools Console] or `curl`: +Run the following API call, using the https://www.elastic.co/guide/en/kibana/8.x/console-kibana.html[Dev Tools Console] or `curl`: [source,console] ---- diff --git a/docs/reference/quickstart/esql-search-tutorial.asciidoc b/docs/reference/quickstart/esql-search-tutorial.asciidoc new file mode 100644 index 0000000000000..fad6668db9f77 --- /dev/null +++ b/docs/reference/quickstart/esql-search-tutorial.asciidoc @@ -0,0 +1,485 @@ +// ℹ️ 9.x version of this doc lives in docs-content repo +// https://github.com/elastic/docs-content/blob/main/solutions/search/esql-search-tutorial.md + +[[esql-search-tutorial]] +== Tutorial: Search and filter with {esql} + +[TIP] +===== +This tutorial presents examples in {esql} syntax. Refer to <> for the equivalent examples in Query DSL syntax. +===== + +This is a hands-on introduction to the basics of full-text search and semantic search, using <>. + +For an overview of all the search capabilities in {esql}, refer to <>. + +In this scenario, we're implementing search for a cooking blog. The blog contains recipes with various attributes including textual content, categorical data, and numerical ratings. + +[discrete] +[[esql-search-tutorial-requirements]] +=== Requirements + +You'll need a running {es} cluster, together with {kib} to use the Dev Tools API Console. Refer to <> for deployment options. + +Want to get started quickly? 
Run the following command in your terminal to set up a <>: + +[source,sh] +---- +curl -fsSL https://elastic.co/start-local | sh +---- +// NOTCONSOLE + +[discrete] +[[esql-search-tutorial-running-esql-queries]] +=== Running {esql} queries + +In this tutorial, you'll see {esql} examples in the following format: + +[source,esql] +---- +FROM cooking_blog +| WHERE description:"fluffy pancakes" +| LIMIT 1000 +---- + +If you want to run these queries in the <>, you'll need to use the following syntax: + +[source,js] +---- +POST /_query?format=txt +{ + "query": """ + FROM cooking_blog + | WHERE description:"fluffy pancakes" + | LIMIT 1000 + """ +} +---- +// NOTCONSOLE + +If you'd prefer to use your favorite programming language, refer to <> for a list of official and community-supported clients. + +[discrete] +[[esql-search-tutorial-step-1-create-an-index]] +=== Step 1: Create an index + +Create the `cooking_blog` index to get started: + +[source,console] +---- +PUT /cooking_blog +---- +// TESTSETUP + +Now define the mappings for the index: + +[source,console] +---- +PUT /cooking_blog/_mapping +{ + "properties": { + "title": { + "type": "text", + "analyzer": "standard", <1> + "fields": { <2> + "keyword": { + "type": "keyword", + "ignore_above": 256 <3> + } + } + }, + "description": { + "type": "text", + "fields": { + "keyword": { + "type": "keyword" + } + } + }, + "author": { + "type": "text", + "fields": { + "keyword": { + "type": "keyword" + } + } + }, + "date": { + "type": "date", + "format": "yyyy-MM-dd" + }, + "category": { + "type": "text", + "fields": { + "keyword": { + "type": "keyword" + } + } + }, + "tags": { + "type": "text", + "fields": { + "keyword": { + "type": "keyword" + } + } + }, + "rating": { + "type": "float" + } + } +} +---- +// TEST + +<1> The `standard` analyzer is used by default for `text` fields if an `analyzer` isn't specified. It's included here for demonstration purposes. 
+<2> <> are used here to index `text` fields as both `text` and `keyword` <>. This enables both full-text search and exact matching/filtering on the same field. Note that if you used <>, these multi-fields would be created automatically. +<3> The <> prevents indexing values longer than 256 characters in the `keyword` field. Again this is the default value, but it's included here for demonstration purposes. It helps to save disk space and avoid potential issues with Lucene's term byte-length limit. + +[TIP] +===== +Full-text search is powered by <>. Text analysis normalizes and standardizes text data so it can be efficiently stored in an inverted index and searched in near real-time. Analysis happens at both <>. This tutorial won't cover analysis in detail, but it's important to understand how text is processed to create effective search queries. +===== + +[discrete] +[[esql-search-tutorial-index-data]] +=== Step 2: Add sample blog posts to your index + +Now you'll need to index some example blog posts using the https://www.elastic.co/docs/api/doc/elasticsearch/v8/operation/operation-bulk[Bulk API]. Note that `text` fields are analyzed and multi-fields are generated at index time. + +[source,console] +---- +POST /cooking_blog/_bulk?refresh=wait_for +{"index":{"_id":"1"}} +{"title":"Perfect Pancakes: A Fluffy Breakfast Delight","description":"Learn the secrets to making the fluffiest pancakes, so amazing you won't believe your tastebuds. This recipe uses buttermilk and a special folding technique to create light, airy pancakes that are perfect for lazy Sunday mornings.","author":"Maria Rodriguez","date":"2023-05-01","category":"Breakfast","tags":["pancakes","breakfast","easy recipes"],"rating":4.8} +{"index":{"_id":"2"}} +{"title":"Spicy Thai Green Curry: A Vegetarian Adventure","description":"Dive into the flavors of Thailand with this vibrant green curry. Packed with vegetables and aromatic herbs, this dish is both healthy and satisfying. 
Don't worry about the heat - you can easily adjust the spice level to your liking.","author":"Liam Chen","date":"2023-05-05","category":"Main Course","tags":["thai","vegetarian","curry","spicy"],"rating":4.6} +{"index":{"_id":"3"}} +{"title":"Classic Beef Stroganoff: A Creamy Comfort Food","description":"Indulge in this rich and creamy beef stroganoff. Tender strips of beef in a savory mushroom sauce, served over a bed of egg noodles. It's the ultimate comfort food for chilly evenings.","author":"Emma Watson","date":"2023-05-10","category":"Main Course","tags":["beef","pasta","comfort food"],"rating":4.7} +{"index":{"_id":"4"}} +{"title":"Vegan Chocolate Avocado Mousse","description":"Discover the magic of avocado in this rich, vegan chocolate mousse. Creamy, indulgent, and secretly healthy, it's the perfect guilt-free dessert for chocolate lovers.","author":"Alex Green","date":"2023-05-15","category":"Dessert","tags":["vegan","chocolate","avocado","healthy dessert"],"rating":4.5} +{"index":{"_id":"5"}} +{"title":"Crispy Oven-Fried Chicken","description":"Get that perfect crunch without the deep fryer! This oven-fried chicken recipe delivers crispy, juicy results every time. A healthier take on the classic comfort food.","author":"Maria Rodriguez","date":"2023-05-20","category":"Main Course","tags":["chicken","oven-fried","healthy"],"rating":4.9} +---- + +[[step-3-perform-basic-full-text-searches]] +[discrete] +=== Step 3: Perform basic full-text searches + +Full-text search involves executing text-based queries across one or more document fields. These queries calculate a relevance score for each matching document, based on how closely the document's content aligns with the search terms. Elasticsearch offers various query types, each with its own method for matching text and relevance scoring. + +[TIP] +===== +{esql} provides two ways to perform full-text searches: + +1. Full <> syntax: `match(field, "search terms")` +2. 
Compact syntax using the <>: `field:"search terms"` + +Both are equivalent and can be used interchangeably. The compact syntax is more concise, while the function syntax allows for more configuration options. We'll use the compact syntax in most examples for brevity. + +Refer to the <> reference docs for advanced parameters available with the function syntax. +===== + +[discrete] +[[esql-search-tutorial-basic-full-text-query]] +==== Basic full-text query + +Here's how to search the `description` field for "fluffy pancakes": + +[source,esql] +---- +FROM cooking_blog <1> +| WHERE description:"fluffy pancakes" <2> +| LIMIT 1000 <3> +---- +<1> Specify the index to search +<2> Full-text search with OR logic by default +<3> Return up to 1000 results + +[NOTE] +===== +The results ordering isn't by relevance, as we haven't requested the `_score` metadata field. We'll cover relevance scoring in the next section. +===== + +By default, like the Query DSL `match` query, {esql} uses `OR` logic between terms. This means it will match documents that contain either "fluffy" or "pancakes", or both, in the description field. + +[TIP] +===== +You can control which fields to include in the response using the `KEEP` command: + +[source,esql] +---- +FROM cooking_blog +| WHERE description:"fluffy pancakes" +| KEEP title, description, rating <1> +| LIMIT 1000 +---- +<1> Select only specific fields to include in response +===== + +[discrete] +[[esql-search-tutorial-require-all-terms]] +==== Require all terms in a match query + +Sometimes you need to require that all search terms appear in the matching documents. Here's how to do that using the function syntax with the `operator` parameter: + +[source,esql] +---- +FROM cooking_blog +| WHERE match(description, "fluffy pancakes", {"operator": "AND"}) <1> +| LIMIT 1000 +---- +<1> Require ALL terms to match + +This stricter search returns *zero hits* on our sample data, as no document contains both "fluffy" and "pancakes" in the description. 
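To see why the stricter query finds nothing, here is a minimal Python sketch. It only approximates the `standard` analyzer (lowercased word tokens, no stemming) and is not how {es} evaluates the query internally; the helper names `tokenize` and `matches` are hypothetical and exist only for this illustration.

```python
import re

def tokenize(text):
    """Rough stand-in for the `standard` analyzer: lowercase word tokens, no stemming."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def matches(doc_text, query, operator="OR"):
    """Return True if the document satisfies the query under OR or AND logic."""
    doc_terms = tokenize(doc_text)
    query_terms = tokenize(query)
    if operator == "AND":
        # Every query term must be present in the document.
        return query_terms <= doc_terms
    # OR: at least one query term must be present.
    return bool(query_terms & doc_terms)

# Start of the description of sample document 1.
description = (
    "Learn the secrets to making the fluffiest pancakes, so amazing you "
    "won't believe your tastebuds."
)

or_hit = matches(description, "fluffy pancakes")          # "pancakes" is present
and_hit = matches(description, "fluffy pancakes", "AND")  # "fluffy" is absent, only "fluffiest"
print(or_hit, and_hit)
```

Because the text is not stemmed, "fluffiest" and "fluffy" are different terms, so the document satisfies the OR query but not the AND query.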
+ +[discrete] +[[esql-search-tutorial-minimum-terms]] +==== Specify a minimum number of terms to match + +Sometimes requiring all terms is too strict, but the default OR behavior is too lenient. You can specify a minimum number of terms that must match: + +[source,esql] +---- +FROM cooking_blog +| WHERE match(title, "fluffy pancakes breakfast", {"minimum_should_match": 2}) +| LIMIT 1000 +---- + +This query searches the title field to match at least 2 of the 3 terms: "fluffy", "pancakes", or "breakfast". + +[discrete] +[[esql-search-tutorial-semantic-search]] +=== Step 4: Semantic search and hybrid search + +[discrete] +[[esql-search-tutorial-index-semantic-content]] +==== Index semantic content + +{es} allows you to semantically search for documents based on the meaning of the text, rather than just the presence of specific keywords. This is useful when you want to find documents that are conceptually similar to a given query, even if they don't contain the exact search terms. + +ES|QL supports semantic search when your mappings include fields of the <> type. This example mapping update adds a new field called `semantic_description` with the type `semantic_text`: + +[source,console] +---- +PUT /cooking_blog/_mapping +{ + "properties": { + "semantic_description": { + "type": "semantic_text" + } + } +} +---- + +Next, index a document with content into the new field: + +[source,console] +---- +POST /cooking_blog/_doc +{ + "title": "Mediterranean Quinoa Bowl", + "semantic_description": "A protein-rich bowl with quinoa, chickpeas, fresh vegetables, and herbs. 
This nutritious Mediterranean-inspired dish is easy to prepare and perfect for a quick, healthy dinner.", + "author": "Jamie Oliver", + "date": "2023-06-01", + "category": "Main Course", + "tags": ["vegetarian", "healthy", "mediterranean", "quinoa"], + "rating": 4.7 +} +---- +// TEST[skip:uses ML] + +[discrete] +[[esql-search-tutorial-perform-semantic-search]] +==== Perform semantic search + +Once the document has been processed by the underlying model running on the inference endpoint, you can perform semantic searches. Here's an example natural language query against the `semantic_description` field: + +[source,esql] +---- +FROM cooking_blog +| WHERE semantic_description:"What are some easy to prepare but nutritious plant-based meals?" +| LIMIT 5 +---- + +[TIP] +===== +Follow this <> if you'd like to test out the semantic search workflow against a large dataset. +===== + +[discrete] +[[esql-search-tutorial-perform-hybrid-search]] +==== Perform hybrid search + +You can combine full-text and semantic queries. In this example we combine full-text and semantic search with custom weights: + +[source,esql] +---- +FROM cooking_blog METADATA _score +| WHERE match(semantic_description, "easy to prepare vegetarian meals", { "boost": 0.75 }) + OR match(tags, "vegetarian", { "boost": 0.25 }) +| SORT _score DESC +| LIMIT 5 +---- + +[discrete] +[[esql-search-tutorial-search-across-fields]] +=== Step 5: Search across multiple fields at once + +When users enter a search query, they often don't know (or care) whether their search terms appear in a specific field. {esql} provides ways to search across multiple fields simultaneously: + +[source,esql] +---- +FROM cooking_blog +| WHERE title:"vegetarian curry" OR description:"vegetarian curry" OR tags:"vegetarian curry" +| LIMIT 1000 +---- + +This query searches for "vegetarian curry" across the title, description, and tags fields. Each field is treated with equal importance. 
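If the list of fields is dynamic, the same OR pattern can be generated in client code. The following is a minimal Python sketch, not an official client API: `multi_field_query` is a hypothetical helper, and the quote escaping assumes ES|QL's backslash escaping for double-quoted strings.

```python
import json

def multi_field_query(index, fields, terms, limit=1000):
    """Build an ES|QL query that ORs the same match-operator search over several fields."""
    # Assumption: backslashes and double quotes are escaped with a backslash.
    escaped = terms.replace("\\", "\\\\").replace('"', '\\"')
    where = " OR ".join(f'{field}:"{escaped}"' for field in fields)
    return f"FROM {index}\n| WHERE {where}\n| LIMIT {limit}"

query = multi_field_query(
    "cooking_blog", ["title", "description", "tags"], "vegetarian curry"
)
# JSON body that could be sent to POST /_query?format=txt
body = json.dumps({"query": query})
print(query)
```

The generated query is the same as the hand-written one above, so each field still contributes with equal importance.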
+ +However, in many cases, matches in certain fields (like the title) might be more relevant than others. We can adjust the importance of each field using scoring: + +[source,esql] +---- +FROM cooking_blog METADATA _score <1> +| WHERE match(title, "vegetarian curry", {"boost": 2.0}) <2> + OR match(description, "vegetarian curry") + OR match(tags, "vegetarian curry") +| KEEP title, description, tags, _score <3> +| SORT _score DESC <4> +| LIMIT 1000 +---- +<1> Request `_score` metadata for relevance-based results +<2> Title matches are twice as important +<3> Include relevance score in results +<4> You must explicitly sort by `_score` to see relevance-based results + +[TIP] +===== +When working with relevance scoring in ES|QL, it's important to understand `_score`. If you don't include `METADATA _score` in your query, you won't see relevance scores in your results. This means you won't be able to sort by relevance or filter based on relevance scores. + +When you include `METADATA _score`, search functions included in `WHERE` conditions contribute to the relevance score. Filtering operations (like range conditions and exact matches) don't affect the score. + +If you want the most relevant results first, you must sort by `_score` explicitly, using `SORT _score DESC`. +===== + +[discrete] +[[esql-search-tutorial-filter-exact-matches]] +=== Step 6: Filter and find exact matches + +Filtering allows you to narrow down your search results based on exact criteria. Unlike full-text searches, filters are binary (yes/no) and do not affect the relevance score. Filters execute faster than queries because excluded results don't need to be scored. + +[source,esql] +---- +FROM cooking_blog +| WHERE category.keyword == "Breakfast" <1> +| KEEP title, author, rating, tags +| SORT rating DESC +| LIMIT 1000 +---- +<1> Exact match using keyword field (case-sensitive) + +Note the use of `category.keyword` here. 
This refers to the <> multi-field of the `category` field, ensuring an exact, case-sensitive match. + +[discrete] +[[esql-search-tutorial-date-range]] +==== Search for posts within a date range + +Often users want to find content published within a specific time frame: + +[source,esql] +---- +FROM cooking_blog +| WHERE date >= "2023-05-01" AND date <= "2023-05-31" <1> +| KEEP title, author, date, rating +| LIMIT 1000 +---- +<1> Inclusive date range filter + +[discrete] +[[esql-search-tutorial-exact-matches]] +==== Find exact matches + +Sometimes users want to search for exact terms to eliminate ambiguity in their search results: + +[source,esql] +---- +FROM cooking_blog +| WHERE author.keyword == "Maria Rodriguez" <1> +| KEEP title, author, rating, tags +| SORT rating DESC +| LIMIT 1000 +---- +<1> Exact match on author + +Like the `term` query in Query DSL, this has zero flexibility and is case-sensitive. + +[discrete] +[[esql-search-tutorial-combine-criteria]] +=== Step 7: Combine multiple search criteria + +Complex searches often require combining multiple search criteria: + +[source,esql] +---- +FROM cooking_blog METADATA _score +| WHERE rating >= 4.5 <1> + AND NOT category.keyword == "Dessert" <2> + AND (title:"curry spicy" OR description:"curry spicy") <3> +| SORT _score DESC +| KEEP title, author, rating, tags, description +| LIMIT 1000 +---- +<1> Numerical filter +<2> Exclusion filter +<3> Full-text search in multiple fields + +[discrete] +[[esql-search-tutorial-relevance-scoring]] +==== Combine relevance scoring with custom criteria + +For more complex relevance scoring with combined criteria, you can use the `EVAL` command to calculate custom scores: + +[source,esql] +---- +FROM cooking_blog METADATA _score +| WHERE NOT category.keyword == "Dessert" +| EVAL tags_concat = MV_CONCAT(tags.keyword, ",") <1> +| WHERE tags_concat LIKE "*vegetarian*" AND rating >= 4.5 <2> +| WHERE match(title, "curry spicy", {"boost": 2.0}) OR match(description, "curry spicy") 
<3> +| EVAL category_boost = CASE(category.keyword == "Main Course", 1.0, 0.0) <4> +| EVAL date_boost = CASE(DATE_DIFF("month", date, NOW()) <= 1, 0.5, 0.0) <5> +| EVAL custom_score = _score + category_boost + date_boost <6> +| WHERE custom_score > 0 <7> +| SORT custom_score DESC +| LIMIT 1000 +---- +<1> Convert multi-value field to string +<2> Wildcard pattern matching +<3> Uses full text functions, will update _score metadata field +<4> Conditional boost +<5> Boost recent content +<6> Combine scores +<7> Filter based on custom score + +[discrete] +[[esql-search-tutorial-learn-more]] +=== Learn more + +[discrete] +[[esql-search-tutorial-documentation]] +==== Documentation + +This tutorial introduced the basics of search and filtering in {esql}. Building a real-world search experience requires understanding many more advanced concepts and techniques. Here are some resources once you're ready to dive deeper: + +- <>: Learn about all your options for search use cases with {esql}. +- <>: Explore the full list of search functions available in {esql}. +- <>: Understand your various options for semantic search in Elasticsearch. + - <>: Learn how to use the `semantic_text` field type for semantic search. This is the recommended approach for most users looking to perform semantic search in {es}, because it abstracts away the complexity of setting up inference endpoints and models. 
+ +[discrete] +[[esql-search-tutorial-blog-posts]] +==== Related blog posts + +// TODO [[uncomment once blog is live]] - https://www.elastic.co/blog/esql-you-know-for-search-scoring-semantic-search[Introducing scoring and semantic search in {esql}]: +- https://www.elastic.co/search-labs/blog/filtering-in-esql-full-text-search-match-qstr[Introducing full text filtering in ES|QL] \ No newline at end of file diff --git a/docs/reference/quickstart/index.asciidoc b/docs/reference/quickstart/index.asciidoc index 330582956c457..ccb7968accc0f 100644 --- a/docs/reference/quickstart/index.asciidoc +++ b/docs/reference/quickstart/index.asciidoc @@ -25,7 +25,8 @@ Alternatively, refer to our <>. Learn about indices, documents, and mappings, and perform a basic search using the Query DSL. * <>. Learn about different options for querying data, including full-text search and filtering, using the Query DSL. -* <>: Learn how to query and aggregate your data using {esql}. +* <>: Learn how to query and aggregate your data using {esql}. +* <>: Learn how to use {esql} for search use cases, including full-text search, semantic search, and hybrid search. * <>. Learn how to analyze data using different types of aggregations, including metrics, buckets, and pipelines. * <>: Learn how to create embeddings for your data with `semantic_text` and query using the `semantic` query. ** <>: Learn how to combine semantic search with full-text search. 
@@ -42,4 +43,5 @@ If you're interested in using {es} with Python, check out Elastic Search Labs: include::getting-started.asciidoc[] include::full-text-filtering-tutorial.asciidoc[] +include::esql-search-tutorial.asciidoc[] include::aggs-tutorial.asciidoc[] diff --git a/docs/reference/search/search-your-data/semantic-search-semantic-text.asciidoc b/docs/reference/search/search-your-data/semantic-search-semantic-text.asciidoc index 1caa03c66d699..07bb90889716d 100644 --- a/docs/reference/search/search-your-data/semantic-search-semantic-text.asciidoc +++ b/docs/reference/search/search-your-data/semantic-search-semantic-text.asciidoc @@ -128,7 +128,12 @@ POST _tasks//_cancel [[semantic-text-semantic-search]] ==== Semantic search -After the data set has been enriched with the embeddings, you can query the data using semantic search. +After the data set has been enriched with the embeddings, you can query the data using semantic search. You can use Query DSL or {esql} syntax. + +[discrete] +[[semantic-text-semantic-search-query-dsl]] +===== Query DSL syntax + Provide the `semantic_text` field name and the query text in a `semantic` query type. The {infer} endpoint used to generate the embeddings for the `semantic_text` field will be used to process the query text. @@ -151,6 +156,35 @@ GET semantic-embeddings/_search As a result, you receive the top 10 documents that are closest in meaning to the query from the `semantic-embedding` index. +[discrete] +[[semantic-text-semantic-search-esql]] +===== {esql} syntax + +The ES|QL approach uses the <>, which automatically detects the `semantic_text` field and performs the search on it. The query uses `METADATA _score` to sort by `_score` in descending order. + +[source,console] +---- +POST /_query?format=txt +{ + "query": """ + FROM semantic-embeddings METADATA _score <1> + | WHERE content: "How to avoid muscle soreness while running?" 
<2> +| SORT _score DESC <3> +| LIMIT 1000 <4> + """ +} +---- +// TEST[skip:uses ML] +<1> The `METADATA _score` clause is used to return the score of each document +<2> The <> is used on the `content` field; because `content` is a `semantic_text` field, this performs a semantic search rather than keyword matching +<3> Sorts by descending score to display the most relevant results first +<4> Limits the results to 1000 documents + +[TIP] +==== +Refer to <> for more information on using the {esql} language for search use cases. +==== + [discrete] [[semantic-text-further-examples]] ==== Further examples and reading diff --git a/docs/reference/search/search-your-data/semantic-text-hybrid-search b/docs/reference/search/search-your-data/semantic-text-hybrid-search index 3d27f92e767ae..23a04251c6989 100644 --- a/docs/reference/search/search-your-data/semantic-text-hybrid-search +++ b/docs/reference/search/search-your-data/semantic-text-hybrid-search @@ -112,7 +112,15 @@ POST _tasks//_cancel [[hybrid-search-perform-search]] ==== Perform hybrid search -After reindexing the data into the `semantic-embeddings` index, you can perform hybrid search by using <>. RRF is a technique that merges the rankings from both semantic and lexical queries, giving more weight to results that rank high in either search. This ensures that the final results are balanced and relevant. +After reindexing the data into the `semantic-embeddings` index, you can perform hybrid search. You can use retrievers syntax or {esql} syntax to perform the search. + +[discrete] +[[hybrid-search-retrievers-syntax]] +===== Retrievers syntax + +This approach uses the <> algorithm. RRF is a technique that merges the rankings from both semantic and lexical queries, giving more weight to results that rank high in either search. This ensures that the final results are balanced and relevant. 
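The rank-merging idea behind RRF can be illustrated in a few lines of Python. This is a sketch of the general reciprocal rank fusion formula, where each document scores the sum of `1 / (rank_constant + rank)` over the lists it appears in. It is not the {es} implementation; the document IDs are made up, and the `rank_constant` of 60 is a commonly used value that is assumed here.

```python
def rrf(rankings, rank_constant=60):
    """Fuse ranked lists of document IDs with reciprocal rank fusion."""
    scores = {}
    for ranking in rankings:
        # Ranks are 1-based: the top result of each list contributes the most.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank_constant + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

lexical = ["doc_a", "doc_b", "doc_c"]   # hypothetical lexical ranking
semantic = ["doc_c", "doc_a", "doc_d"]  # hypothetical semantic ranking
fused = rrf([lexical, semantic])
print(fused)
```

Documents that appear near the top of both lists (`doc_a`, `doc_c`) end up ahead of documents that appear in only one list, which is the balancing effect described above.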
+ +To extract the most relevant fragments from the original text and query, you can use the <>: [source,console] ------------------------------------------------------------ @@ -215,3 +223,32 @@ After performing the hybrid search, the query will return the top 10 documents t } ------------------------------------------------------------ // NOTCONSOLE + +[discrete] +[[hybrid-search-esql-syntax]] +===== {esql} syntax + +The <> approach uses a combination of the match operator `:` and the match function `match()` to perform hybrid search. + +[source,console] +---- +POST /_query?format=txt +{ + "query": """ + FROM semantic-embeddings METADATA _score <1> + | WHERE content: "muscle soreness running?" OR match(semantic_text, "How to avoid muscle soreness while running?", { "boost": 0.75 }) <2> <3> + | SORT _score DESC <4> + | LIMIT 1000 + """ +} +---- +// TEST[skip:uses ML] +<1> The `METADATA _score` clause is used to return the score of each document +<2> The <> is used on the `content` field for standard keyword matching +<3> Semantic search using the `match()` function on the `semantic_text` field with a boost of `0.75` +<4> Sorts by descending score and limits to 1000 results + +[TIP] +==== +Refer to <> for more information on using the {esql} language for search use cases. +==== \ No newline at end of file