From bef5d578665b434f8912ee50298c37c1fb8fd21e Mon Sep 17 00:00:00 2001 From: lcawl Date: Wed, 25 Jun 2025 17:35:36 -0700 Subject: [PATCH 1/9] Add vector search getting started guide --- ...erless-elasticsearch-get-started-vector.md | 160 ++++++++++++++++++ solutions/toc.yml | 2 + 2 files changed, 162 insertions(+) create mode 100644 solutions/search/serverless-elasticsearch-get-started-vector.md diff --git a/solutions/search/serverless-elasticsearch-get-started-vector.md b/solutions/search/serverless-elasticsearch-get-started-vector.md new file mode 100644 index 0000000000..4181a8dbec --- /dev/null +++ b/solutions/search/serverless-elasticsearch-get-started-vector.md @@ -0,0 +1,160 @@ +--- +navigation_title: Vector search +description: An introduction to vectors and knn search in Elasticsearch. +applies_to: + serverless: +products: + - id: cloud-serverless +--- +# Get started with vector search in {{es-serverless}} + +{{es}} enables you to generate mathematical representations of your content called _embeddings_ or _vectors_. +There are two types of representation (_dense_ and _sparse_), which are suited to different types of queries and use cases (for example, finding similar images and content or storing expanded terms and weights). +In this introduction to vector search, you'll store and search for [dense vectors](/solutions/search/vector/dense-vector.md). +The primary use case for dense vectors is to find pieces of content with similar meanings by using mathematical functions, in this case an [approximate k-nearest neighbour](/solutions/search/vector/knn.md)(kNN) search. + +To learn more about which type of vector is appropriate for your use case, check out [](/docs/solutions/search/vector.md). +For an overview of the differences between semantic search and vector search, go to [](/solutions/search/ai-search/ai-search.md). + +% TBD: Is "text embedding" interchangeable with "vector embedding"? 
+ +To try out vector search, [create an {{es-serverless}} project](/solutions/search/serverless-elasticsearch-get-started.md#elasticsearch-get-started-create-project) that is optimized for vectors. + +% TBD can you do vector search in the other project options too? + +## Add data + + + +There are some simple data sets that you can use for learning purposes. +For example, if you follow the [guided index flow](/solutions/search/serverless-elasticsearch-get-started.md#elasticsearch-follow-guided-index-flow), you can choose the vector search option. +Follow the instructions to install an {{es}} client and define field mappings. +Alternatively, try out the API requests in the [Console](/explore-analyze/query-filter/tools/console.md): + +```console +PUT /vector-index/_mapping +{ + "properties": { + "vector": { + "type": "dense_vector", + "dims": 3 + }, + "text": { + "type": "text" + } + } +} +``` + +This example defines two fields: a three-dimensional [dense_vector field](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md) and a text field. + +Next, use the Elasticsearch bulk API to ingest an array of documents into the index. +For example: + +```console +POST /_bulk?pretty +{ "index": { "_index": "vector-index" } } +{"vector":[5.936,3.083,5.087],"text":"Yellowstone National Park is one of the largest national parks in the United States. It ranges from the Wyoming to Montana and Idaho, and contains an area of 2,219,791 acress across three different states. Its most famous for hosting the geyser Old Faithful and is centered on the Yellowstone Caldera, the largest super volcano on the American continent. Yellowstone is host to hundreds of species of animal, many of which are endangered or threatened. Most notably, it contains free-ranging herds of bison and elk, alongside bears, cougars and wolves. 
The national park receives over 4.5 million visitors annually and is a UNESCO World Heritage Site."} +{ "index": { "_index": "vector-index" } } +{"vector":[4.938,7.78,3.88],"text":"Yosemite National Park is a United States National Park, covering over 750,000 acres of land in California. A UNESCO World Heritage Site, the park is best known for its granite cliffs, waterfalls and giant sequoia trees. Yosemite hosts over four million visitors in most years, with a peak of five million visitors in 2016. The park is home to a diverse range of wildlife, including mule deer, black bears, and the endangered Sierra Nevada bighorn sheep. The park has 1,200 square miles of wilderness, and is a popular destination for rock climbers, with over 3,000 feet of vertical granite to climb. Its most famous and cliff is the El Capitan, a 3,000 feet monolith along its tallest face."} +{ "index": { "_index": "vector-index" } } +{"vector":[9.863,8.919,2.368],"text":"Rocky Mountain National Park is one of the most popular national parks in the United States. It receives over 4.5 million visitors annually, and is known for its mountainous terrain, including Longs Peak, which is the highest peak in the park. The park is home to a variety of wildlife, including elk, mule deer, moose, and bighorn sheep. The park is also home to a variety of ecosystems, including montane, subalpine, and alpine tundra. The park is a popular destination for hiking, camping, and wildlife viewing, and is a UNESCO World Heritage Site."} + +``` + +In this simple example, the vectors are provided in each document. +In a real-world scenario, you could generate the vectors as you added data by using an ingest pipeline with an inference processor. + +## Test a vector search query + +Now try a query to get the documents that are closest to a vector. 
+For example, use a knn query with a vector `[2,6,0]`: + +```console +GET vector-search/_search +{ + "knn": { + "field": "vector", + "k": 10, + "num_candidates": 100, + "query_vector": [2,6,0] + } +} +``` + +The search results are sorted by relevance score, which measures how well each document matches the query. +For example: + +```json +{ + "took": 9, + "timed_out": false, + "_shards": { + "total": 3, + "successful": 3, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 3, + "relation": "eq" + }, + "max_score": 0.95106244, + "hits": [ + { + "_index": "search-05ro", + "_id": "QZqVqZcBabT17nUqiiKf", + "_score": 0.95106244, + "_source": { + "vector": [ + 4.938, + 7.78, + 3.88 + ], + "text": "Yosemite National Park is a United States National Park, covering over 750,000 acres of land in California. A UNESCO World Heritage Site, the park is best known for its granite cliffs, waterfalls and giant sequoia trees. Yosemite hosts over four million visitors in most years, with a peak of five million visitors in 2016. The park is home to a diverse range of wildlife, including mule deer, black bears, and the endangered Sierra Nevada bighorn sheep. The park has 1,200 square miles of wilderness, and is a popular destination for rock climbers, with over 3,000 feet of vertical granite to climb. Its most famous and cliff is the El Capitan, a 3,000 feet monolith along its tallest face." + } + }, + ... +``` + + + +## Next steps + +Thanks for taking the time to try out vector search in {{es-serverless}}. +For another dense vector example, check out [](/solutions/search/vector/bring-own-vectors.md). +For an example of using pipelines to generate text embeddings, check out [](/solutions/search/vector/dense-versus-sparse-ingest-pipelines.md). +To learn about more options, such as semantic and keyword search, go to [](/solutions/search/search-approaches.md). 
+ + diff --git a/solutions/toc.yml b/solutions/toc.yml index 1da35be459..4b3e6ea3ec 100644 --- a/solutions/toc.yml +++ b/solutions/toc.yml @@ -7,6 +7,8 @@ toc: children: - file: search/run-elasticsearch-locally.md - file: search/serverless-elasticsearch-get-started.md + children: + - file: search/serverless-elasticsearch-get-started-vector.md - file: search/search-connection-details.md - file: search/api-quickstarts.md children: From c4260eae9c4c5b77ec5fcd9b3e38910e249c97bf Mon Sep 17 00:00:00 2001 From: Lisa Cawley Date: Thu, 26 Jun 2025 10:03:49 -0700 Subject: [PATCH 2/9] Fix broken url --- solutions/search/serverless-elasticsearch-get-started-vector.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/solutions/search/serverless-elasticsearch-get-started-vector.md b/solutions/search/serverless-elasticsearch-get-started-vector.md index 4181a8dbec..d82d7d48fb 100644 --- a/solutions/search/serverless-elasticsearch-get-started-vector.md +++ b/solutions/search/serverless-elasticsearch-get-started-vector.md @@ -13,7 +13,7 @@ There are two types of representation (_dense_ and _sparse_), which are suited t In this introduction to vector search, you'll store and search for [dense vectors](/solutions/search/vector/dense-vector.md). The primary use case for dense vectors is to find pieces of content with similar meanings by using mathematical functions, in this case an [approximate k-nearest neighbour](/solutions/search/vector/knn.md)(kNN) search. -To learn more about which type of vector is appropriate for your use case, check out [](/docs/solutions/search/vector.md). +To learn more about which type of vector is appropriate for your use case, check out [](/solutions/search/vector.md). For an overview of the differences between semantic search and vector search, go to [](/solutions/search/ai-search/ai-search.md). % TBD: Is "text embedding" interchangeable with "vector embedding"? 
From 7dbc2196510dcdbb06c4956a77a21887845a5811 Mon Sep 17 00:00:00 2001 From: lcawl Date: Wed, 9 Jul 2025 18:43:32 -0700 Subject: [PATCH 3/9] Rebase to align with semantic search quickstart --- solutions/search/get-started/quickstarts.md | 1 + solutions/search/get-started/vector-search.md | 232 ++++++++++++++++++ ...erless-elasticsearch-get-started-vector.md | 160 ------------ solutions/toc.yml | 3 +- 4 files changed, 234 insertions(+), 162 deletions(-) create mode 100644 solutions/search/get-started/vector-search.md delete mode 100644 solutions/search/serverless-elasticsearch-get-started-vector.md diff --git a/solutions/search/get-started/quickstarts.md b/solutions/search/get-started/quickstarts.md index 0c99feb870..944ccf742f 100644 --- a/solutions/search/get-started/quickstarts.md +++ b/solutions/search/get-started/quickstarts.md @@ -16,6 +16,7 @@ Each quickstart provides: Follow the steps in these guides to get started quickly: - [](/solutions/search/get-started/semantic-search.md) +- [](/solutions/search/get-started/vector-search.md) For more advanced API examples, check out [](/solutions/search/api-quickstarts.md). diff --git a/solutions/search/get-started/vector-search.md b/solutions/search/get-started/vector-search.md new file mode 100644 index 0000000000..2703ac5b91 --- /dev/null +++ b/solutions/search/get-started/vector-search.md @@ -0,0 +1,232 @@ +--- +navigation_title: Vector search +description: An introduction to vectors and knn search in Elasticsearch. +applies_to: + serverless: all + stack: all +products: + - id: elasticsearch +--- +# Get started with vector search + +{{es}} enables you to generate mathematical representations of your content called _embeddings_ or _vectors_. +There are two types of representation (_dense_ and _sparse_), which are suited to different types of queries and use cases (for example, finding similar images and content or storing expanded terms and weights). 
+In this introduction to vector search, you'll store and search for [dense vectors](/solutions/search/vector/dense-vector.md). +The primary use case for dense vectors is to find pieces of content with similar meanings by using mathematical functions, in this case an [approximate k-nearest neighbour](/solutions/search/vector/knn.md) (kNN) search. + +To learn more about which type of vector is appropriate for your use case, check out [](/solutions/search/vector.md). +For an overview of the differences between semantic search and vector search, go to [](/solutions/search/ai-search/ai-search.md). + + + +## Prerequisites + +- If you're using [{{es-serverless}}](/solutions/search/serverless-elasticsearch-get-started.md), create a project that is optimized for vectors. To add the sample data, you must have a `developer` or `admin` predefined role or an equivalent custom role. +- If you're using [{{ech}}](/deploy-manage/deploy/elastic-cloud/cloud-hosted.md) or [running {{es}} locally](/solutions/search/run-elasticsearch-locally.md), start {{es}} and {{kib}}. To add the sample data, log in with a user that has the `superuser` built-in role. + +To learn about role-based access control, check out [](/deploy-manage/users-roles/cluster-or-deployment-auth/user-roles.md). + + + +## Create a vector database + + + + +:::::{stepper} +::::{step} Create an index + +An index is a collection of documents uniquely identified by a name or an alias. +You can follow the guided index workflow: + +- If you're using {{es-serverless}}, go to **{{es}} > Home**, select the vector search workflow, and **Create a vector optimized index**. +- If you're using {{ech}} or running {{es}} locally, go to **{{es}} > Home** and click **Create API index**. Select the vector search workflow. + +When you complete the workflow, you will have sample data and can skip to the steps related to exploring and searching it. 
+Alternatively, run the following API request in [Console](/explore-analyze/query-filter/tools/console.md): + +```console +PUT /vector-index +``` + +:::{tip} +For an introduction to the concept of indices, check out [](/manage-data/data-store/index-basics.md). +::: +:::: +::::{step} Create a dense vector field mapping + +Each index has mappings that define how data is stored and indexed, like a schema in a relational database. + +The following example defines two fields: a three-dimensional [dense_vector field](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md) and a text field. + +```console +PUT /vector-index/_mapping +{ + "properties": { + "vector": { + "type": "dense_vector", + "dims": 3 + }, + "content": { + "type": "text" + } + } +} +``` + +For a deeper dive, check out [Mapping embeddings to Elasticsearch field types: semantic_text, dense_vector, sparse_vector](https://www.elastic.co/search-labs/blog/mapping-embeddings-to-elasticsearch-field-types). +:::: +:::: +::::{step} Add documents + +You can use the Elasticsearch bulk API to ingest an array of documents: + +```console +POST /_bulk?pretty +{ "index": { "_index": "vector-index" } } +{"vector":[5.936,3.083,5.087],"content":"Yellowstone National Park is one of the largest national parks in the United States. It ranges from the Wyoming to Montana and Idaho, and contains an area of 2,219,791 acress across three different states. Its most famous for hosting the geyser Old Faithful and is centered on the Yellowstone Caldera, the largest super volcano on the American continent. Yellowstone is host to hundreds of species of animal, many of which are endangered or threatened. Most notably, it contains free-ranging herds of bison and elk, alongside bears, cougars and wolves. 
The national park receives over 4.5 million visitors annually and is a UNESCO World Heritage Site."} +{ "index": { "_index": "vector-index" } } +{"vector":[4.938,7.78,3.88],"content":"Yosemite National Park is a United States National Park, covering over 750,000 acres of land in California. A UNESCO World Heritage Site, the park is best known for its granite cliffs, waterfalls and giant sequoia trees. Yosemite hosts over four million visitors in most years, with a peak of five million visitors in 2016. The park is home to a diverse range of wildlife, including mule deer, black bears, and the endangered Sierra Nevada bighorn sheep. The park has 1,200 square miles of wilderness, and is a popular destination for rock climbers, with over 3,000 feet of vertical granite to climb. Its most famous and cliff is the El Capitan, a 3,000 feet monolith along its tallest face."} +{ "index": { "_index": "vector-index" } } +{"vector":[9.863,8.919,2.368],"content":"Rocky Mountain National Park is one of the most popular national parks in the United States. It receives over 4.5 million visitors annually, and is known for its mountainous terrain, including Longs Peak, which is the highest peak in the park. The park is home to a variety of wildlife, including elk, mule deer, moose, and bighorn sheep. The park is also home to a variety of ecosystems, including montane, subalpine, and alpine tundra. The park is a popular destination for hiking, camping, and wildlife viewing, and is a UNESCO World Heritage Site."} +``` + +In this example, the vectors are provided in each document. +In a real-world scenario, you could generate the vectors as you added data by using an ingest pipeline with an inference processor. +:::: + +## Test vector search + + + + +Now try a query to get the documents that are closest to a vector. 
+For example, use a knn query with a vector `[2,6,0]`: + +```console +GET vector-search/_search +{ + "knn": { + "field": "vector", + "k": 10, + "num_candidates": 100, + "query_vector": [2,6,0] + } +} +``` + +The search results are sorted by relevance score, which measures how well each document matches the query. +For example: + +```json +{ + "took": 9, + "timed_out": false, + "_shards": { + "total": 3, + "successful": 3, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 3, + "relation": "eq" + }, + "max_score": 0.95106244, + "hits": [ + { + "_index": "search-05ro", + "_id": "QZqVqZcBabT17nUqiiKf", + "_score": 0.95106244, + "_source": { + "vector": [ + 4.938, + 7.78, + 3.88 + ], + "content": "Yosemite National Park is a United States National Park, covering over 750,000 acres of land in California. A UNESCO World Heritage Site, the park is best known for its granite cliffs, waterfalls and giant sequoia trees. Yosemite hosts over four million visitors in most years, with a peak of five million visitors in 2016. The park is home to a diverse range of wildlife, including mule deer, black bears, and the endangered Sierra Nevada bighorn sheep. The park has 1,200 square miles of wilderness, and is a popular destination for rock climbers, with over 3,000 feet of vertical granite to climb. Its most famous and cliff is the El Capitan, a 3,000 feet monolith along its tallest face." + } + }, + ... +``` + + + + +## Next steps + +Thanks for taking the time to try out vector search. +For another dense vector example, check out [](/solutions/search/vector/bring-own-vectors.md). +For an example of using pipelines to generate text embeddings, check out [](/solutions/search/vector/dense-versus-sparse-ingest-pipelines.md). + +To learn about more options, such as semantic and keyword search, go to [](/solutions/search/search-approaches.md). +For a summary of the AI-powered search use cases, go to [](/solutions/search/ai-search/ai-search.md). 
+ + diff --git a/solutions/search/serverless-elasticsearch-get-started-vector.md b/solutions/search/serverless-elasticsearch-get-started-vector.md deleted file mode 100644 index d82d7d48fb..0000000000 --- a/solutions/search/serverless-elasticsearch-get-started-vector.md +++ /dev/null @@ -1,160 +0,0 @@ ---- -navigation_title: Vector search -description: An introduction to vectors and knn search in Elasticsearch. -applies_to: - serverless: -products: - - id: cloud-serverless ---- -# Get started with vector search in {{es-serverless}} - -{{es}} enables you to generate mathematical representations of your content called _embeddings_ or _vectors_. -There are two types of representation (_dense_ and _sparse_), which are suited to different types of queries and use cases (for example, finding similar images and content or storing expanded terms and weights). -In this introduction to vector search, you'll store and search for [dense vectors](/solutions/search/vector/dense-vector.md). -The primary use case for dense vectors is to find pieces of content with similar meanings by using mathematical functions, in this case an [approximate k-nearest neighbour](/solutions/search/vector/knn.md)(kNN) search. - -To learn more about which type of vector is appropriate for your use case, check out [](/solutions/search/vector.md). -For an overview of the differences between semantic search and vector search, go to [](/solutions/search/ai-search/ai-search.md). - -% TBD: Is "text embedding" interchangeable with "vector embedding"? - -To try out vector search, [create an {{es-serverless}} project](/solutions/search/serverless-elasticsearch-get-started.md#elasticsearch-get-started-create-project) that is optimized for vectors. - -% TBD can you do vector search in the other project options too? - -## Add data - - - -There are some simple data sets that you can use for learning purposes. 
-For example, if you follow the [guided index flow](/solutions/search/serverless-elasticsearch-get-started.md#elasticsearch-follow-guided-index-flow), you can choose the vector search option. -Follow the instructions to install an {{es}} client and define field mappings. -Alternatively, try out the API requests in the [Console](/explore-analyze/query-filter/tools/console.md): - -```console -PUT /vector-index/_mapping -{ - "properties": { - "vector": { - "type": "dense_vector", - "dims": 3 - }, - "text": { - "type": "text" - } - } -} -``` - -This example defines two fields: a three-dimensional [dense_vector field](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md) and a text field. - -Next, use the Elasticsearch bulk API to ingest an array of documents into the index. -For example: - -```console -POST /_bulk?pretty -{ "index": { "_index": "vector-index" } } -{"vector":[5.936,3.083,5.087],"text":"Yellowstone National Park is one of the largest national parks in the United States. It ranges from the Wyoming to Montana and Idaho, and contains an area of 2,219,791 acress across three different states. Its most famous for hosting the geyser Old Faithful and is centered on the Yellowstone Caldera, the largest super volcano on the American continent. Yellowstone is host to hundreds of species of animal, many of which are endangered or threatened. Most notably, it contains free-ranging herds of bison and elk, alongside bears, cougars and wolves. The national park receives over 4.5 million visitors annually and is a UNESCO World Heritage Site."} -{ "index": { "_index": "vector-index" } } -{"vector":[4.938,7.78,3.88],"text":"Yosemite National Park is a United States National Park, covering over 750,000 acres of land in California. A UNESCO World Heritage Site, the park is best known for its granite cliffs, waterfalls and giant sequoia trees. Yosemite hosts over four million visitors in most years, with a peak of five million visitors in 2016. 
The park is home to a diverse range of wildlife, including mule deer, black bears, and the endangered Sierra Nevada bighorn sheep. The park has 1,200 square miles of wilderness, and is a popular destination for rock climbers, with over 3,000 feet of vertical granite to climb. Its most famous and cliff is the El Capitan, a 3,000 feet monolith along its tallest face."} -{ "index": { "_index": "vector-index" } } -{"vector":[9.863,8.919,2.368],"text":"Rocky Mountain National Park is one of the most popular national parks in the United States. It receives over 4.5 million visitors annually, and is known for its mountainous terrain, including Longs Peak, which is the highest peak in the park. The park is home to a variety of wildlife, including elk, mule deer, moose, and bighorn sheep. The park is also home to a variety of ecosystems, including montane, subalpine, and alpine tundra. The park is a popular destination for hiking, camping, and wildlife viewing, and is a UNESCO World Heritage Site."} - -``` - -In this simple example, the vectors are provided in each document. -In a real-world scenario, you could generate the vectors as you added data by using an ingest pipeline with an inference processor. - -## Test a vector search query - -Now try a query to get the documents that are closest to a vector. -For example, use a knn query with a vector `[2,6,0]`: - -```console -GET vector-search/_search -{ - "knn": { - "field": "vector", - "k": 10, - "num_candidates": 100, - "query_vector": [2,6,0] - } -} -``` - -The search results are sorted by relevance score, which measures how well each document matches the query. 
-For example: - -```json -{ - "took": 9, - "timed_out": false, - "_shards": { - "total": 3, - "successful": 3, - "skipped": 0, - "failed": 0 - }, - "hits": { - "total": { - "value": 3, - "relation": "eq" - }, - "max_score": 0.95106244, - "hits": [ - { - "_index": "search-05ro", - "_id": "QZqVqZcBabT17nUqiiKf", - "_score": 0.95106244, - "_source": { - "vector": [ - 4.938, - 7.78, - 3.88 - ], - "text": "Yosemite National Park is a United States National Park, covering over 750,000 acres of land in California. A UNESCO World Heritage Site, the park is best known for its granite cliffs, waterfalls and giant sequoia trees. Yosemite hosts over four million visitors in most years, with a peak of five million visitors in 2016. The park is home to a diverse range of wildlife, including mule deer, black bears, and the endangered Sierra Nevada bighorn sheep. The park has 1,200 square miles of wilderness, and is a popular destination for rock climbers, with over 3,000 feet of vertical granite to climb. Its most famous and cliff is the El Capitan, a 3,000 feet monolith along its tallest face." - } - }, - ... -``` - - - -## Next steps - -Thanks for taking the time to try out vector search in {{es-serverless}}. -For another dense vector example, check out [](/solutions/search/vector/bring-own-vectors.md). -For an example of using pipelines to generate text embeddings, check out [](/solutions/search/vector/dense-versus-sparse-ingest-pipelines.md). -To learn about more options, such as semantic and keyword search, go to [](/solutions/search/search-approaches.md). 
- - diff --git a/solutions/toc.yml b/solutions/toc.yml index 53c17e85a3..3a9ee3bfb0 100644 --- a/solutions/toc.yml +++ b/solutions/toc.yml @@ -7,12 +7,11 @@ toc: children: - file: search/run-elasticsearch-locally.md - file: search/serverless-elasticsearch-get-started.md - children: - - file: search/serverless-elasticsearch-get-started-vector.md - file: search/search-connection-details.md - file: search/get-started/quickstarts.md children: - file: search/get-started/semantic-search.md + - file: search/get-started/vector-search.md - file: search/api-quickstarts.md children: - file: search/elasticsearch-basics-quickstart.md From f3ce8a1aa57896c6e4c18d963d7c7b5702730b8a Mon Sep 17 00:00:00 2001 From: lcawl Date: Mon, 14 Jul 2025 17:47:31 -0700 Subject: [PATCH 4/9] Remove new page, augment existing tutorial --- solutions/search/get-started/quickstarts.md | 2 +- solutions/search/get-started/vector-search.md | 232 ------------------ solutions/search/vector/bring-own-vectors.md | 98 +++++--- solutions/toc.yml | 1 - 4 files changed, 61 insertions(+), 272 deletions(-) delete mode 100644 solutions/search/get-started/vector-search.md diff --git a/solutions/search/get-started/quickstarts.md b/solutions/search/get-started/quickstarts.md index 944ccf742f..75695be286 100644 --- a/solutions/search/get-started/quickstarts.md +++ b/solutions/search/get-started/quickstarts.md @@ -16,7 +16,7 @@ Each quickstart provides: Follow the steps in these guides to get started quickly: - [](/solutions/search/get-started/semantic-search.md) -- [](/solutions/search/get-started/vector-search.md) +- [](/solutions/search/vector/bring-own-vectors.md) For more advanced API examples, check out [](/solutions/search/api-quickstarts.md). 
diff --git a/solutions/search/get-started/vector-search.md b/solutions/search/get-started/vector-search.md deleted file mode 100644 index 2703ac5b91..0000000000 --- a/solutions/search/get-started/vector-search.md +++ /dev/null @@ -1,232 +0,0 @@ ---- -navigation_title: Vector search -description: An introduction to vectors and knn search in Elasticsearch. -applies_to: - serverless: all - stack: all -products: - - id: elasticsearch ---- -# Get started with vector search - -{{es}} enables you to generate mathematical representations of your content called _embeddings_ or _vectors_. -There are two types of representation (_dense_ and _sparse_), which are suited to different types of queries and use cases (for example, finding similar images and content or storing expanded terms and weights). -In this introduction to vector search, you'll store and search for [dense vectors](/solutions/search/vector/dense-vector.md). -The primary use case for dense vectors is to find pieces of content with similar meanings by using mathematical functions, in this case an [approximate k-nearest neighbour](/solutions/search/vector/knn.md) (kNN) search. - -To learn more about which type of vector is appropriate for your use case, check out [](/solutions/search/vector.md). -For an overview of the differences between semantic search and vector search, go to [](/solutions/search/ai-search/ai-search.md). - - - -## Prerequisites - -- If you're using [{{es-serverless}}](/solutions/search/serverless-elasticsearch-get-started.md), create a project that is optimized for vectors. To add the sample data, you must have a `developer` or `admin` predefined role or an equivalent custom role. -- If you're using [{{ech}}](/deploy-manage/deploy/elastic-cloud/cloud-hosted.md) or [running {{es}} locally](/solutions/search/run-elasticsearch-locally.md), start {{es}} and {{kib}}. To add the sample data, log in with a user that has the `superuser` built-in role. 
- -To learn about role-based access control, check out [](/deploy-manage/users-roles/cluster-or-deployment-auth/user-roles.md). - - - -## Create a vector database - - - - -:::::{stepper} -::::{step} Create an index - -An index is a collection of documents uniquely identified by a name or an alias. -You can follow the guided index workflow: - -- If you're using {{es-serverless}}, go to **{{es}} > Home**, select the vector search workflow, and **Create a vector optimized index**. -- If you're using {{ech}} or running {{es}} locally, go to **{{es}} > Home** and click **Create API index**. Select the vector search workflow. - -When you complete the workflow, you will have sample data and can skip to the steps related to exploring and searching it. -Alternatively, run the following API request in [Console](/explore-analyze/query-filter/tools/console.md): - -```console -PUT /vector-index -``` - -:::{tip} -For an introduction to the concept of indices, check out [](/manage-data/data-store/index-basics.md). -::: -:::: -::::{step} Create a dense vector field mapping - -Each index has mappings that define how data is stored and indexed, like a schema in a relational database. - -The following example defines two fields: a three-dimensional [dense_vector field](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md) and a text field. - -```console -PUT /vector-index/_mapping -{ - "properties": { - "vector": { - "type": "dense_vector", - "dims": 3 - }, - "content": { - "type": "text" - } - } -} -``` - -For a deeper dive, check out [Mapping embeddings to Elasticsearch field types: semantic_text, dense_vector, sparse_vector](https://www.elastic.co/search-labs/blog/mapping-embeddings-to-elasticsearch-field-types). 
-:::: -:::: -::::{step} Add documents - -You can use the Elasticsearch bulk API to ingest an array of documents: - -```console -POST /_bulk?pretty -{ "index": { "_index": "vector-index" } } -{"vector":[5.936,3.083,5.087],"content":"Yellowstone National Park is one of the largest national parks in the United States. It ranges from the Wyoming to Montana and Idaho, and contains an area of 2,219,791 acress across three different states. Its most famous for hosting the geyser Old Faithful and is centered on the Yellowstone Caldera, the largest super volcano on the American continent. Yellowstone is host to hundreds of species of animal, many of which are endangered or threatened. Most notably, it contains free-ranging herds of bison and elk, alongside bears, cougars and wolves. The national park receives over 4.5 million visitors annually and is a UNESCO World Heritage Site."} -{ "index": { "_index": "vector-index" } } -{"vector":[4.938,7.78,3.88],"content":"Yosemite National Park is a United States National Park, covering over 750,000 acres of land in California. A UNESCO World Heritage Site, the park is best known for its granite cliffs, waterfalls and giant sequoia trees. Yosemite hosts over four million visitors in most years, with a peak of five million visitors in 2016. The park is home to a diverse range of wildlife, including mule deer, black bears, and the endangered Sierra Nevada bighorn sheep. The park has 1,200 square miles of wilderness, and is a popular destination for rock climbers, with over 3,000 feet of vertical granite to climb. Its most famous and cliff is the El Capitan, a 3,000 feet monolith along its tallest face."} -{ "index": { "_index": "vector-index" } } -{"vector":[9.863,8.919,2.368],"content":"Rocky Mountain National Park is one of the most popular national parks in the United States. It receives over 4.5 million visitors annually, and is known for its mountainous terrain, including Longs Peak, which is the highest peak in the park. 
The park is home to a variety of wildlife, including elk, mule deer, moose, and bighorn sheep. The park is also home to a variety of ecosystems, including montane, subalpine, and alpine tundra. The park is a popular destination for hiking, camping, and wildlife viewing, and is a UNESCO World Heritage Site."} -``` - -In this example, the vectors are provided in each document. -In a real-world scenario, you could generate the vectors as you added data by using an ingest pipeline with an inference processor. -:::: - -## Test vector search - - - - -Now try a query to get the documents that are closest to a vector. -For example, use a knn query with a vector `[2,6,0]`: - -```console -GET vector-search/_search -{ - "knn": { - "field": "vector", - "k": 10, - "num_candidates": 100, - "query_vector": [2,6,0] - } -} -``` - -The search results are sorted by relevance score, which measures how well each document matches the query. -For example: - -```json -{ - "took": 9, - "timed_out": false, - "_shards": { - "total": 3, - "successful": 3, - "skipped": 0, - "failed": 0 - }, - "hits": { - "total": { - "value": 3, - "relation": "eq" - }, - "max_score": 0.95106244, - "hits": [ - { - "_index": "search-05ro", - "_id": "QZqVqZcBabT17nUqiiKf", - "_score": 0.95106244, - "_source": { - "vector": [ - 4.938, - 7.78, - 3.88 - ], - "content": "Yosemite National Park is a United States National Park, covering over 750,000 acres of land in California. A UNESCO World Heritage Site, the park is best known for its granite cliffs, waterfalls and giant sequoia trees. Yosemite hosts over four million visitors in most years, with a peak of five million visitors in 2016. The park is home to a diverse range of wildlife, including mule deer, black bears, and the endangered Sierra Nevada bighorn sheep. The park has 1,200 square miles of wilderness, and is a popular destination for rock climbers, with over 3,000 feet of vertical granite to climb. 
Its most famous and cliff is the El Capitan, a 3,000 feet monolith along its tallest face." - } - }, - ... -``` - - - - -## Next steps - -Thanks for taking the time to try out vector search. -For another dense vector example, check out [](/solutions/search/vector/bring-own-vectors.md). -For an example of using pipelines to generate text embeddings, check out [](/solutions/search/vector/dense-versus-sparse-ingest-pipelines.md). - -To learn about more options, such as semantic and keyword search, go to [](/solutions/search/search-approaches.md). -For a summary of the AI-powered search use cases, go to [](/solutions/search/ai-search/ai-search.md). - - diff --git a/solutions/search/vector/bring-own-vectors.md b/solutions/search/vector/bring-own-vectors.md index b6302884e5..cde58a027d 100644 --- a/solutions/search/vector/bring-own-vectors.md +++ b/solutions/search/vector/bring-own-vectors.md @@ -7,36 +7,53 @@ applies_to: serverless: products: - id: elasticsearch +description: An introduction to vectors and knn search in Elasticsearch. --- # Bring your own dense vectors [bring-your-own-vectors] +{{es}} enables you store and search mathematical representations of your content called _embeddings_ or _vectors_, which help machines understand and process your data more effectively. +There are two types of representation (_dense_ and _sparse_), which are suited to different types of queries and use cases (for example, finding similar images and content or storing expanded terms and weights). -This tutorial demonstrates how to index documents that already have dense vector embeddings into {{es}}. You’ll also learn the syntax for searching these documents using a `knn` query. +In this introduction to vector search, you'll store and search for dense vectors. +In particular, you'll index documents that already have dense vector embeddings into {{es}}. 
+You'll also learn the syntax for searching these documents using a [k-nearest neighbour](/solutions/search/vector/knn.md) (kNN) query. -You’ll find links at the end of this tutorial for more information about deploying a text embedding model in {{es}}, so you can generate embeddings for queries on the fly. +To learn more about which type of vector is appropriate for your use case, check out [](/solutions/search/vector.md). +For an overview of the differences between semantic search and vector search, go to [](/solutions/search/ai-search/ai-search.md). -::::{tip} -This is an advanced use case. Refer to [Semantic search](../semantic-search.md) for an overview of your options for semantic search with {{es}}. +## Prerequisites -:::: +- If you're using [{{es-serverless}}](/solutions/search/serverless-elasticsearch-get-started.md), create a project that is optimized for vectors. To add the sample data, you must have a `developer` or `admin` predefined role or an equivalent custom role. +- If you're using [{{ech}}](/deploy-manage/deploy/elastic-cloud/cloud-hosted.md) or [running {{es}} locally](/solutions/search/run-elasticsearch-locally.md), start {{es}} and {{kib}}. The simplest method to complete the steps in this guide is to log in with a user that has the `superuser` built-in role. + +To learn about role-based access control, check out [](/deploy-manage/users-roles/cluster-or-deployment-auth/user-roles.md). +## Create a vector database -## Step 1: Create an index with `dense_vector` mapping [bring-your-own-vectors-create-index] +When you create vectors (or _vectorize_ your data), you convert complex and nuanced content (such as text, videos, images, or audio) into multidimensional numerical representations. +They must be stored in specialized data structures designed to ensure efficient similarity search and speedy vector distance calculations. 
-Each document in our simple dataset will have: +In this guide, you'll use documents that already have dense vector embeddings. +To deploy a vector embedding model in {{es}} and generate vectors while ingesting and searching your data, refer to the links in [Learn more](#bring-your-own-vectors-learn-more). -* A review: stored in a `review_text` field -* An embedding of that review: stored in a `review_vector` field +::::{tip} +This is an advanced use case that uses the `dense_vector` field type. Refer to [](/solutions/search/semantic-search.md) for an overview of your options for semantic search with {{es}}. +:::: - * The `review_vector` field is defined as a [`dense_vector`](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md) data type. +:::::{stepper} +::::{step} Create an index with dense vector field mappings +Each document in our simple data set will have: -::::{tip} -The `dense_vector` type automatically uses `int8_hnsw` quantization by default to reduce the memory footprint required when searching float vectors. Learn more about balancing performance and accuracy in [Dense vector quantization](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-quantization). +* A review: stored in a `review_text` field +* An embedding of that review: stored in a `review_vector` field, which is defined as a [`dense_vector`](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md) data type. -:::: +:::{tip} +The `dense_vector` type automatically uses `int8_hnsw` quantization by default to reduce the memory footprint required when searching float vectors. Learn more about balancing performance and accuracy in [Dense vector quantization](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-quantization). +::: +The following API request defines the `review_text` and `review_vector` fields: ```console PUT /amazon-reviews @@ -57,18 +74,17 @@ PUT /amazon-reviews } ``` -1. 
The `dims` parameter must match the length of the embedding vector. Here we’re using a simple 8-dimensional embedding for readability. If not specified, `dims` will be dynamically calculated based on the first indexed document. +1. The `dims` parameter must match the length of the embedding vector. If not specified, `dims` will be dynamically calculated based on the first indexed document. 2. The `index` parameter is set to `true` to enable the use of the `knn` query. 3. The `similarity` parameter defines the similarity function used to compare the query vector to the document vectors. `cosine` is the default similarity function for `dense_vector` fields in {{es}}. +Here we're using an 8-dimensional embedding for readability. +The vectors that neural network models work with can have several hundreds or even thousands of dimensions and simply represent a point in a multi-dimensional space. +Each vector dimension represents a feature, or a characteristic, of the unstructured data. +:::: +::::{step} Add documents with embeddings - -## Step 2: Index documents with embeddings [bring-your-own-vectors-index-documents] - - -### Index a single document [_index_a_single_document] - -First, index a single document to understand the document structure. +First, index a single document to understand the document structure: ```console PUT /amazon-reviews/_doc/1 @@ -80,13 +96,9 @@ PUT /amazon-reviews/_doc/1 1. The size of the `review_vector` array is 8, matching the `dims` count specified in the mapping. +In a production scenario, you'll want to index many documents at once using the [`_bulk` endpoint]({{es-apis}}operation/operation-bulk). - -### Bulk index multiple documents [_bulk_index_multiple_documents] - -In a production scenario, you’ll want to index many documents at once using the [`_bulk` endpoint](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-bulk). - -Here’s an example of indexing multiple documents in a single `_bulk` request. 
+Here's an example of indexing multiple documents in a single `_bulk` request: ```console POST /_bulk @@ -100,10 +112,14 @@ POST /_bulk { "review_text": "This product has ruined my life and the lives of my family and friends.", "review_vector": [0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1] } ``` +:::: +::::: -## Step 3: Search documents with embeddings [bring-your-own-vectors-search-documents] +## Test vector search [bring-your-own-vectors-search-documents] -Now you can query these document vectors using a [`knn` retriever](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-search#operation-search-body-application-json-retriever). `knn` is a type of vector search, which finds the `k` most similar documents to a query vector. Here we’re simply using a raw vector for the query text, for demonstration purposes. +Now you can query these document vectors using a [`knn` retriever]({{es-apis}}operation/operation-search#operation-search-body-application-json-retriever). +`knn` is a type of vector search, which finds the `k` most similar documents to a query vector. +Here we're simply using a raw vector for the query text, for demonstration purposes: ```console POST /amazon-reviews/_search @@ -119,23 +135,29 @@ POST /amazon-reviews/_search } ``` -1. In this simple example, we’re sending a raw vector as the query text. In a real-world scenario, you’ll need to generate vectors for queries using an embedding model. +1. In this simple example, we're sending a raw vector as the query text. In a real-world scenario, you'll need to generate vectors for queries using an embedding model. 2. The `k` parameter specifies the number of results to return. 3. The `num_candidates` parameter is optional. It limits the number of candidates returned by the search node. This can improve performance and reduce costs. 
+When you finish your tests and no longer need the sample data set, delete the index: +```console +DELETE /amazon-reviews +``` ## Learn more [bring-your-own-vectors-learn-more] -In this simple example, we’re sending a raw vector for the query text. In a real-world scenario you won’t know the query text ahead of time. You’ll need to generate query vectors, on the fly, using the same embedding model that generated the document vectors. - -For this you’ll need to deploy a text embedding model in {{es}} and use the [`query_vector_builder` parameter](elasticsearch://reference/query-languages/query-dsl/query-dsl-knn-query.md#knn-query-top-level-parameters). Alternatively, you can generate vectors client-side and send them directly with the search request. +If you want to try a similar set of steps from an {{es}} client, check out the guided index workflow: -Learn how to [use a deployed text embedding model](dense-versus-sparse-ingest-pipelines.md) for semantic search. +- If you're using Elasticsearch Serverless, go to **{{es}} > Home**, select the vector search workflow, and **Create a vector optimized index**. +- If you're using {{ech}} or a self-managed cluster, go to **Elasticsearch > Home** and click **Create API index**. Select the vector search workflow. -::::{tip} -If you’re just getting started with vector search in {{es}}, refer to [Semantic search](../semantic-search.md). - -:::: +In these simple examples, we're sending a raw vector for the query text. +In a real-world scenario you won't know the query text ahead of time. +You'll need to generate query vectors, on the fly, using the same embedding model that generated the document vectors. +For this you'll need to deploy a text embedding model in {{es}} and use the [`query_vector_builder` parameter](elasticsearch://reference/query-languages/query-dsl/query-dsl-knn-query.md#knn-query-top-level-parameters). +Alternatively, you can generate vectors client-side and send them directly with the search request. 
+For an example of using pipelines to generate text embeddings, check out [](/solutions/search/vector/dense-versus-sparse-ingest-pipelines.md). +To learn about more search options, such as semantic, full-text, and hybrid, go to [](/solutions/search/search-approaches.md). diff --git a/solutions/toc.yml b/solutions/toc.yml index 3a9ee3bfb0..918381a427 100644 --- a/solutions/toc.yml +++ b/solutions/toc.yml @@ -11,7 +11,6 @@ toc: - file: search/get-started/quickstarts.md children: - file: search/get-started/semantic-search.md - - file: search/get-started/vector-search.md - file: search/api-quickstarts.md children: - file: search/elasticsearch-basics-quickstart.md From 1acd5c8fdffd568eab8af3247b9801f0b6e7a504 Mon Sep 17 00:00:00 2001 From: lcawl Date: Mon, 14 Jul 2025 17:54:47 -0700 Subject: [PATCH 5/9] Fix quickstart link --- solutions/search/get-started/quickstarts.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/solutions/search/get-started/quickstarts.md b/solutions/search/get-started/quickstarts.md index 75695be286..364d1d6ed8 100644 --- a/solutions/search/get-started/quickstarts.md +++ b/solutions/search/get-started/quickstarts.md @@ -16,7 +16,7 @@ Each quickstart provides: Follow the steps in these guides to get started quickly: - [](/solutions/search/get-started/semantic-search.md) -- [](/solutions/search/vector/bring-your-own-vectors.md) +- [](/solutions/search/vector/bring-own-vectors.md) For more advanced API examples, check out [](/solutions/search/api-quickstarts.md). 
From 13817c1e8e7211dffa921e23908a65d21f71318a Mon Sep 17 00:00:00 2001 From: lcawl Date: Mon, 14 Jul 2025 19:49:45 -0700 Subject: [PATCH 6/9] Minor edits --- solutions/search/vector/bring-own-vectors.md | 20 ++++++++------------ 1 file changed, 8 insertions(+), 12 deletions(-) diff --git a/solutions/search/vector/bring-own-vectors.md b/solutions/search/vector/bring-own-vectors.md index cde58a027d..e32c83e1bc 100644 --- a/solutions/search/vector/bring-own-vectors.md +++ b/solutions/search/vector/bring-own-vectors.md @@ -15,17 +15,13 @@ description: An introduction to vectors and knn search in Elasticsearch. {{es}} enables you store and search mathematical representations of your content called _embeddings_ or _vectors_, which help machines understand and process your data more effectively. There are two types of representation (_dense_ and _sparse_), which are suited to different types of queries and use cases (for example, finding similar images and content or storing expanded terms and weights). -In this introduction to vector search, you'll store and search for dense vectors. -In particular, you'll index documents that already have dense vector embeddings into {{es}}. +In this introduction to [vector search](/solutions/search/vector.md), you'll store and search for dense vectors. You'll also learn the syntax for searching these documents using a [k-nearest neighbour](/solutions/search/vector/knn.md) (kNN) query. -To learn more about which type of vector is appropriate for your use case, check out [](/solutions/search/vector.md). -For an overview of the differences between semantic search and vector search, go to [](/solutions/search/ai-search/ai-search.md). - ## Prerequisites -- If you're using [{{es-serverless}}](/solutions/search/serverless-elasticsearch-get-started.md), create a project that is optimized for vectors. To add the sample data, you must have a `developer` or `admin` predefined role or an equivalent custom role. 
-- If you're using [{{ech}}](/deploy-manage/deploy/elastic-cloud/cloud-hosted.md) or [running {{es}} locally](/solutions/search/run-elasticsearch-locally.md), start {{es}} and {{kib}}. The simplest method to complete the steps in this guide is to log in with a user that has the `superuser` built-in role. +- If you're using {{es-serverless}}, create a project that is optimized for vectors. To add the sample data, you must have a `developer` or `admin` predefined role or an equivalent custom role. +- If you're using {{ech}} or a self-managed cluster, start {{es}} and {{kib}}. The simplest method to complete the steps in this guide is to log in with a user that has the `superuser` built-in role. To learn about role-based access control, check out [](/deploy-manage/users-roles/cluster-or-deployment-auth/user-roles.md). @@ -39,6 +35,7 @@ To deploy a vector embedding model in {{es}} and generate vectors while ingestin ::::{tip} This is an advanced use case that uses the `dense_vector` field type. Refer to [](/solutions/search/semantic-search.md) for an overview of your options for semantic search with {{es}}. +To learn about the differences between semantic search and vector search, go to [](/solutions/search/ai-search/ai-search.md). :::: :::::{stepper} @@ -79,8 +76,8 @@ PUT /amazon-reviews 3. The `similarity` parameter defines the similarity function used to compare the query vector to the document vectors. `cosine` is the default similarity function for `dense_vector` fields in {{es}}. Here we're using an 8-dimensional embedding for readability. -The vectors that neural network models work with can have several hundreds or even thousands of dimensions and simply represent a point in a multi-dimensional space. -Each vector dimension represents a feature, or a characteristic, of the unstructured data. +The vectors that neural network models work with can have several hundreds or even thousands of dimensions that represent a point in a multi-dimensional space. 
+Each vector dimension represents a _feature_ or a characteristic of the unstructured data. :::: ::::{step} Add documents with embeddings @@ -97,7 +94,6 @@ PUT /amazon-reviews/_doc/1 1. The size of the `review_vector` array is 8, matching the `dims` count specified in the mapping. In a production scenario, you'll want to index many documents at once using the [`_bulk` endpoint]({{es-apis}}operation/operation-bulk). - Here's an example of indexing multiple documents in a single `_bulk` request: ```console @@ -119,7 +115,7 @@ POST /_bulk Now you can query these document vectors using a [`knn` retriever]({{es-apis}}operation/operation-search#operation-search-body-application-json-retriever). `knn` is a type of vector search, which finds the `k` most similar documents to a query vector. -Here we're simply using a raw vector for the query text, for demonstration purposes: +Here we're using a raw vector for the query text for demonstration purposes: ```console POST /amazon-reviews/_search @@ -135,7 +131,7 @@ POST /amazon-reviews/_search } ``` -1. In this simple example, we're sending a raw vector as the query text. In a real-world scenario, you'll need to generate vectors for queries using an embedding model. +1. A raw vector serves as the query text in this example. In a real-world scenario, you'll need to generate vectors for queries using an embedding model. 2. The `k` parameter specifies the number of results to return. 3. The `num_candidates` parameter is optional. It limits the number of candidates returned by the search node. This can improve performance and reduce costs. 
From 833d347e2bdfd5ea6ae9e0165364a7968f5dfd76 Mon Sep 17 00:00:00 2001 From: Lisa Cawley Date: Tue, 15 Jul 2025 12:59:49 -0700 Subject: [PATCH 7/9] Update solutions/search/vector/bring-own-vectors.md Co-authored-by: Liam Thompson --- solutions/search/vector/bring-own-vectors.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/solutions/search/vector/bring-own-vectors.md b/solutions/search/vector/bring-own-vectors.md index e32c83e1bc..4c23b43d7b 100644 --- a/solutions/search/vector/bring-own-vectors.md +++ b/solutions/search/vector/bring-own-vectors.md @@ -113,7 +113,7 @@ POST /_bulk ## Test vector search [bring-your-own-vectors-search-documents] -Now you can query these document vectors using a [`knn` retriever]({{es-apis}}operation/operation-search#operation-search-body-application-json-retriever). +Now you can query these document vectors using a [`knn` retriever](elasticsearch://reference/elasticsearch/rest-apis/retrievers#knn-retriever). `knn` is a type of vector search, which finds the `k` most similar documents to a query vector. Here we're using a raw vector for the query text for demonstration purposes: From 040a3c89df438c9cefbffb2cd5ae67111e04b33b Mon Sep 17 00:00:00 2001 From: Lisa Cawley Date: Wed, 16 Jul 2025 02:46:29 -0700 Subject: [PATCH 8/9] Update solutions/search/vector/bring-own-vectors.md Co-authored-by: Liam Thompson --- solutions/search/vector/bring-own-vectors.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/solutions/search/vector/bring-own-vectors.md b/solutions/search/vector/bring-own-vectors.md index 4c23b43d7b..05c9fa6b18 100644 --- a/solutions/search/vector/bring-own-vectors.md +++ b/solutions/search/vector/bring-own-vectors.md @@ -113,7 +113,7 @@ POST /_bulk ## Test vector search [bring-your-own-vectors-search-documents] -Now you can query these document vectors using a [`knn` retriever](elasticsearch://reference/elasticsearch/rest-apis/retrievers#knn-retriever). 
+Now you can query these document vectors using a [`knn` retriever](elasticsearch://reference/elasticsearch/rest-apis/retrievers.md#knn-retriever). `knn` is a type of vector search, which finds the `k` most similar documents to a query vector. Here we're using a raw vector for the query text for demonstration purposes: From 40556b51e5655a1477a7b32a0546bb7b4ea56077 Mon Sep 17 00:00:00 2001 From: lcawl Date: Wed, 16 Jul 2025 02:56:14 -0700 Subject: [PATCH 9/9] Add next steps --- solutions/search/vector/bring-own-vectors.md | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/solutions/search/vector/bring-own-vectors.md b/solutions/search/vector/bring-own-vectors.md index 05c9fa6b18..e648e08638 100644 --- a/solutions/search/vector/bring-own-vectors.md +++ b/solutions/search/vector/bring-own-vectors.md @@ -135,6 +135,13 @@ POST /amazon-reviews/_search 2. The `k` parameter specifies the number of results to return. 3. The `num_candidates` parameter is optional. It limits the number of candidates returned by the search node. This can improve performance and reduce costs. +## Next steps + +If you want to try a similar set of steps from an {{es}} client, check out the guided index workflow: + +- If you're using Elasticsearch Serverless, go to **{{es}} > Home**, select the vector search workflow, and **Create a vector optimized index**. +- If you're using {{ech}} or a self-managed cluster, go to **Elasticsearch > Home** and click **Create API index**. Select the vector search workflow. + When you finish your tests and no longer need the sample data set, delete the index: ```console @@ -143,11 +150,6 @@ DELETE /amazon-reviews ## Learn more [bring-your-own-vectors-learn-more] -If you want to try a similar set of steps from an {{es}} client, check out the guided index workflow: - -- If you're using Elasticsearch Serverless, go to **{{es}} > Home**, select the vector search workflow, and **Create a vector optimized index**. 
-- If you're using {{ech}} or a self-managed cluster, go to **Elasticsearch > Home** and click **Create API index**. Select the vector search workflow. - In these simple examples, we're sending a raw vector for the query text. In a real-world scenario you won't know the query text ahead of time. You'll need to generate query vectors, on the fly, using the same embedding model that generated the document vectors.