From e9b391e743dff3aff4abc73ffaacc1b724229db0 Mon Sep 17 00:00:00 2001
From: Liam Thompson <leemthompo@gmail.com>
Date: Thu, 28 Nov 2024 17:54:06 +0100
Subject: [PATCH 01/12] [WIP] Elastic Rerank model landing page

---
 docs/en/stack/ml/nlp/index.asciidoc           |   1 +
 .../ml/nlp/ml-nlp-elastic-rerank.asciidoc     | 236 ++++++++++++++++++
 2 files changed, 237 insertions(+)
 create mode 100644 docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc

diff --git a/docs/en/stack/ml/nlp/index.asciidoc b/docs/en/stack/ml/nlp/index.asciidoc
index ef78ae52e..206f72919 100644
--- a/docs/en/stack/ml/nlp/index.asciidoc
+++ b/docs/en/stack/ml/nlp/index.asciidoc
@@ -9,6 +9,7 @@ include::ml-nlp-inference.asciidoc[leveloffset=+1]
 include::ml-nlp-apis.asciidoc[leveloffset=+1]
 include::ml-nlp-built-in-models.asciidoc[leveloffset=+1]
 include::ml-nlp-elser.asciidoc[leveloffset=+2]
+include::ml-nlp-elastic-rerank.asciidoc[leveloffset=+2]
 include::ml-nlp-e5.asciidoc[leveloffset=+2]
 include::ml-nlp-lang-ident.asciidoc[leveloffset=+2]
 include::ml-nlp-model-ref.asciidoc[leveloffset=+1]
diff --git a/docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc b/docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc
new file mode 100644
index 000000000..bfa44b99a
--- /dev/null
+++ b/docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc
@@ -0,0 +1,236 @@
+[[ml-nlp-rerank]]
+= Elastic Rerank
+
+Elastic Rerank is a state-of-the-art cross-encoder reranking model trained by Elastic that helps you improve search relevance with a few simple API calls.
+Elastic Rerank is Elastic's first semantic reranking model and is available out-of-the-box in supporting Elastic deployments.
+
+This model is recommended for English language documents and queries.
+
+Use Elastic Rerank to improve existing search applications including:
+
+* Traditional BM25 scoring
+* Hybrid semantic search
+* Retrieval Augmented Generation (RAG)
+
+The model can significantly improve search result quality by reordering results based on deeper semantic understanding of queries and documents.
+
+When reranking BM25 results, it provides an average 40% improvement in ranking quality on a diverse benchmark of retrieval tasks, matching the performance of models 11x its size.
+
+[discrete]
+[[ml-nlp-rerank-availability]]
+== Availability and requirements 
+
+IMPORTANT: This functionality is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.
+
+[discrete]
+[[ml-nlp-rerank-availability-serverless-]]
+=== Elastic Cloud Serverless
+
+Elastic Rerank is available in Elasticsearch Serverless projects as of November 25, 2024.
+
+[discrete]
+[[ml-nlp-rerank-availability-elastic-stack]]
+=== Elastic stack (Cloud hosted and self-managed deployments)
+
+Elastic Rerank will be available in Elastic Stack version 8.17+:
+
+* To use Elastic Rerank, you must have the appropriate subscription level or the trial period activated.
+* Requires ML nodes 
+** Subject to ML trial node limitations
+
+NOTE: BIG WIP TBD about trial deployments
+//TODO
+
+[discrete]
+[[ml-nlp-rerank-deploy]]
+== Download and deploy
+
+To download and deploy Elastic Rerank, use the https://www.elastic.co/guide/en/elasticsearch/reference/current/infer-service-elasticsearch.html[inference service API] to create an Elasticsearch service rerank endpoint.
+
+[discrete]
+[[ml-nlp-rerank-deploy-steps]]
+=== Create an inference endpoint
+
+. In {kib}, navigate to the *Dev Console*.
+
+. Create an {infer} endpoint with the Elastic Rerank service by running:
++
+--
+[source,console]
+----------------------------------
+PUT _inference/text_similarity/my-rerank-model
+{
+  "service": "elasticsearch",
+  "service_settings": {
+    "adaptive_allocations": {
+      "enabled": true,
+      "min_number_of_allocations": 1,
+      "max_number_of_allocations": 10
+    },
+    "num_threads": 1,
+    "model_id": ".elastic-rerank"
+  }
+}
+----------------------------------
+--
+
+NOTE: The API request automatically downloads and deploys the model. This example uses <<ml-nlp-auto-scale,autoscaling>> through adaptive allocation.
+
+After creating the Elastic Rerank {infer} endpoint, it's ready to use with a {ref}/retriever.html#text-similarity-reranker-retriever-example-elastic-rerank[`text_similarity_reranker`] retriever.
+
+// Is air-gapped deployment supported?
+[discrete]
+[[ml-nlp-rerank-deploy-airgapped]]
+=== Air-gapped deployment
+
+[discrete]
+[[ml-nlp-rerank-model-specs]]
+== Model specifications
+
+* Purpose-built for English language content
+
+* Relatively small: 184M parameters (86M backbone + 98M embedding layer)
+
+* Matches performance of billion-parameter reranking models
+
+* Built directly into Elasticsearch - no external services or dependencies needed
+
+[discrete]
+[[ml-nlp-rerank-arch-overview]]
+== Model architecture
+
+Elastic Rerank is built on the https://arxiv.org/abs/2111.09543[DeBERTa v3] language model architecture.
+
+The model employs several key architectural features that make it particularly effective for reranking:
+
+* *Disentangled attention mechanism* enables the model to:
+** Process word content and position separately
+** Learn more nuanced relationships between query and document text
+** Better understand the semantic importance of word positions and relationships
+
+* *ELECTRA-style pre-training*:
+** Uses a GAN-like approach to token prediction
+** Trains token generation and token detection simultaneously
+** Offers better parameter efficiency than traditional masked language modeling
+
+[discrete]
+[[ml-nlp-rerank-arch-training]]
+== Training process
+
+Here is an overview of the Elastic Rerank model training process:
+
+* *Initial relevance extraction*
+** Fine-tunes the pre-trained DeBERTa `[CLS]` token representation
+** Uses a GELU activation and a dropout layer
+** Preserves important pre-trained knowledge while adapting to the reranking task
+
+* *Trained by distillation*
+** Uses an ensemble of bi-encoder and cross-encoder models as a teacher
+** The bi-encoder provides a nuanced assessment of negative examples
+** The cross-encoder helps differentiate between positive and negative examples
+** Combines the strengths of both model types
+
+[discrete]
+[[ml-nlp-rerank-arch-data]]
+=== Training data
+
+The training data consists of:
+
+* Open-domain question-answering datasets
+* Natural document pairs (like article headings and summaries)
+* 180,000 synthetic query-passage pairs with varying relevance
+* Total of approximately 3 million queries
+
+The data preparation process includes:
+
+* Basic cleaning and fuzzy deduplication
+* Multi-stage prompting for diverse topics (on the synthetic portion of the training data only)
+* Varied query types:
+** Keyword search
+** Exact phrase matching
+** Short and long natural language questions
+
+[discrete]
+[[ml-nlp-rerank-arch-sampling]]
+=== Negative sampling
+
+Training uses a negative sampling strategy designed to produce high-quality rankings:
+
+* Samples negatives from the top 128 documents per query, retrieved using multiple retrieval methods
+* Uses five negative samples per query, which is more than typical approaches use
+* Samples according to a probability distribution shaped by the document scores
+
+* Deep sampling benefits:
+** Improves model robustness across different retrieval depths
+** Enhances score calibration
+** Provides better handling of document diversity
+
+[discrete]
+[[ml-nlp-rerank-arch-optimization]]
+=== Training optimization
+
+The training process incorporates several key optimizations.
+
+It uses a cross-entropy loss function to:
+
+* Model relevance as a probability distribution
+* Learn relationships between all document scores
+* Fit scores through maximum likelihood estimation
+
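+The cross-entropy objective above can be illustrated with a generic listwise softmax cross-entropy for a single query. The following is a minimal sketch of that general technique, not Elastic's training code: the `listwise_ce` function name is hypothetical, and the target probabilities can be one-hot labels or a soft target distribution, such as one produced by a teacher model.
+
+```shell
+# listwise_ce (hypothetical helper): listwise softmax cross-entropy for ONE query.
+# stdin: one line per candidate document, "<target_prob> <score>", where the
+# target probabilities sum to 1 and score is the reranker's raw logit.
+# stdout: -sum_i p_i * log(softmax(scores)_i), rounded to 4 decimals.
+listwise_ce() {
+  awk '
+    {
+      p[NR] = $1; s[NR] = $2
+      # track the max score; subtracting it below keeps exp() from overflowing
+      if (NR == 1 || $2 > max) max = $2
+    }
+    END {
+      for (i = 1; i <= NR; i++) z += exp(s[i] - max)
+      logz = log(z)
+      # s[i] - max - logz is the log-softmax probability of document i
+      for (i = 1; i <= NR; i++) loss -= p[i] * (s[i] - max - logz)
+      printf "%.4f\n", loss
+    }'
+}
+```
+
+For example, `printf '1 0\n0 0\n' | listwise_ce` prints `0.6931` (ln 2), because two equally scored documents each get a softmax probability of 0.5. Raising the relevant document's score lowers the loss, which is how this kind of loss relates all document scores for a query to each other.
+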
+It also averages parameters along the optimization trajectory, which:
+
+* Eliminates the need for traditional learning rate scheduling and improves the final model quality
+
+[[ml-nlp-rerank-input-prep]]
+== Input preparation
+// Do we need guidance on preparing texts for reranking?
+
+[[ml-nlp-rerank-testing]]
+== Testing Elastic Rerank
+// How do we test the model? What tools/UI are available?
+
+[discrete]
+[[ml-nlp-rerank-performance]]
+== Performance
+
+Elastic Rerank shows significant improvements in search quality across a wide range of retrieval tasks.
+
+[discrete]
+[[ml-nlp-rerank-performance-overview]]
+=== Overview
+
+* Average 40% improvement in ranking quality when reranking BM25 results
+* The 184M-parameter model matches the performance of 2B-parameter alternatives
+* Evaluated across 21 different datasets using the BEIR benchmark suite
+
+[discrete]
+[[ml-nlp-rerank-performance-benchmarks]]
+=== Key benchmark results
+
+* Natural Questions: 90% improvement
+* MS MARCO: 85% improvement
+* Climate-FEVER: 80% improvement
+* FiQA-2018: 76% improvement
+
+For detailed benchmark information, including complete dataset results and methodology, refer to the https://www.elastic.co/search-labs/introducing-elastic-rerank[Introducing Elastic Rerank blog].
+
+[discrete]
+[[ml-nlp-rerank-perf-considerations]]
+=== Performance considerations
+// What hardware-specific performance characteristics should users know about?
+
+[discrete]
+[[ml-nlp-rerank-benchmarks-hw]]
+=== Hardware benchmarks
+// Are there hardware-specific benchmark numbers we should include?
+
+[discrete]
+[[ml-nlp-rerank-limitations]]
+== Limitations
+// What are current known limitations beyond tech preview status?
+
+[discrete]
+[[ml-nlp-rerank-resources]]
+== Further resources
+// What additional resources should we link to?
+

From 4f2ac29ab79224c1a24ba79379c6c573912ced87 Mon Sep 17 00:00:00 2001
From: Liam Thompson <leemthompo@gmail.com>
Date: Thu, 28 Nov 2024 17:59:32 +0100
Subject: [PATCH 02/12] Formatting

---
 docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc b/docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc
index bfa44b99a..6151e2a98 100644
--- a/docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc
+++ b/docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc
@@ -55,7 +55,6 @@ To download and deploy Elastic Rerank, use the https://www.elastic.co/guide/en/e
 
 . Create an {infer} endpoint with the Elastic Rerank service by running:
 +
---
 [source,console]
 ----------------------------------
 PUT _inference/text_similarity/my-rerank-model
@@ -72,8 +71,7 @@ PUT _inference/text_similarity/my-rerank-model
   }
 }
 ----------------------------------
---
-
++
 NOTE: The API request automatically downloads and deploys the model. This example uses <<ml-nlp-auto-scale,autoscaling>> through adaptive allocation.
 
 After creating the Elastic Rerank {infer} endpoint, it's ready to use with a {ref}/retriever.html#text-similarity-reranker-retriever-example-elastic-rerank[`text_similarity_reranker`] retriever.

From 7cd04c80f722e9030e3ff204d71f7ac25901a02b Mon Sep 17 00:00:00 2001
From: Liam Thompson <leemthompo@gmail.com>
Date: Tue, 10 Dec 2024 16:06:38 +0100
Subject: [PATCH 03/12] Updates

---
 .../ml/nlp/ml-nlp-elastic-rerank.asciidoc     | 82 ++++++++++---------
 1 file changed, 45 insertions(+), 37 deletions(-)

diff --git a/docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc b/docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc
index 6151e2a98..057010c34 100644
--- a/docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc
+++ b/docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc
@@ -2,9 +2,7 @@
 = Elastic Rerank
 
 Elastic Rerank is a state-of-the-art cross-encoder reranking model trained by Elastic that helps you improve search relevance with a few simple API calls.
-Elastic Rerank is Elastic's first semantic reranking model and is available out-of-the-box in supporting Elastic deployments.
-
-This model is recommended for English language documents and queries.
+Elastic Rerank is Elastic's first semantic reranking model and is available out-of-the-box in supporting Elastic deployments using the {es} Inference API.
 
 Use Elastic Rerank to improve existing search applications including:
 
@@ -30,22 +28,23 @@ Elastic Rerank is available in Elasticsearch Serverless projects as of November
 
 [discrete]
 [[ml-nlp-rerank-availability-elastic-stack]]
-=== Elastic stack (Cloud hosted and self-managed deployments)
+=== Elastic Cloud Hosted and self-managed deployments
 
-Elastic Rerank will be available in Elastic Stack version 8.17+:
+Elastic Rerank is available in Elastic Stack version 8.17 and later, with the following requirements:
 
 * To use Elastic Rerank, you must have the appropriate subscription level or the trial period activated.
-* Requires ML nodes 
-** Subject to ML trial node limitations
-
-NOTE: BIG WIP TBD about trial deployments
-//TODO
+* An ML node with at least 4GB of memory
++
+[IMPORTANT]
+====
+Deploying the Elastic Rerank model in combination with ELSER (or other hosted models) requires at minimum an 8GB ML node. Please note that the current maximum size for trial ML nodes is 4GB (defaults to 1GB).
+====
 
 [discrete]
 [[ml-nlp-rerank-deploy]]
 == Download and deploy
 
-To download and deploy Elastic Rerank, use the https://www.elastic.co/guide/en/elasticsearch/reference/current/infer-service-elasticsearch.html[inference service API] to create an Elasticsearch service rerank endpoint.
+To download and deploy Elastic Rerank, use the {ref}/infer-service-elasticsearch.html[create inference API] to create an Elasticsearch service `rerank` endpoint.
 
 [discrete]
 [[ml-nlp-rerank-deploy-steps]]
@@ -76,10 +75,27 @@ NOTE: The API request automatically downloads and deploys the model. This exampl
 
 After creating the Elastic Rerank {infer} endpoint, it's ready to use with a {ref}/retriever.html#text-similarity-reranker-retriever-example-elastic-rerank[`text_similarity_reranker`] retriever.
 
-// Is air-gapped deployment supported?
 [discrete]
-[[ml-nlp-rerank-deploy-airgapped]]
-=== Air-gapped deployment
+[[ml-nlp-rerank-limitations]]
+== Limitations
+
+* English language only
+* Maximum context window of 512 tokens
++
+When using the {ref}/semantic-text.html[`semantic_text` field type], text is divided into chunks. By default, each chunk contains 250 words (approximately 400 tokens). Be cautious when increasing the chunk size: if the combined length of your query and chunk text exceeds 512 tokens, the model won't have access to the full content.
++
+When the combined inputs exceed the 512 token limit, a "balanced" truncation strategy is used. If both the query and input text are longer than 255 tokens each then both are truncated, otherwise the longest is truncated.
+
+[discrete]
+[[ml-nlp-rerank-perf-considerations]]
+== Performance considerations
+
+If you rerank to depth `n`, you need to run `n` inferences per query. Each inference includes the document text, so reranking is significantly more expensive than inference for query embeddings. You can scale hardware to run these inferences in parallel, but for CPU inference we recommend shallow reranking: no more than the top 30 results. The preview version might be cost-prohibitive for workloads with high query rates and low query latency requirements. We plan to address performance issues for GA.
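+
+The following is a rough, hypothetical sizing sketch based on the reasoning above. It assumes that each model allocation processes one inference at a time and that you have measured a per-inference latency on your own hardware; the `estimate_allocations` function and its inputs are illustrative, not part of any Elastic tool.
+
+```shell
+# estimate_allocations DEPTH QPS LATENCY_MS (hypothetical helper)
+# Reranking to depth DEPTH runs DEPTH inferences per query, so the cluster
+# must sustain DEPTH * QPS inferences per second. Assuming one inference at a
+# time per allocation, one allocation handles about 1000 / LATENCY_MS
+# inferences per second, so we need ceil(DEPTH * QPS * LATENCY_MS / 1000).
+estimate_allocations() {
+  local depth="$1" qps="$2" latency_ms="$3"
+  local rate=$((depth * qps))                  # inferences per second
+  echo $(( (rate * latency_ms + 999) / 1000 )) # integer ceiling division
+}
+```
+
+For example, `estimate_allocations 30 10 50` prints `15`: 300 inferences per second at 50 ms each. That exceeds the `max_number_of_allocations` of 10 used in the example {infer} endpoint, so such a workload would need a higher limit, a shallower reranking depth, or both.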
+
+// // Is air-gapped deployment supported?
+// [discrete]
+// [[ml-nlp-rerank-deploy-airgapped]]
+// === Air-gapped deployment
 
 [discrete]
 [[ml-nlp-rerank-model-specs]]
@@ -179,14 +195,6 @@ Implemented parameter averaging along optimization trajectory:
 
 * Eliminates the need for traditional learning rate scheduling and improves the final model quality
 
-[[ml-nlp-rerank-input-prep]]
-== Input preparation
-// Do we need guidance on preparing texts for reranking?
-
-[[ml-nlp-rerank-testing]]
-== Testing Elastic Rerank
-// How do we test the model? What tools/UI are available?
-
 [discrete]
 [[ml-nlp-rerank-performance]]
 == Performance
@@ -212,23 +220,23 @@ Elastic Rerank shows significant improvements in search quality across a wide ra
 
 For detailed benchmark information, including complete dataset results and methodology, refer to the https://www.elastic.co/search-labs/introducing-elastic-rerank[Introducing Elastic Rerank blog].
 
-[discrete]
-[[ml-nlp-rerank-perf-considerations]]
-=== Performance considerations
-// What hardware-specific performance characteristics should users know about?
-
-[discrete]
-[[ml-nlp-rerank-benchmarks-hw]]
-=== Hardware benchmarks
-// Are there hardware-specific benchmark numbers we should include?
-
-[discrete]
-[[ml-nlp-rerank-limitations]]
-== Limitations
-// What are current known limitations beyond tech preview status?
+// [discrete]
+// [[ml-nlp-rerank-benchmarks-hw]]
+// === Hardware benchmarks
+// Note: these are more for GA timeframe
 
 [discrete]
 [[ml-nlp-rerank-resources]]
 == Further resources
-// What additional resources should we link to?
+
+*Documentation*:
+
+* https://www.elastic.co/guide/en/elasticsearch/reference/8.17/semantic-reranking.html#semantic-reranking-in-es
+* https://www.elastic.co/guide/en/elasticsearch/reference/8.17/infer-service-elasticsearch.html#inference-example-elastic-reranker
+
+*Blogs*:
+
+* https://www.elastic.co/search-labs/blog/elastic-semantic-reranker-part-1
+* https://www.elastic.co/search-labs/blog/elastic-semantic-reranker-part-2
+* https://www.elastic.co/search-labs/blog/elastic-semantic-reranker-part-3
 

From e83ef2fb1a2f2a17f84009701db3fb3566b25bf4 Mon Sep 17 00:00:00 2001
From: Liam Thompson <leemthompo@gmail.com>
Date: Tue, 10 Dec 2024 16:10:37 +0100
Subject: [PATCH 04/12] Tidy up links

---
 docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc b/docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc
index 057010c34..b0c1f2e95 100644
--- a/docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc
+++ b/docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc
@@ -231,12 +231,12 @@ For detailed benchmark information, including complete dataset results and metho
 
 *Documentation*:
 
-* https://www.elastic.co/guide/en/elasticsearch/reference/8.17/semantic-reranking.html#semantic-reranking-in-es
-* https://www.elastic.co/guide/en/elasticsearch/reference/8.17/infer-service-elasticsearch.html#inference-example-elastic-reranker
+* {ref}/semantic-reranking.html#semantic-reranking-in-es[Semantic re-ranking in {es} overview]
+* {ref}/infer-service-elasticsearch.html#inference-example-elastic-reranker[Inference API example]
 
 *Blogs*:
 
-* https://www.elastic.co/search-labs/blog/elastic-semantic-reranker-part-1
-* https://www.elastic.co/search-labs/blog/elastic-semantic-reranker-part-2
-* https://www.elastic.co/search-labs/blog/elastic-semantic-reranker-part-3
+* https://www.elastic.co/search-labs/blog/elastic-semantic-reranker-part-1[Part 1]
+* https://www.elastic.co/search-labs/blog/elastic-semantic-reranker-part-2[Part 2]
+* https://www.elastic.co/search-labs/blog/elastic-semantic-reranker-part-3[Part 3]
 

From f7e52e17388d55f8cb0ab01e599be9b676d31c57 Mon Sep 17 00:00:00 2001
From: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
Date: Wed, 11 Dec 2024 09:44:54 +0100
Subject: [PATCH 05/12] Apply suggestions from code review
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
---
 docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc b/docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc
index b0c1f2e95..dbf6b42f5 100644
--- a/docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc
+++ b/docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc
@@ -2,7 +2,7 @@
 = Elastic Rerank
 
 Elastic Rerank is a state-of-the-art cross-encoder reranking model trained by Elastic that helps you improve search relevance with a few simple API calls.
-Elastic Rerank is Elastic's first semantic reranking model and is available out-of-the-box in supporting Elastic deployments using the {es} Inference API.
+Elastic Rerank is Elastic's first semantic reranking model and is available out-of-the-box in supporting Elastic deployments using the {es} {infer-cap} API.
 
 Use Elastic Rerank to improve existing search applications including:
 
@@ -18,10 +18,10 @@ When reranking BM25 results, it provides an average 40% improvement in ranking q
 [[ml-nlp-rerank-availability]]
 == Availability and requirements 
 
-IMPORTANT: This functionality is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.
+experimental[]
 
 [discrete]
-[[ml-nlp-rerank-availability-serverless-]]
+[[ml-nlp-rerank-availability-serverless]]
 === Elastic Cloud Serverless
 
 Elastic Rerank is available in Elasticsearch Serverless projects as of November 25, 2024.
@@ -37,7 +37,7 @@ Elastic Rerank is available in Elastic Stack version 8.17+:
 +
 [IMPORTANT]
 ====
-Deploying the Elastic Rerank model in combination with ELSER (or other hosted models) requires at minimum an 8GB ML node. Please note that the current maximum size for trial ML nodes is 4GB (defaults to 1GB).
+Deploying the Elastic Rerank model in combination with ELSER (or other hosted models) requires at minimum an 8GB ML node. The maximum size for trial ML nodes is 4GB (defaults to 1GB).
 ====
 
 [discrete]
@@ -84,7 +84,7 @@ After creating the Elastic Rerank {infer} endpoint, it's ready to use with a {re
 +
 When using the {ref}/semantic-text.html[`semantic_text` field type], text is divided into chunks. By default, each chunk contains 250 words (approximately 400 tokens). Be cautious when increasing the chunk size: if the combined length of your query and chunk text exceeds 512 tokens, the model won't have access to the full content.
 +
-When the combined inputs exceed the 512 token limit, a "balanced" truncation strategy is used. If both the query and input text are longer than 255 tokens each then both are truncated, otherwise the longest is truncated.
+When the combined inputs exceed the 512-token limit, a balanced truncation strategy is used. If the query and the input text are each longer than 255 tokens, both are truncated; otherwise, the longer of the two is truncated.
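+
+The truncation rule above can be expressed as a small sketch. It is a hypothetical illustration, not the tokenizer's actual code: it assumes the full 512-token budget is shared by the query and the document and ignores special tokens, so real limits are slightly lower.
+
+```shell
+MAX_TOKENS=512  # assumption: whole budget shared by query and document
+
+# balanced_truncate QUERY_TOKENS DOC_TOKENS (hypothetical helper)
+# Prints the token counts kept for the query and the document.
+# Special tokens are ignored in this sketch.
+balanced_truncate() {
+  local q="$1" d="$2" half=$((MAX_TOKENS / 2))
+  if [ $((q + d)) -le "$MAX_TOKENS" ]; then
+    echo "$q $d"                       # fits: nothing is truncated
+  elif [ "$q" -ge "$half" ] && [ "$d" -ge "$half" ]; then
+    echo "$half $half"                 # both longer than 255 tokens: truncate both
+  elif [ "$q" -gt "$d" ]; then
+    echo "$((MAX_TOKENS - d)) $d"      # only the longer input (the query) is cut
+  else
+    echo "$q $((MAX_TOKENS - q))"      # only the longer input (the document) is cut
+  fi
+}
+```
+
+For example, `balanced_truncate 300 400` prints `256 256`, while `balanced_truncate 20 600` prints `20 492` because only the document is cut.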
 
 [discrete]
 [[ml-nlp-rerank-perf-considerations]]
@@ -107,7 +107,7 @@ It's important to note that if you rerank to depth `n` then you will need to run
 
 * Matches performance of billion-parameter reranking models
 
-* Built directly into Elasticsearch - no external services or dependencies needed
+* Built directly into {es} - no external services or dependencies needed
 
 [discrete]
 [[ml-nlp-rerank-arch-overview]]

From aeb429e4b59b7cbfd59f4f10ceaeaecc52c13856 Mon Sep 17 00:00:00 2001
From: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
Date: Wed, 11 Dec 2024 12:04:55 +0100
Subject: [PATCH 06/12] Fix command typos

Co-authored-by: David Kyle <david.kyle@elastic.co>
---
 docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc b/docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc
index dbf6b42f5..9f1f35920 100644
--- a/docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc
+++ b/docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc
@@ -56,7 +56,7 @@ To download and deploy Elastic Rerank, use the {ref}/infer-service-elasticsearch
 +
 [source,console]
 ----------------------------------
-PUT _inference/text_similarity/my-rerank-model
+PUT _inference/rerank/my-rerank-model
 {
   "service": "elasticsearch",
   "service_settings": {
@@ -66,7 +66,7 @@ PUT _inference/text_similarity/my-rerank-model
       "max_number_of_allocations": 10
     },
     "num_threads": 1,
-    "model_id": ".elastic-rerank"
+    "model_id": ".rerank-v1"
   }
 }
 ----------------------------------

From c6bc61a17f4a320a71d36fb7760e156ea69f683c Mon Sep 17 00:00:00 2001
From: Liam Thompson <leemthompo@gmail.com>
Date: Wed, 11 Dec 2024 12:38:14 +0100
Subject: [PATCH 07/12] Add air-gapped info, timeout note

---
 .../ml/nlp/ml-nlp-elastic-rerank.asciidoc     | 131 +++++++++++++++++-
 1 file changed, 125 insertions(+), 6 deletions(-)

diff --git a/docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc b/docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc
index 9f1f35920..3c5fa60b9 100644
--- a/docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc
+++ b/docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc
@@ -2,7 +2,7 @@
 = Elastic Rerank
 
 Elastic Rerank is a state-of-the-art cross-encoder reranking model trained by Elastic that helps you improve search relevance with a few simple API calls.
-Elastic Rerank is Elastic's first semantic reranking model and is available out-of-the-box in supporting Elastic deployments using the {es} {infer-cap} API.
+Elastic Rerank is Elastic's first semantic reranking model and is available out of the box in supported Elastic deployments through the {es} Inference API.
 
 Use Elastic Rerank to improve existing search applications including:
 
@@ -18,7 +18,7 @@ When reranking BM25 results, it provides an average 40% improvement in ranking q
 [[ml-nlp-rerank-availability]]
 == Availability and requirements 
 
-experimental[]
+experimental[] 
 
 [discrete]
 [[ml-nlp-rerank-availability-serverless]]
@@ -37,7 +37,7 @@ Elastic Rerank is available in Elastic Stack version 8.17+:
 +
 [IMPORTANT]
 ====
-Deploying the Elastic Rerank model in combination with ELSER (or other hosted models) requires at minimum an 8GB ML node. The maximum size for trial ML nodes is 4GB (defaults to 1GB).
+Deploying the Elastic Rerank model in combination with ELSER (or other hosted models) requires at minimum an 8GB ML node. Please note that the current maximum size for trial ML nodes is 4GB (defaults to 1GB). 
 ====
 
 [discrete]
@@ -73,8 +73,128 @@ PUT _inference/rerank/my-rerank-model
 +
 NOTE: The API request automatically downloads and deploys the model. This example uses <<ml-nlp-auto-scale,autoscaling>> through adaptive allocation.
 
+[NOTE]
+====
+You might see a 502 Bad Gateway error in the response when using the {kib} Console.
+This error usually just reflects a timeout while the model downloads in the background.
+You can check the download progress in the {ml-app} UI.
+If you're using the Python client, you can set the `timeout` parameter to a higher value.
+====
+
 After creating the Elastic Rerank {infer} endpoint, it's ready to use with a {ref}/retriever.html#text-similarity-reranker-retriever-example-elastic-rerank[`text_similarity_reranker`] retriever.
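+
+As a rough illustration, the following sketch calls the endpoint from a shell with `curl`, first through the perform {infer} API and then through a search that reranks the top BM25 hits. The `ES_URL` and `ES_API_KEY` variables, the `my-index` index, its `text` field, and the helper function names are hypothetical. The request bodies sketch the `rerank` {infer} and `text_similarity_reranker` retriever formats; check them against the API reference for your version.
+
+```shell
+ES_URL="${ES_URL:-http://localhost:9200}"   # hypothetical: your cluster URL
+
+# rerank_body QUERY DOC... : JSON body for POST _inference/rerank/<endpoint>.
+# No JSON escaping is done, so the arguments must not contain double quotes.
+rerank_body() {
+  local query="$1" inputs="" sep="" doc
+  shift
+  for doc in "$@"; do
+    inputs="${inputs}${sep}\"${doc}\""
+    sep=", "
+  done
+  printf '{"query": "%s", "input": [%s]}\n' "$query" "$inputs"
+}
+
+# reranked_search_body QUERY : BM25 match on the hypothetical "text" field,
+# with the top 30 hits reranked by the my-rerank-model endpoint.
+reranked_search_body() {
+  cat <<EOF
+{
+  "retriever": {
+    "text_similarity_reranker": {
+      "retriever": { "standard": { "query": { "match": { "text": "$1" } } } },
+      "field": "text",
+      "inference_id": "my-rerank-model",
+      "inference_text": "$1",
+      "rank_window_size": 30
+    }
+  }
+}
+EOF
+}
+
+# es_post PATH BODY : send a JSON request body with API key authentication.
+es_post() {
+  curl -s -X POST "${ES_URL}$1" \
+    -H "Authorization: ApiKey ${ES_API_KEY}" \
+    -H "Content-Type: application/json" \
+    -d "$2"
+}
+```
+
+Assuming the endpoint and index exist, `es_post /_inference/rerank/my-rerank-model "$(rerank_body 'what is a cross-encoder' 'passage one' 'passage two')"` returns a relevance score per passage, and `es_post /my-index/_search "$(reranked_search_body 'what is a cross-encoder')"` returns the reranked hits. The `rank_window_size` of 30 keeps reranking shallow, which limits the number of inferences per query.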
 
+[discrete]
+[[ml-nlp-rerank-deploy-verify]]
+== Deploy in an air-gapped environment
+
+If you want to deploy the Elastic Rerank model in a restricted or closed network, you have two options:
+
+* Create your own HTTP/HTTPS endpoint that serves the model artifacts
+* Put the model artifacts into a directory inside the `config` directory on all master-eligible nodes
+
+[discrete]
+[[ml-nlp-rerank-model-artifacts]]
+=== Model artifact files
+
+For the cross-platform version, you need the following files on your system:
+```
+https://ml-models.elastic.co/rerank-v1.metadata.json
+https://ml-models.elastic.co/rerank-v1.pt
+https://ml-models.elastic.co/rerank-v1.vocab.json
+```
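+
+If the machine that prepares the artifacts has internet access, a small script can fetch them. The following is a minimal sketch; the `artifact_urls` and `download_artifacts` helper names are hypothetical, and the URLs are the ones listed above.
+
+```shell
+BASE_URL="https://ml-models.elastic.co"
+ARTIFACTS="rerank-v1.metadata.json rerank-v1.pt rerank-v1.vocab.json"
+
+# artifact_urls (hypothetical helper): print one download URL per artifact.
+artifact_urls() {
+  local f
+  for f in $ARTIFACTS; do
+    echo "${BASE_URL}/${f}"
+  done
+}
+
+# download_artifacts DEST_DIR (hypothetical helper): download every artifact.
+download_artifacts() {
+  local dest="$1" url
+  mkdir -p "$dest"
+  for url in $(artifact_urls); do
+    # -f fails on HTTP errors, -L follows redirects
+    curl -fL -o "${dest}/${url##*/}" "$url" || return 1
+  done
+}
+```
+
+For example, `download_artifacts ./rerank-models` saves the three files in `./rerank-models`, which you can then copy into the restricted network.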
+
+// For the optimized version, you need the following files in your system:
+// ```
+// https://ml-models.elastic.co/rerank-v1_linux-x86_64.metadata.json
+// https://ml-models.elastic.co/rerank-v1_linux-x86_64.pt
+// https://ml-models.elastic.co/rerank-v1_linux-x86_64.vocab.json
+// ```
+
+[discrete]
+=== Using an HTTP server
+
+NOTE: If you use an existing HTTP server, the model downloader only supports
+passwordless HTTP servers.
+
+You can use any HTTP service to serve the Elastic Rerank model artifacts. This
+example uses the official Nginx Docker image to set up a new HTTP download service.
+
+. Download the <<ml-nlp-rerank-model-artifacts,model artifact files>>.
+. Put the files into a subdirectory of your choice.
+. Run the following commands:
++
+--
+[source, shell]
+--------------------------------------------------
+export ELASTIC_ML_MODELS="/path/to/models"
+docker run --rm -d -p 8080:80 --name ml-models -v ${ELASTIC_ML_MODELS}:/usr/share/nginx/html nginx
+--------------------------------------------------
+
+Don't forget to change `/path/to/models` to the path of the subdirectory where 
+the model artifact files are located.
+
+These commands start a local Docker container running an Nginx server that
+serves the subdirectory containing the model files. Because the Docker image
+has to be downloaded first, the first start might take longer. Subsequent runs
+start faster.
+--
+. Verify that Nginx runs properly by visiting the following URL in your 
+browser:
++
+--
+```
+http://{IP_ADDRESS_OR_HOSTNAME}:8080/rerank-v1.metadata.json
+```
+
+If Nginx runs properly, you see the content of the metadata file of the model.
+--
+. Point your Elasticsearch deployment to the model artifacts on the HTTP server
+by adding the following line to the `config/elasticsearch.yml` file: 
++
+--
+```
+xpack.ml.model_repository: http://{IP_ADDRESS_OR_HOSTNAME}:8080
+```
+
+If you use your own HTTP or HTTPS server, change the address accordingly. It is
+important to specify the protocol (`http://` or `https://`). Ensure that all
+master-eligible nodes can reach the server you specify.
+--
+. Repeat step 5 on all master-eligible nodes.
+. {ref}/restart-cluster.html#restart-cluster-rolling[Restart] the 
+master-eligible nodes one by one.
+. Create an inference endpoint to deploy the model per <<ml-nlp-rerank-deploy-steps,these steps>>.
+
+The HTTP server is only required for downloading the model. After the download 
+has finished, you can stop and delete the service. You can stop the Docker image 
+used in this example by running the following command:
+
+[source, shell]
+--------------------------------------------------
+docker stop ml-models
+--------------------------------------------------
+
+[discrete]
+=== Using file-based access
+
+For file-based access, follow these steps:
+
+. Download the <<ml-nlp-rerank-model-artifacts,model artifact files>>.
+. Put the files into a `models` subdirectory inside the `config` directory of 
+your {es} deployment.
+. Point your {es} deployment to the model directory by adding the 
+following line to the `config/elasticsearch.yml` file:
++
+--
+```
+xpack.ml.model_repository: file://${path.home}/config/models/
+```
+--
+. Repeat step 2 and step 3 on all master-eligible nodes.
+. {ref}/restart-cluster.html#restart-cluster-rolling[Restart] the 
+master-eligible nodes one by one.
+. Create an inference endpoint to deploy the model per <<ml-nlp-rerank-deploy-steps,these steps>>.
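+
+Step 2 can be scripted on each node. The following is a minimal sketch that assumes an archive (`.tar.gz`) installation, where the `config` directory is under the {es} home directory; package installations typically use a different configuration path. The `install_artifacts` name is hypothetical.
+
+```shell
+# install_artifacts SRC_DIR ES_HOME (hypothetical helper)
+# Copies the Elastic Rerank artifacts from SRC_DIR into ES_HOME/config/models.
+# Assumes an archive installation where config/ lives under ES_HOME.
+install_artifacts() {
+  local src="$1" dest="$2/config/models" f
+  mkdir -p "$dest"
+  for f in rerank-v1.metadata.json rerank-v1.pt rerank-v1.vocab.json; do
+    if [ ! -f "${src}/${f}" ]; then
+      echo "missing artifact: ${src}/${f}" >&2
+      return 1
+    fi
+    cp "${src}/${f}" "${dest}/"
+  done
+}
+```
+
+For example, `install_artifacts ./rerank-models "$ES_HOME"` copies the files; you still need to add the `xpack.ml.model_repository` setting and restart the node.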
+
 [discrete]
 [[ml-nlp-rerank-limitations]]
 == Limitations
@@ -107,7 +227,7 @@ It's important to note that if you rerank to depth `n` then you will need to run
 
 * Matches performance of billion-parameter reranking models
 
-* Built directly into {es} - no external services or dependencies needed
+* Built directly into Elasticsearch, with no external services or dependencies needed
 
 [discrete]
 [[ml-nlp-rerank-arch-overview]]
@@ -238,5 +358,4 @@ For detailed benchmark information, including complete dataset results and metho
 
 * https://www.elastic.co/search-labs/blog/elastic-semantic-reranker-part-1[Part 1]
 * https://www.elastic.co/search-labs/blog/elastic-semantic-reranker-part-2[Part 2]
-* https://www.elastic.co/search-labs/blog/elastic-semantic-reranker-part-3[Part 3]
-
+* https://www.elastic.co/search-labs/blog/elastic-semantic-reranker-part-3[Part 3]
\ No newline at end of file

From e38169a52f36ef76c0674d620766a7316f4bfe5b Mon Sep 17 00:00:00 2001
From: Liam Thompson <leemthompo@gmail.com>
Date: Wed, 11 Dec 2024 12:39:13 +0100
Subject: [PATCH 08/12] Revert rewording

---
 docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc b/docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc
index 3c5fa60b9..191a8fcc6 100644
--- a/docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc
+++ b/docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc
@@ -24,7 +24,7 @@ experimental[]
 [[ml-nlp-rerank-availability-serverless]]
 === Elastic Cloud Serverless
 
-Elastic Rerank is available in Elasticsearch Serverless projects as of November 25, 2024.
+Elastic Rerank is available in {es} Serverless projects as of November 25, 2024.
 
 [discrete]
 [[ml-nlp-rerank-availability-elastic-stack]]
@@ -37,14 +37,14 @@ Elastic Rerank is available in Elastic Stack version 8.17+:
 +
 [IMPORTANT]
 ====
-Deploying the Elastic Rerank model in combination with ELSER (or other hosted models) requires at minimum an 8GB ML node. Please note that the current maximum size for trial ML nodes is 4GB (defaults to 1GB). 
+Deploying the Elastic Rerank model in combination with ELSER (or other hosted models) requires at minimum an 8GB ML node. The current maximum size for trial ML nodes is 4GB (defaults to 1GB). 
 ====
 
 [discrete]
 [[ml-nlp-rerank-deploy]]
 == Download and deploy
 
-To download and deploy Elastic Rerank, use the {ref}/infer-service-elasticsearch.html[create inference API] to create an Elasticsearch service `rerank` endpoint.
+To download and deploy Elastic Rerank, use the {ref}/infer-service-elasticsearch.html[create inference API] to create an {es} service `rerank` endpoint.
 
 [discrete]
 [[ml-nlp-rerank-deploy-steps]]
@@ -148,7 +148,7 @@ http://{IP_ADDRESS_OR_HOSTNAME}:8080/rerank-v1.metadata.json
 
 If Nginx runs properly, you see the content of the metdata file of the model.
 --
-. Point your Elasticsearch deployment to the model artifacts on the HTTP server
+. Point your {es} deployment to the model artifacts on the HTTP server
 by adding the following line to the `config/elasticsearch.yml` file: 
 +
 --
@@ -227,7 +227,7 @@ It's important to note that if you rerank to depth `n` then you will need to run
 
 * Matches performance of billion-parameter reranking models
 
-* Built directly into Elasticsearch - no external services or dependencies needed
+* Built directly into {es} - no external services or dependencies needed
 
 [discrete]
 [[ml-nlp-rerank-arch-overview]]

From 14663ac54df6720fe0f19e925527e072d125a242 Mon Sep 17 00:00:00 2001
From: Liam Thompson <leemthompo@gmail.com>
Date: Wed, 11 Dec 2024 12:42:59 +0100
Subject: [PATCH 09/12] typo

---
 docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc b/docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc
index 191a8fcc6..9d9e8fe39 100644
--- a/docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc
+++ b/docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc
@@ -116,7 +116,7 @@ https://ml-models.elastic.co/rerank-v1.vocab.json
 INFO: If you use an existing HTTP server, note that the model downloader only 
 supports passwordless HTTP servers.
 
-You can use any HTTP service to deploy ELSER. This example uses the official 
+You can use any HTTP service to deploy the model. This example uses the official 
 Nginx Docker image to set a new HTTP download service up.
 
 . Download the <<ml-nlp-rerank-model-artifacts,model artifact files>>.

From e7554bc856855bb833af25aafb73ba4286fb1260 Mon Sep 17 00:00:00 2001
From: Liam Thompson <leemthompo@gmail.com>
Date: Wed, 11 Dec 2024 12:43:47 +0100
Subject: [PATCH 10/12] Del comments

---
 docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc b/docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc
index 9d9e8fe39..dd9e2a8e4 100644
--- a/docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc
+++ b/docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc
@@ -212,11 +212,6 @@ When the combined inputs exceed the 512 token limit, a balanced truncation strat
 
 It's important to note that if you rerank to depth `n` then you will need to run `n` inferences per query. This will include the document text and will therefore be significantly more expensive than inference for query embeddings. Hardware can be scaled to run these inferences in parallel, but we would recommend shallow reranking for CPU inference: no more than top-30 results. You may find that the preview version is cost prohibitive for high query rates and low query latency requirements. We plan to address performance issues for GA.
 
-// // Is air-gapped deployment supported?
-// [discrete]
-// [[ml-nlp-rerank-deploy-airgapped]]
-// === Air-gapped deployment
-
 [discrete]
 [[ml-nlp-rerank-model-specs]]
 == Model specifications

From 5e6955b7957934719c047f1a4065111380e56970 Mon Sep 17 00:00:00 2001
From: Liam Thompson <leemthompo@gmail.com>
Date: Wed, 11 Dec 2024 13:56:24 +0100
Subject: [PATCH 11/12] Fix link

---
 docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc b/docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc
index dd9e2a8e4..e76f6084c 100644
--- a/docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc
+++ b/docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc
@@ -179,7 +179,7 @@ docker stop ml-models
 
 For a file-based access, follow these steps:
 
-. Download the <<ml-nlp-model-artifacts,model artifact files>>. 
+. Download the <<ml-nlp-rerank-model-artifacts,model artifact files>>. 
 . Put the files into a `models` subdirectory inside the `config` directory of 
 your {es} deployment.
 . Point your {es} deployment to the model directory by adding the 

From 7c1af36cf69bce47cdff0f58f48a6f894d749d39 Mon Sep 17 00:00:00 2001
From: Liam Thompson <leemthompo@gmail.com>
Date: Wed, 11 Dec 2024 15:52:10 +0100
Subject: [PATCH 12/12] Delete trailing backtick

---
 docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc b/docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc
index e76f6084c..759482c88 100644
--- a/docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc
+++ b/docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc
@@ -187,7 +187,7 @@ following line to the `config/elasticsearch.yml` file:
 +
 --
 ```
-xpack.ml.model_repository: file://${path.home}/config/models/`
+xpack.ml.model_repository: file://${path.home}/config/models/
 ```
 --
 . Repeat step 2 and step 3 on all master-eligible nodes.
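For reviewers, here is a rough shell sketch of what steps 2 and 3 of the "Using file-based access" list amount to on a single node, using the corrected `xpack.ml.model_repository` line without the trailing backtick. It is illustrative only. The helper name `setup_file_based_models`, its two arguments, and the assumption that `config/` sits directly under the {es} home directory (archive install layout) are hypothetical and do not come from the docs. It assumes the artifact files were already downloaded in step 1. It does not download anything or restart nodes.

```shell
#!/bin/sh
# Hypothetical sketch of steps 2 and 3 of the file-based access procedure.
# Assumptions (not from the docs): artifact files are already downloaded into
# ARTIFACT_DIR, and config/ lives directly under ES_HOME (archive layout).
set -eu

# setup_file_based_models ES_HOME ARTIFACT_DIR   (hypothetical helper)
setup_file_based_models() {
  es_home=$1
  artifact_dir=$2
  models_dir="$es_home/config/models"
  config_file="$es_home/config/elasticsearch.yml"
  # ${path.home} is resolved by Elasticsearch, not the shell: keep it literal.
  setting='xpack.ml.model_repository: file://${path.home}/config/models/'

  touch "$config_file"
  # Refuse to add a second, conflicting model_repository key.
  if grep -q '^xpack\.ml\.model_repository:' "$config_file" &&
     ! grep -qxF "$setting" "$config_file"; then
    echo "model_repository already set to another value in $config_file" >&2
    return 1
  fi

  # Step 2: copy the artifact files into the config/models subdirectory.
  mkdir -p "$models_dir"
  cp "$artifact_dir"/* "$models_dir"/

  # Step 3: append the setting once, making sure it starts on a fresh line
  # even if the existing file lacks a trailing newline.
  if ! grep -qxF "$setting" "$config_file"; then
    if [ -s "$config_file" ] && [ -n "$(tail -c 1 "$config_file")" ]; then
      printf '\n' >> "$config_file"
    fi
    printf '%s\n' "$setting" >> "$config_file"
  fi
}
```

You would run something like `setup_file_based_models /path/to/elasticsearch ./rerank-artifacts` on each master-eligible node (step 4), then do the rolling restart (step 5). The helper is idempotent and stops before touching anything if a different `xpack.ml.model_repository` value is already configured, because a duplicate key in `elasticsearch.yml` would be rejected at startup.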