2 changes: 2 additions & 0 deletions explore-analyze/elastic-inference/eis-supported-models.md
@@ -76,7 +76,9 @@
| Jina Embeddings v5 Small {applies_to}`stack: ga 9.3+` | 6,000 | 6,000,000 | 600,000 | Limits are applied to both requests per minute and tokens per minute, whichever limit is reached first. |
| Jina Embeddings v3 {applies_to}`stack: ga 9.3+` | 6,000 | 6,000,000 | 600,000 | Limits are applied to both requests per minute and tokens per minute, whichever limit is reached first. |
| Jina Embeddings v5 (Small) {applies_to}`stack: ga 9.3+` | 6,000 | 6,000,000 | 600,000 | Limits are applied to both requests per minute and tokens per minute, whichever limit is reached first. |
| Jina Embeddings v5 (Nano) {applies_to}`stack: ga 9.3+` | 6,000 | 6,000,000 | 600,000 | Limits are applied to both requests per minute and tokens per minute, whichever limit is reached first. |
| Jina Embeddings v5 Omni Nano {applies_to}`stack: ga 9.5+` | 6,000 | 6,000,000 | 600,000 | Limits are applied to both requests per minute and tokens per minute, whichever limit is reached first. Audio, video, and PDF inputs require {{stack}} 9.5+ (transport version `inference_api_audio_video_pdf_support`). |
| Jina Embeddings v5 Omni Small {applies_to}`stack: ga 9.5+` | 6,000 | 6,000,000 | 600,000 | Limits are applied to both requests per minute and tokens per minute, whichever limit is reached first. Audio, video, and PDF inputs require {{stack}} 9.5+ (transport version `inference_api_audio_video_pdf_support`). |
| Jina Reranker v2 {applies_to}`stack: ga 9.3+` | 600 | - | 6,000,000 | Limits are applied to both requests per minute and tokens per minute, whichever limit is reached first. |
| Jina Reranker v3 {applies_to}`stack: ga 9.3+` | 600 | - | 6,000,000 | Limits are applied to both requests per minute and tokens per minute, whichever limit is reached first. |

2 changes: 2 additions & 0 deletions explore-analyze/elastic-inference/embedding-models.csv
@@ -4,6 +4,8 @@ Elastic,ELSER v2,elser_model_2,[ELSER docs](https://www.elastic.co/docs/explore-
Jina,Embeddings v3,jina-embeddings-v3,[jina-embeddings-v3](https://jina.ai/models/jina-embeddings-v3/),[Elastic Terms](https://www.elastic.co/legal/terms-of-use),Text,Embedding,,0,No,"US, SG, EU",Generally Available,9.3
Jina,Embeddings v5 Text Nano,jina-embeddings-v5-text-nano,[jina-embeddings-v5-text-nano](https://huggingface.co/jinaai/jina-embeddings-v5-text-nano),[Elastic Terms](https://www.elastic.co/legal/terms-of-use),Text,Embedding,,0,No,"US, SG, EU",Generally Available,9.3
Jina,Embeddings v5 Text Small,jina-embeddings-v5-text-small,[jina-embeddings-v5-text-small](https://huggingface.co/jinaai/jina-embeddings-v5-text-small),[Elastic Terms](https://www.elastic.co/legal/terms-of-use),Text,Embedding,,0,No,"US, SG, EU",Generally Available,9.3
Jina,Embeddings v5 Omni Nano,jina-embeddings-v5-omni-nano,[jina-embeddings-v5-omni-nano](https://jina.ai/models/jina-embeddings-v5-omni-nano/),[Elastic Terms](https://www.elastic.co/legal/terms-of-use),"Text, Image, Audio, Video, PDF",Embedding,,0,No,"US, SG, EU",Generally Available,9.5
Jina,Embeddings v5 Omni Small,jina-embeddings-v5-omni-small,[jina-embeddings-v5-omni-small](https://jina.ai/models/jina-embeddings-v5-omni-small/),[Elastic Terms](https://www.elastic.co/legal/terms-of-use),"Text, Image, Audio, Video, PDF",Embedding,,0,No,"US, SG, EU",Generally Available,9.5
Google,Gemini Embedding v1,google-gemini-embedding-001,[Gemini Embedding 001](https://deepmind.google/research/publications/157741/),[Google terms](https://cloud.google.com/terms),Text,Text,,55 days,No,US,Generally Available,9.3
Microsoft,Multilingual E5 Large,microsoft-multilingual-e5-large,[Multilingual E5 Large System Card](https://huggingface.co/intfloat/e5-large-v2),DeepInfra terms,Text,Embedding,,0,No,US,Generally Available,9.3
OpenAI,Text Embedding 003 Large,openai-text-embedding-3-large,[Text Embedding 003 Large](https://platform.openai.com/docs/models/text-embedding-3-large),[OpenAI terms](https://openai.com/en-GB/policies/row-terms-of-use/),Text,Text,,Unknown,No,US,Generally Available,9.3
169 changes: 165 additions & 4 deletions explore-analyze/machine-learning/nlp/ml-nlp-jina.md
@@ -15,13 +15,19 @@
Jina models are currently available only through [Elastic {{infer-cap}} Service (EIS)](/explore-analyze/elastic-inference/eis.md) or [external {{infer}}](docs-content://explore-analyze/elastic-inference/external.md) providers. Since these models rely on external connectivity, they cannot currently be deployed on [{{ml}} nodes](/deploy-manage/distributed-architecture/clusters-nodes-shards/node-roles.md#ml-node-role) and are not compatible with fully air-gapped environments.
:::

:::{tip}
If you might need to search images, audio, video, or PDFs alongside text, start with a `jina-embeddings-v5-omni-*` model. The v5 omni models share the same text embedding space as their matching v5 text models, so existing `v5-text-*` vectors can be compared with text vectors from the corresponding omni model without reindexing.

:::

Currently, the following models are available as built-in models:

**Embedding models**

* [`jina-embeddings-v5-text-small`](#jina-embeddings-v5-text-small)
* [`jina-embeddings-v5-text-nano`](#jina-embeddings-v5-text-nano)
* [`jina-embeddings-v3`](#jina-embeddings-v3)
* [`jina-embeddings-v5-omni-small`](#jina-embeddings-v5-omni-small) — multimodal (text, image, audio, video, PDF)
* [`jina-embeddings-v5-omni-nano`](#jina-embeddings-v5-omni-nano) — multimodal (text, image, audio, video, PDF)
* [`jina-embeddings-v5-text-small`](#jina-embeddings-v5-text-small) — text-only
* [`jina-embeddings-v5-text-nano`](#jina-embeddings-v5-text-nano) — text-only
* [`jina-embeddings-v3`](#jina-embeddings-v3) — text-only

**Rerankers**

@@ -32,7 +38,161 @@

Embedding models convert text into vector embeddings, which are fixed-length numerical representations that capture semantic meaning.
Texts with similar meaning are mapped to nearby points in vector space, so you can retrieve relevant documents with vector similarity search.
When you send text to an EIS {{infer}} endpoint that uses an embedding model, the model returns a vector of floating-point numbers (for example, 1024 values). {{es}} stores these vectors in [`dense_vector`](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md) fields or through the [`semantic_text`](elasticsearch://reference/elasticsearch/mapping-reference/semantic-text.md) filed and uses vector similarity search to retrieve the most relevant documents for a given query. Unlike [ELSER](/explore-analyze/machine-learning/nlp/ml-nlp-elser.md), which expands text into sparse token-weight vectors, these models produce compact dense vectors that are well suited for multilingual and cross-domain use cases.
When you send text to an EIS {{infer}} endpoint that uses an embedding model, the model returns a vector of floating-point numbers (for example, 1024 values). {{es}} stores these vectors in [`dense_vector`](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md) fields or through the [`semantic_text`](elasticsearch://reference/elasticsearch/mapping-reference/semantic-text.md) field and uses vector similarity search to retrieve the most relevant documents for a given query. Unlike [ELSER](/explore-analyze/machine-learning/nlp/ml-nlp-elser.md), which expands text into sparse token-weight vectors, these models produce compact dense vectors that are well suited for multilingual and cross-domain use cases.

### Jina v5 omni embedding models [jina-embeddings-v5-omni]

The `jina-embeddings-v5-omni-*` models accept **text, image, audio, video, and PDF** inputs and place all supported input types in a shared vector space. Use them when you need cross-modal retrieval, such as querying a text index with an image or finding videos from a text query.

The v5 omni models are available through Elastic {{infer-cap}} Service (EIS), so no {{ml}} node scaling or model deployment is required.

#### `jina-embeddings-v5-omni-small` [jina-embeddings-v5-omni-small]

```{applies_to}
stack: ga 9.5
serverless: ga
```

[`jina-embeddings-v5-omni-small`](https://www.elastic.co/search-labs/blog/jina-embeddings-v5-omni-all-media-one-index) is the recommended Jina embedding model for deployments that need higher-quality mixed-media search. It produces 1024-dimension embeddings by default, supports a 32,768-token input context window, and uses the same text embedding space as [`jina-embeddings-v5-text-small`](#jina-embeddings-v5-text-small).

For more information about the model, refer to the [Elastic blog post](https://www.elastic.co/search-labs/blog/jina-embeddings-v5-omni-all-media-one-index) or the [model page](https://jina.ai/models/jina-embeddings-v5-omni-small/).

#### `jina-embeddings-v5-omni-nano` [jina-embeddings-v5-omni-nano]

```{applies_to}
stack: ga 9.5
serverless: ga
```

[`jina-embeddings-v5-omni-nano`](https://www.elastic.co/search-labs/blog/jina-embeddings-v5-omni-all-media-one-index) is the compact, lower-cost member of the Jina v5 omni family. It produces 768-dimension embeddings by default, supports a 32,768-token input context window, and uses the same text embedding space as [`jina-embeddings-v5-text-nano`](#jina-embeddings-v5-text-nano).

For more information about the model, refer to the [Elastic blog post](https://www.elastic.co/search-labs/blog/jina-embeddings-v5-omni-all-media-one-index) or the [model page](https://jina.ai/models/jina-embeddings-v5-omni-nano/).

#### Requirements [jina-embeddings-v5-omni-req]

To use a v5 omni model, you must have the [appropriate subscription](https://www.elastic.co/subscriptions) level. {{ecloud}} trial accounts cannot use the v5 omni models; start a paid {{ecloud}} deployment or {{serverless-short}} project to access them.

All input types require {{stack}} 9.5 or later.

#### Getting started with v5 omni models through Elastic {{infer-cap}} Service

For text input, the recommended entry point is a `semantic_text` field that references one of the preconfigured v5 omni {{infer}} endpoints. {{es}} provisions the endpoint on first reference.

Create an index with a `semantic_text` field:

```console
PUT multimodal-semantic-index
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": ".jina-embeddings-v5-omni-small"
      }
    }
  }
}
```

Index documents normally. {{es}} generates embeddings through the {{infer}} endpoint:

```console
POST multimodal-semantic-index/_doc
{
  "content": "'Kraft Dinner' is what Canadians call macaroni and cheese when prepared from a kit."
}
```

Query the field with a `semantic` query:

```console
GET multimodal-semantic-index/_search
{
  "query": {
    "semantic": {
      "field": "content",
      "query": "Was bedeutet 'Kraft Dinner' für Kanadier?"
    }
  }
}
```

To use `jina-embeddings-v5-omni-nano`, set `inference_id` to `.jina-embeddings-v5-omni-nano` instead.
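
As a sketch, the equivalent mapping with the nano endpoint might look like the following (the index name is illustrative):

```console
PUT multimodal-semantic-index-nano
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": ".jina-embeddings-v5-omni-nano"
      }
    }
  }
}
```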

To create an explicit {{infer}} endpoint instead of using the preconfigured endpoint, use the `embedding` task type:

```console
PUT _inference/embedding/eis-jina-embeddings-v5-omni-small
{
  "service": "elastic",
  "service_settings": {
    "model_id": "jina-embeddings-v5-omni-small"
  }
}
```
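
As a minimal usage sketch, you can then send text to the endpoint created above. The single-item text form shown here is an assumption based on the structured `input` format described in the next section, and the example text is illustrative:

```console
POST _inference/embedding/eis-jina-embeddings-v5-omni-small
{
  "input": [
    { "content": { "type": "text", "value": "Macaroni and cheese prepared from a boxed kit" } }
  ]
}
```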

#### Multimodal ingestion and querying [jina-embeddings-v5-omni-multimodal]

`semantic_text` ingests text content. To embed image, audio, video, or PDF input, or to issue a cross-modal query against a text index, call the {{infer}} endpoint directly and store or compare the resulting vector against a `dense_vector` field.

The request body is a structured `input` array. Each element holds a `content` object describing one piece of media, and a single request can include up to 16 input items. Media values are base64-encoded data URIs:

```console
POST _inference/embedding/.jina-embeddings-v5-omni-small
{
  "input": [
    { "content": { "type": "image", "format": "base64", "value": "data:image/png;base64,iVBORw0KGgo..." } },
    { "content": { "type": "audio", "format": "base64", "value": "data:audio/wav;base64,UklGRiQAAAB..." } },
    { "content": { "type": "video", "format": "base64", "value": "data:video/mp4;base64,AAAAIGZ0eXA..." } },
    { "content": { "type": "pdf", "format": "base64", "value": "data:application/pdf;base64,JVBE..." } }
  ]
}
```

To combine several media items into a single embedding, pass an array of content fields under one `input` element:

```console
POST _inference/embedding/.jina-embeddings-v5-omni-small
{
  "input": [
    {
      "content": [
        { "type": "text", "value": "A description of the scene" },
        { "type": "image", "format": "base64", "value": "data:image/png;base64,iVBORw0KGgo..." },
        { "type": "audio", "format": "base64", "value": "data:audio/wav;base64,UklGRiQAAAB..." }
      ]
    }
  ]
}
```

The response is shaped `{"embeddings": [{"embedding": [...]}, ...]}`. The array length matches the number of input items, except for PDF input, which produces one embedding per page.
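
To store a returned vector for later retrieval against a `dense_vector` field, one possible sketch is the following. It assumes the small model's default 1024 dimensions and cosine similarity; the index name, field name, and truncated query vector are illustrative:

```console
PUT multimodal-vector-index
{
  "mappings": {
    "properties": {
      "media_embedding": {
        "type": "dense_vector",
        "dims": 1024,
        "index": true,
        "similarity": "cosine"
      }
    }
  }
}
```

Index each embedding returned by the `_inference` call into `media_embedding`, then query it with a vector produced from any supported input type:

```console
GET multimodal-vector-index/_search
{
  "knn": {
    "field": "media_embedding",
    "query_vector": [0.013, -0.042, ...],
    "k": 10,
    "num_candidates": 100
  }
}
```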

#### Upgrading from `jina-embeddings-v5-text-*` [jina-embeddings-v5-omni-migrate]

The v5 omni models share their text embedding space with the matching v5 text models. Existing `dense_vector` data populated by `jina-embeddings-v5-text-*` remains directly comparable to vectors produced from text input by the corresponding omni model, so no reindex is required.

Use the matching pair:

* `jina-embeddings-v5-omni-small` with `jina-embeddings-v5-text-small` (1024 dimensions)
* `jina-embeddings-v5-omni-nano` with `jina-embeddings-v5-text-nano` (768 dimensions)

Do not mix across the small and nano families, because their vector spaces and dimensions differ.

For `semantic_text` mappings, set the `inference_id` to the corresponding `.jina-embeddings-v5-omni-*` endpoint on new indices. Existing indices continue to work unchanged.

For code that calls `_inference` directly, the task type changes from `text_embedding` to `embedding`, and the request body changes from a flat string `input` to an array of `content` objects. See [Multimodal ingestion and querying](#jina-embeddings-v5-omni-multimodal).
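
As a hedged before-and-after sketch (the endpoint IDs assume the preconfigured naming used elsewhere on this page), an existing call such as:

```console
POST _inference/text_embedding/.jina-embeddings-v5-text-small
{
  "input": "A description of the scene"
}
```

becomes:

```console
POST _inference/embedding/.jina-embeddings-v5-omni-small
{
  "input": [
    { "content": { "type": "text", "value": "A description of the scene" } }
  ]
}
```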

For pure text workloads, keep using `v5-text-*` endpoints. Use `v5-omni-*` when mixed media is in scope.

#### Performance considerations [jina-embeddings-v5-omni-performance]

* Use `jina-embeddings-v5-omni-small` when retrieval quality is the main priority. Use `jina-embeddings-v5-omni-nano` when ingestion volume, latency, or cost is the main constraint.
* Each {{infer}} request can contain up to 16 input items.
* Image inputs must be at least 28×28 pixels (784 pixels total).
* PDF inputs return one embedding per page.
* Video is sampled at 32 uniformly spaced frames regardless of clip length. For long videos, segment into shorter clips for finer temporal resolution.
* Although the models support a 32,768-token context window, consider chunking very large text fields to control latency and cost.

### `jina-embeddings-v5-text-small` [jina-embeddings-v5-text-small]

@@ -270,6 +430,7 @@

The following blog posts provide additional background and context:

* [jina-embeddings-v5-omni for text, images, video, and audio](https://www.elastic.co/search-labs/blog/jina-embeddings-v5-omni-all-media-one-index)
* [jina-embeddings-v5-text: Compact state-of-the-art text embeddings for search and intelligent applications](https://www.elastic.co/search-labs/blog/jina-embeddings-v5-text)
* [Jina rerankers bring fast, multilingual reranking to Elastic Inference Service (EIS)](https://www.elastic.co/search-labs/blog/jina-rerankers-elastic-inference-service)
* [jina-embeddings-v3 is now available on Elastic Inference Service](https://www.elastic.co/search-labs/blog/jina-embeddings-v3-elastic-inference-service)