# Simplified Vector Search (kNN) Implementation Guide


# Loading the Embedding Model
This guide uses the [sentence-transformers/all-distilroberta-v1](https://huggingface.co/sentence-transformers/all-distilroberta-v1) embedding model.

The loading code is borrowed from the [elasticsearch-labs](https://www.elastic.co/search-labs) NLP text search [example notebook](https://colab.research.google.com/github/elastic/elasticsearch-labs/blob/main/notebooks/integrations/hugging-face/loading-model-from-hugging-face.ipynb).


In [None]:
!pip install torch==2.2.0

In [2]:
# import modules
import pandas as pd, json
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk
from getpass import getpass
from urllib.request import urlopen
from pprint import pprint

In [3]:
API_KEY = getpass("Elasticsearch API key: ")  # prompt for the key instead of hard-coding a credential
HUB_MODEL_ID = "sentence-transformers/all-distilroberta-v1"
es = Elasticsearch("http://localhost:9200", api_key=API_KEY)
es.info()  # should return cluster info

ObjectApiResponse({'name': 'e230fd301e9b', 'cluster_name': 'docker-cluster', 'cluster_uuid': 'S3u4N1xWQJ--WBh1mpUODQ', 'version': {'number': '9.0.0', 'build_flavor': 'default', 'build_type': 'docker', 'build_hash': '112859b85d50de2a7e63f73c8fc70b99eea24291', 'build_date': '2025-04-08T15:13:46.049795831Z', 'build_snapshot': False, 'lucene_version': '10.1.0', 'minimum_wire_compatibility_version': '8.18.0', 'minimum_index_compatibility_version': '8.0.0'}, 'tagline': 'You Know, for Search'})

In [None]:
!eland_import_hub_model --url http://localhost:9200 --hub-model-id $HUB_MODEL_ID --task-type text_embedding --es-api-key $API_KEY --start

# Ingest pipeline setup

The pipeline does three things:

1. Maps the field we want to embed, my_text, to the input name the embedding model expects (text_field in this case).
2. Selects the model with model_id, which is the name of the model within Elasticsearch.
3. Records any inference errors on the document so they can be monitored.

In [11]:
pipeline = {
    "processors": [
        {
            "inference": {
                "field_map": {"my_text": "text_field"},             # map model's text_field to my_text
                "model_id": "sentence-transformers__all-distilroberta-v1",
                "target_field": "ml.inference.my_vector",   # map model's output to my_vector
                "on_failure": [
                    {
                        "append": {
                            "field": "_source._ingest.inference_errors",
                            "value": [
                                {
                                    "message": "Processor 'inference' in pipeline 'vector_embedding_demo' failed with message '{{ _ingest.on_failure_message }}'",
                                    "pipeline": "vector_embedding_demo",
                                    "timestamp": "{{{ _ingest.timestamp }}}",
                                }
                            ],
                        }
                    }
                ],
            }
        },
        {
            "set": { # set the value of my_vector to the predicted_value
                "field": "my_vector",
                "if": "ctx?.ml?.inference != null && ctx.ml.inference['my_vector'] != null",    # check if the predicted_value is not null
                "copy_from": "ml.inference.my_vector.predicted_value", # copy the predicted_value to my_vector
                "description": "Copy the predicted_value to 'my_vector'", # description of the processor
            }
        },
        {"remove": {
            "field": "ml.inference.my_vector", # remove the ml.inference.my_vector field
            "ignore_missing": True # ignore the missing field
            }
        },
    ]
}

pipeline_id = "vector_embedding_demo"
response = es.ingest.put_pipeline(id=pipeline_id, body=pipeline)

# Print the response
print(response)

{'acknowledged': True}
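To make the data flow through the three processors concrete, here is a plain-Python sketch of what they do to a single document. It does not call Elasticsearch: `fake_embed` and `run_pipeline_locally` are hypothetical names, and the tiny vector is only a stand-in for the model's real output.

```python
import copy

def fake_embed(text):
    # Hypothetical stand-in for the deployed model: returns a tiny vector.
    return [float(len(text)), 1.0, 0.0]

def run_pipeline_locally(doc, embed=fake_embed):
    """Mimic the inference, set, and remove processors on a plain dict."""
    doc = copy.deepcopy(doc)
    # inference: read my_text, write the result under ml.inference.my_vector
    doc.setdefault("ml", {}).setdefault("inference", {})["my_vector"] = {
        "predicted_value": embed(doc["my_text"])
    }
    # set: copy predicted_value to the top-level my_vector field
    result = doc["ml"]["inference"].get("my_vector")
    if result is not None:
        doc["my_vector"] = result["predicted_value"]
    # remove: drop the intermediate field; an empty ml.inference object remains
    doc["ml"]["inference"].pop("my_vector", None)
    return doc

print(run_pipeline_locally({"my_text": "Hey, careful, man, there's a beverage here!"}))
```

The empty ml.inference object left behind in this sketch is consistent with the `'ml': {'inference': {}}` entry that appears in the search hits later in this guide.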


# Index Mapping / Template setup

Embeddings (vectors) are stored in the dense_vector field type in Elasticsearch. Next we will configure the index template before indexing documents and generating embeddings.

The API call below creates an index template that matches any index whose name fits the pattern my_vector_index-*.

It will:

1. Configure my_vector as a dense_vector field, as outlined in the documentation.
2. Exclude the vector field from _source, as recommended.
3. Include one text field, my_text in this example, which is the source the embedding is generated from.
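The mapping in the next cell uses dot_product similarity, which expects unit-length vectors. As a rough illustration of how a similarity value becomes a hit score, the sketch below normalizes two toy vectors and applies the formula the Elasticsearch documentation gives for float vectors, `(1 + dot_product) / 2`. The helper names and the 2-dimensional vectors are assumptions made for this example only.

```python
import math

def normalize(vec):
    """Scale a vector to unit length, as dot_product similarity expects."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def dot_product_score(query, doc):
    """Documented _score for float vectors: (1 + dot_product(query, doc)) / 2."""
    dot = sum(q * d for q, d in zip(query, doc))
    return (1 + dot) / 2

q = normalize([3.0, 4.0])  # toy query vector
d = normalize([4.0, 3.0])  # toy document vector
print(round(dot_product_score(q, d), 4))  # 0.98
```

Identical unit vectors score 1.0 under this formula, so the scores in the query results below can be read as values between 0 and 1.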

In [13]:
index_patterns = ["my_vector_index-*"]

priority = 1

settings = {
    "index.default_pipeline": pipeline_id,
}

mappings = {
    "properties": {
        "my_vector": {"type": "dense_vector", "dims": 768, "index": True, "similarity": "dot_product"},
        "my_text": {"type": "text"},
    },
    "_source": {"excludes": ["my_vector"]},
}

# Create the index template using put_index_template
response = es.indices.put_index_template(
    name="my_vector_index_template",  # Template name
    index_patterns=index_patterns,
    priority=priority,
    template={
        "settings": settings,
        "mappings": mappings,
    },
)

# Print the response
print(response)

{'acknowledged': True}


# Indexing Data


In [15]:
index_name = "my_vector_index-01"

data = [
    ("Hey, careful, man, there's a beverage here!", "The Dude"),
    (
        "I’m The Dude. So, that’s what you call me. You know, that or, uh, His Dudeness, or, uh, Duder, or El Duderino, if you’re not into the whole brevity thing",
        "The Dude",
    ),
    (
        "You don't go out looking for a job dressed like that? On a weekday?",
        "The Big Lebowski",
    ),
    ("What do you mean brought it bowling, Dude?", "Walter Sobchak"),
    (
        "Donny was a good bowler, and a good man. He was one of us. He was a man who loved the outdoors... and bowling, and as a surfer he explored the beaches of Southern California, from La Jolla to Leo Carrillo and... up to... Pismo",
        "Walter Sobchak",
    ),
]

actions = [
    {
        "_op_type": "index",
        "_index": index_name,
        "_source": {"my_text": text, "my_metadata": metadata},
    }
    for text, metadata in data
]

bulk(es, actions)

# Refresh the index to make sure all data is searchable
es.indices.refresh(index=index_name)

ObjectApiResponse({'_shards': {'total': 2, 'successful': 1, 'failed': 0}})
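For larger datasets, the list of actions could be replaced with a generator so that actions are produced lazily; the `bulk` helper accepts any iterable of actions. A minimal sketch, where `generate_actions` is a hypothetical helper name and the sample row is taken from the data above:

```python
def generate_actions(rows, index_name):
    """Yield one bulk index action per (text, metadata) pair."""
    for text, metadata in rows:
        yield {
            "_op_type": "index",
            "_index": index_name,
            "_source": {"my_text": text, "my_metadata": metadata},
        }

sample_rows = [("What do you mean brought it bowling, Dude?", "Walter Sobchak")]
sample_actions = list(generate_actions(sample_rows, "my_vector_index-01"))
print(sample_actions[0]["_source"]["my_metadata"])
```

With a live client, `bulk(es, generate_actions(data, index_name))` would take the place of `bulk(es, actions)`.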

# Querying Data


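Approximate k-nearest neighbor (kNN) search. Here k is the number of nearest neighbors to return, and num_candidates is the number of candidates considered on each shard before the top k are selected.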

In [20]:
knn = {
    "field": "my_vector",
    "k": 2,
    "num_candidates": 5,
    "query_vector_builder": {
        "text_embedding": {
            "model_id": "sentence-transformers__all-distilroberta-v1",
            "model_text": "Watchout I have a drink",    # Frontend will pass the text to be embedded
        }
    },
}

response = es.search(index=index_name, knn=knn, source=True)

pprint(response["hits"]["hits"])

[{'_id': 'ndmhrpYBca9xMcEAnock',
  '_index': 'my_vector_index-01',
  '_score': 0.7825787,
  '_source': {'ml': {'inference': {}},
              'my_metadata': 'The Dude',
              'my_text': "Hey, careful, man, there's a beverage here!"}},
 {'_id': 'ntmhrpYBca9xMcEAnock',
  '_index': 'my_vector_index-01',
  '_score': 0.60257983,
  '_source': {'ml': {'inference': {}},
              'my_metadata': 'The Dude',
              'my_text': 'I’m The Dude. So, that’s what you call me. You know, '
                         'that or, uh, His Dudeness, or, uh, Duder, or El '
                         'Duderino, if you’re not into the whole brevity '
                         'thing'}}]


In [21]:
response

ObjectApiResponse({'took': 8, 'timed_out': False, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 2, 'relation': 'eq'}, 'max_score': 0.7825787, 'hits': [{'_index': 'my_vector_index-01', '_id': 'ndmhrpYBca9xMcEAnock', '_score': 0.7825787, '_source': {'my_text': "Hey, careful, man, there's a beverage here!", 'my_metadata': 'The Dude', 'ml': {'inference': {}}}}, {'_index': 'my_vector_index-01', '_id': 'ntmhrpYBca9xMcEAnock', '_score': 0.60257983, '_source': {'my_text': 'I’m The Dude. So, that’s what you call me. You know, that or, uh, His Dudeness, or, uh, Duder, or El Duderino, if you’re not into the whole brevity thing', 'my_metadata': 'The Dude', 'ml': {'inference': {}}}}]}})
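The raw response is verbose, so a small helper that keeps only the score and text of each hit can make results easier to scan. The sketch below assumes the search was run with _source enabled; `summarize_hits` is a hypothetical name, and the sample dict is a trimmed copy of the response shown above.

```python
def summarize_hits(response):
    """Return (score, text) pairs from a search response that includes _source."""
    return [
        (hit["_score"], hit["_source"]["my_text"])
        for hit in response["hits"]["hits"]
    ]

# Trimmed copy of the response shown above, as a plain dict.
sample_response = {
    "hits": {
        "hits": [
            {
                "_id": "ndmhrpYBca9xMcEAnock",
                "_score": 0.7825787,
                "_source": {
                    "my_text": "Hey, careful, man, there's a beverage here!",
                    "my_metadata": "The Dude",
                },
            },
        ]
    }
}

for score, text in summarize_hits(sample_response):
    print(f"{score:.3f}  {text}")
```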

## Hybrid Searching (kNN + BM25) with RRF

In [17]:
query = {"match": {"my_text": "bowling"}}   # Keyword search query

knn = {
    "field": "my_vector",
    "k": 3,
    "num_candidates": 5,
    "query_vector_builder": {
        "text_embedding": {
            "model_id": "sentence-transformers__all-distilroberta-v1",
            "model_text": "He enjoyed the game",    # Semantic search query
        }
    },
}

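# Note: this dict is never passed to es.search below, so RRF is not applied;
# the scores shown are the sum of the BM25 and kNN scores.
rank = {"rrf": {}}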

fields = ["my_text", "my_metadata"]


response = es.search(
    index=index_name, fields=fields, knn=knn, query=query, size=2, source=False
)

pprint(response["hits"]["hits"])

[{'_id': 'oNmhrpYBca9xMcEAnock',
  '_index': 'my_vector_index-01',
  '_score': 1.8420708,
  'fields': {'my_metadata': ['Walter Sobchak'],
             'my_text': ['What do you mean brought it bowling, Dude?']}},
 {'_id': 'odmhrpYBca9xMcEAnock',
  '_index': 'my_vector_index-01',
  '_score': 1.2540475,
  'fields': {'my_metadata': ['Walter Sobchak'],
             'my_text': ['Donny was a good bowler, and a good man. He was one '
                         'of us. He was a man who loved the outdoors... and '
                         'bowling, and as a surfer he explored the beaches of '
                         'Southern California, from La Jolla to Leo Carrillo '
                         'and... up to... Pismo']}}]
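Reciprocal rank fusion (RRF), named in this section's heading, combines result lists by rank position rather than by raw score: each document receives `1 / (rank_constant + rank)` from every list it appears in. The following sketches that formula only, not Elasticsearch's implementation. The document ids and rankings are hypothetical, and 60 is the default rank constant given in the Elasticsearch documentation.

```python
from collections import defaultdict

def rrf(rankings, rank_constant=60):
    """Fuse ranked lists of document ids with reciprocal rank fusion."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    # Highest fused score first; ties are broken by id for determinism
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

bm25_ranking = ["doc_a", "doc_b"]          # hypothetical keyword results
knn_ranking = ["doc_b", "doc_c", "doc_a"]  # hypothetical vector results
fused = rrf([bm25_ranking, knn_ranking])
print([doc_id for doc_id, _ in fused])  # ['doc_b', 'doc_a', 'doc_c']
```

Because only ranks are used, the BM25 and kNN scores do not need to be on comparable scales.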


## Filtering

In [18]:
knn = {
    "field": "my_vector",
    "k": 1,
    "num_candidates": 5,
    "query_vector_builder": {
        "text_embedding": {
            "model_id": "sentence-transformers__all-distilroberta-v1",
            "model_text": "Did you bring the dog?",
        }
    },
    "filter": {"term": {"my_metadata.keyword": "The Dude"}},
}

fields = ["my_text", "my_metadata"]

response = es.search(index=index_name, fields=fields, knn=knn, source=False)

pprint(response["hits"]["hits"])

[{'_id': 'ndmhrpYBca9xMcEAnock',
  '_index': 'my_vector_index-01',
  '_score': 0.59394693,
  'fields': {'my_metadata': ['The Dude'],
             'my_text': ["Hey, careful, man, there's a beverage here!"]}}]
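The filter in the knn clause is applied as part of the vector search rather than to its output, which, per the Elasticsearch documentation, lets the search still return k matching documents. The toy sketch below contrasts the two orders using exact (brute-force) search in plain Python. The corpus, the 2-dimensional vectors, and the `top_k` helper are all hypothetical.

```python
def top_k(docs, query, k, keep=lambda doc: True):
    """Exact top-k by dot product over the documents that pass `keep`."""
    candidates = [doc for doc in docs if keep(doc)]
    candidates.sort(
        key=lambda doc: sum(q * v for q, v in zip(query, doc["vector"])),
        reverse=True,
    )
    return candidates[:k]

# Hypothetical toy corpus with 2-dimensional vectors
docs = [
    {"id": 1, "speaker": "Walter Sobchak", "vector": [0.9, 0.1]},
    {"id": 2, "speaker": "The Dude", "vector": [0.6, 0.4]},
    {"id": 3, "speaker": "The Dude", "vector": [0.1, 0.9]},
]
query = [1.0, 0.0]

def is_dude(doc):
    return doc["speaker"] == "The Dude"

filtered_first = top_k(docs, query, k=1, keep=is_dude)
filtered_after = [doc for doc in top_k(docs, query, k=1) if is_dude(doc)]
print([d["id"] for d in filtered_first], [d["id"] for d in filtered_after])  # [2] []
```

Filtering after the search can leave fewer than k results (none at all here), which is the case the in-search filter avoids.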


# Aggregations and selecting returned fields

In [19]:
knn = {
    "field": "my_vector",
    "k": 2,
    "num_candidates": 5,
    "query_vector_builder": {
        "text_embedding": {
            "model_id": "sentence-transformers__all-distilroberta-v1",
            "model_text": "did you bring it?",
        }
    },
}

aggs = {"metadata": {"terms": {"field": "my_metadata.keyword"}}}

fields = ["my_text", "my_metadata"]

response = es.search(index=index_name, fields=fields, aggs=aggs, knn=knn, source=False)

pprint(response["hits"]["hits"])

[{'_id': 'oNmhrpYBca9xMcEAnock',
  '_index': 'my_vector_index-01',
  '_score': 0.74338245,
  'fields': {'my_metadata': ['Walter Sobchak'],
             'my_text': ['What do you mean brought it bowling, Dude?']}},
 {'_id': 'ndmhrpYBca9xMcEAnock',
  '_index': 'my_vector_index-01',
  '_score': 0.6028073,
  'fields': {'my_metadata': ['The Dude'],
             'my_text': ["Hey, careful, man, there's a beverage here!"]}}]
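The cell above prints only the hits; the terms aggregation results are returned under the `aggregations` key of the same response. A minimal sketch for reading them follows. `bucket_counts` is a hypothetical helper, and the sample fragment is invented for illustration, since the real counts depend on the hits returned.

```python
def bucket_counts(response, agg_name):
    """Map each terms-aggregation bucket key to its document count."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}

# Hypothetical response fragment; real counts depend on the hits returned.
sample_agg_response = {
    "aggregations": {
        "metadata": {
            "buckets": [
                {"key": "The Dude", "doc_count": 1},
                {"key": "Walter Sobchak", "doc_count": 1},
            ]
        }
    }
}
print(bucket_counts(sample_agg_response, "metadata"))
```

With a live cluster, `bucket_counts(response, "metadata")` would read the buckets for the `metadata` aggregation defined in the cell above.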
