# **Lexical and Semantic Search with Elasticsearch**

In this example, you will explore various approaches to retrieving information using Elasticsearch, focusing specifically on text, lexical and semantic search.

To accomplish this, this example demonstrate various search scenarios on a dataset generated to simulate e-commerce product information.

This dataset contains over 2,500 products, each with a description. These products are categorized into 76 distinct product categories, with each category containing a varying number of products.

## **🧰 Requirements**

For this example, you will need:

- Python 3.6 or later
- The Elastic Python client
- Elastic 8.8 deployment or later, with 8GB memory machine learning node
- The Elastic Learned Sparse EncodeR model that comes pre-loaded into Elastic installed and started on your deployment

We'll be using [Elastic Cloud](https://www.elastic.co/guide/en/cloud/current/ec-getting-started.html), a [free trial](https://cloud.elastic.co/registration?onboarding_token=vectorsearch&utm_source=github&utm_content=elasticsearch-labs-notebook) is available.

## Setup Elasticsearch environment:

To get started, we'll need to connect to our Elastic deployment using the Python client.

Because we're using an Elastic Cloud deployment, we'll use the **Cloud ID** to identify our deployment.


In [None]:
%pip install elasticsearch

In [3]:
from elasticsearch import (
    Elasticsearch,
    helpers,
)  # Import the Elasticsearch client and helpers module
from urllib.request import urlopen  # library for opening URLs
import json  # module for handling JSON data
from pathlib import Path  # module for working with file paths

import getpass  # handling password input

Now we can instantiate the Python Elasticsearch client.

First we prompt the user for their password and Cloud ID.

🔐 NOTE: `getpass` enables us to securely prompt the user for credentials without echoing them to the terminal, or storing it in memory.

Then we create a `client` object that instantiates an instance of the `Elasticsearch` class.

In [None]:
# Found in the 'Manage Deployment' page
ELASTIC_ENDPOINT = getpass.getpass("Enter Elastic Endpoint:  ")

# Password for the 'elastic' user generated by Elasticsearch
ELASTIC_API_KEY = getpass.getpass("Enter Elastic API Key:  ")

# Create the client instance
client = Elasticsearch(
    hosts=[ELASTIC_ENDPOINT], api_key=ELASTIC_API_KEY, request_timeout=3600
)

In [5]:
print(client.info())

{'name': 'serverless', 'cluster_name': 'f15e57523cf84631a30f3aaf16c3ecf0', 'cluster_uuid': 'Wumo0cJWRZC8YfiHnVNwnQ', 'version': {'number': '8.11.0', 'build_flavor': 'serverless', 'build_type': 'docker', 'build_hash': '00000000', 'build_date': '2023-10-31', 'build_snapshot': False, 'lucene_version': '9.7.0', 'minimum_wire_compatibility_version': '8.11.0', 'minimum_index_compatibility_version': '8.11.0'}, 'tagline': 'You Know, for Search'}


## Setup emebdding model

Next we upload the all-mpnet-base-v2 embedding model into Elasticsearch and create an ingest pipeline with inference processors for text embedding and text expansion, using the description field for both. This field contains the description of each product.

In [20]:
# set the model to .multilingual-e5-small-elasticsearch
es_model_id = ".multilingual-e5-small"
es_model_endpoint = ".multilingual-e5-small-elasticsearch"

# verify the model is loaded, deployed, and ready to use
models = client.ml.get_trained_models()
for model in models["trained_model_configs"]:
    if model["model_id"] == es_model_id:
        print(f"Model ID: {model['model_id']}")
        print(f"Description: {model.get('description', 'No description')}")
        print(
            f"Inference Config: {model.get('inference_config', 'No inference config')}"
        )
        print(f"Version: {model.get('version', 'N/A')}")
        print(f"Tags: {model.get('tags', [])}")
        break
else:
    print(f"Model {es_model_id} not found.")

print("------")

inference_endpoint = client.inference.get(inference_id=es_model_endpoint)
inference_endpoint = inference_endpoint["endpoints"][0]
print(inference_endpoint)
print(f"Inference Endpoint ID: {es_model_endpoint}")
print(
    f"Model ID: {inference_endpoint.get('service_settings', {}).get('model_id', 'N/A')}"
)
print(f"Task Type: {inference_endpoint['task_type']}")

Model ID: .multilingual-e5-small
Description: E5 small multilingual
Inference Config: {'text_embedding': {'vocabulary': {'index': '.ml-inference-native-000002'}, 'tokenization': {'xlm_roberta': {'do_lower_case': False, 'with_special_tokens': True, 'max_sequence_length': 512, 'truncate': 'first', 'span': -1}}, 'embedding_size': 384}}
Version: 12.0.0
Tags: []
------
{'inference_id': '.multilingual-e5-small-elasticsearch', 'task_type': 'text_embedding', 'service': 'elasticsearch', 'service_settings': {'num_threads': 1, 'model_id': '.multilingual-e5-small_linux-x86_64', 'adaptive_allocations': {'enabled': True, 'min_number_of_allocations': 0, 'max_number_of_allocations': 32}}, 'chunking_settings': {'strategy': 'sentence', 'max_chunk_size': 250, 'sentence_overlap': 1}}
Inference Endpoint ID: .multilingual-e5-small-elasticsearch
Model ID: .multilingual-e5-small_linux-x86_64
Task Type: text_embedding


In [62]:
# Creating an ingest pipeline with inference processors to use ELSER (sparse) and all-mpnet-base-v2 (dense) to infer against data that will be ingested in the pipeline.

client.ingest.put_pipeline(
    id="ecommerce-pipeline",
    processors=[
        {
            "inference": {
                "model_id": ".elser-2-elasticsearch",
                "input_output": [
                    {
                        "input_field": "description",
                        "output_field": "elser_description_vector",
                    }
                ],
            }
        },
        {
            "inference": {
                "model_id": ".multilingual-e5-small-elasticsearch",  # Inference endpoint ID
                "input_output": [
                    {
                        "input_field": "description",
                        "output_field": "e5_description_vector",
                    }
                ],
                "inference_config": {"text_embedding": {}},
            }
        },
    ],
)

ObjectApiResponse({'acknowledged': True})

## Index documents

Then, we create a source index to load `products-ecommerce.json`, this will be the `ecommerce` index and a destination index to extract the documents from the source and index these documents into the destination `ecommerce-search`.

For the `ecommerce-search` index we add a field to support dense vector storage and search `description_vector.predicted_value`, this is the target field for inference results. The field type in this case is `dense_vector`, the `all-mpnet-base-v2` model has embedding_size of 768, so dims is set to 768. We also add a `rank_features` field type to support the text expansion output.

In [52]:
# Index to load products-ecommerce.json docs
if client.indices.exists(index="ecommerce"):
    client.indices.delete(index="ecommerce")

client.indices.create(
    index="ecommerce",
    mappings={
        "properties": {
            "product": {
                "type": "text",
            },
            "description": {
                "type": "text",
            },
            "category": {
                "type": "text",
            },
        }
    },
)

ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'ecommerce'})

In [63]:
# Reindex dest index

INDEX = "ecommerce-search"
if client.indices.exists(index=INDEX):
    client.indices.delete(index=INDEX)
client.indices.create(
    index=INDEX,
    mappings={
        # Saving disk space by excluding the ELSER tokens and the dense_vector field from document source.
        # Note: That should only be applied if you are certain that reindexing will not be required in the future.
        "properties": {
            "product": {
                "type": "text",
            },
            "description": {
                "type": "text",
            },
            "category": {
                "type": "text",
            },
            "elser_description_vector": {"type": "sparse_vector"},
            "e5_description_vector": {  # Inference results field, target_field.predicted_value
                "type": "dense_vector",
                "dims": 384,  # The all-mpnet-base-v2 model has embedding_size of 768, so dims is set to 768.
                "index": "true",
                "similarity": "cosine",  #  When indexing vectors for approximate kNN search, you need to specify the similarity function for comparing the vectors.
            },
        },
    },
)

ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'ecommerce-search'})

## Load documents

Then we load `products-ecommerce.json` into the `ecommerce` index.

In [55]:
#  dataset

import json

with open("products-ecommerce.json", "r") as f:
    data_json = json.load(f)


def create_index_body(doc):
    """Generate the body for an Elasticsearch document."""
    return {
        "_index": "ecommerce",
        "_source": doc,
    }


# Prepare the documents to be indexed
documents = [create_index_body(doc) for doc in data_json]

# Use helpers.bulk to index
helpers.bulk(client, documents)

print("Done indexing documents into `ecommerce` index")

Done indexing documents into `ecommerce` index


## Reindex

Now we can reindex data from the `source` index `ecommerce` to the `dest` index `ecommerce-search` with the ingest pipeline `ecommerce-pipeline` we created.

After this step our `dest` index will have the fields we need to perform Semantic Search.

In [64]:
# Reindex data from one index 'source' to another 'dest' with the 'ecommerce-pipeline' pipeline.

client.reindex(
    wait_for_completion=True,
    source={"index": "ecommerce"},
    dest={"index": "ecommerce-search", "pipeline": "ecommerce-pipeline"},
)

ObjectApiResponse({'took': 76252, 'timed_out': False, 'total': 2506, 'updated': 0, 'created': 2506, 'deleted': 0, 'batches': 3, 'version_conflicts': 0, 'noops': 0, 'retries': {'bulk': 0, 'search': 0}, 'throttled_millis': 0, 'requests_per_second': -1.0, 'throttled_until_millis': 0, 'failures': []})

## Text Analysis with Standard Analyzer

In [65]:
# Performs text analysis on a string and returns the resulting tokens.

# Define the text to be analyzed
text = "Comfortable furniture for a large balcony"

# Define the analyze request
request_body = {"analyzer": "standard", "text": text}  # Standard Analyzer

# Perform the analyze request
response = client.indices.analyze(
    analyzer=request_body["analyzer"], text=request_body["text"]
)

# Extract and display the analyzed tokens
tokens = [token["token"] for token in response["tokens"]]
print("Analyzed Tokens:", tokens)

Analyzed Tokens: ['comfortable', 'furniture', 'for', 'a', 'large', 'balcony']


## Text Analysis with Stop Analyzer

In [66]:
# Performs text analysis on a string and returns the resulting tokens.

# Define the text to be analyzed
text = "Comfortable furniture for a large balcony"

# Define the analyze request
request_body = {"analyzer": "stop", "text": text}  # Stop Analyzer

# Perform the analyze request
response = client.indices.analyze(
    analyzer=request_body["analyzer"], text=request_body["text"]
)

# Extract and display the analyzed tokens
tokens = [token["token"] for token in response["tokens"]]
print("Analyzed Tokens:", tokens)

Analyzed Tokens: ['comfortable', 'furniture', 'large', 'balcony']


## Lexical Search

In [67]:
# BM25

response = client.search(
    size=2,
    index="ecommerce-search",
    query={
        "match": {
            "description": {
                "query": "Comfortable furniture for a large balcony",
                "analyzer": "stop",
            }
        }
    },
)
hits = response["hits"]["hits"]

if not hits:
    print("No matches found")
else:
    for hit in hits:
        score = hit["_score"]
        product = hit["_source"]["product"]
        category = hit["_source"]["category"]
        description = hit["_source"]["description"]
        print(
            f"\nScore: {score}\nProduct: {product}\nCategory: {category}\nDescription: {description}\n"
        )


Score: 13.206358
Product: Barbie Dreamhouse
Category: Toys
Description: is a classic Barbie playset with multiple rooms, furniture, a large balcony, a pool, and accessories. It allows kids to create their dream Barbie world.


Score: 7.827815
Product: Comfortable Rocking Chair
Category: Indoor Furniture
Description: enjoy relaxing moments with this comfortable rocking chair. Its smooth motion and cushioned seat make it an ideal piece of furniture for unwinding.



## Semantic Search with Dense Vector

In [68]:
# KNN

response = client.search(
    index="ecommerce-search",
    size=2,
    knn={
        "field": "e5_description_vector",
        "k": 50,  # Number of nearest neighbors to return as top hits.
        "num_candidates": 500,  # Number of nearest neighbor candidates to consider per shard. Increasing num_candidates tends to improve the accuracy of the final k results.
        "query_vector_builder": {  # Object indicating how to build a query_vector. kNN search enables you to perform semantic search by using a previously deployed text embedding model.
            "text_embedding": {
                "model_id": ".multilingual-e5-small-elasticsearch",  # Text embedding model id
                "model_text": "Comfortable furniture for a large balcony",  # Query
            }
        },
    },
)

for hit in response["hits"]["hits"]:

    score = hit["_score"]
    product = hit["_source"]["product"]
    category = hit["_source"]["category"]
    description = hit["_source"]["description"]
    print(
        f"\nScore: {score}\nProduct: {product}\nCategory: {category}\nDescription: {description}\n"
    )


Score: 0.93147576
Product: Metal Garden Bench with Cushion
Category: Garden Furniture
Description: is a stylish and comfortable metal garden bench, complete with a cushion for added support.


Score: 0.9304026
Product: Garden Dining Set with Swivel Chairs
Category: Garden Furniture
Description: is a functional and comfortable garden dining set, including a table and chairs with swivel seats for convenience.



## Semantic Search with Sparse Vector

In [74]:
# Elastic Learned Sparse Encoder - ELSER

response = client.search(
    index="ecommerce-search",
    size=2,
    query={
        "sparse_vector": {
            "field": "elser_description_vector",
            "inference_id": ".elser-2-elasticsearch",
            "query": "Comfortable furniture for a large balcony",
        }
    },
)

for hit in response["hits"]["hits"]:

    score = hit["_score"]
    product = hit["_source"]["product"]
    category = hit["_source"]["category"]
    description = hit["_source"]["description"]
    print(
        f"\nScore: {score}\nProduct: {product}\nCategory: {category}\nDescription: {description}\n"
    )


Score: 11.354144
Product: Garden Lounge Set with Side Table
Category: Garden Furniture
Description: is a comfortable and stylish garden lounge set, including a sofa, chairs, and a side table for outdoor relaxation.


Score: 11.189863
Product: Garden Lounge Chair with Sunshade
Category: Garden Furniture
Description: is a comfortable and versatile garden lounge chair with a built-in sunshade, perfect for hot sunny days.



## Hybrid Search - BM25+KNN linear combination

In [96]:
# BM25 + KNN (Linear Combination)

response = client.search(
    index="ecommerce-search",
    size=2,
    query={
        "bool": {
            "should": [
                {
                    "match": {
                        "description": {
                            "query": "A dining table and comfortable chairs for a large balcony",
                            "boost": 1,  # You can adjust the boost value
                        }
                    }
                }
            ]
        }
    },
    knn={
        "field": "e5_description_vector",
        "k": 2,
        "num_candidates": 20,
        "boost": 1,  # You can adjust the boost value
        "query_vector_builder": {
            "text_embedding": {
                "model_id": ".multilingual-e5-small-elasticsearch",
                "model_text": "A dining table and comfortable chairs for a large balcony",
            }
        },
    },
)

for hit in response["hits"]["hits"]:

    score = hit["_score"]
    product = hit["_source"]["product"]
    category = hit["_source"]["category"]
    description = hit["_source"]["description"]
    print(
        f"\nScore: {score}\nProduct: {product}\nCategory: {category}\nDescription: {description}\n"
    )


Score: 18.161213
Product: Garden Dining Set with Swivel Chairs
Category: Garden Furniture
Description: is a functional and comfortable garden dining set, including a table and chairs with swivel seats for convenience.


Score: 17.770641
Product: Garden Dining Set with Swivel Rockers
Category: Garden Furniture
Description: is a functional and comfortable garden dining set, including a table and chairs with swivel rockers for easy movement.



## Hybrid Search - BM25+KNN RRF

In [100]:
# BM25 + KNN (RRF)
top_k = 2
response = client.search(
    index="ecommerce-search",
    retriever={
        "rrf": {
            "retrievers": [
                {
                    "standard": {
                        "query": {
                            "match": {
                                "description": "A dining table and comfortable chairs for a large balcony"
                            }
                        }
                    }
                },
                {
                    "knn": {
                        "field": "e5_description_vector",
                        "query_vector_builder": {
                            "text_embedding": {
                                "model_id": ".multilingual-e5-small",
                                "model_text": "A dining table and comfortable chairs for a large balcony",
                            }
                        },
                        "k": 2,
                        "num_candidates": 20,
                    }
                },
            ],
            "rank_window_size": 2,
            "rank_constant": 20,
        }
    },
)

for hit in response["hits"]["hits"]:

    score = hit["_score"]
    category = hit["_source"]["category"]
    product = hit["_source"]["product"]
    description = hit["_source"]["description"]
    print(
        f"Score: {score}\nProduct: {product}\nCategory: {category}\nDescription: {description}\n"
    )

Score: 0.09307359
Product: Garden Dining Set with Swivel Rockers
Category: Garden Furniture
Description: is a functional and comfortable garden dining set, including a table and chairs with swivel rockers for easy movement.

Score: 0.04761905
Product: Garden Dining Set with Swivel Chairs
Category: Garden Furniture
Description: is a functional and comfortable garden dining set, including a table and chairs with swivel seats for convenience.



## Hybrid Search - BM25+ELSER linear combination

In [104]:
# BM25 + Elastic Learned Sparse Encoder (Linear Combination)

response = client.search(
    index="ecommerce-search",
    size=2,
    query={
        "bool": {
            "should": [
                {
                    "match": {
                        "description": {
                            "query": "A dining table and comfortable chairs for a large balcony",
                            "boost": 1,  # You can adjust the boost value
                        }
                    }
                },
                {
                    "sparse_vector": {
                        "field": "elser_description_vector",
                        "inference_id": ".elser-2-elasticsearch",
                        "query": "A dining table and comfortable chairs for a large balcony",
                    }
                },
            ]
        }
    },
)

for hit in response["hits"]["hits"]:

    score = hit["_score"]
    product = hit["_source"]["product"]
    category = hit["_source"]["category"]
    description = hit["_source"]["description"]
    print(
        f"\nScore: {score}\nProduct: {product}\nCategory: {category}\nDescription: {description}\n"
    )


Score: 35.605705
Product: Garden Dining Set with Swivel Rockers
Category: Garden Furniture
Description: is a functional and comfortable garden dining set, including a table and chairs with swivel rockers for easy movement.


Score: 33.858994
Product: Garden Dining Set with Swivel Chairs
Category: Garden Furniture
Description: is a functional and comfortable garden dining set, including a table and chairs with swivel seats for convenience.



## Hybrid Search - BM25+ELSER RRF

In [107]:
# BM25 + ELSER (RRF)
top_k = 2
response = client.search(
    index="ecommerce-search",
    retriever={
        "rrf": {
            "retrievers": [
                {
                    "standard": {
                        "query": {
                            "match": {
                                "description": "A dining table and comfortable chairs for a large balcony"
                            }
                        }
                    }
                },
                {
                    "standard": {
                        "query": {
                            "sparse_vector": {
                                "field": "elser_description_vector",
                                "inference_id": ".elser-2-elasticsearch",
                                "query": "A dining table and comfortable chairs for a large balcony",
                            }
                        }
                    }
                },
            ],
            "rank_window_size": 2,
            "rank_constant": 20,
        }
    },
)

for hit in response["hits"]["hits"]:

    score = hit["_score"]
    category = hit["_source"]["category"]
    product = hit["_source"]["product"]
    description = hit["_source"]["description"]
    print(
        f"Score: {score}\nProduct: {product}\nCategory: {category}\nDescription: {description}\n"
    )

Score: 0.0952381
Product: Garden Dining Set with Swivel Rockers
Category: Garden Furniture
Description: is a functional and comfortable garden dining set, including a table and chairs with swivel rockers for easy movement.

Score: 0.045454547
Product: Garden Dining Set with Swivel Chairs
Category: Garden Furniture
Description: is a functional and comfortable garden dining set, including a table and chairs with swivel seats for convenience.

