# **Lexical and Semantic Search with Elasticsearch**

In this example, you will explore several approaches to retrieving information with Elasticsearch: text analysis, lexical search, semantic search, and hybrid search that combines the two.

Each approach is demonstrated on a generated dataset that simulates e-commerce product information.

The dataset contains over 2,500 products, each with a description. The products fall into 76 distinct categories, and the number of products per category varies.

## **🧰 Requirements**

For this example, you will need:

- Python 3.11 or later
- The Elastic Python client
- An Elastic deployment running version 9.0 or later, with a machine learning node that has 8 GB of memory


We'll be using [Elastic Cloud](https://www.elastic.co/guide/en/cloud/current/ec-getting-started.html). A [free trial](https://cloud.elastic.co/registration?onboarding_token=vectorsearch&utm_source=github&utm_content=elasticsearch-labs-notebook) is available.

## Set up the Elasticsearch environment

To get started, we'll need to connect to our Elastic deployment using the Python client.

Because we're using an Elastic Cloud deployment, we'll use the **Cloud Endpoint** and **Cloud API Key** to identify our deployment.


In [None]:
!pip install elasticsearch

In [None]:
# import the Elasticsearch client and bulk function
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

# import json module to read JSON file of products
import json  # module for handling JSON data

import getpass  # handling password input

# display search results in a table
import pandas as pd
from IPython.display import display, Markdown

Now we can instantiate the Python Elasticsearch client.

First we prompt the user for their Elastic endpoint and API key.

🔐 NOTE: `getpass` prompts for credentials without echoing them to the terminal. The values are still held in memory as ordinary Python strings for the lifetime of the session.

Then we create a `client` object, which is an instance of the `Elasticsearch` class.

In [None]:
# your endpoint for your Elasticsearch instance
ELASTIC_ENDPOINT = getpass.getpass("Enter Elastic Endpoint:  ")

# your Elastic API Key for Elasticsearch
ELASTIC_API_KEY = getpass.getpass("Enter Elastic API Key:  ")

# create the Elasticsearch client instance
client = Elasticsearch(
    hosts=[ELASTIC_ENDPOINT], api_key=ELASTIC_API_KEY, request_timeout=3600
)

Let's verify that our client is connected.

In [183]:
resp = client.ping()
print(f"Connected: {resp}")

Connected: True


## Define our embedding model

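Next we confirm that the `.multilingual-e5-small` embedding model is available in Elasticsearch and that its inference endpoint, `.multilingual-e5-small-elasticsearch`, is ready to use. In the following step we create an ingest pipeline with inference processors for text embedding and text expansion. Both processors read the `description` field, which contains the description of each product.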

In [None]:
# model ID and inference endpoint ID for the multilingual E5 small model
es_model_id = ".multilingual-e5-small"
es_model_endpoint = ".multilingual-e5-small-elasticsearch"

# verify the model is loaded, deployed, and ready to use
models = client.ml.get_trained_models()
for model in models["trained_model_configs"]:
    if model["model_id"] == es_model_id:
        print(f"Model ID: {model['model_id']}")
        print(f"Description: {model.get('description', 'No description')}")
        print(f"Version: {model.get('version', 'N/A')}")
        break
else:
    print(f"Model {es_model_id} not found.")

print("------")

# verify the inference endpoint is ready to use
inference_endpoint = client.inference.get(inference_id=es_model_endpoint)
inference_endpoint = inference_endpoint["endpoints"][0]
print(f"Inference Endpoint ID: {es_model_endpoint}")
print(
    f"Model ID: {inference_endpoint.get('service_settings', {}).get('model_id', 'N/A')}"
)
print(f"Task Type: {inference_endpoint['task_type']}")

Model ID: .multilingual-e5-small
Description: E5 small multilingual
Version: 12.0.0
------
Inference Endpoint ID: .multilingual-e5-small-elasticsearch
Model ID: .multilingual-e5-small_linux-x86_64
Task Type: text_embedding


## Create an inference pipeline
The following call creates an ingest pipeline with two inference processors. For each document ingested through the pipeline, `ELSER` produces a sparse vector (`sparse_vector`) and multilingual E5 small produces a dense vector (`dense_vector`) from the `description` field.

In [200]:
client.ingest.put_pipeline(
    id="ecommerce-pipeline",
    processors=[
        {
            "inference": {
                "model_id": ".elser-2-elasticsearch",  # inference endpoint ID
                "input_output": [
                    {
                        "input_field": "description",  # source field
                        "output_field": "elser_description_vector",  # destination vector field
                    }
                ],
            }
        },
        {
            "inference": {
                "model_id": ".multilingual-e5-small-elasticsearch",  # inference endpoint ID
                "input_output": [
                    {
                        "input_field": "description",  # source field
                        "output_field": "e5_description_vector",  # destination vector field
                    }
                ],
                "inference_config": {"text_embedding": {}},
            }
        },
    ],
)

ObjectApiResponse({'acknowledged': True})
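
Before indexing the full dataset, it can help to dry-run the pipeline on one or two descriptions. The sketch below assumes the `client` created earlier and uses the ingest simulate API; the helper names `build_simulate_docs` and `simulate_ecommerce_pipeline` are hypothetical, and the response handling assumes each simulated entry carries a `doc` object with a `_source`.

```python
from typing import Any


def build_simulate_docs(descriptions: list[str]) -> list[dict[str, Any]]:
    """Wrap raw description strings in the document shape used by the simulate API."""
    return [{"_source": {"description": text}} for text in descriptions]


def simulate_ecommerce_pipeline(client: Any, descriptions: list[str]) -> list[list[str]]:
    """Run the pipeline without indexing and list the fields each document ends up with."""
    response = client.ingest.simulate(
        id="ecommerce-pipeline",
        docs=build_simulate_docs(descriptions),
    )
    field_names = []
    for entry in response["docs"]:
        # Assumption: a successful entry holds the transformed document under "doc".
        source = entry.get("doc", {}).get("_source", {})
        field_names.append(sorted(source.keys()))
    return field_names


# Usage (requires a live deployment):
# simulate_ecommerce_pipeline(client, ["A comfortable garden bench with a cushion."])
```

If both processors ran, each field list should include `elser_description_vector` and `e5_description_vector` next to `description`.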

## Index documents
The `ecommerce-search` index we are creating will include fields to support dense and sparse vector storage and search. 

We define the `e5_description_vector` and `elser_description_vector` fields to store the inference pipeline results. The `e5_description_vector` field is a `dense_vector`. The `.multilingual-e5-small` model produces embeddings of size 384, so the dimension of the vector (`dims`) is set to 384.

The `elser_description_vector` field is a `sparse_vector`, which stores the output of our `.elser_model_2_linux-x86_64` model. No further configuration is needed for this field for our use case.

In [221]:
# define the index name and mapping
commerce_index = "ecommerce-search"
mappings = {
    "properties": {
        "product": {
            "type": "text",
        },
        "description": {
            "type": "text",
        },
        "category": {
            "type": "text",
        },
        "elser_description_vector": {"type": "sparse_vector"},
        "e5_description_vector": {
            "type": "dense_vector",
            "dims": 384,
            "index": True,
            "similarity": "cosine",
        },
    },
}


if client.indices.exists(index=commerce_index):
    client.indices.delete(index=commerce_index)
client.indices.create(
    index=commerce_index,
    mappings=mappings,
)

# set ecommerce-pipeline as the default pipeline for the ecommerce-search index
client.indices.put_settings(
    index=commerce_index,
    settings={"default_pipeline": "ecommerce-pipeline"},
)

ObjectApiResponse({'acknowledged': True})
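
The `"similarity": "cosine"` setting in the mapping determines how kNN hits on `e5_description_vector` are scored. As a rough illustration, the sketch below computes cosine similarity for two toy 3-dimensional vectors (the real embeddings have 384 dimensions) and converts it to the `_score` Elasticsearch reports for cosine similarity, `(1 + cosine) / 2`. The vectors and function names are illustrative assumptions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_to_es_score(cosine: float) -> float:
    """Map a cosine in [-1, 1] to the kNN _score range [0, 1]."""
    return (1.0 + cosine) / 2.0


def es_score_to_cosine(score: float) -> float:
    """Invert the mapping to recover the raw cosine from a reported _score."""
    return 2.0 * score - 1.0


# Toy vectors standing in for a query embedding and a document embedding.
query_vec = [1.0, 0.0, 1.0]
doc_vec = [1.0, 1.0, 0.0]

cos = cosine_similarity(query_vec, doc_vec)
print(f"cosine: {cos:.4f}, _score: {cosine_to_es_score(cos):.4f}")
```

Because of this mapping, kNN scores on this field always fall between 0 and 1, which matters later when they are combined with unbounded BM25 scores.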

## Load documents

Then we load `products-ecommerce.json` into the `ecommerce-search` index. We will use the `bulk` helper function to index our documents en masse. 

In [222]:
# Load the dataset
with open("products-ecommerce.json", "r") as f:
    data_json = json.load(f)


# helper function to create bulk indexing body
def create_index_body(doc):
    return {
        "_index": "ecommerce-search",
        "_source": doc,
    }


# prepare the documents to be indexed
documents = [create_index_body(doc) for doc in data_json]

# use bulk function to index
try:
    print("Indexing documents...")
    bulk(client, documents)
    print("Documents indexed successfully.")
except Exception as e:
    print(f"Error indexing documents: {e}")

Indexing documents...
Documents indexed successfully.
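
For larger datasets, a generator avoids building the full action list in memory before calling `bulk`. The following is a minimal sketch of that variation; `generate_actions` and the two sample products are hypothetical and only stand in for the contents of `products-ecommerce.json`.

```python
from typing import Any, Iterable, Iterator


def generate_actions(
    docs: Iterable[dict[str, Any]], index_name: str
) -> Iterator[dict[str, Any]]:
    """Lazily yield one bulk action per document."""
    for doc in docs:
        yield {"_index": index_name, "_source": doc}


# Hypothetical sample documents with the same fields as the dataset.
sample_docs = [
    {"product": "Sample Bench", "category": "Garden Furniture", "description": "a bench"},
    {"product": "Sample Lamp", "category": "Lighting", "description": "a lamp"},
]

actions = list(generate_actions(sample_docs, "ecommerce-search"))
print(f"Prepared {len(actions)} actions for index {actions[0]['_index']}")

# Usage (requires a live deployment). With raise_on_error=False, the helper
# returns a (success_count, errors) tuple instead of raising on the first failure:
# success_count, errors = bulk(
#     client, generate_actions(data_json, "ecommerce-search"), raise_on_error=False
# )
```

Inspecting the returned error list is one way to notice documents that failed inference in the default pipeline.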


## Text Analysis with Standard Analyzer

In [169]:
# Performs text analysis on a string and returns the resulting tokens.

# Define the text to be analyzed
text = "Comfortable furniture for a large balcony"

# Run the text through the built-in standard analyzer
response = client.indices.analyze(analyzer="standard", text=text)

# Extract and display the analyzed tokens
tokens = [token["token"] for token in response["tokens"]]
print("Analyzed Tokens:", tokens)

Analyzed Tokens: ['comfortable', 'furniture', 'for', 'a', 'large', 'balcony']


## Text Analysis with Stop Analyzer

In [None]:
# Performs text analysis on a string and returns the resulting tokens.

# Define the text to be analyzed
text = "Comfortable furniture for a large balcony"

# Run the text through the built-in stop analyzer, which removes English stop words
response = client.indices.analyze(analyzer="stop", text=text)

# Extract and display the analyzed tokens
tokens = [token["token"] for token in response["tokens"]]
print("Analyzed Tokens:", tokens)

Analyzed Tokens: ['comfortable', 'furniture', 'large', 'balcony']
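
The difference between the two analyzers can be approximated locally. This is a simplified sketch, not the analyzers' actual implementation: it assumes that splitting on word characters is close enough to the standard tokenizer for this sentence, and it uses the default English stop-word list for the stop analyzer. The function names are hypothetical.

```python
import re

# Default English stop words used by the stop analyzer.
ENGLISH_STOP_WORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
})


def approximate_standard_analyzer(text: str) -> list[str]:
    """Lowercase and split on non-word characters; keeps stop words."""
    return re.findall(r"\w+", text.lower())


def approximate_stop_analyzer(text: str) -> list[str]:
    """Lowercase, split on non-letters, then drop English stop words."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return [token for token in tokens if token not in ENGLISH_STOP_WORDS]


sample_text = "Comfortable furniture for a large balcony"
print("standard:", approximate_standard_analyzer(sample_text))
print("stop:    ", approximate_stop_analyzer(sample_text))
```

For this sentence the only difference is that `for` and `a` are dropped, which keeps those very common words from contributing to lexical matching.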


## Lexical Search

In [225]:
results_list = []

# Regular BM25 (Lexical) Search
resp = client.search(
    size=2,
    index="ecommerce-search",
    query={
        "match": {
            "description": {
                "query": "Comfortable furniture for a large balcony",
                "analyzer": "stop",
            }
        }
    },
    source_excludes=["*_description_vector"],  # Exclude vector fields from response
)

lexical_search_results = resp["hits"]["hits"]
results_list.append({"lexical_search": lexical_search_results})

if not lexical_search_results:
    print("No matches found")
else:
    for hit in lexical_search_results:
        score = hit["_score"]
        product = hit["_source"]["product"]
        category = hit["_source"]["category"]
        description = hit["_source"]["description"]
        print(
            f"\nScore: {score}\nProduct: {product}\nCategory: {category}\nDescription: {description}\n"
        )


Score: 13.408413
Product: Barbie Dreamhouse
Category: Toys
Description: is a classic Barbie playset with multiple rooms, furniture, a large balcony, a pool, and accessories. It allows kids to create their dream Barbie world.


Score: 7.5048585
Product: Rattan Patio Conversation Set
Category: Outdoor Furniture
Description: is a stylish and comfortable outdoor furniture set, including a sofa, two chairs, and a coffee table, all made of durable rattan material.
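
The Barbie Dreamhouse ranks first because its description literally contains three of the query terms (`furniture`, `large`, `balcony`), while the patio set only contains two. The toy scorer below sketches textbook BM25 with Elasticsearch's default parameters (`k1 = 1.2`, `b = 0.75`) on a three-document corpus. The texts are shortened versions of the two hits above plus one made-up document, so the absolute scores will not match the real index.

```python
import math
from collections import Counter

K1, B = 1.2, 0.75  # Elasticsearch's default BM25 parameters


def bm25_scores(query_terms: list[str], docs: dict[str, str]) -> dict[str, float]:
    """Score every document against the query terms with textbook BM25."""
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized.values()) / n_docs
    # Number of documents that contain each term at least once.
    doc_freq = Counter(term for tokens in tokenized.values() for term in set(tokens))

    scores = {}
    for name, tokens in tokenized.items():
        term_freq = Counter(tokens)
        score = 0.0
        for term in query_terms:
            tf = term_freq[term]
            if tf == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = K1 * (1 - B + B * len(tokens) / avg_len)
            score += idf * tf * (K1 + 1) / (tf + length_norm)
        scores[name] = score
    return scores


toy_docs = {
    "Barbie Dreamhouse": "classic barbie playset with multiple rooms furniture a large balcony a pool and accessories",
    "Rattan Patio Conversation Set": "stylish and comfortable outdoor furniture set including a sofa two chairs and a coffee table",
    "Garden Hose": "durable garden hose with adjustable spray nozzle",  # made-up filler document
}
query_terms = ["comfortable", "furniture", "large", "balcony"]  # query after stop-word removal

scores = bm25_scores(query_terms, toy_docs)
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{score:.3f}  {name}")
```

Lexical scoring rewards exact term overlap and has no notion that a toy house is not furniture for a balcony, which is the gap the semantic searches below try to close.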



## Semantic Search with Dense Vector

In [226]:
# kNN search: the query text is embedded at search time and compared to the stored dense vectors
response = client.search(
    index="ecommerce-search",
    size=2,
    knn={
        "field": "e5_description_vector",
        "k": 50,  # Number of nearest neighbors to return as top hits.
        "num_candidates": 500,  # Number of nearest neighbor candidates to consider per shard. Increasing num_candidates tends to improve the accuracy of the final k results.
        "query_vector_builder": {  # Object indicating how to build a query_vector. kNN search enables you to perform semantic search by using a previously deployed text embedding model.
            "text_embedding": {
                "model_id": ".multilingual-e5-small-elasticsearch",  # Text embedding model id
                "model_text": "Comfortable furniture for a large balcony",  # Query
            }
        },
    },
    source_excludes=["*_description_vector"],  # Exclude vector fields from response
)

dense_semantic_search_results = response["hits"]["hits"]
results_list.append({"dense_semantic_search": dense_semantic_search_results})

for hit in dense_semantic_search_results:

    score = hit["_score"]
    product = hit["_source"]["product"]
    category = hit["_source"]["category"]
    description = hit["_source"]["description"]
    print(
        f"\nScore: {score}\nProduct: {product}\nCategory: {category}\nDescription: {description}\n"
    )


Score: 0.93147576
Product: Metal Garden Bench with Cushion
Category: Garden Furniture
Description: is a stylish and comfortable metal garden bench, complete with a cushion for added support.


Score: 0.9304026
Product: Garden Dining Set with Swivel Chairs
Category: Garden Furniture
Description: is a functional and comfortable garden dining set, including a table and chairs with swivel seats for convenience.



## Semantic Search with Sparse Vector

In [227]:
# Elastic Learned Sparse Encoder - ELSER

resp = client.search(
    index="ecommerce-search",
    size=2,
    query={
        "sparse_vector": {
            "field": "elser_description_vector",
            "inference_id": ".elser-2-elasticsearch",
            "query": "Comfortable furniture for a large balcony",
        }
    },
    source_excludes=["*_description_vector"],  # Exclude vector fields from response
)


sparse_semantic_search_results = resp["hits"]["hits"]
results_list.append({"sparse_semantic_search": sparse_semantic_search_results})

for hit in sparse_semantic_search_results:

    score = hit["_score"]
    product = hit["_source"]["product"]
    category = hit["_source"]["category"]
    description = hit["_source"]["description"]
    print(
        f"\nScore: {score}\nProduct: {product}\nCategory: {category}\nDescription: {description}\n"
    )


Score: 11.1893
Product: Garden Lounge Chair with Sunshade
Category: Garden Furniture
Description: is a comfortable and versatile garden lounge chair with a built-in sunshade, perfect for hot sunny days.


Score: 11.187605
Product: Rattan Patio Conversation Set
Category: Outdoor Furniture
Description: is a stylish and comfortable outdoor furniture set, including a sofa, two chairs, and a coffee table, all made of durable rattan material.
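
Conceptually, a learned sparse encoder such as ELSER expands text into weighted tokens, and relevance is roughly the sum of products of the weights for tokens that the query and the document share. The sketch below illustrates that idea only; the tokens and weights are invented for the example and are not real ELSER output.

```python
def sparse_dot(query_weights: dict[str, float], doc_weights: dict[str, float]) -> float:
    """Sum of weight products over tokens present in both sparse vectors."""
    return sum(
        weight * doc_weights[token]
        for token, weight in query_weights.items()
        if token in doc_weights
    )


# Hypothetical expansion of the query; "outdoor" is not in the query text
# but could be added by the model as a related token.
query_weights = {"furniture": 1.5, "balcony": 1.2, "comfortable": 0.9, "outdoor": 0.8}

# Hypothetical sparse vectors for two documents.
lounge_chair = {"chair": 1.4, "garden": 1.1, "comfortable": 1.0, "outdoor": 0.9, "furniture": 0.6}
toy_house = {"barbie": 2.0, "balcony": 0.5, "furniture": 0.4}

print(f"lounge chair: {sparse_dot(query_weights, lounge_chair):.2f}")
print(f"toy house:    {sparse_dot(query_weights, toy_house):.2f}")
```

In this toy setup the lounge chair wins without containing the word "balcony", because expansion tokens let related vocabulary contribute to the score.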



## Hybrid Search - BM25+KNN linear combination

In [229]:
# BM25 + KNN (Linear Combination)
query = "A dining table and comfortable chairs for a large balcony"
resp = client.search(
    index="ecommerce-search",
    size=2,
    query={
        "bool": {
            "should": [
                {
                    "match": {
                        "description": {
                            "query": query,
                            "boost": 1,
                        }
                    }
                }
            ]
        }
    },
    knn={
        "field": "e5_description_vector",
        "k": 2,
        "num_candidates": 20,
        "boost": 1,
        "query_vector_builder": {
            "text_embedding": {
                "model_id": ".multilingual-e5-small-elasticsearch",
                "model_text": query,
            }
        },
    },
    source_excludes=["*_description_vector"],  # Exclude vector fields from response
)

dense_linear_search_results = resp["hits"]["hits"]
results_list.append({"dense_linear_search": dense_linear_search_results})

for hit in dense_linear_search_results:

    score = hit["_score"]
    product = hit["_source"]["product"]
    category = hit["_source"]["category"]
    description = hit["_source"]["description"]
    print(
        f"\nScore: {score}\nProduct: {product}\nCategory: {category}\nDescription: {description}\n"
    )


Score: 18.161213
Product: Garden Dining Set with Swivel Chairs
Category: Garden Furniture
Description: is a functional and comfortable garden dining set, including a table and chairs with swivel seats for convenience.


Score: 17.770641
Product: Garden Dining Set with Swivel Rockers
Category: Garden Furniture
Description: is a functional and comfortable garden dining set, including a table and chairs with swivel rockers for easy movement.
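
When `query` and `knn` are used in the same search request, each hit's final score is the boosted BM25 score plus the boosted kNN score. The sketch below reproduces that arithmetic with made-up per-document scores to show why the boosts matter: with cosine similarity the kNN score stays between 0 and 1, so at equal boosts the BM25 part dominates. Document IDs, scores, and the function name are hypothetical.

```python
def linear_combination(
    bm25_scores: dict[str, float],
    knn_scores: dict[str, float],
    bm25_boost: float = 1.0,
    knn_boost: float = 1.0,
) -> list[tuple[str, float]]:
    """Combine two score maps as a weighted sum and rank by the result."""
    doc_ids = set(bm25_scores) | set(knn_scores)
    combined = {
        doc_id: bm25_boost * bm25_scores.get(doc_id, 0.0)
        + knn_boost * knn_scores.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Made-up scores: BM25 is unbounded, cosine-based kNN scores are at most 1.
bm25 = {"A": 17.2, "B": 16.9, "C": 3.0}
knn = {"A": 0.93, "C": 0.95}

equal_boosts = linear_combination(bm25, knn)
knn_favoured = linear_combination(bm25, knn, knn_boost=20.0)

print("boost 1/1: ", [doc_id for doc_id, _ in equal_boosts])
print("boost 1/20:", [doc_id for doc_id, _ in knn_favoured])
```

With equal boosts the ranking follows BM25 almost exactly; raising the kNN boost lets the semantically closest document `C` overtake `B`. Tuning these weights per dataset is the main cost of linear combination, and it is the step that RRF avoids.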



## Hybrid Search - BM25+KNN RRF

In [230]:
# BM25 + KNN (RRF)
top_k = 2
resp = client.search(
    index="ecommerce-search",
    size=top_k,  # return only the top_k fused hits
    retriever={
        "rrf": {
            "retrievers": [
                {
                    "standard": {
                        "query": {
                            "match": {
                                "description": "A dining table and comfortable chairs for a large balcony"
                            }
                        }
                    }
                },
                {
                    "knn": {
                        "field": "e5_description_vector",
                        "query_vector_builder": {
                            "text_embedding": {
                                "model_id": ".multilingual-e5-small-elasticsearch",
                                "model_text": "A dining table and comfortable chairs for a large balcony",
                            }
                        },
                        "k": 2,
                        "num_candidates": 20,
                    }
                },
            ],
            "rank_window_size": 2,
            "rank_constant": 20,
        }
    },
    source_excludes=["*_description_vector"],  # Exclude vector fields from response
)

dense_rrf_search_results = resp["hits"]["hits"]
results_list.append({"dense_rrf_search": dense_rrf_search_results})

for hit in dense_rrf_search_results:

    score = hit["_score"]
    category = hit["_source"]["category"]
    product = hit["_source"]["product"]
    description = hit["_source"]["description"]
    print(
        f"Score: {score}\nProduct: {product}\nCategory: {category}\nDescription: {description}\n"
    )

Score: 0.0952381
Product: Garden Dining Set with Swivel Chairs
Category: Garden Furniture
Description: is a functional and comfortable garden dining set, including a table and chairs with swivel seats for convenience.

Score: 0.045454547
Product: Patio Dining Set with Bench
Category: Outdoor Furniture
Description: is a spacious and functional patio dining set, including a dining table, chairs, and a bench for additional seating.
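
The RRF scores above follow directly from the formula: each retriever contributes `1 / (rank_constant + rank)` for every document it returns, with ranks starting at 1. With `rank_constant` set to 20, a document ranked first by both retrievers scores `2/21 ≈ 0.0952381`, and a document ranked second by only one retriever scores `1/22 ≈ 0.0454545`. The sketch below reproduces that arithmetic; the per-retriever rankings and document IDs are assumptions chosen to be consistent with the output, not values read from Elasticsearch.

```python
def rrf_scores(rankings: list[list[str]], rank_constant: int = 60) -> dict[str, float]:
    """Reciprocal rank fusion: sum 1 / (rank_constant + rank) over all result lists."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank_constant + rank)
    return scores


# Assumed top-2 lists (rank_window_size = 2) from the two retrievers.
lexical_ranking = ["doc_a", "doc_b"]
knn_ranking = ["doc_a", "doc_c"]

scores = rrf_scores([lexical_ranking, knn_ranking], rank_constant=20)
for doc_id, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{doc_id}: {score:.7f}")
```

Because only ranks are used, the differing scales of BM25 and kNN scores never enter the calculation, so no boost tuning is needed.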



## Hybrid Search - BM25+ELSER linear combination

In [232]:
# BM25 + Elastic Learned Sparse Encoder (Linear Combination)

resp = client.search(
    index="ecommerce-search",
    size=2,
    query={
        "bool": {
            "should": [
                {
                    "match": {
                        "description": {
                            "query": "A dining table and comfortable chairs for a large balcony",
                            "boost": 1,  # You can adjust the boost value
                        }
                    }
                },
                {
                    "sparse_vector": {
                        "field": "elser_description_vector",
                        "inference_id": ".elser-2-elasticsearch",
                        "query": "A dining table and comfortable chairs for a large balcony",
                    }
                },
            ]
        }
    },
    source_excludes=["*_description_vector"],  # Exclude vector fields from response
)

sparse_linear_search_results = resp["hits"]["hits"]
results_list.append({"sparse_linear_search": sparse_linear_search_results})

for hit in sparse_linear_search_results:
    score = hit["_score"]
    product = hit["_source"]["product"]
    category = hit["_source"]["category"]
    description = hit["_source"]["description"]
    print(
        f"\nScore: {score}\nProduct: {product}\nCategory: {category}\nDescription: {description}\n"
    )


Score: 33.896286
Product: Garden Dining Set with Swivel Chairs
Category: Garden Furniture
Description: is a functional and comfortable garden dining set, including a table and chairs with swivel seats for convenience.


Score: 32.462887
Product: Patio Dining Set with Bench
Category: Outdoor Furniture
Description: is a spacious and functional patio dining set, including a dining table, chairs, and a bench for additional seating.



## Hybrid Search - BM25+ELSER RRF

In [233]:
# BM25 + ELSER (RRF)
top_k = 2
resp = client.search(
    index="ecommerce-search",
    size=top_k,  # return only the top_k fused hits
    retriever={
        "rrf": {
            "retrievers": [
                {
                    "standard": {
                        "query": {
                            "match": {
                                "description": "A dining table and comfortable chairs for a large balcony"
                            }
                        }
                    }
                },
                {
                    "standard": {
                        "query": {
                            "sparse_vector": {
                                "field": "elser_description_vector",
                                "inference_id": ".elser-2-elasticsearch",
                                "query": "A dining table and comfortable chairs for a large balcony",
                            }
                        }
                    }
                },
            ],
            "rank_window_size": 2,
            "rank_constant": 20,
        }
    },
    source_excludes=["*_description_vector"],  # Exclude vector fields from response
)

sparse_rrf_search_results = resp["hits"]["hits"]
results_list.append({"sparse_rrf_search_results": sparse_rrf_search_results})

for hit in sparse_rrf_search_results:

    score = hit["_score"]
    category = hit["_source"]["category"]
    product = hit["_source"]["product"]
    description = hit["_source"]["description"]
    print(
        f"Score: {score}\nProduct: {product}\nCategory: {category}\nDescription: {description}\n"
    )

Score: 0.0952381
Product: Garden Dining Set with Swivel Chairs
Category: Garden Furniture
Description: is a functional and comfortable garden dining set, including a table and chairs with swivel seats for convenience.

Score: 0.045454547
Product: Patio Dining Set with Bench
Category: Outdoor Furniture
Description: is a spacious and functional patio dining set, including a dining table, chairs, and a bench for additional seating.





In [None]:
# Flatten results for each search type
rows = []
for result in results_list:
    search_type = list(result.keys())[0]

    for doc in result[search_type]:
        row = {
            "search_type": search_type,
            "product": doc["_source"].get("product"),
            "category": doc["_source"].get("category"),
            "description": doc["_source"].get("description"),
            "score": doc.get("_score"),
        }
        rows.append(row)

df = pd.DataFrame(rows)

for search_type, group in df.groupby("search_type"):
    display(Markdown(f"### {search_type.replace('_', ' ').title()}"))
    styled = (
        group.drop(columns="search_type")
        .reset_index(drop=True)
        .style.set_properties(
            subset=["description"],
            **{"white-space": "pre-wrap", "word-break": "break-word"},
        )
    )
    display(styled)

### Dense Linear Search

|   | product | category | description | score |
| --- | --- | --- | --- | --- |
| 0 | Garden Dining Set with Swivel Chairs | Garden Furniture | is a functional and comfortable garden dining set, including a table and chairs with swivel seats for convenience. | 18.161213 |
| 1 | Garden Dining Set with Swivel Rockers | Garden Furniture | is a functional and comfortable garden dining set, including a table and chairs with swivel rockers for easy movement. | 17.770641 |
| 2 | Garden Dining Set with Swivel Chairs | Garden Furniture | is a functional and comfortable garden dining set, including a table and chairs with swivel seats for convenience. | 18.161213 |
| 3 | Garden Dining Set with Swivel Rockers | Garden Furniture | is a functional and comfortable garden dining set, including a table and chairs with swivel rockers for easy movement. | 17.770641 |


### Dense Rrf Search

|   | product | category | description | score |
| --- | --- | --- | --- | --- |
| 0 | Garden Dining Set with Swivel Chairs | Garden Furniture | is a functional and comfortable garden dining set, including a table and chairs with swivel seats for convenience. | 0.095238 |
| 1 | Patio Dining Set with Bench | Outdoor Furniture | is a spacious and functional patio dining set, including a dining table, chairs, and a bench for additional seating. | 0.045455 |


### Dense Semantic Search

|   | product | category | description | score |
| --- | --- | --- | --- | --- |
| 0 | Metal Garden Bench with Cushion | Garden Furniture | is a stylish and comfortable metal garden bench, complete with a cushion for added support. | 0.931476 |
| 1 | Garden Dining Set with Swivel Chairs | Garden Furniture | is a functional and comfortable garden dining set, including a table and chairs with swivel seats for convenience. | 0.930403 |


### Lexical Search

|   | product | category | description | score |
| --- | --- | --- | --- | --- |
| 0 | Barbie Dreamhouse | Toys | is a classic Barbie playset with multiple rooms, furniture, a large balcony, a pool, and accessories. It allows kids to create their dream Barbie world. | 13.408413 |
| 1 | Rattan Patio Conversation Set | Outdoor Furniture | is a stylish and comfortable outdoor furniture set, including a sofa, two chairs, and a coffee table, all made of durable rattan material. | 7.504859 |


### Sparse Linear Search

|   | product | category | description | score |
| --- | --- | --- | --- | --- |
| 0 | Garden Dining Set with Swivel Chairs | Garden Furniture | is a functional and comfortable garden dining set, including a table and chairs with swivel seats for convenience. | 33.896286 |
| 1 | Patio Dining Set with Bench | Outdoor Furniture | is a spacious and functional patio dining set, including a dining table, chairs, and a bench for additional seating. | 32.462887 |
| 2 | Garden Dining Set with Swivel Chairs | Garden Furniture | is a functional and comfortable garden dining set, including a table and chairs with swivel seats for convenience. | 33.896286 |
| 3 | Patio Dining Set with Bench | Outdoor Furniture | is a spacious and functional patio dining set, including a dining table, chairs, and a bench for additional seating. | 32.462887 |


### Sparse Rrf Search Results

|   | product | category | description | score |
| --- | --- | --- | --- | --- |
| 0 | Garden Dining Set with Swivel Chairs | Garden Furniture | is a functional and comfortable garden dining set, including a table and chairs with swivel seats for convenience. | 0.095238 |
| 1 | Patio Dining Set with Bench | Outdoor Furniture | is a spacious and functional patio dining set, including a dining table, chairs, and a bench for additional seating. | 0.045455 |


### Sparse Semantic Search

|   | product | category | description | score |
| --- | --- | --- | --- | --- |
| 0 | Garden Lounge Chair with Sunshade | Garden Furniture | is a comfortable and versatile garden lounge chair with a built-in sunshade, perfect for hot sunny days. | 11.1893 |
| 1 | Rattan Patio Conversation Set | Outdoor Furniture | is a stylish and comfortable outdoor furniture set, including a sofa, two chairs, and a coffee table, all made of durable rattan material. | 11.187605 |
