# Semantic Search using ELSER v2 text expansion (Restaurants in Ann Arbor)

Learn how to use [ELSER](https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-elser.html) for text expansion-powered semantic search.

**`Note:`** This notebook demonstrates how to use the ELSER v2 model, `.elser_model_2`, which offers improved retrieval accuracy.

First, we import the modules we need.
🔐 NOTE: `getpass` lets us securely prompt the user for credentials without echoing them to the terminal.

In [1]:
from elasticsearch import Elasticsearch, helpers, exceptions
from urllib.request import urlopen
from getpass import getpass
import json
import time
import glob

Now we can instantiate the Python Elasticsearch client.

First we prompt the user for their Cloud ID and API key.
Then we create a `client` object, which is an instance of the `Elasticsearch` class.

In [2]:
# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#finding-your-cloud-id
ELASTIC_CLOUD_ID = getpass("Elastic Cloud ID: ")

# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#creating-an-api-key
ELASTIC_API_KEY = getpass("Elastic Api Key: ")

# Create the client instance
client = Elasticsearch(
    # For local development
    # hosts=["http://localhost:9200"]
    cloud_id=ELASTIC_CLOUD_ID,
    api_key=ELASTIC_API_KEY,
)

Refer to https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/connecting.html#connect-self-managed-new to learn how to connect to a self-managed deployment.

Read https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/connecting.html to learn how to connect using API keys.
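The commented-out `hosts` line in the cell above hints at the self-managed case. As a minimal sketch, the two connection styles could be selected with a small helper. The name `client_kwargs` is hypothetical and not part of the Elasticsearch client; the URL and key below are placeholders.

```python
def client_kwargs(api_key, cloud_id=None, hosts=None):
    """Build keyword arguments for Elasticsearch(...).

    Hypothetical helper: exactly one of cloud_id (Elastic Cloud)
    or hosts (self-managed) must be provided.
    """
    if bool(cloud_id) == bool(hosts):
        raise ValueError("Pass exactly one of cloud_id or hosts")
    kwargs = {"api_key": api_key}
    if cloud_id:
        kwargs["cloud_id"] = cloud_id
    else:
        kwargs["hosts"] = hosts
    return kwargs


# Self-managed example with placeholder values
local = client_kwargs(api_key="<api-key>", hosts=["http://localhost:9200"])
print(local)
```

The resulting dictionary would then be passed along as `Elasticsearch(**local)`.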


# Download and Deploy ELSER Model

In [3]:
# delete model if already downloaded and deployed
try:
    client.ml.delete_trained_model(model_id=".elser_model_2", force=True)
    print("Model deleted successfully, We will proceed with creating one")
except exceptions.NotFoundError:
    print("Model doesn't exist, but We will proceed with creating one")

# Creates the ELSER model configuration. Automatically downloads the model if it doesn't exist.
client.ml.put_trained_model(
    model_id=".elser_model_2", input={"field_names": ["text_field"]}
)

Model deleted successfully, We will proceed with creating one


ObjectApiResponse({'model_id': '.elser_model_2', 'model_type': 'pytorch', 'model_package': {'packaged_model_id': 'elser_model_2', 'model_repository': 'https://ml-models.elastic.co', 'minimum_version': '11.0.0', 'size': 438123914, 'sha256': '2e0450a1c598221a919917cbb05d8672aed6c613c028008fedcd696462c81af0', 'metadata': {}, 'tags': [], 'vocabulary_file': 'elser_model_2.vocab.json'}, 'created_by': 'api_user', 'version': '12.0.0', 'create_time': 1732920936906, 'model_size_bytes': 0, 'estimated_operations': 0, 'license_level': 'platinum', 'description': 'Elastic Learned Sparse EncodeR v2', 'tags': ['elastic'], 'metadata': {}, 'input': {'field_names': ['text_field']}, 'inference_config': {'text_expansion': {'vocabulary': {'index': '.ml-inference-native-000002'}, 'tokenization': {'bert': {'do_lower_case': True, 'with_special_tokens': True, 'max_sequence_length': 512, 'truncate': 'first', 'span': -1}}}}, 'location': {'index': {'name': '.ml-inference-native-000002'}}})

The above command will download the ELSER model. This will take a few minutes to complete. Use the following command to check the status of the model download.

In [4]:
while True:
    status = client.ml.get_trained_models(
        model_id=".elser_model_2", include="definition_status"
    )

    if status["trained_model_configs"][0]["fully_defined"]:
        print("ELSER Model is downloaded and ready to be deployed.")
        break
    else:
        print("ELSER Model is downloaded but not ready to be deployed.")
    time.sleep(5)

ELSER Model is downloaded but not ready to be deployed.
ELSER Model is downloaded but not ready to be deployed.
ELSER Model is downloaded but not ready to be deployed.
ELSER Model is downloaded and ready to be deployed.
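The polling loop above runs until the model is ready and never gives up. As a hedged sketch, the same pattern can be wrapped in a helper with a timeout; `wait_until` is a hypothetical name, and the default timeout and interval are arbitrary assumptions.

```python
import time


def wait_until(check, timeout=600.0, interval=5.0):
    """Call check() every `interval` seconds until it returns True.

    Returns True on success, or False if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Assumed usage with the `client` object from this notebook:
# ready = wait_until(
#     lambda: client.ml.get_trained_models(
#         model_id=".elser_model_2", include="definition_status"
#     )["trained_model_configs"][0]["fully_defined"]
# )
```

Passing the check as a callable means the same helper could also wrap the deployment status check.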


Once the model is downloaded, we can deploy it on an ML node. Use the following command to deploy the model.

In [5]:
# Start the trained model deployment and return once it reaches the "starting" state
client.ml.start_trained_model_deployment(
    model_id=".elser_model_2", number_of_allocations=1, wait_for="starting"
)

while True:
    status = client.ml.get_trained_models_stats(
        model_id=".elser_model_2",
    )
    if status["trained_model_stats"][0]["deployment_stats"]["state"] == "started":
        print("ELSER Model has been successfully deployed.")
        break
    else:
        print("ELSER Model is currently being deployed.")
    time.sleep(5)

ELSER Model is currently being deployed.
ELSER Model is currently being deployed.
ELSER Model has been successfully deployed.


This will also take a few minutes to complete.

# Indexing Documents with ELSER

In order to use ELSER on our Elastic Cloud deployment we'll need to create an ingest pipeline that contains an inference processor that runs the ELSER model.
Let's add that pipeline using the [`put_pipeline`](https://www.elastic.co/guide/en/elasticsearch/reference/master/put-pipeline-api.html) method.

In [6]:
client.ingest.put_pipeline(
    id="elser-ingest-pipeline",
    description="Ingest pipeline for ELSER",
    processors=[
        {
            "inference": {
                "model_id": ".elser_model_2",
                "input_output": [
                    {"input_field": "Description", "output_field": "description_embedding"}
                ],
            }
        }
    ],
)

ObjectApiResponse({'acknowledged': True})

Let's note a few important parameters from that API call:

- `inference`: A processor that performs inference using a machine learning model.
- `model_id`: Specifies the ID of the machine learning model to be used. In this example, the model ID is set to `.elser_model_2`.
- `input_output`: Specifies the input and output fields.
- `input_field`: Name of the field from which the `sparse_vector` representation is created.
- `output_field`: Name of the field that contains the inference results.
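To make the relationship between these parameters concrete, here is a small sketch that builds the same `processors` list from a mapping of input fields to output fields. The helper name `elser_processors` is hypothetical; it only assembles a plain dictionary and does not call Elasticsearch.

```python
def elser_processors(model_id, field_pairs):
    """Return an ingest `processors` list with one inference processor.

    field_pairs maps each input field name to the field that
    receives the model output.
    """
    return [
        {
            "inference": {
                "model_id": model_id,
                "input_output": [
                    {"input_field": src, "output_field": dst}
                    for src, dst in field_pairs.items()
                ],
            }
        }
    ]


processors = elser_processors(
    ".elser_model_2", {"Description": "description_embedding"}
)
print(processors)
```

The result matches the `processors` argument passed to `put_pipeline` above. Each output field also needs a `sparse_vector` mapping in the index.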

## Create index


In [18]:
client.indices.delete(index="elser-example-restaurants", ignore_unavailable=True)
client.indices.create(
    index="elser-example-restaurants",
    settings={"index": {"default_pipeline": "elser-ingest-pipeline"}},
    mappings={
        "properties": {
            "Description": {
                "type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
            },
            "description_embedding": {"type": "sparse_vector"},
            "docid": { "type": "keyword" },
            "Title": { "type": "text" },
            "Price": { "type": "keyword" },
            "Place ID": { "type": "text" },
            "Type ID": { "type": "keyword" },
            "Type": { "type": "keyword" },
            "Menu": {
                "type": "object",
                "properties": {
                    "link": {"type": "text"},
                    "source": {"type": "text"}
                }
            },
            # "Menu Items": { "type": "keyword"},
            "Address": { "type": "text" },
            "GPS Coordinates": { "type": "object",
                  "properties": {
                      "latitude": { "type": "float" },
                      "longitude": { "type": "float"}
                  }
            },
            "Phone Number": { "type": "keyword" },
            "Rating": { "type": "float" },
            "Rating Summary": {
                "type": "nested",  # List of dictionaries
                "properties": {
                    "stars": { "type": "integer" },
                    "amount": { "type": "integer" }
                }
            },
            "User Reviews": {
                "type": "nested",
                "properties": {
                    "summary": {
                        "type": "nested",
                        "properties": {
                            "snippet": {"type": "text"}
                        }
                    },
                    "most_relevant": {
                        "type": "nested",
                        "properties": {
                            "username": {"type": "text"},
                            "rating": {"type": "integer"},
                            "contributor_id": {"type": "text"},
                            "description": {"type": "text"},
                            "link": {"type": "text"},
                            "images": {
                                "type": "nested",
                                "properties": {
                                    "thumbnail": {"type": "text"}
                                }
                            },
                            "date": {"type": "text"}
                        }
                    }
                }
            },
            "Opening Hours": {
                "type": "nested",
                "properties": {
                    "friday": {"type": "text"},
                    "saturday": {"type": "text"},
                    "sunday": {"type": "text"},
                    "monday": {"type": "text"},
                    "tuesday": {"type": "text"},
                    "wednesday": {"type": "text"},
                    "thursday": {"type": "text"}
                }
            },
            "Details": {
                "type": "nested",
                "properties": {
                    "popular_for": {"type": "keyword"},
                    "accessibility": {"type": "keyword"},
                    "offerings": {"type": "keyword"},
                    "dining_options": {"type": "keyword"},
                    "amenities": {"type": "keyword"},
                    "atmosphere": {"type": "keyword"},
                    "crowd": {"type": "keyword"},
                    "payments": {"type": "keyword"},
                    "children": {"type": "keyword"}
                }
            },
            "Services": {
                "type": "object",
                "properties": {
                    "dine_in": {"type": "boolean"},
                    "takeout": {"type": "boolean"}
                }
            }
        }
    },
)


ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'elser-example-restaurants'})

## Insert Documents

In [19]:
# Helper function to load data
def load_data(files):
    for file in files:
        with open(file, 'r') as f:
            restaurant = json.load(f)
            yield {
                "_index": "elser-example-restaurants",
                "_id": restaurant["docid"],
                "_source": restaurant
            }

files = glob.glob("data/*.json")

# Indexing data to Elasticsearch
success, failed = 0, 0
try:
    for ok, action in helpers.streaming_bulk(client, load_data(files)):
        if ok:
            success += 1
        else:
            failed += 1
except helpers.BulkIndexError as e:
    print(f"Bulk indexing error: {e}")
    for error in e.errors:
        print(json.dumps(error, indent=2))

print(f"Successfully indexed {success} documents.")
print(f"Failed to index {failed} documents.")

Successfully indexed 194 documents.
Failed to index 0 documents.
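One caveat about the loop above: with its default settings, `helpers.streaming_bulk` raises `BulkIndexError` when a document fails, so the `failed` counter is unlikely to be incremented. Passing `raise_on_error=False` makes the helper yield `(False, item)` pairs instead. The sketch below summarizes such pairs; `summarize_bulk` is a hypothetical helper, and the sample items are made-up stand-ins rather than real output.

```python
def summarize_bulk(results):
    """Count successes and collect failures from (ok, item) pairs."""
    success, failures = 0, []
    for ok, item in results:
        if ok:
            success += 1
        else:
            # Each item is keyed by the action type, e.g. {"index": {...}}
            _action, info = next(iter(item.items()))
            failures.append((info.get("_id"), info.get("error")))
    return success, failures


# Illustrative stand-in data, not real Elasticsearch output
sample = [
    (True, {"index": {"_id": "1", "status": 201}}),
    (False, {"index": {"_id": "2", "status": 400,
                       "error": {"type": "mapper_parsing_exception"}}}),
]
success, failures = summarize_bulk(sample)
print(success, failures)
```

With the notebook's objects, the assumed call would be `summarize_bulk(helpers.streaming_bulk(client, load_data(files), raise_on_error=False))`.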


In [20]:
mapping = client.indices.get_mapping(index="elser-example-restaurants")
print(mapping)

{'elser-example-restaurants': {'mappings': {'properties': {'Address': {'type': 'text'}, 'Description': {'type': 'text', 'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}, 'Details': {'type': 'nested', 'properties': {'accessibility': {'type': 'keyword'}, 'amenities': {'type': 'keyword'}, 'atmosphere': {'type': 'keyword'}, 'children': {'type': 'keyword'}, 'crowd': {'type': 'keyword'}, 'dining_options': {'type': 'keyword'}, 'from_the_business': {'type': 'text', 'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}, 'highlights': {'type': 'text', 'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}, 'offerings': {'type': 'keyword'}, 'parking': {'type': 'text', 'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}, 'payments': {'type': 'keyword'}, 'pets': {'type': 'text', 'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}, 'planning': {'type': 'text', 'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}, 'popular_for
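The mapping printed above contains fields under `Details`, such as `from_the_business`, `highlights`, and `parking`, that were not declared when the index was created, which suggests they were added by dynamic mapping during ingest. As a rough sketch, the declared and actual `properties` dictionaries can be compared to list such fields. The helper `undeclared_fields` is hypothetical, and the dictionaries below are short excerpts for illustration.

```python
def undeclared_fields(declared, actual, prefix=""):
    """Return dotted paths found in `actual` properties but not in `declared`."""
    extra = []
    for name, spec in actual.items():
        path = f"{prefix}{name}"
        if name not in declared:
            extra.append(path)
        else:
            # Recurse into object/nested sub-properties
            extra.extend(
                undeclared_fields(
                    declared[name].get("properties", {}),
                    spec.get("properties", {}),
                    prefix=path + ".",
                )
            )
    return sorted(extra)


# Excerpts only: a fragment of the declared and returned mappings
declared = {"Details": {"type": "nested",
                        "properties": {"payments": {"type": "keyword"}}}}
actual = {"Details": {"type": "nested",
                      "properties": {"payments": {"type": "keyword"},
                                     "highlights": {"type": "text"},
                                     "parking": {"type": "text"}}}}
print(undeclared_fields(declared, actual))
```

For the full index, `actual` would be `mapping["elser-example-restaurants"]["mappings"]["properties"]`.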

# Searching Documents

In [21]:
response = client.search(
    index="elser-example-restaurants",
    size=10,
    query={
        "text_expansion": {
            "description_embedding": {
                "model_id": ".elser_model_2",
                "model_text": "show me the best restaurants to get a steak at in ann arbor",
            }
        }
    },
)


for hit in response["hits"]["hits"]:
    doc_id = hit["_id"]
    score = hit["_score"]
    title = hit["_source"].get("Title", "No Title")
    description = hit["_source"].get("Description", "No Description")
    print(f"Score: {score}\nTitle: {title} - {doc_id}\nDescription: {description}\n")

Score: 12.648598
Title: Knight's Steakhouse - 6
Description: Old-school outfit serving a variety of steaks, seafood & burgers in a comfy setting with a full bar.

Score: 12.648598
Title: Knight's Steakhouse - 21
Description: Old-school outfit serving a variety of steaks, seafood & burgers in a comfy setting with a full bar.

Score: 11.546998
Title: Gandy Dancer - 4
Description: Elegant restaurant in a restored 1886 building serving seafood options, as well as steak & pasta.

Score: 10.859073
Title: Texas Roadhouse - 97
Description: Lively chain steakhouse serving American fare with a Southwestern spin amid Texas-themed decor.

Score: 10.459659
Title: The Chop House Ann Arbor - 13
Description: Chophouse decorated with elegant gas lamps offers premium steak, wine & interactive tablet menus.

Score: 10.288989
Title: Mister Spots Ann Arbor - 131
Description: Casual joint serving Philadelphia-style hoagies, steak sandwiches & signature wings.

Score: 10.247902
Title: Texas de Brazil - Ann A
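The results above list Knight's Steakhouse twice (documents 6 and 21) with the same description and score, which suggests the dataset contains duplicate records. A minimal sketch for collapsing such duplicates on the client side follows; `dedupe_hits` is a hypothetical helper, and the sample hits copy only the IDs and titles shown above.

```python
def dedupe_hits(hits, key="Title"):
    """Keep the first (highest-ranked) hit for each value of `key`."""
    seen, unique = set(), []
    for hit in hits:
        value = hit["_source"].get(key)
        if value is None:
            # Hits without the key cannot be compared, so keep them all
            unique.append(hit)
            continue
        if value in seen:
            continue
        seen.add(value)
        unique.append(hit)
    return unique


# Trimmed-down hits based on the output above
hits = [
    {"_id": "6", "_source": {"Title": "Knight's Steakhouse"}},
    {"_id": "21", "_source": {"Title": "Knight's Steakhouse"}},
    {"_id": "4", "_source": {"Title": "Gandy Dancer"}},
]
unique_ids = [hit["_id"] for hit in dedupe_hits(hits)]
print(unique_ids)
```

With the notebook's response object, the assumed call would be `dedupe_hits(response["hits"]["hits"])`. Matching on title alone would also merge distinct branches that share a name, so a stricter key may be preferable.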
