# 1) Prepare OpenSearch

Before we can run a hybrid search, we need to prepare OpenSearch.

In this notebook we'll learn how to use an ML model to enrich our Icecat data with vectors, and then we'll balance a lexical search with a neural search.

### Prerequisites

Make sure OpenSearch is running.

We make use of several OpenSearch features, mainly:

* [Using pre-trained models in OpenSearch](https://opensearch.org/docs/latest/ml-commons-plugin/pretrained-models)
* [Hybrid search](https://opensearch.org/docs/latest/search-plugins/hybrid-search/)

In [3]:
import requests
import json
import mercury as mr
import time

app = mr.App(title="Let's Run a Hybrid Search", static_notebook=True)

## Configure ML Commons plugin

The ML Commons plugin lets us use machine learning models at index time and at query time.

We configure it to be able to run on the same node that holds our indexed data (`"only_run_on_ml_node": "false"`). In a production setting we'd likely want to move machine learning workloads to dedicated ML nodes (`"only_run_on_ml_node": "true"`).

In [6]:
url = "http://localhost:9200/_cluster/settings"
headers = {
    'Content-Type': 'application/json'
}

payload = {
    "persistent": {
        "plugins": {
            "ml_commons": {
                "only_run_on_ml_node": "false",
                "model_access_control_enabled": "true",
                "native_memory_threshold": "99"
            }
        }
    }
}


response = requests.request("PUT", url, headers=headers, data=json.dumps(payload))

In [7]:
mr.JSON(response.json(), level=4)
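
Before moving on, it can help to confirm that the cluster actually accepted the settings. The sketch below assumes the response carries an `acknowledged` flag and echoes the applied values nested under `persistent`, which is the usual shape of a cluster settings response. `settings_applied` is a hypothetical helper name, not part of any library.

```python
def settings_applied(body, expected):
    """Return True if the update was acknowledged and every expected
    ML Commons setting is echoed back with the expected value."""
    if not body.get("acknowledged", False):
        return False
    # Assumption: applied settings are echoed as persistent -> plugins -> ml_commons.
    applied = body.get("persistent", {}).get("plugins", {}).get("ml_commons", {})
    # Compare as strings because the notebook sends the values as strings.
    return all(str(applied.get(key)) == str(value) for key, value in expected.items())
```

With the cell above, this could be called as `settings_applied(response.json(), payload["persistent"]["plugins"]["ml_commons"])`.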

## Register a model group

First we register a model group; next we assign a model to it. Model groups are mainly a way to organise models in OpenSearch.

We need to grab the `model_group_id` from the OpenSearch response to register the model in the correct group.

Learn more about registering a model at https://opensearch.org/docs/latest/ml-commons-plugin/api/model-apis/register-model/.

In [8]:
url = "http://localhost:9200/_plugins/_ml/model_groups/_register"

payload = {
  "name": "NLP_model_group",
  "description": "A model group for NLP models"
}


response = requests.request("POST", url, headers=headers, data=json.dumps(payload))
mr.JSON(response.json(), level=4)

model_group_id = response.json()['model_group_id']

print(f"Created Model Group ID {model_group_id}")

Created Model Group ID y5rgMpIBFSlgWAuG7TM5
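
If the registration request is rejected (for example when the notebook is re-run), the response contains no `model_group_id` and the cell stops with a bare `KeyError`. A minimal sketch of a more telling failure, using a hypothetical helper `require_key` that is not part of the notebook:

```python
def require_key(body, key):
    """Return body[key]; if the key is missing, raise an error that
    includes the full response so the cause is visible."""
    if key not in body:
        raise RuntimeError(f"Expected '{key}' in response, got: {body}")
    return body[key]
```

The same helper would work for the `task_id` and `model_id` lookups in the following cells, for instance `model_group_id = require_key(response.json(), "model_group_id")`.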


## Register the model to the model group
Under the covers, OpenSearch downloads the model from HuggingFace and saves it into our cluster. This can take some time!

We are using the model `all-MiniLM-L6-v2`, one of several options hosted on HuggingFace to choose from in OpenSearch. You can find a list of supported pretrained models in the [OpenSearch docs](https://opensearch.org/docs/latest/ml-commons-plugin/pretrained-models#supported-pretrained-models).

Note that `all-MiniLM-L6-v2` produces embeddings with 384 dimensions. This will become important later.

In [9]:
url = "http://localhost:9200/_plugins/_ml/models/_register"

# Previously had huggingface/sentence-transformers/msmarco-distilbert-base-tas-b

payload = {
  "name": "huggingface/sentence-transformers/all-MiniLM-L6-v2",
  "version": "1.0.1",
  "model_group_id": model_group_id,
  "model_format": "TORCH_SCRIPT"
}


response = requests.request("POST", url, headers=headers, data=json.dumps(payload))
mr.JSON(response.json(), level=4)
task_id = response.json()['task_id']

## Check status of registering model

Since downloading and registering the model may take a while, we need to check the status of this operation and wait for it to complete before we proceed.

In [10]:
url = f"http://localhost:9200/_plugins/_ml/tasks/{task_id}"

max_attempts = 10
attempts = 0

state = None
while state != 'COMPLETED' and attempts < max_attempts:
    attempts += 1  # count this attempt so the loop stops after max_attempts
    time.sleep(5)  # wait five seconds, then check again; the model is being downloaded from HuggingFace
    response = requests.request("GET", url, headers=headers)
    #mr.JSON(response.json(), level=4)
    state = response.json()['state']
    print(state)

model_id = response.json()['model_id']

print(f"Created Model ID {model_id}")

COMPLETED
Created Model ID zZrgMpIBFSlgWAuG9zO3
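
The polling loop appears twice in this notebook, so it could be factored into one helper. The sketch below is hypothetical (`wait_for_task` and its parameters are not part of the notebook) and assumes that a failed task reports the state `FAILED`. It takes a callable that fetches the task, so it does not depend on how the request is made.

```python
import time


def wait_for_task(fetch_task, max_attempts=10, delay_seconds=5.0):
    """Poll fetch_task() until the task completes, fails, or attempts run out.

    fetch_task is any zero-argument callable returning the task as a dict.
    """
    for attempt in range(1, max_attempts + 1):
        task = fetch_task()
        state = task.get("state")
        print(f"attempt {attempt}: {state}")
        if state == "COMPLETED":
            return task
        if state == "FAILED":  # assumption: failed tasks report this state
            raise RuntimeError(f"Task failed: {task.get('error', task)}")
        time.sleep(delay_seconds)
    raise TimeoutError(f"Task not completed after {max_attempts} attempts")
```

With the variables from the cell above, a call might look like `task = wait_for_task(lambda: requests.get(url, headers=headers).json())`, followed by `model_id = task['model_id']`. Unlike the plain loop, this stops with an error instead of continuing when the task never completes.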


## Deploy the model
Under the covers, OpenSearch downloads the PyTorch runtime it needs and loads the model into memory so it can serve requests.

In [11]:
url = f"http://localhost:9200/_plugins/_ml/models/{model_id}/_deploy"

response = requests.request("POST", url, headers=headers)
mr.JSON(response.json(), level=4)
deploy_model_task_id = response.json()['task_id']

## Check the status

We need to check the status again before proceeding to make sure it's available on our OpenSearch node.

In [12]:
url = f"http://localhost:9200/_plugins/_ml/tasks/{deploy_model_task_id}"

max_attempts = 10
attempts = 0

state = None
while state != 'COMPLETED' and attempts < max_attempts:
    attempts += 1
    time.sleep(5)  # wait five seconds, then check again; deploying the model can take a while
    response = requests.request("GET", url, headers=headers)
    #mr.JSON(response.json(), level=4)
    print(response.json()['state'])
    state = response.json()['state']
    
    

model_id = response.json()['model_id']

print(f"Finished Deploying Model ID {model_id}")

RUNNING
RUNNING
COMPLETED
Finished Deploying Model ID zZrgMpIBFSlgWAuG9zO3


### Test the model by creating sentence embeddings

Let's test the model to get a sense of what it does behind the scenes. We can do that without actually indexing or querying data.

We use the `_predict` endpoint and the registered model to create embeddings for a textual input. Embeddings are numerical representations of text that machines can work with, and they are what we actually store when we do neural search. Another word for embedding is vector.
At query time we also create an embedding for the query. The search engine can then calculate the distance between each indexed vector and the query vector to see how similar they are.

For now we'll only look at what these embeddings look like for the example string `today is sunny`.

Feel free to expand the returned data by hitting the + symbol in the response.
You will notice that the returned data has 384 items, exactly the number of dimensions the model produces. This is the second time we encounter this number!

In [13]:
url = f"http://localhost:9200/_plugins/_ml/_predict/text_embedding/{model_id}"

payload = {
  "text_docs": ["today is sunny"],
  "return_number": True,
  "target_response": ["sentence_embedding"]
}


response = requests.request("POST", url, headers=headers, data=json.dumps(payload))
mr.JSON(response.json(), level=5)
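
To make the idea of comparing two vectors concrete, here is a small sketch in plain Python. It assumes the `_predict` response nests the vector under `inference_results[0].output[0].data`, and it uses cosine similarity as the measure, which is one common choice and not necessarily what a given index is configured to use. Both function names are hypothetical.

```python
import math


def extract_embedding(body):
    """Pull the first embedding vector out of a _predict response.

    Assumption: the vector sits at inference_results[0].output[0].data.
    """
    return body["inference_results"][0]["output"][0]["data"]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors.

    Values near 1.0 mean the vectors point in nearly the same direction.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Calling `_predict` a second time with a different sentence and passing both extracted vectors to `cosine_similarity` gives a feel for how close the model considers the two texts.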

## Create an Ingest pipeline
Here we create an ingest pipeline that uses our model to generate embeddings at index time. The content of the field `product_title` is passed to the registered model, and the generated embeddings are stored in the field `title_embedding`, which we can refer to later.

Learn more at https://opensearch.org/docs/latest/ingest-pipelines/processors/text-embedding/.

In [15]:
url = "http://localhost:9200/_ingest/pipeline/nlp-ingest-pipeline"

payload = {
  "description": "A text embedding pipeline",
  "processors": [
    {
      "text_embedding": {
        "model_id": model_id,
        "field_map": {
          "product_title": "title_embedding"
        }
      }
    }
  ]
}


response = requests.request("PUT", url, headers=headers, data=json.dumps(payload))
mr.JSON(response.json(), level=4)
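
The pipeline can be tried out before any data is indexed by sending sample documents to its `_simulate` endpoint. The sketch below only builds the request body and inspects a response. The helper names and sample titles are hypothetical, and the response shape (`docs[i].doc._source`) is an assumption about what the endpoint returns.

```python
def build_simulate_payload(titles, field="product_title"):
    """Wrap sample titles in the body shape used by the pipeline _simulate endpoint."""
    return {"docs": [{"_source": {field: title}} for title in titles]}


def embedding_lengths(simulate_response, target="title_embedding"):
    """Return the length of the generated vector for each simulated document.

    Assumption: each result exposes the processed document under doc._source.
    A document without the target field contributes 0.
    """
    return [
        len(result["doc"]["_source"].get(target, []))
        for result in simulate_response["docs"]
    ]
```

A request might then look like `requests.post(f"{url}/_simulate", headers=headers, data=json.dumps(build_simulate_payload(["a sample product title"])))`. If the pipeline works as described, each length should match the 384 dimensions of the model.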
