**Locally Hosted Semantic Reranker**

# Objectives

In this challenge you will:
- Learn how to load a semantic reranker into Elasticsearch with Eland
- Create a reranker inference API
- Modify the search query so the reranker reorders the results used as contextual documents

## If this is your first time using a Jupyter notebook:

<img src="https://play.instruqt.com/assets/tracks/xh4efwjkleh1/9de47748dceadc1b6546908519ea4ba6/assets/CleanShot%202024-09-12%20at%2014.06.51%402x.png" width="150"/>
Click the small play icon to the left of the cell<br>

<img src="https://play.instruqt.com/assets/tracks/xh4efwjkleh1/f7949234f997ba39ff8879304648efaa/assets/CleanShot%202024-09-12%20at%2014.07.22%402x.png" width="150"/>
If the cell runs successfully, you will see a green check mark at the bottom left of the cell<br>

<img src="https://play.instruqt.com/assets/tracks/xh4efwjkleh1/0fd068121e9d48f13b49d8e02a21fe42/assets/CleanShot%202024-09-12%20at%2014.09.32%402x.png" width="150"/>
If there is an error, you will see a red x and may see error output below

# Setup

Run the cells in this section to:
- Import the required libraries
- Create an Elasticsearch Python client connection


The required packages should already be installed in your notebook environment.
If they are missing, uncomment and run the cell below.

In [None]:
#!pip install -qU elasticsearch
#!pip install -qU eland[pytorch]

Import the required Python libraries

In [None]:
import os
from elasticsearch import Elasticsearch, helpers, exceptions
from urllib.request import urlopen
from getpass import getpass
import json
import time

Create an Elasticsearch Python client

In [None]:
es = Elasticsearch(
    hosts = ["http://kubernetes-vm:9200"],
    basic_auth=("elastic", "changeme")
)

# Upload Hugging Face model with Eland
Run this cell to upload the model from Hugging Face to Elasticsearch using Eland's `eland_import_hub_model` command.

For this example we've chosen the [`cross-encoder/ms-marco-MiniLM-L-6-v2`](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L-6-v2) text similarity model.
<br><br>
**Note**:
Although we are importing the model for use as a reranker, Eland and Elasticsearch do not have a dedicated rerank task type for imported models, so we still import it as `text_similarity`. The `rerank` task type is set later, on the inference endpoint.

In [None]:
MODEL_ID = "cross-encoder/ms-marco-MiniLM-L-6-v2"

!eland_import_hub_model \
  --url "http://kubernetes-vm:9200" \
  -u "elastic" \
  -p "changeme" \
  --hub-model-id $MODEL_ID \
  --task-type text_similarity
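
The imported model is stored in Elasticsearch under an ID derived from the Hugging Face ID, and later cells refer to it by that derived name. The sketch below reproduces the naming pattern visible in this notebook (`/` replaced by `__`, everything lowercased). `to_es_model_id` is a hypothetical helper, not an Eland function, and Eland's actual rule may cover more cases (for example very long IDs).

```python
def to_es_model_id(hub_model_id: str) -> str:
    """Approximate the Elasticsearch model ID for a Hugging Face hub ID.

    Assumption: follows the pattern seen in this notebook only
    ("/" becomes "__", then lowercase). Not Eland's official logic.
    """
    return hub_model_id.replace("/", "__").lower()


MODEL_ID = "cross-encoder/ms-marco-MiniLM-L-6-v2"
ES_MODEL_ID = to_es_model_id(MODEL_ID)
print(ES_MODEL_ID)
```

If the printed value does not match the ID reported by Elasticsearch for your model, use the reported ID.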

# Create Inference Endpoint
Run this cell to:
- Create an inference endpoint
- Deploy the reranking model we imported in the previous section

We need to create an endpoint that queries can use for reranking.

Key points about the `model_config`:
- `service` - in this case `elasticsearch`, which tells the inference API to use a model hosted locally in Elasticsearch
- `num_allocations` - sets the number of allocations to 1
    - Allocations are independent units of work for NLP tasks. Increasing this value increases concurrent throughput.
- `num_threads` - sets the number of threads per allocation to 1
    - This controls how many threads each allocation uses during inference. Increasing it generally speeds up individual inference requests (up to a point).
- `model_id` - the ID of the model as it is named in Elasticsearch
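
If you want to experiment with different allocation and thread counts, the settings can be wrapped in a small helper. This is only a sketch: `build_rerank_config` is a hypothetical name, not part of the Elasticsearch client, and the positive-integer check is an assumption about sensible values rather than a documented constraint.

```python
def build_rerank_config(model_id, num_allocations=1, num_threads=1,
                        return_documents=True):
    """Build the endpoint configuration used in this notebook.

    Hypothetical helper: it only assembles a dict and does not
    contact Elasticsearch.
    """
    for name, value in (("num_allocations", num_allocations),
                        ("num_threads", num_threads)):
        if not isinstance(value, int) or value < 1:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
    return {
        "service": "elasticsearch",
        "service_settings": {
            "num_allocations": num_allocations,
            "num_threads": num_threads,
            "model_id": model_id,
        },
        "task_settings": {"return_documents": return_documents},
    }


config = build_rerank_config("cross-encoder__ms-marco-minilm-l-6-v2")
print(config["service_settings"])
```

Calling it with `num_allocations=2` would request two allocations for more concurrent throughput, provided the cluster has the resources for them.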



In [None]:
model_config = {
  "service": "elasticsearch",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1,
    "model_id": "cross-encoder__ms-marco-minilm-l-6-v2"
  },
  "task_settings": {
    "return_documents": True
  }
}

inference_id = "semantic-reranking"

create_endpoint = es.inference.put(
    inference_id=inference_id,
    task_type="rerank",
    body=model_config
)

create_endpoint.body

### Verify it was created

Run the two cells in this section to verify that:
- The inference endpoint has been created
- The model has been deployed

You should see JSON output with information about the `semantic-reranking` endpoint

In [None]:
check_endpoint = es.inference.get(
    inference_id=inference_id,
)

check_endpoint.body

Verify the model was successfully deployed

The cell below should return `started`




In [None]:
ES_MODEL_ID = "cross-encoder__ms-marco-minilm-l-6-v2"

model_info = es.ml.get_trained_models_stats(model_id=ES_MODEL_ID)

model_info.body['trained_model_stats'][0]['deployment_stats']['nodes'][0]['routing_state']['routing_state']
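
The chained lookup above raises a `KeyError` or `IndexError` if the model has no deployment yet. A more defensive variant, sketched below, walks the same path with fallbacks. `routing_states` is a hypothetical helper, and `sample_stats` is a trimmed-down, made-up response that only mirrors the keys used in this notebook.

```python
def routing_states(stats_body: dict) -> list:
    """Collect the routing state reported by each node for the first model.

    Returns an empty list when there is no deployment information,
    instead of raising KeyError or IndexError.
    """
    models = stats_body.get("trained_model_stats", [])
    if not models:
        return []
    nodes = models[0].get("deployment_stats", {}).get("nodes", [])
    return [
        node.get("routing_state", {}).get("routing_state", "unknown")
        for node in nodes
    ]


# Hypothetical response, for illustration only.
sample_stats = {
    "trained_model_stats": [
        {
            "model_id": "cross-encoder__ms-marco-minilm-l-6-v2",
            "deployment_stats": {
                "nodes": [{"routing_state": {"routing_state": "started"}}]
            },
        }
    ]
}

print(routing_states(sample_stats))
print(routing_states({"trained_model_stats": [{}]}))
```

In the notebook you would pass `model_info.body` instead of `sample_stats`. An empty list means the model is not deployed yet.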

# Query with Reranking

The query below contains a `text_similarity_reranker` retriever, which:
1. Uses a standard retriever to:
    1. Perform a semantic query against the chunked ELSER embeddings
    2. Return the top 2 inner hit chunks
2. Performs a reranking step that:
    1. Takes as input the top 50 results from the previous search
      - `"rank_window_size": 50`
    2. Takes as input the user's question
      - `"inference_text": USER_QUESTION`
    3. Scores the `Review` field of each result against the question
      - `"field": "Review"`
    4. Uses our previously created reranking inference endpoint and model
      - `"inference_id": "semantic-reranking"`

In [None]:
USER_QUESTION = "Where can I get good pizza?"

response = es.search(
    index="restaurant_reviews",
    body={
      "retriever": {
        "text_similarity_reranker": {
          "retriever": {
            "standard": {
              "query": {
                "nested": {
                  "path": "semantic_body.inference.chunks",
                  "query": {
                    "sparse_vector": {
                      "inference_id": "my-elser-endpoint",
                      "field": "semantic_body.inference.chunks.embeddings",
                      "query": USER_QUESTION
                    }
                  },
                  "inner_hits": {
                    "size": 2,
                    "name": "restaurant_reviews.semantic_body",
                    "_source": [
                      "semantic_body.inference.chunks.text"
                    ]
                  }
                }
              }
            }
          },
          "field": "Review",
          "inference_id": "semantic-reranking",
          "inference_text": USER_QUESTION,
          "rank_window_size": 50
        }
      }
    }
)

response.raw

Print out the formatted response

In [None]:
for review in response.raw['hits']['hits']:
    print(f"Restaurant {review['_source']['Restaurant']} - Rating: {review['_source']['Rating']} - Reviewer: {review['_source']['Reviewer']}")
