*Reranking with a locally hosted reranker model from HuggingFace*

# Set up the notebook

## Install required libs

In [None]:
!pip install -qqU elasticsearch
!pip install -qqU "eland[pytorch]"
!pip install -qqU datasets

## Import the required python libraries

In [None]:
import os
from elasticsearch import Elasticsearch, helpers, exceptions
from urllib.request import urlopen
from getpass import getpass
import json
import time
from datasets import load_dataset
import pandas as pd

## Create an Elasticsearch Python client



In [None]:
es_url="http://kubernetes-vm:9200"
es_user="elastic"
es_pass="changeme"

es = Elasticsearch(
    hosts = [es_url],
    basic_auth=(es_user, es_pass)
)

try:
    es.info()
    print("Successfully connected to Elasticsearch!")
except exceptions.ConnectionError as e:
    print(f"Error connecting to Elasticsearch: {e}")

# Ready Elasticsearch

## Hugging Face Reranking Model
Run this cell to:
- Use Eland's `eland_import_hub_model` command to upload the reranking model to Elasticsearch.

For this example we've chosen the [`cross-encoder/ms-marco-MiniLM-L-6-v2`](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L-6-v2) text similarity model.
<br><br>
**Note**:
Although we are importing the model for use as a reranker, Eland and Elasticsearch do not have a dedicated rerank task type, so we still use `text_similarity`.

In [None]:
model_id = "cross-encoder/ms-marco-MiniLM-L-6-v2"

!eland_import_hub_model \
  --url $es_url \
  -u $es_user \
  -p $es_pass \
  --hub-model-id $model_id \
  --task-type text_similarity
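
Eland stores the imported model under an Elasticsearch model id derived from the Hub id, which is why the endpoint configuration later in this notebook refers to `cross-encoder__ms-marco-minilm-l-6-v2`. The sketch below reproduces that mapping under the assumption that the id is lowercased and that slashes and whitespace become double underscores. `hub_id_to_es_model_id` is a hypothetical helper written for illustration, not part of Eland.

```python
import re


def hub_id_to_es_model_id(hub_model_id: str) -> str:
    """Approximate the model id used in Elasticsearch (assumed naming convention)."""
    # Replace path separators and whitespace with "__", then lowercase the result.
    return re.sub(r"[\s\\/]", "__", hub_model_id).lower()


print(hub_id_to_es_model_id("cross-encoder/ms-marco-MiniLM-L-6-v2"))
# prints: cross-encoder__ms-marco-minilm-l-6-v2
```

If the import used a different id, the trained model list in Elasticsearch is the authoritative place to look it up.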

## Create Inference Endpoint
Run this cell to:
- Create an inference endpoint
- Deploy the reranking model we imported in the previous section

We need to create an endpoint that queries can use for reranking.

Key points about the `model_config`:
- `service` - in this case `elasticsearch` tells the inference API to use a model hosted locally in Elasticsearch
- `num_allocations` - sets the number of allocations to 1
    - Allocations are independent units of work for NLP tasks. Increasing this raises concurrent throughput.
- `num_threads` - sets the number of threads per allocation to 1
    - This controls how many threads each allocation uses during inference. Increasing it generally speeds up inference requests (to a point).
- `model_id` - the id of the model as it is named in Elasticsearch



In [None]:
model_config = {
    "service": "elasticsearch",
    "service_settings": {
        "num_allocations": 1,
        "num_threads": 1,
        "model_id": "cross-encoder__ms-marco-minilm-l-6-v2",
    },
    "task_settings": {"return_documents": True},
}

inference_id = "semantic-reranking"

create_endpoint = es.inference.put(
    inference_id=inference_id, task_type="rerank", body=model_config
)

create_endpoint.body
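
Since `num_allocations` and `num_threads` are the two settings most likely to change between clusters, it can help to build the configuration from parameters. This is a minimal sketch, not a required step: `build_rerank_config` is a hypothetical helper, and the value of 2 allocations is only illustrative.

```python
def build_rerank_config(model_id, num_allocations=1, num_threads=1):
    """Build an inference endpoint config for a model hosted in Elasticsearch."""
    if num_allocations < 1 or num_threads < 1:
        raise ValueError("num_allocations and num_threads must be at least 1")
    return {
        "service": "elasticsearch",
        "service_settings": {
            "num_allocations": num_allocations,
            "num_threads": num_threads,
            "model_id": model_id,
        },
        "task_settings": {"return_documents": True},
    }


# Two allocations to serve more concurrent requests (illustrative value).
config = build_rerank_config("cross-encoder__ms-marco-minilm-l-6-v2", num_allocations=2)
print(config["service_settings"])
```

With the default arguments the helper returns the same dictionary as `model_config` above.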

### Verify it was created

Run the cell in this section to verify that:
- The inference endpoint has been created
- The model has been deployed

You should see JSON output with information about the semantic endpoint.

In [None]:
check_endpoint = es.inference.get(
    inference_id=inference_id,
)

check_endpoint.body

## Create the index mapping

We are going to index the `title` and `abstract` from the dataset.  

In [None]:
index_name = "arxiv-papers"

index_mapping = {
    "mappings": {
        "properties": {"title": {"type": "text"}, "abstract": {"type": "text"}}
    }
}


try:
    es.indices.create(index=index_name, body=index_mapping)
    print(f"Index '{index_name}' created successfully.")
except exceptions.RequestError as e:
    if e.error == "resource_already_exists_exception":
        print(f"Index '{index_name}' already exists.")
    else:
        print(f"Error creating index '{index_name}': {e}")

## Ready the dataset
We are going to use the [CShorten/ML-ArXiv-Papers](https://huggingface.co/datasets/CShorten/ML-ArXiv-Papers) dataset.

## Download Dataset
**Note**: You may get a warning *The secret `HF_TOKEN` does not exist in your Colab secrets*.

You can safely ignore this.

In [None]:
dataset = load_dataset("CShorten/ML-ArXiv-Papers")

### Index into Elasticsearch

We will loop through the dataset and send batches of rows to Elasticsearch.
- This may take a couple of minutes depending on your cluster sizing.

In [None]:
def bulk_insert_elasticsearch(dataset, index_name, chunk_size=1000):
    actions = []
    for record in dataset:
        action = {
            "_index": index_name,
            "_source": {"title": record["title"], "abstract": record["abstract"]},
        }
        actions.append(action)

        if len(actions) == chunk_size:
            helpers.bulk(es, actions)
            actions = []

    if actions:
        helpers.bulk(es, actions)


bulk_insert_elasticsearch(dataset["train"], index_name)
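
`helpers.bulk` also accepts any iterable of actions together with a `chunk_size` argument, so the batching can be left to the helper instead of being done by hand. The following is a sketch of that alternative. `generate_actions` is a hypothetical name, and the sample records are made up so the snippet runs without a cluster.

```python
def generate_actions(records, index_name):
    """Yield one bulk action per record, keeping only title and abstract."""
    for record in records:
        yield {
            "_index": index_name,
            "_source": {"title": record["title"], "abstract": record["abstract"]},
        }


# Illustrative records standing in for rows of the dataset.
sample_records = [
    {"title": "Paper A", "abstract": "Abstract A", "extra": 1},
    {"title": "Paper B", "abstract": "Abstract B", "extra": 2},
]
sample_actions = list(generate_actions(sample_records, "arxiv-papers"))
print(len(sample_actions))

# Against a live cluster this would be used as:
# helpers.bulk(es, generate_actions(dataset["train"], index_name), chunk_size=1000)
```

Because the generator is consumed lazily, only one chunk of actions needs to be held in memory at a time.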

# Query with Reranking

The query below contains a `text_similarity_reranker` retriever which:
1. Uses a standard retriever to:
    1. Perform a lexical query against the `title` field
2. Performs a reranking step, which:
    1. Takes as input the top 100 results from the previous search
      - `"rank_window_size": 100`
    2. Takes as input the query
      - `"inference_text": query`
    3. Scores the `abstract` of each result with our previously created rerank inference endpoint and model


In [None]:
query = "sparse vector embedding"

# Lexical query, ranked by its relevance score (no reranking)
response_scored = es.search(
    index="arxiv-papers",
    body={
        "size": 10,
        "retriever": {"standard": {"query": {"match": {"title": query}}}},
        "fields": ["title", "abstract"],
        "_source": False,
    },
)

# Query with Semantic Reranker
response_reranked = es.search(
    index="arxiv-papers",
    body={
        "size": 10,
        "retriever": {
            "text_similarity_reranker": {
                "retriever": {"standard": {"query": {"match": {"title": query}}}},
                "field": "abstract",
                "inference_id": "semantic-reranking",
                "inference_text": query,
                "rank_window_size": 100,
            }
        },
        "fields": ["title", "abstract"],
        "_source": False,
    },
)
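
The two request bodies above differ only in whether the standard retriever is wrapped in a `text_similarity_reranker`. A small sketch that makes the wrapping explicit is shown here. `build_search_body` is a hypothetical helper, and its defaults simply mirror the values used in this notebook.

```python
def build_search_body(query, inference_id=None, size=10, rank_window_size=100):
    """Build a search body; add a reranking stage when inference_id is given."""
    # First stage: lexical match on the title field.
    retriever = {"standard": {"query": {"match": {"title": query}}}}
    if inference_id is not None:
        # Second stage: rerank the top rank_window_size hits on their abstract.
        retriever = {
            "text_similarity_reranker": {
                "retriever": retriever,
                "field": "abstract",
                "inference_id": inference_id,
                "inference_text": query,
                "rank_window_size": rank_window_size,
            }
        }
    return {
        "size": size,
        "retriever": retriever,
        "fields": ["title", "abstract"],
        "_source": False,
    }


lexical_body = build_search_body("sparse vector embedding")
reranked_body = build_search_body(
    "sparse vector embedding", inference_id="semantic-reranking"
)
print(list(reranked_body["retriever"]))
```

Either body could then be passed to `es.search(index="arxiv-papers", body=...)` in the same way as the cell above.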

## Print the table comparing the scored and reranked results

In [None]:
titles_scored = [
    paper["fields"]["title"][0] for paper in response_scored.body["hits"]["hits"]
]
titles_reranked = [
    paper["fields"]["title"][0] for paper in response_reranked.body["hits"]["hits"]
]

# Creating a DataFrame
df = pd.DataFrame(
    {"Scored Results": titles_scored, "Reranked Results": titles_reranked}
)

df_styled = df.style.set_properties(**{"text-align": "left"}).set_caption(
    f"Comparison of Scored and Semantic Reranked Results - Query: '{query}'"
)

# Display the table
df_styled
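
To see how far reranking moved each result, the two title lists can be compared position by position. This is a minimal sketch with made-up titles. `rank_changes` is a hypothetical helper, and a title that was reranked into the top results from deeper in the 100-result window shows `None` as its scored rank.

```python
def rank_changes(scored, reranked):
    """For each reranked title, return (title, scored_rank, reranked_rank)."""
    scored_rank = {}
    for position, title in enumerate(scored, start=1):
        # Keep the first position if a title appears more than once.
        scored_rank.setdefault(title, position)
    rows = []
    for new_rank, title in enumerate(reranked, start=1):
        # None means the title was not among the scored results shown.
        rows.append((title, scored_rank.get(title), new_rank))
    return rows


# Illustrative titles, not real query results.
example = rank_changes(["A", "B", "C"], ["C", "A", "D"])
print(example)
# prints: [('C', 3, 1), ('A', 1, 2), ('D', None, 3)]
```

In the notebook, `rank_changes(titles_scored, titles_reranked)` would produce the same kind of rows for the real results.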

## Print out Title and Abstract
This will print the title and the abstract for the top 10 results after semantic reranking.

In [None]:
# Iterate over the reranked response so the output matches the description above
for paper in response_reranked.body["hits"]["hits"]:
    print(
        f"Title: {paper['fields']['title'][0]} \n  Abstract: {paper['fields']['abstract'][0]}"
    )
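
Each hit exposes the requested fields as lists, which is why the loop above indexes with `[0]`. The sketch below flattens hits into plain rows that are easier to print or load into a DataFrame. `hits_to_rows` is a hypothetical helper, and the sample response is hand-written to mimic only the keys this notebook already reads, plus `_score`.

```python
def hits_to_rows(response_body):
    """Flatten search hits into dicts with title, abstract and score."""
    rows = []
    for hit in response_body["hits"]["hits"]:
        fields = hit["fields"]
        rows.append(
            {
                "title": fields["title"][0],
                "abstract": fields["abstract"][0],
                "score": hit.get("_score"),
            }
        )
    return rows


# Hand-written sample shaped like a search response (illustrative values only).
sample_body = {
    "hits": {
        "hits": [
            {"_score": 1.5, "fields": {"title": ["Paper A"], "abstract": ["Abstract A"]}},
            {"_score": 0.7, "fields": {"title": ["Paper B"], "abstract": ["Abstract B"]}},
        ]
    }
}
sample_rows = hits_to_rows(sample_body)
print(sample_rows[0]["title"])
```

In the notebook, the same helper could be called as `hits_to_rows(response_reranked.body)`.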