# Semantic Reranking with Elasticsearch and Hugging Face

In this example, we will implement semantic reranking in Elasticsearch by uploading a model from Hugging Face to an Elasticsearch cluster. We will use the `retriever` abstraction, a simpler Elasticsearch syntax for crafting queries and combining different search operations.

Make sure you have `ELASTIC_CLOUD_ID` and `ELASTIC_DEPL_API_KEY` ready.

## Setup

In [None]:
!pip install -qU elasticsearch eland[pytorch]

In [None]:
from elasticsearch import Elasticsearch, helpers

## Initialize the Elasticsearch Python client

In [None]:
from google.colab import userdata

ELASTIC_CLOUD_ID = userdata.get('ELASTIC_CLOUD_ID')
ELASTIC_DEPL_API_KEY = userdata.get('ELASTIC_DEPL_API_KEY')

# Name the client `client`, since every later cell refers to it by that name
client = Elasticsearch(
    cloud_id=ELASTIC_CLOUD_ID,
    api_key=ELASTIC_DEPL_API_KEY,
)
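Outside Colab, `google.colab.userdata` is not available. The following is a minimal sketch, assuming the two values are exported as environment variables under the same names; `load_credentials` is a hypothetical helper, not part of the Elasticsearch client.

```python
import os

def load_credentials(env=os.environ):
    """Return (cloud_id, api_key) from a mapping, failing early if either is missing."""
    names = ("ELASTIC_CLOUD_ID", "ELASTIC_DEPL_API_KEY")
    # Treat unset and empty values the same way.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError(f"Missing required setting(s): {', '.join(missing)}")
    return tuple(env[name] for name in names)

# Assumed usage outside Colab:
# ELASTIC_CLOUD_ID, ELASTIC_DEPL_API_KEY = load_credentials()
```

Failing early here gives a clearer message than an authentication error raised later by the first request.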

In [None]:
print(client.info())

## Load data

In this example, we will load a small dataset of movies.

In [None]:
from urllib.request import urlopen
import json
import time

url = "https://huggingface.co/datasets/leemthompo/small-movies/raw/main/small-movies.json"
response = urlopen(url)

data_json = json.loads(response.read())

# Prepare the documents to be indexed
documents = []
for doc in data_json:
    documents.append({
        '_index': 'movies',
        '_source': doc
    })

# Use helpers.bulk to index
helpers.bulk(client, documents)
print('Done indexing documents into `movies` index')
time.sleep(3)
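As a variation on the indexing step, the bulk actions can be produced lazily instead of building the full list first. This is a minimal sketch; `generate_actions` is a hypothetical helper name, and the sample documents are made up for illustration.

```python
def generate_actions(docs, index_name="movies"):
    """Yield one bulk action per document, in the shape helpers.bulk expects."""
    for doc in docs:
        yield {"_index": index_name, "_source": doc}

# Made-up documents, used only to show the shape of the generated actions.
sample_docs = [{"title": "Example A"}, {"title": "Example B"}]
for action in generate_actions(sample_docs):
    print(action)

# Assumed usage with the client from this notebook:
# helpers.bulk(client, generate_actions(data_json))
```

For a dataset this small the difference is negligible, but a generator avoids holding a second copy of the data in memory. Calling `client.indices.refresh(index='movies')` after indexing is one possible alternative to the fixed `time.sleep(3)`.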

## Upload HuggingFace model using Eland

We will use Eland's `eland_import_hub_model` command to upload the model to Elasticsearch. In this example, we will upload the `cross-encoder/ms-marco-MiniLM-L-6-v2` text similarity model.

In [None]:
!eland_import_hub_model \
  --cloud-id $ELASTIC_CLOUD_ID \
  --es-api-key $ELASTIC_DEPL_API_KEY \
  --hub-model-id cross-encoder/ms-marco-MiniLM-L-6-v2 \
  --task-type text_similarity \
  --clear-previous \
  --start

## Create inference endpoint

Next, we will create an inference endpoint for the `rerank` task to deploy and manage our model and, if needed, spin up the necessary ML resources behind the scenes.

In [None]:
client.inference.put(
    task_type='rerank',
    inference_id='my-msmarco-minilm-model',
    inference_config={
        'service': 'elasticsearch',
        'service_settings': {
            'model_id': 'cross-encoder__ms-marco-minilm-l-6-v2',
            'num_allocations': 1,
            'num_threads': 1
        }
    }
)
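The `model_id` above differs from the Hugging Face hub ID passed to `eland_import_hub_model`: the slash becomes a double underscore and the name is lowercased. The sketch below reproduces only that mapping as it appears in this notebook; `to_es_model_id` is a hypothetical helper, and Eland's own conversion may apply further rules (for example, for very long names).

```python
def to_es_model_id(hub_model_id: str) -> str:
    """Derive the Elasticsearch model ID used in this notebook from a hub model ID."""
    # "/" is replaced with "__" and the result is lowercased.
    return hub_model_id.replace("/", "__").lower()

print(to_es_model_id("cross-encoder/ms-marco-MiniLM-L-6-v2"))
# prints: cross-encoder__ms-marco-minilm-l-6-v2
```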

In [None]:
client.inference.get()

When we deploy our model, we may need to sync our ML saved objects in the Kibana (or Serverless) UI.

## Lexical queries

We will start with a `standard` retriever to test some lexical (or full-text) searches, and then compare the improvements when we layer in semantic reranking.

### Lexical match with `query_string` query

Suppose we vaguely remember a famous movie about a killer who eats his victims, but we have forgotten the word "cannibal".

We can run a `query_string` query to search for `"flesh-eating bad guy"` in the `plot` field of our Elasticsearch documents:

In [None]:
resp = client.search(
    index='movies',
    retriever={
        'standard': {
            'query': {
                'query_string': {
                    'query': 'flesh-eating bad guy',
                    'default_field': 'plot'
                }
            }
        }
    }
)

if resp['hits']['hits']:
    for hit in resp['hits']['hits']:
        title = hit['_source']['title']
        plot = hit['_source']['plot']
        print(f"Title: {title}\nPlot: {plot}\n")
else:
    print('No search results found.')
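The same result-printing loop recurs in the cells that follow, so it can be factored into a small helper. This is a minimal sketch; `format_hits` is a hypothetical name, and `sample_resp` is a made-up response that only mimics the shape returned by `client.search`.

```python
def format_hits(resp, fields=("title", "plot")):
    """Return one printable string per hit, or a single fallback message."""
    hits = resp["hits"]["hits"]
    if not hits:
        return ["No search results found."]
    formatted = []
    for hit in hits:
        source = hit["_source"]
        # Capitalize the field name for display, e.g. "title" -> "Title".
        lines = [f"{name.capitalize()}: {source.get(name, '')}" for name in fields]
        formatted.append("\n".join(lines))
    return formatted

# Made-up response with the same nesting as a real search response.
sample_resp = {
    "hits": {"hits": [{"_source": {"title": "Example Movie", "plot": "An example plot."}}]}
}
for entry in format_hits(sample_resp):
    print(entry + "\n")
```

Returning strings rather than printing inside the helper keeps it easy to check and reuse for both the lexical and the reranked results.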

### Simple `multi_match` query

This lexical query performs a standard keyword search for the term `"crime"` within the `"plot"` and `"genre"` fields of our Elasticsearch documents.

In [None]:
resp = client.search(
    index='movies',
    retriever={
        'standard': {
            'query': {
                'multi_match': {
                    'query': 'crime',
                    'fields': ['plot', 'genre']
                }
            }
        }
    }
)

for hit in resp['hits']['hits']:
    title = hit['_source']['title']
    plot = hit['_source']['plot']
    print(f"Title: {title}\nPlot: {plot}\n")

Note that this search term is broader than "flesh-eating bad guy".

## Semantic reranker

Now we will wrap our standard query retriever in a `text_similarity_reranker`. This allows us to leverage the NLP model we deployed to Elasticsearch to rerank the results based on the phrase "flesh-eating bad guy".

In [None]:
resp = client.search(
    index='movies',
    retriever={
        'text_similarity_reranker': {
            'retriever': {
                'standard': {
                    'query': {
                        'multi_match': {
                            'query': 'crime',
                            'fields': ['plot', 'genre']
                        }
                    }
                }
            },
            'field': 'plot',
            'inference_id': 'my-msmarco-minilm-model',
            'inference_text': 'flesh-eating bad guy'
        }
    }
)

for hit in resp['hits']['hits']:
    title = hit['_source']['title']
    plot = hit['_source']['plot']
    print(f"Title: {title}\nPlot: {plot}\n")
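The nesting of the reranker around the first-stage query can be made explicit with a small builder function. This is a sketch only; `rerank_retriever` is a hypothetical helper, and the optional `rank_window_size` setting (how many first-stage hits are passed to the reranker) is not used elsewhere in this notebook, so check it against the retriever documentation for your Elasticsearch version.

```python
def rerank_retriever(first_stage_query, field, inference_id, inference_text,
                     rank_window_size=None):
    """Wrap a first-stage query in a text_similarity_reranker retriever body."""
    reranker = {
        # First stage: a standard retriever running the lexical query.
        "retriever": {"standard": {"query": first_stage_query}},
        # Second stage: rerank the hits by comparing `field` to `inference_text`.
        "field": field,
        "inference_id": inference_id,
        "inference_text": inference_text,
    }
    if rank_window_size is not None:
        reranker["rank_window_size"] = rank_window_size
    return {"text_similarity_reranker": reranker}

retriever = rerank_retriever(
    first_stage_query={"multi_match": {"query": "crime", "fields": ["plot", "genre"]}},
    field="plot",
    inference_id="my-msmarco-minilm-model",
    inference_text="flesh-eating bad guy",
)
print(retriever)

# Assumed usage with the client from this notebook:
# resp = client.search(index='movies', retriever=retriever)
```

Separating the first-stage query from the reranking text makes it easier to see that the lexical search decides which documents are candidates, while the model only reorders them.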

Semantic reranking helped us find the most relevant result by parsing a natural language query, overcoming the limitations of lexical search, which relies more on exact matching. Semantic reranking enables semantic search in a few steps, without the need to generate and store embeddings. Being able to use open source models hosted on Hugging Face natively in our Elasticsearch cluster is great for prototyping, testing, and building search experiences.