![Redis](https://redis.io/wp-content/uploads/2024/04/Logotype.svg?auto=webp&quality=85,75&width=120)
# Evaluating search techniques

## Let's Begin!
<a href="https://colab.research.google.com/github/redis-developer/redis-ai-resources/blob/main/python-recipes/vector-search/04_search_benchmark.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


### Install Packages

In [None]:
%pip install -q "redisvl>=0.4.1" sentence-transformers pandas "redis>=5.2.0" beir bm25s elasticsearch

### Data/Index Preparation

In [12]:
# Load the BEIR FiQA dataset (corpus, test queries, and relevance judgments)
from beir import util
from beir.datasets.data_loader import GenericDataLoader
from beir.retrieval.evaluation import EvaluateRetrieval

# link for dataset: https://sites.google.com/view/fiqa/
dataset = "fiqa"
url = f"https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/{dataset}.zip"

out_dir = "./beir_datasets"
data_path = util.download_and_unzip(url, out_dir)

corpus, queries, qrels = GenericDataLoader(data_path).load(split="test")


In [None]:
# Example corpus entry
corpus["3"]

{'text': "I'm not saying I don't like the idea of on-the-job training too, but you can't expect the company to do that. Training workers is not their job - they're building software. Perhaps educational systems in the U.S. (or their students) should worry a little about getting marketable skills in exchange for their massive investment in education, rather than getting out with thousands in student debt and then complaining that they aren't qualified to do anything.",
 'title': ''}

In [None]:
# Example query
queries["8"]

'How to deposit a cheque issued to an associate in my business into my business account?'

In [13]:
# Example relevance judgments (qrels) for query 8: doc_id -> relevance
qrels['8']

{'566392': 1, '65404': 1}

### Install Redis Stack

Later in this tutorial, Redis will be used to store, index, and query vector
embeddings and full text fields. **We need to have a Redis
instance available.**

#### Local Redis
Use the shell script below to download, extract, and install [Redis Stack](https://redis.io/docs/getting-started/install-stack/) directly from the Redis package archive.

In [None]:
# NBVAL_SKIP
%%sh
curl -fsSL https://packages.redis.io/gpg | sudo gpg --dearmor -o /usr/share/keyrings/redis-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/redis.list
sudo apt-get update  > /dev/null 2>&1
sudo apt-get install redis-stack-server  > /dev/null 2>&1
redis-stack-server --daemonize yes

#### Alternative Redis Access (Cloud, Docker, other)
There are many ways to get the necessary Redis Stack instance running:
1. In the cloud, deploy a [FREE instance of Redis in the cloud](https://redis.com/try-free/). Or, if you have your
own version of Redis Enterprise running, that works too!
2. Per OS, [see the docs](https://redis.io/docs/latest/operate/oss_and_stack/install/install-stack/)
3. With docker: `docker run -d --name redis-stack-server -p 6379:6379 redis/redis-stack-server:latest`

### Define the Redis Connection URL

By default, this notebook connects to a local Redis Stack instance. **If you have your own Redis Enterprise or Redis Cloud instance**, set the REDIS_HOST, REDIS_PORT, and REDIS_PASSWORD environment variables (or replace the default values below) with your own.

In [14]:
import os
import warnings

warnings.filterwarnings('ignore')

# Replace values below with your own if using Redis Cloud instance
REDIS_HOST = os.getenv("REDIS_HOST", "localhost") # ex: "redis-18374.c253.us-central1-1.gce.cloud.redislabs.com"
REDIS_PORT = os.getenv("REDIS_PORT", "6379")      # ex: 18374
REDIS_PASSWORD = os.getenv("REDIS_PASSWORD", "")  # ex: "1TNxTEdYRDgIDKM2gDfasupCADXXXX"

# If SSL is enabled on the endpoint, use rediss:// as the URL prefix
REDIS_URL = f"redis://:{REDIS_PASSWORD}@{REDIS_HOST}:{REDIS_PORT}"
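The URL above always includes a `:password@` section, even when `REDIS_PASSWORD` is empty, and the comment notes that TLS endpoints need the `rediss://` scheme. As a minimal sketch, a hypothetical helper `build_redis_url` (not part of RedisVL or redis-py) can assemble the URL from its parts, percent-encoding the password and switching schemes when TLS is on:

```python
from typing import Union
from urllib.parse import quote


def build_redis_url(host: str, port: Union[str, int], password: str = "", use_ssl: bool = False) -> str:
    """Hypothetical helper (not part of RedisVL) that assembles a Redis connection URL."""
    # TLS-enabled endpoints use the rediss:// scheme; plain connections use redis://
    scheme = "rediss" if use_ssl else "redis"
    # Percent-encode the password so characters like '@' or ':' can't break URL parsing,
    # and omit the credentials section entirely when no password is set
    auth = f":{quote(password, safe='')}@" if password else ""
    return f"{scheme}://{auth}{host}:{port}"


print(build_redis_url("localhost", 6379))  # redis://localhost:6379
print(build_redis_url("my-redis-host", "18374", "p@ss:word", use_ssl=True))  # rediss://:p%40ss%3Aword@my-redis-host:18374
```

redis-py percent-decodes the password when it parses a URL, so the encoded form should still authenticate as the original password.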

### Load corpus data into Redis for evaluation

#### Format and vectorize corpus data for the search index

In [15]:
from redisvl.utils.vectorize import HFTextVectorizer

# Load the embedding model used to vectorize the FiQA documents and queries
emb_model = HFTextVectorizer('sentence-transformers/all-MiniLM-L6-v2')

In [None]:
corpus_data = []

# Embedding all ~57k documents one at a time can take a while
for key in corpus:
    text = corpus[key]["title"] + " " + corpus[key]["text"]
    corpus_data.append(
        {
            "_id": key,
            "text": text,
            "title": corpus[key]["title"],
            "vector": emb_model.embed(text, as_buffer=True)
        })
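Because of `as_buffer=True`, each `vector` value is stored as raw bytes rather than a list of floats. Assuming the buffer is a float32 array serialized the way NumPy's `tobytes()` does it (consistent with the `float32` datatype declared in the schema below), the round trip can be sketched as follows; the random vector is only a stand-in for a real 384-dimension `all-MiniLM-L6-v2` embedding:

```python
import numpy as np

DIMS = 384  # output size of all-MiniLM-L6-v2, matching "dims" in the schema

# Stand-in for a real embedding; any list of 384 floats would do
rng = np.random.default_rng(0)
embedding = rng.standard_normal(DIMS).tolist()

# Pack as float32 bytes (assumed to mirror what as_buffer=True produces)
buffer = np.asarray(embedding, dtype=np.float32).tobytes()
print(len(buffer))  # 384 dims * 4 bytes per float32 = 1536

# Unpack the bytes back into a float32 vector
restored = np.frombuffer(buffer, dtype=np.float32)
print(restored.shape)  # (384,)
print(np.allclose(restored, embedding, atol=1e-6))  # True, up to float32 rounding
```

This is why a vector field's `dims` and `datatype` must match the model: Redis interprets the byte string purely by those two settings.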

In [17]:
# Example formatted corpus record
corpus_data[0]

{'_id': '3',
 'text': " I'm not saying I don't like the idea of on-the-job training too, but you can't expect the company to do that. Training workers is not their job - they're building software. Perhaps educational systems in the U.S. (or their students) should worry a little about getting marketable skills in exchange for their massive investment in education, rather than getting out with thousands in student debt and then complaining that they aren't qualified to do anything.",
 'title': '',
 'vector': b'\xef),\xbd\xc6\xae-\xbcTj\xc5;\x8e$\x8f=...'}

### Define Redis index schema

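Below, we build a schema that describes the FiQA documents: `_id` as a tag field, `text` and `title` as full-text fields, and `vector` as a 384-dimension float32 HNSW vector field using cosine distance.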

In [38]:
from redisvl.schema import IndexSchema
from redisvl.index import SearchIndex


schema = IndexSchema.from_dict({
  "index": {
    "name": dataset,
    "prefix": dataset,
    "storage": "hash"
  },
  "fields": [
    { "name": "_id", "type": "tag" },
    { "name": "text", "type": "text" },
    { "name": "title", "type": "text" },
    {
        "name": "vector",
        "type": "vector",
        "attrs": {
            "dims": 384,
            "distance_metric": "cosine",
            "algorithm": "hnsw",
            "datatype": "float32"
        }
    }
  ]
})


index = SearchIndex(schema, redis_url=REDIS_URL)
index.create(overwrite=False, drop=False)

09:29:43 redisvl.index.index INFO   Index already exists, not overwriting.


### Populate index

Load corpus objects into Redis

In [21]:
index.load(corpus_data)

['fiqa:01JPQE6GXN3P7X3YERF0WG3N01',
 'fiqa:01JPQE6GXNR77Q93NTH20JTMD1',
 'fiqa:01JPQE6GXN09N0P0675TPD9W0Q',
 ...]

### Review index metrics for loading corpus

In [39]:
index.info()["num_docs"]

57638

In [None]:
float(index.info()["total_indexing_time"]) / 1000  # total_indexing_time is in milliseconds; this gives seconds to index ~57k records in Redis

73.919796875

# Evaluate search approaches

## Baseline BM25 with bm25s

Adapted from the [bm25s BEIR evaluation example](https://github.com/xhluca/bm25s/blob/main/examples/evaluate_on_beir.py).

In [98]:
import beir.util
from beir.datasets.data_loader import GenericDataLoader
from beir.retrieval.evaluation import EvaluateRetrieval

import bm25s
from bm25s.utils.beir import (
    BASE_URL,
)

def postprocess_results_for_eval(results, scores, query_ids):
    """
    Given the queried results and scores output by BM25S, postprocess them
    to be compatible with BEIR evaluation functions.
    query_ids is a list of query ids in the same order as the results.
    """

    results_record = [
        {"id": qid, "hits": results[i], "scores": list(scores[i])}
        for i, qid in enumerate(query_ids)
    ]

    result_dict_for_eval = {
        res["id"]: {
            docid: float(score) for docid, score in zip(res["hits"], res["scores"])
        }
        for res in results_record
    }

    return result_dict_for_eval

def run_benchmark(dataset, save_dir="datasets"):
    #### Download dataset and unzip the dataset
    data_path = beir.util.download_and_unzip(BASE_URL.format(dataset), save_dir)
    split = "test" if dataset != "msmarco" else "dev"
    corpus, queries, qrels = GenericDataLoader(data_folder=data_path).load(split=split)

    corpus_ids, corpus_lst = [], []
    for key, val in corpus.items():
        corpus_ids.append(key)
        corpus_lst.append(val["title"] + " " + val["text"])
    del corpus

    qids, queries_lst = [], []
    for key, val in queries.items():
        qids.append(key)
        queries_lst.append(val)
    
    corpus_tokens = bm25s.tokenize(
        corpus_lst, leave=False
    )

    del corpus_lst

    query_tokens = bm25s.tokenize(
        queries_lst, leave=False
    )

    model = bm25s.BM25(method="lucene", k1=1.2, b=0.75)
    model.index(corpus_tokens, leave_progress=False)
    
    ############## BENCHMARKING BEIR HERE ##############
    queried_results, queried_scores = model.retrieve(
        query_tokens, corpus=corpus_ids, k=10, n_threads=4
    )

    results_dict = postprocess_results_for_eval(queried_results, queried_scores, qids)
    ndcg, _map, recall, precision = EvaluateRetrieval.evaluate(
        qrels, results_dict, [1, 10]
    )

    print(ndcg)
    print(recall)
    print(precision)
    
    return ndcg, _map, recall, precision, results_dict


bm25s_ndcg, _map, bm25s_recall, bm25s_precision, bm25s_results_dict = run_benchmark("fiqa")  # Change to dataset you want


{'NDCG@1': 0.21759, 'NDCG@10': 0.23836}
{'Recall@1': 0.10704, 'Recall@10': 0.30433}
{'P@1': 0.21759, 'P@10': 0.06744}
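`EvaluateRetrieval.evaluate` compares a results dict of the form `{query_id: {doc_id: score}}` against the qrels and averages each metric over all queries. To make the numbers above easier to read, here is a simplified pure-Python sketch of Recall@k, P@k, and NDCG@k for a single query with binary relevance labels like FiQA's. It approximates what BEIR computes through `pytrec_eval` and is not a drop-in replacement (for example, tie-breaking between equal scores may differ); the toy qrels and results are made up for illustration:

```python
import math


def rank_docs(scores: dict[str, float]) -> list[str]:
    # Highest score first
    return sorted(scores, key=scores.get, reverse=True)


def recall_at_k(relevant: set[str], ranked: list[str], k: int) -> float:
    # Fraction of all relevant docs that appear in the top k
    return len(relevant.intersection(ranked[:k])) / len(relevant)


def precision_at_k(relevant: set[str], ranked: list[str], k: int) -> float:
    # Relevant hits in the top k divided by k (even if fewer than k docs were returned)
    return len(relevant.intersection(ranked[:k])) / k


def ndcg_at_k(relevant: set[str], ranked: list[str], k: int) -> float:
    # Binary gains discounted by log2(rank + 1), normalized by the ideal ordering
    dcg = sum(1 / math.log2(i + 2) for i, doc in enumerate(ranked[:k]) if doc in relevant)
    ideal = sum(1 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal


qrels_toy = {"q1": {"d1": 1, "d3": 1}}
results_toy = {"q1": {"d1": 0.9, "d2": 0.8, "d3": 0.7}}

relevant = {doc for doc, rel in qrels_toy["q1"].items() if rel > 0}
ranked = rank_docs(results_toy["q1"])
print(recall_at_k(relevant, ranked, 1))     # 0.5
print(precision_at_k(relevant, ranked, 1))  # 1.0
print(precision_at_k(relevant, ranked, 10)) # 0.2
print(ndcg_at_k(relevant, ranked, 10))
```

The toy case also shows why P@10 values in this notebook are small: with only a couple of relevant documents per query, even a perfect top 10 cannot score much above 0.2.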


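## Elasticsearch BM25

Adapted from the [BEIR lexical BM25 evaluation example](https://github.com/beir-cellar/beir/blob/main/examples/retrieval/evaluation/lexical/evaluate_bm25.py). This baseline needs a running Elasticsearch instance.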

In [None]:

from beir import util
from beir.datasets.data_loader import GenericDataLoader
from beir.retrieval.evaluation import EvaluateRetrieval
from beir.retrieval.search.lexical import BM25Search as BM25


#### Download fiqa.zip dataset and unzip the dataset
dataset = "fiqa"
url = f"https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/{dataset}.zip"
out_dir = "./datasets"
data_path = util.download_and_unzip(url, out_dir)


corpus, queries, qrels = GenericDataLoader(data_path).load(split="test")

#### Lexical Retrieval using Bm25 (Elasticsearch) ####
#### Provide a hostname (localhost) to connect to ES instance
#### Define a new index name or use an already existing one.
#### We use default ES settings for retrieval
#### https://www.elastic.co/

import os

hostname = "localhost:9200"  # localhost
index_name = f"{dataset}"  # fiqa
es_username = "elastic"
# Read the password from the environment instead of hard-coding a credential
es_password = os.getenv("ES_PASSWORD", "")


#### Initialize ####
# (1) True - Delete existing index and re-index all documents from scratch
# (2) False - Load existing index
initialize = True  # False

#### Sharding ####
# (1) For datasets with small corpus (datasets ~ < 5k docs) => limit shards = 1
# FiQA (~57k docs) is larger than that, but this benchmark still uses a single shard
number_of_shards = 1
model = BM25(
    index_name=index_name,
    hostname=f"http://{es_username}:{es_password}@{hostname}",
    initialize=initialize,
    number_of_shards=number_of_shards,
)

# (2) For datasets with big corpus ==> keep default configuration
# model = BM25(index_name=index_name, hostname=hostname, initialize=initialize)
retriever = EvaluateRetrieval(model)

#### Retrieve lexical BM25 results (format of results is identical to qrels)
results = retriever.retrieve(corpus, queries)

#### Evaluate your retrieval using NDCG@k, MAP@K ...
es_ndcg, _map, es_recall, es_precision = retriever.evaluate(qrels, results, retriever.k_values)


### Results for ES BM25

In [176]:
print('NDCG@1: ', es_ndcg['NDCG@1'], 'NDCG@10:', es_ndcg['NDCG@10'])
print('Recall@1: ', es_recall['Recall@1'], 'Recall@10: ', es_recall['Recall@10'])
print('P@1: ', es_precision['P@1'], 'P@10: ', es_precision['P@10'])

NDCG@1:  0.23148 NDCG@10: 0.25364
Recall@1:  0.11606 Recall@10:  0.3244
P@1:  0.23148 P@10:  0.07022


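## Pure BM25 with Redis

BM25 is a widely used ranking function behind much of traditional keyword search. It scores documents by how often the query terms appear in them (term frequency), how rare those terms are across the corpus (inverse document frequency), and document length. Because common "stopwords" carry little signal, we also use a `tokenize_and_escape_query` function that removes them and escapes Redis special characters before joining the remaining terms with OR (`|`), which reduces noise in our queries.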
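For reference, here is a minimal pure-Python sketch of Okapi BM25 scoring, assuming the Lucene-style IDF and the defaults `k1=1.2` and `b=0.75` (the values passed to `bm25s` above). Redis's `BM25STD` scorer may differ in details such as IDF smoothing, and the toy documents are made up:

```python
import math
from collections import Counter


def bm25_scores(query: list[str], docs: list[list[str]], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score each tokenized doc against a tokenized query with Okapi BM25 (Lucene-style IDF)."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of docs containing each term at least once
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue  # terms absent from the doc contribute nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and document-length normalization (b)
            norm_tf = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(doc) / avg_len))
            score += idf * norm_tf
        scores.append(score)
    return scores


docs = [
    ["sell", "on", "ask", "price"],
    ["bid", "ask", "spread", "stocks"],
    ["deposit", "cheque", "business", "account"],
]
print(bm25_scores(["sell", "ask"], docs))
```

The rarer term `sell` gets a larger IDF weight than `ask`, so the first document, which contains both, ranks highest, and the third document, which contains neither, scores 0.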

In [9]:
from typing import Any, Dict, List
from redis.commands.search.aggregation import AggregateRequest, Desc

from redisvl.query import FilterQuery
from redisvl.query.filter import Text
from redisvl.utils.token_escaper import TokenEscaper

# Use NLTK's English stopword list to filter noise out of query strings
import nltk

nltk.download('stopwords')
stopwords_en = set(nltk.corpus.stopwords.words('english'))

escaper = TokenEscaper()

def tokenize_and_escape_query(user_query: str) -> str:
    """Convert a raw user query to a Redis full-text query joined by ORs."""
    tokens = [
        # Trim whitespace and surrounding commas, drop curly quotes, lowercase,
        # then escape characters that are special in Redis query syntax
        escaper.escape(token.strip().strip(",").replace("“", "").replace("”", "").lower())
        for token in user_query.split()
    ]
    # Drop empty tokens and stopwords, then OR the remaining terms together
    return " | ".join([token for token in tokens if token and token not in stopwords_en])

def bm25_query(text_field: str, user_query: str, num_results: int, scorer="BM25STD") -> FilterQuery:
    """Generate a Redis full-text query given a user query string."""
    return FilterQuery(
        filter_expression=f"~({Text(text_field) % tokenize_and_escape_query(user_query)})",
        num_results=num_results,
        return_fields=["_id", "text", "title"],
        dialect=2,
    ).scorer(scorer).with_scores()

[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/robert.shelton/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [10]:
q = "failed for 8072, What does it mean “sell on ask” , “sell on bid” in stocks?"

tokenize_and_escape_query(q)

'failed | 8072 | mean | sell | ask | sell | bid | stocks?'

### Run queries and format response for evaluation

In [18]:
%%capture
redis_res_bm25 = {}

def make_score_dict(res):
    return {rec["_id"]: rec["score"] for rec in res.docs}

for key in queries:
    text_query = queries[key]
    ft_query = bm25_query("text", text_query, 10)
    try:
        res = index.search(ft_query)
        score_dict = make_score_dict(res)
    except Exception as e:
        print(f"failed for {key}, {text_query}")
        score_dict = {}
    redis_res_bm25[key] = score_dict

### Results for pure BM25

In [19]:
redis_bm25_ndcg, _map, redis_bm25_recall, redis_bm25_precision = EvaluateRetrieval.evaluate(
    qrels, redis_res_bm25, [1, 10]
)

print(redis_bm25_ndcg)
print(redis_bm25_recall)
print(redis_bm25_precision)

{'NDCG@1': 0.2037, 'NDCG@10': 0.23083}
{'Recall@1': 0.1057, 'Recall@10': 0.29653}
{'P@1': 0.2037, 'P@10': 0.0642}


## Pure vector search

In [40]:
from redisvl.query import VectorQuery

def vector_query(query: str, num_results: int) -> VectorQuery:
    vector = emb_model.embed(query, as_buffer=True)
    
    return VectorQuery(
        vector=vector,
        vector_field_name="vector",
        num_results=num_results,
        return_fields=["_id", "text", "title", "vector_distance"],
    )

In [41]:
query = vector_query("Something about finance", 10)

res = index.query(query)

res

[{'id': 'fiqa:01JPQE87695T8HG2BKA1G83TN6',
  'vector_distance': '0.353164196014',
  '_id': '459638',
  'text': ' "&gt;""In finance"" What kind of amorphous bullshit is that? There are literally hundreds of different things that can varyingly be termed ""in finance"".    Finally someone says it! Throwing arpimd vague words like finance for describing jobs is a pretty sure-fire indicator that you probably shouldn\'t be entering the industry in the first place."',
  'title': ''},
 {'id': 'fiqa:01JPQE8A7V6Q86NXFGPZHPMEEY',
  'vector_distance': '0.390319168568',
  '_id': '479752',
  'text': ' "This is the best tl;dr I could make, [original](https://jacobinmag.com/2017/10/finance-capital-shareholders-profit-market) reduced by 81%. (I\'m a bot) ***** &gt; The term financialization is used all over the place, but it&amp;#039;s usually defined in a pretty circular way: financialization means &amp;quot;More finance,&amp;quot; more things controlled by finance.  &gt; The role of finance in enforc

### Run queries

In [270]:
%%capture
redis_res_vector = {}

def make_score_dict_vec(res):
    # Convert Redis cosine distance (0 to 2) into a similarity score (0 to 1)
    return {rec["_id"]: (2 - float(rec["vector_distance"])) / 2 for rec in res}

for key in queries:
    text_query = queries[key]
    vec_query = vector_query(text_query, 10)
    try:
        res = index.query(vec_query)
        score_dict = make_score_dict_vec(res)
    except Exception as e:
        print(f"failed for {key}, {text_query}")
        score_dict = {}
    redis_res_vector[key] = score_dict

### Results for pure vector search

In [272]:
vec_ndcg, _map, vec_recall, vec_precision = EvaluateRetrieval.evaluate(
    qrels, redis_res_vector, [1, 10]
)

print(vec_ndcg)
print(vec_recall)
print(vec_precision)

2025-03-24 13:45:09 - For evaluation, we ignore identical query and document ids (default), please explicitly set ``ignore_identical_ids=False`` to ignore this.
2025-03-24 13:45:09 - 

2025-03-24 13:45:09 - NDCG@1: 0.3272
2025-03-24 13:45:09 - NDCG@10: 0.3447
2025-03-24 13:45:09 - 

2025-03-24 13:45:09 - MAP@1: 0.1579
2025-03-24 13:45:09 - MAP@10: 0.2715
2025-03-24 13:45:09 - 

2025-03-24 13:45:09 - Recall@1: 0.1579
2025-03-24 13:45:09 - Recall@10: 0.4113
2025-03-24 13:45:09 - 

2025-03-24 13:45:09 - P@1: 0.3272
2025-03-24 13:45:09 - P@10: 0.0998
{'NDCG@1': 0.32716, 'NDCG@10': 0.34469}
{'Recall@1': 0.15786, 'Recall@10': 0.41133}
{'P@1': 0.32716, 'P@10': 0.09985}


## Weighted combination of BM25 and vector search

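Now we test a linear combination of BM25 and vector scores. Each candidate's hybrid score is `(1 - alpha) * bm25_score + alpha * cosine_similarity`, where `cosine_similarity = (2 - vector_distance) / 2`, so `alpha` controls how much weight the vector side gets relative to BM25.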
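The formula the aggregation applies server-side can be sketched client-side for a single document. This is a hypothetical helper for illustration only. Note that BM25 scores are unbounded while the cosine similarity lies between 0 and 1, so `alpha` does not correspond to a fixed share of the final score unless the BM25 scores are normalized first; neither the query nor this sketch normalizes them:

```python
def hybrid_score(bm25_score: float, vector_distance: float, alpha: float) -> float:
    """Hypothetical helper mirroring the APPLY steps in linear_combo."""
    # Redis cosine distance is in [0, 2]; map it to a similarity in [0, 1]
    cosine_similarity = (2 - vector_distance) / 2
    # alpha weights the vector side, (1 - alpha) weights the raw BM25 score
    return (1 - alpha) * bm25_score + alpha * cosine_similarity


# Identical vector, no BM25 match: only the vector side contributes
print(hybrid_score(bm25_score=0.0, vector_distance=0.0, alpha=0.8))
# A strong BM25 match can outweigh a weaker vector match even with alpha = 0.8
print(hybrid_score(bm25_score=2.5, vector_distance=0.4, alpha=0.8))
```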

In [None]:
def vector_query_filter(user_query: str, num_results: int, filters = None) -> VectorQuery:
    """Generate a Redis vector query given user query string."""
    vector = emb_model.embed(user_query, as_buffer=True, dtype="float32")
    query = VectorQuery(
        vector=vector,
        vector_field_name="vector",
        num_results=num_results,
        return_fields=["_id", "title", "text"]
    )
    if filters:
        query.set_filter(filters)
    
    return query

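def linear_combo(user_query: str, alpha: float, num_results: int = 3) -> tuple[AggregateRequest, dict[str, Any]]:
    """Build a hybrid aggregation request and its query params: (1 - alpha) * BM25 + alpha * cosine similarity."""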
    # Add the optional flag, "~", so that this doesn't also act as a strict text filter
    text = f"(~{Text('text') % tokenize_and_escape_query(user_query)})"

    # Build vector query
    query = vector_query_filter(user_query, num_results=num_results, filters=text)
    
    # Build aggregation request
    req = (
        AggregateRequest(query.query_string())
            .scorer("BM25STD")
            .add_scores()
            .apply(cosine_similarity="(2 - @vector_distance)/2", bm25_score="@__score")
            .apply(hybrid_score=f"{1-alpha}*@bm25_score + {alpha}*@cosine_similarity")
            .sort_by(Desc("@hybrid_score"), max=num_results)
            .load("_id", "title", "text", "cosine_similarity", "bm25_score", "hybrid_score")
            .dialect(2)
    )

    query_params = {'vector': query._vector}

    return req, query_params

### Run queries

In [273]:
%%capture
from redisvl.redis.utils import make_dict, convert_bytes

redis_res_lin_combo = {}

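def agg_scores_dict(res):
    # Return an empty dict (not None) when there are no rows, so every query maps to a dict of scores
    if not res:
        return {}
    results = [make_dict(row) for row in convert_bytes(res.rows)]
    return {rec["_id"]: float(rec["hybrid_score"]) for rec in results}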

for key in queries:
    text_query = queries[key]
    alpha = 0.8 # weight for cosine similarity vs bm25
    agg_req, query_params = linear_combo(text_query, alpha, 10)
    try:
        res = index.aggregate(agg_req, query_params=query_params)
        score_dict = agg_scores_dict(res)
    except Exception as e:
        print(f"failed for {key}, {text_query}")
        score_dict = {}
    redis_res_lin_combo[key] = score_dict

### Results for linear combination of vector and bm25 alpha=0.7

In [264]:
lin_combo_ndcg, _map, lin_combo_recall, lin_combo_precision = EvaluateRetrieval.evaluate(
    qrels, redis_res_lin_combo, [1, 10]
)

print(lin_combo_ndcg)
print(lin_combo_recall)
print(lin_combo_precision)

2025-03-24 13:25:21 - For evaluation, we ignore identical query and document ids (default), please explicitly set ``ignore_identical_ids=False`` to ignore this.
2025-03-24 13:25:21 - 

2025-03-24 13:25:21 - NDCG@1: 0.2978
2025-03-24 13:25:21 - NDCG@10: 0.3297
2025-03-24 13:25:21 - 

2025-03-24 13:25:21 - MAP@1: 0.1498
2025-03-24 13:25:21 - MAP@10: 0.2512
2025-03-24 13:25:21 - 

2025-03-24 13:25:21 - Recall@1: 0.1498
2025-03-24 13:25:21 - Recall@10: 0.4149
2025-03-24 13:25:21 - 

2025-03-24 13:25:21 - P@1: 0.2978
2025-03-24 13:25:21 - P@10: 0.1003
{'NDCG@1': 0.29784, 'NDCG@10': 0.32969}
{'Recall@1': 0.14983, 'Recall@10': 0.4149}
{'P@1': 0.29784, 'P@10': 0.10031}


## Client-side fusion with RRF

Instead of relying on raw document scores like cosine similarity and BM25/TF-IDF, we can fetch items from each search and use only their rank. These ranks can be combined into a new ranking score known as [Reciprocal Rank Fusion (RRF)](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf). Because RRF looks only at each item's position in each list, it can merge ranked lists of different lengths whose scores are on different scales (for example, unbounded BM25 scores and cosine similarities between 0 and 1) without normalizing them first.
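Written out, the weighted variant implemented by `fuse_rankings_rrf` scores each item $d$ as follows, where $r_i(d)$ is the 1-based rank of $d$ in list $i$, $w_i$ is that list's weight (1 by default), $k$ is a smoothing constant (60 by default), and lists that do not contain $d$ contribute nothing:

$$
\mathrm{RRF}(d) = \sum_{i=1}^{n} \frac{w_i}{k + r_i(d)}
$$

For example, with $k = 60$ and unit weights, an item ranked 1st in the vector results and 2nd in the BM25 results scores $\frac{1}{61} + \frac{1}{62} \approx 0.0325$, while an item ranked 1st in only one list scores $\frac{1}{61} \approx 0.0164$. Appearing in both lists therefore matters more than topping just one of them.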

In [250]:
def fuse_rankings_rrf(*ranked_lists, weights=None, k=60):
    """
    Perform Weighted Reciprocal Rank Fusion on N number of ordered lists.
    """
    item_scores = {}
    
    if weights is None:
        weights = [1.0] * len(ranked_lists)
    else:
        assert len(weights) == len(ranked_lists), "Number of weights must match number of ranked lists"
        assert all(0 <= w <= 1 for w in weights), "Weights must be between 0 and 1"
    
    for ranked_list, weight in zip(ranked_lists, weights):
        for rank, item in enumerate(ranked_list, start=1):
            if item not in item_scores:
                item_scores[item] = 0
            item_scores[item] += weight * (1 / (rank + k))
    
    # Sort items by their weighted RRF scores in descending order
    return sorted(item_scores.items(), key=lambda x: x[1], reverse=True)

In [251]:
def weighted_rrf(
    user_query: str,
    alpha: float = 0.5,
    num_results: int = 4,
    k: int = 60,
) -> list[tuple[str, float]]:
    """Run vector and BM25 queries separately, then fuse them client-side with weighted RRF (alpha weights the vector results, 1 - alpha the BM25 results)."""
    # Create the vector query
    vector_query = vector_query_filter(user_query, num_results=10)

    # Create the full-text bm25 query
    full_text_query = bm25_query("text", user_query, num_results=10)

    # Run queries individually
    vector_query_results = index.query(vector_query)
    full_text_query_results = index.query(full_text_query)

    # Extract _id from results
    vector_ids = [res["_id"] for res in vector_query_results]
    full_text_ids = [res["_id"] for res in full_text_query_results]

    # Perform weighted RRF
    return fuse_rankings_rrf(vector_ids, full_text_ids, weights=[alpha, 1-alpha], k=k)[:num_results]

In [268]:
%%capture
redis_res_w_rrf = {}

def make_score_dict_w_rrf(res):
    return {rec[0]: rec[1] for rec in res}

for key in queries:
    text_query = queries[key]
    try:
        w_rrf = weighted_rrf(text_query, num_results=10, k=20)
        scores_dict = make_score_dict_w_rrf(w_rrf)
    except Exception:
        print(f"failed for {key}, {text_query}")
        scores_dict = {}

    redis_res_w_rrf[key] = scores_dict

### Results for client-side fusion with RRF

In [253]:
w_rrf_ndcg, _map, w_rrf_recall, w_rrf_precision = EvaluateRetrieval.evaluate(
    qrels, redis_res_w_rrf, [1, 10]
)

print(w_rrf_ndcg)
print(w_rrf_recall)
print(w_rrf_precision)

2025-03-24 13:17:02 - For evaluation, we ignore identical query and document ids (default), please explicitly set ``ignore_identical_ids=False`` to ignore this.
2025-03-24 13:17:02 - 

2025-03-24 13:17:02 - NDCG@1: 0.3287
2025-03-24 13:17:02 - NDCG@10: 0.3491
2025-03-24 13:17:02 - 

2025-03-24 13:17:02 - MAP@1: 0.1614
2025-03-24 13:17:02 - MAP@10: 0.2691
2025-03-24 13:17:02 - 

2025-03-24 13:17:02 - Recall@1: 0.1614
2025-03-24 13:17:02 - Recall@10: 0.4279
2025-03-24 13:17:02 - 

2025-03-24 13:17:02 - P@1: 0.3287
2025-03-24 13:17:02 - P@10: 0.1000
{'NDCG@1': 0.3287, 'NDCG@10': 0.34911}
{'Recall@1': 0.16139, 'Recall@10': 0.42786}
{'P@1': 0.3287, 'P@10': 0.1}


## Client-side reranking

An alternative to RRF is to pass the combined vector and BM25 candidates to an external reranker, which orders the final results. RedisVL has built-in integrations with a few popular reranking models; here we use a Hugging Face cross-encoder.
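The pattern is: pool the candidates from both retrievers, deduplicate them, score every (query, document) pair with the reranker, and keep the top results. A minimal sketch of that flow, using a hypothetical word-overlap `toy_score` as a stand-in for the cross-encoder (it is not a real relevance model) and deduplicating by document ID rather than by the formatted candidate string used in `rerank`:

```python
from typing import Callable


def pool_and_rerank(
    query: str,
    *candidate_lists: list[dict],
    score_fn: Callable[[str, str], float],
    limit: int = 10,
) -> list[tuple[str, float]]:
    """Merge candidate lists, dedupe by _id, score each doc against the query, and sort."""
    pooled: dict[str, dict] = {}
    for candidates in candidate_lists:
        for doc in candidates:
            # Keep the first copy of each document ID
            pooled.setdefault(doc["_id"], doc)
    scored = [(doc_id, score_fn(query, doc["text"])) for doc_id, doc in pooled.items()]
    # Highest reranker score first
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


def toy_score(query: str, text: str) -> float:
    # Hypothetical stand-in for a cross-encoder: fraction of query words found in the text
    q_words = set(query.lower().split())
    return len(q_words & set(text.lower().split())) / len(q_words)


vector_hits = [{"_id": "1", "text": "sell on ask price"}, {"_id": "2", "text": "dividend yield"}]
bm25_hits = [{"_id": "3", "text": "sell on bid or ask"}, {"_id": "1", "text": "sell on ask price"}]
print(pool_and_rerank("sell on bid", vector_hits, bm25_hits, score_fn=toy_score, limit=2))
```

Unlike RRF, the final order here ignores the original ranks entirely; it depends only on the reranker's judgment of each pooled candidate.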

In [None]:
from redisvl.utils.rerank import HFCrossEncoderReranker

# Load the MS MARCO MiniLM cross-encoder model from Hugging Face
reranker = HFCrossEncoderReranker("cross-encoder/ms-marco-MiniLM-L-6-v2")


def rerank(
    user_query: str,
    num_results: int = 10,
) -> list[tuple[str, float]]:
    """Rerank the candidates based on the user query with an external model/module."""
    # Create the vector query
    vector_query = vector_query_filter(user_query, num_results=num_results)

    # Create the full-text query
    full_text_query = bm25_query("text", user_query, num_results=num_results)

    # Run queries individually
    vector_query_results = index.query(vector_query)
    full_text_query_results = index.query(full_text_query)

    # Assemble list of potential candidates with their IDs
    candidate_map = {}
    for res in vector_query_results + full_text_query_results:
        candidate = f"Id: {res['_id']}. Text: {res['text']}"
        if candidate not in candidate_map:
            candidate_map[candidate] = res

    # Rerank candidates
    reranked, scores = reranker.rank(
        query=user_query,
        docs=list(candidate_map.keys()),
        limit=num_results,
        return_score=True
    )

    # Map each reranked candidate string back to its document ID and pair it with its score
    return [
        (candidate_map[rr['content']]["_id"], score)
        for rr, score in zip(reranked, scores)
    ]


In [275]:
%%capture
redis_res_rerank = {}

def make_score_dict_rerank(res):
    return {rec[0]: rec[1] for rec in res}

for key in queries:
    text_query = queries[key]
    try:
        rerank_res = rerank(text_query, num_results=10)
        scores_dict = make_score_dict_rerank(rerank_res)
    except Exception:
        print(f"failed for {key}, {text_query}")
        scores_dict = {}

    redis_res_rerank[key] = scores_dict

In [276]:
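# Note: the other evaluations in this notebook use BEIR's default (ignore_identical_ids=True);
# use the same setting everywhere for a strictly like-for-like comparison
rerank_ndcg, _map, rerank_recall, rerank_precision = EvaluateRetrieval.evaluate(
    qrels, redis_res_rerank, [1, 10], ignore_identical_ids=False
)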

print(rerank_ndcg)
print(rerank_recall)
print(rerank_precision)

2025-03-24 13:47:44 - 

2025-03-24 13:47:44 - NDCG@1: 0.2747
2025-03-24 13:47:44 - NDCG@10: 0.3281
2025-03-24 13:47:44 - 

2025-03-24 13:47:44 - MAP@1: 0.1349
2025-03-24 13:47:44 - MAP@10: 0.2443
2025-03-24 13:47:44 - 

2025-03-24 13:47:44 - Recall@1: 0.1349
2025-03-24 13:47:44 - Recall@10: 0.4397
2025-03-24 13:47:44 - 

2025-03-24 13:47:44 - P@1: 0.2747
2025-03-24 13:47:44 - P@10: 0.1031
{'NDCG@1': 0.27469, 'NDCG@10': 0.32813}
{'Recall@1': 0.13487, 'Recall@10': 0.43971}
{'P@1': 0.27469, 'P@10': 0.10309}


In [277]:
import pandas as pd

res_data = {
    "Model": ["BM25S", "es_bm25", "redis_BM25", "redis_vector", "lin_combo", "weighted_rrf", "rerank"],
    "NDCG@1": [bm25s_ndcg['NDCG@1'], es_ndcg['NDCG@1'], redis_bm25_ndcg['NDCG@1'], vec_ndcg['NDCG@1'], lin_combo_ndcg['NDCG@1'], w_rrf_ndcg['NDCG@1'], rerank_ndcg['NDCG@1']],
    "NDCG@10": [bm25s_ndcg['NDCG@10'], es_ndcg['NDCG@10'], redis_bm25_ndcg['NDCG@10'], vec_ndcg['NDCG@10'], lin_combo_ndcg['NDCG@10'], w_rrf_ndcg['NDCG@10'], rerank_ndcg['NDCG@10']],
    "Recall@1": [bm25s_recall['Recall@1'], es_recall['Recall@1'], redis_bm25_recall['Recall@1'], vec_recall['Recall@1'], lin_combo_recall['Recall@1'], w_rrf_recall['Recall@1'], rerank_recall['Recall@1']],
    "Recall@10": [bm25s_recall['Recall@10'], es_recall['Recall@10'], redis_bm25_recall['Recall@10'], vec_recall['Recall@10'], lin_combo_recall['Recall@10'], w_rrf_recall['Recall@10'], rerank_recall['Recall@10']],
    "Precision@1": [bm25s_precision['P@1'], es_precision['P@1'], redis_bm25_precision['P@1'], vec_precision['P@1'], lin_combo_precision['P@1'], w_rrf_precision['P@1'], rerank_precision['P@1']],
    "Precision@10": [bm25s_precision['P@10'], es_precision['P@10'], redis_bm25_precision['P@10'], vec_precision['P@10'], lin_combo_precision['P@10'], w_rrf_precision['P@10'], rerank_precision['P@10']]
}

df = pd.DataFrame(res_data)

In [278]:
df[["Model", "Recall@10", "NDCG@10", "Precision@1"]].sort_values(by=["Recall@10"], ascending=False).reset_index(drop=True)

| | Model | Recall@10 | NDCG@10 | Precision@1 |
|---|---|---|---|---|
| 0 | rerank | 0.43971 | 0.32813 | 0.27469 |
| 1 | weighted_rrf | 0.42786 | 0.34911 | 0.3287 |
| 2 | lin_combo | 0.4149 | 0.32969 | 0.29784 |
| 3 | redis_vector | 0.41133 | 0.34469 | 0.32716 |
| 4 | es_bm25 | 0.3244 | 0.25364 | 0.23148 |
| 5 | BM25S | 0.30433 | 0.23836 | 0.21759 |
| 6 | redis_BM25 | 0.29653 | 0.23083 | 0.2037 |


In [279]:
df.to_csv("fiqa_results.csv", index=False)