# **BEIR: A Heterogenous benchmark for Zero-shot Evaluation of Information Retrieval models**

This notebook contains simple, easy-to-follow examples for evaluating retrieval models on our new benchmark.

## Introduction
The BEIR benchmark covers 9 diverse retrieval tasks across 17 diverse datasets. We evaluate 9 state-of-the-art retriever models, all in a zero-shot evaluation setup. In this Colab notebook, we first show how to download and load the 14 open-sourced datasets with just three lines of code. Afterward, we load some state-of-the-art dense retrievers (bi-encoders) such as SBERT, ANCE, and DPR models, use them for retrieval, and evaluate them in a zero-shot setup.

Don't hesitate to send us an e-mail or report an issue if something is broken (and it shouldn't be) or if you have further questions.

Developed by Nandan Thakur, Researcher @ UKP Lab, TU Darmstadt

(https://nthakur.xyz) (nandant@gmail.com)

In [1]:
!nvidia-smi

Fri Dec  6 14:59:31 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.94                 Driver Version: 560.94         CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA GeForce RTX 3070      WDDM  |   00000000:07:00.0  On |                  N/A |
| 30%   39C    P8             21W /  220W |    1695MiB /   8192MiB |     11%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

In [2]:
from beir import util, LoggingHandler

import logging
import pathlib, os

#### Just some code to print debug information to stdout
logging.basicConfig(format='%(asctime)s - %(message)s',
                    datefmt='%Y-%m-%d %H:%M:%S',
                    level=logging.INFO,
                    handlers=[LoggingHandler()])
#### /print debug information to stdout



# **BEIR Datasets**

BEIR contains 17 diverse datasets overall. You can view all the datasets (14 downloadable) with the link below:

[``https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/``](https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/)

Please refer to the GitHub page to evaluate on the other datasets (3 of them).


We include the following datasets in BEIR:

| Dataset   | Website| BEIR-Name | Domain     | Relevancy| Queries  | Documents | Avg. Docs/Q | Download |
| -------- | -----| ---------| ----------- | ---------| ---------| --------- | ------| ------------|
| MSMARCO    | [``Homepage``](https://microsoft.github.io/msmarco/)| ``msmarco`` | Misc.       |  Binary  |  6,980   |  8.84M     |    1.1 | Yes |  
| TREC-COVID |  [``Homepage``](https://ir.nist.gov/covidSubmit/index.html)| ``trec-covid``| Bio-Medical |  3-level|50|  171K| 493.5 | Yes |
| NFCorpus   | [``Homepage``](https://www.cl.uni-heidelberg.de/statnlpgroup/nfcorpus/) | ``nfcorpus``  | Bio-Medical |  3-level |  323     |  3.6K     |  38.2 | Yes |
| BioASQ     | [``Homepage``](http://bioasq.org) | ``bioasq``| Bio-Medical |  Binary  |   500    |  14.91M    |  8.05 | No |
| NQ         | [``Homepage``](https://ai.google.com/research/NaturalQuestions) | ``nq``| Wikipedia   |  Binary  |  3,452   |  2.68M  |  1.2 | Yes |
| HotpotQA   | [``Homepage``](https://hotpotqa.github.io) | ``hotpotqa``| Wikipedia   |  Binary  |  7,405   |  5.23M  |  2.0 | Yes |
| FiQA-2018  | [``Homepage``](https://sites.google.com/view/fiqa/) | ``fiqa``    | Finance     |  Binary  |  648     |  57K    |  2.6 | Yes |
| Signal-1M (RT) | [``Homepage``](https://research.signal-ai.com/datasets/signal1m-tweetir.html)| ``signal1m`` | Twitter     |  3-level  |   97   |  2.86M  |  19.6 | No |
| TREC-NEWS  | [``Homepage``](https://trec.nist.gov/data/news2019.html) | ``trec-news``    | News     |  5-level  |   57    |  595K    |  19.6 | No |
| ArguAna    | [``Homepage``](http://argumentation.bplaced.net/arguana/data) | ``arguana`` | Misc.       |  Binary  |  1,406     |  8.67K    |  1.0 | Yes |
| Touche-2020| [``Homepage``](https://webis.de/events/touche-20/shared-task-1.html) | ``webis-touche2020``| Misc.       |  6-level  |  49     |  382K    |  49.2 |  Yes |
| CQADupstack| [``Homepage``](http://nlp.cis.unimelb.edu.au/resources/cqadupstack/) | ``cqadupstack``| StackEx.      |  Binary  |  13,145 |  457K  |  1.4 |  Yes |
| Quora| [``Homepage``](https://www.quora.com/q/quoradata/First-Quora-Dataset-Release-Question-Pairs) | ``quora``| Quora  | Binary  |  10,000     |  523K    |  1.6 |  Yes |
| DBPedia | [``Homepage``](https://github.com/iai-group/DBpedia-Entity/) | ``dbpedia-entity``| Wikipedia |  3-level  |  400    |  4.63M    |  38.2 |  Yes |
| SCIDOCS| [``Homepage``](https://allenai.org/data/scidocs) | ``scidocs``| Scientific |  Binary  |  1,000     |  25K    |  4.9 |  Yes |
| FEVER| [``Homepage``](http://fever.ai) | ``fever``| Wikipedia     |  Binary  |  6,666     |  5.42M    |  1.2|  Yes |
| Climate-FEVER| [``Homepage``](http://climatefever.ai) | ``climate-fever``| Wikipedia |  Binary  |  1,535     |  5.42M |  3.0 |  Yes |
| SciFact| [``Homepage``](https://github.com/allenai/scifact) | ``scifact``| Scientific |  Binary  |  300     |  5K    |  1.1 |  Yes |


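For simplicity, our example uses one of the smallest datasets, ``SciFact``. Note that the download cell below currently sets ``dataset = "msmarco"``; change it to ``"scifact"`` to follow the SciFact example.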

You can evaluate any dataset you wish by looking at the table above.

In [4]:
import pathlib, os
from beir import util

dataset = "msmarco"
url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/{}.zip".format(dataset)
out_dir = os.path.join(os.getcwd(), "datasets")
data_path = util.download_and_unzip(url, out_dir)
print("Dataset downloaded here: {}".format(data_path))

2024-12-06 14:54:34 - Downloading msmarco.zip ...


d:\Repos\search-ranking-eval\eval\datasets\msmarco.zip: 100%|██████████| 1.01G/1.01G [03:18<00:00, 5.45MiB/s]


2024-12-06 14:57:53 - Unzipping msmarco.zip ...
Dataset downloaded here: d:\Repos\search-ranking-eval\eval\datasets\msmarco
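
To switch datasets safely, a small helper can build the download URL from a BEIR-Name in the table above. This is a minimal sketch: `beir_dataset_url` and `DOWNLOADABLE` are hypothetical names introduced here, and the set simply lists the names marked "Yes" in the Download column.

```python
# URL pattern taken from the download cell above.
BEIR_BASE_URL = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/{}.zip"

# BEIR-Names marked "Yes" in the Download column of the table above.
DOWNLOADABLE = {
    "msmarco", "trec-covid", "nfcorpus", "nq", "hotpotqa", "fiqa",
    "arguana", "webis-touche2020", "cqadupstack", "quora",
    "dbpedia-entity", "scidocs", "fever", "climate-fever", "scifact",
}


def beir_dataset_url(name):
    """Return the zip URL for a downloadable dataset, or raise ValueError."""
    if name not in DOWNLOADABLE:
        raise ValueError("Dataset not listed as downloadable: {}".format(name))
    return BEIR_BASE_URL.format(name)


print(beir_dataset_url("scifact"))
```

The returned URL can be passed to `util.download_and_unzip(url, out_dir)` exactly as in the cell above.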


# **Folder Structure of any BEIR dataset**

* scifact/
    * corpus.jsonl
    * queries.jsonl
    * qrels/
        * train.tsv
        * dev.tsv
        * test.tsv

In [4]:
!ls datasets/msmarco/

corpus.jsonl
qrels
queries.jsonl
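
To make the folder layout concrete, here is a rough sketch of reading the three files by hand. It assumes the usual BEIR conventions (`_id`, `title`, `text` fields in the JSONL files and a tab-separated qrels file with a header row); `load_beir_folder` is a hypothetical helper for illustration, not the library's loader.

```python
import csv
import json
import os
import tempfile


def load_beir_folder(data_path, split="test"):
    """Read corpus.jsonl, queries.jsonl and qrels/<split>.tsv into plain dicts."""
    corpus, queries, qrels = {}, {}, {}
    with open(os.path.join(data_path, "corpus.jsonl"), encoding="utf-8") as f:
        for line in f:
            doc = json.loads(line)
            corpus[doc["_id"]] = {"title": doc.get("title", ""), "text": doc.get("text", "")}
    with open(os.path.join(data_path, "queries.jsonl"), encoding="utf-8") as f:
        for line in f:
            query = json.loads(line)
            queries[query["_id"]] = query["text"]
    qrels_file = os.path.join(data_path, "qrels", split + ".tsv")
    with open(qrels_file, encoding="utf-8", newline="") as f:
        reader = csv.reader(f, delimiter="\t")
        next(reader)  # skip the header row (assumed: query-id, corpus-id, score)
        for row in reader:
            if len(row) != 3:
                continue
            query_id, doc_id, score = row
            qrels.setdefault(query_id, {})[doc_id] = int(score)
    # Keep only the queries that have judgments in this split.
    queries = {qid: text for qid, text in queries.items() if qid in qrels}
    return corpus, queries, qrels


# Build a tiny throwaway dataset so the sketch runs without any download.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "qrels"))
    with open(os.path.join(tmp, "corpus.jsonl"), "w", encoding="utf-8") as f:
        f.write(json.dumps({"_id": "d1", "title": "T1", "text": "first doc"}) + "\n")
        f.write(json.dumps({"_id": "d2", "title": "T2", "text": "second doc"}) + "\n")
    with open(os.path.join(tmp, "queries.jsonl"), "w", encoding="utf-8") as f:
        f.write(json.dumps({"_id": "q1", "text": "first?"}) + "\n")
        f.write(json.dumps({"_id": "q2", "text": "unjudged?"}) + "\n")
    with open(os.path.join(tmp, "qrels", "test.tsv"), "w", encoding="utf-8") as f:
        f.write("query-id\tcorpus-id\tscore\n")
        f.write("q1\td1\t1\n")
    toy_corpus, toy_queries, toy_qrels = load_beir_folder(tmp)

print(toy_queries, toy_qrels)
```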


# **Data Loading**

In [5]:
from beir.datasets.data_loader import GenericDataLoader

data_path = "datasets/msmarco"
corpus, queries, qrels = GenericDataLoader(data_path).load(split="test") # or split = "train" or "dev"

2024-12-06 14:59:50 - Loading Corpus...


100%|██████████| 8841823/8841823 [00:47<00:00, 188000.05it/s]


2024-12-06 15:00:54 - Loaded 8841823 TEST Documents.
2024-12-06 15:00:57 - Doc Example: {'text': 'The presence of communication amid scientific minds was equally important to the success of the Manhattan Project as scientific intellect was. The only cloud hanging over the impressive achievement of the atomic researchers and engineers is what their success truly meant; hundreds of thousands of innocent lives obliterated.', 'title': ''}
2024-12-06 15:00:57 - Loading Queries...
2024-12-06 15:00:58 - Loaded 43 TEST Queries.
2024-12-06 15:00:58 - Query Example: anthropological definition of environment
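
The three loaded objects are plain dictionaries. The toy values below illustrate their shapes and show one way to count relevant documents per query. Treating every score above zero as relevant is an assumption made for this sketch, and `avg_relevant_docs_per_query` is a hypothetical helper; the table's "Avg. Docs/Q" column may be computed differently.

```python
# Toy stand-ins with the same shapes as the loaded objects.
example_corpus = {
    "d1": {"title": "", "text": "Text of the first document."},
    "d2": {"title": "", "text": "Text of the second document."},
    "d3": {"title": "", "text": "Text of the third document."},
}
example_queries = {"q1": "first query", "q2": "second query"}
example_qrels = {
    "q1": {"d1": 1, "d2": 1},  # query id -> {document id -> relevance score}
    "q2": {"d3": 2, "d1": 0},
}


def avg_relevant_docs_per_query(qrels):
    """Average number of documents with a score above zero per judged query."""
    counts = [sum(1 for score in docs.values() if score > 0) for docs in qrels.values()]
    return sum(counts) / len(counts)


print(avg_relevant_docs_per_query(example_qrels))
```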


# **Dense Retrieval using Exact Search**

## **Sentence-BERT**
We use the [``msmarco-distilbert-base-v3``](https://www.sbert.net/docs/pretrained-models/msmarco-v3.html) SBERT model in this example.

In [6]:
from beir.retrieval.evaluation import EvaluateRetrieval
from beir.retrieval import models
from beir.retrieval.search.dense import DenseRetrievalExactSearch as DRES

#### Dense Retrieval using SBERT (Sentence-BERT) ####
#### Provide any pretrained sentence-transformers model
#### The model was fine-tuned using cosine-similarity.
#### Complete list - https://www.sbert.net/docs/pretrained_models.html

model = DRES(models.SentenceBERT("msmarco-distilbert-base-v3"), batch_size=128)
retriever = EvaluateRetrieval(model, score_function="cos_sim")

#### Retrieve dense results (format of results is identical to qrels)
results = retriever.retrieve(corpus, queries)

2024-12-06 15:01:53 - PyTorch version 2.5.1 available.
2024-12-06 15:01:54 - Loading faiss with AVX2 support.
2024-12-06 15:01:54 - Successfully loaded faiss with AVX2 support.
2024-12-06 15:01:54 - Use pytorch device_name: cpu
2024-12-06 15:01:54 - Load pretrained SentenceTransformer: msmarco-distilbert-base-v3
2024-12-06 15:01:59 - Encoding Queries...


Batches: 100%|██████████| 1/1 [00:00<00:00,  4.79it/s]


2024-12-06 15:01:59 - Sorting Corpus by document length (Longest first)...
2024-12-06 15:02:42 - Scoring Function: Cosine Similarity (cos_sim)
2024-12-06 15:02:42 - Encoding Batch 1/177...


Batches: 100%|██████████| 391/391 [48:35<00:00,  7.46s/it] 


2024-12-06 15:51:22 - Encoding Batch 2/177...


Batches:  23%|██▎       | 89/391 [09:43<32:58,  6.55s/it]


KeyboardInterrupt: 
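
The log above shows the corpus being processed in batches. The idea behind exact (brute-force) search over chunks can be sketched in pure Python: score every document in a chunk against each query and keep a running top-k. The toy vectors, chunk size, and the `exact_search` helper are assumptions for illustration; this is not the library's implementation.

```python
import heapq
import math


def cos_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def exact_search(query_vecs, doc_vecs, top_k=2, chunk_size=2):
    """Score every document against every query, one corpus chunk at a time."""
    doc_ids = list(doc_vecs)
    best = {qid: [] for qid in query_vecs}
    for start in range(0, len(doc_ids), chunk_size):
        chunk = doc_ids[start:start + chunk_size]
        for qid, q_vec in query_vecs.items():
            scored = [(cos_sim(q_vec, doc_vecs[doc_id]), doc_id) for doc_id in chunk]
            # Merge this chunk into the running top-k for the query.
            best[qid] = heapq.nlargest(top_k, best[qid] + scored)
    # Same shape as qrels: {query id: {document id: score}}.
    return {qid: {doc_id: score for score, doc_id in hits} for qid, hits in best.items()}


toy_query_vecs = {"q1": [1.0, 0.0]}
toy_doc_vecs = {"d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [1.0, 1.0]}
toy_results = exact_search(toy_query_vecs, toy_doc_vecs)
print(toy_results)
```

Because every query is compared with every document, the cost grows linearly with corpus size, which is consistent with the long encoding times logged for the 8.8M-document corpus on CPU.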

In [None]:
#### Evaluate your retrieval using NDCG@k, MAP@K ...

logging.info("Retriever evaluation for k in: {}".format(retriever.k_values))
ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)

2021-04-20 15:46:44 - Retriever evaluation for k in: [1, 3, 5, 10, 100, 1000]
2021-04-20 15:46:44 - 

2021-04-20 15:46:44 - NDCG@1: 0.4233
2021-04-20 15:46:44 - NDCG@3: 0.4842
2021-04-20 15:46:44 - NDCG@5: 0.5104
2021-04-20 15:46:44 - NDCG@10: 0.5379
2021-04-20 15:46:44 - NDCG@100: 0.5759
2021-04-20 15:46:44 - NDCG@1000: 0.5913
2021-04-20 15:46:44 - 

2021-04-20 15:46:44 - MAP@1: 0.3994
2021-04-20 15:46:44 - MAP@3: 0.4593
2021-04-20 15:46:44 - MAP@5: 0.4768
2021-04-20 15:46:44 - MAP@10: 0.4889
2021-04-20 15:46:44 - MAP@100: 0.4974
2021-04-20 15:46:44 - MAP@1000: 0.4980
2021-04-20 15:46:44 - 

2021-04-20 15:46:44 - Recall@1: 0.3994
2021-04-20 15:46:44 - Recall@3: 0.5256
2021-04-20 15:46:44 - Recall@5: 0.5887
2021-04-20 15:46:44 - Recall@10: 0.6723
2021-04-20 15:46:44 - Recall@100: 0.8460
2021-04-20 15:46:44 - Recall@1000: 0.9683
2021-04-20 15:46:44 - 

2021-04-20 15:46:44 - P@1: 0.4233
2021-04-20 15:46:44 - P@3: 0.1933
2021-04-20 15:46:44 - P@5: 0.1333
2021-04-20 15:46:44 - P@10: 0.0757
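
For intuition about the numbers above, the following sketch computes P@k, Recall@k, and NDCG@k for a single toy query using textbook definitions. The helper names are hypothetical, and the library's evaluation backend may differ in details such as tie-breaking.

```python
import math


def top_k_ids(scores, k):
    """Document ids of the k highest-scoring results."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]


def precision_at_k(rels, scores, k):
    hits = sum(1 for doc_id in top_k_ids(scores, k) if rels.get(doc_id, 0) > 0)
    return hits / k


def recall_at_k(rels, scores, k):
    total = sum(1 for score in rels.values() if score > 0)
    hits = sum(1 for doc_id in top_k_ids(scores, k) if rels.get(doc_id, 0) > 0)
    return hits / total if total else 0.0


def ndcg_at_k(rels, scores, k):
    # Gain is the relevance label; discount is log2(rank + 1) with ranks from 1.
    dcg = sum(rels.get(doc_id, 0) / math.log2(i + 2)
              for i, doc_id in enumerate(top_k_ids(scores, k)))
    ideal_gains = sorted(rels.values(), reverse=True)[:k]
    idcg = sum(gain / math.log2(i + 2) for i, gain in enumerate(ideal_gains))
    return dcg / idcg if idcg else 0.0


toy_rels = {"d1": 1, "d3": 1}                   # judgments for one query
toy_scores = {"d1": 0.9, "d2": 0.8, "d3": 0.7}  # retrieval scores for the same query

for k in (2, 3):
    print(k, precision_at_k(toy_rels, toy_scores, k),
          recall_at_k(toy_rels, toy_scores, k),
          round(ndcg_at_k(toy_rels, toy_scores, k), 4))
```

When a query has only one or two relevant documents, P@k necessarily shrinks as k grows, while Recall@k can only stay flat or rise.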

In [None]:
import random

#### Print top-k documents retrieved ####
top_k = 10

query_id, ranking_scores = random.choice(list(results.items()))
scores_sorted = sorted(ranking_scores.items(), key=lambda item: item[1], reverse=True)
logging.info("Query : %s\n" % queries[query_id])

for rank in range(top_k):
    doc_id = scores_sorted[rank][0]
    # Format: Rank x: ID [Title] Body
    logging.info("Rank %d: %s [%s] - %s\n" % (rank+1, doc_id, corpus[doc_id].get("title"), corpus[doc_id].get("text")))

2021-04-20 15:57:32 - Query : Rapid up-regulation and higher basal expression of interferon-induced genes increase survival of granule cell neurons that are infected by West Nile virus.

2021-04-20 15:57:32 - Rank 1: 7717468 [Loss of allergen 1 confers a hypervirulent phenotype that resembles mucoid switch variants of Cryptococcus neoformans.] - Microbial survival in a host is usually dependent on the ability of a pathogen to undergo changes that promote escape from host defense mechanisms. The human-pathogenic fungus Cryptococcus neoformans undergoes phenotypic switching in vivo that promotes persistence in tissue. By microarray and real-time PCR analyses, the allergen 1 gene (ALL1) was found to be downregulated in the hypervirulent mucoid switch variant, both during logarithmic growth and during intracellular growth in macrophages. The ALL1 gene encodes a small cytoplasmic protein that is involved in capsule formation. Growth of an all1Delta gene deletion mutant was normal. Similar t

## **ANCE**

We use the [``msmarco-roberta-base-ance-fristp``](https://www.sbert.net/docs/pretrained-models/msmarco-v3.html) ANCE model, which was fine-tuned on the MSMARCO dataset for 600K steps.

In [None]:
#### Dense Retrieval using ANCE ####
# https://www.sbert.net/docs/pretrained-models/msmarco-v3.html
# MSMARCO Dev Passage Retrieval ANCE(FirstP) 600K model from ANCE.
# The ANCE model was fine-tuned using dot-product (dot) function.

model = DRES(models.SentenceBERT("msmarco-roberta-base-ance-fristp"))
retriever = EvaluateRetrieval(model, score_function="dot")

#### Retrieve dense results (format of results is identical to qrels)
results = retriever.retrieve(corpus, queries)

2021-04-20 15:53:21 - Load pretrained SentenceTransformer: msmarco-roberta-base-ance-fristp
2021-04-20 15:53:21 - Did not find folder msmarco-roberta-base-ance-fristp
2021-04-20 15:53:21 - Search model on server: http://sbert.net/models/msmarco-roberta-base-ance-fristp.zip
2021-04-20 15:53:21 - Downloading sentence transformer model from http://sbert.net/models/msmarco-roberta-base-ance-fristp.zip and saving it at /root/.cache/torch/sentence_transformers/sbert.net_models_msmarco-roberta-base-ance-fristp




2021-04-20 15:53:44 - Load SentenceTransformer from folder: /root/.cache/torch/sentence_transformers/sbert.net_models_msmarco-roberta-base-ance-fristp
2021-04-20 15:53:47 - Use pytorch device: cuda
2021-04-20 15:53:47 - Encoding Queries...




2021-04-20 15:53:48 - Encoding Batch 1/1...






In [None]:
#### Evaluate your retrieval using NDCG@k, MAP@K ...

logging.info("Retriever evaluation for k in: {}".format(retriever.k_values))
ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)

2021-04-20 15:57:42 - Retriever evaluation for k in: [1, 3, 5, 10, 100, 1000]
2021-04-20 15:57:42 - 

2021-04-20 15:57:42 - NDCG@1: 0.4100
2021-04-20 15:57:42 - NDCG@3: 0.4682
2021-04-20 15:57:42 - NDCG@5: 0.4871
2021-04-20 15:57:42 - NDCG@10: 0.5114
2021-04-20 15:57:42 - NDCG@100: 0.5493
2021-04-20 15:57:42 - NDCG@1000: 0.5665
2021-04-20 15:57:42 - 

2021-04-20 15:57:42 - MAP@1: 0.3843
2021-04-20 15:57:42 - MAP@3: 0.4416
2021-04-20 15:57:42 - MAP@5: 0.4546
2021-04-20 15:57:42 - MAP@10: 0.4661
2021-04-20 15:57:42 - MAP@100: 0.4732
2021-04-20 15:57:42 - MAP@1000: 0.4738
2021-04-20 15:57:42 - 

2021-04-20 15:57:42 - Recall@1: 0.3843
2021-04-20 15:57:42 - Recall@3: 0.5124
2021-04-20 15:57:42 - Recall@5: 0.5613
2021-04-20 15:57:42 - Recall@10: 0.6333
2021-04-20 15:57:42 - Recall@100: 0.8187
2021-04-20 15:57:42 - Recall@1000: 0.9567
2021-04-20 15:57:42 - 

2021-04-20 15:57:42 - P@1: 0.4100
2021-04-20 15:57:42 - P@3: 0.1889
2021-04-20 15:57:42 - P@5: 0.1260
2021-04-20 15:57:42 - P@10: 0.0720
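
The two dense models above are paired with different score functions (``cos_sim`` for SBERT, ``dot`` for ANCE). The toy vectors below, which are made up for illustration, show why the choice matters: when embeddings are not length-normalized, dot product and cosine similarity can rank the same documents differently.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


toy_query = [1.0, 0.0]
toy_docs = {
    "d_short": [1.0, 0.0],  # same direction as the query, small norm
    "d_long": [3.0, 3.0],   # different direction, large norm
}

best_by_cosine = max(toy_docs, key=lambda doc_id: cosine(toy_query, toy_docs[doc_id]))
best_by_dot = max(toy_docs, key=lambda doc_id: dot(toy_query, toy_docs[doc_id]))
print(best_by_cosine, best_by_dot)
```

This is why the score function passed to `EvaluateRetrieval` should match the one the model was fine-tuned with.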

# **Lexical Retrieval using BM25 (Elasticsearch)**

## 1. Download and setup the Elasticsearch instance
Reference: https://colab.research.google.com/github/tensorflow/io/blob/master/docs/tutorials/elasticsearch.ipynb

For demo purposes, the open-source version of the elasticsearch package is used.

In [None]:
%%bash

wget -q https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-oss-7.9.2-linux-x86_64.tar.gz
wget -q https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-oss-7.9.2-linux-x86_64.tar.gz.sha512
tar -xzf elasticsearch-oss-7.9.2-linux-x86_64.tar.gz
sudo chown -R daemon:daemon elasticsearch-7.9.2/
shasum -a 512 -c elasticsearch-oss-7.9.2-linux-x86_64.tar.gz.sha512

elasticsearch-oss-7.9.2-linux-x86_64.tar.gz: OK


Run the instance as a daemon process


In [None]:
%%bash --bg

sudo -H -u daemon elasticsearch-7.9.2/bin/elasticsearch

Starting job # 0 in a separate thread.


In [None]:
import time

# Sleep for a few seconds to let the instance start.
time.sleep(20)
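
A fixed sleep can be too short on a slow machine. One alternative, sketched here under the assumption that the instance answers HTTP on `localhost:9200`, is to poll until it responds. `http_ok` and `wait_until_ready` are hypothetical helpers, and the demo uses a fake probe so that it runs without a server.

```python
import time
import urllib.error
import urllib.request


def http_ok(url, timeout=2.0):
    """Return True if the URL answers with HTTP 200, False on any connection error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(probe, max_wait=60.0, interval=1.0, sleep=time.sleep):
    """Call probe() until it returns True or max_wait seconds of sleeping have passed."""
    waited = 0.0
    while True:
        if probe():
            return True
        if waited >= max_wait:
            return False
        sleep(interval)
        waited += interval


# Demo with a fake probe that succeeds on the third attempt (no real waiting).
fake_answers = iter([False, False, True])
ready = wait_until_ready(lambda: next(fake_answers), sleep=lambda seconds: None)
print(ready)

# Against a real instance (assumed address):
# wait_until_ready(lambda: http_ok("http://localhost:9200/"))
```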

Once the instance has been started, grep for ``elasticsearch`` in the processes list to confirm the availability.

In [None]:
%%bash

ps -ef | grep elasticsearch

root         664     662  0 16:05 ?        00:00:00 sudo -H -u daemon elasticsearch-7.9.2/bin/elasticsearch
daemon       665     664 23 16:05 ?        00:00:16 /content/elasticsearch-7.9.2/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -XX:-OmitStackTraceInFastThrow -XX:+ShowCodeDetailsInExceptionMessages -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dio.netty.allocator.numDirectArenas=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Djava.locale.providers=SPI,COMPAT -Xms1g -Xmx1g -XX:+UseG1GC -XX:G1ReservePercent=25 -XX:InitiatingHeapOccupancyPercent=30 -Djava.io.tmpdir=/tmp/elasticsearch-1430391873797249845 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=data -XX:ErrorFile=logs/hs_err_pid%p.log -Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecou

In [None]:
%%bash

curl -sX GET "localhost:9200/"

{
  "name" : "c0f3c0cd00cc",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "RjQ4yenrQomacpABczxITQ",
  "version" : {
    "number" : "7.9.2",
    "build_flavor" : "oss",
    "build_type" : "tar",
    "build_hash" : "d34da0ea4a966c4e49417f2da2f244e3e97b4e6e",
    "build_date" : "2020-09-23T00:45:33.626720Z",
    "build_snapshot" : false,
    "lucene_version" : "8.6.2",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}


In [None]:
from beir.retrieval.search.lexical import BM25Search as BM25
from beir.retrieval.evaluation import EvaluateRetrieval

#### Provide parameters for elastic-search
hostname = "localhost"
index_name = "scifact"
initialize = True # True, will delete existing index with same name and reindex all documents

model = BM25(index_name=index_name, hostname=hostname, initialize=initialize)
retriever = EvaluateRetrieval(model)

#### Retrieve lexical (BM25) results (format of results is identical to qrels)
results = retriever.retrieve(corpus, queries)

2021-04-20 16:09:45 - Activating Elasticsearch....
2021-04-20 16:09:45 - Elastic Search Credentials: {'hostname': 'localhost', 'index_name': 'scifact', 'keys': {'title': 'title', 'body': 'txt'}, 'timeout': 100, 'retry_on_timeout': True, 'maxsize': 24}
2021-04-20 16:09:45 - Deleting previous Elasticsearch-Index named - scifact
2021-04-20 16:09:45 - Creating fresh Elasticsearch-Index named - scifact


  0%|          | 0/5183 [00:00<?, ?docs/s]        
que: 100%|██████████| 3/3 [00:14<00:00,  4.94s/it]
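
For intuition about what the lexical retriever computes, here is a minimal sketch of the textbook BM25 formula with whitespace tokenization. The parameters `k1=1.2` and `b=0.75` are common defaults; Elasticsearch's analyzers and exact scoring differ in detail, so the scores will not match its output. `bm25_scores` is a hypothetical helper.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with textbook BM25."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized.values()) / n_docs
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized.values() for term in set(tokens))
    scores = {}
    for doc_id, tokens in tokenized.items():
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq[term]
            if tf == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        if score > 0:
            scores[doc_id] = score  # documents with no matching term are omitted
    return scores


toy_docs = {
    "d1": "neuron survival after west nile virus infection",
    "d2": "interferon genes increase survival",
    "d3": "financial markets and interest rates",
}
toy_bm25 = bm25_scores("interferon survival", toy_docs)
print(sorted(toy_bm25, key=toy_bm25.get, reverse=True))
```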


In [None]:
#### Evaluate your retrieval using NDCG@k, MAP@K ...
ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)

2021-04-20 16:10:20 - 

2021-04-20 16:10:20 - NDCG@1: 0.5400
2021-04-20 16:10:20 - NDCG@3: 0.6104
2021-04-20 16:10:20 - NDCG@5: 0.6297
2021-04-20 16:10:20 - NDCG@10: 0.6495
2021-04-20 16:10:20 - NDCG@100: 0.6749
2021-04-20 16:10:20 - NDCG@1000: 0.6850
2021-04-20 16:10:20 - 

2021-04-20 16:10:20 - MAP@1: 0.5176
2021-04-20 16:10:20 - MAP@3: 0.5851
2021-04-20 16:10:20 - MAP@5: 0.5974
2021-04-20 16:10:20 - MAP@10: 0.6068
2021-04-20 16:10:20 - MAP@100: 0.6130
2021-04-20 16:10:20 - MAP@1000: 0.6134
2021-04-20 16:10:20 - 

2021-04-20 16:10:20 - Recall@1: 0.5176
2021-04-20 16:10:20 - Recall@3: 0.6607
2021-04-20 16:10:20 - Recall@5: 0.7071
2021-04-20 16:10:20 - Recall@10: 0.7657
2021-04-20 16:10:20 - Recall@100: 0.8742
2021-04-20 16:10:20 - Recall@1000: 0.9550
2021-04-20 16:10:20 - 

2021-04-20 16:10:20 - P@1: 0.5400
2021-04-20 16:10:20 - P@3: 0.2356
2021-04-20 16:10:20 - P@5: 0.1547
2021-04-20 16:10:20 - P@10: 0.0840
2021-04-20 16:10:20 - P@100: 0.0099
2021-04-20 16:10:20 - P@1000: 0.0011


# **Reranking BM25 using Cross-Encoder**

In this example, we rerank the top-20 documents retrieved by BM25 using the SBERT cross-encoder model [cross-encoder/ms-marco-electra-base](https://www.sbert.net/docs/pretrained-models/ce-msmarco.html).

In [None]:
from beir.reranking.models import CrossEncoder
from beir.reranking import Rerank

#### Reranking using Cross-Encoder models (list: )
cross_encoder_model = CrossEncoder('cross-encoder/ms-marco-electra-base')
reranker = Rerank(cross_encoder_model, batch_size=128)

# Rerank the top-20 results using the reranker provided
rerank_results = reranker.rerank(corpus, queries, results, top_k=20)

2021-04-20 16:14:39 - Use pytorch device: cuda
2021-04-20 16:14:39 - Starting To Rerank Top-20....






In [None]:
#### Evaluate your retrieval using NDCG@k, MAP@K ...
ndcg, _map, recall, precision = EvaluateRetrieval.evaluate(qrels, rerank_results, retriever.k_values)

2021-04-20 16:19:08 - 

2021-04-20 16:19:08 - NDCG@1: 0.5733
2021-04-20 16:19:08 - NDCG@3: 0.6314
2021-04-20 16:19:08 - NDCG@5: 0.6520
2021-04-20 16:19:08 - NDCG@10: 0.6720
2021-04-20 16:19:08 - NDCG@100: 0.6780
2021-04-20 16:19:08 - NDCG@1000: 0.6780
2021-04-20 16:19:08 - 

2021-04-20 16:19:08 - MAP@1: 0.5451
2021-04-20 16:19:08 - MAP@3: 0.6074
2021-04-20 16:19:08 - MAP@5: 0.6216
2021-04-20 16:19:08 - MAP@10: 0.6307
2021-04-20 16:19:08 - MAP@100: 0.6324
2021-04-20 16:19:08 - MAP@1000: 0.6324
2021-04-20 16:19:08 - 

2021-04-20 16:19:08 - Recall@1: 0.5451
2021-04-20 16:19:08 - Recall@3: 0.6758
2021-04-20 16:19:08 - Recall@5: 0.7260
2021-04-20 16:19:08 - Recall@10: 0.7844
2021-04-20 16:19:08 - Recall@100: 0.8078
2021-04-20 16:19:08 - Recall@1000: 0.8078
2021-04-20 16:19:08 - 

2021-04-20 16:19:08 - P@1: 0.5733
2021-04-20 16:19:08 - P@3: 0.2444
2021-04-20 16:19:08 - P@5: 0.1613
2021-04-20 16:19:08 - P@10: 0.0880
2021-04-20 16:19:08 - P@100: 0.0090
2021-04-20 16:19:08 - P@1000: 0.0009
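
The reranking step can be sketched as follows: keep only the top-k first-stage candidates per query, then rescore each (query, document) pair. The `overlap_score` function is a toy stand-in for a cross-encoder, and both helpers are hypothetical names introduced for illustration.

```python
def rerank(corpus, queries, results, score_pair, top_k=20):
    """Rescore the top_k first-stage candidates of every query with score_pair."""
    reranked = {}
    for query_id, doc_scores in results.items():
        candidates = sorted(doc_scores, key=doc_scores.get, reverse=True)[:top_k]
        reranked[query_id] = {
            doc_id: score_pair(queries[query_id], corpus[doc_id]["text"])
            for doc_id in candidates
        }
    return reranked


def overlap_score(query, text):
    """Toy pair scorer: fraction of query tokens that also occur in the text."""
    query_tokens = set(query.lower().split())
    text_tokens = set(text.lower().split())
    return len(query_tokens & text_tokens) / len(query_tokens) if query_tokens else 0.0


toy_corpus = {
    "d1": {"title": "", "text": "cats purr"},
    "d2": {"title": "", "text": "dogs bark loudly"},
    "d3": {"title": "", "text": "dogs and cats"},
}
toy_queries = {"q1": "dogs bark"}
first_stage = {"q1": {"d1": 3.0, "d2": 2.0, "d3": 1.0}}

toy_reranked = rerank(toy_corpus, toy_queries, first_stage, overlap_score, top_k=2)
print(toy_reranked)
```

Documents outside the top-k never reach the second stage, which is consistent with the identical values for k=100 and k=1000 in the log above.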
