# BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models

This notebook contains simple examples showing how to evaluate retrieval models on our new benchmark.

## Introduction
The BEIR benchmark contains 9 diverse retrieval tasks spanning 17 datasets. We evaluate 9 state-of-the-art retrieval models, all in a zero-shot setup. In this Colab notebook, we first show how to download and load the 14 publicly downloadable datasets with just three lines of code. Afterward, we load state-of-the-art dense retrievers (bi-encoders) such as SBERT, ANCE, and DPR models, use them for retrieval, and evaluate them in a zero-shot setup (this run uses `facebook/contriever-msmarco`).

Don't hesitate to send us an e-mail or report an issue if something is broken (and it shouldn't be) or if you have further questions.

Developed by Nandan Thakur, Researcher @ UKP Lab, TU Darmstadt

(https://nthakur.xyz) (nandant@gmail.com)

In [5]:
import os
from dotenv import load_dotenv
load_dotenv()
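os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"  # number GPUs in the same order as nvidia-smi
# .get() avoids a KeyError when a variable is missing from both .env and the shell environment
print("CUDA_VISIBLE_DEVICES:", os.environ.get("CUDA_VISIBLE_DEVICES"), "HF_HOME:", os.environ.get("HF_HOME"))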

CUDA_VISIBLE_DEVICES: 7 HF_HOME: /local1/mohsenfayyaz/.hfcache/


# Install BEIR

In [6]:
# ! pip install beir

In [7]:
from beir import util, LoggingHandler
import logging
import pathlib, os
#### Just some code to print debug information to stdout
logging.basicConfig(format='%(asctime)s - %(message)s',
                    datefmt='%Y-%m-%d %H:%M:%S',
                    level=logging.INFO,
                    handlers=[LoggingHandler()])
#### /print debug information to stdout

**BEIR Datasets**

BEIR contains 17 diverse datasets overall. You can view all the datasets (14 downloadable) with the link below:

[``https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/``](https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/)

Please refer to the BEIR GitHub page to evaluate on the other three datasets.


We include the following datasets in BEIR:

| Dataset   | Website| BEIR-Name | Domain     | Relevancy| Queries  | Documents | Avg. Docs/Q | Download |
| -------- | -----| ---------| ----------- | ---------| ---------| --------- | ------| ------------|
| MSMARCO    | [``Homepage``](https://microsoft.github.io/msmarco/)| ``msmarco`` | Misc.       |  Binary  |  6,980   |  8.84M     |    1.1 | Yes |  
| TREC-COVID |  [``Homepage``](https://ir.nist.gov/covidSubmit/index.html)| ``trec-covid``| Bio-Medical |  3-level|50|  171K| 493.5 | Yes |
| NFCorpus   | [``Homepage``](https://www.cl.uni-heidelberg.de/statnlpgroup/nfcorpus/) | ``nfcorpus``  | Bio-Medical |  3-level |  323     |  3.6K     |  38.2 | Yes |
| BioASQ     | [``Homepage``](http://bioasq.org) | ``bioasq``| Bio-Medical |  Binary  |   500    |  14.91M    |  8.05 | No |
| NQ         | [``Homepage``](https://ai.google.com/research/NaturalQuestions) | ``nq``| Wikipedia   |  Binary  |  3,452   |  2.68M  |  1.2 | Yes |
| HotpotQA   | [``Homepage``](https://hotpotqa.github.io) | ``hotpotqa``| Wikipedia   |  Binary  |  7,405   |  5.23M  |  2.0 | Yes |
| FiQA-2018  | [``Homepage``](https://sites.google.com/view/fiqa/) | ``fiqa``    | Finance     |  Binary  |  648     |  57K    |  2.6 | Yes |
| Signal-1M (RT) | [``Homepage``](https://research.signal-ai.com/datasets/signal1m-tweetir.html)| ``signal1m`` | Twitter     |  3-level  |   97   |  2.86M  |  19.6 | No |
| TREC-NEWS  | [``Homepage``](https://trec.nist.gov/data/news2019.html) | ``trec-news``    | News     |  5-level  |   57    |  595K    |  19.6 | No |
| ArguAna    | [``Homepage``](http://argumentation.bplaced.net/arguana/data) | ``arguana`` | Misc.       |  Binary  |  1,406     |  8.67K    |  1.0 | Yes |
| Touche-2020| [``Homepage``](https://webis.de/events/touche-20/shared-task-1.html) | ``webis-touche2020``| Misc.       |  6-level  |  49     |  382K    |  49.2 |  Yes |
| CQADupstack| [``Homepage``](http://nlp.cis.unimelb.edu.au/resources/cqadupstack/) | ``cqadupstack``| StackEx.      |  Binary  |  13,145 |  457K  |  1.4 |  Yes |
| Quora| [``Homepage``](https://www.quora.com/q/quoradata/First-Quora-Dataset-Release-Question-Pairs) | ``quora``| Quora  | Binary  |  10,000     |  523K    |  1.6 |  Yes |
| DBPedia | [``Homepage``](https://github.com/iai-group/DBpedia-Entity/) | ``dbpedia-entity``| Wikipedia |  3-level  |  400    |  4.63M    |  38.2 |  Yes |
| SCIDOCS| [``Homepage``](https://allenai.org/data/scidocs) | ``scidocs``| Scientific |  Binary  |  1,000     |  25K    |  4.9 |  Yes |
| FEVER| [``Homepage``](http://fever.ai) | ``fever``| Wikipedia     |  Binary  |  6,666     |  5.42M    |  1.2|  Yes |
| Climate-FEVER| [``Homepage``](http://climatefever.ai) | ``climate-fever``| Wikipedia |  Binary  |  1,535     |  5.42M |  3.0 |  Yes |
| SciFact| [``Homepage``](https://github.com/allenai/scifact) | ``scifact``| Scientific |  Binary  |  300     |  5K    |  1.1 |  Yes |
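The **BEIR-Name** column is the identifier that goes into the download URL template used in the download cell (`.../thakur/BEIR/datasets/{}.zip`), and the **Download** column says whether that zip exists. A minimal sketch, assuming only that URL template and a few rows copied by hand from the table; `beir_zip_url` and `DOWNLOADABLE` are hypothetical names, not part of the `beir` package:

```python
# Hypothetical helper: build the BEIR download URL from a dataset's BEIR-Name.
BASE_URL = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/{}.zip"

# A few rows copied from the table above: BEIR-Name -> publicly downloadable?
DOWNLOADABLE = {
    "msmarco": True,
    "trec-covid": True,
    "bioasq": False,
    "nq": True,
    "signal1m": False,
    "trec-news": False,
    "scifact": True,
}


def beir_zip_url(name: str) -> str:
    """Return the zip URL for a BEIR dataset; refuse names that need manual setup."""
    if name not in DOWNLOADABLE:
        raise KeyError(f"unknown BEIR dataset name: {name!r}")
    if not DOWNLOADABLE[name]:
        raise ValueError(f"{name!r} is not a direct download; see the BEIR GitHub page")
    return BASE_URL.format(name)


print(beir_zip_url("nq"))
# https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/nq.zip
```

Raising on the three "No" datasets mirrors the note above that they must be prepared by following the GitHub instructions rather than downloaded.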


# Download Dataset

In [13]:
DATASET = "nq"

# ! git lfs install

# ! git clone https://huggingface.co/datasets/BeIR/nq
# ! mkdir --parents ./datasets/; 
# ! mv nq datasets/
# ! gzip -d datasets/nq/corpus.jsonl.gz
# ! gzip -d datasets/nq/queries.jsonl.gz

# ! git clone https://huggingface.co/datasets/BeIR/nq-qrels
# ! mv nq-qrels datasets/nq/qrels

### SLOW: download and unzip the dataset archive (the commented-out git commands above are an alternative)
import pathlib, os
from beir import util
url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/{}.zip".format(DATASET)
out_dir = os.path.join(os.getcwd(), "datasets")
data_path = util.download_and_unzip(url, out_dir)
print("Dataset downloaded here: {}".format(data_path))

2024-08-18 20:46:27 - Downloading nq.zip ...
2024-08-18 20:47:27 - Unzipping nq.zip ...
Dataset downloaded here: /local1/mohsenfayyaz/projects/Retriever-Contextualization/src/notebooks/datasets/nq
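The unzipped folder holds the three files that `GenericDataLoader` reads in the next cell: `corpus.jsonl` (one JSON object per passage with `_id`, `title`, `text`), `queries.jsonl` (`_id`, `text`), and `qrels/test.tsv` (tab-separated `query-id`, `corpus-id`, `score`, with a header row). The stdlib-only sketch below is a simplified stand-in, not BEIR's code; the name `load_beir_split` and the toy files are hypothetical. It returns the same three dictionary shapes seen later in this notebook and, as BEIR's loader does, keeps only queries that have relevance judgments:

```python
import csv
import json
import os
import tempfile


def load_beir_split(data_path, split="test"):
    """Hypothetical stdlib-only loader returning the same shapes as GenericDataLoader."""
    corpus = {}
    with open(os.path.join(data_path, "corpus.jsonl"), encoding="utf-8") as f:
        for line in f:
            if line.strip():
                doc = json.loads(line)
                corpus[doc["_id"]] = {"text": doc.get("text", ""), "title": doc.get("title", "")}

    queries = {}
    with open(os.path.join(data_path, "queries.jsonl"), encoding="utf-8") as f:
        for line in f:
            if line.strip():
                q = json.loads(line)
                queries[q["_id"]] = q["text"]

    qrels = {}
    qrels_file = os.path.join(data_path, "qrels", f"{split}.tsv")
    with open(qrels_file, encoding="utf-8", newline="") as f:
        reader = csv.reader(f, delimiter="\t")
        next(reader)  # skip the "query-id  corpus-id  score" header row
        for row in reader:
            if not row:
                continue
            qid, doc_id, score = row
            qrels.setdefault(qid, {})[doc_id] = int(score)

    # Keep only the queries that have relevance judgments in this split.
    queries = {qid: queries[qid] for qid in qrels}
    return corpus, queries, qrels


# Usage on a tiny toy dataset written to a temporary folder (made-up content).
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "qrels"))
    with open(os.path.join(tmp, "corpus.jsonl"), "w", encoding="utf-8") as f:
        f.write(json.dumps({"_id": "doc0", "title": "Toy title", "text": "Toy text."}) + "\n")
        f.write(json.dumps({"_id": "doc1", "title": "Other title", "text": "More toy text."}) + "\n")
    with open(os.path.join(tmp, "queries.jsonl"), "w", encoding="utf-8") as f:
        f.write(json.dumps({"_id": "test0", "text": "toy query"}) + "\n")
        f.write(json.dumps({"_id": "test1", "text": "query without judgments"}) + "\n")
    with open(os.path.join(tmp, "qrels", "test.tsv"), "w", encoding="utf-8") as f:
        f.write("query-id\tcorpus-id\tscore\n")
        f.write("test0\tdoc0\t1\ntest0\tdoc1\t1\n")
    toy_corpus, toy_queries, toy_qrels = load_beir_split(tmp)

print(toy_qrels["test0"])  # {'doc0': 1, 'doc1': 1}
print(list(toy_queries))   # ['test0']
```

The toy variables are prefixed with `toy_` so that running this cell does not overwrite the real `corpus`, `queries`, and `qrels`.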


In [15]:
from beir.datasets.data_loader import GenericDataLoader
from tqdm.auto import tqdm

data_path = f"datasets/{DATASET}"
corpus_raw, queries, qrels = GenericDataLoader(data_path).load(split="test") # or split = "train" or "dev"

gold_docs = set()
for test_k, test_v in tqdm(qrels.items()):
    for doc_k, doc_v in test_v.items():
        gold_docs.add(doc_k)
print({
    "#Corpus:": len(corpus_raw), 
    "#Gold_Corpus:": len(gold_docs),
    "#Queries&qrels:": len(queries)
})
corpus = {d: corpus_raw[d] for d in gold_docs}  # gold docs only; set corpus = corpus_raw to search the full corpus

2024-08-18 20:47:54 - Loading Corpus...
2024-08-18 20:48:06 - Loaded 2681468 TEST Documents.
2024-08-18 20:48:06 - Doc Example: {'text': "In accounting, minority interest (or non-controlling interest) is the portion of a subsidiary corporation's stock that is not owned by the parent corporation. The magnitude of the minority interest in the subsidiary company is generally less than 50% of outstanding shares, or the corporation would generally cease to be a subsidiary of the parent.[1]", 'title': 'Minority interest'}
2024-08-18 20:48:06 - Loading Queries...
2024-08-18 20:48:06 - Loaded 3452 TEST Queries.
2024-08-18 20:48:06 - Query Example: what is non controlling interest on balance sheet
{'#Corpus:': 2681468, '#Gold_Corpus:': 4201, '#Queries&qrels:': 3452}
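The cell above restricts the search space to the 4,201 passages judged relevant for at least one test query instead of all 2,681,468 NQ passages. Retrieving from this gold-only pool is far easier, so the scores below are likely much higher than, and not directly comparable to, full-corpus BEIR results. A compact sketch of the same selection, assuming qrels in the dict-of-dicts shape printed at the end of this notebook (`qrels["test0"]`); `gold_subcorpus` is a hypothetical helper, and its `min_score` filter (dropping zero-score judgments) is an addition, since the original loop keeps every judged passage regardless of score:

```python
def gold_subcorpus(corpus, qrels, min_score=1):
    """Hypothetical helper: keep only passages judged with score >= min_score."""
    gold_ids = {
        doc_id
        for judged in qrels.values()
        for doc_id, score in judged.items()
        if score >= min_score
    }
    # Skip ids missing from the corpus instead of raising KeyError.
    return {doc_id: corpus[doc_id] for doc_id in gold_ids if doc_id in corpus}


# Toy data (made up): doc4 is judged but with score 0, doc2 is never judged.
toy_corpus = {f"doc{i}": {"title": "", "text": f"passage {i}"} for i in range(5)}
toy_qrels = {"test0": {"doc0": 1, "doc1": 1}, "test1": {"doc3": 1, "doc4": 0}}
toy_gold = gold_subcorpus(toy_corpus, toy_qrels)
print(sorted(toy_gold))  # ['doc0', 'doc1', 'doc3']
```

Passing `min_score=0` reproduces the original behaviour of keeping every judged passage.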


# **Dense Retrieval using Exact Search**
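`DenseRetrievalExactSearch` performs brute-force bi-encoder retrieval: queries and passages are embedded independently, every query is scored against every passage embedding, and the highest-scoring passages per query are kept. For this checkpoint the log below reports that `SentenceBERT` builds a model with mean pooling, i.e. a text's embedding is the average of its non-padding token embeddings. The stdlib sketch below walks through both steps on hand-written toy vectors; `masked_mean_pool`, `exact_search`, and the numbers are hypothetical, and it skips BEIR's batching, GPU encoding, and corpus sorting. It also shows why `score_function` matters: cosine similarity ignores vector length while dot product rewards it, so the two can rank passages differently.

```python
import heapq
import math


def masked_mean_pool(token_embs, attention_mask):
    """Average the token vectors whose mask value is 1, skipping padding."""
    kept = [vec for vec, keep in zip(token_embs, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    return [sum(col) / len(kept) for col in zip(*kept)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cos_sim(u, v):
    norms = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norms if norms else 0.0


def exact_search(query_embs, doc_embs, top_k, score_fn):
    """Score every (query, passage) pair and keep the top_k passages per query.

    Returns {query_id: {doc_id: score}}, best first, the same nested shape as BEIR results.
    """
    out = {}
    for qid, q in query_embs.items():
        scored = ((doc_id, score_fn(q, d)) for doc_id, d in doc_embs.items())
        out[qid] = dict(heapq.nlargest(top_k, scored, key=lambda item: item[1]))
    return out


# Mean pooling: two real tokens and one padding token (toy 2-d vectors).
print(masked_mean_pool([[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]], [1, 1, 0]))  # [2.0, 3.0]

# Toy 3-d embeddings: "long" has a larger norm but a worse angle to the query.
toy_query_embs = {"q1": [1.0, 0.0, 0.0]}
toy_doc_embs = {"short": [0.9, 0.1, 0.0], "long": [3.0, 3.0, 3.0]}
print(list(exact_search(toy_query_embs, toy_doc_embs, 1, cos_sim)["q1"]))  # ['short']
print(list(exact_search(toy_query_embs, toy_doc_embs, 1, dot)["q1"]))      # ['long']
```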

In [16]:
from beir.retrieval.evaluation import EvaluateRetrieval
from beir.retrieval import models
from beir.retrieval.search.dense import DenseRetrievalExactSearch as DRES

#### Dense Retrieval using SBERT (Sentence-BERT) ####
#### Provide any pretrained sentence-transformers model
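#### msmarco-distilbert-base-v3 was fine-tuned with cosine similarity; Contriever was
#### trained with dot-product scores, so score_function="dot" may suit it better.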
#### Complete list - https://www.sbert.net/docs/pretrained_models.html

MODEL = "facebook/contriever-msmarco"  # "msmarco-distilbert-base-v3"
model = DRES(models.SentenceBERT(MODEL), batch_size=128)
retriever = EvaluateRetrieval(model, score_function="cos_sim")

#### Retrieve dense results (format of results is identical to qrels)
results = retriever.retrieve(corpus, queries)

2024-08-18 20:48:12 - Use pytorch device_name: cuda
2024-08-18 20:48:12 - Load pretrained SentenceTransformer: facebook/contriever-msmarco
2024-08-18 20:48:12 - No sentence-transformers model found with name facebook/contriever-msmarco. Creating a new one with mean pooling.
2024-08-18 20:48:13 - Encoding Queries...
2024-08-18 20:48:14 - Sorting Corpus by document length (Longest first)...
2024-08-18 20:48:14 - Scoring Function: Cosine Similarity (cos_sim)
2024-08-18 20:48:14 - Encoding Batch 1/1...

In [18]:
#### Evaluate your retrieval using NDCG@k, MAP@k, Recall@k and P@k

logging.info("Retriever evaluation for k in: {}".format(retriever.k_values))
ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)
recall

2024-08-18 20:50:02 - Retriever evaluation for k in: [1, 3, 5, 10, 100, 1000]
2024-08-18 20:50:02 - For evaluation, we ignore identical query and document ids (default), please explicitly set ``ignore_identical_ids=False`` to ignore this.
2024-08-18 20:50:04 - 

2024-08-18 20:50:04 - NDCG@1: 0.9076
2024-08-18 20:50:04 - NDCG@3: 0.9378
2024-08-18 20:50:04 - NDCG@5: 0.9453
2024-08-18 20:50:04 - NDCG@10: 0.9501
2024-08-18 20:50:04 - NDCG@100: 0.9518
2024-08-18 20:50:04 - NDCG@1000: 0.9522
2024-08-18 20:50:04 - 

2024-08-18 20:50:04 - MAP@1: 0.8138
2024-08-18 20:50:04 - MAP@3: 0.9266
2024-08-18 20:50:04 - MAP@5: 0.9322
2024-08-18 20:50:04 - MAP@10: 0.9346
2024-08-18 20:50:04 - MAP@100: 0.9351
2024-08-18 20:50:04 - MAP@1000: 0.9351
2024-08-18 20:50:04 - 

2024-08-18 20:50:04 - Recall@1: 0.8138
2024-08-18 20:50:04 - Recall@3: 0.9596
2024-08-18 20:50:04 - Recall@5: 0.9762
2024-08-18 20:50:04 - Recall@10: 0.9897
2024-08-18 20:50:04 - Recall@100: 0.9974
2024-08-18 20:50:04 - Recall@1000: 0.9999

{'Recall@1': 0.81383,
 'Recall@3': 0.95959,
 'Recall@5': 0.97625,
 'Recall@10': 0.98972,
 'Recall@100': 0.99744,
 'Recall@1000': 0.9999}
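`retriever.evaluate` scores each query with `pytrec_eval` and averages the per-query values. The plain-Python sketch below reimplements Recall@k, AP@k (the per-query term of MAP@k) and NDCG@k for a single query, assuming integer qrels and a ranking given as a list of passage ids; the function names are hypothetical and this is not BEIR's code. It uses the top-3 ranking shown for `test3449` in the saved DataFrame. Two facts follow from the definitions: MAP@1 always equals Recall@1 (both reduce to `rel(1) / |relevant|`), which is why both log 0.8138 above; and a query with two gold passages scores NDCG@1 = 1.0 but Recall@1 = 0.5 when only one is ranked first, which helps explain why NDCG@1 (0.9076) exceeds Recall@1.

```python
import math


def recall_at_k(qrel, ranking, k):
    """Fraction of the relevant passages that appear in the top k."""
    relevant = {d for d, rel in qrel.items() if rel > 0}
    hits = sum(1 for d in ranking[:k] if d in relevant)
    return hits / len(relevant) if relevant else 0.0


def ap_at_k(qrel, ranking, k):
    """Average precision cut at k, normalized by the total number of relevant passages."""
    relevant = {d for d, rel in qrel.items() if rel > 0}
    hits, precision_sum = 0, 0.0
    for rank, d in enumerate(ranking[:k], start=1):
        if d in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


def ndcg_at_k(qrel, ranking, k):
    """NDCG with gain = relevance grade and a log2(rank + 1) discount."""
    dcg = sum(qrel.get(d, 0) / math.log2(rank + 1)
              for rank, d in enumerate(ranking[:k], start=1))
    ideal = sorted((rel for rel in qrel.values() if rel > 0), reverse=True)[:k]
    idcg = sum(rel / math.log2(rank + 1) for rank, rel in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0


# Query test3449 from the saved DataFrame: two gold passages, retrieved at ranks 1 and 3.
qrel_3449 = {"doc117643": 1, "doc117646": 1}
ranking_3449 = ["doc117646", "doc106005", "doc117643"]

for k in (1, 3):
    print(k,
          round(recall_at_k(qrel_3449, ranking_3449, k), 4),
          round(ap_at_k(qrel_3449, ranking_3449, k), 4),
          round(ndcg_at_k(qrel_3449, ranking_3449, k), 4))
# 1 0.5 0.5 1.0
# 3 1.0 0.8333 0.9197
```

For binary qrels such as NQ's, the choice between linear and exponential gain makes no difference, so the sketch uses the relevance grade directly.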

# Save Results

In [23]:
import torch
import pickle
import numpy as np
import pandas as pd
import os
from dotenv import load_dotenv
from huggingface_hub import login
load_dotenv()
login(os.environ["HF_API_TOKEN"])
pd.set_option('display.max_colwidth', 200)

# df = pd.read_json("hf://datasets/mohsenfayyaz/misc/res_triviaqa_test_w_gs.jsonl", lines=True)
# df.to_json("./res_triviaqa_test_w_gs.jsonl", lines=True, orient="records")

df_dict = []
# Sort each query's {doc_id: score} dict by score, best first
sorted_results = {k: dict(sorted(v.items(), key=lambda item: item[1], reverse=True)) for k, v in results.items()}
for key in tqdm(sorted_results.keys()):
    df_dict.append({
        "key": key,
        "query": queries[key],
        "gold_docs": list(qrels[key]),
        "gold_docs_text": [corpus[k] for k in qrels[key]],
        "results": sorted_results[key],
        "predicted_docs_text_5": [corpus[k] for k in list(sorted_results[key])[:5]],  # top-5 retrieved passages
    })
df = pd.DataFrame(df_dict)
df.attrs['eval'] = {"ndcg": ndcg, "map": _map, "recall": recall, "precision": precision}
hf_path = f"hf://datasets/Retriever-Contextualization/datasets/{DATASET}/{MODEL.replace('/', '--')}_corpus{len(corpus)}.parquet"
df.to_parquet(hf_path)
print("UPLOADED:", hf_path)
df

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /local1/mohsenfayyaz/.hfcache/token
Login successful


  0%|          | 0/3452 [00:00<?, ?it/s]

hf://datasets/Retriever-Contextualization/datasets/nq/facebook--contriever-msmarco_corpus4201.parquet


| | key | query | gold_docs | gold_docs_text | results | predicted_docs_text_5 |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | test0 | what is non controlling interest on balance sheet | [doc0, doc1] | [{'text': 'In accounting, minority interest (or non-controlling interest) is the portion of a subsidiary corporation's stock that is not owned by the parent corporation. The magnitude of the minor... | {'doc0': 0.6976444125175476, 'doc1': 0.6396650075912476, 'doc52131': 0.5661276578903198, 'doc37281': 0.4707511365413666, 'doc103592': 0.4688764214515686, 'doc69384': 0.4668482542037964, 'doc21144'... | [{'text': 'In accounting, minority interest (or non-controlling interest) is the portion of a subsidiary corporation's stock that is not owned by the parent corporation. The magnitude of the minor... |
| 1 | test1 | how many episodes are in chicago fire season 4 | [doc6] | [{'text': 'The fourth season of Chicago Fire, an American drama television series with executive producer Dick Wolf, and producers Derek Haas, Michael Brandt, and Matt Olmstead, was ordered on Feb... | {'doc6': 0.8000181913375854, 'doc23040': 0.6540364623069763, 'doc109460': 0.6424875259399414, 'doc25878': 0.6077133417129517, 'doc103787': 0.5803894400596619, 'doc68415': 0.5647706389427185, 'doc7... | [{'text': 'The fourth season of Chicago Fire, an American drama television series with executive producer Dick Wolf, and producers Derek Haas, Michael Brandt, and Matt Olmstead, was ordered on Feb... |
| 2 | test2 | who sings love will keep us alive by the eagles | [doc10] | [{'text': '"Love Will Keep Us Alive" is a song written by Jim Capaldi, Paul Carrack, and Peter Vale, and produced by the Eagles, Elliot Scheiner, and Rob Jacobs. It was first performed by the Eagl... | {'doc10': 0.793368935585022, 'doc43761': 0.4912663698196411, 'doc33209': 0.4636683762073517, 'doc102649': 0.4517684876918793, 'doc12083': 0.45116615295410156, 'doc93272': 0.44938331842422485, 'doc... | [{'text': '"Love Will Keep Us Alive" is a song written by Jim Capaldi, Paul Carrack, and Peter Vale, and produced by the Eagles, Elliot Scheiner, and Rob Jacobs. It was first performed by the Eagl... |
| 3 | test3 | who is the leader of the ontario pc party | [doc17, doc18] | [{'text': 'Patrick Walter Brown MPP (born May 26, 1978) is a Canadian politician who is the leader of the Progressive Conservative Party of Ontario and Ontario's Leader of the Official Opposition.... | {'doc18': 0.7136879563331604, 'doc17': 0.6641680598258972, 'doc1215': 0.47546347975730896, 'doc25699': 0.4669022560119629, 'doc51063': 0.4437679052352905, 'doc109336': 0.440820574760437, 'doc88621... | [{'text': 'In May 2015, Brown was elected leader of the Ontario PC Party, and stepped down as MP. He was elected Member of Provincial Parliament (MPP) for Simcoe North in a provincial by-election ... |
| 4 | test4 | nitty gritty dirt band fishin in the dark album | [doc42] | [{'text': '"Fishin' in the Dark" is a song written by Wendy Waldman and Jim Photoglo and recorded by American country music group The Nitty Gritty Dirt Band. It was released in June 1987 as the se... | {'doc42': 0.7535380125045776, 'doc60487': 0.46759092807769775, 'doc3859': 0.4553712010383606, 'doc35855': 0.44975847005844116, 'doc86295': 0.44747328758239746, 'doc90474': 0.44683635234832764, 'do... | [{'text': '"Fishin' in the Dark" is a song written by Wendy Waldman and Jim Photoglo and recorded by American country music group The Nitty Gritty Dirt Band. It was released in June 1987 as the se... |
| ... | ... | ... | ... | ... | ... | ... |
| 3447 | test3447 | when is the met office leaving the bbc | [doc117531] | [{'text': 'On 23 August 2015, the BBC announced that the Met Office will lose its contract as the BBC is legally obliged to ensure that licence fee payers get the best value for money. MeteoGroup ... | {'doc117531': 0.7772749662399292, 'doc11580': 0.5452356338500977, 'doc93814': 0.49839359521865845, 'doc28754': 0.46202582120895386, 'doc58479': 0.4590458571910858, 'doc92028': 0.41726332902908325,... | [{'text': 'On 23 August 2015, the BBC announced that the Met Office will lose its contract as the BBC is legally obliged to ensure that licence fee payers get the best value for money. MeteoGroup ... |
| 3448 | test3448 | where does junior want to go to find hope | [doc117567] | [{'text': 'Throughout the novel, Junior shares his dreams with the readers. In the first chapter, he dreams of becoming a cartoon artist in order to get rich and escape the cycles of poverty and a... | {'doc117567': 0.5255041122436523, 'doc45843': 0.43827423453330994, 'doc45845': 0.4089242219924927, 'doc56023': 0.37942594289779663, 'doc86818': 0.37746402621269226, 'doc105381': 0.3739920258522033... | [{'text': 'Throughout the novel, Junior shares his dreams with the readers. In the first chapter, he dreams of becoming a cartoon artist in order to get rich and escape the cycles of poverty and a... |
| 3449 | test3449 | who does eric end up with in that 70s show | [doc117643, doc117646] | [{'text': 'Regretting it instantly, Eric goes to find her to once again reconcile, and learns that she and Michael have taken off for California where they spend the remainder of the summer. Despi... | {'doc117646': 0.5370627641677856, 'doc106005': 0.5214886665344238, 'doc117643': 0.5081065893173218, 'doc81007': 0.47155511379241943, 'doc14083': 0.4462912678718567, 'doc106455': 0.4373847842216491... | [{'text': 'Due to Eric's departure from the show at the beginning of its eighth season, Eric was no longer the central focus of the show, though his character was still heavily used to influence e... |
| 3450 | test3450 | where does the great outdoors movie take place | [doc117662, doc117663] | [{'text': 'The film follows two families spending time on vacation in Wisconsin.', 'title': 'The Great Outdoors (film)'}, {'text': 'Chicagoan Chester "Chet" Ripley, his wife, Connie, and their two... | {'doc117662': 0.7158812880516052, 'doc117663': 0.5027114152908325, 'doc114945': 0.49752727150917053, 'doc74463': 0.4890879988670349, 'doc83379': 0.47689753770828247, 'doc53915': 0.4764283895492553... | [{'text': 'The film follows two families spending time on vacation in Wisconsin.', 'title': 'The Great Outdoors (film)'}, {'text': 'Chicagoan Chester "Chet" Ripley, his wife, Connie, and their two... |
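The save cell sorts every query's full result dict and then slices the first five entries to build `predicted_docs_text_5`. When only the top few ids are needed, `heapq.nlargest` selects them without sorting the whole dict. A small sketch, assuming the `{doc_id: score}` shape shown in the `results` column above; `top_k_ids` is a hypothetical helper, and the scores are the first five shown for `test0`, deliberately listed out of order:

```python
import heapq
from operator import itemgetter


def top_k_ids(doc_scores, k):
    """Return the k highest-scoring doc ids, best first."""
    return [doc_id for doc_id, _ in heapq.nlargest(k, doc_scores.items(), key=itemgetter(1))]


# Scores copied from the test0 row above, shuffled to show that input order does not matter.
test0_scores = {
    "doc37281": 0.4707511365413666,
    "doc0": 0.6976444125175476,
    "doc103592": 0.4688764214515686,
    "doc52131": 0.5661276578903198,
    "doc1": 0.6396650075912476,
}
print(top_k_ids(test0_scores, 3))  # ['doc0', 'doc1', 'doc52131']
```

Inside the save cell, the same idea would read `[corpus[d] for d in top_k_ids(results[key], 5)]`.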


In [24]:
qrels["test0"]

{'doc0': 1, 'doc1': 1}