# Setup

In [2]:
from rar.retrievers import BM25Retriever, DenseRetriever, AggregationType
from rar.encoders import HuggingFaceEncoder
from rar.utils import load_corpus, load_eval_dataset, evaluate_retriever

# Evaluate BM25 on entity retrieval

In [2]:
domain = 'entity_retrieval'
corpus = 'dbpedia_small'
dataset = 'dbpedia_small'

docs, referrals = load_corpus(domain, corpus)
queries, ground_truth = load_eval_dataset(domain, dataset)

Loading full dbpedia_small dataset on entity_retrieval domain...


In [3]:
print('Without referral augmentation...')
retriever = BM25Retriever(docs)
# multiple_correct=True because each query in this dataset can have several ground-truth documents
evaluate_retriever(retriever, queries, ground_truth, multiple_correct=True)
print()

print('With referral augmentation...')
retriever = BM25Retriever(docs, referrals, aggregation=AggregationType.CONCAT)
evaluate_retriever(retriever, queries, ground_truth, multiple_correct=True)
print()

Without referral augmentation...
Recall@1: 0.6 +/- 0.058554004376911994
Recall@10: 0.6428571428571429 +/- 0.05727026612409093

With referral augmentation...
Recall@1: 0.5857142857142857 +/- 0.058876755202770775
Recall@10: 0.6714285714285714 +/- 0.056139144084518006
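
As a rough illustration of what concatenation-style referral augmentation could look like, the sketch below appends each document's referral sentences to its text before indexing. This is an assumption about the idea behind `AggregationType.CONCAT`, not the `rar` implementation: the helper `concat_referrals`, the parallel-list layout of `toy_referrals`, and the toy strings are all hypothetical.

```python
def concat_referrals(docs, referrals, separator=" "):
    """Hypothetical helper: append each document's referrals to its own text.

    Assumes `referrals[i]` is a list of strings that mention `docs[i]`.
    """
    if len(docs) != len(referrals):
        raise ValueError("docs and referrals must have the same length")
    return [separator.join([doc, *refs]) for doc, refs in zip(docs, referrals)]


# Toy data, invented for illustration only.
toy_docs = [
    "[TITLE] BERT [ABSTRACT] We introduce a new language representation model.",
    "[TITLE] Some other paper [ABSTRACT] An unrelated abstract.",
]
toy_referrals = [
    ["We use the hyperparameters from BERT."],  # sentence from a citing paper
    [],                                         # no referrals for this document
]

augmented = concat_referrals(toy_docs, toy_referrals)
for text in augmented:
    print(text)
```

A sparse retriever built over `augmented` can match query terms that appear only in the citing sentences, which is the kind of effect the qualitative example at the end of this notebook relies on.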



# Evaluate BM25 on paper retrieval

In [4]:
domain = 'paper_retrieval'
corpus = 'acl_small'
dataset = 'acl_small'

docs, referrals = load_corpus(domain, corpus)
queries, ground_truth = load_eval_dataset(domain, dataset)

Loading full acl_small dataset on paper_retrieval domain...


In [5]:
print('Without referral augmentation...')
retriever = BM25Retriever(docs)
evaluate_retriever(retriever, queries, ground_truth)
print()

print('With referral augmentation...')
retriever = BM25Retriever(docs, referrals, aggregation=AggregationType.CONCAT)
evaluate_retriever(retriever, queries, ground_truth)
print()

Without referral augmentation...
MRR: 0.20353690476190475 +/- 0.007746342940904251
Recall@1: 0.1375 +/- 0.00770044641563072
Recall@10: 0.3695 +/- 0.010792815897623752

With referral augmentation...
MRR: 0.32728710317460313 +/- 0.008669156027702959
Recall@1: 0.216 +/- 0.009201738966086791
Recall@10: 0.5855 +/- 0.011015664982196943
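
The metrics printed above can be sketched in a few lines. The `+/-` values are consistent with a standard error of the mean computed from the population standard deviation: a hit rate of 0.6 over 70 queries gives sqrt(0.6 * 0.4 / 70) ≈ 0.0586, which matches the first Recall@1 line for entity retrieval. The query count is inferred from that arithmetic, and the function names below are hypothetical, not part of `rar`; treating a query as a hit when any ground-truth document appears in the top k is likewise an assumption.

```python
import math


def recall_at_k(ranked_ids, relevant_ids, k):
    """Return 1.0 if any relevant id appears in the top-k results, else 0.0."""
    return float(any(doc_id in relevant_ids for doc_id in ranked_ids[:k]))


def reciprocal_rank(ranked_ids, relevant_ids):
    """Return 1/rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean_and_stderr(scores):
    """Mean and standard error (population std divided by sqrt(n))."""
    n = len(scores)
    mean = sum(scores) / n
    variance = sum((s - mean) ** 2 for s in scores) / n
    return mean, math.sqrt(variance / n)


# Per-query scores for one toy ranking with a single relevant document.
ranking = ["d3", "d7", "d1"]
r1 = recall_at_k(ranking, {"d7"}, k=1)
r10 = recall_at_k(ranking, {"d7"}, k=10)
rr = reciprocal_rank(ranking, {"d7"})

# 42 hits out of 70 queries, i.e. a hit rate of 0.6.
hits = [1.0] * 42 + [0.0] * 28
mean, stderr = mean_and_stderr(hits)
print(f"Recall@1: {mean} +/- {stderr}")
```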



# Evaluate Specter on paper retrieval

In [19]:
encoder = HuggingFaceEncoder('allenai/specter')

print('Without referral augmentation...')
retriever = DenseRetriever(encoder, docs, verbose=True)
evaluate_retriever(retriever, queries, ground_truth)
print()

print('With referral augmentation...')
retriever = DenseRetriever(encoder, docs, referrals, aggregation=AggregationType.CONCAT)
evaluate_retriever(retriever, queries, ground_truth)
print()

Without referral augmentation...
Encoding corpus...
Not using referral augmentation
Took 2148.4282710552216 seconds
MRR: 0.13613373015873015 +/- 0.006454279448139866
Recall@1: 0.0835 +/- 0.006185780063985462
Recall@10: 0.28 +/- 0.010039920318408906

With referral augmentation...
MRR: 0.16907440476190477 +/- 0.007043126165034891
Recall@1: 0.1055 +/- 0.006869124762296867
Recall@10: 0.3405 +/- 0.010596219844831457
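
For readers unfamiliar with dense retrieval, the sketch below ranks documents by cosine similarity between a query embedding and precomputed document embeddings. It is a generic illustration under stated assumptions: the two-dimensional toy vectors stand in for encoder outputs, `dense_top_k` is a hypothetical helper, and the similarity function actually used by `DenseRetriever` is not confirmed by this notebook.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def dense_top_k(query_vec, doc_vecs, k):
    """Return the indices of the k documents most similar to the query."""
    order = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return order[:k]


# Toy embeddings, invented for illustration only.
toy_doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
top = dense_top_k([0.9, 0.1], toy_doc_vecs, k=2)
print(top)
```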



# Qualitative example

One peculiar fact about the SWAG paper (Zellers et al., 2018) is that it uses BERT's hyperparameters. Can a retriever recover this fact from a query that never mentions BERT?

In [5]:
retriever_no_rar = BM25Retriever(docs)
retriever_no_rar.retrieve('paper that the SWAG paper took hyperparameters from', num_docs=5)

['[TITLE] A Latent Variable Model Approach to PMI-based Word Embeddings [ABSTRACT] Semantic word embeddings represent the meaning of a word via a vector, and are created by diverse methods. Many use nonlinear operations on co-occurrence statistics, and have hand-tuned hyperparameters and reweighting methods. This paper proposes a new generative model, a dynamic version of the log-linear topic model of Mnih and Hinton (2007) . The methodological novelty is to use the prior to compute closed form expressions for word statistics. This provides a theoretical justification for nonlinear models like PMI, word2vec, and GloVe, as well as some hyperparameter choices. It also helps explain why lowdimensional semantic embeddings contain linear algebraic structure that allows solution of word analogies, as shown by Mikolov et al. (2013a) and many subsequent papers. Experimental support is provided for the generative model assumptions, the most important of which is that latent word vectors are fai

Without referrals, the retrieved papers are irrelevant to the query. With referrals, the most relevant papers returned are BERT and SWAG, as desired.

In [6]:
retriever = BM25Retriever(docs, referrals)
retriever.retrieve('paper that the SWAG paper took hyperparameters from', num_docs=5)

['[TITLE] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding [ABSTRACT] We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models (Peters et al., 2018; Radford et al., 2018) , BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT representations can be fine-tuned with just one additional output layer to create state-of-theart models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE benchmark to 80.4% (7.6% absolute improvement), MultiNLI accuracy to 86.7% (5.6% absolute i