# Demo on TREC 2019 Deep Learning track queries

## Installation

In [None]:
%pip install -q pyterrier

In [None]:
%pip install -q git+https://github.com/emory-irlab/pyterrier_genrank.git

## Setup


In [1]:
import pyterrier as pt
from rerank import LLMReRanker

# ir_datasets provides the raw text of the MSMARCO passage corpus
dataset = pt.get_dataset('irds:msmarco-passage')

# download a prebuilt Terrier index of MSMARCO passages and retrieve with BM25
bm25 = pt.terrier.Retriever.from_dataset("msmarco_passage", "terrier_stemmed", wmodel="BM25")

# LLM reranker using the RankVicuna 7B checkpoint
llm_reranker = LLMReRanker("castorini/rank_vicuna_7b_v1")

# top 100 BM25 results -> attach passage text -> rerank with the LLM
genrank_pipeline = bm25 % 100 >> pt.text.get_text(dataset, 'text') >> llm_reranker
genrank_pipeline


Java started (triggered by Retriever.from_dataset) and loaded: pyterrier.java.colab, pyterrier.java, pyterrier.java.24, pyterrier.terrier.java [version=5.11 (build: craig.macdonald 2025-01-13 21:29), helper_version=0.0.8]
`torch_dtype` is deprecated! Use `dtype` instead!


The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Input columns:

| Column | Type | Description |
|---|---|---|
| qid | str | (Query ID) ID of query in frame |
| query | str | Query text |

BM25 retriever parameters:

| Parameter | Value |
|---|---|
| index_location | `<org.terrier.querying.IndexRef at 0x7f2af966d3b0 jclass=org/terrier/querying/IndexRef jself=<LocalRef obj=0x13924910 at 0x7f2af97fb190>>` |
| num_results | 1000 |
| metadata | ['docno'] |
| wmodel | BM25 |
| threads | 1 |
| verbose | False |
| terrierql | on |
| parsecontrols | on |
| parseql | on |
| applypipeline | on |

BM25 output columns:

| Column | Type | Description |
|---|---|---|
| qid | str | (Query ID) ID of query in frame |
| docid | int | (Internal Document ID) Integer ID of document in a specific index |
| docno | str | (External Document ID) String ID of document in collection |
| rank | int | Ranking order of document to query (lower=better) |
| score | float | Ranking score of document to query (higher=better) |
| query | str | Query text |

Rank cutoff (`% 100`) parameters:

| Parameter | Value |
|---|---|
| k | 100 |

Rank cutoff output columns:

| Column | Type | Description |
|---|---|---|
| qid | str | (Query ID) ID of query in frame |
| docid | int | (Internal Document ID) Integer ID of document in a specific index |
| docno | str | (External Document ID) String ID of document in collection |
| rank | int | Ranking order of document to query (lower=better) |
| score | float | Ranking score of document to query (higher=better) |
| query | str | Query text |

`pt.text.get_text` parameters:

| Parameter | Value |
|---|---|
| dataset | IRDSDataset('msmarco-passage') |
| fields | ['text'] |
| verbose | False |

`pt.text.get_text` output columns:

| Column | Type | Description |
|---|---|---|
| qid | str | (Query ID) ID of query in frame |
| docid | int | (Internal Document ID) Integer ID of document in a specific index |
| docno | str | (External Document ID) String ID of document in collection |
| rank | int | Ranking order of document to query (lower=better) |
| score | float | Ranking score of document to query (higher=better) |
| query | str | Query text |
| text | str | Document text |

`llm_reranker` output columns:

| Column | Type | Description |
|---|---|---|
| qid | str | (Query ID) ID of query in frame |
| docid | int | (Internal Document ID) Integer ID of document in a specific index |
| docno | str | (External Document ID) String ID of document in collection |
| rank | int | Ranking order of document to query (lower=better) |
| score | float | Ranking score of document to query (higher=better) |
| query | str | Query text |
| text | str | Document text |
| score_0 | | |
| docno_orig | | |
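The schema shows each stage consuming and producing a ranked table: BM25 returns up to 1000 results per query (`num_results` above), `% 100` keeps only the top 100 of them, and `>>` feeds each stage's output into the next. As a rough mental model only (this is not PyTerrier's implementation), the two operators behave like the plain-Python sketch below; `rank_cutoff`, `compose`, `toy_retriever`, `add_text` and `TEXTS` are hypothetical names with toy data.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Stage = Callable[[List[Row]], List[Row]]


def rank_cutoff(k: int) -> Stage:
    """Keep rows with rank < k; ranks are 0-based per query, so this keeps k rows per qid."""
    def stage(rows: List[Row]) -> List[Row]:
        return [r for r in rows if r["rank"] < k]
    return stage


def compose(*stages: Stage) -> Stage:
    """Run stages left to right, feeding each one's output to the next (like a >> b >> c)."""
    def pipeline(rows: List[Row]) -> List[Row]:
        for stage in stages:
            rows = stage(rows)
        return rows
    return pipeline


def toy_retriever(queries: List[Row]) -> List[Row]:
    """Hypothetical stand-in for BM25: five fake passages per query, best first."""
    return [
        {**q, "docno": f"d{i}", "rank": i, "score": 10.0 - i}
        for q in queries
        for i in range(5)
    ]


TEXTS = {f"d{i}": f"passage {i}" for i in range(5)}  # hypothetical corpus


def add_text(rows: List[Row]) -> List[Row]:
    """Hypothetical analogue of pt.text.get_text: attach each passage's text by docno."""
    return [{**r, "text": TEXTS[r["docno"]]} for r in rows]


pipe = compose(toy_retriever, rank_cutoff(3), add_text)
result = pipe([{"qid": "1", "query": "best places to have Indian food"}])
print([(r["docno"], r["text"]) for r in result])
# → [('d0', 'passage 0'), ('d1', 'passage 1'), ('d2', 'passage 2')]
```

The query columns flow through untouched because every stage copies the incoming row before adding fields, which is also why `qid` and `query` appear in every schema table above.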


In [2]:
genrank_pipeline.search('best places to have Indian food')

| | qid | docid | docno | rank_orig | score_0 | query | text | score | rank |
|---|---|---|---|---|---|---|---|---|---|
| 4 | 1 | 2775081 | 2775081 | 4 | 25.496048 | best places to have Indian food | Spice Fine Indian Cuisine is a casual fine din... | 1.000000 | 0 |
| 5 | 1 | 2672185 | 2672185 | 5 | 25.477749 | best places to have Indian food | Masala Zone Covent Garden. Welcome to London’s... | 0.500000 | 1 |
| 1 | 1 | 3692448 | 3692448 | 1 | 26.163756 | best places to have Indian food | New York City's most diverse borough is also o... | 0.333333 | 2 |
| 0 | 1 | 3181971 | 3181971 | 0 | 27.902429 | best places to have Indian food | Calgary’s Chinatown is home to some of the bes... | 0.250000 | 3 |
| 36 | 1 | 8217994 | 8217994 | 36 | 21.603263 | best places to have Indian food | We heard London serves up some of the best Ind... | 0.200000 | 4 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 92 | 1 | 6012386 | 6012386 | 92 | 20.210934 | best places to have Indian food | This food will only last a few days. 10 days i... | 0.010417 | 95 |
| 96 | 1 | 830479 | 830479 | 96 | 20.096468 | best places to have Indian food | Calories in Homemade Indian Fry Bread. Find nu... | 0.010309 | 96 |
| 97 | 1 | 1001753 | 1001753 | 97 | 20.080106 | best places to have Indian food | Information about the Chickasaw Indians for st... | 0.010204 | 97 |
| 98 | 1 | 720162 | 720162 | 98 | 20.078024 | best places to have Indian food | This food will keep for about 3 days in the re... | 0.010101 | 98 |
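In this output, `rank_orig` and `score_0` appear to keep each passage's BM25 rank and score, while the new `score` column equals 1/(rank + 1) on every displayed row. The check below uses only the nine rows visible above (the middle of the table is elided), so it is an observation about this particular output, not a documented property of `LLMReRanker`.

```python
# (rank_orig, score_0, score, rank) copied from the rows shown in the search output
rows = [
    (4, 25.496048, 1.000000, 0),
    (5, 25.477749, 0.500000, 1),
    (1, 26.163756, 0.333333, 2),
    (0, 27.902429, 0.250000, 3),
    (36, 21.603263, 0.200000, 4),
    (92, 20.210934, 0.010417, 95),
    (96, 20.096468, 0.010309, 96),
    (97, 20.080106, 0.010204, 97),
    (98, 20.078024, 0.010101, 98),
]

# Displayed values are rounded to 6 decimals, so compare with a small tolerance.
for rank_orig, score_0, score, rank in rows:
    assert abs(score - 1.0 / (rank + 1)) < 1e-6, (rank, score)

# Sorted by rank_orig, score_0 decreases, consistent with it being the BM25 score.
bm25_scores = [score_0 for _, score_0, _, _ in sorted(rows)]
assert bm25_scores == sorted(bm25_scores, reverse=True)

# Positive shift = the reranker moved the passage up relative to BM25.
shifts = [(rank_orig, rank, rank_orig - rank) for rank_orig, _, _, rank in rows]
print(shifts[:5])
# → [(4, 0, 4), (5, 1, 4), (1, 2, -1), (0, 3, -3), (36, 4, 32)]
```

For example, the passage BM25 ranked first (the Calgary Chinatown passage) drops to rank 3, while a passage at BM25 rank 36 about Indian food in London rises to rank 4.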


## Evaluation
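The experiment reports three measures. `RR(rel=2)` and `AP(rel=2)` count a passage as relevant only if its judgment is at least 2 (TREC DL 2019 passage judgments are graded 0 to 3), while `nDCG@10` uses the graded labels directly. The functions below are a simplified, hypothetical re-implementation for a single query, meant only to make the definitions concrete; PyTerrier delegates the real computation to ir-measures, and details such as tie handling and the nDCG gain function (assumed linear here, as in trec_eval's default) may differ.

```python
import math
from typing import Dict, List


def reciprocal_rank(ranking: List[str], qrels: Dict[str, int], rel: int = 2) -> float:
    """1 / (position of the first passage with label >= rel), or 0 if none is retrieved."""
    for i, docno in enumerate(ranking, start=1):
        if qrels.get(docno, 0) >= rel:
            return 1.0 / i
    return 0.0


def average_precision(ranking: List[str], qrels: Dict[str, int], rel: int = 2) -> float:
    """Sum of precision@i at each relevant position i, divided by the number of
    relevant passages in the qrels (relevant = label >= rel)."""
    num_relevant = sum(1 for label in qrels.values() if label >= rel)
    if num_relevant == 0:
        return 0.0
    hits, total = 0, 0.0
    for i, docno in enumerate(ranking, start=1):
        if qrels.get(docno, 0) >= rel:
            hits += 1
            total += hits / i
    return total / num_relevant


def ndcg_at_k(ranking: List[str], qrels: Dict[str, int], k: int = 10) -> float:
    """DCG@k with linear gains and a log2(i + 1) discount, divided by the ideal DCG@k."""
    def dcg(labels: List[int]) -> float:
        return sum(label / math.log2(i + 1) for i, label in enumerate(labels[:k], start=1))

    ideal = dcg(sorted(qrels.values(), reverse=True))
    return dcg([qrels.get(d, 0) for d in ranking]) / ideal if ideal > 0 else 0.0


qrels = {"a": 3, "b": 1, "c": 2, "d": 0}  # toy graded judgments for one query
ranking = ["b", "c", "d", "a"]            # toy system ranking
print(reciprocal_rank(ranking, qrels))    # → 0.5 (first label >= 2 is "c" at position 2)
print(average_precision(ranking, qrels))  # → 0.5 (precisions 1/2 and 2/4, over 2 relevant)
print(ndcg_at_k(ranking, qrels))
```

The reported figures are these per-query values averaged over all judged topics.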

In [3]:
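from pyterrier.measures import *

# judged subset of the TREC 2019 Deep Learning passage-ranking queries
eval_dataset = pt.get_dataset('irds:msmarco-passage/trec-dl-2019/judged')

pt.Experiment(
    [
        bm25 % 100,
        genrank_pipeline
    ],
    eval_dataset.get_topics(),
    eval_dataset.get_qrels(),
    # rel=2: only passages judged 2 or higher count as relevant for RR and AP
    eval_metrics=[nDCG@10, RR(rel=2), AP(rel=2)],
    names=["BM25", "BM25 >> rank_vicuna"],
    precompute_prefix=True  # compute the shared BM25 % 100 prefix only once
)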

Precomputing results of 43 topics on shared pipeline component TerrierRetr(BM25)


| | name | nDCG@10 | AP(rel=2) | RR(rel=2) |
|---|---|---|---|---|
| 0 | BM25 | 0.47954 | 0.232201 | 0.641565 |
| 1 | BM25 >> rank_vicuna | 0.670539 | 0.327995 | 0.874031 |
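To read the table in relative terms, the snippet below computes the percentage change of each measure from BM25 to the reranked run, using the mean values printed above. The dictionary names are illustrative and deliberately avoid reusing `bm25`, so running this in the notebook does not overwrite the retriever.

```python
# Mean scores over the 43 judged TREC DL 2019 topics, copied from the table above
baseline_scores = {"nDCG@10": 0.47954, "AP(rel=2)": 0.232201, "RR(rel=2)": 0.641565}
reranked_scores = {"nDCG@10": 0.670539, "AP(rel=2)": 0.327995, "RR(rel=2)": 0.874031}

# Relative change in percent: (new - old) / old * 100
gains = {
    m: 100.0 * (reranked_scores[m] - baseline_scores[m]) / baseline_scores[m]
    for m in baseline_scores
}
for measure, pct in gains.items():
    print(f"{measure}: {baseline_scores[measure]:.4f} -> {reranked_scores[measure]:.4f} ({pct:+.1f}%)")
# → nDCG@10: 0.4795 -> 0.6705 (+39.8%)
# → AP(rel=2): 0.2322 -> 0.3280 (+41.3%)
# → RR(rel=2): 0.6416 -> 0.8740 (+36.2%)
```

These are means over 43 topics, so the table alone does not show whether each difference is statistically significant; `pt.Experiment` accepts a `baseline` argument (for example `baseline=0` to compare against the BM25 row) that adds significance-test columns.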
