# LLM-rankers

This Jupyter notebook uses [llm-rankers](https://github.com/ielab/llm-rankers) to re-rank documents in the context of [ReNeuIR 2024](https://reneuir.org/). Please see the [corresponding publications](https://github.com/ielab/llm-rankers#references) for more details.

### Step 1: Import dependencies

In [1]:
import torch
from tqdm import tqdm
from tira.third_party_integrations import ir_datasets, persist_and_normalize_run
from llmrankers.setwise import SetwiseLlmRanker
from llmrankers.rankers import SearchResult
import os


### Step 2: Load and transform the dataset

In [2]:
# The "from tira.third_party_integrations import ir_datasets" import patches "ir_datasets.load"
# so that, when executed within the TIRA sandbox, it loads the dataset injected into the sandbox.
# Outside the sandbox, we only check that the code runs on a minimal spot-check dataset.
dataset = ir_datasets.load('reneuir-2024/re-rank-spot-check-20240624-training')

In [3]:
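qid_to_rerank_data = {}

# Group the first-stage scored documents by query: each query keeps its text and its list of candidates.
for scored_doc in dataset.scoreddocs_iter():
    if scored_doc.query_id not in qid_to_rerank_data:
        qid_to_rerank_data[scored_doc.query_id] = {'query': scored_doc.query.default_text(), 'search_results': []}

    qid_to_rerank_data[scored_doc.query_id]['search_results'].append(
        SearchResult(docid=scored_doc.doc_id, text=scored_doc.document.default_text(), score=scored_doc.score)
    )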
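
The grouping step can be tried outside TIRA with toy data. This is a minimal sketch under assumptions: `SearchResult` below is a local stand-in assumed to mirror the `docid`, `score`, and `text` fields used above, and `ScoredDoc` is a hypothetical flattened record, not the `ir_datasets` type.

```python
from collections import namedtuple
from dataclasses import dataclass


# Hypothetical local stand-in for llmrankers.rankers.SearchResult (assumed fields: docid, score, text).
@dataclass
class SearchResult:
    docid: str
    score: float
    text: str


# Hypothetical flattened scored-doc record; the real ir_datasets objects differ.
ScoredDoc = namedtuple("ScoredDoc", ["query_id", "query_text", "doc_id", "doc_text", "score"])


def group_by_query(scored_docs):
    """Group first-stage results per query, keeping candidates in their input order."""
    grouped = {}
    for sd in scored_docs:
        entry = grouped.setdefault(sd.query_id, {"query": sd.query_text, "search_results": []})
        entry["search_results"].append(SearchResult(docid=sd.doc_id, text=sd.doc_text, score=sd.score))
    return grouped


scored_docs = [
    ScoredDoc("q1", "what is bm25", "d1", "BM25 is a ranking function.", 12.3),
    ScoredDoc("q1", "what is bm25", "d2", "Okapi weighting scheme.", 10.1),
    ScoredDoc("q2", "flan-t5", "d3", "FLAN-T5 is an instruction-tuned T5.", 8.7),
]
grouped = group_by_query(scored_docs)
print({qid: [r.docid for r in v["search_results"]] for qid, v in grouped.items()})
# {'q1': ['d1', 'd2'], 'q2': ['d3']}
```

`dict.setdefault` replaces the explicit membership check; the resulting structure is the same one the cell above builds.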


### Step 3: Re-Rank

In [4]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print('Will use device:', device)

# The model is read from the MODEL environment variable so that the same code can run multiple models.
model = os.environ.get('MODEL', 'google/flan-t5-small')
print('Model is:', model)

ranker = SetwiseLlmRanker(model_name_or_path=model, tokenizer_name_or_path=model, device=device)

print('Ranker is', ranker)

Will use device: cpu
Model is: google/flan-t5-small


You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Ranker is <llmrankers.setwise.SetwiseLlmRanker object at 0x76f96ce63d30>
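
Conceptually, a setwise ranker (see the publications linked above) shows the LLM a small set of documents at a time and asks which one is most relevant. Those judgments then drive a sorting procedure. The sketch below is a simplified tournament-style top-k selection that illustrates the idea. It is not the sorting strategy implemented in llm-rankers. `pick_best` is a hypothetical stand-in for the LLM prompt, and `toy_pick_best` uses made-up relevance values.

```python
from typing import Callable, List, Sequence


def setwise_top_k(query: str, docs: Sequence[str],
                  pick_best: Callable[[str, Sequence[str]], int],
                  k: int = 10, set_size: int = 3) -> List[str]:
    """Rank the top-k docs via knock-out rounds of setwise comparisons.

    Each pick_best call sees at most `set_size` docs and returns the index of the
    one judged most relevant. Docs below the top-k keep their input order.
    """
    if set_size < 2:
        raise ValueError("set_size must be at least 2")
    remaining = list(docs)
    ranking = []
    for _ in range(min(k, len(remaining))):
        candidates = remaining
        # Each group of up to set_size docs contributes one winner to the next round.
        while len(candidates) > 1:
            groups = [candidates[i:i + set_size] for i in range(0, len(candidates), set_size)]
            candidates = [group[pick_best(query, group)] for group in groups]
        best = candidates[0]
        ranking.append(best)
        remaining = [d for d in remaining if d != best]
    return ranking + remaining


# Hypothetical relevance values standing in for the LLM's judgments.
relevance = {"d1": 0.2, "d2": 0.9, "d3": 0.5, "d4": 0.7, "d5": 0.1}


def toy_pick_best(query, group):
    return max(range(len(group)), key=lambda i: relevance[group[i]])


print(setwise_top_k("q", ["d1", "d2", "d3", "d4", "d5"], toy_pick_best, k=2, set_size=3))
# ['d2', 'd4', 'd1', 'd3', 'd5']
```

A larger `set_size` means fewer LLM calls per selected document but longer prompts per call. This trade-off is what the set size controls in a setwise approach.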


In [8]:
run = []

for qid in tqdm(qid_to_rerank_data):
    query = qid_to_rerank_data[qid]['query']
    search_results = qid_to_rerank_data[qid]['search_results']

    # Re-rank this query's candidates and record one run row per returned document.
    for result in ranker.rerank(query, search_results):
        run.append({"qid": qid, "score": result.score, "docno": result.docid})
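
Before persisting, it can be useful to check whether re-ranking only permuted each query's candidates, with no documents lost or added. The following is a minimal sketch on toy data. `check_rerank_is_permutation` is a hypothetical helper. In this notebook, `candidates` could be built as `{qid: [r.docid for r in d['search_results']] for qid, d in qid_to_rerank_data.items()}`.

```python
from collections import defaultdict


def check_rerank_is_permutation(candidates, run):
    """Return the qids (from candidates) whose re-ranked docnos differ from their candidate docids."""
    reranked = defaultdict(list)
    for row in run:
        reranked[row["qid"]].append(row["docno"])
    mismatched = []
    for qid, docids in candidates.items():
        # Compare as sorted lists so that order does not matter but duplicates do.
        if sorted(docids) != sorted(reranked.get(qid, [])):
            mismatched.append(qid)
    return mismatched


candidates = {"q1": ["d1", "d2", "d3"], "q2": ["d4"]}
run = [
    {"qid": "q1", "score": 0.9, "docno": "d2"},
    {"qid": "q1", "score": 0.5, "docno": "d1"},
    {"qid": "q1", "score": 0.1, "docno": "d3"},
    {"qid": "q2", "score": 0.7, "docno": "d4"},
]
print(check_rerank_is_permutation(candidates, run))  # []
```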


### Step 4: Persist run file

persist_and_normalize_run(run, system_name=f'llm-rankers-{model}', default_output='.')
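
To inspect what a run contains, or to write one outside TIRA, the rows can be formatted in the common six-column TREC run format (`qid Q0 docno rank score tag`). The following is a minimal sketch. `to_trec_run_lines` is a hypothetical helper, and it makes no claim about the exact file layout or score normalization that `persist_and_normalize_run` produces.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def to_trec_run_lines(run: Iterable[Dict], system_name: str) -> List[str]:
    """Format run rows as TREC run lines, ranking each query's docs by descending score."""
    # TREC run tags must not contain whitespace, so spaces are replaced here.
    tag = system_name.replace(" ", "_")
    by_qid = defaultdict(list)
    for row in run:
        by_qid[row["qid"]].append(row)
    lines = []
    for qid in sorted(by_qid):
        # sorted() is stable, so tied scores keep their input order.
        rows = sorted(by_qid[qid], key=lambda r: r["score"], reverse=True)
        for rank, row in enumerate(rows, start=1):
            lines.append(f"{qid} Q0 {row['docno']} {rank} {row['score']} {tag}")
    return lines


run = [
    {"qid": "q1", "score": 0.2, "docno": "d1"},
    {"qid": "q1", "score": 0.9, "docno": "d2"},
    {"qid": "q2", "score": 0.5, "docno": "d3"},
]
for line in to_trec_run_lines(run, "llm-rankers-demo"):
    print(line)
# q1 Q0 d2 1 0.9 llm-rankers-demo
# q1 Q0 d1 2 0.2 llm-rankers-demo
# q2 Q0 d3 1 0.5 llm-rankers-demo
```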