# Coherence-based QPP Predictors for Dense Retrieval

In this notebook, we demonstrate how we obtain our results for the dense coherence-based QPP predictors, namely pairRatio and A-pairRatio. Below, we provide example retrieval results for a given retrieval method. However, readers can produce their own results files and pass them in place of the corresponding CSV files.

First, we import the necessary libraries.

In [2]:
import pandas as pd
import numpy as np
import torch
from scipy import stats
from scipy.stats import spearmanr,kendalltau
from scipy.spatial.distance import cdist,cosine
from scipy.spatial import distance_matrix
from scipy import spatial
from math import sqrt, log
from pyterrier.measures import *
from sklearn.metrics.pairwise import cosine_similarity


Make sure you have installed pyterrier.

In [14]:
#%pip install --upgrade git+https://github.com/terrier-org/pyterrier.git 
import pyterrier as pt
pt.init()

PyTerrier 0.10.1 has loaded Terrier 5.9 (built by craigm on 2024-05-02 17:40) and terrier-helper 0.0.8

No etc/terrier.properties, using terrier.default.properties for bootstrap configuration.


In [15]:
# Make sure you have the latest version of pyterrier
pt.__version__

'0.10.1'

To use the dense retrieval results for TCT-ColBERT and ANCE, first obtain the corresponding index from http://data.terrier.org. You can then load the per-query results for TREC DL 2019 and 2020.

In [69]:
per_query_results = pd.read_csv('per_query_results_TCT_19.csv')
per_query_results_2020 = pd.read_csv('per_query_results_TCT_20.csv')
per_query_results.head()

Unnamed: 0,name,qid,measure,value
0,Compose(TctColBert('castorini/tct_colbert-v2-h...,1037798,nDCG@10,0.370031
1,Compose(TctColBert('castorini/tct_colbert-v2-h...,1037798,AP(rel=2)@100,0.178571
2,Compose(TctColBert('castorini/tct_colbert-v2-h...,1037798,RR(rel=2)@10,1.0
3,Compose(TctColBert('castorini/tct_colbert-v2-h...,104861,nDCG@10,1.0
4,Compose(TctColBert('castorini/tct_colbert-v2-h...,104861,AP(rel=2)@100,0.514095


### Dense Coherence-based predictors

Now, we show how we calculate our proposed predictors, assuming the retrieved results have already been obtained for each retrieval method. The results should have the following form, where query_vec is the embedded representation of the query and doc_embs holds the document embedding vectors.

In [70]:
res_list = pd.read_csv('TCT_results_19_predictor.csv')
res_list.head()

Unnamed: 0,qid,query,query_vec,docno,score,docid,rank,doc_embs,text
0,156493,do goldfish grow,[ 9.27510634e-02 3.18724334e-01 2.06860676e-...,2928707,81.193695,2928707,0,[ 1.00064516e-01 1.95674345e-01 3.05602074e-...,Goldfish Only Grow to the Size of Their Enclos...
1,156493,do goldfish grow,[ 9.27510634e-02 3.18724334e-01 2.06860676e-...,1960257,81.0518,1960257,1,[ 8.52653831e-02 2.10671186e-01 3.27536911e-...,Goldfish Only Grow to the Size of Their Enclos...
2,156493,do goldfish grow,[ 9.27510634e-02 3.18724334e-01 2.06860676e-...,1960255,81.0241,1960255,2,[ 7.92131126e-02 2.00234219e-01 2.42912844e-...,Rating Newest Oldest. Best Answer: Goldfish do...
3,156493,do goldfish grow,[ 9.27510634e-02 3.18724334e-01 2.06860676e-...,8182162,80.989685,8182162,3,[ 5.08242361e-02 2.15213522e-01 3.14078063e-...,"Depending on his type and his environment, gol..."
4,156493,do goldfish grow,[ 9.27510634e-02 3.18724334e-01 2.06860676e-...,2612493,80.74293,2612493,4,[ 9.98763293e-02 1.84747636e-01 2.59946853e-...,"In clean, uncrowded conditions in tanks or pon..."
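
Because the result list is read from a CSV file, the query_vec and doc_embs columns may arrive as strings rather than arrays. A minimal sketch of converting them back, assuming the values are whitespace-separated numbers inside square brackets as in the preview above (parse_embedding is a hypothetical helper, not part of the original notebook):

```python
import numpy as np

def parse_embedding(text):
    """Turn a string like '[ 0.1 0.2 0.3]' into a 1-D float array.

    Hypothetical helper: assumes whitespace-separated numbers wrapped in
    square brackets, with no elided ("...") values.
    """
    return np.array(text.strip().strip('[]').split(), dtype=float)

vec = parse_embedding('[ 9.27510634e-02  3.18724334e-01 -2.5e-01]')
print(vec.shape)  # (3,)
```

If this assumption holds for your file, the helper could be applied with `res_list['doc_embs'] = res_list['doc_embs'].apply(parse_embedding)`, and likewise for query_vec.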


First, we define a function for pairRatio.

In [40]:
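def pair_ratio(df_embs, lim_1, lim_2, lim_3, measure_x, per_query_res):
    """Print Kendall's tau between pairRatio and the metric `measure_x`."""
    rows = []

    for qid, group in pt.tqdm(df_embs.groupby('qid'), unit='q'):
        # Stack this query's document embeddings (rows assumed to be in rank order);
        # np.vstack is used here, as in adjusted_pair_ratio, so NumPy arrays are accepted
        embs_list = np.vstack(group.doc_embs.tolist())
        # Pairwise cosine similarity between all retrieved documents
        W_mat = pd.DataFrame(cosine_similarity(embs_list, dense_output=True))
        # Mean similarity within the top block and within the bottom block
        mean_top = W_mat.iloc[0:lim_1, 0:lim_1].values.mean()
        mean_bottom = W_mat.iloc[lim_2:lim_3, lim_2:lim_3].values.mean()
        mean_vs = mean_top / mean_bottom
        rows.append([qid, mean_vs])
    df_sim = pd.DataFrame(rows, columns=['qid', 'mean_vs'])
    # Attach the per-query effectiveness values and keep only the requested metric
    merged = df_sim.merge(per_query_res, on='qid')
    merged = merged[merged.measure == measure_x]
    #corr_pearson = stats.pearsonr(merged['value'], merged['mean_vs'])[0]
    #corr_spearman = spearmanr(merged['value'], merged['mean_vs']).correlation
    corr_kendall = kendalltau(merged['value'], merged['mean_vs'])
    print(corr_kendall)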

In the above function, lim_3 is the rank cutoff of the retrieved results, lim_1 is where the upper (top-ranked) block of the similarity matrix ends, and lim_2 is where the lower block starts. For measure_x, pass the metric of interest from per_query_results: AP(rel=2)@100, nDCG@10, or RR(rel=2)@10. Finally, df_embs is the retrieved result list, which contains, for each query, the embedded representation of each retrieved document and of the query itself.
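
To make the role of the three limits concrete, here is a small sketch on synthetic embeddings. It uses plain NumPy instead of scikit-learn, and toy_pair_ratio and the data are illustrative assumptions rather than part of the notebook:

```python
import numpy as np

def cosine_sim_matrix(embs):
    """Pairwise cosine similarity between the rows of `embs`."""
    normed = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    return normed @ normed.T

def toy_pair_ratio(embs, lim_1, lim_2, lim_3):
    """Mean similarity of ranks [0, lim_1) over that of ranks [lim_2, lim_3)."""
    sim = cosine_sim_matrix(embs)
    mean_top = sim[:lim_1, :lim_1].mean()
    mean_bottom = sim[lim_2:lim_3, lim_2:lim_3].mean()
    return mean_top / mean_bottom

# Synthetic ranking of 6 documents in rank order: the top 3 point in nearly
# the same direction, the bottom 3 are mutually orthogonal.
embs = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 0.1, 0.0, 0.0],
    [1.0, 0.0, 0.1, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
ratio = toy_pair_ratio(embs, lim_1=3, lim_2=3, lim_3=6)
print(ratio > 1.0)  # True: the top block is more coherent than the bottom block
```

As in pair_ratio, each block mean includes the diagonal, that is, the similarity of every document with itself.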

To test pairRatio, use the following lines. Here we use AP@100 with a cutoff at rank 100. Replace the metric to see what happens for NDCG@10 and MRR@10, or change the cutoff by adjusting the limits. You can also use the results for TREC DL 2020 by passing per_query_results_2020. To get a different correlation metric, simply uncomment the lines for the Pearson and Spearman correlations.
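
For reference, the three correlation coefficients can be computed side by side with SciPy. The sketch below uses made-up predictor and effectiveness values purely for illustration:

```python
from scipy.stats import kendalltau, pearsonr, spearmanr

# Made-up predictor values and per-query effectiveness scores (illustration only).
predicted = [0.9, 1.4, 1.1, 2.0, 1.7]
actual = [0.10, 0.35, 0.20, 0.80, 0.50]

# Each call returns the coefficient together with a p-value.
tau, tau_p = kendalltau(actual, predicted)
rho, rho_p = spearmanr(actual, predicted)
r, r_p = pearsonr(actual, predicted)
print(tau, rho, r)
```

Both lists share the same ordering here, so the two rank-based coefficients (Kendall and Spearman) reach their maximum of 1, while Pearson also depends on how linear the relationship is.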

Here, we demonstrate a test with a rank cutoff of 100. For a top-50 results list, lim_1 and lim_2 would instead range from 5 to 35 in steps of 5, while lim_3 would be 50.
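
The settings for a given cutoff can also be enumerated programmatically. The following is a sketch for the top-50 case described above; limit_grid is a hypothetical helper and not part of the notebook:

```python
from itertools import product

def limit_grid(cutoff, step, last):
    """Hypothetical helper: all (lim_1, lim_2, lim_3) settings for one cutoff."""
    limits = range(step, last + 1, step)
    return [(lim_1, lim_2, cutoff) for lim_1, lim_2 in product(limits, limits)]

grid_50 = limit_grid(cutoff=50, step=5, last=35)
print(len(grid_50))  # 49 settings: 7 values of lim_1 x 7 values of lim_2
print(grid_50[0])    # (5, 5, 50)
```

Each tuple could then be passed to the predictor function in place of the hard-coded lists.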

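Now, we test pairRatio as follows, where combined_100 is assumed to be the top-100 result list for each query, in the same format as res_list: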

In [None]:
for lim1 in [10, 20, 30, 40, 50, 60, 70, 80]:
    print("lim1 %d" % lim1)
    for lim2 in [10, 20, 30, 40, 50, 60, 70, 80, 90]:
        print("lim2 %d" % lim2)
        pair_ratio(combined_100, lim1, lim2, 100, 'AP(rel=2)@100', per_query_results)
        print("")

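Next, we define a function for A-pairRatio, which additionally uses the retrieval scores of the documents.

In [None]:
def adjusted_pair_ratio(df_embs, lim_1, lim_2, lim_3, measure_x, per_query_res):
    """Print Kendall's tau between A-pairRatio and the metric `measure_x`."""
    rows = []

    for qid, group in pt.tqdm(df_embs.groupby('qid'), unit='q'):
        # Stack this query's document embeddings (rows assumed to be in rank order)
        embs_list = np.vstack(group.doc_embs.tolist())
        score_list = group.score
        # Pairwise cosine similarity between all retrieved documents
        W_mat = cosine_similarity(embs_list, dense_output=True)
        # Outer product of the retrieval scores: pair_mat[i, j] = score_i * score_j
        score_exp = np.expand_dims(score_list, axis=1)
        pair_mat = np.dot(score_exp, score_exp.T)
        # Combine similarities and score products (matrix product)
        weighted_mat = W_mat @ pair_mat
        W_mat_new = pd.DataFrame(weighted_mat)
        # Mean weighted value within the top block and within the bottom block
        mean_top = W_mat_new.iloc[0:lim_1, 0:lim_1].values.mean()
        mean_bottom = W_mat_new.iloc[lim_2:lim_3, lim_2:lim_3].values.mean()
        mean_vs = mean_top / mean_bottom
        rows.append([qid, mean_vs])
    df_sim = pd.DataFrame(rows, columns=['qid', 'mean_vs'])
    # Attach the per-query effectiveness values and keep only the requested metric
    merged = df_sim.merge(per_query_res, on='qid')
    merged = merged[merged.measure == measure_x]
    #corr_pearson = stats.pearsonr(merged['value'], merged['mean_vs'])[0]
    #corr_spearman = spearmanr(merged['value'], merged['mean_vs']).correlation
    corr_kendall = kendalltau(merged['value'], merged['mean_vs'])
    print(corr_kendall)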
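
The score weighting inside adjusted_pair_ratio can be traced on a tiny example. The sketch below mirrors the same NumPy steps on made-up numbers for three documents:

```python
import numpy as np

# Toy inputs for 3 retrieved documents (made-up numbers).
sim = np.array([[1.0, 0.8, 0.2],
                [0.8, 1.0, 0.3],
                [0.2, 0.3, 1.0]])      # pairwise cosine similarities
scores = np.array([3.0, 2.0, 1.0])     # retrieval scores in rank order

score_col = scores[:, None]            # shape (3, 1), like np.expand_dims(..., axis=1)
pair_mat = score_col @ score_col.T     # pair_mat[i, j] = scores[i] * scores[j]
weighted = sim @ pair_mat              # matrix product, as in adjusted_pair_ratio

print(pair_mat[0, 1])  # 6.0
print(weighted.shape)  # (3, 3)
```

Note that `@` is a matrix product rather than an element-wise product, so each entry of the weighted matrix aggregates over all retrieved documents.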

To test A-pairRatio, use the following lines.

In [None]:
for lim1 in [10, 20, 30, 40, 50, 60, 70, 80]:
    print("lim1 %d" % lim1)
    for lim2 in [10, 20, 30, 40, 50, 60, 70, 80, 90]:
        print("lim2 %d" % lim2)
        adjusted_pair_ratio(combined_100, lim1, lim2, 100, 'AP(rel=2)@100', per_query_results)
        print("")

### Top1(monoT5)

In addition to our dense coherence-based predictors, we propose a supervised baseline predictor. Here, we describe how we obtain it.

First, install the PyTerrier plugin for monoT5 and duoT5 from https://github.com/terrierteam/pyterrier_t5.

In [None]:
%pip install --upgrade git+https://github.com/terrierteam/pyterrier_t5.git

In [None]:
from pyterrier_t5 import MonoT5ReRanker, DuoT5ReRanker

In [None]:
monoT5 = MonoT5ReRanker()

Then, assuming an example index file, we define a retrieval pipeline for the dense retrieval model.

In [None]:
import pyterrier_dr  # dense retrieval plugin for PyTerrier; must be installed separately

model = pyterrier_dr.TctColBert('castorini/tct_colbert-v2-hnp-msmarco')
index = pyterrier_dr.NumpyIndex("example_index_file", docids=True)
retr_pipeline = model >> index

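Now, we extend the pipeline so that it also returns the document embeddings and the passage text (first line). Note that get_embs is not defined in this notebook; it is assumed to be a user-supplied function that returns the embedding of each retrieved document. Then, in the second line, we define the cross-encoder pipeline for the proposed predictor: it keeps only the top-ranked document (% 1), scores it with monoT5, and stores that score as qpp_monot5.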

In [None]:
new_pipe = retr_pipeline >> pt.apply.doc_embs(get_embs) >> pt.text.get_text(pt.get_dataset('irds:msmarco-passage'), 'text')
cross_encoder_pipe = new_pipe %1 >> monoT5 >> pt.apply.qpp_monot5(lambda df: df["score"])

We then transform the queries to get the final results, and we merge with the evaluation metrics to get the final correlation.

In [None]:
allres_t5 = cross_encoder_pipe.transform(test_topics)

merged = allres_t5.merge(per_query_results, on='qid')
merged = merged[merged.measure=='AP(rel=2)@100']
corr_kendall = kendalltau(merged['value'], merged['qpp_monot5'])
#corr_pearson = stats.pearsonr(merged['value'], merged['qpp_monot5'])[0]
#corr_spearman = spearmanr(merged['value'], merged['qpp_monot5']).correlation
print(corr_kendall)

Simply uncomment the corresponding lines to get the Pearson and Spearman correlations.
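
One practical point when merging: pandas refuses to join a string key with an integer key. If the predictor output carries qid as strings (as PyTerrier topic frames usually do) while the CSV was read with integer qids, the key types need to be aligned first. A minimal sketch with made-up values, assuming casting both sides to str is acceptable:

```python
import pandas as pd

# Toy frames (made-up values): predictor output with string qids,
# evaluation results with integer qids as pd.read_csv would infer them.
predictions = pd.DataFrame({'qid': ['156493', '104861'],
                            'qpp_monot5': [-0.2, -1.3]})
evaluation = pd.DataFrame({'qid': [156493, 104861],
                           'measure': ['AP(rel=2)@100', 'AP(rel=2)@100'],
                           'value': [0.41, 0.51]})

# Align the key types before merging; casting both sides to str is one option.
predictions['qid'] = predictions['qid'].astype(str)
evaluation['qid'] = evaluation['qid'].astype(str)

merged = predictions.merge(evaluation, on='qid')
print(len(merged))  # 2
```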