
This repository contains the code and trained models accompanying the paper "Learning Robust Dense Retrieval Models from Incomplete Relevance Labels" accepted at SIGIR'21.


purble/RANCE


Prafull Prakash, Julian Killingback, and Hamed Zamani

Recent deployment of efficient billion-scale approximate nearest neighbor (ANN) search algorithms on GPUs has motivated information retrieval researchers to develop neural ranking models that learn low-dimensional dense representations for queries and documents and use ANN search for retrieval. However, optimizing these dense retrieval models poses several challenges, including negative sampling for (pair-wise) training. A recent model, called ANCE, successfully uses dynamic negative sampling via ANN search. This paper improves upon ANCE by proposing a robust negative sampling strategy for scenarios where the training data lacks complete relevance annotations. This is of particular importance, as obtaining large-scale training data with complete relevance judgments is extremely expensive. Our model uses a small validation set with complete relevance judgments to accurately estimate a negative sampling distribution for dense retrieval models. We also explore pseudo-relevance feedback solutions for negative sampling during training and model penalization for making "easy-to-avoid" mistakes using a lexical matching signal. Our experiments on the TREC Deep Learning Track benchmarks demonstrate the effectiveness of our solutions.

Our code is built on top of the ANCE repository, so please refer to it for detailed instructions on generating the pre-processed datasets and on running and evaluating the code.

Dataset

Download and preprocess the TREC 2019 Deep Learning (DL) Track dataset as specified in the original ANCE repository, and randomly split the test queries into two folds, fold1 and fold2 (a minimal sketch of such a split is given below). In our case, the random split resulted in the following two folds, listed as TREC 2019 DL Track test query ids:
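The following is only an illustrative sketch, not code from this repository; it assumes `test_query_ids` holds the TREC 2019 DL test query ids, and the seed value is arbitrary:

```python
import random

def split_two_folds(test_query_ids, seed=42):
    """Randomly partition the test query ids into two (near-)equal folds."""
    ids = sorted(test_query_ids)   # fixed order so the shuffle is reproducible
    rng = random.Random(seed)      # seeded RNG makes the split repeatable
    rng.shuffle(ids)
    mid = len(ids) // 2
    return set(ids[:mid]), set(ids[mid:])
```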

Documents Dataset

Fold1 Test Queries : {156493, 1110199, 130510, 573724, 527433, 1037798, 1121402, 1117099, 451602, 1112341, 104861, 1132213, 1114819, 183378, 1106007, 490595, 1103812, 87452, 855410, 19335, 1129237, 146187}

Fold2 Test Queries : {1063750, 489204, 1133167, 915593, 264014, 962179, 148538, 359349, 1115776, 131843, 833860, 207786, 1124210, 287683, 87181, 443396, 1114646, 47923, 405717, 182539, 1113437}

Passages Dataset

Fold1 Test Queries : {156493, 168216, 1037798, 1121402, 962179, 1117099, 148538, 451602, 1115776, 104861, 207786, 1114819, 490595, 1103812, 1121709, 87452, 855410, 19335, 182539, 1113437, 1129237, 146187}

Fold2 Test Queries : {1110199, 1063750, 130510, 489204, 573724, 1133167, 527433, 915593, 264014, 359349, 1112341, 131843, 833860, 183378, 1106007, 1124210, 87181, 443396, 1114646, 47923, 405717}

We could not hold out part of the training dataset as a validation set because its relevance judgments are incomplete.

RANCE-PRF-DEM

Note: The code is in the code/RANCE-PRF-DEM/ folder of this repository.

We have modified the original ANCE code to sample negatives according to our proposed methodology, so the directory structure and the procedure for running the code remain the same. The only difference is that, during each ANN-data generation step, we sample negatives with the help of one fold used as a validation set and evaluate the trained model on the other fold (this cross-fold protocol is sketched below). The final scores are obtained by averaging the performance of the two models, each evaluated on a different test fold. We mainly modified the sampling strategy in the code/RANCE-PRF-DEM/drivers/run_ann_data_gen.py file.
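As an illustration of this protocol (a hypothetical sketch, not code from this repository; `train_and_evaluate` stands in for a full RANCE-PRF-DEM training-and-evaluation run returning a dict of metrics):

```python
def cross_fold_scores(train_and_evaluate, fold1, fold2):
    """Train twice, swapping which fold serves as the validation set,
    then average each metric over the two held-out test folds."""
    run_a = train_and_evaluate(dev_fold=fold1, test_fold=fold2)
    run_b = train_and_evaluate(dev_fold=fold2, test_fold=fold1)
    return {metric: (run_a[metric] + run_b[metric]) / 2 for metric in run_a}
```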

The tables below report the results of the two models, each trained with one fold as the validation set and evaluated on the other fold. The hyperlinks embedded in the table headers can be used to download our trained models.

Passage

| Task | Metric | Model_Fold1dev_Fold2test | Model_Fold1test_Fold2dev | Average Performance |
|------|--------|--------------------------|--------------------------|---------------------|
| Re-Rank | NDCG | 0.672 | 0.689 | 0.681 |
| Re-Rank | Recall | 0.619 | 0.734 | 0.676 |
| Re-Rank | MRR | 1.000 | 0.932 | 0.966 |
| Retrieval | NDCG | 0.661 | 0.667 | 0.664 |
| Retrieval | Recall | 0.621 | 0.728 | 0.674 |
| Retrieval | MRR | 0.931 | 0.939 | 0.935 |

Document

| Task | Metric | Model_Fold1dev_Fold2test | Model_Fold1test_Fold2dev | Average Performance |
|------|--------|--------------------------|--------------------------|---------------------|
| Re-Rank | NDCG | 0.704 | 0.655 | 0.680 |
| Re-Rank | Recall | 0.334 | 0.297 | 0.315 |
| Re-Rank | MRR | 0.922 | 0.918 | 0.920 |
| Retrieval | NDCG | 0.652 | 0.632 | 0.642 |
| Retrieval | Recall | 0.295 | 0.293 | 0.294 |
| Retrieval | MRR | 0.921 | 0.913 | 0.917 |

RANCE-PRF and RANCE

Note: The code is in the code/RANCE-PRF/ folder of this repository.

RANCE-PRF

To implement DEM on top of the ANCE code, we made the following modifications:

  • We first generated the corpus-level statistics (term frequencies, document lengths, document frequencies, etc.) required to compute BM25 scores for input queries; a BM25 sketch based on such statistics follows this list.
  • We added a BM25_helper object to code/RANCE-PRF/utils/utils.py that loads these statistics at the start of the execution of the code/RANCE-PRF/utils/run_ann_data_gen.py script.
  • In addition to sampling negatives according to our proposed strategy, as in RANCE-PRF-DEM, we also compute the BM25 score for each training query and save it as part of the updated dataset generated by the code/RANCE-PRF/utils/run_ann_data_gen.py script for each new checkpoint.
  • We modified the loss function formulation in code/RANCE-PRF/utils/run_ann.py according to the DEM strategy.
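For reference, the kind of BM25 computation these corpus statistics support looks roughly like the following generic Okapi BM25 sketch (not the repository's implementation; the `k1` and `b` defaults are illustrative):

```python
import math

def bm25_score(query_terms, doc_tf, doc_len, df, num_docs, avg_doc_len,
               k1=0.9, b=0.4):
    """Okapi BM25 from precomputed corpus statistics.

    doc_tf: term -> frequency within the document
    df:     term -> document frequency across the corpus
    """
    score = 0.0
    for term in query_terms:
        if term not in doc_tf:
            continue  # terms absent from the document contribute nothing
        idf = math.log(1 + (num_docs - df[term] + 0.5) / (df[term] + 0.5))
        tf = doc_tf[term]
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return score
```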

RANCE

We obtain the final scores for our proposed RANCE method by taking a model trained with the RANCE-PRF strategy and adding PRF during evaluation for both the re-ranking and retrieval tasks; one common way to add PRF is sketched below.
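For illustration only (the paper's exact PRF formulation may differ), a common Rocchio-style way to add pseudo-relevance feedback to a dense retriever is to blend the query embedding with the centroid of the top-ranked document embeddings and retrieve again; `top_k` and `alpha` are hypothetical hyper-parameters of the kind one would tune on the validation fold:

```python
import numpy as np

def prf_query(query_emb, ranked_doc_embs, top_k=3, alpha=0.7):
    """Rocchio-style PRF sketch: mix the original query embedding with
    the centroid of the top-k retrieved document embeddings; the
    expanded query is then used for a second retrieval / re-ranking pass."""
    centroid = np.mean(ranked_doc_embs[:top_k], axis=0)
    return alpha * query_emb + (1.0 - alpha) * centroid
```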

The tables below report the results of the two models trained with the RANCE-PRF strategy, each using one fold as the validation set, and then evaluated on the other fold after incorporating PRF. All hyper-parameters are tuned based on performance on the validation fold. The hyperlinks embedded in the table headers can be used to download our trained models.

Passage

| Task | Metric | Model_Fold1dev_Fold2test | Model_Fold1test_Fold2dev | Average Performance |
|------|--------|--------------------------|--------------------------|---------------------|
| Re-Rank | NDCG | 0.696 | 0.708 | 0.702 |
| Re-Rank | Recall | 0.619 | 0.734 | 0.676 |
| Re-Rank | MRR | 0.976 | 0.931 | 0.954 |
| Retrieval | NDCG | 0.701 | 0.690 | 0.695 |
| Retrieval | Recall | 0.626 | 0.768 | 0.697 |
| Retrieval | MRR | 1.000 | 0.878 | 0.939 |

Document

| Task | Metric | Model_Fold1dev_Fold2test | Model_Fold1test_Fold2dev | Average Performance |
|------|--------|--------------------------|--------------------------|---------------------|
| Re-Rank | NDCG | 0.704 | 0.699 | 0.702 |
| Re-Rank | Recall | 0.350 | 0.299 | 0.325 |
| Re-Rank | MRR | 0.901 | 0.915 | 0.908 |
| Retrieval | NDCG | 0.695 | 0.663 | 0.679 |
| Retrieval | Recall | 0.308 | 0.320 | 0.314 |
| Retrieval | MRR | 0.905 | 0.911 | 0.908 |
