This repository contains the code for the paper Typo-Robust Representation Learning for Dense Retrieval, ACL 2023.
We provide a setup.sh
script that installs this repository and its Python dependencies. To use it, run the following command:
sh setup.sh
(Optional) To evaluate the models using our evaluation script, you'll need to install trec_eval:
git clone https://github.com/usnistgov/trec_eval.git
cd trec_eval
make
We provide our finetuned model checkpoints for BERT-based DST-DPR and CharacterBERT-based DST-DPR.
In case you want to train the models from scratch, we provide the training scripts as follows:
To train the BERT-based DST-DPR model, run the following command:
sh scripts/train_bert.sh
To train the CharacterBERT-based DST-DPR model, download the pre-trained CharacterBERT from this link, then run the following command:
sh scripts/train_characterbert.sh
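The core idea behind typo-robust training is to expose the model to misspelled variants of queries. We don't reproduce the exact augmentation used in the training scripts here, but the sketch below illustrates the general recipe: corrupt each word of a query with a random character-level edit (insert, delete, substitute, or swap). All function names are illustrative, not part of the repository.

```python
import random

def add_typo(word, rng):
    """Apply one random character-level edit to a word."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    if len(word) < 2:
        return word + rng.choice(letters)
    i = rng.randrange(len(word) - 1)
    op = rng.choice(["insert", "delete", "substitute", "swap"])
    if op == "insert":
        return word[:i] + rng.choice(letters) + word[i:]
    if op == "delete":
        return word[:i] + word[i + 1:]
    if op == "substitute":
        return word[:i] + rng.choice(letters) + word[i + 1:]
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]  # swap adjacent chars

def typo_query(query, rng, p=0.5):
    """Corrupt each word of a query with probability p."""
    return " ".join(add_typo(w, rng) if rng.random() < p else w
                    for w in query.split())

rng = random.Random(0)
print(typo_query("what is dense retrieval", rng))
```

During training, such corrupted queries are paired with their clean counterparts so the model learns to map both to similar representations.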
In this section, we describe the steps to evaluate the BERT-based DST-DPR model on the MS MARCO and DL-typo passage ranking datasets.
First, we need to encode the passages and queries into dense vectors using the trained models, then retrieve the top-k passages for each query. To do so, run the following command:
sh scripts/retrieve_bert.sh
This should generate the msmarco_bert_embs
folder, containing the dense vectors of passages and queries, and the rank_bert
folder, containing the top-k passages for each query.
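Conceptually, the retrieval step scores each query-passage pair by the inner product of their dense vectors and keeps the k highest-scoring passages per query. A minimal NumPy sketch with toy random embeddings (the actual script may use an ANN index rather than this brute-force search):

```python
import numpy as np

def retrieve_topk(query_embs, passage_embs, k=10):
    """Brute-force dense retrieval: rank passages by inner-product score."""
    scores = query_embs @ passage_embs.T           # (num_queries, num_passages)
    topk = np.argsort(-scores, axis=1)[:, :k]      # top-k passage indices per query
    return topk, np.take_along_axis(scores, topk, axis=1)

# Toy example: 100 passages and 5 queries with random 768-d embeddings.
rng = np.random.default_rng(0)
passages = rng.normal(size=(100, 768))
queries = rng.normal(size=(5, 768))
ids, scores = retrieve_topk(queries, passages, k=10)
```

The embedding folder produced above plays the role of `passages` and `queries` here; the rank folder stores the resulting per-query rankings.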
To obtain the evaluation results, use the following command:
sh scripts/eval_bert.sh
Likewise, to evaluate the CharacterBERT-based DST-DPR model, use retrieve_characterbert.sh and eval_characterbert.sh.