This repository contains the code for the paper *Exploring the Representation Power of SPLADE Models* (ICTIR 2023).
We modify the Tevatron toolkit for our experiments.
First, install our version of Tevatron:

```
cd tevatron
pip install --editable .
```
Then, install all the dependencies required by Tevatron, following the Tevatron instructions.
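For reference, a typical dependency set looks like the following. This is illustrative only; the Tevatron README is authoritative for the exact packages and versions:

```
# Core packages Tevatron builds on (illustrative; pin versions per the Tevatron README)
pip install torch transformers datasets faiss-cpu
```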
You will also need to install Pyserini (`pip install pyserini`) for indexing the SPLADE encodings and for evaluation.
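As a rough sketch of what the Pyserini step looks like (the paths, thread count, encoder name, and run file name below are illustrative, not taken from our scripts), SPLADE term-weight vectors exported as JSONL with a `vector` field can be indexed and searched with Pyserini's impact-index tooling:

```
# Index SPLADE term-weight vectors (JSONL files with a "vector" field) as a Lucene impact index.
# "encodings/corpus" and "indexes/splade-impact" are illustrative paths.
python -m pyserini.index.lucene \
  --collection JsonVectorCollection \
  --input encodings/corpus \
  --index indexes/splade-impact \
  --generator DefaultLuceneDocumentGenerator \
  --threads 12 \
  --impact --pretokenized

# Retrieve with impact scoring; the query encoder name here is illustrative.
python -m pyserini.search.lucene \
  --index indexes/splade-impact \
  --topics msmarco-passage-dev-subset \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.splade.trec \
  --hits 1000 \
  --impact
```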
The scripts that reproduce our results end to end are in the `tevatron/examples/splade` folder (see the usage example after the list):

- `full_pipeline_bm25.sh`: the full pipeline for the BM25 baseline results.
- `full_pipeline_dense_retriever.sh`: the full pipeline for the dense retriever (DR) baseline results.
- `full_pipeline_splade.sh`: the full pipeline for training, encoding, indexing, and retrieval with the original SPLADEv2 model.
- `full_pipeline_nostopwords.sh`: SPLADEv2 with 150 stopwords removed from the BERT vocabulary (no-stop).
- `full_pipline_splade_stopwords.sh`: SPLADEv2 that only assigns weights to 150 stopwords (stop-150).
- `full_pipline_splade_randwords.sh`: SPLADEv2 that only assigns weights to random tokens (random-150/768).
- `full_pipline_splade_lowfreq.sh`: SPLADEv2 that only assigns weights to low-frequency tokens (lowfreq-150/768).
- `full_pipline_splade_added_latentwords.sh`: SPLADEv2 with the vocabulary enlarged by 150/768 latent tokens (added-latent-150/768).
- `full_pipline_splade_latentwords.sh`: SPLADEv2 that only assigns weights to 150/768 latent tokens (latent-150/768).
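Each script is meant to be run from the folder it lives in, for example as below (you may need to adapt paths and GPU settings inside each script to your machine):

```
cd tevatron/examples/splade
bash full_pipeline_splade.sh   # runs training, encoding, indexing, and retrieval end to end
```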
We also provide all the run files, along with an evaluation script, in the `tevatron/examples/splade/runs` folder.
You can directly get our results by running the following commands:

```
pip install ranx
cd tevatron/examples/splade/runs
python3 eval.py
```
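`eval.py` scores the provided run files with ranx. If you instead want to score a single TREC-format run file, Pyserini's bundled `trec_eval` works as well; the sketch below assumes MS MARCO passage dev qrels (exposed by Pyserini under the alias `msmarco-passage-dev-subset`) and an illustrative run file name:

```
# Score one TREC-format run file against the MS MARCO passage dev qrels
# (the run file name "run.splade.trec" is illustrative).
python -m pyserini.eval.trec_eval -c -m recip_rank -m ndcg_cut.10 \
  msmarco-passage-dev-subset run.splade.trec
```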