BM42 vs BM25 benchmark

Introduction

Download dataset:

wget https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/quora.zip
mkdir -p data
mv quora.zip data/

cd data
unzip quora.zip
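
For orientation, BEIR archives usually unpack into `corpus.jsonl`, `queries.jsonl`, and `qrels/test.tsv`. The sketch below reads those files with the standard library only. It assumes that layout and the usual BEIR field names (`_id`, `query-id`, `corpus-id`, `score`). The helper names `load_jsonl` and `load_qrels` are illustrative and are not taken from the scripts in this repository.

```python
import csv
import json
from collections import defaultdict


def load_jsonl(path):
    """Read a JSON Lines file into a dict keyed by the BEIR `_id` field."""
    records = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():  # skip blank lines
                row = json.loads(line)
                records[row["_id"]] = row
    return records


def load_qrels(path):
    """Read a qrels TSV (query-id, corpus-id, score) into {query_id: {doc_id, ...}}.

    Only rows with a positive relevance score are kept.
    """
    qrels = defaultdict(set)
    with open(path, encoding="utf-8", newline="") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            if int(row["score"]) > 0:
                qrels[row["query-id"]].add(row["corpus-id"])
    return dict(qrels)
```

Assuming the archive extracts to `data/quora`, `load_jsonl("data/quora/corpus.jsonl")` would return the documents and `load_qrels("data/quora/qrels/test.tsv")` the relevance judgments.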

Install dependencies

pip install -r requirements.txt

(Note: for GPU inference, see the fastembed documentation.)

BM25 (updated)

The BM25 version uses the tantivy library for indexing and search.

python index_bm25.py

python evaluate-bm25.py

Results we got:

Total hits: 12065 out of 15675, which is 0.7696969696969697
Precision: 0.12065
Average precision: 0.12065
Average recall: 0.8952571817831299
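
To make the reported numbers easier to read: 12065 hits with a precision of 0.12065 implies a denominator of 100,000, which would match 10 results for each of 10,000 queries. That reading is inferred from the figures and is an assumption about the evaluation scripts, not something they state. Under that assumption, the metrics could be computed roughly as in the sketch below. The function name `evaluate` and the dictionary keys are hypothetical.

```python
def evaluate(results, qrels, limit=10):
    """Compute hit counts, precision@limit and recall@limit.

    results: {query_id: [doc_id, ...]} ranked lists returned by the search engine
    qrels:   {query_id: {doc_id, ...}} relevant documents per query
    """
    total_hits = 0
    total_relevant = 0
    precisions = []
    recalls = []
    for query_id, relevant in qrels.items():
        if not relevant:
            continue  # a query without relevant documents cannot be scored
        retrieved = results.get(query_id, [])[:limit]
        hits = sum(1 for doc_id in retrieved if doc_id in relevant)
        total_hits += hits
        total_relevant += len(relevant)
        precisions.append(hits / limit)
        recalls.append(hits / len(relevant))
    n = len(precisions)
    if n == 0:
        raise ValueError("no queries with relevant documents")
    return {
        "total_hits": total_hits,
        "total_relevant": total_relevant,
        "hit_ratio": total_hits / total_relevant,  # pooled over all queries
        "precision": total_hits / (n * limit),     # pooled over all queries
        "average_precision": sum(precisions) / n,  # mean of per-query precision
        "average_recall": sum(recalls) / n,        # mean of per-query recall
    }


# Toy data, not taken from the Quora dataset.
qrels = {"q1": {"d1", "d2"}, "q2": {"d3"}}
results = {"q1": ["d1", "d9"], "q2": ["d8", "d3"]}
metrics = evaluate(results, qrels, limit=2)
```

Note the difference between the pooled ratio (total hits over total relevant documents) and the per-query average recall: queries with few relevant documents weigh more in the average, which is one way the two figures can diverge as they do in the results.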

BM25 with sparse vectors

Additionally, we evaluate a pure sparse-vector implementation of BM25. It uses exactly the same tokenizer and stemmer as BM42, which makes the comparison fairer.
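
One way to express BM25 with sparse vectors is to store the term-frequency part of the formula as the document vector values and apply IDF at query time, so that the score becomes a weighted dot product. The sketch below illustrates that idea in plain Python. It assumes whitespace tokenization (no stemming), `k1 = 1.2`, `b = 0.75`, and a common BM25 IDF formula; these choices are illustrative and may differ from what `index_bm25_qdrant.py` actually does.

```python
import math
from collections import Counter

K1 = 1.2   # assumed term-frequency saturation parameter
B = 0.75   # assumed length-normalization parameter


def tokenize(text):
    """Simplified tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def document_vector(tokens, avg_len, k1=K1, b=B):
    """Sparse vector {token: BM25 term-frequency weight} for one document."""
    counts = Counter(tokens)
    norm = k1 * (1 - b + b * len(tokens) / avg_len)
    return {t: tf * (k1 + 1) / (tf + norm) for t, tf in counts.items()}


def idf(token, doc_vectors):
    """Common BM25 IDF variant: ln((N - n + 0.5) / (n + 0.5) + 1)."""
    n = sum(1 for vec in doc_vectors if token in vec)
    total = len(doc_vectors)
    return math.log((total - n + 0.5) / (n + 0.5) + 1)


def score(query_tokens, doc_vector, doc_vectors):
    """IDF-weighted dot product between a binary query vector and a document vector."""
    return sum(
        idf(t, doc_vectors) * doc_vector.get(t, 0.0)
        for t in set(query_tokens)
    )


# Toy corpus, not taken from the Quora dataset.
docs = ["how to learn python", "how to cook rice", "python snake facts"]
tokenized = [tokenize(d) for d in docs]
avg_len = sum(len(t) for t in tokenized) / len(tokenized)
vectors = [document_vector(t, avg_len) for t in tokenized]

query = tokenize("learn python")
scores = [score(query, vec, vectors) for vec in vectors]
best = max(range(len(docs)), key=lambda i: scores[i])
```

In this toy corpus the first document ranks highest because it contains both query terms, and the second scores zero because it contains neither.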

# Run qdrant
docker run --rm -d --network=host qdrant/qdrant:v1.10.0

python index_bm25_qdrant.py

python evaluate-bm25-qdrant.py

Results we got:

Total hits: 11151 out of 15675, which is 0.7113875598086125
Precision: 0.11151
Average precision: 0.1115100000000054
Average recall: 0.8321873943359426

BM42 - with `all-minilm-l6-v2` as a backbone

BM42 uses the fastembed implementation for inference, and Qdrant for indexing and search. IDF values are calculated inside Qdrant.
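
As a rough illustration of the division of labor: BM42 is described as replacing the BM25 term-frequency weight with the attention the transformer's `[CLS]` token pays to each token, with subword pieces merged back into words, while IDF is applied by the search engine. The sketch below mimics that with hand-written numbers. The tokens, attention weights, document frequencies, and the IDF formula are all assumptions made for illustration; the real weights come from fastembed and the real IDF from Qdrant.

```python
import math


def merge_subwords(tokens, weights):
    """Merge WordPiece continuation tokens ('##...') into words, summing their weights."""
    words = []
    for token, weight in zip(tokens, weights):
        if token.startswith("##") and words:
            prev_word, prev_weight = words[-1]
            words[-1] = (prev_word + token[2:], prev_weight + weight)
        else:
            words.append((token, weight))
    return words


def document_vector(tokens, cls_attention):
    """Sparse vector {word: summed attention weight} for one document."""
    vector = {}
    for word, weight in merge_subwords(tokens, cls_attention):
        vector[word] = vector.get(word, 0.0) + weight
    return vector


def idf(n_docs_with_term, n_docs):
    """Assumed IDF formula: ln((N - n + 0.5) / (n + 0.5) + 1)."""
    return math.log((n_docs - n_docs_with_term + 0.5) / (n_docs_with_term + 0.5) + 1)


def bm42_score(query_words, doc_vector, doc_freq, n_docs):
    """Sum of IDF(word) * attention weight over the distinct query words."""
    return sum(
        idf(doc_freq.get(w, 0), n_docs) * doc_vector.get(w, 0.0)
        for w in set(query_words)
    )


# Hypothetical tokenizer output and [CLS] attention weights for one document.
tokens = ["qd", "##rant", "vector", "search"]
cls_attention = [0.10, 0.15, 0.30, 0.20]
vector = document_vector(tokens, cls_attention)

# Hypothetical collection statistics: 100 documents in total.
doc_freq = {"qdrant": 1, "vector": 50, "search": 40}
result = bm42_score(["qdrant", "search"], vector, doc_freq, n_docs=100)
```

Because rare words receive a higher IDF, the match on `qdrant` contributes more to `result` than the match on `search`, even though their attention weights are similar.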

# Run qdrant
docker run --rm -d --network=host qdrant/qdrant:v1.10.0

python index_bm42.py

python evaluate-bm42.py

Results we got:

Total hits: 11488 out of 15675, which is 0.7328867623604466
Precision: 0.11488
Average precision: 0.11488000000000238
Average recall: 0.8515208038970792
