# Tutorial: Overview and Pre-Computed Resources of the lsr-benchmark

This tutorial gives an overview of the `lsr-benchmark` and of the pre-computed resources that you can reuse in your experiments.

## Step 1: Install the lsr-benchmark

In [None]:
!pip3 install lsr-benchmark

Obtaining file:///workspaces/lsr_bench
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Installing collected packages: lsr-benchmark
  Attempting uninstall: lsr-benchmark
    Found existing installation: lsr-benchmark 0.0.1rc3
    Uninstalling lsr-benchmark-0.0.1rc3:
      Successfully uninstalled lsr-benchmark-0.0.1rc3
  Running setup.py develop for lsr-benchmark
Successfully installed lsr-benchmark-0.0.1rc3

## Step 2: Get an Overview of the lsr-benchmark

The lsr-benchmark ships pre-computed query and document embeddings from many learned sparse retrieval (LSR) embedding models on many datasets. This makes it easy to evaluate efficiency and effectiveness across diverse retrieval datasets with high-quality relevance judgments, without needing access to the datasets themselves. The `lsr-benchmark overview` command provides a complete overview.

In [2]:
!lsr-benchmark overview

Overview of the lsr-benchmark:

	- 16 Datasets with 13 pre-computed embeddings (23151 MB)

Datasets:
                              Dataset    Text Avg. Embeddings
           clueweb09/en/trec-web-2009  587 MB          152 MB
           clueweb09/en/trec-web-2010  660 MB          127 MB
           clueweb09/en/trec-web-2011  376 MB           87 MB
           clueweb09/en/trec-web-2012  242 MB           69 MB
      clueweb12/b13/trec-misinfo-2019  226 MB           74 MB
              clueweb12/trec-web-2013  629 MB          124 MB
              clueweb12/trec-web-2014  632 MB          121 MB
  disks45/nocr/trec-robust-2004/fold1  154 MB           95 MB
  disks45/nocr/trec-robust-2004/fold2  147 MB           92 MB
  disks45/nocr/trec-robust-2004/fold3  154 MB           96 MB
  disks45/nocr/trec-robust-2004/fold4  148 MB           90 MB
  disks45/nocr/trec-robust-2004/fold5  159 MB           99 MB
  msmarco-passage/trec-dl-2019/judged    5 MB           27 MB
  msmarco-passage/trec-dl-2020/

## Step 3: Load Pre-Computed Embeddings with the ir_datasets Integration

To make it easy to run retrieval engines on datasets that are difficult to process (e.g., due to their size, such as the ClueWebs, or due to restricted access, such as Robust04), all embeddings are public, and an `ir_datasets`-compatible API lets you download and process them.

In [1]:
import ir_datasets
from lsr_benchmark import register_to_ir_datasets

register_to_ir_datasets("clueweb09/en/trec-web-2009")

dataset = ir_datasets.load("lsr-benchmark/clueweb09/en/trec-web-2009")

doc_embeddings = dataset.doc_embeddings(model_name="lightning-ir/webis/splade")
query_embeddings = dataset.query_embeddings(model_name="lightning-ir/webis/splade")

print("doc_embeddings[0]: ", doc_embeddings[0])
print("query_embeddings[0]: ", query_embeddings[0])

load doc embeddings: 100%|██████████| 109683/109683 [00:00<00:00, 1472573.83it/s]
load doc embeddings: 100%|██████████| 50/50 [00:00<00:00, 327168.80it/s]

doc_embeddings[0]:  ('clueweb09-en0001-78-11787', array(['1007', '1025', '1999', '2014', '2026', '2028', '2032', '2040',
       '2054', '2055', '2099', '2100', '2107', '2111', '2115', '2126',
       '2157', '2173', '2185', '2190', '2194', '2204', '2208', '2237',
       '2298', '2367', '2424', '2428', '2437', '2449', '2451', '2469',
       '2476', '2495', '2559', '2591', '2604', '2609', '2651', '2672',
       '2686', '2689', '2695', '2732', '2762', '2780', '2782', '2801',
       '2812', '2831', '2833', '2862', '2863', '2869', '2897', '2914',
       '2923', '2932', '2933', '2941', '3007', '3012', '3029', '3068',
       '3075', '3105', '3112', '3116', '3124', '3144', '3171', '3204',
       '3214', '3226', '3309', '3340', '3443', '3451', '3496', '3528',
       '3573', '3574', '3644', '3658', '3720', '3747', '3749', '3762',
       '3784', '3802', '3829', '3847', '3858', '3892', '3913', '3921',
       '3958', '3971', '4012', '4034', '4037', '4070', '4105', '4113',
       '4114', '4183', '418
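
The output above is cut off after the token IDs, so the full layout of each entry is not visible here. As a minimal sketch of how sparse embeddings like these are typically used for ranking, the toy example below scores documents by the dot product over the tokens that a query and a document share. The `{token_id: weight}` dictionaries, the `sparse_dot` helper, and all values are hypothetical illustrations, not part of the lsr-benchmark API.

```python
def sparse_dot(query, doc):
    """Dot product of two sparse vectors given as {token_id: weight} dicts."""
    # Iterate over the smaller vector; only shared tokens contribute to the score.
    if len(query) > len(doc):
        query, doc = doc, query
    return sum(weight * doc[token] for token, weight in query.items() if token in doc)


# Hypothetical toy vectors: token IDs and weights are made up for illustration.
query = {"2054": 1.5, "3007": 0.5}
docs = {
    "doc-a": {"2054": 2.0, "2100": 1.0},
    "doc-b": {"3007": 4.0, "2054": 1.0},
}

# Rank document IDs by descending score.
ranking = sorted(docs, key=lambda doc_id: sparse_dot(query, docs[doc_id]), reverse=True)
for doc_id in ranking:
    print(doc_id, sparse_dot(query, docs[doc_id]))
```

Retrieval engines for sparse embeddings avoid scoring every document this way by using inverted indexes, but the score they approximate or compute has this form.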




## Step 4: Download Pre-Computed Embeddings via the CLI

To make it easy to run retrieval experiments on datasets that are difficult to process (e.g., due to their size, such as the ClueWebs, or due to restricted access, such as Robust04), all embeddings are public, so you can download and process them directly.

The `lsr-benchmark download-embeddings` command downloads embeddings.

In [None]:
!lsr-benchmark download-embeddings \
    --embedding webis-splade \
    --dataset msmarco-passage/trec-dl-2019/judged \
    --out webis-splade-dl19-embeddings

webis-splade-dl19-embeddings


Next, we can inspect the embeddings:

In [4]:
!tree webis-splade-dl19-embeddings

webis-splade-dl19-embeddings
├── doc
│   ├── doc-embeddings.npz
│   ├── doc-ids.txt
│   └── doc-ir-metadata.yml
└── query
    ├── query-embeddings.npz
    ├── query-ids.txt
    └── query-ir-metadata.yml

2 directories, 6 files


The `doc` subdirectory contains the document embeddings, and the `query` subdirectory contains the query embeddings. The embeddings are stored in the NumPy format, and the corresponding `*-ir-metadata.yml` files contain the efficiency measurements (e.g., resource usage such as energy and CPU/RAM/GPU utilization) for building the embeddings, monitored with the [tirex-tracker](https://github.com/tira-io/tirex-tracker/).
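
Before loading the arrays, it can help to check what a downloaded directory contains. The sketch below relies only on the fact that an `.npz` file is a zip archive with one `.npy` member per stored array. It assumes that the `*-ids.txt` files hold one ID per line, which is not confirmed here, and `summarize_embeddings` is a hypothetical helper, not part of the lsr-benchmark.

```python
import zipfile
from pathlib import Path


def summarize_embeddings(directory):
    """Return (number of IDs, names of the arrays stored in the .npz archive)."""
    directory = Path(directory)
    kind = directory.name  # "doc" or "query", matching the file name prefixes
    # Assumption: the IDs file holds one ID per line.
    ids = (directory / f"{kind}-ids.txt").read_text().splitlines()
    # An .npz file is a zip archive with one .npy member per stored array.
    with zipfile.ZipFile(directory / f"{kind}-embeddings.npz") as archive:
        arrays = [Path(name).stem for name in archive.namelist()]
    return len(ids), arrays


root = Path("webis-splade-dl19-embeddings")
for kind in ("doc", "query"):
    if (root / kind).is_dir():  # skip quietly when nothing was downloaded yet
        print(kind, summarize_embeddings(root / kind))
```

Once the array names are known, `numpy.load` can read the individual arrays from the same file.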

## Step 5: Download Runs

To compare retrieval engines, you can download the public runs that were submitted to [tira](https://www.tira.io/task-overview/lsr-benchmark).

The `lsr-benchmark download-run` command downloads runs.

In [None]:
!lsr-benchmark download-run \
    --embedding webis-splade \
    --dataset msmarco-passage/trec-dl-2019/judged \
    --retrieval seismic \
    --out seismic-on-dl19-webis-splade

seismic-on-dl19-webis-splade


In [4]:
!lsr-benchmark download-run \
    --embedding webis-splade \
    --dataset msmarco-passage/trec-dl-2019/judged \
    --retrieval duckdb \
    --out duckdb-on-dl19-webis-splade

duckdb-on-dl19-webis-splade
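
The two calls above differ only in the `--retrieval` value and the output directory. When comparing more systems, the commands can be generated in a loop. The following sketch uses only the flags shown above; the `download_run_command` helper and the output directory naming pattern are illustrative assumptions. It prints the commands rather than executing them.

```python
import shlex


def download_run_command(embedding, dataset, retrieval, out_dir):
    """Build the argument list for one `lsr-benchmark download-run` call."""
    return [
        "lsr-benchmark", "download-run",
        "--embedding", embedding,
        "--dataset", dataset,
        "--retrieval", retrieval,
        "--out", out_dir,
    ]


dataset = "msmarco-passage/trec-dl-2019/judged"
embedding = "webis-splade"
commands = [
    download_run_command(embedding, dataset, retrieval, f"{retrieval}-on-dl19-{embedding}")
    for retrieval in ("seismic", "duckdb")
]

for command in commands:
    # Printed only; to execute, pass the list to subprocess.run(command, check=True).
    print(shlex.join(command))
```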


## Step 6: Run Evaluation

In [8]:
!lsr-benchmark evaluate seismic-on-dl19-webis-splade duckdb-on-dl19-webis-splade

100%|█████████████████████████████████████████████| 2/2 [00:00<00:00,  4.10it/s]
                                                       seismic-on-dl19-webis-splade                       duckdb-on-dl19-webis-splade
index.runtime_wallclock                                                    20090 ms                                            390 ms
index.energy_total                                                              4.0                                               0.0
retrieval.runtime_wallclock                                                   34 ms                                           1033 ms
retrieval.energy_total                                                          0.0                                               0.0
embedding/doc.runtime_wallclock                                            61744 ms                                          61744 ms
embedding/doc.energy_total                                                  11270.0                                
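
In the wall-clock rows above, seismic takes longer to build its index but answers queries faster than duckdb. A small sketch makes this trade-off explicit by turning the reported values into ratios. The numbers are copied from the output above, and `parse_ms` is a hypothetical helper written for this example.

```python
def parse_ms(value):
    """Convert a string such as '20090 ms' into an integer number of milliseconds."""
    number, unit = value.split()
    if unit != "ms":
        raise ValueError(f"unexpected unit: {unit}")
    return int(number)


# Wall-clock values copied from the evaluation output above.
runtimes = {
    "index": {"seismic": "20090 ms", "duckdb": "390 ms"},
    "retrieval": {"seismic": "34 ms", "duckdb": "1033 ms"},
}

for stage, systems in runtimes.items():
    seismic = parse_ms(systems["seismic"])
    duckdb = parse_ms(systems["duckdb"])
    # A ratio above 1 means seismic was slower than duckdb for this stage.
    print(f"{stage}: seismic/duckdb = {seismic / duckdb:.2f}")
```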