## Grid Study

In this example, we'll walk through how to define a study config that compares retrieval performance across several search methods, including BM25 and vector search.

### Data Requirements

The most challenging part of any retrieval evaluation is often preparing a high-quality dataset. The Retrieval Optimizer is flexible and supports a variety of formats, but to get started your data should include three components: a **corpus**, a set of **queries**, and corresponding **qrels** (query relevance labels).

#### Corpus

The corpus is the collection of documents your search system will index. Each entry should include the core searchable text and can optionally include other fields like a title or metadata.

**General structure:**

```json
{
    "corpus_id": {
        "text": "text to be searched or vectorized",
        "title": "optional associated title"
    }
}
```

**Example:**

```json
{
    "MED-10": {
        "text": "Recent studies have suggested that statins, an established drug group in the prevention of cardiovascular mortality, could delay or prevent breast cancer recurrence...",
        "title": "Statin Use and Breast Cancer Survival: A Nationwide Cohort Study from Finland"
    }
}
```

#### Queries

These are the search inputs that will be run against the corpus to evaluate system performance.

**General structure:**

```json
{
    "query_id": "query text"
}
```

**Example:**

```json
{
    "PLAIN-2": "Do Cholesterol Statin Drugs Cause Breast Cancer?",
    "PLAIN-12": "Exploiting Autophagy to Live Longer"
}
```

#### Qrels

Qrels define which corpus entries are considered relevant for each query. These are used to compute metrics like precision, recall, F1, and NDCG.

**General structure:**

```json
{
    "query_id": {
        "corpus_id": "relevance score"
    }
}
```

**Example:**

```json
{
    "PLAIN-2": {
        "MED-2427": 2,
        "MED-2440": 1,
        "MED-2434": 1
    },
    "PLAIN-12": {
        "MED-2513": 2,
        "MED-5237": 2
    }
}
```

*Note:* For most basic metrics, a binary relevance label (e.g., 0 or 1) is sufficient. Graded scores (e.g., 1 or 2) matter for metrics such as NDCG, which reward ranking more relevant documents higher.
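To make the role of qrels concrete, here is a minimal sketch of how precision, recall, and NDCG can be computed for a single query. It uses textbook definitions with linear gain for NDCG; the metrics the Retrieval Optimizer reports may be computed differently, and the ranked list below is a hypothetical system output, not a real result.

```python
import math


def precision_at_k(ranked_ids, relevant, k):
    """Fraction of the top-k slots filled by relevant documents (score > 0)."""
    hits = sum(1 for doc_id in ranked_ids[:k] if relevant.get(doc_id, 0) > 0)
    return hits / k


def recall_at_k(ranked_ids, relevant, k):
    """Fraction of all relevant documents that appear in the top k."""
    total_relevant = sum(1 for score in relevant.values() if score > 0)
    if total_relevant == 0:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if relevant.get(doc_id, 0) > 0)
    return hits / total_relevant


def dcg(gains):
    """Discounted cumulative gain: each gain is divided by log2(position + 1)."""
    return sum(gain / math.log2(rank + 2) for rank, gain in enumerate(gains))


def ndcg_at_k(ranked_ids, relevant, k):
    """DCG of the actual ranking divided by the DCG of the ideal ranking."""
    gains = [relevant.get(doc_id, 0) for doc_id in ranked_ids[:k]]
    ideal_gains = sorted(relevant.values(), reverse=True)[:k]
    ideal = dcg(ideal_gains)
    return dcg(gains) / ideal if ideal > 0 else 0.0


# Qrels for query PLAIN-2, taken from the example above.
qrels_for_query = {"MED-2427": 2, "MED-2440": 1, "MED-2434": 1}

# Hypothetical ranked output; "MED-9999" is a made-up non-relevant document.
ranked = ["MED-2440", "MED-9999", "MED-2427"]

print("precision@3:", precision_at_k(ranked, qrels_for_query, 3))
print("recall@3:   ", recall_at_k(ranked, qrels_for_query, 3))
print("ndcg@3:     ", ndcg_at_k(ranked, qrels_for_query, 3))
```

Precision and recall only ask whether a document is relevant at all, so binary labels are enough for them. NDCG uses the graded score itself, which is why placing the grade-2 document third instead of first lowers the result.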


## Installation

In [None]:
%pip install redis-retrieval-optimizer

### Sourcing Data

To make it easier to get started, this example uses datasets from the excellent [BEIR project](https://github.com/beir-cellar/beir), a benchmark suite for information retrieval. The Retrieval Optimizer includes helpers for working with BEIR datasets out of the box.

For custom or domain-specific use cases, you can create your own dataset following the same format. Even a small sample of labeled queries and relevant documents can be a valuable starting point, and you can optionally use a language model to expand these examples for broader testing.
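If you build a custom dataset by hand, it can help to check that the three pieces agree with each other before running a study. The sketch below is one way to do that; `check_dataset` is a hypothetical helper written for this example (it is not part of the Retrieval Optimizer), and the tiny dataset is made up.

```python
def check_dataset(corpus, queries, qrels):
    """Return a list of problems; an empty list means the pieces are consistent."""
    problems = []

    # Every corpus entry needs non-empty searchable text.
    for doc_id, doc in corpus.items():
        if not isinstance(doc, dict) or not doc.get("text"):
            problems.append(f"corpus entry {doc_id!r} has no 'text' field")

    # Every qrels entry must point at a known query and known documents.
    for query_id, judged_docs in qrels.items():
        if query_id not in queries:
            problems.append(f"qrels reference unknown query {query_id!r}")
        for doc_id in judged_docs:
            if doc_id not in corpus:
                problems.append(
                    f"qrels for {query_id!r} reference unknown document {doc_id!r}"
                )

    # A query without labels cannot contribute to any metric.
    for query_id in queries:
        if query_id not in qrels:
            problems.append(f"query {query_id!r} has no relevance labels")

    return problems


# Made-up example data in the corpus / queries / qrels format described above.
corpus = {
    "doc-1": {"text": "Statins and breast cancer survival.", "title": "Statins"},
    "doc-2": {"text": "Autophagy and longevity."},
}
queries = {"q-1": "Do statins affect breast cancer?", "q-2": "autophagy lifespan"}
qrels = {"q-1": {"doc-1": 2}, "q-2": {"doc-2": 1, "doc-3": 1}}

for problem in check_dataset(corpus, queries, qrels):
    print(problem)
```

Here the only reported problem is the reference to `doc-3`, which does not exist in the corpus.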

In [5]:
from redis_retrieval_optimizer.corpus_processors import eval_beir

# check the link above for different datasets to try
beir_dataset_name = "nfcorpus"
data_folder = "data"

# Load sample data
corpus, queries, qrels = eval_beir.get_beir_dataset(beir_dataset_name)

09:44:07 beir.datasets.data_loader INFO   Loading Corpus...


100%|██████████| 3633/3633 [00:00<00:00, 163265.62it/s]

09:44:07 beir.datasets.data_loader INFO   Loaded 3633 TEST Documents.
09:44:07 beir.datasets.data_loader INFO   Doc Example: {'text': 'Recent studies have suggested that statins, an established drug group in the prevention of cardiovascular mortality, could delay or prevent breast cancer recurrence but the effect on disease-specific mortality remains unclear. We evaluated risk of breast cancer death among statin users in a population-based cohort of breast cancer patients. The study cohort included all newly diagnosed breast cancer patients in Finland during 1995–2003 (31,236 cases), identified from the Finnish Cancer Registry. Information on statin use before and after the diagnosis was obtained from a national prescription database. We used the Cox proportional hazards regression method to estimate mortality among statin users with statin use as time-dependent variable. A total of 4,151 participants had used statins. During the median follow-up of 3.25 years after the diagnosis (rang




Now that we have our data, we'll save it locally to the gitignored `data/` folder.

In [6]:
import os

os.makedirs(data_folder, exist_ok=True)

In [7]:
import json

# Write each component to its own JSON file inside data_folder
with open(f"{data_folder}/{beir_dataset_name}_corpus.json", "w") as f:
    json.dump(corpus, f)

with open(f"{data_folder}/{beir_dataset_name}_queries.json", "w") as f:
    json.dump(queries, f)

with open(f"{data_folder}/{beir_dataset_name}_qrels.json", "w") as f:
    json.dump(qrels, f)

## Define a study config

To set the parameters of our study, we need to define a study configuration file. This directory contains a sample config, `grid_study_config.yaml`, which looks like this:

In [8]:
import yaml

with open("grid_study_config.yaml", "r") as f:
    study_config = yaml.safe_load(f)

study_config

{'corpus': 'data/nfcorpus_corpus.json',
 'queries': 'data/nfcorpus_queries.json',
 'qrels': 'data/nfcorpus_qrels.json',
 'index_settings': {'name': 'optimize',
  'vector_field_name': 'vector',
  'text_field_name': 'text',
  'from_existing': False,
  'additional_fields': [{'name': 'title', 'type': 'text'}],
  'vector_dim': 384},
 'embedding_models': [{'type': 'hf',
   'model': 'sentence-transformers/all-MiniLM-L6-v2',
   'dim': 384,
   'embedding_cache_name': 'vec-cache'}],
 'search_methods': ['bm25', 'vector', 'hybrid', 'rerank', 'weighted_rrf']}
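
Before launching a study, it can be worth sanity-checking the loaded config. The sketch below copies the dictionary shown above and checks two things: that the top-level keys seen in the sample are present, and that each embedding model's `dim` matches the index's `vector_dim`. Both checks are assumptions drawn from the sample config (the library may apply defaults or validate differently), and `check_study_config` is a hypothetical helper, not a library function.

```python
study_config = {
    "corpus": "data/nfcorpus_corpus.json",
    "queries": "data/nfcorpus_queries.json",
    "qrels": "data/nfcorpus_qrels.json",
    "index_settings": {
        "name": "optimize",
        "vector_field_name": "vector",
        "text_field_name": "text",
        "from_existing": False,
        "additional_fields": [{"name": "title", "type": "text"}],
        "vector_dim": 384,
    },
    "embedding_models": [
        {
            "type": "hf",
            "model": "sentence-transformers/all-MiniLM-L6-v2",
            "dim": 384,
            "embedding_cache_name": "vec-cache",
        }
    ],
    "search_methods": ["bm25", "vector", "hybrid", "rerank", "weighted_rrf"],
}

# Keys present in the sample config; treating them as required is an assumption.
EXPECTED_KEYS = (
    "corpus",
    "queries",
    "qrels",
    "index_settings",
    "embedding_models",
    "search_methods",
)


def check_study_config(config):
    """Return a list of problems found in a study config dictionary."""
    problems = [f"missing top-level key: {key}" for key in EXPECTED_KEYS if key not in config]

    # Vectors written to the index must have the dimension the index expects.
    index_dim = config.get("index_settings", {}).get("vector_dim")
    for model in config.get("embedding_models", []):
        if model.get("dim") != index_dim:
            problems.append(
                f"model {model.get('model')!r} has dim {model.get('dim')}, "
                f"but index vector_dim is {index_dim}"
            )
    return problems


print(check_study_config(study_config))
```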

## Available search methods

The available search methods are defined in `redis_retrieval_optimizer.search_methods.__init__`. There you can find the active `SEARCH_METHOD_MAP`, which maps each string in the config's `search_methods` list to the corresponding function.

You can define your own `SEARCH_METHOD_MAP` and pass it in to supply custom retrieval logic.
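
The idea of a map from method names to functions can be illustrated with a small, self-contained sketch. Everything here is hypothetical: the function signature `(query, corpus, k)`, the two toy methods, and `TOY_SEARCH_METHOD_MAP` are invented for illustration and do not reflect the library's real search function interface, which uses its own input and output types. Check the library source for the actual signature and for how a custom map is passed in.

```python
from typing import Callable, Dict, List

# Hypothetical signature for illustration only (not the library's interface):
# takes a query string, a {doc_id: text} corpus, and k; returns ranked doc ids.
SearchFn = Callable[[str, Dict[str, str], int], List[str]]


def keyword_overlap_search(query: str, corpus: Dict[str, str], k: int) -> List[str]:
    """Rank documents by how many query terms they share with the query."""
    query_terms = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda doc_id: len(query_terms & set(corpus[doc_id].lower().split())),
        reverse=True,
    )
    return ranked[:k]


def shortest_first_search(query: str, corpus: Dict[str, str], k: int) -> List[str]:
    """A deliberately naive baseline: rank documents by text length."""
    return sorted(corpus, key=lambda doc_id: len(corpus[doc_id]))[:k]


# The map ties the strings used in a config to the functions that implement them.
TOY_SEARCH_METHOD_MAP: Dict[str, SearchFn] = {
    "keyword_overlap": keyword_overlap_search,
    "shortest_first": shortest_first_search,
}


def run_methods(method_names: List[str], query: str, corpus: Dict[str, str], k: int = 2):
    """Look up each configured name in the map and run the matching function."""
    results = {}
    for name in method_names:
        if name not in TOY_SEARCH_METHOD_MAP:
            raise KeyError(f"unknown search method: {name}")
        results[name] = TOY_SEARCH_METHOD_MAP[name](query, corpus, k)
    return results


toy_corpus = {
    "a": "statin drugs and cholesterol",
    "b": "autophagy and longevity",
    "c": "cholesterol",
}
print(run_methods(["keyword_overlap", "shortest_first"], "cholesterol statin", toy_corpus))
```

Adding a new method then only requires writing one function and registering it under a new name, which is the same pattern a custom `SEARCH_METHOD_MAP` relies on.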

## Run a study

Now we can pass in the path to our study config file along with a predefined corpus processor function, and let the Retrieval Optimizer do the rest.

In [9]:
import os
from redis_retrieval_optimizer.grid_study import run_grid_study
from redis_retrieval_optimizer.corpus_processors import eval_beir
from dotenv import load_dotenv

# load environment variables containing necessary credentials
load_dotenv()

redis_url = os.environ.get("REDIS_URL", "redis://localhost:6379/0")

metrics = run_grid_study(
    config_path="grid_study_config.yaml",
    redis_url=redis_url,
    corpus_processor=eval_beir.process_corpus
)

09:44:18 redisvl.index.index INFO   Index already exists, overwriting.
09:44:20 datasets INFO   PyTorch version 2.7.0 available.
09:44:20 sentence_transformers.SentenceTransformer INFO   Use pytorch device_name: mps
09:44:20 sentence_transformers.SentenceTransformer INFO   Load pretrained SentenceTransformer: sentence-transformers/all-MiniLM-L6-v2


Batches: 100%|██████████| 1/1 [00:00<00:00,  4.24it/s]


Recreating: loading corpus from file


Batches: 100%|██████████| 1/1 [00:00<00:00,  3.40it/s]

09:44:35 sentence_transformers.SentenceTransformer INFO   Use pytorch device_name: mps
09:44:35 sentence_transformers.SentenceTransformer INFO   Load pretrained SentenceTransformer: sentence-transformers/all-MiniLM-L6-v2


Batches: 100%|██████████| 1/1 [00:00<00:00, 70.46it/s]


Running search method: bm25
Running search method: vector


Batches: 100%|██████████| 1/1 [00:00<00:00,  8.86it/s]

Running search method: hybrid
Running search method: rerank
09:44:48 sentence_transformers.cross_encoder.CrossEncoder INFO   Use pytorch device: mps


Batches: 100%|██████████| 1/1 [00:00<00:00,  4.76it/s]

Running search method: weighted_rrf


In [10]:
metrics[["search_method", "model", "avg_query_time", "recall@k", "precision", "ndcg@k"]].sort_values(by="ndcg@k", ascending=False)

| | search_method | model | avg_query_time | recall@k | precision | ndcg@k |
|---|---|---|---|---|---|---|
| 4 | weighted_rrf | sentence-transformers/all-MiniLM-L6-v2 | 0.002997 | 0.164964 | 0.244582 | 0.212325 |
| 3 | rerank | sentence-transformers/all-MiniLM-L6-v2 | 0.170745 | 0.166997 | 0.25387 | 0.203366 |
| 2 | hybrid | sentence-transformers/all-MiniLM-L6-v2 | 0.001872 | 0.154988 | 0.243344 | 0.202778 |
| 1 | vector | sentence-transformers/all-MiniLM-L6-v2 | 0.003879 | 0.154988 | 0.243344 | 0.196586 |
| 0 | bm25 | sentence-transformers/all-MiniLM-L6-v2 | 0.001269 | 0.138766 | 0.281526 | 0.1914 |
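
Ranking quality is only one side of the comparison; `rerank`, for example, has a much higher `avg_query_time` than the other methods. As a rough sketch of how these results could feed a decision, the snippet below picks the method with the best `ndcg@k` among those under a latency budget. The rows are copied from the output above; `best_under_budget` is a hypothetical helper, and treating `avg_query_time` as seconds is an assumption.

```python
# Rows copied from the study output above (model column omitted for brevity).
results = [
    {"search_method": "weighted_rrf", "avg_query_time": 0.002997, "ndcg@k": 0.212325},
    {"search_method": "rerank", "avg_query_time": 0.170745, "ndcg@k": 0.203366},
    {"search_method": "hybrid", "avg_query_time": 0.001872, "ndcg@k": 0.202778},
    {"search_method": "vector", "avg_query_time": 0.003879, "ndcg@k": 0.196586},
    {"search_method": "bm25", "avg_query_time": 0.001269, "ndcg@k": 0.1914},
]


def best_under_budget(rows, max_query_time, metric="ndcg@k"):
    """Return the row with the highest metric among rows within the time budget."""
    eligible = [row for row in rows if row["avg_query_time"] <= max_query_time]
    if not eligible:
        return None
    return max(eligible, key=lambda row: row[metric])


# Assumed unit: seconds per query.
for budget in (0.002, 0.01):
    best = best_under_budget(results, budget)
    print(budget, best["search_method"] if best else None)
```

With the pandas DataFrame returned by `run_grid_study`, the same selection could be done by filtering on `avg_query_time` and sorting by `ndcg@k`.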
