
# Bayesian Optimization with Retrieval Optimizer

This notebook demonstrates how to use Bayesian optimization to tune Redis-based retrieval pipelines. Unlike a grid study, which tests every combination, Bayesian optimization uses the results of earlier trials to decide which configuration to try next, so it concentrates on the promising regions of the configuration space. This is especially useful when the number of possible configurations is large and an exhaustive search would be too costly.

You'll define a study configuration, specify embedding models and search methods, and let the optimizer guide the search toward the best-performing retrieval setup.


## Dataset

We'll import a dataset from the [BEIR information retrieval benchmark](https://github.com/beir-cellar/beir) to get going quickly.



In [None]:
# Load data
from redis_retrieval_optimizer.corpus_processors import eval_beir

# check the link above for different datasets to try
beir_dataset_name = "nfcorpus"

# Load sample data
corpus, queries, qrels = eval_beir.get_beir_dataset(beir_dataset_name)
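
For orientation, the sketch below shows the shape BEIR datasets usually take and how precision, recall, and F1 at k follow from `qrels`. It assumes `get_beir_dataset` returns the standard BEIR layout, which this notebook does not confirm; the toy documents, the query, and the `f1_at_k` helper are hypothetical and not part of `redis_retrieval_optimizer`.

```python
# Toy data in the standard BEIR layout (assumed; not taken from nfcorpus).
corpus = {
    "d1": {"title": "Vitamin D", "text": "Vitamin D and bone health."},
    "d2": {"title": "Fiber", "text": "Dietary fiber and digestion."},
    "d3": {"title": "Omega-3", "text": "Omega-3 fatty acids and the heart."},
}
queries = {"q1": "what helps bone health"}
# qrels maps query id -> {doc id: relevance grade}; a grade > 0 means relevant.
qrels = {"q1": {"d1": 2, "d3": 1}}


def f1_at_k(retrieved, relevant, k):
    """Return (precision, recall, f1) for the top-k retrieved doc ids."""
    top_k = retrieved[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


relevant = {doc_id for doc_id, grade in qrels["q1"].items() if grade > 0}
# Hypothetical ranking returned by a retriever for q1.
precision, recall, f1 = f1_at_k(["d1", "d2", "d3"], relevant, k=2)
print(f"precision@2={precision:.2f} recall@2={recall:.2f} f1@2={f1:.2f}")
```

The package may average or weight these per-query values differently; treat this only as a reading aid for the metric names that appear in the study output.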

## Study config

In this directory there is a YAML file containing the configuration for a Bayesian study. It looks like this:

```yaml
# path to data files for easy read
corpus: "data/nfcorpus_corpus.json"
queries: "data/nfcorpus_queries.json"
qrels: "data/nfcorpus_qrels.json"

index_settings:
  name: "optimize"
  vector_field_name: "vector" # name of the vector field to search on
  text_field_name: "text" # name of the text field for lexical search
  from_existing: false
  vector_dim: 384 # should match first embedding model or from_existing
  additional_fields:
      - name: "title"
        type: "text"

optimization_settings:
  # defines the options optimization can take
  metric_weights:
    f1_at_k: 1
    embedding_latency: 1
    total_indexing_time: 1
  algorithms: ["hnsw"]
  vector_data_types: ["float16", "float32"]
  distance_metrics: ["cosine"]
  n_trials: 10
  n_jobs: 1
  ret_k: [1, 10] # range of values to sample from during the study
  ef_runtime: [10, 20, 30, 50]
  ef_construction: [100, 150, 200, 250, 300]
  m: [8, 16, 64]


search_methods: ["vector", "lin_combo"]
embedding_models:
  - type: "hf"
    model: "sentence-transformers/all-MiniLM-L6-v2"
    dim: 384
    embedding_cache_name: "vec-cache" # avoid names that include 'ret-opt', as this can cause collisions
    dtype: "float32"
```
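
To see why an exhaustive grid study gets expensive, the sketch below counts the configurations implied by the option lists in this config and compares that to `n_trials`. It assumes `ret_k: [1, 10]` is an inclusive integer range and that every option combines freely with every other; neither is confirmed by the package, and the `options` dictionary is just an illustration.

```python
from math import prod

# Option lists copied from the study config above.
options = {
    "search_methods": ["vector", "lin_combo"],
    "embedding_models": ["sentence-transformers/all-MiniLM-L6-v2"],
    "algorithms": ["hnsw"],
    "vector_data_types": ["float16", "float32"],
    "distance_metrics": ["cosine"],
    "ef_runtime": [10, 20, 30, 50],
    "ef_construction": [100, 150, 200, 250, 300],
    "m": [8, 16, 64],
}
# Assumption: ret_k: [1, 10] is an inclusive integer range.
ret_k_low, ret_k_high = 1, 10
options["ret_k"] = list(range(ret_k_low, ret_k_high + 1))

# A full grid would evaluate one trial per element of the Cartesian product.
grid_size = prod(len(values) for values in options.values())
n_trials = 10
print(f"grid configurations: {grid_size}")
print(f"share explored by {n_trials} trials: {n_trials / grid_size:.2%}")
```

Under these assumptions the grid holds 2,400 configurations, so ten trials cover well under 1% of it. Each trial may also re-index the corpus, which is where most of the cost comes from.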

## Running a study

To run a study, simply pass the path to the config, the `redis_url`, and a corpus processing function to `run_bayes_study`, and the package will take care of the rest.

In [1]:
# run study

# add root redis_retrieval_optimizer to path until available on pypi
import sys
import os

# os.path.abspath('') resolves to the current working directory,
# so this is the parent of the directory the notebook runs in
current_dir = os.path.dirname(os.path.abspath(''))

# Go up two more directory levels (adjust the number as needed)
parent_dir = os.path.abspath(os.path.join(current_dir, '../..'))

# Add the parent directory to the Python path if it's not already there
if parent_dir not in sys.path:
    sys.path.insert(0, parent_dir)
    print(f"Added {parent_dir} to Python path")


from redis_retrieval_optimizer.bayes_study import run_bayes_study
from redis_retrieval_optimizer.corpus_processors import eval_beir
from dotenv import load_dotenv

# load environment variables containing necessary credentials
load_dotenv()

redis_url = os.environ.get("REDIS_URL", "redis://localhost:6379/0")

metrics = run_bayes_study(
    config_path="bayes_study_config.yaml",
    redis_url=redis_url,
    corpus_processor=eval_beir.process_corpus
)

[I 2025-05-13 15:46:11,327] A new study created in memory with name: test


15:46:12 sentence_transformers.SentenceTransformer INFO   Use pytorch device_name: mps
15:46:12 sentence_transformers.SentenceTransformer INFO   Load pretrained SentenceTransformer: sentence-transformers/all-MiniLM-L6-v2


Batches: 100%|██████████| 1/1 [00:00<00:00,  6.72it/s]


Recreating index...


Batches: 100%|██████████| 1/1 [00:00<00:00,  7.63it/s]

15:46:24 root INFO   Corpus size: 3633
15:46:27 root INFO   Data indexed total_indexing_time=1.959s, num_docs=3633


Batches: 100%|██████████| 1/1 [00:00<00:00, 10.71it/s]

15:46:35 root INFO   Saving metrics for study: 4d500114-5e56-4b71-a60b-a71c5174f87f, METRICS={'search_method': ['lin_combo'], 'total_indexing_time': [1.959], 'avg_query_time': [0.003249300522700921], 'model': ['sentence-transformers/all-MiniLM-L6-v2'], 'model_dim': [384], 'ret_k': [7], 'recall@k': [0.15498797328057845], 'ndcg@k': [0.20365114040360066], 'f1@k': [0.13071176354799932], 'precision': [0.24334365325077406], 'algorithm': ['hnsw'], 'ef_construction': [100], 'ef_runtime': [20], 'm': [16], 'distance_metric': ['cosine'], 'vector_data_type': ['float32']}


[I 2025-05-13 15:46:35,556] Trial 0 finished with value: 1.959 and parameters: {'model_info': {'type': 'hf', 'model': 'sentence-transformers/all-MiniLM-L6-v2', 'dim': 384, 'embedding_cache_name': 'vec-cache', 'dtype': 'float32'}, 'search_method': 'lin_combo', 'algorithm': 'hnsw', 'var_dtype': 'float32', 'distance_metric': 'cosine', 'ret_k': 7, 'ef_runtime': 20, 'ef_construction': 100, 'm': 16}. Best is trial 0 with value: 1.959.


15:46:35 redisvl.index.index INFO   Index already exists, overwriting.
15:46:35 sentence_transformers.SentenceTransformer INFO   Use pytorch device_name: mps
15:46:35 sentence_transformers.SentenceTransformer INFO   Load pretrained SentenceTransformer: sentence-transformers/all-MiniLM-L6-v2


Batches: 100%|██████████| 1/1 [00:00<00:00, 48.92it/s]


Skip recreate
15:46:37 root INFO   Indexing progress: 0.6475963448549861
15:46:38 root INFO   Indexing progress: 0.909813269765594
15:46:39 root INFO   Indexing progress: 1
15:46:39 root INFO   Data indexed total_indexing_time=3.184s, num_docs=3633


15:46:40 root INFO   Saving metrics for study: 4d500114-5e56-4b71-a60b-a71c5174f87f, METRICS={'search_method': ['lin_combo', 'lin_combo'], 'total_indexing_time': [1.959, 3.184], 'avg_query_time': [0.003249300522700921, 0.0030338048196798507], 'model': ['sentence-transformers/all-MiniLM-L6-v2', 'sentence-transformers/all-MiniLM-L6-v2'], 'model_dim': [384, 384], 'ret_k': [7, 4], 'recall@k': [0.15498797328057845, 0.15498797328057845], 'ndcg@k': [0.20365114040360066, 0.2042288557951055], 'f1@k': [0.13071176354799932, 0.13071176354799932], 'precision': [0.24334365325077406, 0.24334365325077406], 'algorithm': ['hnsw', 'hnsw'], 'ef_construction': [100, 250], 'ef_runtime': [20, 20], 'm': [16, 64], 'distance_metric': ['cosine', 'cosine'], 'vector_d

[I 2025-05-13 15:46:40,941] Trial 1 finished with value: 3.184 and parameters: {'model_info': {'type': 'hf', 'model': 'sentence-transformers/all-MiniLM-L6-v2', 'dim': 384, 'embedding_cache_name': 'vec-cache', 'dtype': 'float32'}, 'search_method': 'lin_combo', 'algorithm': 'hnsw', 'var_dtype': 'float32', 'distance_metric': 'cosine', 'ret_k': 4, 'ef_runtime': 20, 'ef_construction': 250, 'm': 64}. Best is trial 1 with value: 3.184.


15:46:40 redisvl.index.index INFO   Index already exists, overwriting.
15:46:40 sentence_transformers.SentenceTransformer INFO   Use pytorch device_name: mps
15:46:40 sentence_transformers.SentenceTransformer INFO   Load pretrained SentenceTransformer: sentence-transformers/all-MiniLM-L6-v2


Batches: 100%|██████████| 1/1 [00:00<00:00, 64.50it/s]


Skip recreate
15:46:42 root INFO   Indexing progress: 0.8626671963978281
15:46:43 root INFO   Indexing progress: 1
15:46:43 root INFO   Data indexed total_indexing_time=1.989s, num_docs=3633


15:46:45 root INFO   Saving metrics for study: 4d500114-5e56-4b71-a60b-a71c5174f87f, METRICS={'search_method': ['lin_combo', 'lin_combo', 'lin_combo'], 'total_indexing_time': [1.959, 3.184, 1.989], 'avg_query_time': [0.003249300522700921, 0.0030338048196798507, 0.0032537913543890136], 'model': ['sentence-transformers/all-MiniLM-L6-v2', 'sentence-transformers/all-MiniLM-L6-v2', 'sentence-transformers/all-MiniLM-L6-v2'], 'model_dim': [384, 384, 384], 'ret_k': [7, 4, 6], 'recall@k': [0.15498797328057845, 0.15498797328057845, 0.15498797328057845], 'ndcg@k': [0.20365114040360066, 0.2042288557951055, 0.2042288557951055], 'f1@k': [0.13071176354799932, 0.13071176354799932, 0.13071176354799932], 'precision': [0.24334365325077406, 0.24334365325077406, 0.24334365325077406], 'algorithm': ['hnsw', 'hnsw', 'hn

[I 2025-05-13 15:46:45,304] Trial 2 finished with value: 1.989 and parameters: {'model_info': {'type': 'hf', 'model': 'sentence-transformers/all-MiniLM-L6-v2', 'dim': 384, 'embedding_cache_name': 'vec-cache', 'dtype': 'float32'}, 'search_method': 'lin_combo', 'algorithm': 'hnsw', 'var_dtype': 'float32', 'distance_metric': 'cosine', 'ret_k': 6, 'ef_runtime': 20, 'ef_construction': 100, 'm': 16}. Best is trial 1 with value: 3.184.


15:46:45 redisvl.index.index INFO   Index already exists, overwriting.
15:46:45 sentence_transformers.SentenceTransformer INFO   Use pytorch device_name: mps
15:46:45 sentence_transformers.SentenceTransformer INFO   Load pretrained SentenceTransformer: sentence-transformers/all-MiniLM-L6-v2


Batches: 100%|██████████| 1/1 [00:00<00:00, 71.06it/s]


Skip recreate
15:46:47 root INFO   Indexing progress: 0.9656999072970468
15:46:48 root INFO   Indexing progress: 1
15:46:48 root INFO   Data indexed total_indexing_time=1.761s, num_docs=3633


15:46:48 root INFO   Saving metrics for study: 4d500114-5e56-4b71-a60b-a71c5174f87f, METRICS={'search_method': ['lin_combo', 'lin_combo', 'lin_combo', 'vector'], 'total_indexing_time': [1.959, 3.184, 1.989, 1.761], 'avg_query_time': [0.003249300522700921, 0.0030338048196798507, 0.0032537913543890136, 0.0012082736189520395], 'model': ['sentence-transformers/all-MiniLM-L6-v2', 'sentence-transformers/all-MiniLM-L6-v2', 'sentence-transformers/all-MiniLM-L6-v2', 'sentence-transformers/all-MiniLM-L6-v2'], 'model_dim': [384, 384, 384, 384], 'ret_k': [7, 4, 6, 8], 'recall@k': [0.15498797328057845, 0.15498797328057845, 0.15498797328057845, 0.12593320709033942], 'ndcg@k': [0.20365114040360066, 0.2042288557951055, 0.2042288557951055, 0.16520467938671746], 'f1@k': [0.13071176354799932, 0.13071176354799932, 0

[I 2025-05-13 15:46:48,826] Trial 3 finished with value: 1.761 and parameters: {'model_info': {'type': 'hf', 'model': 'sentence-transformers/all-MiniLM-L6-v2', 'dim': 384, 'embedding_cache_name': 'vec-cache', 'dtype': 'float32'}, 'search_method': 'vector', 'algorithm': 'hnsw', 'var_dtype': 'float32', 'distance_metric': 'cosine', 'ret_k': 8, 'ef_runtime': 10, 'ef_construction': 100, 'm': 8}. Best is trial 1 with value: 3.184.


15:46:48 redisvl.index.index INFO   Index already exists, overwriting.
15:46:48 sentence_transformers.SentenceTransformer INFO   Use pytorch device_name: mps
15:46:48 sentence_transformers.SentenceTransformer INFO   Load pretrained SentenceTransformer: sentence-transformers/all-MiniLM-L6-v2


Batches: 100%|██████████| 1/1 [00:00<00:00, 64.01it/s]


Recreating index...
15:46:49 root INFO   Corpus size: 3633
15:46:54 root INFO   Data indexed total_indexing_time=3.895s, num_docs=3633
15:46:55 root INFO   Saving metrics for study: 4d500114-5e56-4b71-a60b-a71c5174f87f, METRICS={'search_method': ['lin_combo', 'lin_combo', 'lin_combo', 'vector', 'vector'], 'total_indexing_time': [1.959, 3.184, 1.989, 1.761, 3.895], 'avg_query_time': [0.003249300522700921, 0.0030338048196798507, 0.0032537913543890136, 0.0012082736189520395, 0.0012043253187055559], 'model': ['sentence-transformers/all-MiniLM-L6-v2', 'sentence-transformers/all-MiniLM-L6-v2', 'sentence-transformers/all-MiniLM-L6-v2', 'sentence-transformers/all-MiniLM-L6-v2', 'sentence-transformers/all-MiniLM-L6-v2'], 'model_dim': [384, 384, 384, 384, 384], 'ret_k': [7, 4, 6, 8, 2], 'recall@k': [0.15498797328057845, 0.15498797328057845, 0.15498797328057845, 0.12593320709033942, 0.12396654346209525], 'ndcg@k': [0.20365114040360066, 0.2042288557951055, 0.2042288557951055, 0.16520467938671746, 

[I 2025-05-13 15:46:55,009] Trial 4 finished with value: 3.895 and parameters: {'model_info': {'type': 'hf', 'model': 'sentence-transformers/all-MiniLM-L6-v2', 'dim': 384, 'embedding_cache_name': 'vec-cache', 'dtype': 'float32'}, 'search_method': 'vector', 'algorithm': 'hnsw', 'var_dtype': 'float16', 'distance_metric': 'cosine', 'ret_k': 2, 'ef_runtime': 10, 'ef_construction': 150, 'm': 16}. Best is trial 4 with value: 3.895.


15:46:55 redisvl.index.index INFO   Index already exists, overwriting.
15:46:55 sentence_transformers.SentenceTransformer INFO   Use pytorch device_name: mps
15:46:55 sentence_transformers.SentenceTransformer INFO   Load pretrained SentenceTransformer: sentence-transformers/all-MiniLM-L6-v2


Batches: 100%|██████████| 1/1 [00:00<00:00, 59.62it/s]


Recreating index...
15:46:56 root INFO   Corpus size: 3633
15:46:58 root INFO   Data indexed total_indexing_time=1.745s, num_docs=3633
15:46:59 root INFO   Saving metrics for study: 4d500114-5e56-4b71-a60b-a71c5174f87f, METRICS={'search_method': ['lin_combo', 'lin_combo', 'lin_combo', 'vector', 'vector', 'vector'], 'total_indexing_time': [1.959, 3.184, 1.989, 1.761, 3.895, 1.745], 'avg_query_time': [0.003249300522700921, 0.0030338048196798507, 0.0032537913543890136, 0.0012082736189520395, 0.0012043253187055559, 0.0013466804020175994], 'model': ['sentence-transformers/all-MiniLM-L6-v2', 'sentence-transformers/all-MiniLM-L6-v2', 'sentence-transformers/all-MiniLM-L6-v2', 'sentence-transformers/all-MiniLM-L6-v2', 'sentence-transformers/all-MiniLM-L6-v2', 'sentence-transformers/all-MiniLM-L6-v2'], 'model_dim': [384, 384, 384, 384, 384, 384], 'ret_k': [7, 4, 6, 8, 2, 4], 'recall@k': [0.15498797328057845, 0.15498797328057845, 0.15498797328057845, 0.12593320709033942, 0.12396654346209525, 0.13

[I 2025-05-13 15:46:59,495] Trial 5 finished with value: 1.745 and parameters: {'model_info': {'type': 'hf', 'model': 'sentence-transformers/all-MiniLM-L6-v2', 'dim': 384, 'embedding_cache_name': 'vec-cache', 'dtype': 'float32'}, 'search_method': 'vector', 'algorithm': 'hnsw', 'var_dtype': 'float32', 'distance_metric': 'cosine', 'ret_k': 4, 'ef_runtime': 20, 'ef_construction': 100, 'm': 8}. Best is trial 4 with value: 3.895.


15:46:59 redisvl.index.index INFO   Index already exists, overwriting.
15:46:59 sentence_transformers.SentenceTransformer INFO   Use pytorch device_name: mps
15:46:59 sentence_transformers.SentenceTransformer INFO   Load pretrained SentenceTransformer: sentence-transformers/all-MiniLM-L6-v2


Batches: 100%|██████████| 1/1 [00:00<00:00, 51.99it/s]


Recreating index...
15:47:00 root INFO   Corpus size: 3633
15:47:05 root INFO   Data indexed total_indexing_time=4.108s, num_docs=3633
15:47:06 root INFO   Saving metrics for study: 4d500114-5e56-4b71-a60b-a71c5174f87f, METRICS={'search_method': ['lin_combo', 'lin_combo', 'lin_combo', 'vector', 'vector', 'vector', 'vector'], 'total_indexing_time': [1.959, 3.184, 1.989, 1.761, 3.895, 1.745, 4.108], 'avg_query_time': [0.003249300522700921, 0.0030338048196798507, 0.0032537913543890136, 0.0012082736189520395, 0.0012043253187055559, 0.0013466804020175994, 0.0012582271091709197], 'model': ['sentence-transformers/all-MiniLM-L6-v2', 'sentence-transformers/all-MiniLM-L6-v2', 'sentence-transformers/all-MiniLM-L6-v2', 'sentence-transformers/all-MiniLM-L6-v2', 'sentence-transformers/all-MiniLM-L6-v2', 'sentence-transformers/all-MiniLM-L6-v2', 'sentence-transformers/all-MiniLM-L6-v2'], 'model_dim': [384, 384, 384, 384, 384, 384, 384], 'ret_k': [7, 4, 6, 8, 2, 4, 3], 'recall@k': [0.15498797328057845

[I 2025-05-13 15:47:06,025] Trial 6 finished with value: 4.108 and parameters: {'model_info': {'type': 'hf', 'model': 'sentence-transformers/all-MiniLM-L6-v2', 'dim': 384, 'embedding_cache_name': 'vec-cache', 'dtype': 'float32'}, 'search_method': 'vector', 'algorithm': 'hnsw', 'var_dtype': 'float16', 'distance_metric': 'cosine', 'ret_k': 3, 'ef_runtime': 10, 'ef_construction': 300, 'm': 8}. Best is trial 6 with value: 4.108.


15:47:06 redisvl.index.index INFO   Index already exists, overwriting.
15:47:06 sentence_transformers.SentenceTransformer INFO   Use pytorch device_name: mps
15:47:06 sentence_transformers.SentenceTransformer INFO   Load pretrained SentenceTransformer: sentence-transformers/all-MiniLM-L6-v2


Batches: 100%|██████████| 1/1 [00:00<00:00, 49.58it/s]


Recreating index...
15:47:07 root INFO   Corpus size: 3633
15:47:10 root INFO   Data indexed total_indexing_time=2.481s, num_docs=3633
15:47:11 root INFO   Saving metrics for study: 4d500114-5e56-4b71-a60b-a71c5174f87f, METRICS={'search_method': ['lin_combo', 'lin_combo', 'lin_combo', 'vector', 'vector', 'vector', 'vector', 'vector'], 'total_indexing_time': [1.959, 3.184, 1.989, 1.761, 3.895, 1.745, 4.108, 2.481], 'avg_query_time': [0.003249300522700921, 0.0030338048196798507, 0.0032537913543890136, 0.0012082736189520395, 0.0012043253187055559, 0.0013466804020175994, 0.0012582271091709197, 0.001436730287392442], 'model': ['sentence-transformers/all-MiniLM-L6-v2', 'sentence-transformers/all-MiniLM-L6-v2', 'sentence-transformers/all-MiniLM-L6-v2', 'sentence-transformers/all-MiniLM-L6-v2', 'sentence-transformers/all-MiniLM-L6-v2', 'sentence-transformers/all-MiniLM-L6-v2', 'sentence-transformers/all-MiniLM-L6-v2', 'sentence-transformers/all-MiniLM-L6-v2'], 'model_dim': [384, 384, 384, 384,

[I 2025-05-13 15:47:11,308] Trial 7 finished with value: 2.481 and parameters: {'model_info': {'type': 'hf', 'model': 'sentence-transformers/all-MiniLM-L6-v2', 'dim': 384, 'embedding_cache_name': 'vec-cache', 'dtype': 'float32'}, 'search_method': 'vector', 'algorithm': 'hnsw', 'var_dtype': 'float32', 'distance_metric': 'cosine', 'ret_k': 9, 'ef_runtime': 30, 'ef_construction': 150, 'm': 64}. Best is trial 6 with value: 4.108.


15:47:11 redisvl.index.index INFO   Index already exists, overwriting.
15:47:11 sentence_transformers.SentenceTransformer INFO   Use pytorch device_name: mps
15:47:11 sentence_transformers.SentenceTransformer INFO   Load pretrained SentenceTransformer: sentence-transformers/all-MiniLM-L6-v2


Batches: 100%|██████████| 1/1 [00:00<00:00, 59.63it/s]


Skip recreate
15:47:13 root INFO   Indexing progress: 1
15:47:13 root INFO   Data indexed total_indexing_time=1.93s, num_docs=3633


15:47:15 root INFO   Saving metrics for study: 4d500114-5e56-4b71-a60b-a71c5174f87f, METRICS={'search_method': ['lin_combo', 'lin_combo', 'lin_combo', 'vector', 'vector', 'vector', 'vector', 'vector', 'lin_combo'], 'total_indexing_time': [1.959, 3.184, 1.989, 1.761, 3.895, 1.745, 4.108, 2.481, 1.93], 'avg_query_time': [0.003249300522700921, 0.0030338048196798507, 0.0032537913543890136, 0.0012082736189520395, 0.0012043253187055559, 0.0013466804020175994, 0.0012582271091709197, 0.001436730287392442, 0.002913703121267974], 'model': ['sentence-transformers/all-MiniLM-L6-v2', 'sentence-transformers/all-MiniLM-L6-v2', 'sentence-transformers/all-MiniLM-L6-v2', 'sentence-transformers/all-MiniLM-L6-v2', 'sentence-transformers/all-MiniLM-L6-v2', 'sentence-transformers/all-MiniLM-L6-v2', 'sentence-transformers/all-MiniLM-L6-v2', 'sentence-transformers/all-MiniLM-L6-

[I 2025-05-13 15:47:15,175] Trial 8 finished with value: 1.93 and parameters: {'model_info': {'type': 'hf', 'model': 'sentence-transformers/all-MiniLM-L6-v2', 'dim': 384, 'embedding_cache_name': 'vec-cache', 'dtype': 'float32'}, 'search_method': 'lin_combo', 'algorithm': 'hnsw', 'var_dtype': 'float32', 'distance_metric': 'cosine', 'ret_k': 7, 'ef_runtime': 50, 'ef_construction': 150, 'm': 8}. Best is trial 6 with value: 4.108.


15:47:15 redisvl.index.index INFO   Index already exists, overwriting.
15:47:15 sentence_transformers.SentenceTransformer INFO   Use pytorch device_name: mps
15:47:15 sentence_transformers.SentenceTransformer INFO   Load pretrained SentenceTransformer: sentence-transformers/all-MiniLM-L6-v2


Batches: 100%|██████████| 1/1 [00:00<00:00, 64.35it/s]


Recreating index...
15:47:16 root INFO   Corpus size: 3633
15:47:22 root INFO   Data indexed total_indexing_time=5.348s, num_docs=3633
15:47:23 root INFO   Saving metrics for study: 4d500114-5e56-4b71-a60b-a71c5174f87f, METRICS={'search_method': ['lin_combo', 'lin_combo', 'lin_combo', 'vector', 'vector', 'vector', 'vector', 'vector', 'lin_combo', 'lin_combo'], 'total_indexing_time': [1.959, 3.184, 1.989, 1.761, 3.895, 1.745, 4.108, 2.481, 1.93, 5.348], 'avg_query_time': [0.003249300522700921, 0.0030338048196798507, 0.0032537913543890136, 0.0012082736189520395, 0.0012043253187055559, 0.0013466804020175994, 0.0012582271091709197, 0.001436730287392442, 0.002913703121267974, 0.004260091220631319], 'model': ['sentence-transformers/all-MiniLM-L6-v2', 'sentence-transformers/all-MiniLM-L6-v2', 'sentence-transformers/all-MiniLM-L6-v2', 'sentence-transformers/all-MiniLM-L6-v2', 'sentence-transformers/all-MiniLM-L6-v2', 'sentence-transformers/all-MiniLM-L6-v2', 'sentence-transformers/all-MiniLM-L

[I 2025-05-13 15:47:24,001] Trial 9 finished with value: 5.348 and parameters: {'model_info': {'type': 'hf', 'model': 'sentence-transformers/all-MiniLM-L6-v2', 'dim': 384, 'embedding_cache_name': 'vec-cache', 'dtype': 'float32'}, 'search_method': 'lin_combo', 'algorithm': 'hnsw', 'var_dtype': 'float16', 'distance_metric': 'cosine', 'ret_k': 10, 'ef_runtime': 50, 'ef_construction': 250, 'm': 64}. Best is trial 9 with value: 5.348.


Completed Bayesian optimization... 


Best Configuration: 9: {'model_info': {'type': 'hf', 'model': 'sentence-transformers/all-MiniLM-L6-v2', 'dim': 384, 'embedding_cache_name': 'vec-cache', 'dtype': 'float32'}, 'search_method': 'lin_combo', 'algorithm': 'hnsw', 'var_dtype': 'float16', 'distance_metric': 'cosine', 'ret_k': 10, 'ef_runtime': 50, 'ef_construction': 250, 'm': 64}:


Best Score: [5.348]




In [2]:
metrics[["search_method", "algorithm", "vector_data_type", "ef_construction", "ef_runtime", "m", "avg_query_time", "recall@k", "precision", "ndcg@k"]].sort_values(by="ndcg@k", ascending=False)

| | search_method | algorithm | vector_data_type | ef_construction | ef_runtime | m | avg_query_time | recall@k | precision | ndcg@k |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | lin_combo | hnsw | float32 | 250 | 20 | 64 | 0.003034 | 0.154988 | 0.243344 | 0.204229 |
| 2 | lin_combo | hnsw | float32 | 100 | 20 | 16 | 0.003254 | 0.154988 | 0.243344 | 0.204229 |
| 8 | lin_combo | hnsw | float32 | 150 | 50 | 8 | 0.002914 | 0.154988 | 0.243344 | 0.203669 |
| 0 | lin_combo | hnsw | float32 | 100 | 20 | 16 | 0.003249 | 0.154988 | 0.243344 | 0.203651 |
| 9 | lin_combo | hnsw | float16 | 250 | 50 | 64 | 0.00426 | 0.154988 | 0.243344 | 0.203651 |
| 7 | vector | hnsw | float32 | 150 | 30 | 64 | 0.001437 | 0.14828 | 0.241796 | 0.188781 |
| 5 | vector | hnsw | float32 | 100 | 20 | 8 | 0.001347 | 0.139291 | 0.23096 | 0.179333 |
| 6 | vector | hnsw | float16 | 300 | 10 | 8 | 0.001258 | 0.125178 | 0.224458 | 0.165261 |
| 3 | vector | hnsw | float32 | 100 | 10 | 8 | 0.001208 | 0.125933 | 0.226316 | 0.165205 |
| 4 | vector | hnsw | float16 | 150 | 10 | 16 | 0.001204 | 0.123967 | 0.225697 | 0.161683 |
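
Sorting by `ndcg@k` alone hides the latency trade-off between `lin_combo` and `vector` search. One way to read the results, sketched here in plain Python, is to keep only the trials that no other trial beats on both `ndcg@k` and `avg_query_time` (a Pareto front). The values are copied from the results above; the `dominates` helper is hypothetical and not part of `redis_retrieval_optimizer`.

```python
# (trial, search_method, avg_query_time, ndcg@k), copied from the results above.
trials = [
    (1, "lin_combo", 0.003034, 0.204229),
    (2, "lin_combo", 0.003254, 0.204229),
    (8, "lin_combo", 0.002914, 0.203669),
    (0, "lin_combo", 0.003249, 0.203651),
    (9, "lin_combo", 0.004260, 0.203651),
    (7, "vector", 0.001437, 0.188781),
    (5, "vector", 0.001347, 0.179333),
    (6, "vector", 0.001258, 0.165261),
    (3, "vector", 0.001208, 0.165205),
    (4, "vector", 0.001204, 0.161683),
]


def dominates(a, b):
    """True if a is no worse than b on both axes and strictly better on one."""
    _, _, time_a, ndcg_a = a
    _, _, time_b, ndcg_b = b
    no_worse = time_a <= time_b and ndcg_a >= ndcg_b
    strictly_better = time_a < time_b or ndcg_a > ndcg_b
    return no_worse and strictly_better


# Keep trials that no other trial dominates, ordered from fastest to slowest.
pareto = [t for t in trials if not any(dominates(o, t) for o in trials)]
pareto.sort(key=lambda t: t[2])

for trial, method, query_time, ndcg in pareto:
    print(f"trial {trial}: {method:9s} ndcg@k={ndcg:.6f} avg_query_time={query_time:.6f}")
```

On these rows, trials 0, 2, and 9 drop out because trial 1 matches or beats them on both measures. The remaining trials each represent a different balance of ranking quality and query latency.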
