# Ensemble Query Engine Guide

Often when building a RAG application there are many retrieval parameters and strategies to choose from (chunk size, or vector vs. keyword vs. hybrid search, for instance).

Thought: what if we could try a bunch of strategies at once, and have a reranker or LLM prune the results?

This achieves two purposes:
- Better (albeit more costly) retrieved results by pooling results from multiple strategies, assuming the reranker is good
- A way to benchmark different retrieval strategies against each other (with respect to the reranker)
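
The pooling idea can be sketched without any retrieval library. The snippet below is a minimal illustration, not the implementation used later in this guide: `pool_and_rerank`, `overlap_score`, and the two toy strategies are hypothetical names, and the word-overlap scorer merely stands in for a real reranker.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in: a "strategy" maps a query to a list of text chunks.
Strategy = Callable[[str], List[str]]


def pool_and_rerank(
    query: str,
    strategies: Dict[str, Strategy],
    score: Callable[[str, str], float],
    top_n: int = 3,
) -> List[Tuple[str, str, float]]:
    """Run every strategy, pool the candidates, and keep the top_n by score."""
    pooled = []
    for name, retrieve in strategies.items():
        for chunk in retrieve(query):
            pooled.append((name, chunk, score(query, chunk)))
    # Highest score first; ties keep their original order (stable sort).
    pooled.sort(key=lambda item: item[2], reverse=True)
    return pooled[:top_n]


def overlap_score(query: str, chunk: str) -> float:
    """Toy reranker: fraction of query words that also appear in the chunk."""
    q_words = set(query.lower().split())
    return len(q_words & set(chunk.lower().split())) / len(q_words)


# Toy strategies returning fixed chunks (illustrative text only).
strategies = {
    "small_chunks": lambda q: [
        "safety fine-tuning uses RLHF",
        "the model has 70B parameters",
    ],
    "large_chunks": lambda q: [
        "safety fine-tuning combines supervised tuning and safety RLHF",
    ],
}

results = pool_and_rerank("safety fine-tuning", strategies, overlap_score, top_n=2)
for name, chunk, s in results:
    print(f"{s:.2f}  [{name}]  {chunk}")
```

Because every candidate keeps the name of the strategy that produced it, the reranked list doubles as a record of which strategies the reranker favored.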

This guide demonstrates the idea on the Llama 2 paper. We do ensemble retrieval over different chunk sizes, each backed by its own vector index.

**NOTE**: A closely related guide is our [Ensemble Retrievers Guide](https://gpt-index.readthedocs.io/en/stable/examples/retrievers/ensemble_retrieval.html) - make sure to check it out! 

## Setup

Here we define the necessary imports.

In [1]:
# NOTE: This is ONLY necessary in jupyter notebook.
# Details: Jupyter runs an event-loop behind the scenes.
#          This results in nested event-loops when we start an event-loop to make async queries.
#          This is normally not allowed, so we use nest_asyncio to allow it for convenience.
import nest_asyncio

nest_asyncio.apply()

In [2]:
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().handlers = []
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

from llama_index import (
    VectorStoreIndex,
    SummaryIndex,
    SimpleDirectoryReader,
    ServiceContext,
    StorageContext,
)
from llama_index.response.notebook_utils import display_response
from llama_index.llms import OpenAI



## Load Data

In this section we first load in the Llama 2 paper as a single document. We then chunk it multiple times, according to different chunk sizes. We build a separate vector index corresponding to each chunk size.

In [None]:
!mkdir -p data
!wget --user-agent "Mozilla" "https://arxiv.org/pdf/2307.09288.pdf" -O "data/llama2.pdf"

In [3]:
from pathlib import Path
from llama_index import Document
from llama_hub.file.pymu_pdf.base import PyMuPDFReader

In [4]:
loader = PyMuPDFReader()
docs0 = loader.load(file_path=Path("./data/llama2.pdf"))
doc_text = "\n\n".join([d.get_content() for d in docs0])
docs = [Document(text=doc_text)]

Here we try out different chunk sizes: 128, 256, 512, and 1024.

In [5]:
# initialize service context (set chunk size)
llm = OpenAI(model="gpt-4")
chunk_sizes = [128, 256, 512, 1024]
service_contexts = []
nodes_list = []
vector_indices = []
query_engines = []
for chunk_size in chunk_sizes:
    print(f"Chunk Size: {chunk_size}")
    service_context = ServiceContext.from_defaults(chunk_size=chunk_size, llm=llm)
    service_contexts.append(service_context)
    nodes = service_context.node_parser.get_nodes_from_documents(docs)

    # add chunk size to nodes to track later
    for node in nodes:
        node.metadata["chunk_size"] = chunk_size
        node.excluded_embed_metadata_keys = ["chunk_size"]
        node.excluded_llm_metadata_keys = ["chunk_size"]

    nodes_list.append(nodes)

    # build vector index
    vector_index = VectorStoreIndex(nodes)
    vector_indices.append(vector_index)

    # query engines
    query_engines.append(vector_index.as_query_engine())

Chunk Size: 128
Chunk Size: 256
Chunk Size: 512
Chunk Size: 1024
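
For intuition about what the loop above produces, here is a rough sketch of chunking one document at several sizes. It assumes a simplified splitter: `chunk_words` is a hypothetical helper that counts whitespace-separated words and uses no overlap, whereas the node parser above works on tokens and may overlap adjacent chunks.

```python
from typing import Dict, List


def chunk_words(text: str, chunk_size: int) -> List[str]:
    """Split text into consecutive chunks of at most chunk_size words."""
    words = text.split()
    return [
        " ".join(words[i : i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


# Placeholder 1000-word document (not the Llama 2 paper).
text = " ".join(f"w{i}" for i in range(1000))

chunks_by_size: Dict[int, List[str]] = {
    size: chunk_words(text, size) for size in [128, 256, 512, 1024]
}
for size, chunks in chunks_by_size.items():
    print(size, len(chunks))
```

Smaller chunk sizes yield more, narrower candidates for the same document, which is why each size gets its own index rather than sharing one.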


## Define Ensemble Retriever

We set up an "ensemble" retriever primarily using our recursive retrieval abstraction. This works as follows:
- Define a separate `IndexNode` corresponding to the vector retriever for each chunk size (retriever for chunk size 128, retriever for chunk size 256, and so on)
- Put all IndexNodes into a single `SummaryIndex` - when the corresponding retriever is called, *all* nodes are returned.
- Define a Recursive Retriever, with the root node being the summary index retriever. This will first fetch all nodes from the summary index retriever, and then recursively call the vector retriever for each chunk size.
- Rerank the final results.

The end result is that all vector retrievers are called when a query is run.
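
The control flow described above can be sketched in plain Python. This is only an illustration under simplifying assumptions: `Pointer`, `TextChunk`, and `recursive_retrieve` are hypothetical stand-ins for the roles played by `IndexNode`, regular nodes, and the recursive retriever, and the toy retrievers return canned results.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union


@dataclass
class TextChunk:
    text: str
    chunk_size: int


@dataclass
class Pointer:
    """Plays the role of an IndexNode: it refers to another retriever by id."""

    index_id: str


Retriever = Callable[[str], List[Union[TextChunk, Pointer]]]


def recursive_retrieve(
    query: str, retriever_id: str, retrievers: Dict[str, Retriever]
) -> List[TextChunk]:
    """Follow Pointer results into their retrievers and collect plain chunks."""
    chunks: List[TextChunk] = []
    for item in retrievers[retriever_id](query):
        if isinstance(item, Pointer):
            chunks.extend(recursive_retrieve(query, item.index_id, retrievers))
        else:
            chunks.append(item)
    return chunks


sizes = [128, 256, 512, 1024]
retrievers: Dict[str, Retriever] = {
    # The root returns *every* pointer, like a summary index retriever.
    "root": lambda q: [Pointer(f"chunk_{s}") for s in sizes],
}
for s in sizes:
    # The default argument binds the current size to each toy retriever.
    retrievers[f"chunk_{s}"] = lambda q, s=s: [TextChunk(f"hit for {q!r}", s)]

chunks = recursive_retrieve("safety fine-tuning", "root", retrievers)
print([c.chunk_size for c in chunks])
```

Every per-size retriever contributes to the pooled list, so the reranking step is what decides which chunk sizes survive.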

In [69]:
# try ensemble retrieval

from llama_index.schema import IndexNode

# maps each IndexNode's index_id to the retriever it points to
retriever_dict = {}
retriever_nodes = []
for chunk_size, vector_index in zip(chunk_sizes, vector_indices):
    node_id = f"chunk_{chunk_size}"
    node = IndexNode(
        text=f"Retrieves relevant context from the Llama 2 paper (chunk size {chunk_size})",
        index_id=node_id
    )
    retriever_nodes.append(node)
    retriever_dict[node_id] = vector_index.as_retriever()

Define recursive retriever.

In [70]:
from llama_index.retrievers import RecursiveRetriever

# the derived retriever will just retrieve all nodes
summary_index = SummaryIndex(retriever_nodes)

retriever = RecursiveRetriever(
    root_id="root",
    retriever_dict={
        "root": summary_index.as_retriever(),
        **retriever_dict
    }
)

Let's test the retriever on a sample query.

In [71]:
nodes = await retriever.aretrieve(
    "Tell me about the main aspects of safety fine-tuning"
)

In [None]:
print(f'Number of nodes: {len(nodes)}')
for node in nodes:
    print(node.node.metadata["chunk_size"])
    print(node.node.get_text())

Define a reranker to process the final retrieved set of nodes.

In [10]:
# define reranker
from llama_index.indices.postprocessor import (
    LLMRerank,
    SentenceTransformerRerank,
    CohereRerank,
)

# reranker = LLMRerank()
# reranker = SentenceTransformerRerank(top_n=10)
reranker = CohereRerank(top_n=10)

Define a retriever query engine that combines the recursive retriever with the reranker.

In [11]:
# define RetrieverQueryEngine
from llama_index.query_engine import RetrieverQueryEngine

query_engine = RetrieverQueryEngine(retriever, node_postprocessors=[reranker])

In [12]:
response = query_engine.query(
    "Tell me about the main aspects of safety fine-tuning"
)

In [None]:
display_response(
    response, show_source=True, source_length=500, show_source_metadata=True
)

### Analyzing the Relative Importance of Each Chunk Size

One interesting property of ensemble-based retrieval is that through reranking, we can actually use the ordering of chunks in the final retrieved set to determine the importance of each chunk size. For instance, if certain chunk sizes are always ranked near the top, then those are probably more relevant to the query.

In [14]:
# compute the reciprocal rank for each chunk size based on positioning in combined ranking
import pandas as pd


def mrr_all(metadata_values, metadata_key, source_nodes):
    # source_nodes is a ranked list
    # for each value, find the rank of its first occurrence in source_nodes
    value_to_mrr_dict = {}
    for metadata_value in metadata_values:
        mrr = 0  # stays 0 if the value never appears in source_nodes
        for idx, source_node in enumerate(source_nodes):
            if source_node.node.metadata[metadata_key] == metadata_value:
                mrr = 1 / (idx + 1)
                break

        value_to_mrr_dict[metadata_value] = mrr

    # one row ("MRR"), one column per metadata value
    df = pd.DataFrame(value_to_mrr_dict, index=["MRR"])
    return df

In [15]:
# Compute the Mean Reciprocal Rank for each chunk size (higher is better)
# we can see that chunk size of 256 has the highest ranked results.
print("Mean Reciprocal Rank for each Chunk Size")
mrr_all(chunk_sizes, "chunk_size", response.source_nodes)

Mean Reciprocal Rank for each Chunk Size

| | 128 | 256 | 512 | 1024 |
|---|---|---|---|---|
| MRR | 0.333333 | 1.0 | 0.5 | 0.25 |
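
To read these numbers: a reciprocal rank of 1/k means the first node of that chunk size sits at position k of the reranked list. The short sketch below reproduces the values from rank positions alone; `first_hit_reciprocal_rank` is a hypothetical helper, and the list holds only the chunk sizes at ranks 1 to 4, which is all the output above implies.

```python
from typing import Dict, Hashable, List


def first_hit_reciprocal_rank(ranked_labels: List[Hashable]) -> Dict[Hashable, float]:
    """Map each label to 1 / (1-based rank of its first occurrence)."""
    scores: Dict[Hashable, float] = {}
    for position, label in enumerate(ranked_labels, start=1):
        # setdefault keeps the score of the first (highest-ranked) occurrence.
        scores.setdefault(label, 1 / position)
    return scores


# Chunk size of the node at ranks 1-4, as implied by the output above
# (1.0 -> rank 1, 0.5 -> rank 2, 0.333333 -> rank 3, 0.25 -> rank 4).
top_ranked_chunk_sizes = [256, 512, 128, 1024]

scores = first_hit_reciprocal_rank(top_ranked_chunk_sizes)
print({size: round(score, 6) for size, score in sorted(scores.items())})
# prints {128: 0.333333, 256: 1.0, 512: 0.5, 1024: 0.25}
```

Since this comes from a single query, it is a reciprocal rank per chunk size; averaging over many queries would give a more stable mean reciprocal rank.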


## Evaluation

We now evaluate more rigorously how well the ensemble retriever works compared to a "baseline" retriever (here, the query engine over the chunk size 1024 vector index).

We define/load an eval benchmark dataset and then run different evaluations over it.

In [16]:
from llama_index.evaluation import (
    DatasetGenerator,
    QueryResponseDataset,
)
from llama_index import ServiceContext
from llama_index.llms import OpenAI
import nest_asyncio

nest_asyncio.apply()

In [17]:
# NOTE: run this if the dataset isn't already saved
eval_service_context = ServiceContext.from_defaults(llm=OpenAI(model="gpt-4"))
# generate questions from the largest chunks (1024)
dataset_generator = DatasetGenerator(
    nodes_list[-1],
    service_context=eval_service_context,
    show_progress=True,
    num_questions_per_chunk=2,
)

In [None]:
eval_dataset = await dataset_generator.agenerate_dataset_from_nodes(num=60)

In [None]:
eval_dataset.save_json("data/llama2_eval_qr_dataset.json")

In [18]:
# optional: load the previously saved dataset instead of regenerating it
eval_dataset = QueryResponseDataset.from_json("data/llama2_eval_qr_dataset.json")

### Compare Results

In [19]:
import asyncio
import nest_asyncio

nest_asyncio.apply()

In [20]:
from llama_index.evaluation import (
    CorrectnessEvaluator,
    SemanticSimilarityEvaluator,
    RelevancyEvaluator,
    FaithfulnessEvaluator,
    PairwiseComparisonEvaluator,
)

# NOTE: can uncomment other evaluators
# evaluator_c = CorrectnessEvaluator(service_context=eval_service_context)
evaluator_s = SemanticSimilarityEvaluator(service_context=eval_service_context)
# evaluator_r = RelevancyEvaluator(service_context=eval_service_context)
evaluator_f = FaithfulnessEvaluator(service_context=eval_service_context)

pairwise_evaluator = PairwiseComparisonEvaluator(service_context=eval_service_context)

In [24]:
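from llama_index.evaluation.eval_utils import get_responses
from llama_index.evaluation import BatchEvalRunner

# assumed cap on the number of eval questions to run; adjust as needed
max_samples = 60

eval_qs = eval_dataset.questions
qr_pairs = eval_dataset.qr_pairs
ref_response_strs = [r for (_, r) in qr_pairs]

# baseline: the query engine built on the largest chunk size (1024)
base_query_engine = query_engines[-1]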

In [None]:
base_pred_responses = get_responses(eval_qs[:max_samples], base_query_engine, show_progress=True)

In [None]:
pred_responses = get_responses(eval_qs[:max_samples], query_engine, show_progress=True)

In [None]:
pred_responses

In [32]:
import numpy as np

pred_response_strs = [str(p) for p in pred_responses]
base_pred_response_strs = [str(p) for p in base_pred_responses]

In [73]:
evaluator_dict = {
    "semantic_similarity": evaluator_s,
    "faithfulness": evaluator_f,
}
batch_runner = BatchEvalRunner(evaluator_dict, workers=1, show_progress=True)

In [74]:
eval_results = await batch_runner.aevaluate_responses(
    queries=eval_qs[:max_samples], responses=pred_responses[:max_samples], reference=ref_response_strs[:max_samples]
)
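
Once the batch run finishes, the per-question results can be reduced to one number per evaluator so the ensemble and the baseline are easy to compare. The sketch below assumes the results form a dictionary from evaluator name to a list of objects exposing a numeric `score` attribute; `ScoredResult` and `mean_scores` are hypothetical names, and all scores are made up to exercise the helper.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class ScoredResult:
    """Hypothetical stand-in for one evaluation result; only `score` is read."""

    score: Optional[float]


def mean_scores(results: Dict[str, List[ScoredResult]]) -> Dict[str, float]:
    """Average the non-missing scores for each evaluator name."""
    summary: Dict[str, float] = {}
    for name, items in results.items():
        scores = [r.score for r in items if r.score is not None]
        summary[name] = mean(scores) if scores else float("nan")
    return summary


# Made-up scores purely for illustration; these are not results from this guide.
toy_ensemble = {
    "semantic_similarity": [ScoredResult(0.5), ScoredResult(1.0)],
    "faithfulness": [ScoredResult(1.0), ScoredResult(None)],
}
toy_baseline = {
    "semantic_similarity": [ScoredResult(0.5), ScoredResult(0.5)],
    "faithfulness": [ScoredResult(1.0), ScoredResult(0.0)],
}

ensemble_means = mean_scores(toy_ensemble)
baseline_means = mean_scores(toy_baseline)
for name in ensemble_means:
    delta = ensemble_means[name] - baseline_means[name]
    print(
        f"{name}: ensemble={ensemble_means[name]:.2f} "
        f"baseline={baseline_means[name]:.2f} delta={delta:+.2f}"
    )
```

Running the same evaluators over `base_pred_responses` would give the baseline side of this comparison.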








 80%|████████████████████████████████████████████████████████▊              | 48/60 [01:27<00:46,  3.89s/it][A[A[A

message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=None request_id=6a553bbc899d047b8627fc0c1f9ac948 response_code=429
error_code=rate_limit_exceeded error_message='Rate limit reached for default-gpt-4 in organization org-1ZDAvajC6v2ZtAP9hLEIsXRz on tokens per min. Limit: 40000 / min. Please try again in 1ms. Contact us through our help center at help.openai.com if you continue to have issues.' error_param=None error_type=tokens message='OpenAI API error received' stream_error=False
Retrying llama_index.llms.openai_utils.acompletion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised RateLimitError: Rate limit reached for default-gpt-4 in organization org-1ZDAvajC6v2ZtAP9hLEIsXRz on tokens per min. Limit: 40000 / min. Please try again in 1ms. Contact us through our help center at help.openai.com if you continue to have issues..
message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=978 reque




 82%|█████████████████████████████████████████████████████████▉             | 49/60 [01:32<00:47,  4.32s/it][A[A[A

message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=None request_id=c933d4e6a5826608effbf15cedd5eb89 response_code=429
error_code=rate_limit_exceeded error_message='Rate limit reached for default-gpt-4 in organization org-1ZDAvajC6v2ZtAP9hLEIsXRz on tokens per min. Limit: 40000 / min. Please try again in 1ms. Contact us through our help center at help.openai.com if you continue to have issues.' error_param=None error_type=tokens message='OpenAI API error received' stream_error=False
Retrying llama_index.llms.openai_utils.acompletion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised RateLimitError: Rate limit reached for default-gpt-4 in organization org-1ZDAvajC6v2ZtAP9hLEIsXRz on tokens per min. Limit: 40000 / min. Please try again in 1ms. Contact us through our help center at help.openai.com if you continue to have issues..
message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=1016 requ




 83%|███████████████████████████████████████████████████████████▏           | 50/60 [01:38<00:46,  4.67s/it][A[A[A

message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=None request_id=160e2db6de2b10fd81d0490d38ae7b53 response_code=429
error_code=rate_limit_exceeded error_message='Rate limit reached for default-gpt-4 in organization org-1ZDAvajC6v2ZtAP9hLEIsXRz on tokens per min. Limit: 40000 / min. Please try again in 1ms. Contact us through our help center at help.openai.com if you continue to have issues.' error_param=None error_type=tokens message='OpenAI API error received' stream_error=False
Retrying llama_index.llms.openai_utils.acompletion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised RateLimitError: Rate limit reached for default-gpt-4 in organization org-1ZDAvajC6v2ZtAP9hLEIsXRz on tokens per min. Limit: 40000 / min. Please try again in 1ms. Contact us through our help center at help.openai.com if you continue to have issues..
message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=1322 requ




 85%|████████████████████████████████████████████████████████████▎          | 51/60 [01:44<00:44,  4.95s/it][A[A[A

message='OpenAI API response' path=https://api.openai.com/v1/embeddings processing_ms=29 request_id=4bd6655c7c3e198ca4547046825f8dcb response_code=200
message='OpenAI API response' path=https://api.openai.com/v1/embeddings processing_ms=56 request_id=9f9e0e2c7ecac9c6b45664886f3a8dc2 response_code=200





 87%|█████████████████████████████████████████████████████████████▌         | 52/60 [01:44<00:28,  3.58s/it][A[A[A

message='OpenAI API response' path=https://api.openai.com/v1/embeddings processing_ms=39 request_id=bbef7d46f4b0d30214d1d2e5d8710b97 response_code=200
message='OpenAI API response' path=https://api.openai.com/v1/embeddings processing_ms=28 request_id=1b7b61b851d02aa59ea23ddeb5988e33 response_code=200





 88%|██████████████████████████████████████████████████████████████▋        | 53/60 [01:44<00:18,  2.64s/it][A[A[A

message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=None request_id=d66972daabee95da16fd6bb4301f6e99 response_code=429
error_code=rate_limit_exceeded error_message='Rate limit reached for default-gpt-4 in organization org-1ZDAvajC6v2ZtAP9hLEIsXRz on tokens per min. Limit: 40000 / min. Please try again in 1ms. Contact us through our help center at help.openai.com if you continue to have issues.' error_param=None error_type=tokens message='OpenAI API error received' stream_error=False
Retrying llama_index.llms.openai_utils.acompletion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised RateLimitError: Rate limit reached for default-gpt-4 in organization org-1ZDAvajC6v2ZtAP9hLEIsXRz on tokens per min. Limit: 40000 / min. Please try again in 1ms. Contact us through our help center at help.openai.com if you continue to have issues..
message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=923 reque




 90%|███████████████████████████████████████████████████████████████▉       | 54/60 [01:50<00:20,  3.40s/it][A[A[A

message='OpenAI API response' path=https://api.openai.com/v1/embeddings processing_ms=23 request_id=d2b0fb9785d3b31143873a0b944755bd response_code=200
message='OpenAI API response' path=https://api.openai.com/v1/embeddings processing_ms=33 request_id=db2d18ee0a07b932ec79afcda07dffc7 response_code=200





 92%|█████████████████████████████████████████████████████████████████      | 55/60 [01:50<00:12,  2.49s/it][A[A[A

message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=None request_id=e7e5b9f66aa34e30803f306fc3e523ab response_code=429
error_code=rate_limit_exceeded error_message='Rate limit reached for default-gpt-4 in organization org-1ZDAvajC6v2ZtAP9hLEIsXRz on tokens per min. Limit: 40000 / min. Please try again in 1ms. Contact us through our help center at help.openai.com if you continue to have issues.' error_param=None error_type=tokens message='OpenAI API error received' stream_error=False
Retrying llama_index.llms.openai_utils.acompletion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised RateLimitError: Rate limit reached for default-gpt-4 in organization org-1ZDAvajC6v2ZtAP9hLEIsXRz on tokens per min. Limit: 40000 / min. Please try again in 1ms. Contact us through our help center at help.openai.com if you continue to have issues..
message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=974 reque




 93%|██████████████████████████████████████████████████████████████████▎    | 56/60 [01:56<00:13,  3.45s/it][A[A[A

message='OpenAI API response' path=https://api.openai.com/v1/embeddings processing_ms=133 request_id=1b41f96be6939eb306114dbf1079117f response_code=200
message='OpenAI API response' path=https://api.openai.com/v1/embeddings processing_ms=32 request_id=2c5620353a335dd2052da70bd5e0bed0 response_code=200





 95%|███████████████████████████████████████████████████████████████████▍   | 57/60 [01:56<00:07,  2.56s/it][A[A[A

message='OpenAI API response' path=https://api.openai.com/v1/embeddings processing_ms=29 request_id=94bc4197d472fdfecdfb7c7b10f60192 response_code=200
message='OpenAI API response' path=https://api.openai.com/v1/embeddings processing_ms=52 request_id=29ddb72c072b8d90844415e310a45fb9 response_code=200





 97%|████████████████████████████████████████████████████████████████████▋  | 58/60 [01:56<00:03,  1.89s/it][A[A[A

message='OpenAI API response' path=https://api.openai.com/v1/embeddings processing_ms=68 request_id=d29a5a64c290727c1b8c98ab216782f7 response_code=200
message='OpenAI API response' path=https://api.openai.com/v1/embeddings processing_ms=41 request_id=d43570eb0186f99f61be96643eba6f24 response_code=200





 98%|█████████████████████████████████████████████████████████████████████▊ | 59/60 [01:57<00:01,  1.44s/it][A[A[A

message='OpenAI API response' path=https://api.openai.com/v1/embeddings processing_ms=50 request_id=7070d0da0a98488d4c65776fa219bed9 response_code=200
message='OpenAI API response' path=https://api.openai.com/v1/embeddings processing_ms=34 request_id=b9ca0382e269fbf347c21d9d684bdd72 response_code=200





100%|███████████████████████████████████████████████████████████████████████| 60/60 [01:57<00:00,  1.96s/it]


In [None]:
base_eval_results = await batch_runner.aevaluate_responses(
    queries=eval_qs[:max_samples],
    responses=base_pred_responses[:max_samples],
    reference=ref_response_strs[:max_samples],
)

In [None]:
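# Re-create the runner with only the pairwise evaluator.
batch_runner = BatchEvalRunner(
    {"pairwise": pairwise_evaluator}, workers=3, show_progress=True
)

# Compare the ensemble responses against the base responses (passed as the reference).
pairwise_eval_results = await batch_runner.aevaluate_response_strs(
    queries=eval_qs[:max_samples],
    response_strs=pred_response_strs[:max_samples],
    reference=base_pred_response_strs[:max_samples],
)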

In [62]:
from collections import defaultdict

import numpy as np
import pandas as pd


def display_results(eval_results_list, names, metric_keys):
    """Build a DataFrame of mean scores: one row per run, one column per metric."""
    metric_dict = defaultdict(list)
    metric_dict["names"] = names
    for metric_key in metric_keys:
        for eval_results in eval_results_list:
            # Average the per-query scores for this metric.
            mean_score = np.array([r.score for r in eval_results[metric_key]]).mean()
            metric_dict[metric_key].append(mean_score)
    return pd.DataFrame(metric_dict)
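
The aggregation in `display_results` is a per-metric mean over each run's result list. The sketch below reproduces that logic using only the standard library, so it can be tried without running any evaluators. `StubResult`, `mean_scores`, and the scores are hypothetical stand-ins for illustration; they are not part of LlamaIndex and are not results from this guide.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class StubResult:
    """Hypothetical stand-in for an evaluation result; only `score` is read."""

    score: float


def mean_scores(eval_results_list, names, metric_keys):
    """Return {"names": [...], metric: [mean per run, ...]} as a plain dict."""
    table = {"names": list(names)}
    for metric_key in metric_keys:
        # One mean per run, in the same order as `names`.
        table[metric_key] = [
            mean(r.score for r in eval_results[metric_key])
            for eval_results in eval_results_list
        ]
    return table


# Made-up scores, purely illustrative.
run_a = {"semantic_similarity": [StubResult(1.0), StubResult(0.5)]}
run_b = {"semantic_similarity": [StubResult(0.0), StubResult(0.5)]}

table = mean_scores([run_a, run_b], ["run A", "run B"], ["semantic_similarity"])
print(table)
```

Each entry in `eval_results_list` is expected to map a metric key to a list of objects with a `score` attribute, which is the same shape that `display_results` reads.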

In [63]:
display_results(
    [eval_results, base_eval_results],
    ["Ensemble Retriever", "Base Retriever"],
    ["semantic_similarity"],
)

|   | names | semantic_similarity |
|---|---|---|
| 0 | Ensemble Retriever | 0.918733 |
| 1 | Base Retriever | 0.925547 |


In [64]:
display_results([pairwise_eval_results], ["Pairwise Comparison"], ["pairwise"])

|   | names | pairwise |
|---|---|---|
| 0 | Pairwise Comparison | 0.583333 |
