### Experiment: Performance of HYDE prompts on context retrieval distance & response time

**Background:**
HYDE (Hypothetical Document Embeddings) is an approach reported to improve the relevance and coverage of RAG context retrieval. Instead of embedding the query directly, an LLM first expands it into a hypothetical document, and that document's embedding is used to retrieve context. The wording of the prompt that generates the hypothetical document may affect both LLM response time and the resulting embeddings.
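
A simplified sketch of the HYDE retrieval step, for orientation only. It assumes a toy bag-of-words `embed()` in place of a real embedding model and a hypothetical `fake_llm()` in place of the prompted LLM call; neither reflects how the `deh` service is implemented. The point it illustrates is that the query vector comes from the generated document rather than from the question itself.

```python
import math
import re
from collections import Counter
from typing import Callable


def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_distance(a: Counter, b: Counter) -> float:
    """1 - cosine similarity between two sparse count vectors (lower means closer)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def hyde_retrieve(question: str, corpus: list[str], generate: Callable[[str], str], k: int = 2):
    """Rank passages by distance to the embedding of a generated hypothetical document."""
    hypothetical_doc = generate(question)  # an LLM call in a real HYDE pipeline
    query_vec = embed(hypothetical_doc)    # embed the hypothetical answer, not the question
    scored = [(cosine_distance(query_vec, embed(p)), p) for p in corpus]
    return sorted(scored)[:k]              # k closest passages, closest first


def fake_llm(question: str) -> str:
    """Hypothetical stand-in for the prompted LLM; always returns a canned answer."""
    return "Japantown was the Japanese American district of Fresno."


corpus = [
    "Japantown in Fresno was home to many Japanese American families before the war.",
    "The West Side of Fresno is a historically diverse neighborhood.",
    "Yosemite National Park lies northeast of Fresno.",
]
for distance, passage in hyde_retrieve("Which Fresno neighborhood was mostly Japanese?", corpus, fake_llm):
    print(f"{distance:.3f}  {passage}")
```

The original HYDE formulation can also generate several hypothetical documents and combine their embeddings; this sketch uses a single one to keep the mechanism visible.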

**Test Approach:**
A sample of questions will be selected from the QA corpus, and context will be retrieved for each question three ways: without HYDE, and with HYDE using each of two prompts (`p=0` and `p=1`). The runs will be compared on the cosine distance of the retrieved contexts and on response time.
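
Assuming the retrieval service reports `metadata.similarity_score` as cosine distance (one minus cosine similarity, consistent with the "lower is better" note in the comparison cell), the score for a query embedding $\mathbf{q}$ (the question's, or the hypothetical document's under HYDE) and a retrieved context embedding $\mathbf{c}$ is:

$$
d(\mathbf{q}, \mathbf{c}) = 1 - \frac{\mathbf{q} \cdot \mathbf{c}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{c} \rVert}
$$

For each question and rank, the prompt comparison takes $\Delta = d_{p=0} - d_{p=1}$, so $\Delta < 0$ means the `p=0` prompt retrieved the closer context.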


In [2]:
# Common import
from deh.assessment import QASetRetriever
from deh import settings
from deh.eval import generate_experiment_dataset

import pandas as pd
import os
from pathlib import Path



#### Test Configuration

In [3]:
num_samples: int = 100
experiment_folder: str = "../../data/evaluation/hyde-prompt-experiment/"
qa_data_set_file: str = "../../data/qas/squad_qas.tsv"

# Create the experiment folder (no-op if it already exists):
Path(experiment_folder).mkdir(parents=True, exist_ok=True)


#### Sample QA dataset

In [8]:
qa_set = QASetRetriever.get_qasets(
    file_path=qa_data_set_file,
    sample_size=num_samples,
)

print(f"{len(qa_set)} questions sampled from QA corpus ({qa_data_set_file})")

100 questions sampled from QA corpus (../../data/qas/squad_qas.tsv)
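
`QASetRetriever.get_qasets` handles sampling inside `deh`, and this notebook does not show whether it is seeded. If it is not, rerunning the cells above would draw a different set of questions. A minimal sketch of seeded sampling with pandas, assuming a hypothetical two-column TSV (`question`, `answer`) rather than the real layout of `squad_qas.tsv`:

```python
import io

import pandas as pd

# Hypothetical two-column TSV; the real layout of squad_qas.tsv may differ.
tsv = io.StringIO(
    "question\tanswer\n"
    "Q1\tA1\n"
    "Q2\tA2\n"
    "Q3\tA3\n"
    "Q4\tA4\n"
    "Q5\tA5\n"
)
qa_df = pd.read_csv(tsv, sep="\t")

# A fixed random_state returns the same rows on every run, so the no-HYDE,
# p=0 and p=1 retrievals can be regenerated on an identical question sample.
sample_a = qa_df.sample(n=3, random_state=42)
sample_b = qa_df.sample(n=3, random_state=42)
print(sample_a)
```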


#### Get Similarity Scores based on original question

In [9]:

def convert(response) -> pd.DataFrame:
    """Converts retrieved JSON response to Pandas DataFrame"""
    return pd.json_normalize(
        data=response["response"], record_path="context", meta=["question", "execution_time"]
    )

def api_endpoint(**kwargs) -> str:
    """Endpoint for context retrieval without HYDE (h=false)."""
    query_params = "&".join([f"{key}={kwargs[key]}" for key in kwargs])
    return f"http://{settings.API_ANSWER_ENDPOINT}/context_retrieval?{query_params}"

# Collect response:
exp_df = generate_experiment_dataset(qa_set, convert, api_endpoint)

# Store dataframe:
exp_df.to_pickle(f"{experiment_folder}/no_hyde.pkl")
exp_df[0:1]



|   | id | page_content | type | metadata.source | metadata.similarity_score | question | execution_time |
|---|---|---|---|---|---|---|---|
| 0 |  | Before World War II, Fresno had many ethnic ne... | Document | /data/contexts/context_425.context | 0.226851 | What ethnic neighborhood in Fresno had primari... | 00:00:00 |
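
The `convert()` helper relies on `pd.json_normalize` to flatten each API response into one row per retrieved context. The payload below is a hypothetical reconstruction inferred from `convert()` and the columns in the table above; the real response may carry more fields.

```python
import pandas as pd

# Hypothetical payload shaped to match what convert() expects:
# response["response"] holds the question, the timing and a list of context records.
response = {
    "response": {
        "question": "Which Fresno neighborhood ...?",
        "execution_time": "00:00:00",
        "context": [
            {"id": None, "page_content": "Passage one ...", "type": "Document",
             "metadata": {"source": "/data/contexts/context_425.context", "similarity_score": 0.23}},
            {"id": None, "page_content": "Passage two ...", "type": "Document",
             "metadata": {"source": "/data/contexts/context_437.context", "similarity_score": 0.37}},
        ],
    }
}

# One row per context record; nested metadata keys become dotted column names,
# and the meta fields (question, execution_time) are repeated on every row.
df = pd.json_normalize(
    data=response["response"], record_path="context", meta=["question", "execution_time"]
)
print(df[["metadata.source", "metadata.similarity_score", "question"]])
```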


#### Get Similarity Scores based on HYDE question (prompt=0)

In [10]:
def hyde_convert(response) -> pd.DataFrame:
    """Converts retrieved JSON response to Pandas DataFrame"""
    return pd.json_normalize(
        data=response["response"],
        record_path="context",
        meta=["original_question", "question", "execution_time"],
    )

def hyde_api_endpoint(**kwargs) -> str:
    """Endpoint for context retrieval w/ hyde (h=true,p=0)."""
    query_params = "&".join([f"{key}={kwargs[key]}" for key in kwargs])
    return (
        f"http://{settings.API_ANSWER_ENDPOINT}/context_retrieval?h=True&{query_params}&p=0"
    )

# Collect response:
exp_df = generate_experiment_dataset(qa_set, hyde_convert, hyde_api_endpoint)

# Store dataframe:
exp_df.to_pickle(f"{experiment_folder}/hyde_p0_retrieval.pkl")
exp_df[0:1]


|   | id | page_content | type | metadata.source | metadata.similarity_score | original_question | question | execution_time |
|---|---|---|---|---|---|---|---|---|
| 0 |  | Before World War II, Fresno had many ethnic ne... | Document | /data/contexts/context_425.context | 0.269003 | What ethnic neighborhood in Fresno had primari... | The Japantown district in Fresno was a signifi... | 00:00:01 |
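
The endpoint builders above join `key=value` pairs without escaping. Unless `generate_experiment_dataset` encodes values before passing them in, a question containing `&` or `#` would be split or truncated in the query string. A sketch using the standard library's `urllib.parse.urlencode`, where the host `localhost:8000` and the parameter name `q` are hypothetical placeholders:

```python
from urllib.parse import parse_qs, urlencode, urlparse


def build_endpoint(host: str, path: str, **params) -> str:
    """Build a URL whose query string is percent-encoded (spaces become '+', '&' becomes '%26')."""
    return f"http://{host}/{path}?{urlencode(params)}"


# Hypothetical host and parameter names; h and p mirror the HYDE flags used above.
url = build_endpoint("localhost:8000", "context_retrieval", h=True, q="Who won in 1990 & why?", p=0)
print(url)

# Round-trip check: a standard query-string parser recovers the question intact.
print(parse_qs(urlparse(url).query)["q"])
```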


#### Get Similarity Scores based on HYDE question (prompt=1)

In [11]:
def hyde_convert(response) -> pd.DataFrame:
    """Converts retrieved JSON response to Pandas DataFrame"""
    return pd.json_normalize(
        data=response["response"],
        record_path="context",
        meta=["original_question", "question", "execution_time"],
    )

def hyde_api_endpoint(**kwargs) -> str:
    """Endpoint for context retrieval w/ hyde (h=true,p=1)."""
    query_params = "&".join([f"{key}={kwargs[key]}" for key in kwargs])
    return (
        f"http://{settings.API_ANSWER_ENDPOINT}/context_retrieval?h=True&{query_params}&p=1"
    )

# Collect response:
exp_df = generate_experiment_dataset(qa_set, hyde_convert, hyde_api_endpoint)

# Store dataframe:
exp_df.to_pickle(f"{experiment_folder}/hyde_p1_retrieval.pkl")
exp_df[0:1]


|   | id | page_content | type | metadata.source | metadata.similarity_score | original_question | question | execution_time |
|---|---|---|---|---|---|---|---|---|
| 0 |  | Before World War II, Fresno had many ethnic ne... | Document | /data/contexts/context_425.context | 0.297337 | What ethnic neighborhood in Fresno had primari... | The historic Japantown district in downtown Fr... | 00:00:02 |
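
The two HYDE cells differ only in the `p` value. A hedged sketch of a small factory that builds one endpoint function per prompt id, so the URL and the docstring are derived from the same number and cannot drift apart. `API_HOST` is a hypothetical stand-in for `settings.API_ANSWER_ENDPOINT`, and the URL format copies the cells above unchanged (including their unescaped parameters).

```python
from typing import Callable

API_HOST = "localhost:8000"  # hypothetical stand-in for settings.API_ANSWER_ENDPOINT


def make_hyde_endpoint(prompt_id: int) -> Callable[..., str]:
    """Return an endpoint builder for HYDE context retrieval using prompt `prompt_id`."""

    def endpoint(**kwargs) -> str:
        query_params = "&".join(f"{key}={value}" for key, value in kwargs.items())
        return f"http://{API_HOST}/context_retrieval?h=True&{query_params}&p={prompt_id}"

    # Derive the docstring from the same prompt_id used in the URL.
    endpoint.__doc__ = f"Endpoint for context retrieval w/ hyde (h=true,p={prompt_id})."
    return endpoint


hyde_endpoints = {p: make_hyde_endpoint(p) for p in (0, 1)}
print(hyde_endpoints[1](q="example"))
```

In the notebook this could be passed as, for example, `generate_experiment_dataset(qa_set, hyde_convert, hyde_endpoints[1])`, assuming `generate_experiment_dataset` calls the endpoint only with keyword arguments, as the original `**kwargs` signature suggests.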


#### Load and merge Experiment Datasets for comparison

In [4]:
# Load experiment results:
context_retr_df = pd.read_pickle(f"{experiment_folder}/no_hyde.pkl")
hyde_retr_p0_df = pd.read_pickle(f"{experiment_folder}/hyde_p0_retrieval.pkl")
hyde_retr_p1_df = pd.read_pickle(f"{experiment_folder}/hyde_p1_retrieval.pkl")


In [14]:
# Join the three result sets on (question, rank) for side-by-side comparison.
# q_index: the user's original question (the join key across runs).
# q_rank: 1-based position of each context within that question's results
#         (assumes the API returns contexts in ranked order).
context_retr_df = context_retr_df.reset_index(drop=True)
context_retr_df["q_index"] = context_retr_df["question"]
context_retr_df["q_rank"] = context_retr_df.groupby(["q_index"]).cumcount() + 1
hyde_retr_p0_df = hyde_retr_p0_df.reset_index(drop=True)
hyde_retr_p0_df["q_index"] = hyde_retr_p0_df["original_question"]
hyde_retr_p0_df["q_rank"] = hyde_retr_p0_df.groupby(["q_index"]).cumcount() + 1
hyde_retr_p1_df = hyde_retr_p1_df.reset_index(drop=True)
hyde_retr_p1_df["q_index"] = hyde_retr_p1_df["original_question"]
hyde_retr_p1_df["q_rank"] = hyde_retr_p1_df.groupby(["q_index"]).cumcount() + 1

combined_df = pd.merge(context_retr_df, hyde_retr_p0_df, on=["q_index", "q_rank"], suffixes=("", "_hyde_p0"))
combined_df = pd.merge(combined_df, hyde_retr_p1_df, on=["q_index", "q_rank"], suffixes=("", "_hyde_p1"))

combined_df[0:2]

|   | id | page_content | type | metadata.source | metadata.similarity_score | question | execution_time | q_index | q_rank | id_hyde_p0 | ... | question_hyde_p0 | execution_time_hyde_p0 | id_hyde_p1 | page_content_hyde_p1 | type_hyde_p1 | metadata.source_hyde_p1 | metadata.similarity_score_hyde_p1 | original_question_hyde_p1 | question_hyde_p1 | execution_time_hyde_p1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 |  | Before World War II, Fresno had many ethnic ne... | Document | /data/contexts/context_425.context | 0.226851 | What ethnic neighborhood in Fresno had primari... | 00:00:00 | What ethnic neighborhood in Fresno had primari... | 1 |  | ... | The Japantown district in Fresno was a signifi... | 00:00:01 |  | Before World War II, Fresno had many ethnic ne... | Document | /data/contexts/context_425.context | 0.297337 | What ethnic neighborhood in Fresno had primari... | The historic Japantown district in downtown Fr... | 00:00:02 |
| 1 |  | The "West Side" of Fresno, also often called "... | Document | /data/contexts/context_437.context | 0.370264 | What ethnic neighborhood in Fresno had primari... | 00:00:00 | What ethnic neighborhood in Fresno had primari... | 2 |  | ... | The Japantown district in Fresno was a signifi... | 00:00:01 |  | The "West Side" of Fresno, also often called "... | Document | /data/contexts/context_437.context | 0.498287 | What ethnic neighborhood in Fresno had primari... | The historic Japantown district in downtown Fr... | 00:00:02 |
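
Because the join key is (question, rank), each merged row pairs the k-th closest context from each method; the contexts themselves can differ. A toy illustration with hypothetical values, using the same key construction and suffix convention as the merge cell:

```python
import pandas as pd

# Toy stand-ins for the baseline and HYDE result tables (hypothetical values).
baseline = pd.DataFrame({
    "question": ["A", "A", "B"],
    "metadata.source": ["c1", "c2", "c3"],
    "metadata.similarity_score": [0.20, 0.40, 0.30],
})
hyde = pd.DataFrame({
    "original_question": ["A", "A", "B"],
    "metadata.source": ["c2", "c1", "c3"],
    "metadata.similarity_score": [0.25, 0.35, 0.10],
})

# Same keys as the notebook: question text plus 1-based rank within that question.
baseline["q_index"] = baseline["question"]
baseline["q_rank"] = baseline.groupby("q_index").cumcount() + 1
hyde["q_index"] = hyde["original_question"]
hyde["q_rank"] = hyde.groupby("q_index").cumcount() + 1

merged = pd.merge(baseline, hyde, on=["q_index", "q_rank"], suffixes=("", "_hyde_p0"))

# Rank-aligned rows do not necessarily hold the same context.
same_context = merged["metadata.source"] == merged["metadata.source_hyde_p0"]
print(merged[["q_index", "q_rank", "metadata.source", "metadata.source_hyde_p0"]])
print(same_context.tolist())  # [False, False, True]
```

If the goal were instead to compare how each method scores the same context, joining on `q_index` and `metadata.source` would do that, at the cost of dropping contexts that only one method retrieved.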


In [15]:
# metadata.similarity_score is a cosine distance, so lower is better.
# similarity_diff < 0 means p0 retrieved a closer context than p1 at that rank.
combined_df["similarity_diff"] = (
    combined_df["metadata.similarity_score_hyde_p0"] - combined_df["metadata.similarity_score_hyde_p1"]
)

# response_diff < 0 means the p0 request completed faster than the p1 request (in seconds).
combined_df["response_diff"] = (
    pd.to_timedelta(combined_df["execution_time_hyde_p0"]).dt.total_seconds()
    - pd.to_timedelta(combined_df["execution_time_hyde_p1"]).dt.total_seconds()
)

combined_df[0:1]

|   | id | page_content | type | metadata.source | metadata.similarity_score | question | execution_time | q_index | q_rank | id_hyde_p0 | ... | id_hyde_p1 | page_content_hyde_p1 | type_hyde_p1 | metadata.source_hyde_p1 | metadata.similarity_score_hyde_p1 | original_question_hyde_p1 | question_hyde_p1 | execution_time_hyde_p1 | similarity_diff | response_diff |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 |  | Before World War II, Fresno had many ethnic ne... | Document | /data/contexts/context_425.context | 0.226851 | What ethnic neighborhood in Fresno had primari... | 00:00:00 | What ethnic neighborhood in Fresno had primari... | 1 |  | ... |  | Before World War II, Fresno had many ethnic ne... | Document | /data/contexts/context_425.context | 0.297337 | What ethnic neighborhood in Fresno had primari... | The historic Japantown district in downtown Fr... | 00:00:02 | -0.028335 | -1.0 |
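
`execution_time` is stored as an `HH:MM:SS` string, so `pd.to_timedelta(...).dt.total_seconds()` yields whole seconds and each `response_diff` is an integer number of seconds, like the -1.0 in the row above. A toy example with hypothetical timings shows how a fractional mean arises from a few one-second differences rather than from a uniform speedup:

```python
import pandas as pd

# Hypothetical execution times in the same HH:MM:SS format as the notebook.
times_p0 = pd.Series(["00:00:01", "00:00:01", "00:00:02"])
times_p1 = pd.Series(["00:00:02", "00:00:01", "00:00:02"])

# Same conversion as the notebook: string -> Timedelta -> float seconds.
toy_diff = (
    pd.to_timedelta(times_p0).dt.total_seconds()
    - pd.to_timedelta(times_p1).dt.total_seconds()
)
print(toy_diff.tolist())  # [-1.0, 0.0, 0.0]
print(toy_diff.mean())    # one full-second difference averaged over three rows
```

If the API can report sub-second timings (for example as float seconds), recording those would make a difference on the order of 0.04 s measurable.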


##### Average Differences (p0 minus p1)

In [16]:
combined_df["response_diff"].mean()

-0.04

In [17]:
combined_df["similarity_diff"].mean()

-0.0050437915234864085
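
The two means above cover p0 versus p1 only. A sketch that adds the no-HYDE baseline and a simple per-row win rate, written as a function over the `combined_df` column names; the toy frame and its values are hypothetical, and on the real data it would be called as `summarize_distances(combined_df)`.

```python
import pandas as pd

SCORE_COLUMNS = {
    "no_hyde": "metadata.similarity_score",
    "hyde_p0": "metadata.similarity_score_hyde_p0",
    "hyde_p1": "metadata.similarity_score_hyde_p1",
}


def summarize_distances(df: pd.DataFrame) -> tuple[pd.Series, float]:
    """Mean cosine distance per method (lower is better) and the share of rows where p0 beats p1."""
    mean_distance = pd.Series({name: df[col].mean() for name, col in SCORE_COLUMNS.items()})
    p0_win_rate = float((df[SCORE_COLUMNS["hyde_p0"]] < df[SCORE_COLUMNS["hyde_p1"]]).mean())
    return mean_distance.sort_values(), p0_win_rate


# Hypothetical toy data with the same column names as combined_df.
toy_df = pd.DataFrame({
    "metadata.similarity_score": [0.30, 0.40, 0.20, 0.50],
    "metadata.similarity_score_hyde_p0": [0.25, 0.45, 0.22, 0.40],
    "metadata.similarity_score_hyde_p1": [0.28, 0.44, 0.21, 0.46],
})
means, win_rate = summarize_distances(toy_df)
print(means)
print(f"p0 closer than p1 on {win_rate:.0%} of rows")
```

A win rate near 50% alongside a small mean difference would suggest the prompts are close to interchangeable on distance; a paired test per question would be the next step if a firmer claim is needed.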

##### Conclusion
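On this 100-question sample, p0 seems to slightly outperform p1 on both measures: its retrieved contexts were on average about 0.005 closer in cosine distance, and its responses were on average 0.04 s faster. Both differences are small. Execution times are recorded in whole seconds, so the response-time gap is below the measurement resolution, and no significance test was run on either difference; p0 looks marginally better, but the evidence is weak.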