### Experiment: Effect of HYDE prompts on context retrieval distance and response time

**Background:**
HYDE (Hypothetical Document Embeddings) is an approach reported to improve the relevance and comprehensiveness of RAG context retrieval. It expands the query into a hypothetical document and uses that document, rather than the raw query, for context retrieval. The prompt used to generate the hypothetical document may affect both the LLM response time and the resulting embeddings.
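
The flow described above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `embed` is a bag-of-words stand-in for a real embedding model, `generate_hypothetical_document` is a hypothetical placeholder for the LLM call and returns a canned passage, and the two-entry corpus is made up. It is not the implementation behind the `deh` service.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real pipeline would call an embedding model."""
    return Counter(text.lower().split())


def cosine_distance(a: Counter, b: Counter) -> float:
    """1 - cosine similarity, so lower means closer."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0


def generate_hypothetical_document(question: str) -> str:
    """Hypothetical stand-in for the LLM call; returns a canned passage."""
    return "japanese families lived in japantown one of the ethnic neighborhoods of fresno"


def retrieve(query_text: str, corpus: list[str], k: int = 1) -> list[str]:
    """Return the k corpus entries closest to the query text."""
    query_vec = embed(query_text)
    return sorted(corpus, key=lambda doc: cosine_distance(query_vec, embed(doc)))[:k]


# Made-up corpus and question, for illustration only.
corpus = [
    "census reported population growth across california counties",
    "fresno had many ethnic neighborhoods including japantown and chinatown",
]
question = "where did japanese families live"

# Distance from the raw question to the relevant context.
plain_distance = cosine_distance(embed(question), embed(corpus[1]))

# Distance from the hypothetical document to the same context.
hyde_doc = generate_hypothetical_document(question)
hyde_distance = cosine_distance(embed(hyde_doc), embed(corpus[1]))

top_hit = retrieve(hyde_doc, corpus)[0]
print(f"plain: {plain_distance:.3f}  hyde: {hyde_distance:.3f}  top hit: {top_hit!r}")
```

In this toy case the raw question shares no words with the relevant context, while the hypothetical document does, so the expanded text lands closer to it. Real embeddings capture meaning rather than word overlap, so the size of the effect will differ.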

**Test Approach:**
A sample of questions is drawn from the QA corpus. Contexts are retrieved for each question in three ways: with the original question (no HYDE) and with each of two HYDE prompts (p=0 and p=1). The results are compared on cosine distance (lower is better) and on response time.
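
For reference, assuming the `metadata.similarity_score` values reported below are cosine distances (the metric used by the retrieval backend is not shown in this notebook), the distance between a query embedding $\mathbf{q}$ and a context embedding $\mathbf{c}$ would be:

$$
d_{\cos}(\mathbf{q}, \mathbf{c}) = 1 - \frac{\mathbf{q} \cdot \mathbf{c}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{c} \rVert}
$$

Under that assumption, 0 means the two embeddings point in the same direction, and larger values mean the context is less similar to the query.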


In [1]:
# Common import
from deh.assessment import QASetRetriever
from deh import settings
from deh.eval import generate_experiment_dataset

import pandas as pd
import os
from pathlib import Path

#### Test Configuration

In [2]:
num_samples: int = 100
experiment_folder: str = "../../data/evaluations/hyde-prompt-experiment/"
qa_data_set_file: str = "../../data/qas/squad_qas.tsv"

# Create the experiment folder (no-op if it already exists):
Path(experiment_folder).mkdir(parents=True, exist_ok=True)


#### Sample QA dataset

In [1]:
qa_set = QASetRetriever.get_qasets(
    file_path=qa_data_set_file,
    sample_size=num_samples,
)

print(f"{len(qa_set)} questions sampled from QA corpus ({qa_data_set_file})")

NameError: name 'QASetRetriever' is not defined

#### Get Similarity Scores based on original question

In [None]:

def convert(response) -> pd.DataFrame:
    """Converts retrieved JSON response to Pandas DataFrame"""
    return pd.json_normalize(
        data=response["response"], record_path="context", meta=["question", "execution_time"]
    )

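from urllib.parse import urlencode

def api_endpoint(**kwargs) -> str:
    """Endpoint for context retrieval without HYDE (`h` is left unset)."""
    # URL-encode the values so characters such as "&" or "#" in a question
    # cannot corrupt the query string.
    query_params = urlencode(kwargs)
    return f"http://{settings.API_ANSWER_ENDPOINT}/context_retrieval?{query_params}"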

# Collect response:
exp_df = generate_experiment_dataset(qa_set, convert, api_endpoint)

# Store dataframe:
exp_df.to_pickle(Path(experiment_folder) / "no_hyde.pkl")
exp_df[0:1]


#### Get Similarity Scores based on HYDE question (prompt=0)

In [None]:
def hyde_convert(response) -> pd.DataFrame:
    """Converts retrieved JSON response to Pandas DataFrame"""
    return pd.json_normalize(
        data=response["response"],
        record_path="context",
        meta=["original_question", "question", "execution_time"],
    )

from urllib.parse import urlencode

def hyde_api_endpoint(**kwargs) -> str:
    """Endpoint for context retrieval w/ hyde (h=true,p=0)."""
    # URL-encode the values so characters such as "&" or "#" in a question
    # cannot corrupt the query string.
    query_params = urlencode(kwargs)
    return (
        f"http://{settings.API_ANSWER_ENDPOINT}/context_retrieval?h=True&{query_params}&p=0"
    )

# Collect response:
exp_df = generate_experiment_dataset(qa_set, hyde_convert, hyde_api_endpoint)

# Store dataframe:
exp_df.to_pickle(Path(experiment_folder) / "hyde_p0_retrieval.pkl")
exp_df[0:1]

#### Get Similarity Scores based on HYDE question (prompt=1)

In [None]:
def hyde_convert(response) -> pd.DataFrame:
    """Converts retrieved JSON response to Pandas DataFrame"""
    return pd.json_normalize(
        data=response["response"],
        record_path="context",
        meta=["original_question", "question", "execution_time"],
    )

from urllib.parse import urlencode

def hyde_api_endpoint(**kwargs) -> str:
    """Endpoint for context retrieval w/ hyde (h=true,p=1)."""
    # URL-encode the values so characters such as "&" or "#" in a question
    # cannot corrupt the query string.
    query_params = urlencode(kwargs)
    return (
        f"http://{settings.API_ANSWER_ENDPOINT}/context_retrieval?h=True&{query_params}&p=1"
    )

# Collect response:
exp_df = generate_experiment_dataset(qa_set, hyde_convert, hyde_api_endpoint)

# Store dataframe:
exp_df.to_pickle(Path(experiment_folder) / "hyde_p1_retrieval.pkl")
exp_df[0:1]

#### Load and merge Experiment Datasets for comparison

In [3]:
# Load experiment results (Path joins correctly with or without a trailing slash):
context_retr_df = pd.read_pickle(Path(experiment_folder) / "no_hyde.pkl")
hyde_retr_p0_df = pd.read_pickle(Path(experiment_folder) / "hyde_p0_retrieval.pkl")
hyde_retr_p1_df = pd.read_pickle(Path(experiment_folder) / "hyde_p1_retrieval.pkl")


In [4]:
# Merge the datasets on question and retrieval rank for comparison.
# The baseline keys on "question"; the HYDE runs key on "original_question",
# because their "question" column holds the generated hypothetical document.

context_retr_df["q_index"] = context_retr_df["question"]
context_retr_df["q_rank"] = context_retr_df.groupby("q_index").cumcount() + 1
context_retr_df = context_retr_df.reset_index(drop=True)  # reset_index returns a copy, so assign it back
hyde_retr_p0_df["q_index"] = hyde_retr_p0_df["original_question"]
hyde_retr_p0_df["q_rank"] = hyde_retr_p0_df.groupby("q_index").cumcount() + 1
hyde_retr_p0_df = hyde_retr_p0_df.reset_index(drop=True)
hyde_retr_p1_df["q_index"] = hyde_retr_p1_df["original_question"]
hyde_retr_p1_df["q_rank"] = hyde_retr_p1_df.groupby("q_index").cumcount() + 1
hyde_retr_p1_df = hyde_retr_p1_df.reset_index(drop=True)

combined_df = pd.merge(context_retr_df, hyde_retr_p0_df, on=["q_index", "q_rank"], suffixes=("", "_hyde_p0"))
combined_df = pd.merge(combined_df, hyde_retr_p1_df, on=["q_index", "q_rank"], suffixes=("", "_hyde_p1"))

combined_df[0:2]

Unnamed: 0,id,page_content,type,metadata.source,metadata.similarity_score,question,execution_time,q_index,q_rank,id_hyde_p0,...,question_hyde_p0,execution_time_hyde_p0,id_hyde_p1,page_content_hyde_p1,type_hyde_p1,metadata.source_hyde_p1,metadata.similarity_score_hyde_p1,original_question_hyde_p1,question_hyde_p1,execution_time_hyde_p1
0,,"Before World War II, Fresno had many ethnic ne...",Document,/data/contexts/context_425.context,0.226851,What ethnic neighborhood in Fresno had primari...,00:00:00,What ethnic neighborhood in Fresno had primari...,1,,...,The Japantown district in Fresno was a signifi...,00:00:01,,"Before World War II, Fresno had many ethnic ne...",Document,/data/contexts/context_425.context,0.297337,What ethnic neighborhood in Fresno had primari...,The historic Japantown district in downtown Fr...,00:00:02
1,,"The ""West Side"" of Fresno, also often called ""...",Document,/data/contexts/context_437.context,0.370264,What ethnic neighborhood in Fresno had primari...,00:00:00,What ethnic neighborhood in Fresno had primari...,2,,...,The Japantown district in Fresno was a signifi...,00:00:01,,"The ""West Side"" of Fresno, also often called ""...",Document,/data/contexts/context_437.context,0.498287,What ethnic neighborhood in Fresno had primari...,The historic Japantown district in downtown Fr...,00:00:02
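
The pairing step can be seen in isolation on a small made-up example. The frames and scores below are invented for illustration, and `validate="one_to_one"` is an optional safeguard that the notebook itself does not use. Each row is keyed by its question and by its rank within that question, so the rank-1 baseline context is compared with the rank-1 HYDE context, and so on.

```python
import pandas as pd

# Made-up retrieval results: two questions, two ranked contexts each.
baseline = pd.DataFrame({
    "question": ["q1", "q1", "q2", "q2"],
    "score": [0.20, 0.35, 0.50, 0.60],
})
hyde = pd.DataFrame({
    "original_question": ["q1", "q1", "q2", "q2"],
    "score": [0.25, 0.30, 0.40, 0.65],
})

# Key each frame by question, then number the rows within each question.
baseline["q_index"] = baseline["question"]
hyde["q_index"] = hyde["original_question"]
for frame in (baseline, hyde):
    frame["q_rank"] = frame.groupby("q_index").cumcount() + 1

# validate="one_to_one" raises if a (q_index, q_rank) key is duplicated on either side.
paired = pd.merge(
    baseline,
    hyde,
    on=["q_index", "q_rank"],
    suffixes=("", "_hyde"),
    validate="one_to_one",
)
print(paired[["q_index", "q_rank", "score", "score_hyde"]])
```

Because the rank comes from row order, this only works if each result set is already sorted by distance within a question. Note that the two contexts paired at a given rank may be different documents.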


In [7]:
# Distance difference, p0 minus p1 (scores are distances, so a negative value means p0 retrieved closer contexts):
combined_df["similarity_diff"] = combined_df["metadata.similarity_score_hyde_p0"] - combined_df["metadata.similarity_score_hyde_p1"]

# Response-time difference in seconds, p0 minus p1 (a negative value means p0 was faster):
combined_df["response_diff"] = (
    pd.to_timedelta(combined_df["execution_time_hyde_p0"]).dt.total_seconds()
    - pd.to_timedelta(combined_df["execution_time_hyde_p1"]).dt.total_seconds()
)

combined_df[0:10]

Unnamed: 0,id,page_content,type,metadata.source,metadata.similarity_score,question,execution_time,q_index,q_rank,id_hyde_p0,...,id_hyde_p1,page_content_hyde_p1,type_hyde_p1,metadata.source_hyde_p1,metadata.similarity_score_hyde_p1,original_question_hyde_p1,question_hyde_p1,execution_time_hyde_p1,similarity_diff,response_diff
0,,"Before World War II, Fresno had many ethnic ne...",Document,/data/contexts/context_425.context,0.226851,What ethnic neighborhood in Fresno had primari...,00:00:00,What ethnic neighborhood in Fresno had primari...,1,,...,,"Before World War II, Fresno had many ethnic ne...",Document,/data/contexts/context_425.context,0.297337,What ethnic neighborhood in Fresno had primari...,The historic Japantown district in downtown Fr...,00:00:02,-0.028335,-1.0
1,,"The ""West Side"" of Fresno, also often called ""...",Document,/data/contexts/context_437.context,0.370264,What ethnic neighborhood in Fresno had primari...,00:00:00,What ethnic neighborhood in Fresno had primari...,2,,...,,"The ""West Side"" of Fresno, also often called ""...",Document,/data/contexts/context_437.context,0.498287,What ethnic neighborhood in Fresno had primari...,The historic Japantown district in downtown Fr...,00:00:02,-0.074506,-1.0
2,,The 2010 United States Census reported that Fr...,Document,/data/contexts/context_444.context,0.432765,What ethnic neighborhood in Fresno had primari...,00:00:00,What ethnic neighborhood in Fresno had primari...,3,,...,,While many homes in the neighborhood date back...,Document,/data/contexts/context_439.context,0.523486,What ethnic neighborhood in Fresno had primari...,The historic Japantown district in downtown Fr...,00:00:02,-0.063884,-1.0
3,,While many homes in the neighborhood date back...,Document,/data/contexts/context_439.context,0.479242,What ethnic neighborhood in Fresno had primari...,00:00:00,What ethnic neighborhood in Fresno had primari...,4,,...,,The 2010 United States Census reported that Fr...,Document,/data/contexts/context_444.context,0.557562,What ethnic neighborhood in Fresno had primari...,The historic Japantown district in downtown Fr...,00:00:02,-0.05377,-1.0
4,,"Fresno (/ˈfrɛznoʊ/ FREZ-noh), the county seat ...",Document,/data/contexts/context_423.context,0.501168,What ethnic neighborhood in Fresno had primari...,00:00:00,What ethnic neighborhood in Fresno had primari...,5,,...,,"The neighborhood includes Kearney Boulevard, n...",Document,/data/contexts/context_438.context,0.590314,What ethnic neighborhood in Fresno had primari...,The historic Japantown district in downtown Fr...,00:00:02,-0.060419,-1.0
5,,"As of the census of 2000, there were 427,652 p...",Document,/data/contexts/context_446.context,0.513339,What ethnic neighborhood in Fresno had primari...,00:00:00,What ethnic neighborhood in Fresno had primari...,6,,...,,"Between the 1880s and World War II, Downtown F...",Document,/data/contexts/context_429.context,0.591934,What ethnic neighborhood in Fresno had primari...,The historic Japantown district in downtown Fr...,00:00:02,-0.0561,-1.0
6,,One key figure in the plans for what would com...,Document,/data/contexts/context_1062.context,0.579667,"What was Isiah Bowman nick name, as known by t...",00:00:00,"What was Isiah Bowman nick name, as known by t...",1,,...,,One key figure in the plans for what would com...,Document,/data/contexts/context_1062.context,0.396352,"What was Isiah Bowman nick name, as known by t...","Isaiah Bowman, a renowned American geographer ...",00:00:01,-0.064123,0.0
7,,Other prominent alumni include anthropologists...,Document,/data/contexts/context_750.context,0.682618,"What was Isiah Bowman nick name, as known by t...",00:00:00,"What was Isiah Bowman nick name, as known by t...",2,,...,,Other prominent alumni include anthropologists...,Document,/data/contexts/context_750.context,0.592633,"What was Isiah Bowman nick name, as known by t...","Isaiah Bowman, a renowned American geographer ...",00:00:01,-0.056352,0.0
8,,Other: Civil rights leader W. E. B. Du Bois; p...,Document,/data/contexts/context_650.context,0.700925,"What was Isiah Bowman nick name, as known by t...",00:00:00,"What was Isiah Bowman nick name, as known by t...",3,,...,,The Royal Geographical Society of London and o...,Document,/data/contexts/context_1034.context,0.596248,"What was Isiah Bowman nick name, as known by t...","Isaiah Bowman, a renowned American geographer ...",00:00:01,-0.054344,0.0
9,,"In business, notable alumni include Microsoft ...",Document,/data/contexts/context_744.context,0.72776,"What was Isiah Bowman nick name, as known by t...",00:00:00,"What was Isiah Bowman nick name, as known by t...",4,,...,,"Orientalism, as theorized by Edward Said, refe...",Document,/data/contexts/context_1037.context,0.640191,"What was Isiah Bowman nick name, as known by t...","Isaiah Bowman, a renowned American geographer ...",00:00:01,-0.057987,0.0


##### Average Response Time and Similarity Differences

In [6]:
combined_df["response_diff"].mean()

-0.04

In [9]:
combined_df.describe()

| statistic | metadata.similarity_score | q_rank | metadata.similarity_score_hyde_p0 | metadata.similarity_score_hyde_p1 | similarity_diff | response_diff |
|---|---|---|---|---|---|---|
| count | 600.0 | 600.0 | 600.0 | 600.0 | 600.0 | 600.0 |
| mean | 0.502589 | 3.5 | 0.460791 | 0.465835 | -0.005044 | -0.04 |
| std | 0.117522 | 1.70925 | 0.13438 | 0.138422 | 0.062293 | 0.615656 |
| min | 0.226851 | 1.0 | 0.129452 | 0.110092 | -0.226766 | -2.0 |
| 25% | 0.421606 | 2.0 | 0.364758 | 0.360483 | -0.03844 | 0.0 |
| 50% | 0.501965 | 3.5 | 0.457766 | 0.478531 | -0.001102 | 0.0 |
| 75% | 0.581183 | 5.0 | 0.555248 | 0.572856 | 0.029025 | 0.0 |
| max | 0.788057 | 6.0 | 0.801517 | 0.788275 | 0.225366 | 1.0 |


In [8]:
combined_df["similarity_diff"].mean()

-0.0050437915234864085
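
One rough way to judge whether means this small stand out from the noise is to divide each mean by its standard error. This sketch uses only the summary statistics from the `describe()` output above. It treats the 600 rows as independent, which is an optimistic assumption: each question contributes six rows, and all six share the same response time, so the effective sample is closer to the 100 questions.

```python
import math

# Summary statistics copied from the describe() output above.
n = 600
stats = {
    "similarity_diff": {"mean": -0.005044, "std": 0.062293},
    "response_diff": {"mean": -0.04, "std": 0.615656},
}


def mean_over_standard_error(mean: float, std: float, n: int) -> float:
    """Ratio of the sample mean to its standard error (std / sqrt(n))."""
    return mean / (std / math.sqrt(n))


ratios = {
    name: mean_over_standard_error(s["mean"], s["std"], n)
    for name, s in stats.items()
}
for name, ratio in ratios.items():
    print(f"{name}: mean / standard error = {ratio:.2f}")
```

Ratios with a magnitude near or below 2 suggest the difference is hard to tell apart from sampling noise. A test paired at the question level would give a more reliable answer.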

##### Conclusion
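On average, p0 slightly outperforms p1 on both measures: its mean distance is about 0.005 lower (0.4608 vs. 0.4658) and its mean response time is 0.04 s lower. Both gaps are small relative to their spread (standard deviations of 0.062 and 0.62 s), and the median differences are close to zero (-0.001 and 0.0 s), so the advantage is marginal. In this sample, both HYDE prompts also retrieve closer contexts on average than the original question alone (mean distance 0.5026).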