# **Introduction**
HyDE RAG is a variant of Retrieval-Augmented Generation (RAG) built on Hypothetical Document Embeddings (HyDE). The version described here combines hybrid retrieval (dense and sparse) with dynamic, query-specific embedding generation to improve document retrieval and response generation.

Unlike a basic RAG pipeline, which typically relies on a single vector-based retriever, HyDE combines multiple retrieval strategies with dynamic embedding transformations to improve search results. This is intended to help with long-tail queries, multi-turn dialogues, and low-resource domains.

# **Concepts of HyDE**
**HyDE builds upon three core principles:**


## **1. Hybrid Retrieval: Combining Dense and Sparse Methods**
HyDE integrates:
* Dense Retrieval (Vector Search): Uses embeddings from models such as OpenAI embeddings or BERT-based encoders, indexed in a vector store such as FAISS, to retrieve semantically similar documents.
* Sparse Retrieval (Keyword Search): Uses term-frequency methods such as TF-IDF or BM25 to capture keyword-based relevance.
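
One common way to combine a dense and a sparse ranking is Reciprocal Rank Fusion (RRF). The sketch below is an illustration only: the implementation in this document merges results by simple de-duplication rather than RRF, and the function name, the document IDs, and the constant `k=60` are assumptions chosen for the example.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default, assumed here.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for determinism
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from a sparse and a dense retriever
sparse_hits = ["doc_a", "doc_b", "doc_c"]
dense_hits = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([sparse_hits, dense_hits])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents that rank well in both lists (`doc_b`, `doc_a`) rise to the top, while documents found by only one retriever are kept but ranked lower.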

## **2. Dynamic Embedding Generation**

* Instead of relying on static embeddings, HyDE dynamically generates query-specific embeddings based on synthetic document generation:

  * The LLM first hallucinates a potential response based on the query.
  * This response is embedded and used as a query to retrieve more relevant documents.
  * The retrieved documents refine the final response.
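
The first two steps above can be sketched without any external service. In this minimal example the bag-of-words `embed` function and the canned `fake_llm` are hypothetical stand-ins for a real embedding model and a real LLM call. They only show the control flow: the hypothetical answer, not the raw query, is what gets embedded.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words term-count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hyde_retrieve(query: str, generate: Callable[[str], str],
                  corpus: List[str], top_k: int = 1) -> List[str]:
    hypothetical = generate(query)   # step 1: draft a hypothetical answer
    query_vec = embed(hypothetical)  # step 2: embed the draft, not the raw query
    ranked = sorted(corpus, key=lambda doc: cosine(query_vec, embed(doc)), reverse=True)
    return ranked[:top_k]            # step 3 (answer generation) would consume these


def fake_llm(query: str) -> str:
    """Hypothetical stand-in for an LLM call; returns a canned draft answer."""
    return "The Bastille was stormed in 1789, starting the French Revolution."


corpus = [
    "The French Revolution began in 1789 with the storming of the Bastille.",
    "Louis XVI was executed in 1793 during the Reign of Terror.",
]
top = hyde_retrieve("When did the uprising in Paris start?", fake_llm, corpus)
print(top)
```

The draft answer shares terms such as "Bastille" and "1789" with the first document even though the user's question does not, which is the effect HyDE relies on.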

## **3. Self-Improving Iterative Retrieval**
* HyDE iteratively re-evaluates the retrieved documents to improve query resolution.
* Multiple retrieval passes refine and expand the knowledge base.
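
A multi-pass loop of this kind might look like the sketch below. It is one possible reading of iterative retrieval, not the method used in the implementation that follows: each pass expands the query with the text of the documents found so far. The keyword-overlap retriever and all function names are assumptions made for the example.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase a string and return its set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_retrieve(query: str, corpus: List[str], top_k: int) -> List[str]:
    """Toy retriever: rank documents by the number of terms shared with the query."""
    q = tokenize(query)
    scored = [(len(q & tokenize(doc)), doc) for doc in corpus]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])
    return [doc for _, doc in ranked[:top_k]]


def iterative_retrieve(query: str, corpus: List[str],
                       passes: int = 2, top_k: int = 1) -> List[str]:
    collected: List[str] = []
    current_query = query
    for _ in range(passes):
        # Only consider documents that earlier passes have not returned
        candidates = [d for d in corpus if d not in collected]
        hits = keyword_retrieve(current_query, candidates, top_k)
        if not hits:
            break  # nothing new to add, so stop early
        collected.extend(hits)
        # Expand the next query with the text of everything found so far
        current_query = query + " " + " ".join(collected)
    return collected


corpus = [
    "The Bastille was stormed in 1789.",
    "Events of 1789 led to the execution of Louis XVI.",
    "Napoleon rose to power in 1799.",
]
result = iterative_retrieve("Bastille stormed", corpus)
print(result)
```

The second pass reaches the document about Louis XVI only because the first hit contributed the term "1789" to the expanded query.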


# **Applications of HyDE**
1. Legal & Research Applications
   * Improves retrieval of complex case laws and research articles by combining semantic understanding and keyword-based matching.

2. Enterprise Search Systems
   * Enhances internal knowledge base searches, reducing irrelevant document retrieval.

3. Medical and Scientific Literature
   * Retrieves highly relevant documents using dynamic embeddings that adapt to medical terminology.

4. Open-Domain Q&A Systems
   * Effective in multi-turn dialogues, refining responses dynamically.

5. AI-Powered Code Search
   * Enhances API documentation retrieval by matching queries with sparse keywords and semantic embeddings.

# **Implementation**

In [12]:
# hashlib ships with the Python standard library, so it is not installed via pip
!pip install -qU langchain langchain-openai langchain-community faiss-cpu rank_bm25

In [2]:
from google.colab import userdata

# Read the OpenAI API key from Colab's secrets store
openai_api = userdata.get("OPENAI_API_KEY")

In [13]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableParallel, RunnablePassthrough, RunnableLambda
from langchain_core.output_parsers import StrOutputParser
from langchain_community.retrievers import BM25Retriever
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.documents import Document
from typing import List, Dict
import hashlib

In [4]:
# 1. Initialize components
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0, openai_api_key=openai_api)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small", openai_api_key=openai_api)

In [14]:
# Generate documents with unique IDs
raw_documents = [
    "The French Revolution began in 1789 with the storming of the Bastille.",
    "Louis XVI was executed in 1793 during the Reign of Terror.",
    "The Revolution led to the rise of Napoleon Bonaparte in 1799."
]

# Create documents with hashed IDs
documents = [
    Document(
        page_content=content,
        metadata={"id": hashlib.md5(content.encode()).hexdigest()}
    ) for content in raw_documents
]

# Split documents into chunks; each chunk inherits its parent document's metadata
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
split_docs = text_splitter.split_documents(documents)

In [15]:
# Add chunk IDs to split documents
for idx, doc in enumerate(split_docs):
    doc.metadata["chunk_id"] = f"{doc.metadata['id']}_{idx}"

# 2. Hybrid retriever: query both retrievers and merge their results
class HybridRetriever:
    def __init__(self, sparse_retriever, dense_retriever):
        self.sparse_retriever = sparse_retriever
        self.dense_retriever = dense_retriever

    def invoke(self, query: str) -> List[Document]:
        sparse_docs = self.sparse_retriever.invoke(query)
        dense_docs = self.dense_retriever.invoke(query)
        return self._merge_results(sparse_docs, dense_docs)

    def _merge_results(self, sparse: List[Document], dense: List[Document]) -> List[Document]:
        # Concatenate sparse and dense hits, keeping the first occurrence of each chunk
        merged, seen = [], set()
        for doc in sparse + dense:
            chunk_id = doc.metadata["chunk_id"]
            if chunk_id not in seen:
                seen.add(chunk_id)
                merged.append(doc)
        return merged

# Create retrievers with chunk IDs
bm25_retriever = BM25Retriever.from_documents(split_docs)
bm25_retriever.k = 3

vectorstore = FAISS.from_documents(split_docs, embeddings)
faiss_retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

# Initialize hybrid retriever
hybrid_retriever = HybridRetriever(bm25_retriever, faiss_retriever)

In [16]:
# 3. HyDE implementation
hyde_prompt = ChatPromptTemplate.from_template(
    """Generate a hypothetical answer to the following query, even if you don't know the answer.
    Include relevant entities, dates, and concepts that would be found in authoritative documents.

    Query: {query}
    Hypothetical Answer:"""
)

# Define the HyDE workflow: generate the hypothetical answer and keep the
# original query string alongside it
hyde_chain = RunnableParallel(
    # Hypothetical answer generated by the LLM, used as the retrieval query
    hyde_query=hyde_prompt | llm | StrOutputParser(),
    # Original user question, passed through unchanged as a plain string
    original_query=lambda x: x["query"],
)

# 4. Full HyDE RAG pipeline
final_prompt = ChatPromptTemplate.from_messages([
    ("system", """Answer the user's question using both the original query and retrieved documents.
    Original Question: {original_query}
    Retrieved Context: {context}"""),
    ("human", "Answer this question: {original_query}")
])

full_hyde_chain = (
    hyde_chain
    | {
        "context": lambda x: hybrid_retriever.invoke(x["hyde_query"]),
        "original_query": lambda x: x["original_query"]
    }
    | {
        "context": lambda x: "\n\n".join([doc.page_content for doc in x["context"]]),
        "original_query": lambda x: x["original_query"]
    }
    | final_prompt
    | llm
    | StrOutputParser()
)

In [17]:
# 5. Test the implementation
response = full_hyde_chain.invoke({"query": "What were the main causes and consequences of the French Revolution?"})
print("HyDE Response:\n", response)

HyDE Response:
 The main causes of the French Revolution included social inequality, economic hardship, and political discontent. The revolution began in 1789 with the storming of the Bastille and led to significant consequences such as the execution of Louis XVI in 1793 during the Reign of Terror. Additionally, the Revolution ultimately resulted in the rise of Napoleon Bonaparte in 1799, who played a significant role in shaping the future of France and Europe.


## **Key Features:**
**Hybrid Retrieval:**

* Combines BM25 (sparse) and FAISS (dense) retrieval methods

* Merges results while removing duplicates using chunk IDs


**Hypothetical Document Embedding (HyDE):**

* Generates hypothetical answers using GPT-3.5-turbo

* Uses these hypothetical answers as queries for retrieval

* Maintains original query context for final answer generation