### HyDE
What is HyDE?

HyDE (Hypothetical Document Embeddings) is a retrieval technique in which, instead of embedding the user's query directly, you first use an LLM to generate a hypothetical answer (a document) to the query, and then embed that hypothetical document to search your vector store.

HyDE bridges the gap between user intent and relevant content, especially when:

1. Queries are short.
2. There is a language mismatch between the query and the documents.
3. You want to retrieve based on answer content, not question words.
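To make the idea concrete before bringing in any libraries, here is a minimal, self-contained sketch. Everything in it is a stand-in: `embed` is a toy bag-of-words embedding, `fake_llm` is a hard-coded placeholder for an LLM call, and the two documents are made up for illustration. It only shows why embedding a hypothetical answer can land closer to the relevant passage than embedding the question words.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fake_llm(query: str) -> str:
    """Hypothetical placeholder for an LLM that drafts an answer to the query."""
    return "she won the fields medal the most prestigious award in mathematics"


# Made-up corpus: one passage with the answer, one that merely echoes question words.
documents = [
    "mirzakhani was honored with the fields medal the most prestigious award in mathematics",
    "what prizes are special and how many prizes exist in total",
]


def retrieve(query_vector: Counter) -> str:
    """Return the document most similar to the given vector."""
    return max(documents, key=lambda doc: cosine(query_vector, embed(doc)))


query = "how many special prizes in total"
plain_hit = retrieve(embed(query))           # embeds the question words directly
hyde_hit = retrieve(embed(fake_llm(query)))  # embeds the hypothetical answer instead

print("plain:", plain_hit)
print("hyde :", hyde_hit)
```

In this toy setup the plain query matches the passage that repeats the question words, while the hypothetical answer matches the passage that actually contains the answer. A real embedding model is far less literal than word counts, so the effect is usually smaller in practice.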

In [12]:
from langchain_community.document_loaders import WikipediaLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_classic.chains.hyde.base import HypotheticalDocumentEmbedder
from langchain_classic.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import PromptTemplate
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS, Chroma
from langchain_groq import ChatGroq


In [4]:
# Load up to five Wikipedia articles for the query "مریم میرزاخانی" (Maryam Mirzakhani)
loader = WikipediaLoader(query="مریم میرزاخانی", load_max_docs=5)
documents = loader.load()

# Split the articles into chunks of at most 500 characters, overlapping by 100
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
docs = text_splitter.split_documents(documents=documents)
docs


[Document(metadata={'title': 'Maryam Mirzakhani', 'summary': 'Maryam Mirzakhani (Persian: مریم میرزاخانی, pronounced [mæɾˈjæm miːɾzɑːxɑːˈniː]; 12 May 1977 – 14 July 2017) was an Iranian mathematician and a professor of mathematics at Stanford University. Her research topics included Teichmüller theory, hyperbolic geometry, ergodic theory, and symplectic geometry. On 13 August 2014, Mirzakhani was honored with the Fields Medal, the most prestigious award in mathematics, becoming the first woman to win the prize, as well as the first Iranian. The award committee cited her work in "the dynamics and geometry of Riemann surfaces and their moduli spaces". Mirzakhani was considered a leading force in the fields of hyperbolic geometry, topology and dynamics.\nThroughout her career, she achieved milestones that cemented her reputation as one of the greatest mathematicians of her time, such as the "magic wand theorem", which tied together fields such as dynamical systems, geometry, and topology.
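The effect of `chunk_size` and `chunk_overlap` can be illustrated with a simplified fixed-window splitter. This is only a sketch under stated assumptions: `sliding_chunks` is a hypothetical helper, and it cuts at exact character offsets, whereas `RecursiveCharacterTextSplitter` prefers to split on separators such as paragraph and line breaks and treats the chunk size as an upper bound.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Fixed-window splitter: consecutive chunks share `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new chunk starts after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current chunk already reaches the end of the text
    return chunks


# Synthetic 1200-character text, using the same settings as the cell above.
text = "".join(str(i % 10) for i in range(1200))
chunks = sliding_chunks(text, chunk_size=500, chunk_overlap=100)

print([len(chunk) for chunk in chunks])
print(chunks[0][-100:] == chunks[1][:100])
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.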

In [6]:
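import os
from dotenv import load_dotenv

# Load environment variables (including GROQ_API, the Groq API key) from a .env file
load_dotenv()

groq_llm = ChatGroq(model="llama-3.1-8b-instant", api_key=os.getenv("GROQ_API"), temperature=0.2)

# Multilingual sentence embeddings, so Persian queries can be compared with English text
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

# Baseline vector store: chunks and queries are both embedded directly
vectorstore = Chroma.from_documents(documents=docs, embedding=embedding_model, persist_directory="output/maryam_mirzakhani.db")

# Baseline retriever using maximal marginal relevance (MMR)
retriever = vectorstore.as_retriever(
    search_type="mmr",  # one of 'similarity', 'similarity_score_threshold', 'mmr'
    search_kwargs={"k": 3, "lambda_mult": 0.7},
)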
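For readers unfamiliar with `lambda_mult`, the following is a rough, dependency-free sketch of the usual MMR scoring rule: each candidate is scored as `lambda_mult * relevance - (1 - lambda_mult) * redundancy`, where redundancy is its highest similarity to anything already selected. The `mmr` function and the 2-D vectors are illustrative assumptions, not the retriever's actual implementation, which also prefetches a larger candidate pool before re-ranking.

```python
import math


def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0


def mmr(query_vec, doc_vecs, k, lambda_mult):
    """Greedy maximal marginal relevance; returns the indices of the chosen documents."""
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    selected = []
    while len(selected) < min(k, len(doc_vecs)):
        best_idx, best_score = None, -math.inf
        for i, d in enumerate(doc_vecs):
            if i in selected:
                continue
            # Redundancy: similarity to the closest already-selected document.
            redundancy = max((cosine(d, doc_vecs[j]) for j in selected), default=0.0)
            score = lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
    return selected


query_vec = [1.0, 0.0]
doc_vecs = [
    [1.0, 0.10],   # most relevant
    [1.0, 0.15],   # near-duplicate of the first document
    [1.0, -0.20],  # slightly less relevant, but points in a different direction
]

pure_relevance = mmr(query_vec, doc_vecs, k=2, lambda_mult=1.0)
balanced = mmr(query_vec, doc_vecs, k=2, lambda_mult=0.7)
print(pure_relevance, balanced)
```

With `lambda_mult=1.0` the ranking is pure similarity, so the near-duplicate is picked second. At `0.7` the redundancy penalty is enough to prefer the more distinct third vector.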

Built-in `prompt_key` options for `HypotheticalDocumentEmbedder.from_llm`:

- web_search
- sci_fact
- arguana
- trec_covid
- fiqa
- dbpedia_entity
- trec_news
- mr_tydi

In [9]:
# Custom HyDE prompt in Persian; in English: "Write a short answer: {query}"
custom = PromptTemplate.from_template(
    "یک پاسخ کوتاه بساز: {query}"
)

In [10]:
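# Wrap the base embedding model so that queries are embedded via an LLM-generated hypothetical answer
hyde_embedding_function = HypotheticalDocumentEmbedder.from_llm(
    llm=groq_llm,
    base_embeddings=embedding_model,
    # prompt_key="web_search",  # alternative: one of the built-in keys listed above this cell
    custom_prompt=custom,
)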

In [11]:
# Rebuild the vector store with the HyDE embedder (this replaces the earlier `vectorstore`)
vectorstore = Chroma.from_documents(
    documents=docs,
    embedding=hyde_embedding_function,
    persist_directory="output/langchain"
)
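One detail worth spelling out: to my understanding, the HyDE wrapper only involves the LLM at query time, while stored chunks are embedded directly with the base model, so building the index above should not trigger any LLM calls. The sketch below mimics that asymmetry with stand-ins. `ToyHydeEmbedder`, `toy_embed`, and `toy_generate` are hypothetical names for illustration and are not part of LangChain.

```python
class ToyHydeEmbedder:
    """Illustrative stand-in that mirrors the document/query asymmetry of a HyDE embedder."""

    def __init__(self, generate, embed_texts):
        self.generate = generate        # callable: query -> hypothetical answer text
        self.embed_texts = embed_texts  # callable: list of texts -> list of vectors
        self.llm_calls = 0

    def embed_documents(self, texts):
        # Indexing path: chunks are embedded as-is, with no LLM involved.
        return self.embed_texts(texts)

    def embed_query(self, text):
        # Query path: draft a hypothetical answer first, then embed that draft.
        self.llm_calls += 1
        hypothetical = self.generate(text)
        return self.embed_texts([hypothetical])[0]


def toy_embed(texts):
    """Placeholder embedding: [character count, word count]."""
    return [[float(len(t)), float(len(t.split()))] for t in texts]


def toy_generate(query):
    """Placeholder for the LLM; always returns the same draft answer."""
    return "Maryam Mirzakhani received the Fields Medal in 2014."


embedder = ToyHydeEmbedder(toy_generate, toy_embed)

doc_vectors = embedder.embed_documents(["chunk one", "chunk two is longer"])
calls_after_indexing = embedder.llm_calls

query_text = "Which award did she win?"
query_vector = embedder.embed_query(query_text)

print(calls_after_indexing, embedder.llm_calls)
```

The practical consequence is that each search costs one extra LLM call, which adds latency and makes retrieval quality depend on how plausible the drafted answer is.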

In [13]:
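# RAG prompt in Persian; in English: "Use the information below and answer the question.
# Information: {context}  Question: {input}"
rag_prompt = PromptTemplate.from_template("""
از اطلاعات زیر استفاده کن و به سوال پاسخ بده

اطلاعات:
{context}

سوال: {input}
""")
# "Stuff" chain: all retrieved chunks are placed into {context} in a single prompt
rag_chain = create_stuff_documents_chain(llm=groq_llm, prompt=rag_prompt)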

In [17]:
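def rag_final(query):
    # The HyDE embedder turns the query into a hypothetical answer before the similarity search
    retrieved = vectorstore.similarity_search(query)
    print(f"total hyde : {len(retrieved)}\n docs : \n{retrieved}\n")
    # Pass the retrieved chunks as context and let the LLM answer the original question
    response = rag_chain.invoke({"input": query, "context": retrieved})
    print(response)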

In [18]:
# Query in Persian; in English: "How many very special awards has Maryam received in total?"
rag_final('مریم مجموعا چند جایزه گرفته که خیلی خاص هست')

total hyde : 4
 docs : 
[Document(metadata={'title': 'Maryam Mirzakhani', 'summary': 'Maryam Mirzakhani (Persian: مریم میرزاخانی, pronounced [mæɾˈjæm miːɾzɑːxɑːˈniː]; 12 May 1977 – 14 July 2017) was an Iranian mathematician and a professor of mathematics at Stanford University. Her research topics included Teichmüller theory, hyperbolic geometry, ergodic theory, and symplectic geometry. On 13 August 2014, Mirzakhani was honored with the Fields Medal, the most prestigious award in mathematics, becoming the first woman to win the prize, as well as the first Iranian. The award committee cited her work in "the dynamics and geometry of Riemann surfaces and their moduli spaces". Mirzakhani was considered a leading force in the fields of hyperbolic geometry, topology and dynamics.\nThroughout her career, she achieved milestones that cemented her reputation as one of the greatest mathematicians of her time, such as the "magic wand theorem", which tied together fields such as dynamical systems,