In [2]:
# !pip install --upgrade langchain langchain-chroma langchain-huggingface langchain-groq


In [3]:
# !pip install --upgrade pydantic

In [5]:
import warnings
from langchain._api import LangChainDeprecationWarning
warnings.simplefilter("ignore", category=LangChainDeprecationWarning) # to ignore langchain's deprecation warning.

**HyDE (Hypothetical Document Embeddings)** is a technique used in Retrieval-Augmented Generation (RAG) applications to improve response quality by improving the retrieval step.

## What Is HyDE?
HyDE changes how documents are retrieved for a user's query. Instead of searching with the query directly, it first generates a hypothetical response that represents an ideal answer. This hypothetical document is then used to retrieve real documents from a vector store, which can improve the relevance of the information fetched.

## How Does HyDE Work?
### User Query and Hypothetical Document Generation:

1. When a user submits a query, HyDE first generates a hypothetical document that might be an ideal response to the question (using an LLM).
2. The ***hyde_chain*** generates this hypothetical document.

### Embedding and Retrieval:

1. The generated hypothetical document is then converted into a vector embedding—a numerical representation of the document in a high-dimensional space that captures its semantic meaning.
2. The ***retriever*** then uses this embedding to find similar documents stored in the vector store (Chroma here). These are real documents that are semantically similar to the generated hypothetical document.
3. This process helps find documents that match the user's intent, even if the query itself is not very descriptive.

### Combining Retrieved Context with the Query:

1. After retrieving the most relevant documents, HyDE combines these documents with the original query using a prompt template to provide context for the final generation.
2. The LLM then takes this combined context and the user’s original question to generate the final response, which is intended to be more accurate and comprehensive.
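
The three stages above can be sketched in plain Python before bringing in LangChain. This is only a toy illustration under loud assumptions: `fake_llm`, `CORPUS`, `overlap`, `retrieve` and `hyde_prompt_for` are all hypothetical names, the LLM call is a stub that returns a canned passage, and word overlap stands in for real embedding similarity.

```python
from typing import Callable

# Hypothetical mini-corpus standing in for the chunks stored in the vector store.
CORPUS = [
    "the encoder is a stack of identical layers with self-attention",
    "k-means assigns each point to the nearest centroid",
]

def fake_llm(question: str) -> str:
    """Stub for the LLM call: returns a canned hypothetical passage."""
    return "an encoder stack is a stack of identical layers using self-attention"

def overlap(a: str, b: str) -> float:
    """Jaccard word overlap, a crude stand-in for embedding similarity."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    return len(words_a & words_b) / len(words_a | words_b)

def retrieve(text: str, k: int = 1) -> list[str]:
    """Return the k corpus entries most similar to the given text."""
    return sorted(CORPUS, key=lambda doc: overlap(text, doc), reverse=True)[:k]

def hyde_prompt_for(question: str, llm: Callable[[str], str]) -> str:
    hypothetical_doc = llm(question)         # stage 1: generate a hypothetical passage
    retrieved = retrieve(hypothetical_doc)   # stage 2: search with the passage, not the query
    context = "\n".join(retrieved)
    # stage 3: combine the retrieved context with the original question
    return (
        "Answer the question based only on the following context:\n"
        f"{context}\n\nQuestion: {question}\n"
    )

question = "what are encoder stacks?"
print(hyde_prompt_for(question, fake_llm))
# The hypothetical passage shares more words with the relevant chunk than the raw query does.
print(overlap(fake_llm(question), CORPUS[0]) > overlap(question, CORPUS[0]))
```

In this sketch the short query shares only one word with the relevant chunk, while the hypothetical passage shares most of them, which is the intuition behind retrieving with the generated passage.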

In [26]:
from tqdm import tqdm, trange

from langchain_chroma import Chroma  # a vector database, like FAISS or Pinecone
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts.prompt import PromptTemplate
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel
from langchain_core.runnables import RunnableParallel # explained where used
from langchain_groq import ChatGroq

import os  # for loading the groq api token which is stored in a .env file
from dotenv import load_dotenv

In [7]:
load_dotenv("groq_api_token.env")
secret_key = os.getenv('GROQ_API_TOKEN')

**What's a Vector Store?**

It stores the embeddings of text... simple :)

**What are Embeddings?**

Computers understand numbers, not words. An embedding model maps text to a vector (a 1D array of numbers) that captures its meaning. You don't need to build one yourself; many such models exist and you can just use them.

Pass the words/text/document into the model and you'll get their vector representation.
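
To make the idea concrete, here is a toy sketch of "text in, vector out" plus the similarity comparison a vector store relies on. It is not how `HuggingFaceEmbeddings` works internally: `VOCAB`, `toy_embed` and `cosine_similarity` are hypothetical names, and word counts over a tiny fixed vocabulary stand in for the dense vectors a real model produces.

```python
import math
from collections import Counter

# Hypothetical fixed vocabulary; a real embedding model learns dense vectors instead.
VOCAB = ["attention", "encoder", "decoder", "cluster", "centroid", "layer"]

def toy_embed(text: str) -> list[float]:
    """Map text to a vector of word counts over VOCAB (a stand-in for a real model)."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in VOCAB]

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

query_vec = toy_embed("encoder layer attention")
doc_a = toy_embed("each encoder layer uses attention")
doc_b = toy_embed("each cluster has a centroid")

# The text about encoders scores higher against the query than the text about clustering.
print(cosine_similarity(query_vec, doc_a))
print(cosine_similarity(query_vec, doc_b))
```

Texts that point in a similar direction get a cosine similarity near 1, and unrelated texts get a score near 0.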

In [9]:
vector_store = Chroma(
        collection_name="rag-chroma",
        embedding_function=HuggingFaceEmbeddings(),  # the embedding model we'll use; others such as OpenAIEmbeddings() work too
        persist_directory='./abcdef2',  # saves the vector_store data locally, remove if not necessary
        )

# ignore the warnings


In [10]:
pdf_path_1 = "C:/Users/Lenovo/Downloads/transformers_2.pdf"
pdf_path_2 = "C:/Users/Lenovo/Downloads/unsupervisedL_forRAG.pdf"
pdf_path_3 = "C:/Users/Lenovo/Downloads/NIPS-2017-attention-is-all-you-need-Paper.pdf"
# pdf_path_4 = "..."
# pdf_path_5 = "..."

doc_paths = [pdf_path_1, pdf_path_2, pdf_path_3]

In [11]:
from langchain_community.document_loaders import PyPDFLoader # for loading the text of a pdf doc

documents = []  # initialize an empty list

for doc_path in doc_paths:
    loader = PyPDFLoader(doc_path)
    chunks = loader.load_and_split()  # load the PDF and split its text into chunks
    documents.extend(chunks)  # collect the chunks from every PDF


In [12]:
documents[0] # just to check if the documents are chunked properly.

Document(metadata={'source': 'C:/Users/Lenovo/Downloads/transformers_2.pdf', 'page': 0}, page_content='An Introduction to Transformers\nRichard E. Turner\nDepartment of Engineering, University of Cambridge, UK\nMicrosoft Research, Cambridge, UK\nret26@cam.ac.uk\nAbstract. The transformer is a neural network component that can be used to learn useful represen-\ntations of sequences or sets of data-points [Vaswani et al., 2017]. The transformer has driven recent\nadvances in natural language processing [Devlin et al., 2019], computer vision [Dosovitskiy et al., 2021],\nand spatio-temporal modelling [Bi et al., 2022]. There are many introductions to transformers, but most\ndo not contain precise mathematical descriptions of the architecture and the intuitions behind the design\nchoices are often also missing.1Moreover, as research takes a winding path, the explanations for the\ncomponents of the transformer can be idiosyncratic. In this note we aim for a mathematically precise,\nintuitive

**UUID - Universally Unique Identifier**

*uuid4()* generates a random ID for each document added to the vector store, which makes it easier to manage and retrieve documents without ID conflicts.
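
A small standalone sketch of the same idea, using only the standard library. The `chunks` list and the `id_to_chunk` mapping are hypothetical stand-ins for the real documents and for whatever bookkeeping you keep alongside the vector store.

```python
from uuid import UUID, uuid4

# Hypothetical stand-ins for the chunked documents.
chunks = ["chunk one", "chunk two", "chunk three"]

# One random (version 4) UUID string per chunk.
ids = [str(uuid4()) for _ in chunks]

# Pair each id with its chunk, e.g. to delete or update a specific chunk later.
id_to_chunk = dict(zip(ids, chunks))

print(len(set(ids)) == len(chunks))  # collisions are astronomically unlikely
print(UUID(ids[0]).version)          # the UUID version encoded in the id
```

Keeping the list of ids around is what later allows deleting exactly the documents that were added.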

In [14]:
from uuid import uuid4
from langchain_core.documents import Document

uuids = [str(uuid4()) for _ in range(len(documents))] 
vector_store.add_documents(documents=documents, ids=uuids) # generated UUIDs are passed to assign a unique id to each document.

['22ba40e9-13e2-43f0-8c5a-8b3cfc51f273',
 '7fc831b5-156a-4847-9be4-28e3aac15cba',
 'b0268eb4-9c78-41ff-af18-ee683119a2cd',
 '32b3f4ca-40b4-48ce-b078-dfcbe5b72461',
 '842b93ee-3f8c-4fa8-abdc-3f58c23d9208',
 'c7872330-60b9-4037-8976-7feb2a9dc5e6',
 '2026c305-e939-4800-ac02-79fd943d1651',
 '6d93cc0d-fe40-497d-bc7c-5b43c047316a',
 '312feebd-96bd-49b6-a48c-e4544bc858f7',
 'c9e0d4e1-f59c-499e-b5ee-85a217f4d74d',
 '0e801212-866a-4b17-a9c5-39e09f588a77',
 '7f616b07-34e1-4b62-a027-374bd1770e7e',
 '06d6a087-7314-4481-a9a4-6d6b8358409b',
 '79905460-001a-4cf1-b3a4-69f40f0a98ff',
 'cefb95d7-434c-4052-ae4e-14cd60f350cf',
 '1f5eca0c-0ea2-4e11-a5f9-1c376cbd5cec',
 'cf1187fb-7a58-48df-80d0-bcfe14257a31',
 '820bbac2-25cb-400b-85b8-b9bcc0968e19',
 '59254932-c76c-47c3-bf04-0bdc3885b17e',
 '3ad75a6f-ace7-4f73-bcc4-0165c1a3110e',
 'c38de721-e5d9-460b-ac1f-dae5e5201c2e',
 'f6bf0fad-09d7-483e-ad26-e608655478dc',
 'bd55affb-b6f7-4b24-a12d-f96ae0aae308',
 'c8714188-ca2a-44d9-be9a-2491186eef79',
 'a6a2b2ba-1b3c-

**What is a Retriever?**

A retriever is a component that helps find relevant documents from a collection based on a given query. The retriever is used to search the vector store (which stores documents as vector embeddings) to find and return the documents that are most similar or relevant to the passed query.

In [16]:
retriever = vector_store.as_retriever()

In [17]:
# RAG prompt
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

In [18]:
# LLM
model = ChatGroq(temperature=0, groq_api_key=secret_key, model_name="llama3-70b-8192")

In [28]:
# Query transformation chain
# This prompts the llm to make the hypothetical document
hyde_template = """Please write a passage to answer the question 
Question: {question}
Passage:"""

hyde_prompt = PromptTemplate.from_template(hyde_template)
hyde_chain = hyde_prompt | model | StrOutputParser()

**What is RunnableParallel()?**

It is a utility from the LangChain library that runs multiple operations (runnables) on the same input concurrently and combines their results into one structured output (a dict keyed by name).
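
The behaviour can be pictured with a plain-Python analogue. This is a conceptual sketch, not LangChain's actual implementation: `run_parallel` and the two lambda steps are hypothetical, and a thread pool is just one simple way to run the steps side by side.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

def run_parallel(steps: dict[str, Callable[[Any], Any]], value: Any) -> dict[str, Any]:
    """Call every step with the same input and collect the results under each key."""
    with ThreadPoolExecutor() as pool:
        futures = {key: pool.submit(step, value) for key, step in steps.items()}
        return {key: future.result() for key, future in futures.items()}

# Hypothetical steps standing in for "hyde_chain | retriever" and the question passthrough.
steps = {
    "context": lambda x: f"documents related to: {x['question']}",
    "question": lambda x: x["question"],
}

result = run_parallel(steps, {"question": "what are encoder stacks?"})
print(result)
```

The resulting dict has exactly the keys the RAG prompt expects, `context` and `question`, which is why this shape is convenient as the first stage of the chain.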

In [31]:
# RAG chain
chain = (
    RunnableParallel(
        {
            # Generate a hypothetical document and then pass it to the retriever
            "context": hyde_chain | retriever,  # hyde_chain -> retriever -> (passed as) "context"
            "question": lambda x: x["question"],
        }
    )
    | prompt
    | model
    | StrOutputParser()
)



**FLOW :**

question -> chain **[** *hyde_chain **(** hyde_prompt | model | StrOutputParser() **)** -> retriever -> prompt -> model -> StrOutputParser()* **]** -> answer

In [33]:
# Define the input type for the chain using Pydantic BaseModel
class ChainInput(BaseModel):
    question: str

chain = chain.with_types(input_type=ChainInput)

In [35]:
user_input = ChainInput(question="what are encoder stacks?")

In [37]:
# to delete the documents added above from the vector store:
# vector_store.delete(ids=uuids)

In [39]:
import langchain  # only needed for langchain.debug; remove if you don't want the debug output

langchain.debug = True  # just for visualizing the program flow
result1 = chain.invoke(user_input.model_dump())  # model_dump() replaces the deprecated .dict() in Pydantic v2

[chain/start] [chain:RunnableSequence] Entering Chain run with input:
{
  "question": "what are encoder stacks?"
}
[chain/start] [chain:RunnableSequence > chain:RunnableParallel<context,question>] Entering Chain run with input:
{
  "question": "what are encoder stacks?"
}
[chain/start] [chain:RunnableSequence > chain:RunnableParallel<context,question> > chain:RunnableSequence] Entering Chain run with input:
{
  "question": "what are encoder stacks?"
}
[chain/start] [chain:RunnableSequence > chain:RunnableParallel<context,question> > chain:RunnableSequence > prompt:PromptTemplate] Entering Prompt run with input:
{
  "question": "what are encoder stacks?"
}
[chain/end] [chain:RunnableSequence > chain:RunnableParallel<context,question> > chain:RunnableSequence > prompt:PromptTemplate] [1ms] Exiting Prompt run with output:
[outputs]
[llm/start] [

In [40]:
print(result1)

According to the context, the encoder stack is composed of a stack of N=6 identical layers. Each layer has two sub-layers: a multi-head self-attention mechanism and a simple, position-wise fully connected layer.


In [41]:
user_input = ChainInput(question="what is K means algorithm?")
langchain.debug = True  # you can turn this off
result2 = chain.invoke(user_input.model_dump())  # model_dump() replaces the deprecated .dict() in Pydantic v2

[chain/start] [chain:RunnableSequence] Entering Chain run with input:
{
  "question": "what is K means algorithm?"
}
[chain/start] [chain:RunnableSequence > chain:RunnableParallel<context,question>] Entering Chain run with input:
{
  "question": "what is K means algorithm?"
}
[chain/start] [chain:RunnableSequence > chain:RunnableParallel<context,question> > chain:RunnableSequence] Entering Chain run with input:
{
  "question": "what is K means algorithm?"
}
[chain/start] [chain:RunnableSequence > chain:RunnableParallel<context,question> > chain:RunnableSequence > prompt:PromptTemplate] Entering Prompt run with input:
{
  "question": "what is K means algorithm?"
}
[chain/end] [chain:RunnableSequence > chain:RunnableParallel<context,question> > chain:RunnableSequence > prompt:PromptTemplate] [1ms] Exiting Prompt run with output:
[outputs]
[llm/start][

In [43]:
print(result2)

According to the provided context, K-means clustering is a popular clustering technique in unsupervised learning. The "k" in k-means refers to the number of clusters into which the data is divided. The process begins with the algorithm selecting "k" random points as the initial centroids, which serve as the center of each cluster. Each data point is then assigned to the nearest centroid, forming a cluster. After assigning all data points, the centroids are recalculated based on the mean of the points within each cluster. This iterative process continues until the centroids no longer change significantly, indicating that the clusters are stable. K-means clustering is widely used due to its simplicity and efficiency, making it suitable for large datasets. However, it has limitations, including sensitivity to the initial choice of centroids and difficulty in determining the optimal number of clusters, often requiring methods like the elbow method to assess cluster validity.
