
# <span style="color:green;">RAG with LangChain and Llama</span>
<hr style="border:2px solid black">

##  <span style="color:orange;">Learning Objectives</span>

<hr style="border:2px solid black">


By the end of this lesson, students will:

- Understand Retrieval-Augmented Generation (RAG) and its applications.

- Learn how to store and retrieve embeddings using FAISS.

- Implement retrievers and chains for querying document databases.

- Develop an interactive chat system for PDF documents using LangChain.

- Learn best practices for embedding models and retrieval systems.

##  <span style="color:orange;">1. Introduction to [RAG](https://weaviate.io/blog/introduction-to-rag#:~:text=RAG%20is%20a%20multi%2Dstep,prompt%2C%20and%20generates%20a%20response.) (Retrieval-Augmented Generation)</span>





**Limitations of LLMs :**
- Know nothing outside training data (e.g., up-to-date information, classified/private data).
- Not specialized in specific use cases.
- Tend to hallucinate confidently, possibly leading to misinformation.
- Produce black box output: do not clarify what has led to the generation of particular content.




**What is RAG?**
- Combines retrieval-based and generative AI techniques.
- Enhances LLMs with external knowledge retrieval to improve accuracy and reduce hallucinations.

**Fine-Tuning vs. RAG :**

***Fine-Tuning***
- Enhances model performance for specific use cases via Transfer Learning.
- Changes model parameters, enhancing speed and reducing cost for specific tasks.
- Useful for static datasets (e.g., specialized industry terminology).
- Limitations: Cannot provide up-to-date information.


***Retrieval Augmented Generation (RAG)***
- Increases model capabilities through:
  - **Retrieving** external, up-to-date information.
  - **Augmenting** the original prompt with retrieved data.
  - **Generating** a response based on both the prompt and retrieved information.
- No need for Transfer Learning (LLM parameters remain unchanged).
- Provides a white box output (transparency, fewer hallucinations).
- Ideal for real-time, dynamic knowledge retrieval.
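
The three RAG steps listed above (retrieve, augment, generate) can be sketched without any external library. This is a toy illustration under stated assumptions: the `DOCUMENTS` list is hypothetical, `embed` is a bag-of-words counter rather than a neural embedding model, and `generate` is a placeholder that does not call an LLM.

```python
import math
import re
from collections import Counter

# Toy knowledge base standing in for an external document store (hypothetical content).
DOCUMENTS = [
    "FAISS is a library for similarity search over dense vectors.",
    "Paracetamol is a medication used to treat pain and fever.",
    "LangChain is a framework for building applications with LLMs.",
]

def embed(text):
    """Toy 'embedding': bag-of-words counts (a real system uses a neural model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, k=1):
    """Step 1: rank documents by similarity to the question and keep the top k."""
    q = embed(question)
    ranked = sorted(DOCUMENTS, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def augment(question, context_docs):
    """Step 2: add the retrieved text to the original prompt."""
    context = "\n".join(context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

def generate(prompt):
    """Step 3: placeholder for the LLM call; it only reports the prompt size."""
    return f"[LLM would answer here, given a prompt of {len(prompt)} characters]"

question = "What is paracetamol used for?"
docs = retrieve(question)
prompt = augment(question, docs)
print(prompt)
print(generate(prompt))
```

The LLM itself is never modified here; only the prompt changes, which is the key difference from fine-tuning.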


**Key Components of RAG :**
- **Embedding Models:** Convert text into numerical vectors for similarity search.
- **VectorStore and Vector Databases:** Store embeddings and support similarity search (e.g., FAISS, ChromaDB, Pinecone, Weaviate).
- **Retriever:** Fetches relevant documents.
- **LLM (Generator):** Generates responses using retrieved documents.

<hr style="border:2px solid black">

##  <span style="color:orange;">2. Use Cases of RAG</span>


- <span style="color:orange;">Enterprise Document Search (Retrieving company policies, research papers).</span>
- Chatbots with Domain-Specific Knowledge (Customer support, legal, medical).
- Coding Assistants (Fetching relevant code snippets from documentation).
- Financial Reports Analysis (Summarizing earnings reports, news articles).
- E-Learning and Research Assistants.


<hr style="border:2px solid black">

##  <span style="color:orange;">3. Warm Up </span>

In [1]:
from dotenv import load_dotenv
import warnings
from langchain_groq import ChatGroq
from langchain.prompts.prompt import PromptTemplate



#### Load credentials

In [2]:
load_dotenv()

True

#### Defining the LLM (Using Groq)

In [3]:
warnings.filterwarnings("ignore")

llm = ChatGroq(
    model="llama-3.1-8b-instant", #"llama3-8b-8192",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2
)

#### Define prompt template

**What is a Prompt?**
>- A set of instructions or input for an LLM provided by a user to guide its response
>- Helps the model understand context and generate relevant content and coherent language-based output

In [4]:
query = """
    given the information {information} about a person I want you to create:
    1. A short summary
    2. two interesting facts about them
    """

In [5]:
prompt_template = PromptTemplate(
    input_variables=["information"],
    template=query
)
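
To see what the model eventually receives, it can help to fill the template by hand. The sketch below uses plain `str.format` as a rough analogy for the substitution step; it is not how `PromptTemplate` is implemented, and the person described is a hypothetical placeholder.

```python
# Same template text as `query` above, filled with plain str.format.
template = (
    "given the information {information} about a person I want you to create:\n"
    "1. A short summary\n"
    "2. two interesting facts about them\n"
)

# Hypothetical input, used only to show the substitution.
filled = template.format(information="Jane Doe is a hypothetical engineer.")
print(filled)
```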

#### Define Chain

**What is a Chain?**

> - Links components so that the output of one step becomes the input of the next

In [6]:
chain = prompt_template | llm

**Note:**
- The `|` symbol chains together the different components, feeding the output from one component as input into the next component.
- In this chain the user input is passed to the prompt template, then the prompt template output is passed to the model. 
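
The chaining behavior of `|` can be mimicked with Python's `__or__` method. The `Step` class below is a toy stand-in written for this illustration, not LangChain's own implementation; `fill_prompt` and `fake_llm` are hypothetical components.

```python
class Step:
    """Toy chainable component; a stand-in, not LangChain's Runnable class."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # `a | b` builds a new Step that runs a, then feeds its output to b.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


# Hypothetical components mirroring "prompt template | model".
fill_prompt = Step(lambda inputs: f"Summarize: {inputs['information']}")
fake_llm = Step(lambda prompt: f"<response to '{prompt}'>")

toy_chain = fill_prompt | fake_llm
print(toy_chain.invoke({"information": "some text"}))
```

Calling `invoke` on the combined object runs both steps in order, which is the same shape as `chain.invoke(...)` in this notebook.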

#### Invoke Chain

In [8]:
text_data ="""
Geoffrey Everest Hinton (born 6 December 1947) is a British-Canadian computer scientist, cognitive scientist, 
cognitive psychologist, known for his work on artificial neural networks which earned him the title as the 
"Godfather of AI". Hinton is University Professor Emeritus at the University of Toronto. From 2013 to 2023, 
he divided his time working for Google (Google Brain) and the University of Toronto, before publicly announcing 
his departure from Google in May 2023, citing concerns about the risks of artificial intelligence (AI) technology.
In 2017, he co-founded and became the chief scientific advisor of the Vector Institute in Toronto.

With David Rumelhart and Ronald J. Williams, Hinton was co-author of a highly cited paper published in 1986 
that popularised the backpropagation algorithm for training multi-layer neural networks, although they were 
not the first to propose the approach. Hinton is viewed as a leading figure in the deep learning community.
The image-recognition milestone of the AlexNet designed in collaboration with his students Alex Krizhevsky 
and Ilya Sutskever for the ImageNet challenge 2012 was a breakthrough in the field of computer vision.

Hinton received the 2018 Turing Award, often referred to as the "Nobel Prize of Computing", together with 
Yoshua Bengio and Yann LeCun, for their work on deep learning. They are sometimes referred to as the 
"Godfathers of Deep Learning", and have continued to give public talks together. He was also awarded 
the 2024 Nobel Prize in Physics, shared with John Hopfield.
"""

In [9]:
output = chain.invoke(input={"information": text_data})

In [10]:
print(output.content)

**Summary:**
Geoffrey Hinton is a renowned British-Canadian computer scientist, cognitive scientist, and cognitive psychologist known as the "Godfather of AI" for his pioneering work on artificial neural networks. He has made significant contributions to the field of deep learning and has received numerous awards, including the Turing Award and the Nobel Prize in Physics.

**Two Interesting Facts:**

1. **Breakthrough in Image Recognition:** Geoffrey Hinton, along with his students Alex Krizhevsky and Ilya Sutskever, designed the AlexNet, which achieved a milestone in image recognition by winning the ImageNet challenge in 2012. This breakthrough paved the way for significant advancements in computer vision.

2. **Concerns about AI Risks:** In May 2023, Hinton publicly announced his departure from Google, citing concerns about the risks of artificial intelligence (AI) technology. This move highlights his commitment to responsible AI development and his willingness to speak out on the po

<hr style="border:2px solid black">

##  <span style="color:orange;">4. Implementing RAG with LangChain & Llama</span>




This project utilizes **Retrieval-Augmented Generation (RAG)** to enhance the search and retrieval of **medical research papers**. By integrating **FAISS** and **LangChain**, we develop an intelligent system that efficiently retrieves relevant documents from a **VectorStore** and uses a **language model** to generate insightful responses based on those retrieved papers.  

#### Objective  

- Build a **RAG-based** system for retrieving **research papers**.  
- Understand **embeddings, similarity search, and document retrieval** techniques.  
- Implement efficient **document storage and search** using FAISS and LangChain.  

#### Key Concepts  

- **Embeddings:** Transforming text into numerical vectors for efficient retrieval.  
- **VectorStores:** Storing and retrieving research paper chunks using embeddings.  
- **Similarity Search:** Identifying the most relevant papers based on a given query.  
- **LLM Integration:** Using a **Language Model** to enhance search results and generate meaningful responses from retrieved research papers.  

### Project Workflow


<img src="../Images/RAG_steps.png" width="950"/> 

🔗 [**RAG Architecture**](https://weaviate.io/blog/introduction-to-rag#:~:text=RAG%20is%20a%20multi%2Dstep,prompt%2C%20and%20generates%20a%20response)


#### Stages & Steps :

🟣 Ingestion Stage:

1. *Load PDF Data:*
    - Use `PyPDFLoader` from LangChain to load and read PDF files.

2. *Document Chunking:*
    - Use `RecursiveCharacterTextSplitter` from LangChain to split documents into smaller chunks.

3. *Embedding Storage:*
    - Use `HuggingFaceEmbeddings` and `FAISS` from LangChain to convert chunks into vector embeddings and store them locally in a FAISS database.


🟣 Inference Stage:

4. *Retrieval Object Creation:*
    - Use FAISS to load embedded chunks from the stored VectorStore and create a retrieval object.

5. *Response Generation (Augmentation & Generation):*
    - Use `create_stuff_documents_chain` and `create_retrieval_chain` to connect the retriever to an LLM for answering queries.

6. *Chat with PDFs:*
    - Implement a RAG-based Q&A system for interacting with PDF documents.




Optional Enhancements:

7. *Semantic Search & Embedding Functions:*
    - Perform semantic similarity searches in the VectorStore.



#### Technology Stack :

*[LangChain](https://python.langchain.com/docs/introduction/)*
> framework for developing applications powered by LLMs

*[FAISS (Facebook AI Similarity Search)](https://ai.meta.com/tools/faiss/)*
>  library allowing storage of contextual embedding vectors in VectorStore and similarity search

*[Groq](https://groq.com/about-us/)*
> platform providing fast AI inference (running a trained model on new inputs) in the cloud


<img src="../Images/RAG_step1_step2.png" width="800"/> 

#### Step 1: Load Data

In [11]:
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.vectorstores.faiss import DistanceStrategy
from langchain import hub
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains.retrieval import create_retrieval_chain
import numpy as np

In [12]:
def load_pdf_data(pdf_path):
    """
    Load text data from PDF file.
    """
    loader = PyPDFLoader(file_path=pdf_path)
    documents = loader.load()
    return documents

In [13]:
react_docs = load_pdf_data(pdf_path = "../documents/react_paper.pdf")

In [14]:
# Show number of pages
print(f"number of loaded pages: {len(react_docs)}")

number of loaded pages: 33


In [15]:
# Show page content
print(react_docs[0].page_content)

Published as a conference paper at ICLR 2023
REAC T: S YNERGIZING REASONING AND ACTING IN
LANGUAGE MODELS
Shunyu Yao∗*,1, Jeffrey Zhao2, Dian Yu2, Nan Du2, Izhak Shafran2, Karthik Narasimhan1, Yuan Cao2
1Department of Computer Science, Princeton University
2Google Research, Brain team
1{shunyuy,karthikn}@princeton.edu
2{jeffreyzhao,dianyu,dunan,izhak,yuancao}@google.com
ABSTRACT
While large language models (LLMs) have demonstrated impressive performance
across tasks in language understanding and interactive decision making, their
abilities for reasoning (e.g. chain-of-thought prompting) and acting (e.g. action
plan generation) have primarily been studied as separate topics. In this paper, we
explore the use of LLMs to generate both reasoning traces and task-specific actions
in an interleaved manner, allowing for greater synergy between the two: reasoning
traces help the model induce, track, and update action plans as well as handle
exceptions, while actions allow it to interface wit

#### Step 2: Split Documents into Chunks

**Why?** 
>  LLMs have a finite context window.

>  Chunking improves retrieval by splitting text into searchable pieces.
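
The effect of `chunk_size` and `chunk_overlap` is easiest to see on a tiny string. The function below is a simplified fixed-window splitter written for illustration; `RecursiveCharacterTextSplitter` additionally prefers to split on separators such as paragraphs and sentences, so its chunk boundaries will differ.

```python
def chunk_text(text, chunk_size=200, chunk_overlap=50):
    """Fixed-size sliding window: each chunk starts chunk_size - chunk_overlap after the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
pieces = chunk_text(sample, chunk_size=10, chunk_overlap=3)
for piece in pieces:
    print(piece)
```

Each chunk repeats the last three characters of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk when the overlap is large enough.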

In [16]:
def split_documents(documents, chunk_size=200, chunk_overlap=50):
    """
    Splits documents into chunks of given size and overlap
    """
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap
    )
    chunks = text_splitter.split_documents(documents=documents)
    
    # Add an id to each chunk so it can be mapped back later
    for i, chunk in enumerate(chunks):
        chunk.metadata.update({"id": f"chunk_{i}"})
    
    return chunks


react_chunks = split_documents(react_docs)

In [17]:
# Show number of chunks created
print(f"number of chunks created: {len(react_chunks)}","\n",f"Type of the chunks : {type(react_chunks)}","\n\n" ,react_chunks)

number of chunks created: 705 
 Type of the chunks : <class 'list'> 

 [Document(metadata={'producer': 'pdfTeX-1.40.21', 'creator': 'LaTeX with hyperref', 'creationdate': '2023-03-13T00:09:11+00:00', 'author': '', 'keywords': '', 'moddate': '2023-03-13T00:09:11+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.14159265-2.6-1.40.21 (TeX Live 2020) kpathsea version 6.3.2', 'subject': '', 'title': '', 'trapped': '/False', 'source': '../documents/react_paper.pdf', 'total_pages': 33, 'page': 0, 'page_label': '1', 'id': 'chunk_0'}, page_content='Published as a conference paper at ICLR 2023\nREAC T: S YNERGIZING REASONING AND ACTING IN\nLANGUAGE MODELS'), Document(metadata={'producer': 'pdfTeX-1.40.21', 'creator': 'LaTeX with hyperref', 'creationdate': '2023-03-13T00:09:11+00:00', 'author': '', 'keywords': '', 'moddate': '2023-03-13T00:09:11+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.14159265-2.6-1.40.21 (TeX Live 2020) kpathsea version 6.3.2', 'subject': '', 'title': '', 'trap

#### Step 3: Generate and Store Embeddings in a VectorStore
<img src="../Images/RAG_step3.png" width="400"/> 

**Why?**
>  Embeddings give each text chunk a numerical representation that can be compared by similarity.

In [18]:

def create_embedding_vector_db(chunks, db_name):
    """
    Embed the chunks with an open-source sentence-transformers model
    (loaded through HuggingFaceEmbeddings) and store them in a FAISS
    VectorStore, which allows for efficient similarity search.
    """
    # instantiate embedding model
    embedding = HuggingFaceEmbeddings(
        model_name='sentence-transformers/all-mpnet-base-v2',
        encode_kwargs={"normalize_embeddings": True}
    )
    # create the vector store 
    vectorstore = FAISS.from_documents(
        documents=chunks,
        embedding=embedding,
        distance_strategy=DistanceStrategy.COSINE  # alternatives: DistanceStrategy.MAX_INNER_PRODUCT, DistanceStrategy.EUCLIDEAN_DISTANCE
    )
    # save VectorStore locally
    vectorstore.save_local(f"../vector_databases/vector_db_{db_name}")
    return vectorstore


all_embedding=create_embedding_vector_db(chunks=react_chunks, db_name="react")


> ⚠️ **Note: Why Embedding Normalization Matters in RAG (with FAISS + Cosine Similarity)**  
>  
> In Retrieval-Augmented Generation (RAG), accurate retrieval is critical.  
> Since most vector search relies on comparing **semantic similarity**,  
> we often use **cosine similarity** to find relevant chunks.  
>  
> However, cosine similarity compares **direction**, not magnitude.  
> If your embeddings are **not normalized**, the similarity score can be biased by vector length, leading to irrelevant results or lower-quality answers.  
>  
> 🔍 **Importance of Normalization in FAISS (Cosine Similarity)**  
>  
> If you set `distance_strategy="COSINE"` but do **not** enable `normalize_embeddings=True`,  
> FAISS will default to using **unnormalized dot product**, not true cosine similarity.  
>  
> **As a result:**  
> - Similarity scores may be skewed by vector lengths.  
> - Retrieval results can become inconsistent or incorrect.  
>  
> ✅ **Solution:**  
> Always normalize embeddings (both **chunks** and **queries**) before storing or searching.  
> This ensures the dot product reflects **true cosine similarity**,  
> where comparison is based purely on **direction**, not **magnitude**.  
>  
> ℹ️ Without normalization, FAISS effectively ignores your `COSINE` strategy and uses inner product directly. [source](https://github.com/langchain-ai/langchain/issues/32498)
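
The effect of vector length on a raw dot product can be checked with a few lines of plain Python. The vectors below are made-up two-dimensional examples, chosen only to make the bias visible; real embeddings from this notebook have 768 dimensions.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

query_vec = [1.0, 0.0]
same_direction_short = [2.0, 0.0]    # identical direction to the query
other_direction_long = [10.0, 10.0]  # 45 degrees away, but much longer

# The raw dot product rewards the longer vector even though it points elsewhere.
print(dot(query_vec, same_direction_short), dot(query_vec, other_direction_long))

# After L2 normalization the dot product equals the cosine similarity,
# so the vector with the same direction wins.
q = l2_normalize(query_vec)
print(dot(q, l2_normalize(same_direction_short)), dot(q, l2_normalize(other_direction_long)))
```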


<img src="../Images/RAG_step4_step5.png" width="600"/> 

#### Step 4: Load Embedded Chunks from the VectorStore and Create a Retriever Object

**Why?**

> Make sure you use the same embedding model that you used when you stored the chunks; otherwise query vectors and stored vectors are not comparable.

> The chain expects a retriever object as input.

In [19]:
def retrieve_from_vector_db(vector_db_path):
    """
    Load a local FAISS VectorStore and return a retriever object together with the store.
    """
    # instantiate embedding model
    embeddings = HuggingFaceEmbeddings(
        model_name='sentence-transformers/all-mpnet-base-v2',
        encode_kwargs={"normalize_embeddings": True}
    )
    react_vectorstore = FAISS.load_local(
        folder_path=vector_db_path,
        embeddings=embeddings,
        allow_dangerous_deserialization=True,
        distance_strategy=DistanceStrategy.COSINE  # alternatives: DistanceStrategy.MAX_INNER_PRODUCT, DistanceStrategy.EUCLIDEAN_DISTANCE
    )
    retriever = react_vectorstore.as_retriever()
    return retriever, react_vectorstore

# Load the retriever and index
react_retriever,react_vectorstore = retrieve_from_vector_db("../vector_databases/vector_db_react")
type(react_retriever),type(react_vectorstore)


(langchain_core.vectorstores.base.VectorStoreRetriever,
 langchain_community.vectorstores.faiss.FAISS)

In [20]:
react_retriever.vectorstore.docstore._dict

{'8109e355-bc7d-48bb-8fcc-648cd463df5c': Document(id='8109e355-bc7d-48bb-8fcc-648cd463df5c', metadata={'producer': 'pdfTeX-1.40.21', 'creator': 'LaTeX with hyperref', 'creationdate': '2023-03-13T00:09:11+00:00', 'author': '', 'keywords': '', 'moddate': '2023-03-13T00:09:11+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.14159265-2.6-1.40.21 (TeX Live 2020) kpathsea version 6.3.2', 'subject': '', 'title': '', 'trapped': '/False', 'source': '../documents/react_paper.pdf', 'total_pages': 33, 'page': 0, 'page_label': '1', 'id': 'chunk_0'}, page_content='Published as a conference paper at ICLR 2023\nREAC T: S YNERGIZING REASONING AND ACTING IN\nLANGUAGE MODELS'),
 'b6f43617-aa42-4982-96a5-0ee468f1a2b7': Document(id='b6f43617-aa42-4982-96a5-0ee468f1a2b7', metadata={'producer': 'pdfTeX-1.40.21', 'creator': 'LaTeX with hyperref', 'creationdate': '2023-03-13T00:09:11+00:00', 'author': '', 'keywords': '', 'moddate': '2023-03-13T00:09:11+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3

In [21]:
query="""
what is react ?
"""

In [22]:
# get_relevant_documents is deprecated in recent LangChain releases; invoke is the current entry point
react_retriever.invoke(query, k=3)

[Document(id='fa789f6c-dbfa-4765-b630-4ee34dbe7f35', metadata={'producer': 'pdfTeX-1.40.21', 'creator': 'LaTeX with hyperref', 'creationdate': '2023-03-13T00:09:11+00:00', 'author': '', 'keywords': '', 'moddate': '2023-03-13T00:09:11+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.14159265-2.6-1.40.21 (TeX Live 2020) kpathsea version 6.3.2', 'subject': '', 'title': '', 'trapped': '/False', 'source': '../documents/react_paper.pdf', 'total_pages': 33, 'page': 13, 'page_label': '14', 'id': 'chunk_351'}, page_content='ReAct’s reasoning traces. Figure 5 shows that by simply removing a hallucinating sentence in Act\n17 and adding some hints in Act 23, ReAct can be made to change its behavior drastically to align'),
 Document(id='0a8381d6-1da9-4d86-b0e2-408054801702', metadata={'producer': 'pdfTeX-1.40.21', 'creator': 'LaTeX with hyperref', 'creationdate': '2023-03-13T00:09:11+00:00', 'author': '', 'keywords': '', 'moddate': '2023-03-13T00:09:11+00:00', 'ptex.fullbanner': 'This is pdf

#### Step 5: Connect the Retriever to the LLM and Generate a Response

**Why?**

- [`create_stuff_documents_chain`](https://api.python.langchain.com/en/latest/chains/langchain.chains.combine_documents.stuff.create_stuff_documents_chain.html#langchain.chains.combine_documents.stuff.create_stuff_documents_chain) *chain passing documents to llm*
  > takes a list of documents and formats them all into a prompt, then passes that prompt to an LLM

  >passes ALL documents, so you should make sure it fits within the context window of the LLM being used

- [`create_retrieval_chain`](https://api.python.langchain.com/en/latest/chains/langchain.chains.retrieval.create_retrieval_chain.html#langchain.chains.retrieval.create_retrieval_chain) *chain passing user inquiry to retriever object*

  > takes in a user inquiry, which is then passed to the retriever to fetch relevant documents
  
  > those documents (and original inputs) are then passed to an LLM to generate a response
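
The division of labor between the two chains can be sketched with plain functions. Everything below is a stand-in written for illustration: `Doc`, `toy_retriever`, and `toy_llm` are hypothetical, and the real LangChain chains do considerably more (prompt templates, message formatting, streaming).

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (hypothetical, for illustration only)."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def toy_retriever(question):
    # A real retriever would run a similarity search; this one returns fixed documents.
    return [
        Doc("ReAct interleaves reasoning traces and actions.", {"page": 0}),
        Doc("Reasoning traces help the model track and update action plans.", {"page": 0}),
    ]

def toy_llm(prompt):
    # Placeholder for the model call: it only counts the documents in the prompt.
    return f"(answer based on {prompt.count('[doc')} documents)"

def stuff_documents(question, docs):
    # "Stuffing": every retrieved document is placed into one prompt.
    context = "\n\n".join(f"[doc {i}] {d.page_content}" for i, d in enumerate(docs))
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return toy_llm(prompt)

def retrieval_chain(inputs):
    # Pass the inquiry to the retriever, then hand documents and inquiry to the stuffing step.
    docs = toy_retriever(inputs["input"])
    answer = stuff_documents(inputs["input"], docs)
    return {"input": inputs["input"], "context": docs, "answer": answer}

result = retrieval_chain({"input": "what is ReAct?"})
print(result.keys())
print(result["answer"])
```

Because all documents go into a single prompt, the number and size of retrieved chunks must stay within the model's context window.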

In [23]:
def connect_chains(retriever):
    """
    this function connects stuff_documents_chain with retrieval_chain
    """
    stuff_documents_chain = create_stuff_documents_chain(
        llm=llm,
        prompt=hub.pull("langchain-ai/retrieval-qa-chat")
    )
    retrieval_chain = create_retrieval_chain(
        retriever=retriever,
        combine_docs_chain=stuff_documents_chain
    )
    return retrieval_chain


react_retrieval_chain = connect_chains(react_retriever)

In [24]:
output = react_retrieval_chain.invoke(
    {"input": "what is zebra?"}
)
type(output) , output.keys() 

(dict, dict_keys(['input', 'context', 'answer']))

In [25]:
output['context']

[Document(id='10112960-8953-4721-a5c9-fc02a8fad412', metadata={'producer': 'pdfTeX-1.40.21', 'creator': 'LaTeX with hyperref', 'creationdate': '2023-03-13T00:09:11+00:00', 'author': '', 'keywords': '', 'moddate': '2023-03-13T00:09:11+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.14159265-2.6-1.40.21 (TeX Live 2020) kpathsea version 6.3.2', 'subject': '', 'title': '', 'trapped': '/False', 'source': '../documents/react_paper.pdf', 'total_pages': 33, 'page': 10, 'page_label': '11', 'id': 'chunk_280'}, page_content='com/Authors-Notes/sparrow/sparrow-final.pdf.\nEhsan Hosseini-Asl, Bryan McCann, Chien-Sheng Wu, Semih Yavuz, and Richard Socher. A simple'),
 Document(id='75022e8b-bc6c-4437-b43a-dfe182fc5a9b', metadata={'producer': 'pdfTeX-1.40.21', 'creator': 'LaTeX with hyperref', 'creationdate': '2023-03-13T00:09:11+00:00', 'author': '', 'keywords': '', 'moddate': '2023-03-13T00:09:11+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.14159265-2.6-1.40.21 (TeX Live 2020) kpathsea

In [26]:
print(output['answer'])

There is no mention of "Zebra" in the provided context.


#### Step 6: Chat with PDF

##### Load data

In [27]:
paracetamol_docs = load_pdf_data(pdf_path = "../documents/paracetamol.pdf")

##### Split document into chunks

In [28]:
paracetamol_chunks = split_documents(paracetamol_docs)

##### Create embeddings

In [29]:
create_embedding_vector_db(chunks=paracetamol_chunks, db_name="paracetamol")

<langchain_community.vectorstores.faiss.FAISS at 0x12252f690>

##### Retrieve from VectorStore

In [30]:
paracetamol_retriever, paracetamol_vectorstore = retrieve_from_vector_db("../vector_databases/vector_db_paracetamol")

##### Generation

In [31]:
paracetamol_retrieval_chain = connect_chains(paracetamol_retriever)

In [32]:
def print_output(
    inquiry,
    retrieval_chain=paracetamol_retrieval_chain
):
    result = retrieval_chain.invoke({"input": inquiry})
    print(result['answer'].strip("\n"))
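
Since the chain result also carries the retrieved documents under `context`, an answer can be shown together with its sources. The helper below is a hypothetical addition, not part of the notebook; it assumes each document exposes a `metadata` dict with a `page_label` key, as the PDF chunks printed earlier do, and uses stand-in objects instead of real retrieval results.

```python
from types import SimpleNamespace

def format_sources(context_docs):
    """Return de-duplicated page labels of the retrieved documents, in numeric-like order."""
    pages = {doc.metadata.get("page_label", "?") for doc in context_docs}
    # Sort shorter labels first so that "2" comes before "10".
    return ", ".join(sorted(pages, key=lambda p: (len(p), p)))

# Stand-in documents shaped like the entries of result["context"] (hypothetical values).
fake_context = [
    SimpleNamespace(page_content="...", metadata={"page_label": "2", "id": "chunk_36"}),
    SimpleNamespace(page_content="...", metadata={"page_label": "1", "id": "chunk_0"}),
    SimpleNamespace(page_content="...", metadata={"page_label": "2", "id": "chunk_40"}),
]
print("Sources: pages", format_sources(fake_context))
```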

**inquiry 1**

In [33]:
print_output("Give me the summary of Paracetamol in 3 sentences.")

Paracetamol 500 mg is a type of pain-relieving medication. It is available in tablet form and can be taken with food and drinks to help with its absorption and effectiveness. However, it may interact with certain tests, such as urine acid and blood sugar tests.


**inquiry 2**

In [34]:
print_output("Geb mir die Zusammenfassung von Paracetamol in 3 Sätzen.")

Hier ist eine Zusammenfassung von Paracetamol in 3 Sätzen:

Paracetamol 500 mg ist ein schmerzstillendes und fiebersenkendes Arzneimittel, das Übelkeit, Erbrechen, Appetitlosigkeit, Blässe und Bauchschmerzen lindern kann. Es kann die Aufnahme und Wirkung von Paracetamol beschleunigen, wenn eine größere Menge eingenommen wird. Paracetamol darf nicht mit Alkohol eingenommen werden und ist auch während der Schwangerschaft und Stillzeit mit Vorsicht zu genießen.


#### Step 7: Semantic search and embedding function

In [35]:
# Rebuild the ReAct vector store with create_embedding_vector_db from Step 3
all_embedding = create_embedding_vector_db(chunks=react_chunks, db_name="react")

In [36]:
# Show all stored chunks and their metadata
all_embedding.docstore._dict.values() 

dict_values([Document(id='d48014ae-8989-470f-ae80-c1ba10ff0fd1', metadata={'producer': 'pdfTeX-1.40.21', 'creator': 'LaTeX with hyperref', 'creationdate': '2023-03-13T00:09:11+00:00', 'author': '', 'keywords': '', 'moddate': '2023-03-13T00:09:11+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.14159265-2.6-1.40.21 (TeX Live 2020) kpathsea version 6.3.2', 'subject': '', 'title': '', 'trapped': '/False', 'source': '../documents/react_paper.pdf', 'total_pages': 33, 'page': 0, 'page_label': '1', 'id': 'chunk_0'}, page_content='Published as a conference paper at ICLR 2023\nREAC T: S YNERGIZING REASONING AND ACTING IN\nLANGUAGE MODELS'), Document(id='04bd62b2-9df9-44ba-8553-4eaefee05ea7', metadata={'producer': 'pdfTeX-1.40.21', 'creator': 'LaTeX with hyperref', 'creationdate': '2023-03-13T00:09:11+00:00', 'author': '', 'keywords': '', 'moddate': '2023-03-13T00:09:11+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.14159265-2.6-1.40.21 (TeX Live 2020) kpathsea version 6.3.2', 'subje

In [37]:
# Show the embeddings of all chunks
all_embedding.index.reconstruct_n() 

array([[ 0.05631226,  0.04336628, -0.03845092, ...,  0.07139471,
        -0.05103523, -0.04845445],
       [ 0.06336991,  0.02522455, -0.04140245, ..., -0.00365809,
        -0.07649195, -0.06401425],
       [ 0.0325838 , -0.02973063, -0.01315417, ..., -0.04946292,
        -0.02444163, -0.05943531],
       ...,
       [ 0.01722936,  0.08850818,  0.00600497, ..., -0.01606479,
        -0.0421785 , -0.01688989],
       [ 0.00841374,  0.04345893, -0.00780026, ..., -0.00686866,
        -0.01094507, -0.02737107],
       [ 0.02459178,  0.04008967, -0.01268218, ...,  0.00441174,
         0.05505519, -0.00066032]], shape=(705, 768), dtype=float32)

In [38]:
query="5H\x0c0RFN\x10XS\x03,QIRUPDWLRQ\x03DQG\x03$SS"
# all_embedding._embed_query(query) gives the same result
embedding_query = all_embedding.embedding_function.embed_query(query)
len(embedding_query), type(embedding_query)

(768, list)

In [39]:
all_embedding.similarity_search(query ,k=3)

[Document(id='6da57ad0-ade8-4253-8ecc-f3b288a77521', metadata={'producer': 'pdfTeX-1.40.21', 'creator': 'LaTeX with hyperref', 'creationdate': '2023-03-13T00:09:11+00:00', 'author': '', 'keywords': '', 'moddate': '2023-03-13T00:09:11+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.14159265-2.6-1.40.21 (TeX Live 2020) kpathsea version 6.3.2', 'subject': '', 'title': '', 'trapped': '/False', 'source': '../documents/react_paper.pdf', 'total_pages': 33, 'page': 1, 'page_label': '2', 'id': 'chunk_36'}, page_content='5RZ\x03QH[W\x03DQG\x03ILQG\x03ZKDW\x03RWKHU\x03GHYLFH\x03FDQ\x03FRQWURO\x03LW\x11\n$FW\x03\x15\x1d\x036HDUFK>)URQW\x035RZ@\n2EV\x03\x15\x1d\x03&RXOG\x03QRW\x03ILQG\x03>)URQW\x035RZ@\x11\x036LPLODU\x1d\x03>\n)URQW\x035RZ\x03\n6HDW\x03WR\x03(DUWK\n\x0f\x03\n)URQW\x035RZ\x030RWRUVSRUWV\n\x0f\n)URQW\x035RZ\x03\n\x0bVRIWZDUH'),
 Document(id='2f9a4728-bb4b-4535-94a0-bc074387dbb6', metadata={'producer': 'pdfTeX-1.40.21', 'creator': 'LaTeX with hyperref', 'creationdate': '2023-03-

In [40]:
all_embedding.similarity_search_with_score(query,k=2)

[(Document(id='6da57ad0-ade8-4253-8ecc-f3b288a77521', metadata={'producer': 'pdfTeX-1.40.21', 'creator': 'LaTeX with hyperref', 'creationdate': '2023-03-13T00:09:11+00:00', 'author': '', 'keywords': '', 'moddate': '2023-03-13T00:09:11+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.14159265-2.6-1.40.21 (TeX Live 2020) kpathsea version 6.3.2', 'subject': '', 'title': '', 'trapped': '/False', 'source': '../documents/react_paper.pdf', 'total_pages': 33, 'page': 1, 'page_label': '2', 'id': 'chunk_36'}, page_content='5RZ\x03QH[W\x03DQG\x03ILQG\x03ZKDW\x03RWKHU\x03GHYLFH\x03FDQ\x03FRQWURO\x03LW\x11\n$FW\x03\x15\x1d\x036HDUFK>)URQW\x035RZ@\n2EV\x03\x15\x1d\x03&RXOG\x03QRW\x03ILQG\x03>)URQW\x035RZ@\x11\x036LPLODU\x1d\x03>\n)URQW\x035RZ\x03\n6HDW\x03WR\x03(DUWK\n\x0f\x03\n)URQW\x035RZ\x030RWRUVSRUWV\n\x0f\n)URQW\x035RZ\x03\n\x0bVRIWZDUH'),
  np.float32(0.6014277)),
 (Document(id='2f9a4728-bb4b-4535-94a0-bc074387dbb6', metadata={'producer': 'pdfTeX-1.40.21', 'creator': 'LaTeX with hyperref

In [41]:
# similarity_search_by_vector expects an embedding, so the query must be embedded first
all_embedding.similarity_search_by_vector(embedding_query,k=3)

[Document(id='6da57ad0-ade8-4253-8ecc-f3b288a77521', metadata={'producer': 'pdfTeX-1.40.21', 'creator': 'LaTeX with hyperref', 'creationdate': '2023-03-13T00:09:11+00:00', 'author': '', 'keywords': '', 'moddate': '2023-03-13T00:09:11+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.14159265-2.6-1.40.21 (TeX Live 2020) kpathsea version 6.3.2', 'subject': '', 'title': '', 'trapped': '/False', 'source': '../documents/react_paper.pdf', 'total_pages': 33, 'page': 1, 'page_label': '2', 'id': 'chunk_36'}, page_content='5RZ\x03QH[W\x03DQG\x03ILQG\x03ZKDW\x03RWKHU\x03GHYLFH\x03FDQ\x03FRQWURO\x03LW\x11\n$FW\x03\x15\x1d\x036HDUFK>)URQW\x035RZ@\n2EV\x03\x15\x1d\x03&RXOG\x03QRW\x03ILQG\x03>)URQW\x035RZ@\x11\x036LPLODU\x1d\x03>\n)URQW\x035RZ\x03\n6HDW\x03WR\x03(DUWK\n\x0f\x03\n)URQW\x035RZ\x030RWRUVSRUWV\n\x0f\n)URQW\x035RZ\x03\n\x0bVRIWZDUH'),
 Document(id='2f9a4728-bb4b-4535-94a0-bc074387dbb6', metadata={'producer': 'pdfTeX-1.40.21', 'creator': 'LaTeX with hyperref', 'creationdate': '2023-03-

In [42]:

# FAISS expects a 2-D float32 NumPy array of shape (n_queries, dim)
query_embedded_array = np.array([embedding_query], dtype=np.float32)

# Search the raw FAISS index: returns distances and row positions of the top-k vectors
distances, indexes = all_embedding.index.search(query_embedded_array, k=3)

# Reconstruct the stored embeddings of the top 3 search results
retrieved_embeddings = [all_embedding.index.reconstruct(int(idx)) for idx in indexes[0]]

# Print embeddings of the retrieved documents
print("Embeddings of Retrieved Documents:", retrieved_embeddings)

Embeddings of Retrieved Documents: [array([ 4.20535691e-02,  1.24242585e-02, -1.26207536e-02,  6.60819486e-02,
       -5.01307622e-02,  5.24305403e-02, -1.18936002e-02, -2.60302629e-02,
       -5.10713756e-02,  5.56227425e-03,  9.05946828e-03,  2.92965118e-02,
        2.03612652e-02,  2.55907867e-02,  3.07282023e-02,  3.02856527e-02,
        4.90690628e-03, -2.20699906e-02, -7.99373537e-02, -3.72758321e-02,
       -4.45533991e-02,  3.14433835e-02, -1.00241043e-02, -2.10824627e-02,
        4.37799981e-03,  7.34696910e-03, -4.84062657e-02,  6.61053881e-03,
       -4.81361384e-03,  3.12439777e-04,  2.93423571e-02,  4.26930748e-02,
       -6.61351830e-02, -1.45976860e-02,  3.15480816e-06, -5.54316714e-02,
       -4.40676138e-02, -1.58801340e-02, -1.93040911e-02,  3.71765643e-02,
       -2.54021282e-03,  1.45259187e-01, -7.13418946e-02,  1.72161926e-02,
       -6.05357718e-03, -4.56971154e-02,  2.39643678e-02,  7.58329108e-02,
        2.24833321e-02,  2.35965140e-02,  1.38762044e-02, -4.394
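
What `index.search` returns can be reproduced conceptually with an exhaustive search in plain Python. This is a sketch under assumptions: it uses squared L2 distance, as a flat L2 FAISS index does, and the `stored` vectors are tiny hypothetical examples. FAISS itself is far faster and may use other index types and metrics.

```python
def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def brute_force_search(query_vec, vectors, k=3):
    """Exhaustive nearest-neighbor search; returns (distances, positions), closest first."""
    scored = sorted((squared_l2(query_vec, v), i) for i, v in enumerate(vectors))
    top = scored[:k]
    return [d for d, _ in top], [i for _, i in top]

# Tiny hypothetical "index" of 2-dimensional vectors.
stored = [[0.0, 1.0], [1.0, 0.0], [0.6, 0.8], [-1.0, 0.0]]
toy_distances, toy_indexes = brute_force_search([1.0, 0.0], stored, k=2)
print(toy_distances, toy_indexes)

# Map positions back to the stored vectors, as index.reconstruct does above.
toy_retrieved = [stored[i] for i in toy_indexes]
print(toy_retrieved)
```

The returned positions are row numbers in the index, which is why the notebook converts each `idx` with `int(idx)` before calling `reconstruct`.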

<hr style="border:2px solid black">

##  <span style="color:orange;">5. Conclusion</span>




- We created an embedding store using FAISS. ✅

- Retrieved relevant documents using a retriever object. ✅

- Implemented a LangChain pipeline to process and generate responses. ✅

- Built an interactive PDF chat system with RAG. ✅


<hr style="border:2px solid black">

##  <span style="color:orange;">6. References</span> 


1. [RAG vs. Fine Tuning](https://www.youtube.com/watch?v=00Q0G84kq3M)
2. [How to Use Langchain Chain Invoke: A Step-by-Step Guide](https://medium.com/@asakisakamoto02/how-to-use-langchain-chain-invoke-a-step-by-step-guide-9a6f129d77d1)
3. [Implementing RAG using Langchain and Ollama](https://medium.com/@imabhi1216/implementing-rag-using-langchain-and-ollama-93bdf4a9027c)