<a href="https://colab.research.google.com/github/manojmandal27/DecisionTree/blob/main/RAG_with_OpenAI_LLMs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Learning Objectives

At the end of the experiment, you will be able to:

1. Load the documents
2. Split the documents into chunks
3. Embed the chunks and store them in a vector database
4. Retrieve the chunks relevant to a query
 * Addressing Diversity
 * Addressing Specificity
5. Connect to an LLM to get a final, grounded answer

## Introduction

> **RAG diagram:**
>
> <img src='https://drive.google.com/uc?id=1sCVvpsmtZEU1WSK1FFGMGHbEjrgtCNLi'>

> **Vector Store and Retrieval:**
>
> <img src='https://drive.google.com/uc?id=1_zX5gtSNrV8Qdx7Nz4_gMR8dCwvxCDS7' width=750px>

> **Embedding Model:**
>
> <img src='https://drive.google.com/uc?id=1HnvjGJ4HmpS-0wndpH-Q8cKMwIwWkTUe'>

> **Retrieval in Action:**
>
> <img src='https://drive.google.com/uc?id=1ry2TWFsewwqYP3Lw9muuPmbyuQqXwnYV' width=800px>

> **Example workflow with embedding model:**
>
><br>
>
> <img src='https://drive.google.com/uc?id=1zTuMMX54L2HrnmCYktTxVfMVrkIz8w15' width=600px>

### Install Dependencies

In [None]:
%%capture
!pip -q install openai
!pip -q install langchain-openai
!pip -q install langchain-core
!pip -q install langchain-community
!pip -q install sentence-transformers
!pip -q install langchain-huggingface
!pip -q install langchain-chroma
!pip -q install chromadb
!pip -q install pypdf

### Import Required Packages

In [None]:
import os
import openai
import numpy as np
from langchain_community.document_loaders import PyPDFLoader
from langchain_openai import ChatOpenAI
from langchain_chroma import Chroma
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

#### **Provide your OpenAI API key**

In [None]:
# Read OpenAI key from Colab Secrets

from google.colab import userdata

api_key = userdata.get('OA_API')           # <-- change this as per your secret's name
os.environ['OPENAI_API_KEY'] = api_key
openai.api_key = os.getenv('OPENAI_API_KEY')

### Load LLM

In [None]:
# Load Model

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model_name="gpt-4o-mini", temperature=0)

In [None]:
# General query
response = llm.invoke("How to learn programming? give 5 points")
print(response.content)

Learning programming can be an exciting and rewarding journey. Here are five key points to help you get started:

1. **Choose a Programming Language**: Start with a beginner-friendly language such as Python, JavaScript, or Ruby. Python is often recommended for its readability and versatility, making it suitable for various applications, from web development to data science.

2. **Utilize Online Resources**: Take advantage of online platforms like Codecademy, freeCodeCamp, Coursera, or edX. These platforms offer structured courses, tutorials, and exercises that can help you learn at your own pace.

3. **Practice Regularly**: Consistent practice is crucial for mastering programming. Work on small projects, solve coding challenges on platforms like LeetCode or HackerRank, and contribute to open-source projects to apply what you’ve learned.

4. **Join a Community**: Engage with other learners and experienced programmers through forums like Stack Overflow, Reddit, or local coding meetups. P

### **Loading the documents**

[PDF Loader](https://python.langchain.com/docs/how_to/document_loader_pdf/)

In [None]:
# UPLOAD the Docs first to this notebook, then run this cell

from langchain_community.document_loaders import PyPDFLoader

# Load PDF
loaders = [
    PyPDFLoader("/content/pca_d1.pdf"),
    PyPDFLoader("/content/ens_d2.pdf"),
    PyPDFLoader("/content/ens_d2.pdf"),    # Loading duplicate documents on purpose
]

docs = []
for loader in loaders:
    docs.extend(loader.load())


In [None]:
len(docs)        # 7 pages in total across the three loaders above

7

In [None]:
docs

[Document(metadata={'source': '/content/pca_d1.pdf', 'page': 0}, page_content=' \n1 \n \n \nN \n \n1 Principal Component Analysis \nIn real world data analysis tasks we analyze complex data i.e. multi dimensional data. We plot the  \ndata and find various patterns in it or use it to train some machine learning models.  One way to  \nthink about dimensions is that suppose you have an data point x , if we consider this data point as \na physical object then dimensions are merely a basis of view, like where is the data located when \nit is observed from horizontal axis or vertical axis. \nAs the dimensions of data increases, the difficulty to visualize it and perform computations on \nit also increases. So, how to reduce the dimensions of a data:- \n• Remove the redundant dimensions \n• Only keep the most important dimensions  \nLet us first try to understand some terms:- \nVariance : It is a measure of the variability or it simply measures how spread the data set is.  \nMathematically, i

In [None]:
print(docs[0].page_content)

 
1 
 
 
N 
 
1 Principal Component Analysis 
In real world data analysis tasks we analyze complex data i.e. multi dimensional data. We plot the  
data and find various patterns in it or use it to train some machine learning models.  One way to  
think about dimensions is that suppose you have an data point x , if we consider this data point as 
a physical object then dimensions are merely a basis of view, like where is the data located when 
it is observed from horizontal axis or vertical axis. 
As the dimensions of data increases, the difficulty to visualize it and perform computations on 
it also increases. So, how to reduce the dimensions of a data:- 
• Remove the redundant dimensions 
• Only keep the most important dimensions  
Let us first try to understand some terms:- 
Variance : It is a measure of the variability or it simply measures how spread the data set is.  
Mathematically, it is the average squared deviation from the mean score. We use the following 
formula to compute 

### **Splitting of document**

[Recursively split by character](https://python.langchain.com/docs/how_to/recursive_text_splitter/)

[Split by character](https://python.langchain.com/docs/how_to/character_text_splitter/)

In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

In [None]:
# Split
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 500,
    chunk_overlap = 50
)

In [None]:
splits = text_splitter.split_documents(docs)

print(len(splits))
print(len(splits[0].page_content) )
splits[0].page_content

26
443


'1 \n \n \nN \n \n1 Principal Component Analysis \nIn real world data analysis tasks we analyze complex data i.e. multi dimensional data. We plot the  \ndata and find various patterns in it or use it to train some machine learning models.  One way to  \nthink about dimensions is that suppose you have an data point x , if we consider this data point as \na physical object then dimensions are merely a basis of view, like where is the data located when'

In [None]:
splits[0]

Document(metadata={'source': '/content/pca_d1.pdf', 'page': 0}, page_content='1 \n \n \nN \n \n1 Principal Component Analysis \nIn real world data analysis tasks we analyze complex data i.e. multi dimensional data. We plot the  \ndata and find various patterns in it or use it to train some machine learning models.  One way to  \nthink about dimensions is that suppose you have an data point x , if we consider this data point as \na physical object then dimensions are merely a basis of view, like where is the data located when')
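
To build intuition for `chunk_size` and `chunk_overlap`, here is a minimal sketch of a fixed-window splitter. It is a simplification and not how `RecursiveCharacterTextSplitter` works internally: the recursive splitter prefers to break on separators such as paragraph breaks and newlines, which is why the first chunk above has 443 characters rather than exactly 500. The function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
        start += step
    return chunks


chunks = split_with_overlap("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

The overlap means a sentence that straddles a chunk boundary still appears, at least partly, in both neighbouring chunks.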

### **Embeddings**

Let's take our splits and embed them.

In [None]:
from langchain_openai import OpenAIEmbeddings

embedding = OpenAIEmbeddings(model='text-embedding-3-small')

In [None]:
embedding

OpenAIEmbeddings(client=<openai.resources.embeddings.Embeddings object at 0x7a1b02bc7bb0>, async_client=<openai.resources.embeddings.AsyncEmbeddings object at 0x7a1b02c37cd0>, model='text-embedding-3-small', dimensions=None, deployment='text-embedding-ada-002', openai_api_version=None, openai_api_base=None, openai_api_type=None, openai_proxy=None, embedding_ctx_length=8191, openai_api_key=SecretStr('**********'), openai_organization=None, allowed_special=None, disallowed_special=None, chunk_size=1000, max_retries=2, request_timeout=None, headers=None, tiktoken_enabled=True, tiktoken_model_name=None, show_progress_bar=False, model_kwargs={}, skip_empty=False, default_headers=None, default_query=None, retry_min_seconds=4, retry_max_seconds=20, http_client=None, http_async_client=None, check_embedding_ctx_length=True)

### **Understanding similarity search with a toy example**

In [None]:
sentence1 = "i like dogs"
sentence2 = "i like cats"
sentence3 = "the weather is ugly, too hot outside"

In [None]:
embedding1 = embedding.embed_query(sentence1)
embedding2 = embedding.embed_query(sentence2)
embedding3 = embedding.embed_query(sentence3)

In [None]:
len(embedding1), len(embedding2), len(embedding3)

(1536, 1536, 1536)

In [None]:
embedding1[:10]

[0.016543298959732056,
 -0.03333469107747078,
 3.722277324413881e-05,
 0.0063320123590528965,
 0.02789919637143612,
 -0.011942299082875252,
 -0.007651416584849358,
 0.037259072065353394,
 -0.07275893539190292,
 -0.022069009020924568]

In [None]:
import numpy as np

def cosine_similarity(vector1, vector2):
    # Ensure that the vectors are numpy arrays
    vector1 = np.array(vector1)
    vector2 = np.array(vector2)

    # Calculate the dot product of the vectors
    dot_product = np.dot(vector1, vector2)

    # Calculate the magnitude (norm) of the vectors
    norm_vector1 = np.linalg.norm(vector1)
    norm_vector2 = np.linalg.norm(vector2)

    # Compute cosine similarity
    if norm_vector1 == 0 or norm_vector2 == 0:
        return 0  # Avoid division by zero
    return dot_product / (norm_vector1 * norm_vector2)


In [None]:
cosine_similarity(embedding1, embedding2), cosine_similarity(embedding1, embedding3), cosine_similarity(embedding2, embedding3)

(0.7222641205374054, 0.20250974039559247, 0.1820158505380567)
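
For reference, the function above computes the standard cosine similarity between two vectors $\mathbf{a}$ and $\mathbf{b}$:

$$
\text{cosine\_similarity}(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
$$

Values close to 1 mean the vectors point in nearly the same direction. In the output above, the two "i like ..." sentences score about 0.72 against each other, while each scores about 0.20 or lower against the weather sentence.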

### **Vectorstores**

In [None]:
from langchain_chroma import Chroma       # Lightweight vector store; saved to disk when persist_directory is set

In [None]:
persist_directory = 'docs/chroma/'
!rm -rf ./docs/chroma  # remove old database files if any

In [None]:
vectordb = Chroma.from_documents(
    documents=splits,                    # splits we created earlier
    embedding=embedding,
    persist_directory=persist_directory, # save the directory
)

In [None]:
print(vectordb._collection.count()) # same as number of splits

26


### **Similarity Search in Vector store**

In vector databases, retrieving the chunks relevant to a query is typically based on **similarity search techniques**, primarily nearest neighbor search over the embedding vectors.

Here are some common approaches:

>**Approximate Nearest Neighbor (ANN) Search:** Vector databases frequently use ANN algorithms to improve efficiency when searching for vectors that are close to the query vector.
>
>Popular **ANN** algorithms and libraries include:
>
>1. HNSW (Hierarchical Navigable Small World graph): A graph-based approach that finds approximate nearest neighbors using a multi-layered graph structure.
>
>2. Faiss: An open-source library developed by Facebook (now Meta) that implements several methods for fast similarity search, such as Product Quantization and the Inverted File index (IVF).
>
>3. Annoy (Approximate Nearest Neighbors Oh Yeah): Developed by Spotify, it uses a forest of random projection trees for approximate nearest neighbor search.
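
As a point of comparison, the sketch below shows the exact (brute-force) search that ANN methods approximate: score every stored vector against the query and keep the top `k`. The toy 2-dimensional vectors and the function name `exact_top_k` are hypothetical; real embeddings here have 1536 dimensions.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 0.0 if norm_a == 0 or norm_b == 0 else dot / (norm_a * norm_b)


def exact_top_k(query_vec, vectors, k):
    """Return the indices of the k vectors most similar to query_vec."""
    return heapq.nlargest(
        k, range(len(vectors)), key=lambda i: cosine(query_vec, vectors[i])
    )


# Hypothetical toy vectors, for illustration only
vectors = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.6], [-1.0, 0.0]]
top = exact_top_k([1.0, 0.1], vectors, k=2)
print(top)
```

This scan costs one similarity computation per stored vector for every query, which is the cost that ANN indexes such as HNSW are designed to avoid on large collections.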


In [None]:
question = "How does ensemble method works?"

In [None]:
docs = vectordb.similarity_search(question, k=6)     # k --> No. of Document object to return

In [None]:
print(len(docs))

for i in range(len(docs)):
    print(docs[i].page_content)
    print('='*140)

6
Why use Ensemble Methods? 
Ensemble Methods are used in order to: 
• decrease variance (bagging) 
• decrease bias (boosting) 
• improve predictions (stacking) 
 
Bagging 
Bagging actually refers to Bootstrap Aggregators. 
Bagging tests multiple models on the data by sampling and replacing data i.e it utilizes bootstrap - 
ping. In turn, this reduces the noise and variance by utilizing multiple samples. Each hypothesis
Why use Ensemble Methods? 
Ensemble Methods are used in order to: 
• decrease variance (bagging) 
• decrease bias (boosting) 
• improve predictions (stacking) 
 
Bagging 
Bagging actually refers to Bootstrap Aggregators. 
Bagging tests multiple models on the data by sampling and replacing data i.e it utilizes bootstrap - 
ping. In turn, this reduces the noise and variance by utilizing multiple samples. Each hypothesis
considered. The product is bought by the user when the combined ratings of the group is positive. 
The user gets a fairer idea about the product when all 

### **Edge cases where failure may happen**

1. Lack of diversity: Semantic search fetches the most similar chunks, but does not enforce diversity among them.

    - Notice that we're getting duplicate chunks (because of the duplicate `ens_d2.pdf` in the index). `docs[0]` and `docs[1]` are identical.

  **Addressing Diversity - MMR (Maximum Marginal Relevance)**

Maximum Marginal Relevance (MMR) is a method used to retrieve relevant items to a query while avoiding redundancy. It does this by ensuring a balance between relevancy and diversity in the items retrieved.

<img src='https://miro.medium.com/v2/resize:fit:828/format:webp/1*U-9mPt5tBfPBPrwC4_oD1w.png'>
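
The idea can be sketched in plain Python as a greedy loop: at each step, pick the candidate with the best trade-off between similarity to the query and similarity to what has already been selected. This is an illustrative sketch under stated assumptions, not the vector store's actual implementation; the function name `mmr`, the weighting parameter `lambda_mult`, and the toy vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 0.0 if norm_a == 0 or norm_b == 0 else dot / (norm_a * norm_b)


def mmr(query_vec, doc_vecs, k, lambda_mult=0.5):
    """Greedily select k document indices, balancing relevance and diversity."""
    selected = []
    candidates = list(range(len(doc_vecs)))

    def score(i):
        relevance = cosine(query_vec, doc_vecs[i])
        # Redundancy: similarity to the closest already-selected document
        redundancy = max(
            (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
        )
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical toy data: documents 0 and 1 are exact duplicates
query = [1.0, 1.0]
doc_vecs = [[1.0, 0.2], [1.0, 0.2], [0.1, 1.0]]

by_similarity = sorted(
    range(len(doc_vecs)), key=lambda i: cosine(query, doc_vecs[i]), reverse=True
)[:2]
by_mmr = mmr(query, doc_vecs, k=2)
print(by_similarity, by_mmr)
```

Plain similarity ranking returns both duplicates, while the MMR loop skips the second copy because its redundancy penalty outweighs its relevance.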

In [None]:
question = 'How ensemble method works?'
docs = vectordb.similarity_search(question, k=3)     # Without MMR

print(len(docs))

for i in range(len(docs)):
    print(docs[i].page_content)
    print('='*140)

3
Why use Ensemble Methods? 
Ensemble Methods are used in order to: 
• decrease variance (bagging) 
• decrease bias (boosting) 
• improve predictions (stacking) 
 
Bagging 
Bagging actually refers to Bootstrap Aggregators. 
Bagging tests multiple models on the data by sampling and replacing data i.e it utilizes bootstrap - 
ping. In turn, this reduces the noise and variance by utilizing multiple samples. Each hypothesis
Why use Ensemble Methods? 
Ensemble Methods are used in order to: 
• decrease variance (bagging) 
• decrease bias (boosting) 
• improve predictions (stacking) 
 
Bagging 
Bagging actually refers to Bootstrap Aggregators. 
Bagging tests multiple models on the data by sampling and replacing data i.e it utilizes bootstrap - 
ping. In turn, this reduces the noise and variance by utilizing multiple samples. Each hypothesis
considered. The product is bought by the user when the combined ratings of the group is positive. 
The user gets a fairer idea about the product when all 

**Example 1. Addressing Diversity - MMR-Maximum Marginal Relevance**

In [None]:
docs_with_mmr = vectordb.max_marginal_relevance_search(question, k=3, fetch_k=6)   # With MMR

print(len(docs_with_mmr))

for i in range(len(docs_with_mmr)):
    print(docs_with_mmr[i].page_content)
    print('='*140)

3
Why use Ensemble Methods? 
Ensemble Methods are used in order to: 
• decrease variance (bagging) 
• decrease bias (boosting) 
• improve predictions (stacking) 
 
Bagging 
Bagging actually refers to Bootstrap Aggregators. 
Bagging tests multiple models on the data by sampling and replacing data i.e it utilizes bootstrap - 
ping. In turn, this reduces the noise and variance by utilizing multiple samples. Each hypothesis
considered. The product is bought by the user when the combined ratings of the group is positive. 
The user gets a fairer idea about the product when all the ratings are combined. 
Here, the combination of ratings is done so that the decision making process of the user is made  
easy. 
Ensemble Methods refer to combining many different machine learning models in order to get a  
more powerful prediction. 
Thus, ensemble methods increase the accuracy of the predictions.
1  
 
Ensemble Methods 
Let us consider a real world situation which uses Ensemble Methods, which is, 

2. Lack of specificity: A question may concern one particular document, but the retrieved chunks (and therefore the answer) may include information from other documents.

  **Addressing Specificity: Working with metadata - Manually**

  **Working with metadata using self-query retriever - Automatically**

**Example 2. Addressing Specificity: Working with metadata - Manually**

In [None]:
# Without metadata information
question = "What is variance?"

docs = vectordb.similarity_search(question, k=5)

for doc in docs:
    print(doc.metadata)    # metadata records which document and page each chunk came from

{'page': 0, 'source': '/content/pca_d1.pdf'}
{'page': 0, 'source': '/content/ens_d2.pdf'}
{'page': 0, 'source': '/content/ens_d2.pdf'}
{'page': 0, 'source': '/content/pca_d1.pdf'}
{'page': 1, 'source': '/content/pca_d1.pdf'}


We can filter the results based on metadata.

In [None]:
# With metadata information
question = "what is the role of variance in pca?"
docs = vectordb.similarity_search(
    question,
    k=5,
    filter={"source":'/content/ens_d2.pdf'}     # metadata filter: only chunks from this source are returned (use '/content/pca_d1.pdf' to restrict results to the PCA document)
)

for doc in docs:
    print(doc.metadata)

{'page': 0, 'source': '/content/ens_d2.pdf'}
{'page': 0, 'source': '/content/ens_d2.pdf'}
{'page': 0, 'source': '/content/ens_d2.pdf'}
{'page': 0, 'source': '/content/ens_d2.pdf'}
{'page': 1, 'source': '/content/ens_d2.pdf'}


In [None]:
# With metadata information + MMR

docs_with_mmr = vectordb.max_marginal_relevance_search(question,
                                                       k=2,
                                                       fetch_k=5,
                                                       filter={"source":'/content/ens_d2.pdf'}     # manually passing metadata, using metadata filter.
                                                       )

In [None]:
for i in range(len(docs_with_mmr)):
    print(docs_with_mmr[i].page_content)
    print('='*140)

models. 
 
Variance 
Variance quantifies how the predictions made on same observation are different from each other. A  
high variance model will over -fit on your training population and perform badly on any observation  
beyond training. Thus, we aim at low variance.
subset of features is selected, further randomizing the tree. 
As a result, the bias of the forest increases slightly, but due to the averaging of less correlated  
trees, its variance decreases, resulting in an overall better model.
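
Conceptually, a metadata filter narrows the candidate set to matching chunks, and only those are ranked by similarity. The sketch below models that behaviour with a toy in-memory index; it is a simplified illustration, not Chroma's implementation, and the function name `filtered_search`, the `where` argument, and the vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 0.0 if norm_a == 0 or norm_b == 0 else dot / (norm_a * norm_b)


# Hypothetical toy index: each entry has a vector plus metadata like the chunks above
index = [
    {"vector": [0.9, 0.1], "metadata": {"source": "/content/pca_d1.pdf", "page": 0}},
    {"vector": [0.8, 0.3], "metadata": {"source": "/content/ens_d2.pdf", "page": 0}},
    {"vector": [0.2, 0.9], "metadata": {"source": "/content/ens_d2.pdf", "page": 1}},
]


def filtered_search(query_vec, entries, k, where):
    """Keep entries whose metadata matches every key in `where`, then rank them."""
    candidates = [
        entry
        for entry in entries
        if all(entry["metadata"].get(key) == value for key, value in where.items())
    ]
    ranked = sorted(
        candidates, key=lambda entry: cosine(query_vec, entry["vector"]), reverse=True
    )
    return ranked[:k]


hits = filtered_search([1.0, 0.0], index, k=2, where={"source": "/content/ens_d2.pdf"})
for hit in hits:
    print(hit["metadata"])
```

Note that the most similar entry overall (from `pca_d1.pdf`) is excluded here, so the filter value needs to match the document the question is actually about.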


[**Addressing Specificity -Automatically: Working with metadata using self-query retriever**](https://python.langchain.com/docs/how_to/self_query/)

### **Additional tricks: Compression**

Another approach for improving the quality of retrieved docs is compression. Information most relevant to a query may be buried in a document with a lot of irrelevant text. Passing that full document through your application can lead to more expensive LLM calls and poorer responses.

[Contextual compression](https://python.langchain.com/docs/how_to/contextual_compression/) is meant to fix this.

## **Retrieval**

**[Vectorstore as a retriever](https://python.langchain.com/docs/how_to/vectorstore_retriever/)**

**Better approach: wrap the vector store in a retriever**

In [None]:
# Without MMR
question = "What is principal component analysis?"
retriever = vectordb.as_retriever(search_kwargs={"k": 3})
docs = retriever.invoke(question)
docs

[Document(metadata={'page': 1, 'source': '/content/pca_d1.pdf'}, page_content='2 \n \n \n \nSo, what does Principal Component Analysis (PCA) do? \nPCA finds a new set of dimensions (or a set of basis of views) such that all the dimensions are  \northogonal (and hence linearly independent) and ranked according to the variance of data along  \nthem. It means more important principle axis occurs first. (more important = more variance/more  \nspread out data) \n \nHow does PCA work? \n• Calculate the covariance matrix X of data points.'),
 Document(metadata={'page': 0, 'source': '/content/pca_d1.pdf'}, page_content='1 \n \n \nN \n \n1 Principal Component Analysis \nIn real world data analysis tasks we analyze complex data i.e. multi dimensional data. We plot the  \ndata and find various patterns in it or use it to train some machine learning models.  One way to  \nthink about dimensions is that suppose you have an data point x , if we consider this data point as \na physical object then di

In [None]:
# With MMR
retriever = vectordb.as_retriever(search_type="mmr", search_kwargs={"k": 2, "fetch_k":5})
docs = retriever.invoke(question)
docs

[Document(metadata={'page': 1, 'source': '/content/pca_d1.pdf'}, page_content='2 \n \n \n \nSo, what does Principal Component Analysis (PCA) do? \nPCA finds a new set of dimensions (or a set of basis of views) such that all the dimensions are  \northogonal (and hence linearly independent) and ranked according to the variance of data along  \nthem. It means more important principle axis occurs first. (more important = more variance/more  \nspread out data) \n \nHow does PCA work? \n• Calculate the covariance matrix X of data points.'),
 Document(metadata={'page': 0, 'source': '/content/pca_d1.pdf'}, page_content='1 \n \n \nN \n \n1 Principal Component Analysis \nIn real world data analysis tasks we analyze complex data i.e. multi dimensional data. We plot the  \ndata and find various patterns in it or use it to train some machine learning models.  One way to  \nthink about dimensions is that suppose you have an data point x , if we consider this data point as \na physical object then di

## **Augmentation**

In [None]:
from langchain_core.prompts import PromptTemplate                                    # To format prompts
from langchain_core.output_parsers import StrOutputParser                            # to transform the output of an LLM into a more usable format
from langchain_core.runnables import RunnableParallel, RunnablePassthrough           # Required by LCEL (LangChain Expression Language)

In [None]:
# Build prompt
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Always say "thanks for asking!" at the end of the answer.
{context}
Question: {question}
Helpful Answer:"""

QA_PROMPT = PromptTemplate(input_variables=["context", "question"], template=template)
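
To see what the LLM will eventually receive, the sketch below fills the same template with plain `str.format`, which is roughly what the prompt template does with its two input variables. The context string and the question are hypothetical placeholders, not actual retrieval results.

```python
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Always say "thanks for asking!" at the end of the answer.
{context}
Question: {question}
Helpful Answer:"""

# Hypothetical retrieved text, for illustration only
context = "Ensemble Methods refer to combining many different machine learning models."
filled = template.format(context=context, question="What are ensemble methods?")
print(filled)
```

Printing the filled prompt like this is a quick way to confirm that the context slot contains retrieved text rather than something unexpected.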

## **Creating final RAG Chain**

> <img src='https://www.pinecone.io/_next/image/?url=https%3A%2F%2Fcdn.sanity.io%2Fimages%2Fvr8gru94%2Fproduction%2F63f8a8482c9ec06a8d7d1041514f87c06dd108a9-3442x942.png&w=3840&q=75' width=1200px>

[[Image source](https://www.pinecone.io/learn/series/langchain/langchain-expression-language/)]

The figure above describes the LCEL flow using `RunnableParallel` and `RunnablePassthrough`.

A Runnable is a **unit of execution** in the LangChain framework. It represents a specific task or operation that can be performed.

Examples of Runnables include data transformations, computations, or any other operation that can be **expressed** in LCEL (LangChain Expression Language).

[RunnableLambda](https://api.python.langchain.com/en/latest/runnables/langchain_core.runnables.base.RunnableLambda.html) is a LangChain abstraction that turns a plain Python function into a **pipe-compatible** Runnable.

[RunnablePassthrough](https://api.python.langchain.com/en/latest/runnables/langchain_core.runnables.passthrough.RunnablePassthrough.html) on its own passes its input through unchanged. It is typically **used in conjunction with [RunnableParallel](https://python.langchain.com/v0.1/docs/expression_language/interface/#parallelism)** to pass data through to a new key in the map.

The **RunnableParallel** object allows us to define multiple values and operations, and run them all in parallel.

The **RunnablePassthrough** object is used as a “passthrough” that takes any input to the current component ('retrieval' in the figure above) and allows us to provide it in the component output via the “question” key or any other custom key.
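
The data flow can be imitated with ordinary functions: every step receives the same input, and the results are collected into a dictionary keyed by step name. This is only a sketch of the idea (it runs the steps one after another rather than in parallel), and `run_parallel`, `passthrough`, and `fake_retriever` are hypothetical stand-ins, not LangChain APIs.

```python
def run_parallel(steps, value):
    """Apply every step to the same input and collect the results under each key."""
    return {key: step(value) for key, step in steps.items()}


def passthrough(value):
    """Return the input unchanged."""
    return value


def fake_retriever(question):
    """Hypothetical stand-in for a real retriever."""
    return [f"chunk relevant to: {question}"]


steps = {"context": fake_retriever, "question": passthrough}
result = run_parallel(steps, "What is PCA?")
print(result)
```

The resulting dictionary has exactly the keys the prompt template expects, which is why this map sits directly in front of the prompt in the chain.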

In [None]:
retriever = vectordb.as_retriever(search_type="mmr", search_kwargs={"k": 7, "fetch_k":15})
retriever

VectorStoreRetriever(tags=['Chroma', 'OpenAIEmbeddings'], vectorstore=<langchain_chroma.vectorstores.Chroma object at 0x7a1b0185c940>, search_type='mmr', search_kwargs={'k': 7, 'fetch_k': 15})

In [None]:
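retrieval = RunnableParallel(
    {
        # Pull the question string out of the input dict and send it to the retriever
        "context": (lambda x: x["question"]) | retriever,
        # Pass the question string itself (not the whole input dict) to the prompt
        "question": lambda x: x["question"],
    }
)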

In [None]:
# RAG Chain

rag_chain = (retrieval                     # Retrieval
             | QA_PROMPT                   # Augmentation
             | llm                         # Generation
             | StrOutputParser()
             )
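
A retriever returns a list of `Document` objects, and placing that list straight into the `{context}` slot inserts its string representation, metadata included. A common refinement is a small helper that joins only the text of each chunk. The sketch below assumes nothing beyond a `page_content` attribute; `FakeDocument` is a hypothetical stand-in so the example runs without LangChain, and `format_docs` is an illustrative name.

```python
from dataclasses import dataclass


@dataclass
class FakeDocument:
    """Hypothetical stand-in exposing only the page_content attribute."""

    page_content: str


def format_docs(docs):
    """Join the text of each retrieved chunk, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


# Hypothetical retrieved chunks, for illustration only
docs = [
    FakeDocument("PCA finds a new set of dimensions."),
    FakeDocument("Bagging actually refers to Bootstrap Aggregators."),
]
context = format_docs(docs)
print(context)
```

In an LCEL chain, a helper like this is typically piped directly after the retriever, so the prompt receives clean text instead of object representations.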

In [None]:
response = rag_chain.invoke({"question": "What is PCA ?"})

response

'PCA, or Principal Component Analysis, is a statistical technique used for dimensionality reduction. It transforms a dataset into a new coordinate system, where the greatest variance by any projection of the data lies on the first coordinate (called the first principal component), the second greatest variance on the second coordinate, and so on. This helps in simplifying the dataset while retaining its essential features, making it easier to visualize and analyze. Thanks for asking!'

In [None]:
response = rag_chain.invoke({"question": "What is principal component analysis?"})

response

'Principal component analysis (PCA) is a statistical technique used to reduce the dimensionality of a dataset while preserving as much variance as possible. It transforms the original variables into a new set of uncorrelated variables called principal components, which are ordered by the amount of variance they capture from the data. PCA is commonly used in exploratory data analysis, data visualization, and preprocessing for machine learning.\n\nThanks for asking!'

In [None]:
response = rag_chain.invoke({"question": "How ensemble method works?"})

print(response)

Ensemble methods work by combining multiple models to improve the overall performance of a predictive task. The idea is that by aggregating the predictions from several models, the ensemble can achieve better accuracy and robustness than any individual model. There are several types of ensemble methods, including:

1. **Bagging (Bootstrap Aggregating)**: This technique involves training multiple models on different subsets of the training data, which are created by sampling with replacement. The final prediction is made by averaging the predictions (for regression) or by majority voting (for classification).

2. **Boosting**: In boosting, models are trained sequentially, with each new model focusing on the errors made by the previous ones. The predictions from all models are then combined, often with more weight given to the more accurate models.

3. **Stacking**: This method involves training multiple models (the base learners) and then using another model (the meta-learner) to combin

In [None]:
# For a query whose answer is not in the documents
response = rag_chain.invoke({"question": "Who is the CEO of OpenAI "})

print(response)

I don't know. Thanks for asking!


**To check what the retriever returns**

In [None]:
chain_retriever = RunnablePassthrough() | retriever

In [None]:
chain_retriever.invoke("What is principal component analysis?")

[Document(metadata={'page': 1, 'source': '/content/pca_d1.pdf'}, page_content='2 \n \n \n \nSo, what does Principal Component Analysis (PCA) do? \nPCA finds a new set of dimensions (or a set of basis of views) such that all the dimensions are  \northogonal (and hence linearly independent) and ranked according to the variance of data along  \nthem. It means more important principle axis occurs first. (more important = more variance/more  \nspread out data) \n \nHow does PCA work? \n• Calculate the covariance matrix X of data points.'),
 Document(metadata={'page': 0, 'source': '/content/pca_d1.pdf'}, page_content='1 \n \n \nN \n \n1 Principal Component Analysis \nIn real world data analysis tasks we analyze complex data i.e. multi dimensional data. We plot the  \ndata and find various patterns in it or use it to train some machine learning models.  One way to  \nthink about dimensions is that suppose you have an data point x , if we consider this data point as \na physical object then di

In [None]:
chain_retriever.invoke("How ensemble method works?")

[Document(metadata={'page': 0, 'source': '/content/ens_d2.pdf'}, page_content='Why use Ensemble Methods? \nEnsemble Methods are used in order to: \n• decrease variance (bagging) \n• decrease bias (boosting) \n• improve predictions (stacking) \n \nBagging \nBagging actually refers to Bootstrap Aggregators. \nBagging tests multiple models on the data by sampling and replacing data i.e it utilizes bootstrap - \nping. In turn, this reduces the noise and variance by utilizing multiple samples. Each hypothesis'),
 Document(metadata={'page': 0, 'source': '/content/ens_d2.pdf'}, page_content='considered. The product is bought by the user when the combined ratings of the group is positive. \nThe user gets a fairer idea about the product when all the ratings are combined. \nHere, the combination of ratings is done so that the decision making process of the user is made  \neasy. \nEnsemble Methods refer to combining many different machine learning models in order to get a  \nmore powerful predict

[**Details of Chroma through LangChain**](https://python.langchain.com/docs/integrations/vectorstores/chroma/)

## Reusing Vector DB

### **Download the vector DB**

In [None]:
# Zip the entire folder
!zip -r /content/docs.zip /content/docs

  adding: content/docs/ (stored 0%)
  adding: content/docs/chroma/ (stored 0%)
  adding: content/docs/chroma/d6331c83-e352-441d-85a0-e05f62774915/ (stored 0%)
  adding: content/docs/chroma/d6331c83-e352-441d-85a0-e05f62774915/data_level0.bin (deflated 100%)
  adding: content/docs/chroma/d6331c83-e352-441d-85a0-e05f62774915/link_lists.bin (stored 0%)
  adding: content/docs/chroma/d6331c83-e352-441d-85a0-e05f62774915/header.bin (deflated 61%)
  adding: content/docs/chroma/d6331c83-e352-441d-85a0-e05f62774915/length.bin (deflated 13%)
  adding: content/docs/chroma/chroma.sqlite3 (deflated 61%)


In [None]:
from google.colab import files
files.download("/content/docs.zip")


### **Upload the vector db from previous step and unzip**

In [None]:
!unzip -o /content/docs.zip  -d /     # -o: overwrite existing files without prompting

Archive:  /content/docs.zip

In [None]:
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

embedding = OpenAIEmbeddings(model='text-embedding-3-small')

vectordb = Chroma(persist_directory = 'docs/chroma/',
                  embedding_function = embedding
                  )