
# Model Architecture and Approach to Retrieval

This project aims to develop a Question-Answering (QA) system that extracts relevant information from a PDF document using vector-based retrieval and generates detailed answers with a generative language model. The approach combines several key components: PDF document processing, vector embeddings for retrieval, and a generative LLM to answer questions.

## 1. PDF Document Processing
   - **PDF Loader**: We use `PyPDFLoader` from LangChain's `langchain_community` package to load PDF documents. It extracts the text of each page into a separate `Document`, and these documents are then split into chunks.
   - **Text Splitting**: Since PDFs can contain long stretches of text, we split the document into smaller, manageable pieces using `RecursiveCharacterTextSplitter`, configured with `chunk_size=1000` and `chunk_overlap=200` (measured in characters). The splitter prefers to break at paragraph, line, and word boundaries, so each chunk is at most 1000 characters long and consecutive chunks share up to 200 characters. This overlap reduces the risk that context spanning a chunk boundary is lost.
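To make the size and overlap arithmetic concrete, here is a simplified fixed-window chunker. It is a sketch under stated assumptions, not `RecursiveCharacterTextSplitter`'s actual algorithm: the real splitter first tries paragraph, line, and word boundaries, so its chunks are often shorter than `chunk_size` and its overlap is approximate. `chunk_text` is a hypothetical helper name.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size windows that share `chunk_overlap` characters.

    Hypothetical helper for illustration only; it ignores word and sentence boundaries.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# A 2500-character sample with varied letters so the overlap is visible
sample = "".join(chr(ord("a") + i % 26) for i in range(2500))
chunks = chunk_text(sample)
print([len(c) for c in chunks])  # [1000, 1000, 900]
print(chunks[0][-200:] == chunks[1][:200])  # True: consecutive chunks share 200 characters
```

With a 1000-character window and 200 characters of overlap, each new chunk starts 800 characters after the previous one, so a 2500-character text yields three chunks.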

## 2. Vector Embeddings for Retrieval
   - **Embeddings**: We use the `GoogleGenerativeAIEmbeddings` class with the `models/embedding-001` model to convert each document chunk into a vector embedding: a high-dimensional numeric representation in which chunks with related meaning lie close together. The same model must also embed the user's queries, because distances are only meaningful between vectors produced by the same model. These embeddings are the basis for the similarity-based retrieval step.
   - **Vector Store (FAISS)**: The document chunks and their corresponding embeddings are stored in a FAISS vector store. FAISS (Facebook AI Similarity Search) is a library for efficient similarity search over dense vectors. Once the embeddings are indexed, the store can quickly return the chunks whose vectors are nearest to a query vector.
   
   - **Process Flow**:
    1. The PDF document is loaded and split into smaller chunks.
    2. Each chunk is converted into a vector embedding using the embedding model.
    3. The vector embeddings are stored in a FAISS index to facilitate fast similarity search during queries.
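The indexing and lookup steps can be pictured with a toy exact-search index. This is an illustrative sketch, not FAISS itself: it assumes an exhaustive scan ranked by Euclidean (L2) distance, which is what a flat FAISS index does, and it uses hand-written 2-D vectors in place of real `embedding-001` embeddings. `FlatL2Index` is a hypothetical class name.

```python
import math


class FlatL2Index:
    """Toy exact-search index: stores every vector and scans all of them per query."""

    def __init__(self):
        self.vectors = []
        self.payloads = []

    def add(self, vector, payload):
        # Store the embedding together with the text chunk it represents
        self.vectors.append(list(vector))
        self.payloads.append(payload)

    def search(self, query, k=4):
        # Compute the L2 distance to every stored vector, then keep the k smallest
        scored = [(math.dist(query, v), p) for v, p in zip(self.vectors, self.payloads)]
        scored.sort(key=lambda pair: pair[0])
        return scored[:k]


# Hypothetical 2-D "embeddings"; real embedding vectors have hundreds of dimensions
index = FlatL2Index()
index.add([0.0, 1.0], "chunk about functions")
index.add([1.0, 0.0], "chunk about loops")
index.add([0.7, 0.7], "chunk about scope")

results = index.search([0.1, 0.9], k=2)
print([payload for _, payload in results])  # ['chunk about functions', 'chunk about scope']
```

An exhaustive scan is exact but costs one distance computation per stored chunk; FAISS also offers approximate index types for collections too large to scan in full.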

## 3. Generative Responses Using a Language Model
   - **Generative LLM**: The system uses the `Gemma2-9b-it` model hosted by Groq, accessed through LangChain's `ChatGroq` integration; Groq's inference service was chosen for its fast response times. The model takes the query together with the retrieved context and produces a detailed, context-aware answer.
   - **Chain Creation**: The documents retrieved from the vector store serve as the context for generating responses. The `create_stuff_documents_chain` function "stuffs" all retrieved documents into the prompt's `{context}` placeholder, while the query fills `{input}`, producing a single prompt for the LLM.
   - **Prompt Template**: A `ChatPromptTemplate` is designed to ensure that the model generates answers based strictly on the provided context. The template is structured as follows:

     ```
     Answer the questions based on the provided context only.
     Please provide the most accurate and detailed response based on the question.
     <context>
     {context}
     <context>
     Questions:{input}
     ```

     This format instructs the LLM to restrict its answer to information contained in the document, which reduces (but does not eliminate) hallucinations and off-topic responses.
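Conceptually, the "stuff" step is string assembly: the text of every retrieved chunk is joined and substituted into `{context}`, and the query fills `{input}`. The sketch below imitates that in plain Python. The `Doc` class and `stuff_prompt` helper are hypothetical, and the blank-line separator is an assumption modeled on LangChain's default rather than a guaranteed detail of every version.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    page_content: str  # same attribute name that LangChain Document objects expose


TEMPLATE = """Answer the questions based on the provided context only.
Please provide the most accurate and detailed response based on the question.
<context>
{context}
<context>
Questions:{input}"""


def stuff_prompt(docs, question, separator="\n\n"):
    # Concatenate every retrieved chunk into one context string...
    context = separator.join(doc.page_content for doc in docs)
    # ...then fill both placeholders of the template
    return TEMPLATE.format(context=context, input=question)


docs = [
    Doc("A Function is a sequence of statements/instructions that performs a particular task."),
    Doc("Once a function is defined, it can be used over and over again."),
]
print(stuff_prompt(docs, "What is a function and its features?"))
```

Because everything is placed in one prompt, the number and size of the retrieved chunks must fit within the model's context window.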

## 4. End-to-End Retrieval and Generation Process
   - **Document Retrieval**: When a query is made, it is first converted into a vector embedding using the same `GoogleGenerativeAIEmbeddings` model. This query embedding is then used to search the FAISS vector store for the most semantically similar document chunks.
   - **Generative Response**: The retrieved documents, which serve as the context, are passed to the LLM along with the query. The LLM generates a detailed response grounded in that context; its accuracy depends on whether the retrieved chunks actually contain the answer.
   
   - **Process Flow**:
    1. A user provides a query.
    2. The system converts the query into an embedding and performs a similarity search on the FAISS vector store.
    3. The retrieved documents are fed into the LLM along with the query, generating an answer.
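The query-time path can be traced end to end with a toy embedding. This sketch rests on loud assumptions: `embed` is a hypothetical bag-of-words stand-in for `GoogleGenerativeAIEmbeddings`, `VOCAB` is an invented vocabulary, and `retrieve` ranks chunks by L2 distance. What it does show faithfully is the constraint from the steps above: the query is embedded with the same function as the chunks, otherwise the distances would be meaningless.

```python
import math
import re

# Hypothetical vocabulary for the toy embedding
VOCAB = ["function", "return", "scope", "variable", "local", "global", "value"]


def embed(text):
    """Hypothetical embedding: word counts over VOCAB, scaled to unit length."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = [float(words.count(term)) for term in VOCAB]
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0  # avoid dividing by zero
    return [c / norm for c in counts]


def retrieve(chunks, query, k=2):
    """Embed chunks and query with the SAME function, then rank chunks by L2 distance."""
    chunk_vectors = [embed(c) for c in chunks]
    q = embed(query)
    ranked = sorted(range(len(chunks)), key=lambda i: math.dist(q, chunk_vectors[i]))
    return [chunks[i] for i in ranked[:k]]


chunks = [
    "A return statement ends the function and sends a value back.",
    "A local variable exists only inside its function; a global variable is visible everywhere.",
    "Loops repeat a block of code.",
]
context = retrieve(chunks, "What is the scope of a local variable?", k=1)
print(context)  # the chunk about local and global variables
```

In the real pipeline, the retrieved chunks would then be inserted into the prompt and sent to the LLM.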

## How Generative Responses Are Created
1. **Query Vectorization**: The query is converted into a vector using the same `GoogleGenerativeAIEmbeddings` model, ensuring that it can be compared with the document embeddings stored in FAISS.
2. **Document Retrieval**: The vectorized query is passed to FAISS, which returns the most relevant document chunks (4 by default) ranked by vector distance. With LangChain's default FAISS settings this is Euclidean (L2) distance rather than cosine similarity; the two give the same ranking when the embeddings are normalized to unit length.
3. **Generative Model**: The retrieved documents are passed as context to the Groq-hosted LLM, which generates the response. The `ChatPromptTemplate` instructs the model to base its answer on the document content.
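Whether L2 distance and cosine similarity rank chunks the same way depends on vector length. Expanding the squared distance between a query embedding $\mathbf{q}$ and a chunk embedding $\mathbf{d}$ (symbols introduced here for illustration) shows that they agree when both vectors have unit length; whether the embeddings used here are normalized is an assumption worth checking.

$$
\|\mathbf{q}-\mathbf{d}\|_2^2
= \|\mathbf{q}\|_2^2 + \|\mathbf{d}\|_2^2 - 2\,\mathbf{q}\cdot\mathbf{d}
= 2 - 2\cos\theta_{\mathbf{q},\mathbf{d}}
\qquad \text{if } \|\mathbf{q}\|_2 = \|\mathbf{d}\|_2 = 1
$$

So, for unit vectors, a smaller L2 distance corresponds exactly to a larger cosine similarity; for vectors of differing lengths the two rankings can differ.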

## Example Use Case
For example, suppose the user asks **"What is a function and its features?"**:
1. The query is vectorized and used to retrieve the relevant sections of a PDF document on functions.
2. These document chunks (retrieved from the FAISS vector store) contain relevant descriptions and features of a function.
3. The generative model processes the context and produces an answer summarizing what a function is and its key features.

This approach keeps responses grounded in the provided document, so they reflect its content as long as retrieval surfaces the relevant chunks.


```python
!pip install faiss-cpu groq langchain-groq PyPDF2 langchain_google_genai langchain streamlit langchain_community python-dotenv pypdf
```

```python
import os

from dotenv import load_dotenv
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import ChatPromptTemplate
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_groq import ChatGroq

# Read API keys from a local .env file if one exists; otherwise fall back to placeholders.
# setdefault() keeps any key that is already set (e.g. in Colab) instead of overwriting it.
load_dotenv()
os.environ.setdefault('GOOGLE_API_KEY', 'your_google_api_key')  # replace with your key
os.environ.setdefault('GROQ_API_KEY', 'your_groq_api_key')      # replace with your key

# Initialize the language model once and reuse it for every query
llm = ChatGroq(groq_api_key=os.getenv('GROQ_API_KEY'), model_name="Gemma2-9b-it")

prompt = ChatPromptTemplate.from_template(
    """
    Answer the questions based on the provided context only.
    Please provide the most accurate and detailed response based on the question.
    <context>
    {context}
    <context>
    Questions:{input}
    """
)

# The "stuff" chain inserts all retrieved documents into the prompt's {context} slot.
# It does not depend on the query, so build it once instead of on every call.
document_chain = create_stuff_documents_chain(llm, prompt)

# Load a PDF, split it into overlapping chunks, embed the chunks, and index them in FAISS
def load_pdf_and_generate_vectors(pdf_file_path):
    embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
    loader = PyPDFLoader(pdf_file_path)
    docs = loader.load()  # one Document per PDF page
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    final_documents = text_splitter.split_documents(docs)
    vectors = FAISS.from_documents(final_documents, embeddings)
    print("Vector Store DB is ready.")
    return vectors

# Return the k chunks closest to the query (k=4 matches LangChain's default)
def retrieve_documents(vectors, query, k=4):
    return vectors.similarity_search(query, k=k)

# Retrieve context for the query and print the LLM's answer
def generate_output(vectors, query):
    docs = retrieve_documents(vectors, query)

    # 'context' is the list of Document chunks from the similarity search;
    # 'input' fills the {input} placeholder in the prompt
    response = document_chain.invoke({'input': query, 'context': docs})

    print("Answer:", response)

# Print each retrieved chunk, separated by a divider line
def display_similarity_results(docs):
    print("\nDocument Similarity Search:")
    for doc in docs:
        print(doc.page_content)
        print("--------------------------------")

# Load the PDF and build the vector store once
pdf_file_path = '/content/5(Functions).pdf'  # Replace with your uploaded PDF file path
vectors = load_pdf_and_generate_vectors(pdf_file_path)
```

Vector Store DB is ready.


## Q1. What is a function and its features?

```python
query = "What is a function and its features?"
generate_output(vectors, query)
```

Answer: A function is a sequence of statements/instructions that performs a particular task.  

Here are its features:

* **Reusability:** Once defined, a function can be used multiple times without rewriting the code.
* **Code Conciseness:** Functions make code shorter and easier to read by encapsulating specific tasks.
* **Modularization:** Functions break down code into smaller, manageable modules, each with a specific purpose.
* **Easy Debugging:**  Errors in functions are easier to find and fix because the code is more organized.


 Functions act like black boxes: they take inputs (parameters), process them, and can optionally return an output value. 




```python
# Retrieve documents and display document similarity results
docs = retrieve_documents(vectors, query)
display_similarity_results(docs)
```


```text
Document Similarity Search:
Functions
What Is A Function?
A Function is a sequence of statements/instructions that performs a particular task. A function is like a black box that can take certain input(s) as its parameters and can output a value after performing a few operations on the parameters. A function is created so that one can use a block of code as many times as needed just by using the name of the function.
Why Do We Need Functions?
● Reusability: Once a function is defined, it can be used over and over again. You can call the function as many times as it is needed. Suppose you are required to find out the area of a circle for 10 different radii. Now, you can either write the formula πr² 10 times or you can simply create a function that takes the value of the radius as an input and returns the area corresponding to that radius. This way you would not have to write the same code (formula) 10 times. You can simply invoke the function every time.
● Neat code: A code conta
```
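The retrieved chunk's reusability example, computing a circle's area for 10 different radii without repeating the formula, translates directly into a short Python function. `circle_area` is an illustrative name, not code taken from the PDF.

```python
import math


def circle_area(radius: float) -> float:
    """Return the area of a circle (pi * r**2) for the given radius."""
    return math.pi * radius ** 2


# Reuse the same function for radii 1 to 10 instead of writing the formula 10 times
areas = [circle_area(r) for r in range(1, 11)]
print(round(areas[0], 4), round(areas[9], 2))  # 3.1416 314.16
```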

## Q2. What do you mean by the scope of a function?

```python
query = "What do you mean by the scope of a function?"
generate_output(vectors, query)
```

Answer: The scope of a function refers to the region of a program where a variable declared inside that function can be accessed.  Local variables, declared within a function, are only visible and usable within that specific function's code block. They cease to exist once the function has finished executing.  

Global variables, declared outside any function, have a wider scope and can be accessed from anywhere within the program.  


Let me know if you'd like more detail on any aspect of scope!




```python
# Retrieve documents and display document similarity results
docs = retrieve_documents(vectors, query)
display_similarity_results(docs)
```


```text
Document Similarity Search:
scope, it prints the value that it has been assigned inside the function. Similarly, when we print the variable outside foo(), it outputs global Global: 5. This is called the global scope of the variable and the value of the global variable x is printed.
--------------------------------
Creating a Local Variable
We declare a local variable inside a function. Consider the given function definition:
def foo():
    y = "Local Variable"
    print(y)

foo()
We get the output as:
Local Variable
Accessing A Local Variable Outside The Scope
def foo():
    y = "local"

foo()
print(y)
In the above code, we declared a local variable y inside the function foo(), and then we tried to access it from outside the function. We get the output as:
NameError: name 'y' is not defined
We get an error because the lifetime of a local variable is the function it is defined in. Outside the function, the variable does not exist and cannot be accessed. In other words, a variable cannot be accessed outside its sco
```
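The scope rules in the retrieved chunk can be checked with a runnable version of its `foo()` examples. The global `x = 5` and the exact `print` wording are assumptions chosen to mirror the truncated text, not the PDF's exact code.

```python
x = 5  # global variable: defined at module level, visible throughout the module


def foo():
    y = "Local Variable"  # local variable: exists only while foo() is running
    print(y)
    print("Global:", x)  # reading a global from inside a function is allowed


foo()

# y was never created at module level, so looking it up here fails
try:
    print(y)
except NameError as err:
    print("NameError:", err)  # NameError: name 'y' is not defined
```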

## Q3. Explain the return statement in a function.


```python
query = "Explain return statement in a function?"
generate_output(vectors, query)
```

Answer: A `return` statement is used to end the execution of a function and send a value back to the code that called the function. 

Here's a breakdown:

* **Stops Function Execution:**  Once a `return` statement is encountered, the function immediately stops running, regardless of any remaining code within its block.

* **Returns a Value:** The `return` statement can optionally include an expression. The value produced by this expression is sent back to the caller of the function.

* **None Return:** If the `return` statement doesn't include an expression, it implicitly returns the special value `None`.


Let me know if you have any other questions about return statements or functions in Python!




```python
# Retrieve documents and display document similarity results
docs = retrieve_documents(vectors, query)
display_similarity_results(docs)
```


```text
Document Similarity Search:
The return Statement
A return statement is used to end the execution of the function call and it “returns” the result (value of the expression following the return keyword) to the caller. The statements after the return statements are not executed. If the return statement is without any expression, then the special value None is returned.
In the example given above, the sum a+b is returned.
Note: In Python, you need not specify the return type i.e. the data type of returned value.
Calling/Invoking A Function
Once you have defined a function, you can call it from another function, program, or even the Python prompt. To use a function that has been defined earlier, you need to write a function call.
A function call takes the following form:
<function-name> (<value-to-be-passed-as-argument>)
The function definition does not execute the function body. The function gets executed only when it is called or invoked. To call the above function we can write:
add(5,7)
In this
```
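The behaviour described in the retrieved chunk (`return` ends the call, later statements are skipped, and a bare `return` yields `None`) can be seen in a few lines. `add(5, 7)` comes from the chunk; `log_only` is a hypothetical helper added to illustrate the bare `return`.

```python
def add(a, b):
    return a + b  # the caller receives the value of a + b
    print("never printed")  # unreachable: execution stopped at the return above


def log_only(message):
    print(message)
    return  # bare return: the caller receives None


result = add(5, 7)
print(result)  # 12
print(log_only("hello") is None)  # prints "hello", then True
```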