In [1]:
!pip install langchain-community==0.2.4 langchain==0.2.3 faiss-cpu==1.8.0 unstructured==0.14.5 unstructured[pdf]==0.14.5 transformers==4.41.2 sentence-transformers==3.0.1




[notice] A new release of pip is available: 24.2 -> 24.3.1
[notice] To update, run: python.exe -m pip install --upgrade pip


In [2]:
import os

from langchain_community.llms import Ollama
from langchain.document_loaders import UnstructuredFileLoader
from langchain_community.vectorstores import FAISS
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains import RetrievalQA

In [3]:
# loading the LLM
llm = Ollama(
    model="laddo",
    temperature=0
)

In [4]:
# loading the document
loader = UnstructuredFileLoader("PP Unit 2 Tesseract.pdf")
documents = loader.load()

In [5]:
# create document chunks
text_splitter = CharacterTextSplitter(separator="/n",
                                      chunk_size=7500,
                                      chunk_overlap=200)

In [6]:
text_chunks = text_splitter.split_documents(documents)

In [7]:
embeddings = HuggingFaceEmbeddings()

  embeddings = HuggingFaceEmbeddings()
  from tqdm.autonotebook import tqdm, trange


In [8]:
knowledge_base = FAISS.from_documents(text_chunks, embeddings)

In [9]:
# retrieval QA chain
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=knowledge_base.as_retriever())

In [10]:
question = "What is this document about?"
response = qa_chain.invoke({"query": question})
print(response["result"])

  This document appears to be a guide on how to use the Message Passing Interface (MPI) for parallel computing, specifically focusing on Cartesian topology support in MPI. The document covers the following topics:

1. Initializing MPI and determining the size and rank of processes in a communicator.
2. Creating a Cartesian communicator using MPI_Cart_create() and defining the dimensions and periodicity of the grid.
3. Retrieving the coordinates of each process in the Cartesian grid using MPI_Cart_coords().
4. Performing point-to-point communication or collective operations specific to your application's grid structure, such as MPI_Send, MPI_Recv, and collective operations like MPI_Allreduce, MPI_Gather, or MPI_Scatter.
5. Freeing the Cartesian communicator after use using MPI_Comm_free().


In [11]:
question = "generate 5 mcqs on Vectorization methods with answers"
response = qa_chain.invoke({"query": question})
print(response["result"])

  Sure! Here are five MCQs related to vectorization methods, along with their answers:

MCQ1: What is the primary advantage of using vectorization techniques in programming?
A. Improved code readability
B. Faster execution time
C. Simplified data manipulation
D. Better memory management
Answer: B. Faster execution time

MCQ2: Which of the following vectorization methods is most suitable for performing matrix operations?
A. Recursion
B. Iteration
C. Looping
D. Function calls
Answer: C. Looping

MCQ3: What is the purpose of using a hash table in vectorization?
A. To store data in a sorted order
B. To reduce memory usage by storing data in an array
C. To perform fast lookups for elements in a large dataset
D. To implement recursion in algorithms
Answer: C. To perform fast lookups for elements in a large dataset

MCQ4: Which of the following vectorization techniques is most efficient for searching an element in a sorted array?
A. Linear search
B. Binary search
C. Iteration through all elem

In [12]:
question = "generate 30 mcqs on Vectorization methods with answers"
response = qa_chain.invoke({"query": question})
print(response["result"])

  Sure! Here are 30 MCQs on vectorization methods, along with their answers:

1. What is the main advantage of using vectorization techniques in programming?
A. Improved code readability
B. Faster execution time
C. Simplified code maintenance
D. Better memory management
Answer: B
2. Which of the following is a key benefit of using array operations in vectorized code?
A. Reduced memory usage
B. Improved data manipulation efficiency
C. Simplified data access patterns
D. Faster execution time for small arrays
Answer: B
3. What is the primary goal of vectorization in programming?
A. To reduce the number of loops
B. To improve code readability
C. To optimize memory usage
D. To simplify data manipulation operations
Answer: A
4. Which of the following techniques can be used to perform element-wise arithmetic operations on arrays?
A. Matrix multiplication
B. Vector addition
C. Array concatenation
D. Loop iteration
Answer: B
5. What is the main advantage of using OpenMP directives in parallel p

In [13]:
question = "generate 10 true/false questions  with answers on Vectorization methods with answers"
response = qa_chain.invoke({"query": question})
print(response["result"])

  Sure! Here are 10 true or false questions related to vectorization methods in parallel computing, along with their answers:

1. True or False: The Parallel Random Access (PRA) method is a vectorization technique that involves dividing the data into smaller chunks and processing them in parallel. (True)
2. True or False: The Data Dependence Graph (DDG) is a graphical representation of the dependencies between variables in a program, which can be used to identify potential parallelism. (False - DDG is actually a technique for analyzing the data dependencies in a program to identify opportunities for parallelization.)
3. True or False: The OpenMP compiler generates optimized code by automatically vectorizing loops. (True)
4. True or False: The MPI_Allreduce() function in MPI is used for collective operations such as summing the values of all processors. (False - MPI_Allreduce() is actually a function for reducing the values of all processors.)
5. True or False: The OpenMP directives #PA

In [14]:
def setup_qa_chain():
    # Replace with actual setup of the retriever and LLM
    # Here we're assuming you have already initialized knowledge base and LLM
    embeddings = HuggingFaceEmbeddings()
    # Assume knowledge_base and qa_chain are created here
    return qa_chain

In [15]:
'''def get_input_values():
    # Get user inputs from the terminal
    num_questions = input("Enter the number of questions: ")
    question_type = input("Enter the question type (e.g., TrueorFalse/FillintheBlanks/Mcqs): ")
    difficulty_level = input("Enter the difficulty level (e.g., Easy/Medium/Hard): ")

    # Construct the query using the inputs
    query = f"Generate {num_questions} {question_type} questions with answers on given pdf of {difficulty_level} difficulty."
    
    # Initialize the QA chain
    qa_chain = setup_qa_chain()

    # Run the query through the QA chain
    response = qa_chain.invoke({"query": query})
    
    # Display the result in the terminal
    print("Generated Questions and Answers:")
    print(response["result"])

# Run the function to get inputs and generate questions
get_input_values()'''

'def get_input_values():\n    # Get user inputs from the terminal\n    num_questions = input("Enter the number of questions: ")\n    question_type = input("Enter the question type (e.g., TrueorFalse/FillintheBlanks/Mcqs): ")\n    difficulty_level = input("Enter the difficulty level (e.g., Easy/Medium/Hard): ")\n\n    # Construct the query using the inputs\n    query = f"Generate {num_questions} {question_type} questions with answers on given pdf of {difficulty_level} difficulty."\n    \n    # Initialize the QA chain\n    qa_chain = setup_qa_chain()\n\n    # Run the query through the QA chain\n    response = qa_chain.invoke({"query": query})\n    \n    # Display the result in the terminal\n    print("Generated Questions and Answers:")\n    print(response["result"])\n\n# Run the function to get inputs and generate questions\nget_input_values()'

In [16]:
question = "generate 10 fill in the blanks with answers on Vectorization methods with answers"
response = qa_chain.invoke({"query": question})
print(response["result"])

  Sure! Here are 10 fill-in-the-blank questions related to vectorization methods, along with their answers:

1. What is the primary advantage of using vectorization techniques in parallel computing?
Answer: Reduced memory usage and improved performance by processing multiple data elements simultaneously.
2. Which method of vectorization involves dividing a problem into smaller subproblems that can be solved independently, then combining their results?
Answer: Divide-and-Conquer Method.
3. What is the main advantage of using parallel loops in vectorization?
Answer: Improved performance by reducing the number of iterations and increasing the processing speed.
4. Which method of vectorization involves dividing a problem into smaller parts, solving each part independently, and then combining their results?
Answer: Parallel Iterative Method.
5. What is the main advantage of using OpenMP directives in vectorization?
Answer: Improved performance by reducing the number of iterations and increa

In [18]:
from langchain.document_loaders import PyPDFLoader
from langchain.llms import Ollama
from fpdf import FPDF
import os

# Function to save response to a PDF
def save_to_pdf(response, output_file):
    pdf = FPDF()
    pdf.set_auto_page_break(auto=True, margin=15)
    pdf.add_page()
    pdf.set_font("Arial", size=12)
    pdf.multi_cell(0, 10, response)
    pdf.output(output_file)
    print(f"Response saved to {output_file}")

# RAG Setup and PDF Saving
def main():
    # Example response for testing purposes
    # In a real scenario, you will be working with your RAG pipeline to get this result.
    result = response["result"]

    # Get current working directory to save the output PDF
    current_directory = os.getcwd()
    print(f"Saving PDF in directory: {current_directory}")

    # Define the output file name with current directory
    output_file = os.path.join(current_directory, "RAG_Response_Vectorization.pdf")

    # Save result to PDF
    save_to_pdf(result, output_file)

# Run the main function
if __name__ == "__main__":
    main()


Saving PDF in directory: c:\Users\Srujana\OneDrive\Desktop\MODEL
Response saved to c:\Users\Srujana\OneDrive\Desktop\MODEL\RAG_Response_Vectorization.pdf
