# RAG application built on Gemini

In [1]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("Introduction to Software Testing.pdf")
data = loader.load()  # PyPDFLoader returns one Document per PDF page
#data

In [2]:
len(data)

346

In [3]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Split the page-level Documents into chunks of at most 1000 characters
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000)
docs = text_splitter.split_documents(data)

print("Total number of documents: ", len(docs))

Total number of documents:  1161
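
The jump from 346 pages to 1161 chunks comes from cutting each page into pieces of at most `chunk_size` characters. The sketch below is a simplified stand-in, not the splitter's actual algorithm: it slides a fixed character window over the text and ignores the paragraph, line, and word boundaries that `RecursiveCharacterTextSplitter` tries to respect. The function name `sliding_window_chunks` and the dummy text are hypothetical, and the overlap of 200 characters is an assumption about the splitter's default, since the cell above does not set `chunk_overlap`.

```python
def sliding_window_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Illustration only: real splitters prefer to break on separators
    such as blank lines, newlines and spaces before falling back to
    raw character positions.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "abcdefghij" * 250  # 2,500 characters of dummy text (hypothetical input)
chunks = sliding_window_chunks(sample)
print(len(chunks), [len(c) for c in chunks])
```

The overlap means the tail of one chunk is repeated at the head of the next, so a sentence that straddles a boundary still appears whole in at least one chunk.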


In [4]:
docs[7]

Document(metadata={'producer': 'Acrobat Distiller Server 6.0.1 (Sparc Solaris, Built: 2003-11-03)', 'creator': 'dvips(k) 5.95a Copyright 2005 Radical Eye Software', 'creationdate': '2007-12-06T02:28:52+05:30', 'author': 'Paul Ammann and Jeff Offutt', 'moddate': '2008-11-03T16:17:44-08:00', 'title': 'INTRODUCTION TO: SOFTWARE TESTING', 'source': 'Introduction to Software Testing.pdf', 'total_pages': 346, 'page': 6, 'page_label': '7'}, page_content='introtest CUUS047-Ammann ISBN 9780521880381 December 6, 2007 2:42 Char Count= 0\nContents\nList of Figures page ix\nList of Tables xiii\nPreface xv\nPart 1 Overview 1\n1 Introduction 3\n1.1 Activities of a Test Engineer 4\n1.1.1 Testing Levels Based on Software Activity 5\n1.1.2 Beizer’s Testing Levels Based on Test Process\nMaturity 8\n1.1.3 Automation of Test Activities 10\n1.2 Software Testing Limitations and Terminology 11\n1.3 Coverage Criteria for Testing 16\n1.3.1 Infeasibility and Subsumption 20\n1.3.2 Characteristics of a Good Covera

In [5]:
from langchain_chroma import Chroma
from langchain_google_genai import GoogleGenerativeAIEmbeddings

from dotenv import load_dotenv
load_dotenv() 

# Get an API key:
# Go to https://ai.google.dev/gemini-api/docs/api-key to generate a Google AI API key,
# then store it in a .env file as GOOGLE_API_KEY so that load_dotenv() can pick it up.

# Embedding models: https://python.langchain.com/v0.1/docs/integrations/text_embedding/

embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
vector = embeddings.embed_query("hello, world!")
vector[:5]
#vector

[0.05168594419956207,
 -0.030764883384108543,
 -0.03062233328819275,
 -0.02802734263241291,
 0.01813093200325966]
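
An embedding is only useful in comparison with other embeddings. A minimal sketch of one common comparison, cosine similarity, follows. The three short vectors are made-up toy values, not real model output, and the metric the vector store actually applies depends on its own configuration, so treat this as an illustration of the idea rather than a description of what Chroma computes.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors (hypothetical values, far shorter than a real embedding)
v_query = [1.0, 2.0, 3.0]
v_same_direction = [2.0, 4.0, 6.0]
v_orthogonal = [3.0, 0.0, -1.0]

print(cosine_similarity(v_query, v_same_direction))
print(cosine_similarity(v_query, v_orthogonal))
```

Vectors that point in the same direction score close to 1 regardless of their length, and orthogonal vectors score 0.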

In [6]:
# Reuse the embeddings object created above instead of constructing a second one
vectorstore = Chroma.from_documents(documents=docs, embedding=embeddings)

In [7]:
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 10})

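# Note: this query is unrelated to the software testing PDF; a plain
# similarity search still returns the k nearest chunks.
retrieved_docs = retriever.invoke("What is new in yolov9?")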


In [8]:
len(retrieved_docs)

10
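
The retriever returned exactly 10 chunks even though the book says nothing about YOLOv9, because a plain `similarity` search ranks chunks by distance and keeps the closest `k` without any relevance cut-off. The brute-force sketch below shows that behaviour under stated assumptions: the two-dimensional vectors, the chunk labels, and the helper `nearest_chunks` are all hypothetical, and squared Euclidean distance is used only as an example metric.

```python
import heapq


def squared_distance(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_chunks(query_vec: list[float], index: dict[str, list[float]], k: int) -> list[str]:
    """Return the labels of the k chunks closest to the query vector."""
    return heapq.nsmallest(k, index, key=lambda label: squared_distance(query_vec, index[label]))


# Toy index: label -> made-up 2-D "embedding"
toy_index = {
    "chunk about test coverage": [0.9, 0.1],
    "chunk about yo-yo graphs": [0.2, 0.8],
    "chunk about mutation testing": [0.7, 0.3],
}

# A query vector far away from every chunk still gets k results back
unrelated_query = [-0.5, -0.5]
results = nearest_chunks(unrelated_query, toy_index, k=2)
print(results)
```

If off-topic matches are a concern, LangChain retrievers also accept `search_type="similarity_score_threshold"` with a `score_threshold` entry in `search_kwargs`; whether and how to set that threshold is a choice left open here.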

In [9]:
print(retrieved_docs[5].page_content)

callee. Each class is given alevel in the yo-yo graph that shows the actual calls made
if an object has the actual type of that level. Bold arrows are actual calls and light
arrows are calls that cannot be made due to overriding.
Consider the inheritance hierarchy from Figure 7.2. Assume that inA’s imple-
mentation, d() calls g(), g() calls h(), h() calls i(), and i() calls j(). Further, assume
that inB’s implementation,h() calls i(), i() calls its parent’s (that is,A’s) version of
i(), andk() calls l(). Finally, assume that inC’s implementation,i() calls its parent’s
(this timeB’s) version ofi(), andj() calls k().
Figure 7.3 is a yo-yo graph of this situation and illustrates theactual sequence
of calls if a call is made tod() through an instance of actual typeA, B, andC.T h e
top level of the graph assumes that a call is made to methodd() through an object
of actual typeA. This sequence of calls is simple and straightforward. The second


In [10]:
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-1.5-pro", temperature=0.3, max_tokens=500)

In [15]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

system_prompt = (
    "You are an AI assistant for an online quiz system. "
    "Use the provided context to generate multiple-choice questions (MCQs) along with correct answers. "
    "Each question should have four options, with one correct answer clearly identified. "
    "Ensure the questions are clear, relevant, and engaging."
    "\n\n"
    "{context}"
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)


In [16]:
question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)
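
Conceptually, the "stuff" strategy concatenates the text of every retrieved chunk and substitutes it for `{context}` in the system prompt before the model is called. The plain-Python sketch below mimics that step without LangChain. The helpers `stuff_context` and `build_messages`, the shortened template, and the sample chunks are hypothetical, and the blank-line separator is an assumption about the chain's default.

```python
def stuff_context(chunks: list[str], separator: str = "\n\n") -> str:
    """Join the retrieved chunk texts into a single context string."""
    return separator.join(chunks)


def build_messages(system_template: str, chunks: list[str], user_input: str) -> list[tuple[str, str]]:
    """Fill the {context} placeholder and pair the result with the user's request."""
    system_text = system_template.replace("{context}", stuff_context(chunks))
    return [("system", system_text), ("human", user_input)]


# Shortened stand-in for the system prompt defined earlier
template = "Use the provided context to generate multiple-choice questions.\n\n{context}"

# Made-up chunk texts standing in for retrieved Document.page_content values
sample_chunks = [
    "Unit testing assesses the units produced by the implementation phase.",
    "System testing assesses software with respect to the architectural design.",
]

messages = build_messages(template, sample_chunks, "generate questions about testing levels")
for role, text in messages:
    print(f"[{role}]\n{text}\n")
```

Because every chunk is placed in one prompt, the number of retrieved chunks (`k`) multiplied by the chunk size has to fit within the model's context window.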

In [18]:
response = rag_chain.invoke({"input": "generate questions about waterfall method "})
print(response["answer"])

1. Which software development model is characterized by a sequential and linear progression through distinct phases, with each phase being completed before the next one begins?
    a) Agile
    b) Waterfall
    c) Spiral
    d) Iterative

    **Correct Answer: b) Waterfall**

2.  A key disadvantage of the Waterfall model is its:
    a) Flexibility to adapt to changing requirements
    b) Rigorous documentation at each phase
    c) Difficulty in accommodating changes after a phase is complete
    d) Early detection of defects through continuous testing

    **Correct Answer: c) Difficulty in accommodating changes after a phase is complete**

3. In the Waterfall model, which phase typically follows the design phase?
    a) Requirements gathering
    b) Testing
    c) Implementation
    d) Deployment

    **Correct Answer: c) Implementation**

4. The Waterfall model is most suitable for projects where:
    a) Requirements are constantly evolving
    b) Rapid prototyping is essential
    c