# **Installing Dependencies**

In [None]:
!pip install python-dotenv langchain langchain_community pypdf faiss-cpu langchain-google-genai



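This notebook builds a retrieval-augmented generation (RAG) pipeline over a PDF. It covers:

1. loading PDFs

2. splitting text

3. creating embeddings

4. storing them in FAISS

5. building a RAG pipeline

6. using Google Gemini models for both embeddings and answers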

# **Importing libraries**

In [None]:
import os

# Gemini chat model and embeddings
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings

# Prompt, output parsing and pipeline building blocks
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough, RunnableParallel

# PDF loading, chunking and vector storage
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS

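1. ChatGoogleGenerativeAI → talks to Gemini chat models

2. PyPDFLoader → loads PDF pages as Document objects

3. RecursiveCharacterTextSplitter → breaks large text into smaller chunks

4. GoogleGenerativeAIEmbeddings → converts text into embedding vectors

5. FAISS → vector database used for similarity search

6. PromptTemplate → fills the question and the retrieved context into a prompt

7. StrOutputParser → turns the model's reply into a plain string

8. RunnableParallel / RunnablePassthrough → route the input to the retriever and to the prompt when building the RAG pipeline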

# **Loading API key**

In [None]:
from google.colab import userdata
import os
os.environ['GOOGLE_API_KEY'] = userdata.get('RAGAGENTKEY')
Model = 'gemini-2.5-flash'

1. fetches the Gemini API key stored in Colab Secrets under the name `RAGAGENTKEY`

2. saves it as the `GOOGLE_API_KEY` environment variable

3. stores the model name `gemini-2.5-flash` in `Model`
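The cell above only works inside Colab. Since `python-dotenv` is among the installed packages, the key loading could be made portable along the lines sketched below. This is an assumption, not part of the original notebook: `load_google_api_key` is a hypothetical helper that tries Colab Secrets first and otherwise falls back to a `GOOGLE_API_KEY` environment variable, after loading a local `.env` file if `python-dotenv` is available.

```python
import os


def load_google_api_key(secret_name: str = "RAGAGENTKEY") -> str:
    """Find a Gemini API key and export it as GOOGLE_API_KEY (hypothetical helper).

    Lookup order (an assumption, not the notebook's method):
    1. Colab Secrets, only when running inside Google Colab
    2. GOOGLE_API_KEY from the environment, after loading a local .env file
       with python-dotenv if it is installed (existing variables are not overridden)
    """
    key = None
    try:
        from google.colab import userdata  # importable only inside Colab
    except ImportError:
        userdata = None
    if userdata is not None:
        try:
            key = userdata.get(secret_name)
        except Exception:  # e.g. secret missing or notebook access not granted
            key = None
    if not key:
        try:
            from dotenv import load_dotenv
            load_dotenv()  # copies .env entries into os.environ without overriding existing ones
        except ImportError:
            pass
        key = os.environ.get("GOOGLE_API_KEY")
    if not key:
        raise RuntimeError(f"No API key found in Colab secret {secret_name!r}, .env, or GOOGLE_API_KEY")
    os.environ["GOOGLE_API_KEY"] = key
    return key
```

Calling `load_google_api_key()` would then replace the `userdata.get(...)` line, with `Model = 'gemini-2.5-flash'` unchanged.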

# **Creating Gemini Chat Model object**

In [None]:
model = ChatGoogleGenerativeAI(google_api_key=os.environ['GOOGLE_API_KEY'], model=Model)

# **Testing the model**

In [None]:
model.invoke('what is coding?')

AIMessage(content='Coding, also known as **computer programming**, is essentially the process of **giving a computer a set of instructions to perform a specific task.**\n\nHere\'s a breakdown to make it clearer:\n\n1.  **Computers are Dumb (but Fast!):** Computers are incredibly fast at performing calculations and following instructions, but they have no common sense. They don\'t understand human languages like English, Spanish, or Mandarin.\n2.  **The Need for Instructions:** If you want a computer to do anything—like show you a webpage, play a song, launch an app, or even calculate 2+2—you have to tell it *exactly* what to do, step-by-step, in a language it understands.\n3.  **Programming Languages:** Coders write these instructions using **programming languages**. Just like humans have different spoken languages, computers have different programming languages (e.g., Python, JavaScript, C++, Java, Ruby, Swift, C#, Go). Each language has its own **syntax** (grammar and spelling rules)

# **Adding the string output parser**

In [None]:
parser = StrOutputParser()
chain = model|parser

1. **StrOutputParser()** extracts the text content from the model's `AIMessage` reply and returns it as a plain string

2. **model | parser** builds a chain: “first call the model, then pass its output to the parser”
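The `|` operator is what turns separate components into one chain. The sketch below imitates the idea in plain Python with hypothetical `Step` and `FakeAIMessage` classes and no API call; LangChain's real Runnables also handle batching, streaming and async, so treat this as an illustration of the composition only.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Step:
    """Hypothetical stand-in for a LangChain Runnable: wraps one function."""
    fn: Callable[[Any], Any]

    def invoke(self, value: Any) -> Any:
        return self.fn(value)

    def __or__(self, other: "Step") -> "Step":
        # a | b builds a new step that runs a first, then feeds its output to b
        return Step(lambda value: other.invoke(self.invoke(value)))


@dataclass
class FakeAIMessage:
    """Hypothetical stand-in for the AIMessage returned by model.invoke()."""
    content: str


fake_model = Step(lambda text: FakeAIMessage(content=f"Answer to: {text}"))
str_parser = Step(lambda message: message.content)  # keeps only the reply text

fake_chain = fake_model | str_parser
print(fake_chain.invoke("what is coding?"))  # Answer to: what is coding?
```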

# **Testing the chain, which returns only the text content**

In [None]:
chain.invoke('what is coding?')

'Coding, at its core, is **the process of giving instructions to a computer in a language it can understand.**\n\nThink of it like this:\n\n*   **You have a task you want the computer to perform.** (e.g., show a webpage, calculate numbers, play a game, control a robot).\n*   **Computers don\'t understand human languages** (like English, Spanish, etc.) directly. They need very specific, step-by-step instructions.\n*   **Coding is writing those step-by-step instructions** using a specialized "programming language" (like Python, JavaScript, Java, C++, etc.).\n*   **These instructions are called "code"** or "source code," and when put together, they form a "program" or "software."\n\n**Here\'s a breakdown of what that means:**\n\n1.  **Writing Instructions:** You write a series of commands, statements, and rules that tell the computer exactly what to do, in what order, and under what conditions.\n    *   *Example:* "If the user clicks this button, then change the color of this text to blue

# **Loading PDF**

In [None]:
file_loader = PyPDFLoader('/content/drive/MyDrive/EDC/_The Art of Electronics 3rd ed [2015].pdf')
page = file_loader.load_and_split()
len(page)

1819

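1. loads the PDF, one Document per page

2. `load_and_split()` then splits those pages with a default `RecursiveCharacterTextSplitter`, so the result is a list of chunks rather than pages

3. `len(page)` shows how many Documents were produced: 1819, more than the 1225 pages listed in the PDF's `total_pages` metadata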

# **Splitting content into small chunks**

In [None]:
splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=50)
pages = splitter.split_documents(page)
pages[3]

Document(metadata={'producer': 'Acrobat Distiller 10.0.0 (Windows)', 'creator': 'Adobe Illustrator CS6 (Macintosh)', 'creationdate': '2016-02-11T14:42:21-05:00', 'author': 'Horowitz & Hill', 'moddate': '2016-03-13T18:04:37-05:00', 'title': 'The Art of Electronics (3rd edition)', 'source': '/content/drive/MyDrive/EDC/_The Art of Electronics 3rd ed [2015].pdf', 'total_pages': 1225, 'page': 1, 'page_label': 'i'}, page_content='both analog and digital, the ﬁrst two editions were translated into eight languages, and sold more than a million copies')

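1. Smaller chunks give more focused embeddings, so retrieval returns passages closer to the question (embedding models also have an input-length limit)

2. `split_documents()` re-splits the loaded Documents into chunks of at most 200 characters

3. Adjacent chunks share up to 50 characters, so text cut at a chunk boundary keeps some of its surrounding context

4. `pages[3]` shows one sample chunk (the fourth); its metadata still records the source page (`'page': 1`, `'page_label': 'i'`)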
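The effect of `chunk_size` and `chunk_overlap` is easiest to see with a simplified splitter. The hypothetical `sliding_window_chunks` below cuts fixed 200-character windows that start 150 characters apart, so consecutive windows share 50 characters. This is a deliberate simplification: `RecursiveCharacterTextSplitter` first tries to break on paragraph, line and word boundaries, so its chunks are often shorter and the overlap is at most, not exactly, 50 characters.

```python
def sliding_window_chunks(text: str, chunk_size: int = 200, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # each new window starts this many characters later
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(500))  # 500 characters of dummy text
chunks = sliding_window_chunks(sample)
print([len(c) for c in chunks])  # [200, 200, 200]
```

Each window repeats the last 50 characters of the previous one, which is the context-preserving effect described above.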

# **Creating embeddings + FAISS vectorstore**

In [None]:
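embedding_model = 'text-embedding-004'
embeddings = GoogleGenerativeAIEmbeddings(model=embedding_model, google_api_key=os.environ['GOOGLE_API_KEY'])

# embed every chunk and index the vectors in FAISS
vector_storage = FAISS.from_documents(pages, embeddings)

# wrap the index so it can be queried with plain text
retriever = vector_storage.as_retriever()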

1. converts each chunk into an embedding vector with Google's `text-embedding-004` model

2. stores the vectors in a FAISS vector database

3. wraps the vector store as a retriever, which fetches the chunks most similar to a query (4 by default)
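Conceptually, the retriever embeds the query and returns the stored chunks whose vectors lie closest to it. LangChain's `FAISS.from_documents` uses an exact Euclidean (L2) index by default, and `as_retriever()` returns 4 results unless told otherwise. The brute-force sketch below mirrors that behaviour on made-up 3-dimensional vectors; the function name and the numbers are hypothetical, and FAISS performs the same kind of search with optimized native code over the real embeddings.

```python
import math


def top_k_by_l2(query_vec: list[float], doc_vecs: list[list[float]], k: int = 4) -> list[int]:
    """Return indices of the k stored vectors closest to query_vec (brute-force L2 search)."""
    distances = [math.dist(query_vec, v) for v in doc_vecs]  # Euclidean distance to each chunk
    return sorted(range(len(doc_vecs)), key=lambda i: distances[i])[:k]


# Hypothetical 3-dimensional "embeddings" (real text-embedding-004 vectors are much longer)
chunk_vectors = [
    [0.9, 0.1, 0.0],  # chunk 0
    [0.0, 1.0, 0.0],  # chunk 1
    [0.8, 0.2, 0.1],  # chunk 2
    [0.0, 0.0, 1.0],  # chunk 3
    [0.7, 0.3, 0.0],  # chunk 4
    [0.1, 0.9, 0.2],  # chunk 5
]
query_vector = [1.0, 0.0, 0.0]
print(top_k_by_l2(query_vector, chunk_vectors))  # [0, 2, 4, 5]
```

Chunks 1 and 3 point in other directions, so they are the ones left out of the top 4.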

# **Creating a prompt template**

In [None]:
question_template = """
You are a smart bot that answers questions using only the context given to you.
You don't make things up.
context:{context}
question:{question}

"""

1. `{context}` is filled with the retrieved chunks and `{question}` with the user's question

2. the instructions tell the model to answer only from that context

3. and not to make things up, which helps reduce hallucination

# **Testing the prompt formatting**

In [None]:
prompt = PromptTemplate.from_template(template=question_template)
print(prompt.format(context=' Here is the context to use',
                    question=' Answer this question based on the context'))


You are a smart bot that answers questions using only the context given to you.
You don't make things up.
context: Here is the context to use
question: Answer this question based on the context



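`PromptTemplate.from_template` treats `{context}` and `{question}` as f-string style placeholders by default, so for a simple template like this one the formatted prompt should match Python's own `str.format`. The sketch below uses a shortened, hypothetical `demo_template` to show that equivalence and how the placeholder names can be listed with `string.Formatter`; it mirrors the idea and is not LangChain's code.

```python
from string import Formatter

demo_template = "context:{context}\nquestion:{question}\n"

# Placeholder names found in the template; literal-only segments yield None
variables = sorted({name for _, name, _, _ in Formatter().parse(demo_template) if name})
print(variables)  # ['context', 'question']

filled = demo_template.format(context=" Here is the context to use",
                              question=" Answer this question based on the context")
print(filled)
```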

# **Building the RAG chain**

In [None]:
result = RunnableParallel(context=retriever, question=RunnablePassthrough())

1. takes the user's question as input

2. passes it unchanged to the "question" field (RunnablePassthrough)

3. also sends it to the retriever, whose matching chunks fill the "context" field
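`RunnableParallel` sends the same input to every branch and collects the results in a dictionary keyed by branch name, which is the shape the prompt expects. The plain-Python stand-in below shows that data flow; `run_parallel`, `fake_retriever` and the canned chunks are hypothetical, and the real `RunnableParallel` can also run its branches concurrently.

```python
from typing import Any, Callable


def run_parallel(branches: dict[str, Callable[[Any], Any]], value: Any) -> dict[str, Any]:
    """Feed the same input to every branch and collect the results under each key."""
    return {key: fn(value) for key, fn in branches.items()}


def fake_retriever(question: str) -> list[str]:
    # Hypothetical retriever: returns canned chunks instead of searching FAISS
    return ["first retrieved chunk", "second retrieved chunk"]


def passthrough(value: Any) -> Any:
    return value  # like RunnablePassthrough: returns its input unchanged


inputs = run_parallel({"context": fake_retriever, "question": passthrough},
                      "What is dynamic resistance?")
print(inputs["question"])  # What is dynamic resistance?
```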

# **Pipeline Structure**

In [None]:
chain = result | prompt | model | parser

1. Retriever fetches the relevant chunks as context

2. Prompt combines the question and the context

3. Model generates an answer

4. Parser returns the answer as a plain string
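One detail of this pipeline: the retriever returns a list of `Document` objects, and the prompt formats that list with Python's default string conversion, so each chunk's metadata (producer, file path, page numbers and so on) ends up in the prompt as well. A common refinement, not used in this notebook, is to join only the chunk text before it reaches the prompt. The sketch uses a hypothetical `FakeDocument` stand-in so it runs without LangChain; `format_docs` only relies on the `page_content` attribute, which real Documents also have.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Hypothetical stand-in with the two attributes a LangChain Document exposes."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs) -> str:
    """Join retrieved chunks into one plain-text context block, dropping metadata."""
    return "\n\n".join(doc.page_content for doc in docs)


docs = [
    FakeDocument("first retrieved chunk", {"page": 44}),
    FakeDocument("second retrieved chunk", {"page": 45}),
]
print(format_docs(docs))
```

In the notebook it could be wired in as `RunnableParallel(context=retriever | format_docs, question=RunnablePassthrough())`; piping a plain function after a Runnable makes LangChain wrap it as a `RunnableLambda`.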

# **Asking a simple PDF-related question**

In [None]:
chain.invoke('What is  Static and Dynamic resistances?')

'Based on the context provided:\n\n**Dynamic resistance** is included in the specifications of a zener and is a measure of its "regulation" against changes in the driving current provided to it. It is given at a certain current; for example, a zener might have a dynamic resistance of 10 Ω at 10 mA, at its specified zener voltage of 5 V. It is calculated using the changes in the voltages and currents, rather than the steady (dc) values.\n\nThe provided context does not define **Static resistance**.'

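1. retrieves relevant chunks from the PDF

2. passes context + question to Gemini

3. returns an answer grounded in those chunks: here the retrieved text covered dynamic resistance only, so the model reports that static resistance is not defined in the context instead of inventing a definition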
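Each retrieved `Document` keeps the `page` and `page_label` metadata written by `PyPDFLoader`, so it is straightforward to list which printed pages an answer drew on. `summarize_sources` below is a hypothetical helper, demonstrated on a `FakeDocument` stand-in built from the chunk text shown in the next cell; with the real pipeline it could be applied to the output of `retriever.invoke(question)`.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Hypothetical stand-in with the two attributes a LangChain Document exposes."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def summarize_sources(docs, width: int = 40) -> list[str]:
    """One line per retrieved chunk: printed page label plus the start of its text."""
    lines = []
    for doc in docs:
        label = doc.metadata.get("page_label", "?")  # printed page number, e.g. 'i' or '12'
        snippet = doc.page_content.replace("\n", " ")[:width]
        lines.append(f"p. {label}: {snippet}")
    return lines


retrieved = [
    FakeDocument("given at a certain current. For example, a zener might have\na dynamic resistance",
                 {"page": 44, "page_label": "12"}),
]
for line in summarize_sources(retrieved):
    print(line)
```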

# **Inspecting the raw context chunks retrieved by RAG**

In [None]:
retriever.invoke('What is  Static and Dynamic resistances?')

[Document(id='220fad19-0bdb-4da6-b741-b10e36df47d4', metadata={'producer': 'Acrobat Distiller 10.0.0 (Windows)', 'creator': 'Adobe Illustrator CS6 (Macintosh)', 'creationdate': '2016-02-11T14:42:21-05:00', 'author': 'Horowitz & Hill', 'moddate': '2016-03-13T18:04:37-05:00', 'title': 'The Art of Electronics (3rd edition)', 'source': '/content/drive/MyDrive/EDC/_The Art of Electronics 3rd ed [2015].pdf', 'total_pages': 1225, 'page': 44, 'page_label': '12'}, page_content='given at a certain current. For example, a zener might have\na dynamic resistance of 10 Ω at 10 mA, at its speciﬁed\nzener voltage of 5 V . Using the deﬁnition of dynamic re-'),
 Document(id='54f3faab-0aae-49dd-9fe5-ed757564386a', metadata={'producer': 'Acrobat Distiller 10.0.0 (Windows)', 'creator': 'Adobe Illustrator CS6 (Macintosh)', 'creationdate': '2016-02-11T14:42:21-05:00', 'author': 'Horowitz & Hill', 'moddate': '2016-03-13T18:04:37-05:00', 'title': 'The Art of Electronics (3rd edition)', 'source': '/content/driv