<a href="https://colab.research.google.com/github/sahiti3636/AIML_SESSION/blob/main/Module2_RAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Retrieval-Augmented Generation (RAG) — Simple Demo

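This notebook builds a simple RAG pipeline in four steps:
1. Load PDF documents and split them into chunks
2. Convert the chunks into embeddings and store them in a vector store (Chroma)
3. Retrieve the chunks most relevant to a query
4. Generate an answer from the retrieved context using an LLM (Gemini)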

## Install Dependencies



In [None]:
!pip install langchain langchain-google-genai langchain_community langchain_core chromadb sentence-transformers pypdf


## Import Libraries

In [None]:
from langchain_google_genai import GoogleGenerativeAI, GoogleGenerativeAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings

In [None]:
import os
os.environ["GOOGLE_API_KEY"] = "PASTE_YOUR_API_KEY"
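
Pasting the key into a cell makes it easy to leak when the notebook is shared. One possible alternative, sketched below, reads the key from the environment and prompts for it only when it is missing. The helper name `ensure_api_key` is hypothetical and not part of any library; `getpass` comes from the Python standard library.

```python
import os
from getpass import getpass


def ensure_api_key(var_name: str = "GOOGLE_API_KEY") -> str:
    """Return the key from the environment, prompting only if it is missing."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        # getpass hides the typed value, so it is not echoed into the cell output.
        key = getpass(f"Enter {var_name}: ").strip()
        os.environ[var_name] = key
    return key
```

Calling `ensure_api_key()` once before creating the LLM should be enough, since the value is stored back into `os.environ`.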

## Our Knowledge Base

Upload PDF

In [None]:
from google.colab import files

uploaded = files.upload()

Saving Module2.pdf to Module2 (3).pdf
Saving Analog_Lab_Report_1 (2).pdf to Analog_Lab_Report_1 (2) (3).pdf


Load PDF as Documents

In [None]:
from langchain_community.document_loaders import PyPDFLoader

all_documents = []

for pdf_path in uploaded.keys():
    loader = PyPDFLoader(pdf_path)
    docs = loader.load()
    all_documents.extend(docs)
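
`PyPDFLoader` returns one document per PDF page, so a quick count per source file is a cheap sanity check that every upload was read. The sketch below assumes each document exposes a `metadata` dict with a `source` key, as the retrieved documents in this notebook do. `pages_per_source` and `StubDoc` are hypothetical names used only for illustration.

```python
from collections import Counter


def pages_per_source(docs):
    """Count how many page-level documents came from each source file."""
    return Counter(doc.metadata.get("source", "unknown") for doc in docs)


# Stand-in objects with the same .metadata attribute as the loaded documents
class StubDoc:
    def __init__(self, source, page):
        self.metadata = {"source": source, "page": page}
        self.page_content = ""


stub_docs = [StubDoc("a.pdf", 0), StubDoc("a.pdf", 1), StubDoc("b.pdf", 0)]
print(pages_per_source(stub_docs))
```

In the notebook itself, the call would be `pages_per_source(all_documents)`.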

Split into Chunks

In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=100
)

documents = text_splitter.split_documents(all_documents)
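
To see what `chunk_size` and `chunk_overlap` mean, here is a deliberately simplified sketch that cuts fixed-size character windows. It is not how `RecursiveCharacterTextSplitter` works internally: the real splitter prefers to break on separators such as paragraphs, lines, and spaces, and treats `chunk_size` as an upper bound. The function `sliding_chunks` is hypothetical and only illustrates how consecutive chunks share text.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut fixed-size windows; each starts chunk_size - chunk_overlap after the last."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_chunks(sample, chunk_size=10, chunk_overlap=4)
print(chunks)
# ['abcdefghij', 'ghijklmnop', 'mnopqrstuv', 'stuvwxyz']
```

The last 4 characters of each chunk reappear at the start of the next one. That shared text is what helps a sentence cut at a chunk boundary remain retrievable.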

## Create Embeddings + Store in Chroma

In [None]:
embeddings = HuggingFaceEmbeddings(
    model_name="all-MiniLM-L6-v2"
)

In [None]:
vectorstore = Chroma.from_documents(
    documents=documents,
    embedding=embeddings
)

## Retriever (Semantic Search)

In [None]:
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})

In [None]:
query = "What is AC coupling?"

retrieved_docs = retriever.invoke(query)
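
Conceptually, the retriever embeds the query and returns the `k` stored chunks whose vectors are closest to it. The sketch below shows that idea with made-up 3-dimensional vectors and cosine similarity. Both are assumptions for illustration: real embeddings from `all-MiniLM-L6-v2` have 384 dimensions, and the vector store may be configured with a different distance metric. `cosine_similarity` and `top_k` are hypothetical helpers, not LangChain or Chroma functions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=2):
    """Return the indices of the k chunks most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), idx)
              for idx, vec in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Made-up vectors standing in for real embeddings
toy_chunks = [
    [0.9, 0.1, 0.0],  # chunk 0: points almost the same way as the query
    [0.0, 1.0, 0.0],  # chunk 1: orthogonal to the query
    [0.7, 0.3, 0.1],  # chunk 2: fairly close to the query
]
toy_query = [1.0, 0.0, 0.0]
print(top_k(toy_query, toy_chunks, k=2))
# [0, 2]
```

With `k=2`, the orthogonal chunk is dropped and only the two nearest chunks reach the prompt.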

In [None]:
retrieved_docs

[Document(metadata={'page': 4, 'source': 'Analog_Lab_Report_1 (2) (3).pdf', 'keywords': '', 'title': '', 'page_label': '5', 'total_pages': 6, 'creationdate': '2026-01-17T16:03:00+00:00', 'author': '', 'creator': 'LaTeX with hyperref', 'subject': '', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.27 (TeX Live 2025) kpathsea version 6.4.1', 'moddate': '2026-01-17T16:03:00+00:00', 'producer': 'pdfTeX-1.40.27', 'trapped': '/False'}, page_content='4.2 AC Coupling\nAC coupling was then applied and the resulting waveform is shown alongside. The fre-\nquency and time period remain unchanged; however, the DC offset is completely removed,\ncentering the sine wave about zero volts and thereby altering the values ofV max andV min,\n.\n•V max: 0.84 V\n•V min: -0.74 V\nFigure 6: AC Coupled Sine Wave\n5 Oscilloscope Math Operations\n5.1 Add\nA Traingle wave has been given from the signal generator to channel 1 of the oscilloscope'),
 Document(metadata={'moddate': '2026-01-17T16:03:0

## Gemini LLM

In [None]:
llm = GoogleGenerativeAI(
    model="gemini-2.5-flash",
    temperature=0
)

## Final RAG Answer

In [None]:
context = "\n".join(doc.page_content for doc in retrieved_docs)

prompt = f"""
You are an assistant that answers questions using the given context.

Rules:
- Use ONLY the information in the context.
- Do NOT use outside knowledge.
- If the context is insufficient, ask the user to rephrase the question
  or provide more information.

Answer Guidelines:
- Write a clear and complete explanation.
- Use 3–5 sentences.
- Prefer simple language suitable for a beginner.

Context:
{context}

Question:
{query}

Answer:
"""

In [None]:
response = llm.invoke(prompt)
print(response)

AC coupling is a process applied to a waveform, such as a sine wave or a triangular wave. When AC coupling is used, the DC offset or DC component of the signal is completely removed. This removal causes the waveform to be centered around zero volts. Importantly, the frequency and time period of the signal do not change, although the maximum (Vmax) and minimum (Vmin) voltage values are altered.
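
The retrieval, prompt, and generation cells above can be wrapped into one function so that new questions do not require re-running each cell. This is a minimal sketch: `rag_answer` is a hypothetical helper, `PROMPT_TEMPLATE` is an abbreviated stand-in for the full prompt used earlier, and the stub classes exist only so the function can be dry-run without an API key. The only assumption is that the retriever and LLM each expose `.invoke`, as they do in this notebook.

```python
PROMPT_TEMPLATE = """Answer the question using ONLY the context below.
If the context is insufficient, ask the user to rephrase the question.

Context:
{context}

Question:
{query}

Answer:
"""


def rag_answer(query, retriever, llm, template=PROMPT_TEMPLATE):
    """Retrieve chunks for the query, build the prompt, and return the reply."""
    docs = retriever.invoke(query)
    context = "\n".join(doc.page_content for doc in docs)
    return llm.invoke(template.format(context=context, query=query))


# Stubs for a dry run (hypothetical, not LangChain classes)
class StubDoc:
    def __init__(self, page_content):
        self.page_content = page_content


class StubRetriever:
    def invoke(self, query):
        return [StubDoc("AC coupling removes the DC offset.")]


class EchoLLM:
    def invoke(self, prompt):
        # Returns the prompt unchanged so the assembled text can be inspected.
        return prompt


print(rag_answer("What is AC coupling?", StubRetriever(), EchoLLM()))
```

With the real objects, the call would be `rag_answer("What is AC coupling?", retriever, llm)`.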
