**Retrieval Augmented Generation** using FAISS and Gemini 2.0 flash

This notebook builds a basic Retrieval-Augmented Generation (RAG) pipeline using:

* **Google Gemini** (gemini-2.0-flash) for answering queries
* **FAISS** as the vector database
* **HuggingFace Embeddings** for vectorizing text
* **LangChain** for text splitting and document management
* **PyPDF2** for PDF reading



The notebook performs the following steps:

-> Install the libraries needed for embeddings, vector search (FAISS), PDF reading, and Google Gemini (Generative AI).

-> Load a PDF and extract its text.

-> Split the text into overlapping “chunks”.

-> Embed those chunks with a HuggingFace sentence-embedding model.

-> Build an in-memory FAISS vector index (vector DB) from those chunk embeddings.

-> Accept a user query, retrieve the most relevant chunks, build a prompt that uses those chunks as context, and ask Google Gemini to generate an answer.

## ✅ 1. Install Required Packages

In [None]:
!pip install -q langchain_community google-generativeai PyPDF2 langchain_huggingface faiss-cpu

langchain_community: community helpers / integrations for LangChain.

google-generativeai: Google Generative AI SDK (Gemini client).

PyPDF2: to read/extract text from PDFs.

langchain_huggingface: wrapper to use Hugging Face embeddings with LangChain.

faiss-cpu: FAISS index for nearest-neighbor vector search (CPU build).

-q: suppresses verbose pip output.
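
After installation, it can help to confirm that the packages are importable before running the rest of the notebook. The following is a minimal sketch using only the standard library. The helper name `missing_modules` is hypothetical, and the list holds import names rather than pip names (for example, `faiss` for the `faiss-cpu` package).

```python
import importlib.util

def missing_modules(module_names):
    """Return the names from module_names that cannot be found for import."""
    missing = []
    for name in module_names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec raises if a parent package (e.g. "google") is absent
            found = False
        if not found:
            missing.append(name)
    return missing

# Import names (not pip names) for the packages installed above.
required = ["langchain_community", "google.generativeai", "PyPDF2",
            "langchain_huggingface", "faiss"]
print("Missing:", missing_modules(required))
```

An empty list means every package resolved. Any name that is printed needs to be installed again.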

## 🔐 2. Import Libraries & Configure Gemini API

In [None]:
import os
import google.generativeai as genai
from langchain_community.vectorstores import FAISS
from langchain.text_splitter import CharacterTextSplitter
# from dotenv import load_dotenv
from langchain.schema import Document
from PyPDF2 import PdfReader
from langchain_huggingface import HuggingFaceEmbeddings

import warnings
warnings.filterwarnings("ignore")

genai is Google’s Gemini SDK.

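FAISS is LangChain’s wrapper around the FAISS vector store.

CharacterTextSplitter splits long text on a separator (a blank line by default) and merges the pieces into chunks of up to roughly chunk_size characters, with overlap between neighboring chunks.

load_dotenv would load variables from a .env file, but its import is commented out here because the API key is read from Colab secrets instead.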

Document is LangChain’s Document schema type.

PdfReader reads PDF pages and extract_text() obtains text.

HuggingFaceEmbeddings provides an embeddings wrapper around HF sentence-transformer models.

warnings.filterwarnings("ignore") suppresses warnings for cleaner notebook output.

In [None]:
# Api Key
# import google.generativeai as genai
from google.colab import userdata

google_api = userdata.get("Google_API")
genai.configure(api_key = google_api)

gemini_model = genai.GenerativeModel('gemini-2.0-flash')
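
The cell above only works inside Google Colab, because `google.colab` is not importable elsewhere. As a sketch of one way to make the key lookup portable, the helper below tries the Colab secret first and falls back to an environment variable. The function name `get_api_key` and the variable name `GOOGLE_API_KEY` are assumptions made for illustration. Only the secret name `Google_API` comes from this notebook.

```python
import os

def get_api_key(secret_name="Google_API", env_var="GOOGLE_API_KEY"):
    """Return the API key from Colab secrets if available, else from the environment."""
    try:
        from google.colab import userdata  # only importable inside Colab
    except ImportError:
        # Not running in Colab: read the (hypothetical) environment variable instead.
        key = os.environ.get(env_var)
    else:
        key = userdata.get(secret_name)
    if not key:
        raise RuntimeError(
            f"No API key found in Colab secret {secret_name!r} or ${env_var}"
        )
    return key

# Usage in this notebook would then be:
#   genai.configure(api_key=get_api_key())
```

Failing early with a clear error is usually easier to debug than a rejected request later in the pipeline.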

## 🧠 3. Load Embedding Model

Why wrap model loading in a function?

Caching: If the model is large or takes time to initialize, loading it repeatedly in different parts of your code causes unnecessary delays. Loading it once and keeping the result in a variable (embedding_model below) lets every later step reuse the already loaded model.

Modularity: Wrapping the model loading in a function makes the code cleaner. If you need to change the model or adjust loading parameters in the future, you can do that in one place without modifying every instance where the model is referenced.

Lazy or conditional loading: If your application should load the model only under certain parameters or environmental conditions, a function is a natural place for that logic.

Testing and debugging: A single loading function gives you one place to stub out or swap the model.

Scalability: The same function can later return a different model without changes to the calling code.

In [None]:
# ✅ Creates and returns an embeddings object that wraps the sentence-transformer "all-MiniLM-L6-v2"
def load_embedding_model():
    return HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

embedding_model = load_embedding_model()


## 📄 4. Read PDF Content

In [None]:
# Read the pdf file

def read_pdf(file_path):
    pdf_reader = PdfReader(file_path)
    text = ""
    for page in pdf_reader.pages:
        text += page.extract_text() or ""  # some pages (e.g. scanned images) yield no text
    return text

Opens a PDF.

Extracts text page-by-page using PdfReader.

Returns full text as a string.
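
Concatenating every page into one string loses the page boundaries. A possible variation, sketched below, keeps the page number next to each page's text and skips pages with no extractable text. The helper `extract_page_texts` and the `FakePage` stand-in are hypothetical. The helper only assumes that each page object has an `extract_text()` method, as the pages of `PdfReader` do.

```python
def extract_page_texts(pages):
    """Return (page_number, text) pairs for pages that contain extractable text."""
    results = []
    for number, page in enumerate(pages, start=1):
        text = (page.extract_text() or "").strip()
        if text:
            results.append((number, text))
    return results

class FakePage:
    """Stand-in for a PDF page object, used only to demonstrate the helper."""
    def __init__(self, text):
        self._text = text

    def extract_text(self):
        return self._text

pages = [FakePage("Intro"), FakePage(None), FakePage("  "), FakePage("Results")]
print(extract_page_texts(pages))  # [(1, 'Intro'), (4, 'Results')]
```

With a real file, the call would be `extract_page_texts(PdfReader(file_path).pages)`. The page numbers could then be stored as chunk metadata.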



In [None]:
# Reading the PDF Uploaded...

text = read_pdf("/content/Investoreye - Sharekhan.pdf")

## 🧩 5. Process Text into Chunks & Vectors

Why is chunking needed?

Embedding limitations: Embedding models (OpenAI, HuggingFace, and others) have a maximum input length (say 512–4096 tokens). A large PDF or document won’t fit into a single embedding, so it has to be split into chunks that do.

LLM context limits: LLMs also have a context window limit. Feeding only small, relevant chunks prevents wasted space and keeps the prompt focused.
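
To make the chunk size and overlap arithmetic concrete, here is a simplified sliding-window chunker in plain Python. It is an illustration only: LangChain's `CharacterTextSplitter` splits on a separator first, so its chunk boundaries will differ. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size windows where neighbors share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks

print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

Each window starts `chunk_size - chunk_overlap` characters after the previous one. With the settings used in this notebook (1000 and 200), consecutive chunks start 800 characters apart.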

In [None]:
if text.strip():  # True only if the extracted text contains something besides whitespace
  document = Document(page_content=text)  # Wrap the entire text in a single Document
  splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
  # split_documents returns a list of Document objects
  # (each chunk has page_content and optional metadata).
  chunks = splitter.split_documents([document])
  texts = [chunk.page_content for chunk in chunks]  # Plain strings, one per chunk
  # from_texts embeds every chunk with embedding_model (text -> a vector of numbers
  # that captures meaning) and stores the vectors in an in-memory FAISS index.
  vector_db = FAISS.from_texts(texts, embedding_model)
  # FAISS itself only does similarity search (find the closest vectors).
  # as_retriever() wraps the index in a Retriever object that LangChain understands.
  retriever = vector_db.as_retriever()
else:
  print("⚠️ No text could be extracted from the PDF. Please upload a readable document.")

If the PDF has text:

* Wrap it in a Document.

* Split the document into chunks of 1000 characters with 200-character overlap.

* Create a list of text chunks (needed for vectorization).

* Convert each chunk to a vector using the embedding model. Store all vectors in FAISS.

* Wrap the index in a retriever, which performs similarity search on the vector DB.
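
Conceptually, the similarity search behind the retriever compares the query vector with every stored chunk vector and returns the closest ones. The sketch below does this by brute force with squared Euclidean (L2) distance on toy 2-dimensional vectors. Real embeddings from all-MiniLM-L6-v2 have 384 dimensions. The function name `top_k_nearest` is hypothetical. The L2 metric and the default of 4 results are assumptions about LangChain's FAISS wrapper, worth checking against the installed version.

```python
def top_k_nearest(query_vec, vectors, k=4):
    """Return indices of the k vectors closest to query_vec by squared L2 distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Rank every stored vector by its distance to the query (exact, brute force).
    ranked = sorted(range(len(vectors)), key=lambda i: sq_dist(query_vec, vectors[i]))
    return ranked[:k]

chunk_vectors = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]  # toy "embeddings" of 3 chunks
print(top_k_nearest([0.9, 0.1], chunk_vectors, k=2))  # [1, 0]
```

The returned indices point back into the list of chunks, which is how the retriever recovers the text to put in the prompt.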

## ❓ 6. Ask User for Input

In [None]:
user_query = input("Enter your question:")

Enter your question:Summarize this PDF
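
Once a question is entered, the next section joins the retrieved chunks with it into a single prompt. As a sketch, that assembly step can be isolated in a small function, which makes it easy to inspect the exact text sent to the model. The name `build_prompt` and the `max_context_chars` cap are assumptions for illustration. The notebook itself applies no cap.

```python
def build_prompt(chunks, query, max_context_chars=8000):
    """Join retrieved chunk texts into a context block and wrap it in the prompt template."""
    # Hypothetical safeguard: truncate very long context to a fixed character budget.
    context = "\n\n".join(chunks)[:max_context_chars]
    return (
        "You are an expert assistant. Use the context below to answer the query. "
        "If unsure, say 'I don't know.'\n\n"
        f"Context:\n{context}\n\n"
        f"Query: {query}\n"
        "Answer:"
    )

print(build_prompt(["Chunk one.", "Chunk two."], "Summarize this PDF"))
```

In the notebook, `chunks` would be the `page_content` of each retrieved document, and the returned string would be passed to `gemini_model.generate_content`.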


## 🔍 7. Retrieve & Generate Answer

In [None]:
if user_query.strip():  # Proceed only if the question is non-empty

  # Embed the query and run a similarity search in FAISS, returning the top relevant chunks.
  # invoke() is the current retriever entry point; get_relevant_documents() is deprecated.
  relevant_docs = retriever.invoke(user_query)

  # Join the retrieved chunk texts into a single context string
  context = "\n\n".join(doc.page_content for doc in relevant_docs)

  prompt = f"""You are an expert assistant. Use the context below to answer the query. If unsure, say 'I don't know.'

  Context: {context}
  Query: {user_query}
  Answer:"""

  response = gemini_model.generate_content(prompt)
  print(response.text)
else:
  print("⚠️ No question was entered. Please type a question and run the cell again.")

This document is an investment report by Sharekhan on Bajaj Finance Ltd, Cholamandalam Investment and Finance Company Ltd and Federal Bank Ltd, dated April 30, 2025.

**Bajaj Finance Ltd:**
*   The report maintains a "Buy" recommendation with an unchanged price target of Rs. 10,500.
*   Net earnings were in line with estimates, AUM growth was strong, but management revised FY26 guidance slightly lower for return ratios and AUM growth, citing a focus on improving credit costs.
*   Key positives include strong AUM growth in specific loan segments and a falling cost-to-income ratio.
*   Key negatives include revised, slightly lower guidance for FY26 and a higher credit cost guidance.

**Cholamandalam Investment and Finance Company Ltd:**
*   The report maintains a "Buy" rating with a revised price target of Rs. 1,720.
*   Net earnings beat estimates due to lower opex and strong AUM growth, despite higher credit costs.
*   AUM growth is expected at 20-25% in FY26.
*   Key positives include