# Generative AI and Retrieval-Augmented Generation (RAG)

This notebook demonstrates how to implement a RAG pipeline using:

- 🗂 A PDF document
- 📎 Text chunking
- 🔍 Semantic search using embeddings and FAISS
- 🧠 A generative model (FLAN-T5)
- 📩 Querying for context-aware answers





## Step 1: Install Dependencies

We install all the libraries needed for PDF loading, chunking, embedding, vector storage, and question-answering.

```python
!pip install -q langchain langchain-community transformers sentence-transformers faiss-cpu pypdf
```
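LangChain has moved many classes between packages across releases (for example, into `langchain-community`), so the same import can work in one environment and fail in another. The short sketch below records which versions the install step actually produced. It uses only the standard library's `importlib.metadata`; `installed_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names exactly as passed to pip in the install cell
PACKAGES = ["langchain", "langchain-community", "transformers",
            "sentence-transformers", "faiss-cpu", "pypdf"]

def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None  # not installed in this environment
    return result

for name, ver in installed_versions(PACKAGES).items():
    print(f"{name:25} {ver or 'NOT INSTALLED'}")
```

Pasting this table into a notebook or report makes it much easier to rerun the pipeline later with the same library versions.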

## Step 2: Import Required Libraries

We import LangChain modules for handling PDFs, splitting text, creating embeddings, and storing vectors.
We also import Hugging Face `transformers` to load the FLAN-T5 model for text generation.


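```python
# LangChain integrations now live in the langchain-community package
# (installed in Step 1); the old langchain.* import paths are deprecated.
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
```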

## Step 3: Load PDF Document

Here we load `document.pdf` with `PyPDFLoader`, which returns a list of LangChain `Document` objects, one per page.
Upload your PDF to Colab first, using the file panel on the left.


```python
loader = PyPDFLoader("document.pdf")
docs = loader.load()
print(f"Total pages loaded: {len(docs)}")
```

```text
Total pages loaded: 283
```


## Step 4: Split PDF Content into Chunks

We use `RecursiveCharacterTextSplitter` to divide the document into smaller, overlapping chunks.

- `chunk_size=1000`: each chunk is at most roughly 1000 characters long.
- `chunk_overlap=150`: consecutive chunks share up to 150 characters, which preserves context across chunk boundaries.

Chunking is necessary because both the embedding model and FLAN-T5 have input length limits, and smaller chunks also let retrieval return more focused passages.


```python
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
chunks = splitter.split_documents(docs)
print(f"Total chunks created: {len(chunks)}")
```

```text
Total chunks created: 738
```
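The effect of `chunk_size` and `chunk_overlap` is easiest to see with a plain fixed-window splitter. This is a simplified sketch, not LangChain's algorithm: `RecursiveCharacterTextSplitter` first tries to split on paragraph breaks, then line breaks, then spaces, then individual characters, so its chunks end on natural boundaries and the overlap is approximate. `sliding_window_chunks` is a hypothetical helper.

```python
def sliding_window_chunks(text, chunk_size=1000, chunk_overlap=150):
    """Split text into windows of at most chunk_size characters; each window
    starts chunk_size - chunk_overlap characters after the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks

demo = "abcdefghijklmnopqrstuvwxyz"
result = sliding_window_chunks(demo, chunk_size=10, chunk_overlap=3)
print(result)  # → ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk repeats the last 3 characters of the previous one ("hij", "opq", "vwx"). With the notebook's settings, each chunk would likewise repeat up to 150 characters of its predecessor.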


## Step 5: Generate Embeddings and Create Vector Store

We convert each text chunk into a vector using a Sentence-Transformer model: `all-MiniLM-L6-v2`.

Then we store these vectors in a FAISS index. FAISS is a library for efficient similarity search over dense vectors.

This allows us to search for semantically similar content later.


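```python
# Embed every chunk with a Sentence-Transformers model
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
# Build a FAISS index over the chunk embeddings
vectorstore = FAISS.from_documents(chunks, embeddings)
# Wrap the index as a retriever; by default it returns the 4 most similar chunks
retriever = vectorstore.as_retriever()
```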
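Conceptually, retrieval is a nearest-neighbour search over these vectors. As far as we understand, LangChain's `FAISS` wrapper builds an exact index ranked by Euclidean (L2) distance unless you request a different distance strategy. The pure-Python sketch below performs the same brute-force search on toy 3-dimensional vectors (real `all-MiniLM-L6-v2` embeddings have 384 dimensions). It also shows why L2 and cosine rankings agree for unit-length vectors: $\lVert a - b \rVert^2 = \lVert a \rVert^2 + \lVert b \rVert^2 - 2\,a \cdot b = 2 - 2\cos(a, b)$ when $\lVert a \rVert = \lVert b \rVert = 1$. All helper names and vectors here are hypothetical.

```python
import math

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k_l2(query, vectors, k=4):
    """Exact search: indices of the k vectors closest to query by L2 distance."""
    ranked = sorted(range(len(vectors)), key=lambda i: l2_distance(query, vectors[i]))
    return ranked[:k]

# Toy unit-length "chunk embeddings" and a query embedding
chunk_vectors = [l2_normalize(v) for v in
                 ([1, 0, 0], [0.9, 0.1, 0], [0, 1, 0], [0, 0, 1], [0.5, 0.5, 0])]
query = l2_normalize([1, 0.05, 0])

by_l2 = top_k_l2(query, chunk_vectors, k=3)
by_cos = sorted(range(len(chunk_vectors)),
                key=lambda i: -cosine_similarity(query, chunk_vectors[i]))[:3]
print(by_l2, by_cos)  # → [0, 1, 4] [0, 1, 4]
```

FAISS does the same job far faster, and its approximate index types scale to millions of vectors. For 738 chunks, an exact search is already cheap.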


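## Step 6: Load the FLAN-T5-Large Model

We use Google's `google/flan-t5-large` from the Hugging Face Hub. It is an instruction-tuned T5 text-to-text model that performs well on question answering and other instruction-style tasks. Its tokenizer is configured for inputs of at most 512 tokens, which limits how much retrieved context fits into a single prompt.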

We wrap it in a `pipeline` for easy querying.


```python
model_name = "google/flan-t5-large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
flan_pipeline = pipeline("text2text-generation", model=model, tokenizer=tokenizer)
```

```text
Device set to use cpu
```


## Step 7: Define a RAG Query Function

This function does the following:

1. Takes in a user question.
2. Uses the FAISS retriever to find the most relevant chunks (the context).
3. Builds a prompt with the context and the question.
4. Passes it to FLAN-T5 to generate an answer.

This is where Retrieval-Augmented Generation happens!


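```python
def query_rag(question):
    # 1. Retrieve the most similar chunks
    #    (retriever.invoke replaces the deprecated get_relevant_documents)
    relevant_docs = retriever.invoke(question)
    # 2. Concatenate their text into a single context string
    context = "\n".join(doc.page_content for doc in relevant_docs)
    # 3. Build a prompt that restricts the model to the retrieved context
    prompt = f"Answer the question using only the context:\n\nContext:\n{context}\n\nQuestion: {question}\n\nAnswer:"

    # 4. Generate an answer with FLAN-T5
    response = flan_pipeline(
        prompt,
        max_new_tokens=200,   # Upper bound on the length of the generated answer
        temperature=0.9,      # <1 sharpens, >1 flattens the token distribution
        top_k=50,             # Sample only from the 50 most likely tokens
        top_p=0.9,            # ...and from the smallest set covering 90% probability
        do_sample=True        # Sample instead of greedy decoding (output varies per run)
    )

    return response[0]['generated_text']
```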
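At each generation step, `transformers` applies the sampling parameters in sequence: temperature scaling, then top-k filtering, then top-p (nucleus) filtering. It then samples one token from what remains. A minimal pure-Python sketch of that filtering for a single step follows, assuming a toy vocabulary of four hypothetical tokens with made-up logits. `filter_logits` is a hypothetical helper, not a `transformers` API, and it only approximates the library's edge-case handling.

```python
import math
import random

def filter_logits(logits, temperature=0.9, top_k=50, top_p=0.9):
    """Sketch of one decoding step's filtering: temperature, then top-k, then
    top-p. Returns a renormalized {token: probability} dict."""
    # Temperature: divide logits before the softmax (<1 sharpens, >1 flattens)
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    # Top-k: keep only the k highest-scoring tokens
    ranked = sorted(scaled.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # Softmax over the surviving tokens (subtract the max for numerical stability)
    m = ranked[0][1]
    exps = [(tok, math.exp(s - m)) for tok, s in ranked]
    total = sum(e for _, e in exps)
    probs = [(tok, e / total) for tok, e in exps]
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p
    kept, cumulative = [], 0.0
    for tok, p in probs:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    z = sum(p for _, p in kept)
    return {tok: p / z for tok, p in kept}

# Hypothetical logits for four candidate next tokens
logits = {"a": 3.0, "b": 2.0, "c": 1.0, "d": 0.0}
dist = filter_logits(logits, temperature=1.0, top_k=3, top_p=0.9)
print(dist)  # only "a" and "b" survive the top-p cut

random.seed(0)
next_token = random.choices(list(dist), weights=list(dist.values()))[0]
greedy_token = max(logits, key=logits.get)  # what do_sample=False would pick
```

Because `do_sample=True`, asking the same question twice can produce different answers. Setting `do_sample=False` switches to greedy decoding, which is reproducible and often preferable for fact extraction. In that mode, `temperature`, `top_k`, and `top_p` no longer have any effect.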


## Step 8: Test the RAG Pipeline

Now let's ask a question related to the content of the uploaded document.
Try summarizing, extracting facts, or explaining concepts.


```python
query = "Summarize the key points of this document in a paragraph of 200 words."
answer = query_rag(query)
print(" Answer:\n", answer)
```

```text
Token indices sequence length is longer than the specified maximum sequence length for this model (803 > 512). Running this sequence through the model will result in indexing errors

 Answer:
 The Art of Invisibility is a book on becoming invisible when spying and surveillance is now the norm.
```
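The warning above means the assembled prompt (803 tokens) is longer than the 512 tokens FLAN-T5's tokenizer is configured for. With default right-side truncation, cutting the input to fit would drop the end of the prompt, which is where the question sits. One option is to add retrieved chunks best-first and stop before the prompt exceeds the budget. `build_prompt` and `count_words` are hypothetical helpers, and counting whitespace-separated words only approximates real tokenizer tokens. With the notebook's model, you could pass `lambda s: len(tokenizer(s).input_ids)` as the counter instead.

```python
def build_prompt(question, chunks, count_tokens, max_tokens=512):
    """Assemble the Step 7 prompt, adding retrieved chunks (best first) only
    while the whole prompt stays within max_tokens."""
    template = ("Answer the question using only the context:\n\n"
                "Context:\n{context}\n\nQuestion: {question}\n\nAnswer:")
    kept = []
    for chunk in chunks:
        candidate = template.format(context="\n".join(kept + [chunk]), question=question)
        if count_tokens(candidate) > max_tokens:
            break  # the next chunk would overflow the model's input limit
        kept.append(chunk)
    return template.format(context="\n".join(kept), question=question)

def count_words(s):
    # Stand-in token counter: whitespace-separated words
    return len(s.split())

chunks = ["alpha " * 50, "beta " * 50, "gamma " * 50]
prompt = build_prompt("What is alpha?", chunks, count_words, max_tokens=120)
print(count_words(prompt))  # → 113 (the third chunk would have exceeded 120)
```

If even the top-ranked chunk does not fit, this sketch returns a prompt with an empty context. Trimming that chunk token by token would be the natural next refinement.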


## Other Examples

```python
top_questions = [
    "What is the main purpose of the book The Art of Invisibility?",
    "Summarize the key privacy principles discussed in the book.",
    "Which tools does the book recommend for anonymous browsing?",
    "How does Kevin Mitnick recommend securing communications and messaging?",
    "What steps should users take when using public Wi-Fi to stay invisible?",
    "What advice does the book offer on creating anonymous identities online?",
    "How are individuals typically tracked online, according to the book?",
    "What does the book reveal about surveillance by corporations or governments?",
    "What operating systems or devices are considered most privacy-focused?",
    "Why is digital privacy important even for people who say they have nothing to hide?"
]

for i, question in enumerate(top_questions, start=1):
    print(f"\n Question {i}: {question}")
    answer = query_rag(question)
    print(" Answer:\n", answer)
```

```text
 Question 1: What is the main purpose of the book The Art of Invisibility?
 Answer:
 to help educate the world's population on how to protect their personal privacy rights from the overstepping of Big Brother and Big Data

 Question 2: Summarize the key privacy principles discussed in the book.
 Answer:
 encrypt and send a secure e-mail protect your data with good password management hide your true IP address from places you visit obscure your computer from being tracked defend your anonymity

 Question 3: Which tools does the book recommend for anonymous browsing?
 Answer:
 Microsoft’s Internet Explorer and Edge

 Question 4: How does Kevin Mitnick recommend securing communications and messaging?
 Answer:
 using PGP or GPG

 Question 5: What steps should users take when using public Wi-Fi to stay invisible?
 Answer:
 iii.

 Question 6: What advice does the book offer on creating anonymous identities online?
 Answer:
 the Tor browser should always be used to create and access all onli
```

## Conclusion

This assignment provided hands-on experience in building a complete Retrieval-Augmented Generation (RAG) pipeline using LangChain, Hugging Face Transformers, and FAISS. I successfully implemented each step: loading and chunking a PDF document, generating embeddings using Sentence-Transformers, storing them in a FAISS vector store, and using a generative model (FLAN-T5) to provide context-aware answers.

### Challenges Faced
- **Embedding size limitations**: Handling large document chunks led to memory issues, which were mitigated by tuning `chunk_size` and `chunk_overlap`.
- **Model loading latency**: Loading the pre-trained weights took time, especially for a larger checkpoint such as `flan-t5-large`.
- **Prompt design**: It took a few iterations to engineer prompts that elicited accurate, context-grounded answers.

### Remediation Steps
- Used `RecursiveCharacterTextSplitter` with `chunk_size=1000` and `chunk_overlap=150` to balance chunk size against retrieval accuracy.
- Created the FLAN-T5 pipeline once and reused it across queries (Hugging Face also caches downloaded weights locally), avoiding repeated model loading.
- Iteratively refined prompts to improve generation quality and relevance.