# Enhanced RAG System with Embeddings and FAISS

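This notebook demonstrates an improved retrieval-augmented generation (RAG) system: it embeds document chunks with a sentence-transformer model, retrieves the chunks most similar to a query with a FAISS index, and passes them to Gemini to generate the answer.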

## 1. Install Dependencies

In [None]:
# Install required dependencies
%pip install -r requirements.txt

## 2. Import Libraries and Setup

In [None]:
import re
import os
import numpy as np
import faiss
from sentence_transformers import SentenceTransformer
from google import genai

# Initialize embedding model
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')

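def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into overlapping chunks using a sliding window."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # each window starts `step` characters after the previous one
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]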

def create_faiss_index(embeddings):
    """Create a flat FAISS index that does exact L2 nearest-neighbor search."""
    dimension = embeddings.shape[1]
    index = faiss.IndexFlatL2(dimension)
    # FAISS expects float32 vectors
    index.add(embeddings.astype(np.float32))
    return index
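
To see how the sliding window in `chunk_text` behaves, the sketch below applies the same slicing logic to a ten-character string. The small `chunk_size` and `overlap` values are chosen only for illustration, and the function is restated so the example runs on its own.

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into overlapping chunks using a sliding window."""
    step = chunk_size - overlap  # distance between consecutive window starts
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# Illustrative values only: windows of 4 characters that share 2 characters
demo_chunks = chunk_text("abcdefghij", chunk_size=4, overlap=2)
for chunk in demo_chunks:
    print(repr(chunk))
```

Each chunk begins with the last two characters of the previous one, and the final chunk is shorter because the window runs past the end of the text. With the default arguments, each 1000-character chunk shares 200 characters with its predecessor in the same way.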

## 3. Prepare Document and Generate Embeddings

In [88]:
# Sample document (same as previous)
import requests
from bs4 import BeautifulSoup

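# Fetch the HTML content from the URL
url = "https://www.gutenberg.org/cache/epub/11/pg11-images.html"
response = requests.get(url, timeout=30)
response.raise_for_status()  # fail early on HTTP errors instead of parsing an error page
html_content = response.text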

# Use BeautifulSoup to extract the textual content
soup = BeautifulSoup(html_content, 'html.parser')
for script in soup(["script", "style"]):
    script.decompose()

# Extract text and optionally limit length for performance
document = soup.get_text(separator=" ", strip=True)
# Uncomment the next line to limit text (if needed)
# document = document[:10000]

print("Document fetched and processed.")

# Preprocess text: collapse runs of whitespace into single spaces
clean_text = re.sub(r'\s+', ' ', document).strip()

# Create chunks
chunks = chunk_text(clean_text)
print(f"Document chunks: {len(chunks)}")
print(f"First 1 chunk example: {chunks[0:1]}")

# Generate embeddings
chunk_embeddings = embedding_model.encode(chunks)
print(f"Embedding dimensions: {chunk_embeddings.shape}")
# print(f"Sample embedding vector: {chunk_embeddings[4]}")

# Create FAISS index
index = create_faiss_index(chunk_embeddings)

Document fetched and processed.
Document chunks: 324
First 1 chunk example: ["Alice’s Adventures in Wonderland | Project Gutenberg The Project Gutenberg eBook of Alice's Adventures in Wonderland This ebook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever. You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included with this ebook or online at www.gutenberg.org . If you are not located in the United States, you will have to check the laws of the country where you are located before using this eBook. Title : Alice's Adventures in Wonderlan"]
Embedding dimensions: (324, 384)


## 4. Query Processing and Retrieval

In [89]:
def semantic_retrieval(query, index, chunks, top_k=3):
    """Retrieve the top_k chunks most similar to the query."""
    # Encode query
    query_embedding = embedding_model.encode([query])
    
    # Search FAISS index; search() returns a (distances, indices) pair
    _, indices = index.search(query_embedding.astype(np.float32), top_k)
    print(f"Indices: {indices[0]}")
    
    # Return chunks ordered from most to least similar
    return [chunks[i] for i in indices[0]]
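
For intuition about what the search step returns, here is a minimal plain-Python sketch of exact nearest-neighbor search by squared L2 distance, which is the ranking a flat L2 index produces. The function name `brute_force_search` and the toy vectors are hypothetical and not part of the notebook; FAISS performs the same ranking over the real embeddings far more efficiently.

```python
def brute_force_search(query, vectors, top_k=3):
    """Return (distances, indices) of the top_k vectors closest to query."""
    scored = []
    for i, vec in enumerate(vectors):
        # Squared Euclidean distance between the query and this vector
        dist = sum((q - v) ** 2 for q, v in zip(query, vec))
        scored.append((dist, i))
    scored.sort()  # smallest distance first
    top = scored[:top_k]
    return [d for d, _ in top], [i for _, i in top]

# Hypothetical 2-D vectors standing in for chunk embeddings
toy_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
distances, indices = brute_force_search([0.9, 0.1], toy_vectors, top_k=2)
print(indices, distances)
```

The returned indices are positions in the original list, which is why `semantic_retrieval` can use them directly to look up the matching entries in `chunks`.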

## 5. Enhanced RAG Workflow

In [99]:
# Initialize Gemini client
from dotenv import load_dotenv


load_dotenv()
client = genai.Client(api_key=os.getenv('GEMINI_API_KEY'))

# Redefine semantic_retrieval so that it also prints the distances returned by FAISS
def semantic_retrieval(query, index, chunks, top_k=3):
    """Retrieve the top_k chunks most similar to the query."""
    # Encode query
    query_embedding = embedding_model.encode([query])
    
    # Search FAISS index; search() returns a (distances, indices) pair
    distances, indices = index.search(query_embedding.astype(np.float32), top_k)
    print(f"Indices: {indices[0]} {distances[0]}")
    
    # Return chunks ordered from most to least similar
    return [chunks[i] for i in indices[0]]

# Sample query
query = "what did the queen shout at the top of her voice"

# Retrieve relevant context
context_chunks = semantic_retrieval(query, index, chunks)
context = "\n".join(context_chunks)

# Generate response
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=f"""Answer the question based on the following context:
    {context}
    
    Question: {query}
    Answer:"""
)

# Display results
print("Retrieved Context:")
for chunk in context_chunks:
    print(f"- {chunk}...")

print("\nGenerated Answer:")
print(response.text)

Indices: [205 177 255] [1.0280211 1.0399594 1.0416878]
Retrieved Context:
- sense!” said Alice, very loudly and decidedly, and the Queen was silent. The King laid his hand upon her arm, and timidly said “Consider, my dear: she is only a child!” The Queen turned angrily away from him, and said to the Knave “Turn them over!” The Knave did so, very carefully, with one foot. “Get up!” said the Queen, in a shrill, loud voice, and the three gardeners instantly jumped up, and began bowing to the King, the Queen, the royal children, and everybody else. “Leave off that!” screamed the Queen. “You make me giddy.” And then, turning to the rose-tree, she went on, “What have you be...
-  the Dormouse: “not in that ridiculous fashion.” And he got up very sulkily and crossed over to the other side of the court. All this time the Queen had never left off staring at the Hatter, and, just as the Dormouse crossed the court, she said to one of the officers of the court, “Bring me the list of the singers in
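
The second array on the `Indices:` output line holds the distances reported by FAISS, which for `IndexFlatL2` are squared L2 distances. Assuming the embeddings are unit-length (an assumption about the embedding model's output that is worth confirming by checking the vector norms), a squared distance d corresponds to a cosine similarity of 1 - d/2. The helper name below is hypothetical.

```python
import math

def squared_l2_to_cosine(squared_distance):
    """Convert a squared L2 distance between unit vectors to cosine similarity."""
    # For unit vectors a and b: ||a - b||^2 = 2 - 2 * cos(a, b)
    return 1.0 - squared_distance / 2.0

# Check the identity on two hand-made unit vectors
a = [1.0, 0.0]
b = [math.cos(math.pi / 3), math.sin(math.pi / 3)]  # 60 degrees away from a
squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
cosine = sum(x * y for x, y in zip(a, b))
print(squared_l2_to_cosine(squared_distance), cosine)

# Distances copied from the retrieval output above
for d in [1.0280211, 1.0399594, 1.0416878]:
    print(f"{d:.4f} -> cosine similarity {squared_l2_to_cosine(d):.3f}")
```

If the unit-length assumption holds, the three retrieved chunks have nearly identical similarity to the query, which suggests the ranking between them is not very decisive for this question.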

## Key Enhancements

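1. **Semantic Embeddings**: Encodes each chunk with the `all-MiniLM-L6-v2` model as a 384-dimensional dense vector
2. **FAISS Index**: Stores the vectors in an `IndexFlatL2` index, which performs exact nearest-neighbor search by L2 distance
3. **Contextual Understanding**: Matches on meaning rather than exact wording, so relevant passages can be retrieved even when they share few keywords with the query
4. **Scalability**: Delegates vector search to FAISS, which also provides approximate index types for collections too large to search exhaustively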

To further improve:
- Experiment with different embedding models
- Add metadata filtering
- Implement hybrid search (dense + sparse)
- Use more sophisticated chunking strategies
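
As one possible starting point for the hybrid search suggestion, the sketch below merges a dense ranking and a sparse (keyword) ranking with reciprocal rank fusion. This technique is not used in the notebook; the function name, the constant `k=60`, and the sparse ranking are illustrative assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first lists of chunk ids into one ranking."""
    scores = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # Items near the top of any list contribute a larger score
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk id for determinism
    return sorted(scores, key=lambda cid: (-scores[cid], cid))

# Dense ranking reuses the chunk ids from the retrieval output;
# the sparse ranking is made up for the sake of the example
dense_ranking = [205, 177, 255]
sparse_ranking = [177, 301, 205]
fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
print(fused)
```

Because the fusion only uses rank positions, it does not require the dense distances and the keyword scores to be on a comparable scale.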