---



# Unit 2: RAG, Vector Stores, and Indexing

# Unit 2 - Part 4a: Embeddings & Vector Space

## 1. Introduction: Computers Don't Read English

In [15]:
%pip install python-dotenv --upgrade --quiet langchain langchain-huggingface sentence-transformers langchain-community

In [16]:
from dotenv import load_dotenv
load_dotenv()

import os
from langchain_huggingface import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

## 2. Viewing a Vector

In [17]:
vector = embeddings.embed_query("Containers")

print(f"Dimensionality: {len(vector)}")
print(f"First 5 numbers: {vector[:5]}")

Dimensionality: 384
First 5 numbers: [0.03146154433488846, 0.07303042709827423, -0.028095660731196404, 0.02582768350839615, 0.03802919760346413]


## 3. The Math: Cosine Similarity

In [18]:
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

vec_iron = embeddings.embed_query("Iron")
vec_rust = embeddings.embed_query("Rust")
vec_balloon = embeddings.embed_query("Balloon")

print(f"Iron vs Rust: {cosine_similarity(vec_iron, vec_rust):.4f}")
print(f"Iron vs Balloon: {cosine_similarity(vec_iron, vec_balloon):.4f}")

Iron vs Rust: 0.6303
Iron vs Balloon: 0.2926


### Analysis
**Iron & Rust** score higher (0.6303) than **Iron & Balloon** (0.2926), so the model treats the first pair as more closely related.
This measure of closeness between vectors is the foundation of semantic search engines and RAG systems.

This is arguably the most important concept in modern AI.
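
To make the formula less abstract, here is a minimal sketch on hand-picked 2-D vectors. The vectors are hypothetical stand-ins, not outputs of the embedding model. It shows that cosine similarity depends only on direction, and that for unit-length vectors it reduces to a plain dot product.

```python
import numpy as np

def cosine_similarity(a, b):
    # Dot product divided by the product of the two vector lengths
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def unit(v):
    # Scale a vector to length 1 without changing its direction
    return v / np.linalg.norm(v)

# Hypothetical 2-D "embeddings", chosen by hand for illustration
right = np.array([1.0, 0.0])
right_long = np.array([3.0, 0.0])   # same direction, three times longer
up = np.array([0.0, 2.0])           # perpendicular to `right`
left = np.array([-1.0, 0.0])        # opposite direction

sim_same = cosine_similarity(right, right_long)
sim_orthogonal = cosine_similarity(right, up)
sim_opposite = cosine_similarity(right, left)
print(sim_same, sim_orthogonal, sim_opposite)

# For unit-length vectors the cosine is just the dot product
x = np.array([1.0, 2.0])
y = np.array([2.0, 1.0])
sim_xy = cosine_similarity(x, y)
dot_unit = np.dot(unit(x), unit(y))
print(sim_xy, dot_unit)
```

Length has no effect on the score: `right` and `right_long` are treated as identical, which is why two texts of very different length can still be judged similar.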

---



# Unit 2 - Part 4b: Naive RAG Pipeline

## 1. Introduction: The Open-Book Test

RAG (Retrieval-Augmented Generation) works like an open-book test, in two steps:
1.  **Retrieval:** Find the right page in the textbook.
2.  **Generation:** Write the answer using that page.
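
The two steps can be sketched without any model at all. The example below is only an illustration under stated assumptions: the 3-D vectors are invented by hand in place of real embeddings, and the "generation" step stops at building the prompt that an LLM would receive.

```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical "textbook pages" with hand-made 3-D vectors (not model outputs)
pages = {
    "Iron rusts when exposed to water and oxygen.": np.array([0.9, 0.1, 0.0]),
    "Balloons float because helium is lighter than air.": np.array([0.0, 0.2, 0.9]),
}
question = "Why does iron rust?"
question_vec = np.array([1.0, 0.0, 0.1])  # hand-picked to point toward the first page

# Step 1, retrieval: pick the page whose vector is closest to the question
best_page = max(pages, key=lambda text: cosine_similarity(pages[text], question_vec))

# Step 2, generation: the retrieved page is pasted into the prompt for the LLM
toy_prompt = (
    "Answer based ONLY on the context below:\n"
    f"{best_page}\n\n"
    f"Question: {question}"
)
print(toy_prompt)
```

The LLM never searches anything itself. It only sees whatever text the retrieval step chose to put in the prompt.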

In [19]:
%pip install python-dotenv --upgrade --quiet faiss-cpu langchain-huggingface sentence-transformers langchain-community langchain-google-genai


In [20]:
from dotenv import load_dotenv
load_dotenv()

import getpass
import os
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_huggingface import HuggingFaceEmbeddings

if "GOOGLE_API_KEY" not in os.environ:
    os.environ["GOOGLE_API_KEY"] = getpass.getpass("Enter your Google API Key: ")

llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash")
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

## 2. The "Knowledge Base" (Grounding)
RAG introduces "non-parametric memory": facts stored outside the model's weights and supplied at query time. The documents below play that role.

In [21]:
from langchain_core.documents import Document

docs = [
    Document(page_content="Anisha's favorite food is Donuts with extra chocolate."),
    Document(page_content="The secret password to the lab is 'very_very_secret_give_me_fries123'."),
    Document(page_content="LangChain is a framework for developing applications powered by language models."),
]

## 3. Indexing (Storing the Knowledge)

We use **FAISS** to store the embeddings.

In [22]:
from langchain_community.vectorstores import FAISS

vectorstore = FAISS.from_documents(docs, embeddings)
retriever = vectorstore.as_retriever()

## 4. The RAG Chain

In [23]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

template = """
Answer based ONLY on the context below:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

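def format_docs(documents):
    # Join the retrieved Document objects into plain text for the prompt
    return "\n\n".join(doc.page_content for doc in documents)

chain = (
    # The retriever fills {context}; the raw question passes through to {question}
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)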

result = chain.invoke("What is the secret password?")
print(result)

The secret password to the lab is 'very_very_secret_give_me_fries123'.


---



# Unit 2 - Part 4c: Deep Dive into Indexing Algorithms

## 1. Introduction: The Scale Problem

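Comparing a query against every stored vector gets slower as the collection grows. **FAISS (Facebook AI Similarity Search)** was built for this: fast similarity search over large sets of vectors.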

In [24]:
import faiss
import numpy as np

d = 128      # dimensionality of each vector
nb = 10000   # number of vectors in the database
xb = np.random.random((nb, d)).astype('float32')  # FAISS expects float32

## 2. Flat Index (Brute Force)

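**Concept:** Compare the query against every stored vector. The results are exact, but search time grows linearly with the number of vectors.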


In [25]:
index = faiss.IndexFlatL2(d)
index.add(xb)
print(f"Flat Index contains {index.ntotal} vectors")

Flat Index contains 10000 vectors
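
As a rough picture of what a flat index does at query time, here is a minimal NumPy sketch of exhaustive search. It uses squared Euclidean distance, which is also what `IndexFlatL2` reports. The sizes and the names `stored`, `query`, and `flat_l2_search` are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8                                              # small toy dimension
stored = rng.random((1000, dim)).astype("float32")   # vectors "in the index"
query = rng.random((1, dim)).astype("float32")       # one query vector
top_k = 5

def flat_l2_search(vectors, q, k):
    # Squared Euclidean distance from the query to every stored vector
    dists = ((vectors - q) ** 2).sum(axis=1)
    # Positions of the k smallest distances, closest first
    order = np.argsort(dists)[:k]
    return dists[order], order

toy_D, toy_I = flat_l2_search(stored, query, top_k)
print(toy_D, toy_I)
```

Every query touches all 1000 rows. That cost is what the approximate indexes in the next sections try to avoid.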


## 3. IVF (Inverted File Index)

In [26]:
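nlist = 100                        # number of clusters (cells) to partition the vectors into
quantizer = faiss.IndexFlatL2(d)   # assigns each vector to its nearest cluster centroid
index_ivf = faiss.IndexIVFFlat(quantizer, d, nlist)

index_ivf.train(xb)   # learn the cluster centroids; required before adding
index_ivf.add(xb)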
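
The train, add, and search phases can be imitated in plain NumPy to show the idea. This is a toy sketch with a basic k-means loop, not FAISS's implementation, and all sizes and names (`n_cells`, `n_probe`, `inverted_lists`) are chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_vectors, n_cells, n_probe, top_k = 8, 2000, 20, 5, 5
stored = rng.random((n_vectors, dim)).astype("float32")
query = rng.random(dim).astype("float32")

def sq_dists(points, centers):
    # Pairwise squared L2 distances, shape (len(points), len(centers))
    return ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)

# "Train": a few rounds of k-means to place the cell centroids
centroids = stored[rng.choice(n_vectors, n_cells, replace=False)]
for _ in range(10):
    assign = sq_dists(stored, centroids).argmin(axis=1)
    for c in range(n_cells):
        members = stored[assign == c]
        if len(members) > 0:              # keep the old centroid if a cell is empty
            centroids[c] = members.mean(axis=0)

# "Add": record which vectors fall into each cell (the inverted lists)
assign = sq_dists(stored, centroids).argmin(axis=1)
inverted_lists = {c: np.flatnonzero(assign == c) for c in range(n_cells)}

# "Search": scan only the n_probe cells whose centroids are nearest the query
nearest_cells = sq_dists(query[None, :], centroids)[0].argsort()[:n_probe]
candidates = np.concatenate([inverted_lists[c] for c in nearest_cells])
cand_dists = ((stored[candidates] - query) ** 2).sum(axis=1)
best = cand_dists.argsort()[:top_k]
toy_I, toy_D = candidates[best], cand_dists[best]

print(f"Scanned {len(candidates)} of {n_vectors} vectors")
```

Only a fraction of the vectors is scanned. The price is that a true neighbor sitting in an unvisited cell is missed, which is why raising the number of probed cells trades speed for accuracy.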

## 4. HNSW (Hierarchical Navigable Small World)

In [27]:
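M = 16   # number of neighbors each vector is linked to in the graph
index_hnsw = faiss.IndexHNSWFlat(d, M)
index_hnsw.add(xb)   # no train() step; the graph is built as vectors are added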
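
The core move in graph-based search is a greedy walk: start somewhere, then keep hopping to whichever linked neighbor is closer to the query. The sketch below is a deliberate simplification, assuming a single-layer graph built by brute force. Real HNSW builds its graph incrementally, stacks several layers, and keeps a list of candidates instead of a single current node. All names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_vectors, n_links = 8, 500, 8
stored = rng.random((n_vectors, dim)).astype("float32")
query = rng.random(dim).astype("float32")

# Build a single-layer graph: link each vector to its n_links nearest neighbors
pairwise = ((stored[:, None, :] - stored[None, :, :]) ** 2).sum(axis=2)
np.fill_diagonal(pairwise, np.inf)                 # a node is not its own neighbor
neighbors = np.argsort(pairwise, axis=1)[:, :n_links]

def dist_to_query(i):
    # Squared L2 distance from stored vector i to the query
    return float(((stored[i] - query) ** 2).sum())

# Greedy walk: move to the best neighbor while it improves on the current node
entry_point = 0                                    # arbitrary starting node
current, current_dist = entry_point, dist_to_query(entry_point)
hops = 0
while True:
    best = min(neighbors[current], key=dist_to_query)
    best_dist = dist_to_query(best)
    if best_dist >= current_dist:
        break                                      # no neighbor is closer: stop
    current, current_dist = best, best_dist
    hops += 1

print(f"Stopped at vector {current} after {hops} hops (squared distance {current_dist:.4f})")
```

The walk inspects only a handful of nodes per hop, but it can stop at a local minimum that is not the true nearest neighbor, so results from graph indexes are approximate.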

## 5. PQ (Product Quantization)

In [28]:
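m = 8                               # split each vector into 8 sub-vectors
index_pq = faiss.IndexPQ(d, m, 8)   # last argument: 8 bits (one byte) per sub-vector code
index_pq.train(xb)                  # learn one codebook per sub-vector
index_pq.add(xb)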
print("PQ Compression complete. RAM usage minimized.")

PQ Compression complete. RAM usage minimized.
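
With the settings above (`d = 128`, `m = 8`, 8 bits per code), each vector shrinks from 128 float32 values (512 bytes) to 8 one-byte codes. The following NumPy sketch illustrates how that compression works on a smaller scale. It is a toy version with a basic k-means loop and only 16 centroids per slice, and every name in it (`codebooks`, `codes`, `tables`) is chosen for this example, not taken from FAISS.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_vectors, n_slices, n_centroids = 16, 2000, 4, 16   # toy sizes
slice_dim = dim // n_slices
stored = rng.random((n_vectors, dim)).astype("float32")
query = rng.random(dim).astype("float32")

def kmeans(points, n_centers, n_iter=10):
    # Basic k-means; returns the learned centers
    centers = points[rng.choice(len(points), n_centers, replace=False)]
    for _ in range(n_iter):
        assign = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        for c in range(n_centers):
            members = points[assign == c]
            if len(members) > 0:
                centers[c] = members.mean(axis=0)
    return centers

def piece(array, j):
    # The j-th slice of a vector or of every row in a matrix
    return array[..., j * slice_dim:(j + 1) * slice_dim]

# Train: one small codebook per slice
codebooks = [kmeans(piece(stored, j), n_centroids) for j in range(n_slices)]

# Add: replace each slice by the index of its nearest codebook entry
codes = np.empty((n_vectors, n_slices), dtype=np.uint8)
for j, book in enumerate(codebooks):
    sub = piece(stored, j)
    codes[:, j] = ((sub[:, None, :] - book[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

# Search: precompute query-to-centroid distances, then sum table lookups
tables = np.stack([((book - piece(query, j)) ** 2).sum(axis=1)
                   for j, book in enumerate(codebooks)])          # (n_slices, n_centroids)
approx_dists = tables[np.arange(n_slices), codes].sum(axis=1)     # (n_vectors,)

print(f"Bytes per vector: {stored[0].nbytes} raw, {codes[0].nbytes} as codes")
```

Distances are computed against the compressed vectors, not the originals, so they are estimates. This is the trade PQ makes: far less memory in exchange for less precise distances and rankings.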


In [29]:
k = 5                                            # number of nearest neighbors to return
q = np.random.random((1, d)).astype('float32')   # one random query vector

# Query Flat Index
D_flat, I_flat = index.search(q, k)
print(f"\nFlat Index Query Results (Top {k}):\nDistances: {D_flat}\nIndices: {I_flat}")

# Query IVF Index
index_ivf.nprobe = 10   # search the 10 nearest of the 100 clusters (the default is 1)
D_ivf, I_ivf = index_ivf.search(q, k)
print(f"\nIVF Index Query Results (Top {k}):\nDistances: {D_ivf}\nIndices: {I_ivf}")

# Query HNSW Index
D_hnsw, I_hnsw = index_hnsw.search(q, k)
print(f"\nHNSW Index Query Results (Top {k}):\nDistances: {D_hnsw}\nIndices: {I_hnsw}")

# Query PQ Index
D_pq, I_pq = index_pq.search(q, k)
print(f"\nPQ Index Query Results (Top {k}):\nDistances: {D_pq}\nIndices: {I_pq}")


Flat Index Query Results (Top 5):
Distances: [[14.705984 14.87966  15.574528 15.587572 15.596102]]
Indices: [[9810 4764 8375 6623 1660]]

IVF Index Query Results (Top 5):
Distances: [[14.705984 14.87966  15.587572 15.90209  16.149097]]
Indices: [[9810 4764 6623 2111 2833]]

HNSW Index Query Results (Top 5):
Distances: [[14.705984  14.879661  15.587573  15.596102  15.6201935]]
Indices: [[9810 4764 6623 1660 2215]]

PQ Index Query Results (Top 5):
Distances: [[11.727423 11.803116 11.875454 11.919543 12.28454 ]]
Indices: [[4990 4989 1660 2970 6616]]
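
One way to read these results is recall: how many of the exact top 5 from the Flat index each approximate index also found. The sketch below copies the indices from the output above into arrays and computes that fraction. The helper name `recall_at_k` is just a label for this example.

```python
import numpy as np

# Top-5 indices copied from the recorded output above
flat_top5 = np.array([9810, 4764, 8375, 6623, 1660])   # exact baseline
approx_top5 = {
    "IVF": np.array([9810, 4764, 6623, 2111, 2833]),
    "HNSW": np.array([9810, 4764, 6623, 1660, 2215]),
    "PQ": np.array([4990, 4989, 1660, 2970, 6616]),
}

def recall_at_k(found, truth):
    # Fraction of the true top-k that the approximate index also returned
    return len(np.intersect1d(found, truth)) / len(truth)

recalls = {name: recall_at_k(ids, flat_top5) for name, ids in approx_top5.items()}
for name, value in recalls.items():
    print(f"{name}: recall@5 = {value:.1f}")
```

For this single query, HNSW recovers 4 of the 5 true neighbors, IVF 3, and PQ 1. PQ also reports smaller distances than the Flat index because it measures against compressed vectors. A single random query is only an illustration; a meaningful comparison would average recall over many queries.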
