
# Chapter 5: Vector Embeddings and Retrieval

This notebook explores:
- What embeddings are and how they work
- How to generate embeddings using pre-trained models
- Vector databases and similarity search
- Retrieval-Augmented Generation (RAG) pipelines

## Learning Objectives

- Generate vector embeddings from text using Hugging Face or Sentence Transformers
- Use cosine similarity for semantic comparison
- Index and retrieve using FAISS and ChromaDB
- Build a basic Retrieval-Augmented Generation (RAG) pipeline



## Generating Text Embeddings

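We'll use SentenceTransformers to generate embeddings from a list of sentences. The `all-MiniLM-L6-v2` model maps each sentence to a 384-dimensional vector, so three sentences give an embedding tensor of shape (3, 384).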


In [None]:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = ["AI is transforming the world.", "Artificial Intelligence is the future.", "Bananas are yellow."]

embeddings = model.encode(sentences, convert_to_tensor=True)
print("Embedding shape:", embeddings.shape)

# Pairwise cosine similarity between all sentences (a 3 x 3 matrix)
cos_sim = util.cos_sim(embeddings, embeddings)
print("Cosine similarity matrix:\n", cos_sim)
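
To make the similarity score less of a black box, here is a minimal pure-Python sketch of cosine similarity: the dot product of two vectors divided by the product of their lengths. The three-dimensional `toy_vectors` are hypothetical values chosen for illustration, not real model output.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional "embeddings"; real models use hundreds of dimensions.
toy_vectors = {
    "ai_1": [0.9, 0.1, 0.0],
    "ai_2": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.1, 0.9],
}

sim_related = cosine_similarity(toy_vectors["ai_1"], toy_vectors["ai_2"])
sim_unrelated = cosine_similarity(toy_vectors["ai_1"], toy_vectors["banana"])
print(f"related: {sim_related:.3f}, unrelated: {sim_unrelated:.3f}")
```

Vectors pointing in nearly the same direction score close to 1, while nearly orthogonal vectors score close to 0, which is the pattern to expect between the two AI sentences and the banana sentence.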



## Similarity Search using FAISS

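FAISS is a library for efficient similarity search and clustering of dense vectors. The `IndexFlatL2` index used below performs exact (brute-force) search and reports squared L2 distances, so smaller values mean closer matches.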


In [None]:

import faiss
import numpy as np

# Convert to numpy
emb_np = embeddings.cpu().detach().numpy().astype('float32')

# Create FAISS index
index = faiss.IndexFlatL2(emb_np.shape[1])
index.add(emb_np)

# Embed the query; encode() returns a NumPy array by default
query = model.encode(["AI innovations"]).astype('float32')
# D holds squared L2 distances, I holds the row indices of the nearest vectors
D, I = index.search(query, k=2)

print("Most similar sentences:")
for idx in I[0]:
    print("-", sentences[idx])
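
As a rough illustration of what a flat L2 index does conceptually, the sketch below compares the query against every stored vector and keeps the `k` smallest squared distances. The function names and toy vectors are hypothetical, and real FAISS does the same job with optimized native code.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def flat_l2_search(vectors, query, k):
    """Exhaustive nearest-neighbour search; returns (distances, indices)."""
    scored = sorted((squared_l2(v, query), i) for i, v in enumerate(vectors))
    top = scored[:k]
    return [d for d, _ in top], [i for _, i in top]

# Hypothetical unit-length vectors standing in for sentence embeddings.
toy_index = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
toy_query_vec = [1.0, 0.0]

distances, indices = flat_l2_search(toy_index, toy_query_vec, k=2)
print("indices:", indices)
print("squared distances:", distances)
```

For unit-length vectors, squared L2 distance equals `2 - 2 * cosine_similarity`, so ranking by L2 distance and ranking by cosine similarity give the same order once the embeddings are normalized.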



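## Using ChromaDB for Document Storage

ChromaDB stores documents together with their embeddings and metadata. When no embedding function is specified, it embeds documents and queries with its default model. Note that `chromadb.Client`, as used below, keeps data in memory only; for storage that survives restarts, create the client with `chromadb.PersistentClient` and a directory path instead.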


In [None]:

import chromadb
from chromadb.config import Settings

chroma_client = chromadb.Client(Settings(allow_reset=True))
chroma_client.reset()  # clear any collections left over from a previous run

collection = chroma_client.create_collection(name="genai_docs")
collection.add(
    documents=["Generative AI is powerful.", "LangChain helps chain LLM calls."],
    ids=["doc1", "doc2"],
    metadatas=[{"type": "intro"}, {"type": "tool"}]
)

results = collection.query(query_texts=["What is LangChain?"], n_results=1)
# Results are nested: one inner list per query, so [0][0] is the best match for the first query
print("Top result:", results["documents"][0][0])
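
The double index `[0][0]` makes more sense once the nesting is visible. The sketch below is a hypothetical stand-in, not the ChromaDB API: `toy_store_query` ranks documents by shared words rather than embeddings, but it returns the same general layout of one inner list of hits per query.

```python
import re

def words(text):
    """Lowercase alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def toy_store_query(documents, ids, query_texts, n_results=1):
    """Hypothetical store query: word overlap replaces vector similarity."""
    results = {"ids": [], "documents": []}
    for query in query_texts:
        q = words(query)
        ranked = sorted(
            range(len(documents)),
            key=lambda i: len(q & words(documents[i])),
            reverse=True,
        )[:n_results]
        # One inner list per query, holding that query's top hits.
        results["ids"].append([ids[i] for i in ranked])
        results["documents"].append([documents[i] for i in ranked])
    return results

toy_docs = ["Generative AI is powerful.", "LangChain helps chain LLM calls."]
toy_results = toy_store_query(
    toy_docs,
    ids=["doc1", "doc2"],
    query_texts=["How does LangChain chain calls?", "Is generative AI powerful?"],
)
print(toy_results["documents"][0][0])  # best hit for the first query
print(toy_results["documents"][1][0])  # best hit for the second query
```

Passing several query texts in one call therefore yields several inner lists, indexed in the same order as the queries.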



## Retrieval-Augmented Generation (RAG)

A RAG pipeline retrieves the documents most relevant to a query and passes them to a language model as context for its answer.


In [None]:

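# Legacy import paths: newer LangChain releases move these three classes
# to the langchain_community package.
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.llms import OpenAI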
from langchain.chains import RetrievalQA

# Build vector DB
embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
texts = ["Vector databases power semantic search.", "FAISS is fast and memory efficient."]
db = FAISS.from_texts(texts, embedding_model)

retriever = db.as_retriever()
# Uncomment the two lines below if you have an OpenAI API key or another LangChain-compatible LLM
# qa_chain = RetrievalQA.from_chain_type(llm=OpenAI(), retriever=retriever)
# print(qa_chain.run("What is FAISS used for?"))
print("RAG setup done. Uncomment the lines above (or swap in a local model) to run the full pipeline.")
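
To show the retrieve-then-generate flow without an LLM or API key, here is a minimal sketch under loud assumptions: `retrieve` ranks texts by word overlap as a stand-in for vector search, and `build_prompt` uses a hypothetical prompt template. It is not how `RetrievalQA` is implemented, only the general idea of placing retrieved text in the prompt.

```python
import re

def tokenize(text):
    """Lowercase alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, texts, k=1):
    """Toy retriever: rank texts by word overlap (stand-in for vector search)."""
    q = tokenize(query)
    ranked = sorted(texts, key=lambda t: len(q & tokenize(t)), reverse=True)
    return ranked[:k]

def build_prompt(query, context_docs):
    """Assemble a prompt that asks the model to answer from the context only."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\n"
        "Answer:"
    )

toy_texts = [
    "Vector databases power semantic search.",
    "FAISS is fast and memory efficient.",
]
question = "What is FAISS used for?"
retrieved = retrieve(question, toy_texts, k=1)
prompt = build_prompt(question, retrieved)
print(prompt)
```

In a full pipeline the resulting prompt string would be sent to a language model, and the retriever would be the vector store built above rather than word overlap.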



## Exercises

1. Add more documents and test different queries using FAISS and ChromaDB.
2. Switch the FAISS index from L2 distance to cosine similarity (hint: L2-normalize the vectors and use `faiss.IndexFlatIP`).
3. Integrate OpenAI, Cohere, or Hugging Face models into the RAG pipeline.
4. Visualize the embedding space using t-SNE or PCA.

## References

- Sentence Transformers: https://www.sbert.net
- FAISS: https://github.com/facebookresearch/faiss
- ChromaDB: https://docs.trychroma.com
- LangChain RAG: https://docs.langchain.com/docs/use_cases/question_answering
