# Day 3: Vector Embeddings, Vector DBs, and RAG Basics

**Learning Goals:**
- Understand what embeddings are and why they matter
- Learn about vector databases (ChromaDB, FAISS)
- Build your first RAG (Retrieval Augmented Generation) system
- Implement semantic search over documents

**Time:** 2-3 hours

---

## What We'll Build Today

1. **Embeddings 101**: Convert text to vectors
2. **Vector Store**: Store and search embeddings with ChromaDB
3. **Document Loading**: Ingest and split documents
4. **Simple RAG**: Query your own documents using LLM + retrieval
5. **Semantic Search**: Find relevant info without exact keyword matches

## Part 1: Environment Setup

Load API keys and initialize the LLM (same as previous days).

In [1]:
import os
from dotenv import load_dotenv

load_dotenv(override=True)

OPENROUTER_API_KEY = os.getenv("OPENROUTER_API_KEY")
LANGCHAIN_API_KEY  = os.getenv("LANGCHAIN_API_KEY")
LANGCHAIN_TRACING  = os.getenv("LANGCHAIN_TRACING_V2", "false") == "true"

print("✅ OpenRouter key loaded" if OPENROUTER_API_KEY else "⚠️  Missing OPENROUTER_API_KEY")
print("✅ LangSmith tracing enabled" if LANGCHAIN_TRACING else "ℹ️  LangSmith tracing disabled")

✅ OpenRouter key loaded
✅ LangSmith tracing enabled


In [2]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="openai/gpt-3.5-turbo",
    openai_api_key=OPENROUTER_API_KEY,
    openai_api_base="https://openrouter.ai/api/v1",
    temperature=0.7,
)

print("✅ LLM ready")

✅ LLM ready


## Part 2: Understanding Embeddings

**What are embeddings?**
- Embeddings convert text into numerical vectors (arrays of numbers)
- Similar meanings = similar vectors (close in vector space)
- Enable semantic search: find "king" when searching for "monarch"

**Why do we need them?**
- Traditional search: exact keyword matching ("apple" won't find "fruit")
- Semantic search: understands meaning and context
- Foundation of RAG systems

### 2a. Creating Embeddings

Let's create embeddings for some text using OpenAI's `text-embedding-3-small` model, accessed through OpenRouter.

In [3]:
from langchain_openai import OpenAIEmbeddings

# Initialize embedding model
embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small",
    openai_api_key=OPENROUTER_API_KEY,
    openai_api_base="https://openrouter.ai/api/v1",
)

# Create embeddings for sample texts
texts = [
    "The cat sleeps on the couch",
    "A feline rests on the sofa",
    "The dog plays in the yard",
    "Python is a programming language",
]

# Generate embeddings
text_embeddings = embeddings.embed_documents(texts)

print(f"Number of texts: {len(text_embeddings)}")
print(f"Embedding dimensions: {len(text_embeddings[0])}")
print(f"\nFirst embedding (truncated): {text_embeddings[0][:10]}...")

Number of texts: 4
Embedding dimensions: 1536

First embedding (truncated): [-0.05894222483038902, -0.03949085623025894, -0.04609081521630287, 0.039207689464092255, 0.035374049097299576, 0.010956370271742344, 0.018776126205921173, 0.026748355478048325, 0.019244439899921417, -0.04761555790901184]...


### 3c. Deleting Data

If you need to clear your vector database or delete specific documents:

1. **Delete the entire collection**: Clears everything in that specific collection.
2. **Delete specific documents**: Use document IDs to remove specific entries.
3. **Delete persistent storage**: If you used a `persist_directory`, you can simply delete the folder from your file system.


In [None]:
# 1. Delete the entire collection
# vectorstore.delete_collection()

# 2. Delete specific documents (if you have their IDs)
# vectorstore.delete(ids=["id1", "id2"])

# 3. To completely reset ChromaDB (manually delete the folder):
import shutil
import os

# Paths we used in this notebook
db_paths = ["./chroma_db", "./chroma_db_cosine", "./chroma_db_notes", "./chroma_db_filtered", "./chroma_db_complete"]

def reset_dbs():
    for path in db_paths:
        if os.path.exists(path):
            shutil.rmtree(path)
            print(f"Deleted {path}")
        else:
            print(f"{path} not found")

# Uncomment to reset all databases in this notebook
# reset_dbs()


### 2b. Measuring Similarity

Let's see how similar these embeddings are using cosine similarity.
Similar texts should have higher similarity scores.
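
As a minimal sketch of the idea, cosine similarity can be computed with the standard library alone. The 3-dimensional vectors below are made-up stand-ins for real embeddings (which have 1536 dimensions in this notebook), and the `cosine_similarity` helper is illustrative, not part of LangChain.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy vectors; real embeddings come from embeddings.embed_documents(...)
toy_vectors = {
    "cat on couch": [0.9, 0.1, 0.0],
    "feline on sofa": [0.8, 0.2, 0.1],
    "python language": [0.0, 0.1, 0.9],
}

base = toy_vectors["cat on couch"]
for label, vec in toy_vectors.items():
    # Higher value = more similar; 1.0 means the vectors point the same way
    print(f"{label:>16}: {cosine_similarity(base, vec):.3f}")
```

In the notebook itself, the same function should work on the real vectors, for example `cosine_similarity(text_embeddings[0], text_embeddings[1])` for the cat and feline sentences.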

## Part 3: Vector Databases

**What is a vector database?**
- Specialized database for storing and searching embeddings
- Optimized for similarity search (find nearest neighbors)
- Popular options: ChromaDB, FAISS, Pinecone, Weaviate

**Today we'll use ChromaDB** (simple, local, no setup required)
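
To make "find nearest neighbors" concrete, here is a rough brute-force sketch in plain Python. It is not how ChromaDB works internally (real vector databases use approximate indexes to avoid scanning every vector); the `stored` list and the 2-dimensional vectors are invented for illustration.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_neighbors(query_vec, stored, k=2):
    """Return the k (distance, text) pairs closest to query_vec."""
    scored = [(squared_l2(query_vec, vec), text) for text, vec in stored]
    return sorted(scored)[:k]  # smallest distance first

# Hypothetical (text, vector) pairs standing in for embedded documents
stored = [
    ("doc about cats", [0.9, 0.1]),
    ("doc about dogs", [0.7, 0.3]),
    ("doc about python", [0.1, 0.9]),
]

hits = nearest_neighbors([1.0, 0.0], stored, k=2)
for distance, text in hits:
    print(f"{distance:.2f}  {text}")
```

A linear scan like this is fine for a handful of documents; the point of a vector database is to keep the same "closest vectors first" behavior when the collection grows large.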

### 3a. Install ChromaDB

First, let's install the required package.

In [4]:
# Install chromadb (run once)
# !pip install chromadb

import chromadb
print(f"✅ ChromaDB version: {chromadb.__version__}")

✅ ChromaDB version: 1.5.0


### 3b. Create a Vector Store

Let's create a vector store and add some documents.

In [5]:
from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document

# Sample documents about different topics
documents = [
    Document(
        page_content="LangChain is a framework for developing applications powered by language models.",
        metadata={"source": "langchain_docs", "topic": "frameworks"}
    ),
    Document(
        page_content="Vector databases store embeddings and enable semantic search capabilities.",
        metadata={"source": "vector_db_guide", "topic": "databases"}
    ),
    Document(
        page_content="RAG stands for Retrieval Augmented Generation, combining retrieval with LLMs.",
        metadata={"source": "rag_tutorial", "topic": "rag"}
    ),
    Document(
        page_content="Python is a high-level programming language known for its simplicity.",
        metadata={"source": "python_intro", "topic": "programming"}
    ),
    Document(
        page_content="Machine learning models can be fine-tuned on specific datasets to improve performance.",
        metadata={"source": "ml_guide", "topic": "machine_learning"}
    ),
]

# Create vector store with persistent storage
vectorstore = Chroma.from_documents(
    documents=documents,
    embedding=embeddings,
    collection_name="day3_collection",
    persist_directory="./chroma_db"  # This is where the data will be saved
)

print(f"✅ Vector store created and persisted to ./chroma_db")


✅ Vector store created and persisted to ./chroma_db


### 3c. Semantic Search

Now let's search for relevant documents using natural language queries.

In [6]:
# Search for relevant documents
query = "What is a framework for building AI applications?"

results = vectorstore.similarity_search(query, k=3)

print(f"Query: '{query}'\n")
print("Top 3 Results:")
print("=" * 60)

for i, doc in enumerate(results, 1):
    print(f"\n{i}. {doc.page_content}")
    print(f"   Source: {doc.metadata['source']} | Topic: {doc.metadata['topic']}")

Query: 'What is a framework for building AI applications?'

Top 3 Results:

1. LangChain is a framework for developing applications powered by language models.
   Source: langchain_docs | Topic: frameworks

2. Machine learning models can be fine-tuned on specific datasets to improve performance.
   Source: ml_guide | Topic: machine_learning

3. Vector databases store embeddings and enable semantic search capabilities.
   Source: vector_db_guide | Topic: databases


### 3d. Search with Scores

Let's look at the scores to see how relevant each result is. Note that Chroma returns a distance here (squared L2 by default), so a lower score means a closer match.

In [7]:
# Search with similarity scores
query = "How do I store vectors?"

results_with_scores = vectorstore.similarity_search_with_score(query, k=3)

print(f"Query: '{query}'\n")
print("Results with Similarity Scores:")
print("=" * 60)

for i, (doc, score) in enumerate(results_with_scores, 1):
    print(f"\n{i}. Score: {score:.4f}")
    print(f"   {doc.page_content}")
    print(f"   Source: {doc.metadata['source']}")

Query: 'How do I store vectors?'

Results with Similarity Scores:

1. Score: 0.9301
   Vector databases store embeddings and enable semantic search capabilities.
   Source: vector_db_guide

2. Score: 1.5569
   RAG stands for Retrieval Augmented Generation, combining retrieval with LLMs.
   Source: rag_tutorial

3. Score: 1.6893
   Machine learning models can be fine-tuned on specific datasets to improve performance.
   Source: ml_guide


In [None]:
# Creating a collection with Cosine Similarity
vectorstore_cosine = Chroma.from_documents(
    documents=documents,
    embedding=embeddings,
    collection_name="cosine_collection",
    persist_directory="./chroma_db_cosine",
    collection_metadata={"hnsw:space": "cosine"} # <--- This tells Chroma to use Cosine
)

# Test search with cosine
query = "How do I store vectors?"
results_cosine = vectorstore_cosine.similarity_search_with_score(query, k=3)

print(f"Query: '{query}' (using Cosine Similarity)")
print("=" * 60)
for i, (doc, score) in enumerate(results_cosine, 1):
    # Note: In Chroma's cosine path, it returns 1.0 - similarity 
    # so LOWER is still BETTER (closer to 0.0)
    print(f"{i}. Score: {score:.4f} | {doc.page_content[:60]}...")


### 3e. Understanding Distance Metrics

By default, Chroma uses **squared L2 (Euclidean) distance**, where:
- **Lower score = more similar**
- **0.0** is an exact match.

If you prefer **cosine** scoring, you have to tell Chroma when you create the collection using `collection_metadata`. Chroma still returns a distance (1 - cosine similarity), so lower scores remain better.

**Supported spaces:**
- `l2`: squared L2 distance (default)
- `ip`: inner product
- `cosine`: cosine distance (1 - cosine similarity)
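
As a small worked example of how two of these spaces relate, the sketch below computes squared L2 and cosine distance by hand for two unit-length vectors. The vectors and helper names are invented for illustration. For unit-length vectors, squared L2 equals exactly twice the cosine distance, which is one way to read scores above 1.0 like the ones in 3d (assuming the embeddings are normalized to unit length, as OpenAI documents for its embedding models).

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 - cosine similarity, so 0.0 means identical direction."""
    return 1.0 - dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))

# Hypothetical 2-d vectors, normalized to unit length
a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])

print(f"squared L2:      {squared_l2(a, b):.4f}")
print(f"cosine distance: {cosine_distance(a, b):.4f}")
```

Because the two distances differ only by a constant factor on unit-length vectors, switching spaces changes the score values but should not change the ranking in that case.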

The `vectorstore_cosine` cell above shows how to initialize a collection with cosine distance.


## Part 4: Document Loading and Text Splitting

Real-world RAG systems need to handle long documents. Let's learn how to:
1. Load documents from various sources
2. Split them into manageable chunks
3. Store them in a vector database

### 4a. Text Splitting Basics

Why split documents?
- LLMs have token limits
- Smaller chunks = more precise retrieval
- Better context management

In [8]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Sample long text
long_text = """
LangChain is a framework designed to simplify the creation of applications using large language models (LLMs). 
It provides a standard interface for chains, lots of integrations with other tools, and end-to-end chains for common applications.

The main value propositions of LangChain are:
1. Components: Abstractions for working with LMs, along with a collection of implementations for each abstraction.
2. Off-the-shelf chains: Structured assemblies of components for accomplishing specific higher-level tasks.

LangChain makes it easy to build complex applications by chaining together different components. 
For example, you can chain together a prompt template, an LLM, and an output parser to create a simple question-answering system.

Memory is another crucial component that allows your application to remember previous interactions.
This is essential for chatbots and conversational AI applications.
"""

# Create text splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=200,        # Max characters per chunk
    chunk_overlap=50,      # Overlap between chunks (preserves context)
    length_function=len,
)

# Split the text
chunks = text_splitter.create_documents([long_text])

print(f"Original text length: {len(long_text)} characters")
print(f"Number of chunks: {len(chunks)}\n")

for i, chunk in enumerate(chunks, 1):
    print(f"Chunk {i}: {len(chunk.page_content)} chars")
    print(f"{chunk.page_content[:100]}...\n")


Original text length: 911 characters
Number of chunks: 7

Chunk 1: 110 chars
LangChain is a framework designed to simplify the creation of applications using large language mode...

Chunk 2: 130 chars
It provides a standard interface for chains, lots of integrations with other tools, and end-to-end c...

Chunk 3: 160 chars
The main value propositions of LangChain are:
1. Components: Abstractions for working with LMs, alon...

Chunk 4: 107 chars
2. Off-the-shelf chains: Structured assemblies of components for accomplishing specific higher-level...

Chunk 5: 96 chars
LangChain makes it easy to build complex applications by chaining together different components....

Chunk 6: 129 chars
For example, you can chain together a prompt template, an LLM, and an output parser to create a simp...

Chunk 7: 166 chars
Memory is another crucial component that allows your application to remember previous interactions.
...
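
To see what `chunk_size` and `chunk_overlap` mean in isolation, here is a naive fixed-window splitter in plain Python. This is a simplified sketch, not how `RecursiveCharacterTextSplitter` works: the real splitter tries to break on separators such as paragraphs, lines, and spaces first, which is why the chunks above end at natural boundaries and stay under 200 characters. The `chunk_text` helper is hypothetical.

```python
def chunk_text(text, chunk_size=200, chunk_overlap=50):
    """Split text into fixed-size character windows that overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

sample = "abcdefghij" * 5  # 50 characters
pieces = chunk_text(sample, chunk_size=20, chunk_overlap=5)

for i, piece in enumerate(pieces, 1):
    print(f"Chunk {i}: {piece}")
```

Each chunk repeats the last 5 characters of the previous one. That repetition is what the overlap buys you: a sentence cut at a chunk boundary still appears whole in at least one chunk, provided it is shorter than the overlap.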



### 4b. Loading Documents from Files

LangChain provides loaders for many file types. Let's create a sample text file and load it.

In [9]:
# Create a sample document
sample_doc = """
# Personal AI Learning Journey - Week 1

## Day 1: Python and LangChain Basics
Today I learned the fundamentals of Python coming from a JavaScript background. 
Key concepts included snake_case naming, list comprehensions, and dictionary operations.
I built my first LangChain application - a simple chatbot using ChatOpenAI.

## Day 2: Chains and Memory
Explored different chain types including sequential chains and router chains.
Learned about prompt templates and how to use them effectively.
Implemented conversation memory using ConversationBufferMemory and MessageHistory.

## Day 3: Vector Embeddings and RAG
Understanding embeddings and vector databases.
Building a retrieval augmented generation system.
Learning to work with ChromaDB for semantic search.
"""

# Save to file (create ../docs first if it doesn't exist yet)
import os
os.makedirs("../docs", exist_ok=True)
with open("../docs/learning_notes.txt", "w") as f:
    f.write(sample_doc)

print("✅ Sample document created at ../docs/learning_notes.txt")

✅ Sample document created at ../docs/learning_notes.txt


In [10]:
from langchain_community.document_loaders import TextLoader

# Load the document
loader = TextLoader("../docs/learning_notes.txt")
docs = loader.load()

print(f"Loaded {len(docs)} document(s)")
print(f"\nDocument content preview:")
print(docs[0].page_content[:200], "...\n")
print(f"Metadata: {docs[0].metadata}")

Loaded 1 document(s)

Document content preview:

# Personal AI Learning Journey - Week 1

## Day 1: Python and LangChain Basics
Today I learned the fundamentals of Python coming from a JavaScript background. 
Key concepts included snake_case naming ...

Metadata: {'source': '../docs/learning_notes.txt'}


### 4c. Split and Store in Vector Database

Now let's split the loaded document and store it in our vector database.

In [11]:
# Split the loaded document
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=300,
    chunk_overlap=50,
)

split_docs = text_splitter.split_documents(docs)

print(f"Split into {len(split_docs)} chunks\n")

# Create a new vector store with these documents
vectorstore_notes = Chroma.from_documents(
    documents=split_docs,
    embedding=embeddings,
    collection_name="learning_notes",
    persist_directory="./chroma_db_notes"
)

print("✅ Documents stored in ./chroma_db_notes")


Split into 4 chunks

✅ Documents stored in ./chroma_db_notes


## Part 5: Building a Simple RAG System

Now for the exciting part! Let's build a complete RAG system that:
1. Retrieves relevant documents based on a query
2. Passes them to an LLM as context
3. Generates an answer grounded in those documents
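
The first two steps can be sketched without any library, which may help before handing them over to a chain. In this rough illustration, word overlap stands in for embedding similarity (an assumption made only to keep the example self-contained), and the `retrieve` and `build_prompt` helpers are hypothetical. Step 3 would send the resulting prompt to the LLM.

```python
import re

def tokenize(text):
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, chunks, k=2):
    """Step 1: rank chunks by shared words (stand-in for vector similarity)."""
    q_words = tokenize(question)
    ranked = sorted(chunks, key=lambda c: len(q_words & tokenize(c)), reverse=True)
    return ranked[:k]

def build_prompt(question, context_chunks):
    """Step 2: 'stuff' the retrieved chunks into a single prompt."""
    context = "\n\n".join(context_chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

chunks = [
    "Day 1: Python and LangChain basics.",
    "Day 2: Chains and memory, including prompt templates.",
    "Day 3: Vector embeddings and RAG with ChromaDB.",
]
question = "What did I learn about memory on Day 2?"

top_chunks = retrieve(question, chunks, k=1)
prompt = build_prompt(question, top_chunks)
print(prompt)  # Step 3 would pass this string to the LLM
```

The LLM only sees what ends up in the prompt, so answer quality depends heavily on whether retrieval surfaces the right chunks.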

### 5a. Basic RAG Chain

We'll use LangChain's RetrievalQA chain to build our RAG system.

In [13]:
from langchain_classic.chains import RetrievalQA

# Create a retriever from our vector store
retriever = vectorstore_notes.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 3}  # Return top 3 relevant chunks
)

# Create RAG chain
rag_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",  # "stuff" means: stuff all docs into prompt
    retriever=retriever,
    return_source_documents=True,  # Include sources in response
)

print("✅ RAG chain created")

✅ RAG chain created


### 5b. Ask Questions!

Let's ask questions about our learning journey.

In [14]:
# Ask a question
query = "What did I learn on Day 2?"

result = rag_chain.invoke({"query": query})

print(f"Question: {query}\n")
print(f"Answer: {result['result']}\n")
print("=" * 60)
print("Source Documents:")
for i, doc in enumerate(result['source_documents'], 1):
    print(f"\n{i}. {doc.page_content[:150]}...")

Question: What did I learn on Day 2?

Answer: On Day 2, you explored different chain types such as sequential chains and router chains. You also learned about prompt templates and how to use them effectively. Additionally, you implemented conversation memory using ConversationBufferMemory and MessageHistory.

Source Documents:

1. ## Day 2: Chains and Memory
Explored different chain types including sequential chains and router chains.
Learned about prompt templates and how to us...

2. ## Day 1: Python and LangChain Basics
Today I learned the fundamentals of Python coming from a JavaScript background. 
Key concepts included snake_cas...

3. ## Day 3: Vector Embeddings and RAG
Understanding embeddings and vector databases.
Building a retrieval augmented generation system.
Learning to work ...


In [15]:
# Try another question
query = "What is LangChain used for?"

result = rag_chain.invoke({"query": query})

print(f"Question: {query}\n")
print(f"Answer: {result['result']}\n")
print("=" * 60)
print("Source Documents:")
for i, doc in enumerate(result['source_documents'], 1):
    print(f"\n{i}. {doc.page_content[:150]}...")

Question: What is LangChain used for?

Answer: LangChain is used for building conversational AI applications and systems. It allows developers to create chatbots, virtual assistants, and other natural language processing applications by utilizing different chain types, memory features, vector embeddings, and retrieval augmented generation systems. The framework provides tools for processing language input, generating responses, and managing conversation flow effectively.

Source Documents:

1. ## Day 1: Python and LangChain Basics
Today I learned the fundamentals of Python coming from a JavaScript background. 
Key concepts included snake_cas...

2. ## Day 2: Chains and Memory
Explored different chain types including sequential chains and router chains.
Learned about prompt templates and how to us...

3. ## Day 3: Vector Embeddings and RAG
Understanding embeddings and vector databases.
Building a retrieval augmented generation system.
Learning to work ...


### 5c. Custom RAG with Prompt Template

Let's build a more customized RAG system with a custom prompt.

In [16]:
from langchain_core.prompts import PromptTemplate
from langchain_classic.chains import RetrievalQA

# Custom prompt template
template = """You are a helpful AI assistant answering questions about a learning journey.

Use the following pieces of context to answer the question. 
If you don't know the answer based on the context, say so - don't make things up.
Always cite which day or topic the information comes from.

Context:
{context}

Question: {question}

Answer (be specific and cite sources):"""

prompt = PromptTemplate(
    template=template,
    input_variables=["context", "question"]
)

# Create RAG chain with custom prompt
custom_rag_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
    chain_type_kwargs={"prompt": prompt}
)

print("✅ Custom RAG chain created")

✅ Custom RAG chain created


In [17]:
# Test custom RAG
query = "What topics are covered in the first three days?"

result = custom_rag_chain.invoke({"query": query})

print(f"Question: {query}\n")
print(f"Answer: {result['result']}\n")

Question: What topics are covered in the first three days?

Answer: Day 1 covered Python basics such as snake_case naming, list comprehensions, and dictionary operations. Additionally, the day included building a simple chatbot using ChatOpenAI.

Day 2 covered different chain types like sequential chains and router chains, prompt templates, and implementing conversation memory using ConversationBufferMemory and MessageHistory.

Day 3 covered vector embeddings, building a retrieval augmented generation system, and working with ChromaDB for semantic search.



## Part 6: Advanced Vector Store Features

Let's explore some advanced features of vector stores.

### 6a. Metadata Filtering

Search only within specific metadata categories.

In [18]:
# Recreate our technical docs vectorstore
tech_documents = [
    Document(
        page_content="LangChain is a framework for developing applications powered by language models.",
        metadata={"source": "langchain_docs", "topic": "frameworks", "difficulty": "beginner"}
    ),
    Document(
        page_content="Vector databases store embeddings and enable semantic search capabilities.",
        metadata={"source": "vector_db_guide", "topic": "databases", "difficulty": "intermediate"}
    ),
    Document(
        page_content="RAG stands for Retrieval Augmented Generation, combining retrieval with LLMs.",
        metadata={"source": "rag_tutorial", "topic": "rag", "difficulty": "intermediate"}
    ),
    Document(
        page_content="Advanced RAG techniques include hypothetical document embeddings and multi-query retrieval.",
        metadata={"source": "rag_advanced", "topic": "rag", "difficulty": "advanced"}
    ),
]

vectorstore_filtered = Chroma.from_documents(
    documents=tech_documents,
    embedding=embeddings,
    collection_name="filtered_collection",
    persist_directory="./chroma_db_filtered"
)

print("✅ Vector store with metadata created at ./chroma_db_filtered")


✅ Vector store with metadata created at ./chroma_db_filtered


In [19]:
# Search with metadata filter
query = "Tell me about RAG"

# Only search intermediate level documents
results = vectorstore_filtered.similarity_search(
    query,
    k=3,
    filter={"difficulty": "intermediate"}
)

print(f"Query: '{query}' (filtered by difficulty=intermediate)\n")
print("Results:")
for i, doc in enumerate(results, 1):
    print(f"\n{i}. {doc.page_content}")
    print(f"   Metadata: {doc.metadata}")

Query: 'Tell me about RAG' (filtered by difficulty=intermediate)

Results:

1. RAG stands for Retrieval Augmented Generation, combining retrieval with LLMs.
   Metadata: {'source': 'rag_tutorial', 'topic': 'rag', 'difficulty': 'intermediate'}

2. Vector databases store embeddings and enable semantic search capabilities.
   Metadata: {'source': 'vector_db_guide', 'topic': 'databases', 'difficulty': 'intermediate'}


### 6b. Maximum Marginal Relevance (MMR)

MMR balances relevance with diversity - prevents returning very similar documents.

In [20]:
# Search with MMR
query = "What are frameworks for AI?"

# Standard similarity search
print("Standard Similarity Search:")
similarity_results = vectorstore_filtered.similarity_search(query, k=3)
for i, doc in enumerate(similarity_results, 1):
    print(f"{i}. {doc.page_content[:80]}...")

print("\n" + "=" * 60)

# MMR search (more diverse results)
print("\nMMR Search (diverse results):")
mmr_results = vectorstore_filtered.max_marginal_relevance_search(
    query, 
    k=3,
    fetch_k=10  # Fetch more candidates, then diversify
)
for i, doc in enumerate(mmr_results, 1):
    print(f"{i}. {doc.page_content[:80]}...")

Standard Similarity Search:
1. Advanced RAG techniques include hypothetical document embeddings and multi-query...
2. LangChain is a framework for developing applications powered by language models....
3. Vector databases store embeddings and enable semantic search capabilities....


MMR Search (diverse results):
1. Advanced RAG techniques include hypothetical document embeddings and multi-query...
2. LangChain is a framework for developing applications powered by language models....
3. Vector databases store embeddings and enable semantic search capabilities....
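
With only four documents in the collection, the two result lists above come out identical. A toy example with a near-duplicate makes the effect easier to see. The sketch below is a minimal greedy MMR loop in plain Python, written under the usual formulation (relevance to the query minus similarity to already-selected items); it is not LangChain's implementation, and the vectors are invented. The `lambda_mult` name mirrors the parameter LangChain exposes for this trade-off.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Greedy MMR: return indices of k docs balancing relevance and novelty."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Penalize similarity to anything already picked (0 for the first pick)
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Hypothetical vectors: docs 0 and 1 are near-duplicates, doc 2 is different
query_vec = [1.0, 0.0]
doc_vecs = [[0.9, 0.1], [0.9, 0.11], [0.5, -0.5]]

by_similarity = sorted(
    range(len(doc_vecs)),
    key=lambda i: cosine(query_vec, doc_vecs[i]),
    reverse=True,
)[:2]
by_mmr = mmr(query_vec, doc_vecs, k=2)

print("similarity:", by_similarity)
print("MMR:       ", by_mmr)
```

Plain similarity picks the two near-duplicates (indices 0 and 1), while MMR keeps the best match and then prefers the less redundant document (indices 0 and 2). Raising `lambda_mult` toward 1.0 moves the result back toward plain similarity ranking.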


## Part 7: Complete RAG Application

Let's build a complete RAG application that can handle multiple document sources.

In [21]:
# Create a comprehensive knowledge base
knowledge_base = [
    # LangChain info
    "LangChain is a framework for building applications with large language models. It provides chains, agents, and memory systems.",
    "LangChain supports multiple LLM providers including OpenAI, Anthropic, and open-source models.",
    
    # Vector DB info
    "Vector databases like ChromaDB, Pinecone, and Weaviate store embeddings for semantic search.",
    "Embeddings are numerical representations of text that capture semantic meaning.",
    
    # RAG info
    "RAG (Retrieval Augmented Generation) combines information retrieval with language model generation.",
    "RAG helps LLMs access external knowledge and reduces hallucinations.",
    "The RAG process: 1) Embed query, 2) Retrieve relevant docs, 3) Generate answer with context.",
    
    # Python info
    "Python is popular for AI development due to its simplicity and extensive ecosystem.",
    "Key Python libraries for AI include LangChain, PyTorch, TensorFlow, and Hugging Face Transformers.",
]

# Convert to documents
kb_docs = [Document(page_content=text, metadata={"source": "knowledge_base"}) for text in knowledge_base]

# Create vector store
kb_vectorstore = Chroma.from_documents(
    documents=kb_docs,
    embedding=embeddings,
    collection_name="complete_kb",
    persist_directory="./chroma_db_complete"
)

print(f"✅ Knowledge base created and stored at ./chroma_db_complete")


✅ Knowledge base created and stored at ./chroma_db_complete


In [22]:
# Create a conversational RAG chain
from langchain_classic.chains import ConversationalRetrievalChain
from langchain_classic.memory import ConversationBufferMemory

# Setup memory
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True,
    output_key="answer"
)

# Create conversational RAG
conversational_rag = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=kb_vectorstore.as_retriever(search_kwargs={"k": 3}),
    memory=memory,
    return_source_documents=True,
)

print("✅ Conversational RAG system ready")

✅ Conversational RAG system ready




In [23]:
# Have a conversation!
questions = [
    "What is RAG?",
    "How does it help with LLMs?",
    "What databases can I use for storing embeddings?"
]

for question in questions:
    print(f"\n{'='*60}")
    print(f"Q: {question}")
    print(f"{'='*60}")
    
    result = conversational_rag.invoke({"question": question})
    
    print(f"A: {result['answer']}\n")
    print(f"Sources: {len(result['source_documents'])} documents retrieved")


Q: What is RAG?
A: RAG stands for Retrieval Augmented Generation. It is a process that combines information retrieval with language model generation to help access external knowledge and reduce hallucinations in language models. The RAG process involves embedding a query, retrieving relevant documents, and generating an answer with context.

Sources: 3 documents retrieved

Q: How does it help with LLMs?
A: RAG (Retrieval Augmented Generation) helps with Language Model Generation by combining information retrieval with the generation process. It first embeds the query, then retrieves relevant documents, and finally generates answers with context. This process allows Language Models to access external knowledge, improve the quality of generated responses, and reduce hallucinations.

Sources: 3 documents retrieved

Q: What databases can I use for storing embeddings?
A: Vector databases like ChromaDB, Pinecone, and Weaviate are commonly used for storing embeddings for semantic search.

Sources: 3 documents retrieved

## Part 8: Summary & Key Takeaways

**What you learned today:**

1. **Embeddings**: Convert text to vectors that capture semantic meaning
2. **Vector Databases**: Store and search embeddings efficiently (ChromaDB)
3. **Semantic Search**: Find relevant info based on meaning, not keywords
4. **Document Processing**: Load and split documents into chunks
5. **RAG Systems**: Retrieve relevant docs + generate answers with LLM
6. **Advanced Features**: Metadata filtering, MMR, conversational RAG

**Key Concepts:**
- Embeddings enable semantic similarity search
- Vector stores are optimized for finding similar embeddings
- Text splitting is crucial for effective retrieval
- RAG reduces hallucinations by grounding LLMs in real data
- Conversational RAG maintains context across multiple turns

**Tomorrow (Day 4):** Advanced RAG with multiple sources, citations, and reranking!

## 🎯 Practice Exercises

Try these on your own:

1. **Create your own knowledge base**: Add 10-20 facts about your favorite topic and build a RAG system
2. **Load a real file**: Use TextLoader or other loaders to ingest a PDF or markdown file
3. **Experiment with chunk sizes**: Try different chunk_size and chunk_overlap values
4. **Custom prompts**: Write your own RAG prompt template with a specific tone or format
5. **Metadata filtering**: Create documents with custom metadata and search with filters

**Bonus Challenge:**
Build a "personal notes assistant" that can answer questions about all your learning notes from Days 1-3!

## 📚 Resources

- [LangChain - Retrieval](https://python.langchain.com/docs/modules/data_connection/)
- [ChromaDB Documentation](https://docs.trychroma.com/)
- [Understanding Embeddings](https://platform.openai.com/docs/guides/embeddings)
- [RAG Paper (original)](https://arxiv.org/abs/2005.11401)
- [Vector Database Comparison](https://www.pinecone.io/learn/vector-database/)

---

**Day 3 Complete! 🎉**

You now understand the fundamentals of RAG systems. Tomorrow we'll make them production-ready with advanced techniques!
