# RAG (Retrieval-Augmented Generation) with Vector Databases

## Introduction
RAG is a technique that gives LLMs access to specific, up-to-date data without fine-tuning. It works by:
1.  **Retrieving** relevant documents from a database.
2.  **Augmenting** the user prompt with that context.
3.  **Generating** a response using the augmented prompt.
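
The three steps above can be sketched in plain Python before bringing in any libraries. This is only an illustration: the corpus, the word-overlap scoring, and the stubbed `toy_generate` function are all hypothetical stand-ins for a real vector search and a real LLM call.

```python
# Toy corpus standing in for a real document database (hypothetical data).
TOY_DOCUMENTS = [
    "Chroma is a vector database for storing embeddings.",
    "LCEL composes LangChain components with the pipe operator.",
    "Overlapping chunks preserve context across chunk boundaries.",
]


def _tokens(text):
    """Lowercase the text and split it into a set of words without punctuation."""
    return {word.strip(".,?!").lower() for word in text.split()}


def toy_retrieve(question, documents, k=1):
    """Retrieve: rank documents by how many words they share with the question."""
    question_words = _tokens(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & _tokens(doc)),
        reverse=True,
    )
    return ranked[:k]


def toy_augment(question, context_docs):
    """Augment: place the retrieved text in front of the user's question."""
    context = "\n".join(context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


def toy_generate(prompt_text):
    """Generate: placeholder for an LLM call; it only reports the prompt size."""
    return f"[model response to a {len(prompt_text)}-character prompt]"


toy_question = "What is a vector database?"
toy_top_docs = toy_retrieve(toy_question, TOY_DOCUMENTS)
toy_prompt = toy_augment(toy_question, toy_top_docs)
toy_answer = toy_generate(toy_prompt)

print(toy_prompt)
print(toy_answer)
```

The rest of the notebook replaces each stub with a real component: embeddings and Chroma for retrieval, a prompt template for augmentation, and a chat model for generation.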

---

## 1. Setup
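We need `langchain-openai` for the embedding and chat models, `langchain-chroma` for the vector store, `langchain-text-splitters` for chunking, and `langchain-community` with `beautifulsoup4` for loading the webpage. Other providers can be substituted. The OpenAI classes read the API key from the `OPENAI_API_KEY` environment variable, so set it before running the later cells.

In [None]:
%pip install -qU langchain-openai langchain-chroma langchain-community langchain-text-splitters beautifulsoup4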

## 2. Load and Split Documents
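We load a webpage and split it into overlapping chunks. Each chunk can be embedded and retrieved on its own, so only the most relevant passages, rather than the whole page, need to fit into the LLM's context window.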

In [None]:
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
docs = loader.load()

# Split into chunks of up to 1000 characters, with up to 200 characters of overlap between neighbors
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)

print(f"Created {len(splits)} chunks from the document.")
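
To see what `chunk_size` and `chunk_overlap` mean, here is a minimal fixed-width sliding-window splitter. It is a simplification, not how `RecursiveCharacterTextSplitter` works internally: the real splitter prefers to break on separators such as paragraphs and newlines, so its chunks are usually shorter than the limit. The function name `sliding_window_chunks` is made up for this sketch.

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Cut text into fixed-width windows where neighbors share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample_text = "abcdefghijklmnopqrstuvwxyz"
demo_chunks = sliding_window_chunks(sample_text, chunk_size=10, chunk_overlap=4)

for chunk in demo_chunks:
    print(chunk)
```

The last four characters of each chunk reappear at the start of the next one, which is what keeps a sentence that straddles a boundary retrievable from at least one chunk.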

## 3. Indexing (Embeddings & VectorStore)
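We convert each chunk into a vector of numbers (an embedding) and store the vectors in a vector database (Chroma), persisted to `./chroma_db`. The retriever then returns the `k` chunks whose embeddings are closest to the query's embedding, here `k=3`.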

In [None]:
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

vectorstore = Chroma.from_documents(
    documents=splits, 
    embedding=OpenAIEmbeddings(),
    persist_directory="./chroma_db"
)

retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
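
Conceptually, retrieval is a nearest-neighbor search over vectors. The sketch below ranks a hand-written toy index by cosine similarity. The three-dimensional vectors and chunk labels are invented for illustration; real embeddings have hundreds or thousands of dimensions, and a vector store such as Chroma typically uses an approximate index with a configurable distance metric rather than this brute-force loop.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical index: chunk label -> made-up embedding.
TOY_INDEX = {
    "chunk about planning": [0.9, 0.1, 0.0],
    "chunk about memory": [0.1, 0.9, 0.0],
    "chunk about tool use": [0.0, 0.2, 0.9],
}


def top_k(query_vector, index, k):
    """Return the k chunk labels whose vectors are most similar to the query."""
    scored = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [label for label, _ in scored[:k]]


toy_query_vector = [0.8, 0.3, 0.0]  # pretend this is the embedded question
nearest_chunks = top_k(toy_query_vector, TOY_INDEX, k=2)
print(nearest_chunks)
```

Raising `k` gives the model more context to draw on at the cost of a longer prompt and a higher chance of including irrelevant chunks.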

## 4. The RAG Chain (using LCEL)
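Using the LangChain Expression Language (LCEL), we build a chain that takes a question, retrieves the matching chunks, inserts them into the prompt, calls the model, and parses the reply into a plain string.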

In [None]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
llm = ChatOpenAI(model="gpt-4o-mini")

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Uncomment to run the chain (this calls the OpenAI API and requires OPENAI_API_KEY).
# result = rag_chain.invoke("What are the components of an LLM agent?")
# print(result)
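
The first step of the chain fans the same question out to two branches: one fetches and formats context, the other passes the question through unchanged. The plain-Python sketch below mimics that data flow without LangChain or an API key. `ToyDoc`, `toy_retriever`, and the two sample passages are placeholders invented for this sketch, and `str.format` stands in for `ChatPromptTemplate`.

```python
from dataclasses import dataclass

TOY_TEMPLATE = """Answer the question based only on the following context:
{context}

Question: {question}
"""


@dataclass
class ToyDoc:
    """Minimal stand-in for a document object exposing page_content."""
    page_content: str


def toy_retriever(question):
    """Placeholder retriever: always returns the same two passages."""
    return [
        ToyDoc("Agents combine planning, memory, and tool use."),
        ToyDoc("Memory can be short-term or long-term."),
    ]


def toy_format_docs(docs):
    """Join the retrieved passages with blank lines between them."""
    return "\n\n".join(doc.page_content for doc in docs)


def build_prompt_inputs(question):
    """Mirror the dict step: one branch builds context, the other passes the question through."""
    return {
        "context": toy_format_docs(toy_retriever(question)),
        "question": question,
    }


toy_inputs = build_prompt_inputs("What are the components of an LLM agent?")
rendered_prompt = TOY_TEMPLATE.format(**toy_inputs)
print(rendered_prompt)
```

Printing the rendered prompt like this is a useful debugging habit: if the answer is poor, the first thing to check is whether the retrieved context actually contains the information the question asks for.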

## Summary
1.  **Loading**: Get the raw data.
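2.  **Splitting**: Chunk it so that individual passages can be embedded, retrieved, and fit into the context window.
3.  **Embedding**: Convert each chunk into a vector and store it in the vector database.
4.  **Retrieval**: Find the chunks most semantically similar to the question.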
5.  **Generation**: Let the LLM answer using the retrieved facts.