
# Traditional RAG from Scratch

Welcome to the warm-up notebook for this repository. The goal is to demystify the classic retrieval-augmented generation pipeline before we add agentic behaviours on top of it. Every step is fully transparent so you can connect the dots between data, maths, and code.



## 1. Load a tiny knowledge base

We will start with a short list of study notes about agentic RAG.


In [None]:

documents = [
    {
        "title": "Agent loops",
        "text": "Agentic RAG adds planning loops that let the model break a task into steps."
    },
    {
        "title": "Classic pipeline",
        "text": "Traditional RAG retrieves relevant context chunks and feeds them to a language model."
    },
    {
        "title": "Evaluation",
        "text": "Measuring retrieval quality helps you understand if the context was helpful."
    },
    {
        "title": "Tooling",
        "text": "Agents can call tools such as search, code execution, or calculators when a single prompt is not enough."
    }
]
len(documents)



## 2. Tokenise the documents

We build a very small tokenizer that lowercases the text and keeps only runs of letters and apostrophes, so digits and punctuation are dropped. This keeps the maths easy to follow.


In [None]:

import re

def tokenize(text: str):
    tokens = re.findall(r"[a-zA-Z']+", text.lower())
    return tokens

sample_tokens = tokenize(documents[0]["text"])
sample_tokens[:10]
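
To see what the pattern keeps and drops, here is a small standalone sketch. The name `demo_tokenize` and the sample sentence are illustrative assumptions, not part of the notebook; the regex matches the same text as the one above after lowercasing.

```python
import re

def demo_tokenize(text: str) -> list:
    # Keep runs of lowercase letters and apostrophes; everything else acts as a separator.
    return re.findall(r"[a-z']+", text.lower())

demo_tokens = demo_tokenize("Don't split RAG-2024 on e.g. digits!")
print(demo_tokens)
```

Note that the contraction survives as one token, while the hyphen, the digits, and the full stops all split or vanish. That is acceptable for a toy corpus, but it would lose information on text with version numbers or identifiers.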



## 3. Build a vocabulary and compute document statistics

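We collect every unique token, count how often it appears in each document (term counts), and count how many documents contain it (document frequency). From the document frequencies we derive a smoothed inverse document frequency (IDF), which we will use for scoring later.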


In [None]:

from collections import Counter
import numpy as np

# Build the vocabulary
vocab = sorted({token for doc in documents for token in tokenize(doc["text"])})
word_index = {word: idx for idx, word in enumerate(vocab)}

# Count terms per document and how many documents contain each term
term_counts = []
df_counts = Counter()
for doc in documents:
    counts = Counter(tokenize(doc["text"]))
    term_counts.append(counts)
    for token in counts:
        df_counts[token] += 1

N = len(documents)
idf = {
    term: np.log((1 + N) / (1 + df_counts[term])) + 1
    for term in vocab
}
len(vocab), list(idf.items())[:5]
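
The IDF used above is the smoothed form $\mathrm{idf}(t) = \ln\frac{1 + N}{1 + \mathrm{df}(t)} + 1$, which, as far as I know, is also the form scikit-learn documents for `smooth_idf=True`. The sketch below evaluates it for a four-document collection like ours; `smoothed_idf` is a hypothetical helper name introduced only for this illustration.

```python
import math

def smoothed_idf(n_docs: int, doc_freq: int) -> float:
    # Adding 1 inside the ratio avoids division by zero for unseen terms;
    # adding 1 outside keeps terms that occur in every document from scoring 0.
    return math.log((1 + n_docs) / (1 + doc_freq)) + 1

for doc_freq in range(1, 5):
    print(doc_freq, round(smoothed_idf(4, doc_freq), 3))
```

A term found in only one of the four documents gets a weight of about 1.916, while a term found in all four gets exactly 1.0, so rare terms count roughly twice as much as ubiquitous ones.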



## 4. Vectorise documents with TF-IDF

Now we convert each document into a numerical vector with one entry per vocabulary term. Term Frequency–Inverse Document Frequency (TF-IDF) gives higher weight to words that occur often within a document but appear in few other documents.


In [None]:

def vectorise(counts):
    vector = np.zeros(len(vocab), dtype=float)
    total_tokens = sum(counts.values()) or 1
    for token, count in counts.items():
        if token not in word_index:
            continue
        idx = word_index[token]
        tf = count / total_tokens
        vector[idx] = tf * idf[token]
    return vector

doc_matrix = np.vstack([vectorise(counts) for counts in term_counts])
doc_matrix.shape
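
To make the weighting concrete, here is a self-contained sketch on a two-document toy corpus. All `toy_*` names and the sample tokens are assumptions made for illustration; the arithmetic follows the same TF and smoothed IDF definitions as the cells above, using plain lists instead of NumPy.

```python
import math
from collections import Counter

toy_docs = [["rag", "retrieves", "context"], ["agents", "plan", "context"]]
toy_vocab = sorted({token for doc in toy_docs for token in doc})

# Document frequency: in how many documents does each term occur?
toy_df = Counter(token for doc in toy_docs for token in set(doc))
toy_idf = {
    term: math.log((1 + len(toy_docs)) / (1 + toy_df[term])) + 1
    for term in toy_vocab
}

def toy_vectorise(tokens):
    counts = Counter(tokens)
    total = len(tokens) or 1
    # One weight per vocabulary term: relative frequency times IDF.
    return [counts[term] / total * toy_idf[term] for term in toy_vocab]

toy_weights = toy_vectorise(toy_docs[0])
for term, weight in zip(toy_vocab, toy_weights):
    print(f"{term:10s} {weight:.3f}")
```

In the first document, "context" appears in both documents and therefore receives a lower weight than "rag" or "retrieves", which are unique to it. Terms absent from the document stay at zero.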



## 5. Retrieve context for a question

We turn a user question into a vector and rank the documents with cosine similarity.


In [None]:

def retrieve(query: str, top_k: int = 2):
    query_counts = Counter(tokenize(query))
    query_vector = vectorise(query_counts)
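    # Cosine similarity: divide the dot product by both vector norms so that
    # only direction matters, not document or query length
    doc_norms = np.linalg.norm(doc_matrix, axis=1)
    query_norm = np.linalg.norm(query_vector)
    # "or 1" avoids dividing by zero when the query shares no words with the vocabulary
    similarities = (doc_matrix @ query_vector) / (doc_norms * (query_norm or 1))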
    scored = [
        (score, doc)
        for score, doc in zip(similarities, documents)
    ]
    ranked = sorted(scored, key=lambda item: item[0], reverse=True)
    return ranked[:top_k]

results = retrieve("How do agents plan their work?")
results
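
Cosine similarity is the dot product of two vectors divided by the product of their lengths. The standalone sketch below uses plain Python lists rather than NumPy, and the name `cosine_demo` is an assumption for this illustration. It also returns 0 for zero-length vectors, a guard that `retrieve` above only applies to the query.

```python
import math

def cosine_demo(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: treat an all-zero vector as having no similarity to anything.
        return 0.0
    return dot / (norm_a * norm_b)

print(cosine_demo([1.0, 0.0], [3.0, 0.0]))  # same direction, different length
print(cosine_demo([2.0, 2.0], [3.0, 0.0]))  # 45 degrees apart
print(cosine_demo([0.0, 0.0], [3.0, 0.0]))  # zero vector
```

The first pair points in the same direction, so its score is 1 regardless of magnitude. This is why a long document does not automatically outrank a short one.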



## 6. Compose a final answer

In a production system this step would call a language model. To keep things simple we stitch together a response using the retrieved snippets.


In [None]:

def simple_answer(question: str, retrieved):
    # One bullet per retrieved snippet
    context = "\n".join(f"- {doc['text']}" for _, doc in retrieved)
    return (
        f"Question: {question}\n\n"
        f"Context used:\n{context}\n\n"
        "Takeaway: Agentic RAG builds on this baseline by deciding when to loop, "
        "call tools, or request more context before answering."
    )

print(simple_answer("How do agents plan their work?", results))
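
If you did want to hand the retrieved snippets to a language model, the usual first step is to assemble a prompt. The sketch below only builds the prompt string and calls no model. The function name `build_prompt`, the instruction wording, and the numbered-snippet layout are all assumptions chosen for illustration, not a prescribed format.

```python
def build_prompt(question: str, snippets: list) -> str:
    # Number the snippets so a model could refer to them in its answer.
    context = "\n".join(
        f"[{i}] {snippet}" for i, snippet in enumerate(snippets, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

demo_prompt = build_prompt(
    "How do agents plan their work?",
    ["Agentic RAG adds planning loops that let the model break a task into steps."],
)
print(demo_prompt)
```

The resulting string would be sent to whichever model client you use. That call is deliberately left out here because it depends on the provider.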



## 7. What changes in agentic RAG?

Classic RAG stops after a single retrieve-and-respond cycle. Agentic RAG layers on abilities such as:

* **Planning** – break the task into subtasks and decide which tool should handle each one.
* **Reflection** – review an initial answer, critique it, and iterate until it meets a quality bar.
* **Tool use** – call external APIs, code execution sandboxes, or search services when the retrieved text is not enough.
* **Evaluation loops** – measure retrieval quality and response usefulness, then adjust prompts or data sources automatically.

As you work through the subsequent notebooks, look for these extra loops and decisions. They reuse the same retrieval building block you built here, so understanding this foundation makes the agentic upgrades much easier to follow.
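
As a rough preview of such a loop, the sketch below retries retrieval with rewritten queries until the best score clears a threshold. Everything in it is a simplifying assumption: `overlap_retrieve` is a toy word-overlap scorer rather than the TF-IDF retriever above, the rewrites are a fixed list where a real agent would ask a language model for them, and the threshold of 0.5 is arbitrary.

```python
from typing import List, Tuple

def overlap_retrieve(query: str, corpus: List[str]) -> Tuple[float, str]:
    """Toy retriever: score each text by the fraction of query words it contains."""
    words = set(query.lower().split())
    scored = [
        (len(words & set(text.lower().split())) / (len(words) or 1), text)
        for text in corpus
    ]
    return max(scored, key=lambda item: item[0])

def agentic_retrieve(question: str, corpus: List[str], rewrites: List[str],
                     threshold: float = 0.5) -> Tuple[Tuple[float, str], int]:
    """Try the question, then each rewrite, stopping once the score is good enough."""
    best = (0.0, "")
    attempts = [question] + list(rewrites)
    for step, query in enumerate(attempts, start=1):
        candidate = overlap_retrieve(query, corpus)
        if candidate[0] > best[0]:
            best = candidate
        if best[0] >= threshold:
            break  # evaluation step: the retrieved context looks relevant enough
    return best, step

toy_corpus = [
    "agentic rag adds planning loops",
    "traditional rag retrieves context chunks",
]
best_hit, steps_used = agentic_retrieve(
    "how do agents plan", toy_corpus, rewrites=["planning loops"]
)
print(best_hit, steps_used)
```

The original question shares no exact words with either text ("plan" does not match "planning"), so the first attempt scores 0 and the loop falls back to the rewrite. A single-pass pipeline would have stopped after that first, empty-handed attempt.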



---
You now have a minimal, transparent RAG pipeline. Use it as a mental model when studying the agentic notebooks in this repo.
