RAG from Scratch

Demystify Retrieval-Augmented Generation (RAG) by building it yourself - step by step.
No black boxes. No cloud APIs. Just clear explanations, simple examples, and local code you fully understand.

This project follows the same philosophy as AI Agents from Scratch:
make advanced AI concepts approachable for developers through minimal, well-explained, real code.

What You'll Learn

What RAG really is, and why it’s so powerful for knowledge retrieval.
How embeddings work: turning text into numbers your model can understand.
How to build a local vector database: storing and querying documents efficiently.
How to connect everything: retrieving context and feeding it into an LLM for grounded answers.
How to re-rank and normalize: improving retrieval precision and reducing noise.
Step-by-step code walkthroughs: every function explained, nothing hidden.

Concept Overview

Retrieval-Augmented Generation (RAG) enhances language models by giving them access to external knowledge.
Instead of asking the model to “remember” everything, you let it retrieve relevant context before generating a response.

Pipeline:

Knowledge Requirements: define questions and data needs.
Data Loading: import and structure your documents.
Text Splitting & Chunking: divide data into manageable pieces.
Embedding: turn chunks into numerical vectors.
Vector Store: save and index embeddings for fast retrieval.
Retrieval: fetch the most relevant context for a given query.
Post-Retrieval Re-Ranking: re-order results to prioritize the best context.
Query Preprocessing & Embedding Normalization: clean and standardize input vectors for consistency.
Augmentation: merge retrieved context into the model’s prompt.
Generation: produce grounded answers using a local LLM.
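
To make the embedding normalization step more concrete, here is a minimal sketch of L2 normalization. The function names `l2Normalize` and `dot` are illustrative assumptions, not names taken from this repository. Once two vectors are scaled to unit length, their dot product equals their cosine similarity, which is one common reason to normalize before storing or comparing embeddings.

```javascript
// Illustrative helpers (hypothetical names, not the repository's API).

// Scale a vector so that its L2 norm (Euclidean length) is 1.
function l2Normalize(vector) {
  const norm = Math.sqrt(vector.reduce((sum, x) => sum + x * x, 0));
  // A zero vector has no direction, so return a copy instead of dividing by zero.
  if (norm === 0) return vector.slice();
  return vector.map((x) => x / norm);
}

// Dot product of two equal-length vectors.
function dot(a, b) {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// Two vectors pointing in the same direction but with different lengths.
const short = l2Normalize([3, 4]);
const long = l2Normalize([6, 8]);

// After normalization the dot product is approximately 1,
// meaning the two vectors point in the same direction.
console.log(dot(short, long));
```

Without normalization, the longer vector would score higher against any query simply because its components are larger, not because it is more relevant.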

Learning Path

Follow these examples in order to build understanding progressively:

0. How RAG Works

examples/00_how_rag_works/
Code | Code Explanation | Concepts

What you'll learn:

The core idea behind Retrieval-Augmented Generation
How retrieval and generation work together
A minimal, simplified end-to-end RAG flow in under 70 lines of code

Key concepts: retrieval, generation, context injection, similarity search
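
The flow described above can be sketched in a few lines. This is not the repository's `example.js`, only a rough illustration under stated assumptions: retrieval is a naive keyword overlap count, the sample documents are made up, and `generate` is a placeholder where a local LLM call would go. All function names here are hypothetical.

```javascript
// Made-up sample knowledge base for illustration.
const documents = [
  "RAG retrieves relevant context before the model generates an answer.",
  "Embeddings turn text into numerical vectors.",
  "A vector store indexes embeddings for fast similarity search.",
];

// Lowercase the text and split it into alphanumeric words.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Retrieval: score each document by how many distinct query words it contains.
function retrieve(query, docs, k = 1) {
  const queryWords = new Set(tokenize(query));
  return docs
    .map((text) => {
      const docWords = new Set(tokenize(text));
      let score = 0;
      for (const word of queryWords) if (docWords.has(word)) score++;
      return { text, score };
    })
    .filter((hit) => hit.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// Context injection: place the retrieved text in front of the question.
function buildPrompt(query, hits) {
  const context = hits.map((hit, i) => `[${i + 1}] ${hit.text}`).join("\n");
  return `Answer using only the context below.\n\nContext:\n${context}\n\nQuestion: ${query}\nAnswer:`;
}

// Generation placeholder: a real pipeline would send the prompt to a local LLM.
function generate(prompt) {
  return `(LLM output would appear here; prompt length: ${prompt.length} characters)`;
}

const query = "What do embeddings turn text into?";
const hits = retrieve(query, documents, 1);
const prompt = buildPrompt(query, hits);
console.log(prompt);
console.log(generate(prompt));
```

Keyword overlap misses synonyms and paraphrases, which is the gap that embeddings and similarity search fill in the later steps.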

1. Data Loading

examples/02_data_loading/
Code | Code Explanation | Concepts

What you'll learn:

Loading raw text data
Normalizing and preparing documents

Key concepts: file I/O, preprocessing, document structure
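
As a rough illustration of loading and normalizing, the sketch below cleans raw text and wraps it in a plain object with metadata. The helper names (`normalizeText`, `toDocument`, `loadTextFile`) and the object shape are assumptions made for this example; the repository's own loaders and `Document` class may look different.

```javascript
// Hypothetical helpers for illustration; not the repository's loader API.

// Clean up raw text so later steps see consistent input.
function normalizeText(raw) {
  return raw
    .replace(/^\uFEFF/, "")      // drop a leading byte order mark, if any
    .replace(/\r\n?/g, "\n")     // unify Windows and old Mac line endings
    .replace(/[ \t]+$/gm, "")    // strip trailing spaces and tabs on each line
    .replace(/\n{3,}/g, "\n\n")  // collapse runs of blank lines into one
    .trim();
}

// Wrap the cleaned text and its origin in a plain object (assumed shape).
function toDocument(raw, source) {
  const text = normalizeText(raw);
  return { text, metadata: { source, characters: text.length } };
}

// Read a UTF-8 text file from disk. The dynamic import lets this snippet
// run as either a CommonJS script or an ES module.
async function loadTextFile(filePath) {
  const { readFile } = await import("node:fs/promises");
  return toDocument(await readFile(filePath, "utf8"), filePath);
}

// Inline sample so the snippet runs without any file on disk.
const doc = toDocument("RAG from Scratch  \r\n\r\n\r\n\r\nLocal code only.\r\n", "inline-sample");
console.log(doc);
```

Keeping the source in the metadata makes it possible to show where an answer came from once retrieval is in place.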

2. Text Splitting & Chunking

examples/03_text_splitting_and_chunking/
Code | Code Explanation | Concepts

What you'll learn:

How to split long text into manageable chunks
Overlaps, boundaries, and chunk strategies

Key concepts: chunking logic, context windows, granularity trade-offs
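
One simple chunking strategy is a fixed-size sliding window with overlap. The sketch below is a character-based version, assuming a hypothetical `splitText` function; it is not the repository's text splitter classes, which may split on separators or tokens instead.

```javascript
// Hypothetical fixed-size splitter for illustration.
// Each chunk shares `overlap` characters with the previous one, so a sentence
// cut at a boundary still appears whole in at least one neighbouring chunk
// when it is shorter than the overlap.
function splitText(text, chunkSize = 200, overlap = 40) {
  if (chunkSize <= 0) throw new RangeError("chunkSize must be positive");
  if (overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("overlap must be at least 0 and smaller than chunkSize");
  }

  const chunks = [];
  const step = chunkSize - overlap; // how far the window moves each time
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once the window has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

// Tiny values make the overlap easy to see.
const chunks = splitText("abcdefghij", 4, 2);
console.log(chunks); // [ 'abcd', 'cdef', 'efgh', 'ghij' ]
```

Larger chunks carry more context per retrieval hit but dilute the match; smaller chunks match more precisely but may lose surrounding meaning. The overlap is one way to soften that trade-off.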

3. Embedding

examples/04_intro_to_embeddings/02_generate_embeddings/
Code | Code Explanation | Concepts

What you'll learn:

How embeddings represent meaning as vectors
How to generate embeddings locally

Key concepts: vector representation, similarity, embedding models
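
To show how similarity between vectors is measured, the sketch below uses cosine similarity on toy word-count vectors. The `toyEmbed` function and its five-word vocabulary are stand-ins invented for this example; a real embedding model produces dense vectors that capture meaning far beyond word counts.

```javascript
// Cosine similarity: the cosine of the angle between two vectors.
// 1 means same direction, 0 means no shared direction.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let dotProduct = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dotProduct += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy stand-in for an embedding model (hypothetical): count how often
// each vocabulary word appears in the text.
const vocabulary = ["cat", "dog", "pet", "car", "engine"];
function toyEmbed(text) {
  const words = text.toLowerCase().match(/[a-z]+/g) ?? [];
  return vocabulary.map((term) => words.filter((word) => word === term).length);
}

const cat = toyEmbed("The cat is a pet");      // [1, 0, 1, 0, 0]
const dog = toyEmbed("A dog is a pet");        // [0, 1, 1, 0, 0]
const car = toyEmbed("The car has an engine"); // [0, 0, 0, 1, 1]

console.log(cosineSimilarity(cat, dog)); // approximately 0.5 (they share "pet")
console.log(cosineSimilarity(cat, car)); // 0 (no shared words)
```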

4. Vector Store

examples/05_building_vector_store/01_in_memory_store/
Code | Code Explanation | Concepts

What you'll learn:

How to store embeddings
How nearest-neighbor search works

Key concepts: indexing, vector search, metadata storage
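
A bare-bones in-memory store might look like the sketch below. `TinyVectorStore` is a hypothetical class written for this illustration and is not the repository's `InMemoryVectorStore`. It keeps entries in a `Map` keyed by id and finds nearest neighbors by brute force, comparing the query against every stored vector using Euclidean distance.

```javascript
// Straight-line distance between two equal-length vectors.
function euclideanDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return Math.sqrt(sum);
}

// Hypothetical minimal store: id -> { vector, metadata }.
class TinyVectorStore {
  constructor() {
    this.entries = new Map();
  }

  // Adding an entry with an existing id overwrites the old one.
  add(id, vector, metadata = {}) {
    this.entries.set(id, { id, vector, metadata });
  }

  // Brute-force nearest-neighbor search: measure the distance to every
  // entry, then keep the k closest.
  nearest(queryVector, k = 1) {
    const scored = [];
    for (const entry of this.entries.values()) {
      scored.push({
        id: entry.id,
        metadata: entry.metadata,
        distance: euclideanDistance(queryVector, entry.vector),
      });
    }
    return scored.sort((a, b) => a.distance - b.distance).slice(0, k);
  }
}

// Hand-made 2D vectors stand in for real embeddings.
const store = new TinyVectorStore();
store.add("a", [1, 0], { source: "a.txt" });
store.add("b", [0, 1], { source: "b.txt" });
store.add("c", [0.9, 0.1], { source: "c.txt" });

const neighbors = store.nearest([1, 0.05], 2);
console.log(neighbors.map((n) => n.id)); // [ 'a', 'c' ]
```

Brute force is linear in the number of stored vectors, which is fine for learning and small collections; dedicated vector databases use approximate indexes to avoid scanning everything.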

5. Basic Retrieval

examples/06_retrieval_strategies/01_basic_retrieval/
Code | Code Explanation | Concepts

What you'll learn:

Retrieving relevant chunks from the vector store
Understanding similarity scoring

Key concepts: augmentation, scoring, top-k retrieval
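
Top-k retrieval with a score threshold can be sketched as follows. The function `retrieveTopK`, its options, and the hand-made unit-length vectors are assumptions for illustration only. Because the vectors are assumed to be unit length, the dot product serves directly as the similarity score.

```javascript
// Dot product; equals cosine similarity when both vectors have length 1.
function dot(a, b) {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// Hypothetical retrieval helper: score every chunk against the query,
// drop weak matches, and keep the k best.
function retrieveTopK(queryVector, chunks, { k = 3, minScore = 0 } = {}) {
  return chunks
    .map((chunk) => ({ text: chunk.text, score: dot(queryVector, chunk.vector) }))
    .filter((hit) => hit.score >= minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// Hand-made unit vectors stand in for real chunk embeddings.
const chunks = [
  { text: "Chunk about embeddings", vector: [1, 0] },
  { text: "Chunk about chunking", vector: [0, 1] },
  { text: "Chunk about vector stores", vector: [0.8, 0.6] },
];

const hits = retrieveTopK([1, 0], chunks, { k: 2, minScore: 0.5 });
for (const hit of hits) {
  console.log(hit.score.toFixed(2), hit.text);
}
```

The threshold matters as much as k: without it, a query with no good match still returns k chunks, and the model is handed irrelevant context.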

When you have finished, see the Showcase for everything you have learned so far in action.

Project Structure

├── src/                                    # Reusable library code
│   ├── embeddings/
│   │   ├── index.js                        # Main exports
│   │   ├── EmbeddingModel.js               # Model wrapper class
│   │   └── EmbeddingCache.js               # Caching layer
│   │
│   ├── vector-stores/
│   │   ├── index.js                        # Main exports
│   │   ├── BaseVectorStore.js              # Abstract base class
│   │   ├── InMemoryVectorStore.js          # In-memory implementation
│   │   ├── LanceDBVectorStore.js           # LanceDB implementation
│   │   └── QdrantVectorStore.js            # Qdrant implementation
│   │
│   ├── loaders/
│   │   ├── index.js
│   │   ├── BaseLoader.js                   # Abstract loader
│   │   ├── PDFLoader.js                    # PDF loading
│   │   ├── TextLoader.js                   # Text file loading
│   │   └── DirectoryLoader.js              # Batch loading
│   │
│   ├── text-splitters/
│   │   ├── index.js
│   │   ├── BaseTextSplitter.js             # Base class
│   │   ├── CharacterTextSplitter.js        
│   │   ├── RecursiveCharacterTextSplitter.js
│   │   └── TokenTextSplitter.js
│   │
│   ├── retrievers/
│   │   ├── index.js
│   │   ├── BaseRetriever.js                # Base retriever
│   │   ├── VectorStoreRetriever.js         # Vector search
│   │   ├── RerankerRetriever.js            # With reranking
│   │   └── HybridRetriever.js              # Multiple strategies
│   │
│   ├── chains/
│   │   ├── index.js
│   │   ├── RetrievalChain.js               # Query → Retrieve → Format
│   │   ├── RAGChain.js                     # Full RAG pipeline
│   │   └── ConversationalChain.js          # With memory
│   │
│   ├── prompts/
│   │   ├── index.js
│   │   ├── PromptTemplate.js               # Template class
│   │   └── templates/
│   │       ├── qa.js                       # Q&A templates
│   │       ├── summarization.js
│   │       └── conversation.js
│   │
│   ├── utils/
│   │   ├── index.js
│   │   ├── Document.js                     # Document class
│   │   ├── similarity.js                   # Similarity functions
│   │   ├── tokenizer.js                    # Token counting
│   │   └── validators.js                   # Input validation
│   │
│   └── index.js                            # Main library export
│
├── examples/
│   ├── 00_how_rag_works/
│   │   └── example.js                      # Minimal RAG simulation with naive keyword search
│   │
│   ├── 01_intro_to_llms/
│   │   └── example.js                      # Introduction to LLMs, the brain of your RAG system
│   │
│   ├── 02_data_loading/
│   │   └── example.js                      # Load and preprocess raw text data
│   │
│   ├── 03_text_splitting_and_chunking/
│   │   └── example.js                      # Split long text into chunks for embedding
│   │
│   ├── 04_intro_to_embeddings/
│   │   ├── 01_text_similarity_basics/
│   │   └── 02_generate_embeddings/
│   │
│   ├── 05_building_vector_store/
│   │   ├── 01_in_memory_store/
│   │   ├── 02_nearest_neighbor_search/
│   │   └── 03_metadata_filtering/
│   │
│   ├── 06_retrieval_strategies/
│   │   ├── 01_basic_retrieval/
│   │   ├── 02_query_preprocessing/
│   │   ├── 03_hybrid_search/
│   │   ├── 04_multi_query_retrieval/
│   │   ├── 05_query_rewriting/
│   │   ├── 06_rank_results/
│   │   └── 07_post_retrieval_reranking/
│   │
│   ├── 07_prompt_engineering_for_rag/
│   │   ├── 01_context_stuffing/
│   │   ├── 02_citation_prompts/
│   │   └── 03_context_compression/
│   │
│   ├── 08_rag_in_action/
│   │   ├── 01_basic_rag/
│   │   ├── 02_error_handling/
│   │   └── 03_streaming_responses/
│   │
│   ├── 09_evaluating_rag_quality/
│   │   ├── 01_retrieval_metrics/
│   │   ├── 02_generation_metrics/
│   │   └── 03_end_to_end_evaluation/
│   │
│   ├── 10_observability_and_caching/
│   │   └── example.js                      # Cache repeated queries and log performance
│   │
│   ├── 11_metadata_and_structured_data/
│   │   └── example.js
│   │
│   ├── 12_graph_db_integration/
│   │   └── example.js                      # Graph database using kuzu npm package
│   │
│   ├── 13_knowledge_requiremens/
│   │   └── example.js                      # Define what knowledge is needed and where it comes from
│   │
│   ├── tutorials/                          # Higher-level guides
│   │   ├── basic-rag-pipeline.js
│   │   ├── conversational-rag.js
│   │   ├── multi-modal-rag.js
│   │   └── advanced-retrieval.js
│   │
│   ├── templates/                          # Starter templates
│   │   ├── simple-rag/
│   │   ├── api-server/
│   │   └── chatbot/
│   │
│   ├── tests/                              # Unit tests
│   │   ├── embeddings/
│   │   ├── vector-stores/
│   │   └── ...
│   │
│   └── README.md

How it works

| Goal | What You Add | Why It Helps |
| --- | --- | --- |
| Concept clarity | `00_how_rag_works` | See retrieval + generation in <70 lines before touching vectors. |
| Mathematical intuition | `04_intro_to_embeddings/01_text_similarity_basics.js` | Learn cosine similarity without black-box APIs. |
| Hands-on understanding | `05_building_vector_store/01_in_memory_store.js` | Understand how embeddings are stored and compared. |
| Better results | `06_retrieval_strategies/07_post_retrieval_reranking.js` | Reduce noise and redundancy in retrieved context. |
| Query quality | `06_retrieval_strategies/02_query_preprocessing.js` | Ensure embeddings represent consistent meaning. |
| Knowledge connectivity | `12_graph_db_integration/example.js` | Explore how a graph database can improve retrieval and reasoning. |

Each folder contains:

A minimal example (example.js)
A detailed explanation of every step
Comments in the code to teach the concept clearly

Current Implementation Status

This project is being built step by step, following an educational approach where each concept is introduced incrementally.

✅ What's Implemented

The following core components and examples are currently available:

Examples & Tutorials:

✅ 00_how_rag_works - Minimal RAG simulation to understand the concept
✅ 01_intro_to_llms - Getting started with local LLMs (node-llama-cpp basics, building LLM wrapper)
✅ 02_data_loading - Loading and preprocessing raw text data
✅ 03_text_splitting_and_chunking - Splitting long text into manageable chunks
✅ 04_intro_to_embeddings - Text similarity basics and generating embeddings
✅ 05_building_vector_store - In-memory store, nearest neighbor search, metadata filtering
✅ 06_retrieval_strategies - Basic retrieval (partial implementation)

🚧 What's Coming Next

The following are advanced topics that will be added step by step in the coming weeks and months:

Advanced Retrieval Strategies:

🚧 Query preprocessing and normalization
🚧 Hybrid search combining multiple strategies
🚧 Multi-query retrieval
🚧 Query rewriting and expansion
🚧 Result ranking and scoring
🚧 Post-retrieval reranking
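
As a rough preview of what post-retrieval re-ranking means, the sketch below re-orders vector search hits by blending their original score with a second signal: the fraction of query words that appear in the chunk. This is one simple heuristic chosen for illustration, not the approach this project will necessarily use; the `rerank` function, the `weight` parameter, and the sample scores are all hypothetical.

```javascript
// Distinct lowercase alphanumeric words in a text.
function tokenize(text) {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) ?? []);
}

// Hypothetical re-ranker: combine the first-stage score with keyword overlap.
// weight = 0 keeps the original order; weight = 1 ranks by overlap alone.
function rerank(query, hits, weight = 0.5) {
  const queryTerms = tokenize(query);
  return hits
    .map((hit) => {
      const terms = tokenize(hit.text);
      let shared = 0;
      for (const term of queryTerms) if (terms.has(term)) shared++;
      const overlap = queryTerms.size === 0 ? 0 : shared / queryTerms.size;
      return { ...hit, rerankScore: (1 - weight) * hit.score + weight * overlap };
    })
    .sort((a, b) => b.rerankScore - a.rerankScore);
}

// Made-up first-stage results: the vector score slightly prefers the wrong chunk.
const firstStage = [
  { text: "Vector databases store embeddings.", score: 0.82 },
  { text: "Chunk overlap preserves context across chunk boundaries.", score: 0.8 },
];

const reranked = rerank("why use chunk overlap", firstStage);
console.log(reranked.map((hit) => hit.text));
```

In this sample the second chunk moves to the top because it shares "chunk" and "overlap" with the query, while the first shares nothing.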

Prompt Engineering for RAG:

🚧 Context stuffing techniques
🚧 Citation and source attribution prompts
🚧 Context compression

RAG in Production:

🚧 Error handling and fallbacks
🚧 Streaming responses
🚧 End-to-end RAG pipeline examples

Evaluation & Optimization:

🚧 Retrieval metrics (precision, recall, MRR)
🚧 Generation quality metrics
🚧 End-to-end evaluation frameworks
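
The retrieval metrics named above have short standard definitions, sketched here with hypothetical function names and made-up document ids. Precision@k is the share of the top k results that are relevant, recall@k is the share of all relevant documents found in the top k, and MRR averages 1/rank of the first relevant result across queries.

```javascript
// Precision@k: relevant results in the top k, divided by k.
function precisionAtK(retrieved, relevant, k) {
  if (k <= 0) return 0;
  const found = retrieved.slice(0, k).filter((id) => relevant.has(id)).length;
  return found / k;
}

// Recall@k: relevant results in the top k, divided by all relevant documents.
function recallAtK(retrieved, relevant, k) {
  if (relevant.size === 0) return 0;
  const found = retrieved.slice(0, k).filter((id) => relevant.has(id)).length;
  return found / relevant.size;
}

// Reciprocal rank: 1 / position of the first relevant result, or 0 if none.
function reciprocalRank(retrieved, relevant) {
  const index = retrieved.findIndex((id) => relevant.has(id));
  return index === -1 ? 0 : 1 / (index + 1);
}

// Mean reciprocal rank over several queries.
function meanReciprocalRank(runs) {
  if (runs.length === 0) return 0;
  const total = runs.reduce(
    (sum, run) => sum + reciprocalRank(run.retrieved, run.relevant),
    0
  );
  return total / runs.length;
}

// Made-up evaluation data: ranked results and the ids judged relevant.
const runs = [
  { retrieved: ["d3", "d1", "d7"], relevant: new Set(["d1", "d2"]) },
  { retrieved: ["d5", "d9"], relevant: new Set(["d5"]) },
];

console.log(precisionAtK(runs[0].retrieved, runs[0].relevant, 3)); // 1 of 3 results is relevant
console.log(recallAtK(runs[0].retrieved, runs[0].relevant, 3));    // 0.5 (found d1, missed d2)
console.log(meanReciprocalRank(runs));                             // 0.75 (mean of 1/2 and 1)
```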

Advanced Features:

🚧 Observability and performance monitoring
🚧 Caching strategies for repeated queries
🚧 Metadata and structured data handling
🚧 Graph database integration (using kuzu)
🚧 Multi-modal RAG

Templates & Guides:

🚧 Complete starter templates (simple RAG, API server, chatbot)
🚧 Higher-level tutorials and best practices

Note: This is an educational project focused on building understanding from the ground up. Each new topic will be introduced with clear explanations, minimal examples, and thoroughly commented code. The goal is not to rush through features, but to ensure every concept is deeply understood before moving to the next.

Requirements

Node.js 18+
Local LLM runtime (e.g., node-llama-cpp)
npm packages for embeddings and vector math, plus kuzu (optional) for the graph database example

Install dependencies and run the first example:

npm install
node examples/00_how_rag_works/example.js

Philosophy

This repository is not about fancy frameworks or huge models.
It’s about understanding, line by line, how RAG works under the hood.

If you can explain it, you can build it.
If you can build it, you can improve it.

Contribute

Contributions are welcome!
If you have a clear, educational RAG example, open a PR.

pguso/rag-from-scratch
