A small, clear reference project that demonstrates a full RAG (Retrieval-Augmented Generation) workflow: ingest PDFs, text files, and raw strings; chunk and embed; store in ChromaDB; and answer questions via an LLM using retrieved context, orchestrated with LangGraph.
RAG combines retrieval (finding relevant pieces of your data) with generation (an LLM producing an answer). Instead of relying only on the model’s training data, you:
- Ingest your content (PDFs, text files, raw text).
- Chunk it into smaller segments and embed each chunk.
- Store embeddings in a vector store (here, ChromaDB).
- On each query, embed the question, retrieve the most relevant chunks, and pass them as context to the LLM.
- The LLM answers using only (or mainly) that context, with citations.
In short: chunking and embeddings let you search by meaning (similarity) and keep context size manageable; ChromaDB stores those embeddings and performs fast similarity search; LangGraph makes the pipeline explicit (state → nodes → edges) so you can see and change each step.
LangGraph models the pipeline as a graph: state (query, retrieved docs, context, answer) flows through nodes (retrieve, build context, generate) and edges (including a conditional branch when no documents are found). That makes the flow easy to follow and extend (e.g. add a guardrail, extra filters, or hybrid retrieval).
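As a rough, library-free illustration of that state → nodes → edges flow (plain Python stand-ins, not the repo's actual LangGraph code):

```python
# Illustrative sketch only: state flows through retrieve -> build_context ->
# generate, with a conditional branch to a guardrail node when nothing is
# retrieved. The real graph lives in src/graph/ and uses LangGraph.
def retrieve(state):
    # A real node would run a Chroma similarity search here.
    state["docs"] = [] if "unknown" in state["query"] else ["RAG = retrieval + generation"]
    return state

def build_context(state):
    state["context"] = "\n".join(state["docs"])
    return state

def generate(state):
    state["answer"] = f"Based on context: {state['context']}"
    return state

def guardrail(state):
    state["answer"] = "I don't have any relevant documents."
    return state

def run(query):
    state = retrieve({"query": query})
    if not state["docs"]:  # conditional edge: no documents found
        return guardrail(state)
    return generate(build_context(state))
```

Each function corresponds to a node; the `if not state["docs"]` check plays the role of the conditional edge mentioned above.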
```
README.md
src/
  config.py      # Env vars, constants (chunk size, k, model names)
  loaders/       # PDF, text file, raw string → LangChain Documents
  chunking/      # Recursive character splitter, chunk_id metadata
  vectorstore/   # Chroma init, upsert, persist, load
  rag/           # Retriever, prompt builder, LLM, citations; schema_retriever, sql_prompting
  graph/         # LangGraph state, nodes, compiled graph
  db/            # Schema extraction (pg_catalog), read-only query execution
  cli.py         # ingest, query, ingest-schema, ask-db, benchmark commands
data/
  sql/           # Optional: football schema (competitions, seasons, teams, etc.)
  pdfs/          # Sample PDFs (add your own)
  texts/         # Sample .txt (e.g. sample.txt)
.env.example
pyproject.toml
tests/
```
- Clone and enter the repo.
- Create a virtual environment and install dependencies:

  ```bash
  python -m venv .venv
  source .venv/bin/activate   # Windows: .venv\Scripts\activate
  pip install -e ".[dev]"
  ```

- Configure environment:

  ```bash
  cp .env.example .env
  # Edit .env and set OPENAI_API_KEY=sk-your-key
  ```

  Optional: set `CHROMA_PERSIST_DIR` (default: `./chroma_db`). For the ask-db feature (natural-language to SQL), also set `DATABASE_URL` (e.g. `postgresql://user:pass@localhost:5432/football`).
Index PDFs, text files, and optionally a raw string:
```bash
python -m src.cli ingest --pdf-dir data/pdfs --text-dir data/texts
```

With a raw string:

```bash
python -m src.cli ingest --text-dir data/texts --raw "Your extra content here."
```

- PDFs: one Document per page; metadata `source_type=pdf`, `file_name`, `page`.
- Text files: one Document per file; metadata `source_type=text`, `file_name`.
- Raw: one Document; metadata `source_type=raw`, `name`/`id`.
Content is chunked (recursive character splitter), embedded with OpenAI, and upserted into Chroma with dedupe by content+metadata hash.
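The dedupe-by-hash idea can be pictured as deriving a stable id from each chunk's content plus metadata and using it as the Chroma upsert id (an illustrative sketch; the repo's actual hashing may differ):

```python
# Illustrative sketch (not the repo's exact code): a stable id from chunk
# content + metadata means re-ingesting the same chunk upserts in place
# instead of creating a duplicate entry.
import hashlib
import json

def chunk_id(text: str, metadata: dict) -> str:
    # sort_keys makes the id independent of metadata key order
    payload = text + json.dumps(metadata, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

a = chunk_id("RAG combines retrieval...", {"source_type": "text", "file_name": "sample.txt"})
b = chunk_id("RAG combines retrieval...", {"file_name": "sample.txt", "source_type": "text"})
assert a == b  # same content + metadata -> same id, regardless of key order
```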
Ask a question; the app retrieves relevant chunks and calls the LLM with that context:
```bash
python -m src.cli query "What is RAG?"
```

Example output:

```
Answer: RAG stands for Retrieval-Augmented Generation. It combines a retriever
that finds relevant documents with a language model that generates answers.
The model is given the retrieved context so it can answer using your data...

Citations:
- sample.txt
```
If nothing is retrieved, you get a short “I don’t have any relevant documents” style message (guardrail node).
You can query a PostgreSQL database in plain English: the app retrieves relevant schema (table and column descriptions) from ChromaDB, sends it to the LLM to generate a SELECT query, runs the query read-only, and prints the result. The pipeline uses a separate Chroma collection for schema (so it does not mix with document RAG).
How it works
- Schema ingest (`ingest-schema`): Connect to Postgres, read table and column metadata (including `COMMENT ON TABLE` / `COMMENT ON COLUMN` from `pg_catalog`), build one document per table, embed them, and store in the schema collection (e.g. `schema_football`).
- Ask (`ask-db`): Your question is embedded; the top relevant schema documents are retrieved (using `SCHEMA_RETRIEVAL_K` and optional `SCHEMA_RETRIEVAL_MAX_DISTANCE`). That schema context plus your question go to the LLM, which returns a single `SELECT` (structured output). The app runs only that SELECT and prints the result.
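The per-table documents built by schema ingest might be assembled like this (an illustrative sketch; the function name and exact format here are assumptions, not the repo's actual API):

```python
# Illustrative sketch (not the repo's code): turn table metadata into one
# plain-text document per table, ready for embedding into the schema collection.
def build_table_doc(table, description, columns):
    """columns: list of (name, sql_type, comment) tuples."""
    lines = [f"Table: {table}", f"Description: {description}", "Columns:"]
    for name, sql_type, comment in columns:
        lines.append(f"- {name} ({sql_type}): {comment}")
    return "\n".join(lines)

doc = build_table_doc(
    "competitions",
    "Master list of competitions/tournaments.",
    [("id", "bigint", "Primary key"), ("name", "text", "Competition name")],
)
```

One document per table keeps retrieval coarse enough that the LLM always sees a table's full column list alongside its name and description.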
Prerequisites
- A PostgreSQL database with tables (and optionally comments). The repo includes a sample schema in `sql/football.sql` (competitions, seasons, teams, players, games, appearances).
- In `.env`: `DATABASE_URL=postgresql://user:password@localhost:5432/yourdb` and `OPENAI_API_KEY` (for embeddings and for the LLM when using OpenAI).
1. Ingest the schema (once, or after DDL changes)
```bash
python -m src.cli ingest-schema
```

This pulls the public schema from `DATABASE_URL`, builds one document per table (name, description, columns with types and descriptions), and upserts them into the schema Chroma collection.
2. Ask a question in natural language
```bash
python -m src.cli ask-db "Which team scored the most goals at home?"
```

The CLI prints all steps so you can see what was retrieved, what SQL was generated, and the result:
- Step 1 – Question: Your question.
- Step 2 – Retrieved schema: The table(s) and columns (with similarity distance) that were sent to the LLM.
- Step 3 – Generated SQL: The generated `SELECT` and, if present, a short explanation.
- Step 4 – Result: The result set in tabular form, or "(No rows)" if empty.
Only single SELECT queries are executed; anything else is rejected.
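A minimal guard for that rule could look like the following sketch (illustrative only; the repo's actual validation may be stricter, e.g. also running the statement in a read-only transaction):

```python
# Illustrative guard (not the repo's code): accept only a single SELECT
# statement before it is handed to the database.
def is_safe_select(sql: str) -> bool:
    stripped = sql.strip().rstrip(";")
    if not stripped or ";" in stripped:  # empty, or more than one statement
        return False
    return stripped.split(None, 1)[0].upper() == "SELECT"

assert is_safe_select("SELECT * FROM teams;")
assert not is_safe_select("DROP TABLE teams;")
assert not is_safe_select("SELECT 1; DELETE FROM teams")
```

Note this naive check would also reject legitimate CTEs (`WITH ... SELECT`); a production guard would combine statement parsing with database-level read-only permissions.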
Optional configuration (ask-db only)
In .env you can set:
- `CHROMA_SCHEMA_COLLECTION` — Chroma collection name for schema (default: `schema_football`).
- `SCHEMA_RETRIEVAL_K` — Number of schema documents to retrieve (default: same as `RETRIEVAL_K`).
- `SCHEMA_RETRIEVAL_MAX_DISTANCE` — Max L2 distance for schema similarity (0 = off). Use e.g. `1.0` or `2.0` if you get no results, or too many.
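The distance cutoff behaves like this sketch (illustrative, not the repo's code): retrieved schema documents whose L2 distance exceeds the threshold are dropped, and `0` disables the filter entirely.

```python
# Illustrative sketch of SCHEMA_RETRIEVAL_MAX_DISTANCE filtering:
# keep only hits at or below the cutoff; 0 means no filtering.
def filter_by_distance(results, max_distance: float):
    """results: list of (doc, distance) pairs from a similarity search."""
    if max_distance == 0:
        return results
    return [(doc, d) for doc, d in results if d <= max_distance]

hits = [("competitions", 0.4), ("teams", 1.7), ("players", 2.3)]
assert filter_by_distance(hits, 0) == hits
assert filter_by_distance(hits, 2.0) == [("competitions", 0.4), ("teams", 1.7)]
```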
Example (after loading sql/football.sql and running ingest-schema)
```bash
python -m src.cli ask-db "List all competitions"
```

Example output shape:
```
--- Step 1: Question ---
List all competitions

--- Step 2: Retrieved schema ---
[Table 1] competitions (distance=0.xxxx)
Table: competitions
Description: Master list of competitions/tournaments...
Columns:
- id (bigint): Primary key...
...

--- Step 3: Generated SQL ---
SELECT id, name, country, competition_type FROM competitions;
Explanation: Lists all rows from the competitions table.

--- Step 4: Result ---
id  name                   country   competition_type
--  ---------------------  --------  ----------------
 1  Premier League         England   league
 2  UEFA Champions League            international
```
You can run 15 fixed NLP queries in a single process to compare performance when ChromaDB stays warm (no per-query boot). Useful to see how much time is spent in schema retrieval vs. LLM vs. database.
Run with full output (user query, execution time, and database response per query):
```bash
python -m src.cli benchmark
```

For each of the 15 queries you get: the original user question, total execution time for that query, and the result table from PostgreSQL. At the end, the total execution time for all 15 is printed.
Run with step timings only (--timing-only):
```bash
python -m src.cli benchmark --timing-only
```

With `--timing-only`, the CLI prints only per-step times for each query (no user query text or result table):

- chroma — time for schema retrieval (ChromaDB similarity search)
- LLM — time for the structured LLM call (SQL generation)
- DB — time for executing the generated `SELECT` on PostgreSQL
Example output:
```
Query 1: chroma 0.15s, LLM 1.42s, DB 0.01s
Query 2: chroma 0.08s, LLM 1.38s, DB 0.01s
...
Query 15: chroma 0.07s, LLM 1.45s, DB 0.02s
Total execution time: 45.23s
```
Run `ingest-schema` at least once before using `benchmark`; the same schema collection and `DATABASE_URL` are used.
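The per-step breakdown can be produced with a small timing helper along these lines (a sketch with stand-in steps, not the repo's benchmark code):

```python
# Illustrative sketch: wrap each pipeline step with time.perf_counter to get
# the chroma / LLM / DB breakdown that --timing-only prints.
import time

def timed(step_fn, *args):
    start = time.perf_counter()
    result = step_fn(*args)
    return result, time.perf_counter() - start

# Stand-in steps; the real ones would call Chroma, the LLM, and Postgres.
docs, t_chroma = timed(lambda q: ["schema doc"], "List all competitions")
sql, t_llm = timed(lambda d: "SELECT * FROM competitions;", docs)
rows, t_db = timed(lambda s: [(1, "Premier League")], sql)

print(f"chroma {t_chroma:.2f}s, LLM {t_llm:.2f}s, DB {t_db:.2f}s")
```

`perf_counter` is preferred over `time.time` here because it is monotonic and has higher resolution for short intervals.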
You can run the RAG graph under the LangGraph dev server and inspect node executions in the web UI (LangGraph Studio), and test with multiple queries via the API or script.
- Install dev dependencies (includes LangGraph CLI and SDK):

  ```bash
  python -m venv .venv
  pip install -e ".[dev]"
  ```

- Ingest some documents (so the graph has something to retrieve):

  ```bash
  python -m src.cli ingest --text-dir data/texts
  ```
  Paths like `data/texts` are resolved from the project root (where `langgraph.json` lives), so you can run ingest from any directory. The log will show `action=load_docs source=text count=N files=[...]` so you can confirm every `.txt` file (e.g. `sample.txt`, `dog.txt`) was loaded. After adding or changing files in `data/texts` or `data/pdfs`, run ingest again or the new content won't appear in retrieval.

  Important: The LangGraph server only runs the query graph (retrieve → generate). It does not load or ingest files. ChromaDB is updated only when you run `ingest` from the CLI. If you add or change files in `data/texts` (or `data/pdfs`), run `ingest` again from the repo root; both the CLI and the LangGraph API will then use the same ChromaDB (stored under the project's `chroma_db` directory). If `langgraph dev` doesn't see your data, check the server logs for `[vectorstore.chroma] action=init persist_dir=...` and `collection_doc_count=...` to confirm the path and that the collection has documents; you can set `CHROMA_PERSIST_DIR` in `.env` to an absolute path (e.g. `C:\...\chroma_db`) to force the same DB for both CLI and server.

- Start the LangGraph dev server (from the repo root). Activate the virtual environment first (required so the `langgraph` CLI is on your PATH):

  ```bash
  # Windows (Git Bash or WSL): source .venv/Scripts/activate
  # Windows (CMD):             .venv\Scripts\activate.bat
  # Windows (PowerShell):      .venv\Scripts\Activate.ps1
  source .venv/Scripts/activate   # or use the command for your shell
  langgraph dev
  ```
When ready you'll see:
- API: http://localhost:2024
- Docs: http://localhost:2024/docs
- LangGraph Studio: https://smith.langchain.com/studio/?baseUrl=http://127.0.0.1:2024
- Open LangGraph Studio from the URL above. Use the rag assistant. Send input as custom state with a `query` field, for example:

  ```json
  {"query": "What is RAG?"}
  ```

  You can run multiple queries; each run will show the retrieve → build_context → generate (or guardrail) nodes in the UI.
- Test with multiple queries from the command line (with `langgraph dev` running):

  ```bash
  python scripts/run_queries_via_api.py
  ```

  Or pass your own questions:

  ```bash
  python scripts/run_queries_via_api.py "What is RAG?" "What is ChromaDB?" "What is LangGraph?"
  ```

  Optional: set `BASE_URL` if your server is not at `http://localhost:2024`.
You can use a local Mistral model instead of OpenAI for the generation step (after ChromaDB retrieval). Ingest still uses OpenAI embeddings unless you change that separately.
- Start Mistral with the provided Compose file (requires Docker and an NVIDIA GPU):

  ```bash
  docker compose -f docker-compose.mistral.yml up -d
  ```

  Wait until the model has finished loading (logs will show when the server is ready). The first run downloads the model into a Docker volume.

- Configure the app in `.env`:

  ```
  LLM_PROVIDER=mistral_local
  LLM_BASE_URL=http://localhost:8000/v1
  LLM_MODEL=mistralai/Mistral-7B-Instruct-v0.3
  ```

- Query as usual (no OpenAI key needed for the query step):

  ```bash
  python -m src.cli query "What is RAG?"
  ```

  For gated Hugging Face models, set `HF_TOKEN` in your environment or in a `.env` used by Docker (e.g. pass it when running `docker compose`). vLLM is GPU-oriented; on a machine without an NVIDIA GPU, consider alternatives such as Ollama for CPU-friendlier local inference.
After ingesting `data/texts/sample.txt`:

- Query: `What is RAG?` → Answer: Explains RAG; Citations: `sample.txt`.
- Query: `What is ChromaDB?` → Answer: Explains ChromaDB's role; Citations: `sample.txt`.
- Query: `What is LangGraph?` → Answer: Explains orchestration as a graph; Citations: `sample.txt`.
Citations are derived from document metadata (`file_name`, `page`, `source_type`).
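Deriving a citation string from that metadata might look like this sketch (illustrative; the function name is an assumption, not the repo's actual helper):

```python
# Illustrative sketch (not the repo's code): build a human-readable citation
# from a chunk's metadata (file_name, page, source_type).
def format_citation(metadata: dict) -> str:
    name = metadata.get("file_name") or metadata.get("name", "unknown")
    if metadata.get("source_type") == "pdf" and "page" in metadata:
        return f"{name} (page {metadata['page']})"
    return name

assert format_citation({"source_type": "text", "file_name": "sample.txt"}) == "sample.txt"
assert format_citation({"source_type": "pdf", "file_name": "a.pdf", "page": 3}) == "a.pdf (page 3)"
```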
- **OPENAI_API_KEY is not set**
  Required for ingest (embeddings) and for query when `LLM_PROVIDER=openai`. Create a `.env` file with `OPENAI_API_KEY=sk-...` (see `.env.example`) or export it. For query with `LLM_PROVIDER=mistral_local` you do not need an OpenAI key.
- **No documents to ingest**
  Ensure `--pdf-dir` and/or `--text-dir` exist and contain at least one `.pdf` or `.txt`, or use `--raw "..."`.
- **Empty or irrelevant answers**
  Run ingest first; then query. Increase retrieval `k` or chunk size/overlap in `src/config.py` if needed.
- **DATABASE_URL is not set**
  Required for ingest-schema and ask-db. Set it in `.env` (e.g. `postgresql://postgres:password@localhost:5432/football`). Ensure the database exists and the user has permission to read `pg_catalog` and your tables.
- **No schema documents retrieved (ask-db)**
  Run ingest-schema first so the schema collection is populated. If you still see no results, try increasing `SCHEMA_RETRIEVAL_MAX_DISTANCE` (e.g. `2.0`) or set it to `0` to disable distance filtering.
From the repo root:
```bash
pytest
```

Tests check that:
- Ingestion produces non-empty documents (with mocked or env API key).
- Vectorstore persists and reloads (or that the store can be built).
- A query returns an answer and citations when the index is populated.
- Hybrid retrieval: Keep the current retriever interface; add a keyword/BM25 path and merge results before building context.
- More loaders: Add new modules under `src/loaders/` that yield `Document` with the same metadata conventions.
- Graph: Add nodes (e.g. re-ranking, fact-check) and wire them in `src/graph/graph.py` with new edges.
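For the hybrid-retrieval idea, the merge step could be as simple as this sketch (illustrative; hit shapes and function names are assumptions):

```python
# Illustrative sketch (not the repo's code): merge vector-similarity hits with
# keyword/BM25 hits, de-duplicating by chunk id while keeping vector hits first.
def merge_results(vector_hits, keyword_hits):
    """Each hit is a (chunk_id, text) pair; vector hits take priority."""
    seen, merged = set(), []
    for chunk_id, text in list(vector_hits) + list(keyword_hits):
        if chunk_id not in seen:
            seen.add(chunk_id)
            merged.append((chunk_id, text))
    return merged

vec = [("c1", "RAG combines retrieval..."), ("c2", "Chroma stores embeddings")]
kw = [("c2", "Chroma stores embeddings"), ("c3", "LangGraph models the pipeline")]
assert [cid for cid, _ in merge_results(vec, kw)] == ["c1", "c2", "c3"]
```

A fancier merge (e.g. reciprocal rank fusion) can replace this concatenate-and-dedupe step without changing the retriever interface.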
Use and extend as you like for learning and reference.