Multi-provider agentic RAG built with LangGraph and LangChain.
This project indexes a set of source documents, persists a local Qdrant vector index on disk, and answers questions with a retrieval-first workflow that can rewrite queries, call retrieval as a tool, and generate grounded answers with citations. Rewrite loops are bounded by `MAX_REWRITES`, and exhausted retrieval paths terminate gracefully with an `insufficient_context` result instead of looping indefinitely.
- LangGraph-based agentic RAG flow
- Multi-provider chat model support
- Multi-provider embedding support
- OpenAI-compatible endpoint support
- Persistent local Qdrant index in `.cache/vectorstores`
- CLI with step-by-step execution tracing
- Source-aware answers with citations
Supported chat providers: `google`, `openai`, `openai-compatible`, `anthropic`, `litellm`.

Supported embedding providers: `google`, `openai`, `openai-compatible`, `litellm`.
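As an illustration of how multi-provider selection can work, here is a minimal dispatch-table sketch. This is an assumption for explanatory purposes, not the project's real code; the actual wiring lives in `src/agentic_rag/providers.py`, and in practice each factory would return a configured LangChain chat model rather than a plain dict.

```python
from typing import Callable

# Hypothetical factory table: each supported provider name maps to a builder.
# The real project resolves these to LangChain chat model classes; here each
# builder just returns a config dict for illustration.
CHAT_PROVIDER_FACTORIES: dict[str, Callable[[str], dict]] = {
    "google": lambda model: {"provider": "google", "model": model},
    "openai": lambda model: {"provider": "openai", "model": model},
    "openai-compatible": lambda model: {"provider": "openai-compatible", "model": model},
    "anthropic": lambda model: {"provider": "anthropic", "model": model},
    "litellm": lambda model: {"provider": "litellm", "model": model},
}

def build_chat_model(provider: str, model: str) -> dict:
    """Resolve a provider name to a configured chat model handle."""
    try:
        factory = CHAT_PROVIDER_FACTORIES[provider]
    except KeyError:
        raise ValueError(f"Unsupported chat provider: {provider!r}") from None
    return factory(model)
```

A dispatch table like this keeps provider support declarative: adding a provider means adding one entry, and unknown names fail fast with a clear error.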
```shell
git clone git@github.com:scldrn/RAGmain.git
cd RAGmain
/opt/homebrew/bin/python3.11 -m venv .venv
source .venv/bin/activate
pip install -e '.[dev]'
```

Python 3.10+ is required. Python 3.11 is the preferred local baseline.
Create a .env file with your provider config. Minimal Google AI Studio example:
```
CHAT_PROVIDER=google
CHAT_MODEL=gemini-2.5-flash
EMBEDDING_PROVIDER=google
EMBEDDING_MODEL=gemini-embedding-2-preview
INDEX_CACHE_DIR=.cache/vectorstores
INGESTION_MODE=auto
FETCH_TIMEOUT_SECONDS=20
GOOGLE_API_KEY=your_api_key
```

Run a query:
```shell
python -m agentic_rag --question "What does Lilian Weng say about reward hacking?"
```

Run the explicit query command:
```shell
python -m agentic_rag query --question "What does Lilian Weng say about reward hacking?"
```

Pre-build the index explicitly:
```shell
python -m agentic_rag ingest
```

Run with trace output:
```shell
python -m agentic_rag \
  --question "What does Lilian Weng say about reward hacking?" \
  --show-steps
```

Run with startup diagnostics and verbose logs:
```shell
python -m agentic_rag \
  --verbose \
  --question "What does Lilian Weng say about reward hacking?"
```

OpenAI:
```
CHAT_PROVIDER=openai
CHAT_MODEL=gpt-4.1-mini
EMBEDDING_PROVIDER=openai
EMBEDDING_MODEL=text-embedding-3-small
OPENAI_API_KEY=your_api_key
```

Anthropic chat + OpenAI embeddings:
```
CHAT_PROVIDER=anthropic
CHAT_MODEL=claude-3-5-sonnet-20241022
EMBEDDING_PROVIDER=openai
EMBEDDING_MODEL=text-embedding-3-small
ANTHROPIC_API_KEY=your_api_key
OPENAI_API_KEY=your_api_key
```

OpenAI-compatible local endpoint:
```
CHAT_PROVIDER=openai-compatible
CHAT_MODEL=local-model-name
CHAT_API_BASE=http://localhost:1234/v1
CHAT_API_KEY=lm-studio
EMBEDDING_PROVIDER=openai-compatible
EMBEDDING_MODEL=text-embedding-3-small
EMBEDDING_API_BASE=http://localhost:1234/v1
EMBEDDING_API_KEY=lm-studio
```

Common options:
- Commands: `query`, `ingest`
- Flags: `--question`, `--url`, `--chat-provider`, `--chat-model`, `--embedding-provider`, `--embedding-model`, `--chat-api-base`, `--embedding-api-base`, `--show-steps`, `--diagram`, `--verbose`
Show full help:

```shell
python -m agentic_rag --help
```

- Load and split source documents.
- Build or reuse a cached vector index.
- Let the graph decide whether to answer directly or retrieve.
- Retrieve relevant chunks as a tool call.
- Grade retrieved context.
- Rewrite the question if retrieval quality is weak.
- Stop at `insufficient_context` if rewrites are exhausted, otherwise generate a final cited answer.
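The loop above can be sketched in plain Python. This is a schematic of the control flow only, not the actual LangGraph graph; the retriever, grader, rewriter, and generator are stand-in callables, and the `MAX_REWRITES` default is assumed.

```python
MAX_REWRITES = 2  # assumed default; the real value comes from settings

def answer(question, retrieve, grade, rewrite, generate):
    """Schematic retrieval loop: retrieve, grade, rewrite up to a bound,
    then either generate a cited answer or stop with insufficient_context."""
    rewrites = 0
    current = question
    while True:
        chunks = retrieve(current)
        if grade(current, chunks):
            # Context was judged sufficient: produce a grounded, cited answer.
            return {"status": "answered", "answer": generate(current, chunks)}
        if rewrites >= MAX_REWRITES:
            # Rewrite budget exhausted: terminate gracefully instead of looping.
            return {"status": "insufficient_context", "answer": None}
        current = rewrite(current)
        rewrites += 1
```

The bound on `rewrites` is what guarantees termination: weak retrieval can trigger at most `MAX_REWRITES` query rewrites before the flow exits with the structured `insufficient_context` result.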
The current default setup indexes a small set of Lilian Weng blog posts so the project works immediately as a reference RAG example.
You can add your own sources at runtime:
```shell
python -m agentic_rag \
  --url https://example.com/doc1 \
  --url https://example.com/doc2 \
  --question "Summarize the main ideas"
```

Run the local quality gates:

```shell
make check
```

Equivalent individual commands:
```shell
ruff check .
ruff format --check .
mypy src/agentic_rag
pytest --cov=src/agentic_rag --cov-report=term-missing --cov-fail-under=85
```

Main files:

- `src/agentic_rag/service.py`
- `src/agentic_rag/errors.py`
- `src/agentic_rag/providers.py`
- `src/agentic_rag/app.py`
- `src/agentic_rag/graph.py`
- `src/agentic_rag/cli.py`
- `src/agentic_rag/settings.py`
- `python -m agentic_rag --question "..."` remains supported as a legacy shortcut for `python -m agentic_rag query --question "..."`.
- `INGESTION_MODE=auto` builds a missing index on first query. `INGESTION_MODE=explicit` requires calling `AgenticRagService.ingest()` before querying.
- `MAX_REWRITES` bounds rewrite loops. When retrieval remains weak after that limit, the service returns a structured `insufficient_context` termination reason and a graceful fallback answer.
- Corrupted or incomplete Qdrant cache directories are detected and rebuilt automatically.
- Document fetches use per-request timeouts, retry each URL, and continue indexing with the remaining sources when only some URLs fail.
- `--verbose` enables startup diagnostics and runtime logging for command, providers, cache path, index state, and final query outcome.
- The first run for a new source/config combination builds embeddings and writes a local Qdrant index.
- Later runs reuse the cached vector index and are much cheaper/faster.
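The fetch behavior described in the notes above (per-request timeouts, per-URL retries, and continuing when only some sources fail) can be sketched as follows. This is a simplified illustration, not the project's loader; function names, the retry count, and the injected `fetch_one` callable are all assumptions.

```python
import logging

logger = logging.getLogger(__name__)

def fetch_all(urls, fetch_one, retries_per_url=2):
    """Fetch each URL with a bounded number of attempts; skip URLs that
    keep failing so the remaining sources can still be indexed."""
    documents = []
    for url in urls:
        for attempt in range(1 + retries_per_url):
            try:
                # fetch_one is expected to apply the per-request timeout itself.
                documents.append(fetch_one(url))
                break
            except Exception as exc:
                logger.warning("fetch failed (%s, attempt %d): %s", url, attempt + 1, exc)
        else:
            # All attempts for this URL failed: drop it and keep going.
            logger.warning("giving up on %s; continuing with remaining sources", url)
    return documents
```

Isolating failures per URL means one dead source degrades the index instead of aborting ingestion for every source.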
- Install Git hooks with `pre-commit install` after setting up the virtualenv.
- `make check` is the canonical local CI gate and matches the GitHub Actions workflow.
- The shared runtime entry point is `AgenticRagService`, which powers both indexing and queries.
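To illustrate how the service entry point relates to the two ingestion modes, here is a hedged, self-contained sketch. Apart from the `ingest()` name, every method, attribute, and behavior detail here is an assumption for illustration, not the real `AgenticRagService` API.

```python
class AgenticRagServiceSketch:
    """Illustrative facade: auto mode builds a missing index on first
    query; explicit mode requires ingest() to have been called first."""

    def __init__(self, ingestion_mode: str = "auto"):
        self.ingestion_mode = ingestion_mode
        self._index = None

    def ingest(self):
        # Stand-in for loading, splitting, embedding, and writing the
        # local Qdrant index to the cache directory.
        self._index = {"chunks": ["..."]}
        return self._index

    def query(self, question: str) -> dict:
        if self._index is None:
            if self.ingestion_mode == "auto":
                self.ingest()  # lazily build the missing index
            else:
                raise RuntimeError("call ingest() before querying in explicit mode")
        return {"question": question, "answer": "grounded answer with citations"}
```

The point of the facade shape is that both the CLI and library callers share one code path: `ingest` and `query` are the only operations a consumer needs.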