Multi-provider agentic RAG built with LangGraph and LangChain.
This project indexes a set of source documents, persists a local Qdrant vector index on disk, and answers questions with a retrieval-first workflow that can rewrite queries, call retrieval as a tool, and generate grounded answers with citations. Rewrite loops are bounded by `MAX_REWRITES`, and exhausted retrieval paths terminate gracefully with an `insufficient_context` result instead of looping indefinitely.
- LangGraph-based agentic RAG flow
- Multi-provider chat model support
- Multi-provider embedding support
- OpenAI-compatible endpoint support
- Persistent local Qdrant index in `.cache/vectorstores`
- CLI with step-by-step execution tracing
- Source-aware answers with citations
Supported chat providers: `google`, `openai`, `openai-compatible`, `anthropic`, `litellm`.

Supported embedding providers: `google`, `openai`, `openai-compatible`, `litellm`.
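As an illustration of how multi-provider selection can work, here is a minimal dispatch-table sketch. This is an assumption for explanatory purposes, not the project's real code; the actual wiring lives in `src/agentic_rag/providers.py`, and in practice each factory would return a configured LangChain chat model rather than a plain dict.

```python
from typing import Callable

# Hypothetical factory table: each supported provider name maps to a builder.
# The real project resolves these to LangChain chat model classes; here each
# builder just returns a config dict for illustration.
CHAT_PROVIDER_FACTORIES: dict[str, Callable[[str], dict]] = {
    "google": lambda model: {"provider": "google", "model": model},
    "openai": lambda model: {"provider": "openai", "model": model},
    "openai-compatible": lambda model: {"provider": "openai-compatible", "model": model},
    "anthropic": lambda model: {"provider": "anthropic", "model": model},
    "litellm": lambda model: {"provider": "litellm", "model": model},
}

def build_chat_model(provider: str, model: str) -> dict:
    """Resolve a provider name to a configured chat model handle."""
    try:
        factory = CHAT_PROVIDER_FACTORIES[provider]
    except KeyError:
        raise ValueError(f"Unsupported chat provider: {provider!r}") from None
    return factory(model)
```

A dispatch table like this keeps provider support declarative: adding a provider means adding one entry, and unknown names fail fast with a clear error.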
```shell
git clone git@github.com:scldrn/RAGmain.git
cd RAGmain
/opt/homebrew/bin/python3.11 -m venv .venv
source .venv/bin/activate
pip install -e '.[dev]'
```

Python 3.10+ is required. Python 3.11 is the preferred local baseline.
Create a .env file with your provider config. Minimal Google AI Studio example:
```
CHAT_PROVIDER=google
CHAT_MODEL=gemini-2.5-flash
EMBEDDING_PROVIDER=google
EMBEDDING_MODEL=gemini-embedding-2-preview
INDEX_CACHE_DIR=.cache/vectorstores
INGESTION_MODE=auto
FETCH_TIMEOUT_SECONDS=20
GOOGLE_API_KEY=your_api_key
```

Run a query:
```shell
python -m agentic_rag --question "What does Lilian Weng say about reward hacking?"
```

Run the explicit query command:
```shell
python -m agentic_rag query --question "What does Lilian Weng say about reward hacking?"
```

Pre-build the index explicitly:
```shell
python -m agentic_rag ingest
```

Run with trace output:
```shell
python -m agentic_rag \
  --question "What does Lilian Weng say about reward hacking?" \
  --show-steps
```

Run with startup diagnostics and verbose logs:
```shell
python -m agentic_rag \
  --verbose \
  --question "What does Lilian Weng say about reward hacking?"
```

OpenAI:
```
CHAT_PROVIDER=openai
CHAT_MODEL=gpt-4.1-mini
EMBEDDING_PROVIDER=openai
EMBEDDING_MODEL=text-embedding-3-small
OPENAI_API_KEY=your_api_key
```

Anthropic chat + OpenAI embeddings:
```
CHAT_PROVIDER=anthropic
CHAT_MODEL=claude-3-5-sonnet-20241022
EMBEDDING_PROVIDER=openai
EMBEDDING_MODEL=text-embedding-3-small
ANTHROPIC_API_KEY=your_api_key
OPENAI_API_KEY=your_api_key
```

OpenAI-compatible local endpoint:
```
CHAT_PROVIDER=openai-compatible
CHAT_MODEL=local-model-name
CHAT_API_BASE=http://localhost:1234/v1
CHAT_API_KEY=lm-studio
EMBEDDING_PROVIDER=openai-compatible
EMBEDDING_MODEL=text-embedding-3-small
EMBEDDING_API_BASE=http://localhost:1234/v1
EMBEDDING_API_KEY=lm-studio
```

Common options:
- Commands: `query`, `ingest`
- Flags: `--question`, `--url`, `--chat-provider`, `--chat-model`, `--embedding-provider`, `--embedding-model`, `--chat-api-base`, `--embedding-api-base`, `--show-steps`, `--diagram`, `--verbose`
Show full help:

```shell
python -m agentic_rag --help
```

- Load and split source documents.
- Build or reuse a cached vector index.
- Let the graph decide whether to answer directly or retrieve.
- Retrieve relevant chunks as a tool call.
- Grade retrieved context.
- Rewrite the question if retrieval quality is weak.
- Stop at `insufficient_context` if rewrites are exhausted, otherwise generate a final cited answer.
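The loop above can be sketched in plain Python. This is a schematic of the control flow only, not the actual LangGraph graph; the retriever, grader, rewriter, and generator are stand-in callables, and the `MAX_REWRITES` default is assumed.

```python
MAX_REWRITES = 2  # assumed default; the real value comes from settings

def answer(question, retrieve, grade, rewrite, generate):
    """Schematic retrieval loop: retrieve, grade, rewrite up to a bound,
    then either generate a cited answer or stop with insufficient_context."""
    rewrites = 0
    current = question
    while True:
        chunks = retrieve(current)
        if grade(current, chunks):
            # Context was judged sufficient: produce a grounded, cited answer.
            return {"status": "answered", "answer": generate(current, chunks)}
        if rewrites >= MAX_REWRITES:
            # Rewrite budget exhausted: terminate gracefully instead of looping.
            return {"status": "insufficient_context", "answer": None}
        current = rewrite(current)
        rewrites += 1
```

The bound on `rewrites` is what guarantees termination: weak retrieval can trigger at most `MAX_REWRITES` query rewrites before the flow exits with the structured `insufficient_context` result.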
The current default setup indexes a small set of Lilian Weng blog posts so the project works immediately as a reference RAG example.
You can add your own sources at runtime:
```shell
python -m agentic_rag \
  --url https://example.com/doc1 \
  --url https://example.com/doc2 \
  --question "Summarize the main ideas"
```

Run the local quality gates:

```shell
make check
```

Equivalent individual commands:
```shell
ruff check .
ruff format --check .
mypy src/agentic_rag
pytest --cov=src/agentic_rag --cov-report=term-missing --cov-fail-under=85
```

Main files:

- `src/agentic_rag/service.py`
- `src/agentic_rag/errors.py`
- `src/agentic_rag/providers.py`
- `src/agentic_rag/app.py`
- `src/agentic_rag/graph.py`
- `src/agentic_rag/cli.py`
- `src/agentic_rag/settings.py`
- `python -m agentic_rag --question "..."` remains supported as a legacy shortcut for `python -m agentic_rag query --question "..."`.
- `INGESTION_MODE=auto` builds a missing index on first query. `INGESTION_MODE=explicit` requires calling `AgenticRagService.ingest()` before querying.
- `MAX_REWRITES` bounds rewrite loops. When retrieval remains weak after that limit, the service returns a structured `insufficient_context` termination reason and a graceful fallback answer.
- Corrupted or incomplete Qdrant cache directories are detected and rebuilt automatically.
- Document fetches use per-request timeouts, retry each URL, and continue indexing with the remaining sources when only some URLs fail.
- `--verbose` enables startup diagnostics and runtime logging for command, providers, cache path, index state, and final query outcome.
- The first run for a new source/config combination builds embeddings and writes a local Qdrant index.
- Later runs reuse the cached vector index and are much cheaper/faster.
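The fetch behavior described in the notes above (per-request timeouts, per-URL retries, and continuing when only some sources fail) can be sketched as follows. This is a simplified illustration, not the project's loader; function names, the retry count, and the injected `fetch_one` callable are all assumptions.

```python
import logging

logger = logging.getLogger(__name__)

def fetch_all(urls, fetch_one, retries_per_url=2):
    """Fetch each URL with a bounded number of attempts; skip URLs that
    keep failing so the remaining sources can still be indexed."""
    documents = []
    for url in urls:
        for attempt in range(1 + retries_per_url):
            try:
                # fetch_one is expected to apply the per-request timeout itself.
                documents.append(fetch_one(url))
                break
            except Exception as exc:
                logger.warning("fetch failed (%s, attempt %d): %s", url, attempt + 1, exc)
        else:
            # All attempts for this URL failed: drop it and keep going.
            logger.warning("giving up on %s; continuing with remaining sources", url)
    return documents
```

Isolating failures per URL means one dead source degrades the index instead of aborting ingestion for every source.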
- Install Git hooks with `pre-commit install` after setting up the virtualenv.
- `make check` is the canonical local CI gate and matches the GitHub Actions workflow.
- The shared runtime entry point is `AgenticRagService`, which powers both indexing and queries.
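To illustrate how the service entry point relates to the two ingestion modes, here is a hedged, self-contained sketch. Apart from the `ingest()` name, every method, attribute, and behavior detail here is an assumption for illustration, not the real `AgenticRagService` API.

```python
class AgenticRagServiceSketch:
    """Illustrative facade: auto mode builds a missing index on first
    query; explicit mode requires ingest() to have been called first."""

    def __init__(self, ingestion_mode: str = "auto"):
        self.ingestion_mode = ingestion_mode
        self._index = None

    def ingest(self):
        # Stand-in for loading, splitting, embedding, and writing the
        # local Qdrant index to the cache directory.
        self._index = {"chunks": ["..."]}
        return self._index

    def query(self, question: str) -> dict:
        if self._index is None:
            if self.ingestion_mode == "auto":
                self.ingest()  # lazily build the missing index
            else:
                raise RuntimeError("call ingest() before querying in explicit mode")
        return {"question": question, "answer": "grounded answer with citations"}
```

The point of the facade shape is that both the CLI and library callers share one code path: `ingest` and `query` are the only operations a consumer needs.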