Multimodal RAG service built on RAGAnything + LightRAG with PostgreSQL defaults and additional enhancements, packaged as a unified service.
- Flexible data sourcing -- ingest from the local filesystem, Azure Blob Storage, or Snowflake tables
- Multimodal ingestion with granular enhancements -- PDF, Word, Excel, PowerPoint, images, and more via the parsing engine
- Multi-provider LLM support -- OpenAI, Azure OpenAI, Anthropic, Google Gemini, Qwen, MiniMax, Ollama, OpenRouter, Xinference
- Knowledge graph + vector semantics -- dual retrieval with Apache AGE (graph) and pgvector (vector) in a single PostgreSQL instance
- Reranking -- LLM-based listwise reranking, or rerankers from Cohere, Jina, Aliyun, and Azure Cohere; point any backend at a custom endpoint (Xinference, Ollama, etc.)
- Retrieval enrichment -- enhanced answer and retrieval formation for better citations and references
- Three interfaces -- Python SDK, REST API, and MCP server
```shell
pip install corprag  # requires Python 3.12
```

```python
import asyncio

from corprag import RAGService, CorpragConfig


async def main():
    # Minimal config -- just needs an LLM API key
    config = CorpragConfig(openai_api_key="sk-...")

    # Initialize (connects to PostgreSQL, sets up RAG engine)
    service = await RAGService.create(config=config)

    # Ingest documents
    result = await service.aingest(source_type="local", path="./docs")
    print(f"Ingested {result['ingested']} documents")

    # Retrieve (structured contexts + sources, no LLM answer)
    result = await service.aretrieve(query="What are the key findings?")
    print(result.contexts)

    # Answer (LLM-generated answer + structured contexts + sources)
    result = await service.aanswer(query="What are the key findings?")
    print(result.answer)

    await service.close()


asyncio.run(main())
```

Note: The SDK requires a running PostgreSQL instance with the pgvector + AGE extensions, or use the JSON fallback for development (see Configuration).
```shell
git clone https://github.com/hanlianlu/corprag.git
cd corprag
cp .env.example .env
# Edit .env -- at minimum set CORPRAG_OPENAI_API_KEY
docker compose up
```

Everything is included: PostgreSQL (pgvector + AGE), the REST API (:8100), and the MCP server (:8101).
```shell
# Health check
curl http://localhost:8100/health

# Ingest documents
curl -X POST http://localhost:8100/ingest \
  -H "Content-Type: application/json" \
  -d '{"source_type": "local", "path": "/app/corprag_storage/sources"}'

# Retrieve (contexts + sources, no LLM answer)
curl -X POST http://localhost:8100/retrieve \
  -H "Content-Type: application/json" \
  -d '{"query": "What are the key findings?"}'

# Answer (LLM-generated answer + contexts + sources)
curl -X POST http://localhost:8100/answer \
  -H "Content-Type: application/json" \
  -d '{"query": "What are the key findings?"}'
```

```shell
pip install corprag
corprag-mcp  # stdio mode
```

Add to Claude Desktop (claude_desktop_config.json):
```json
{
  "mcpServers": {
    "corprag": {
      "command": "corprag-mcp"
    }
  }
}
```

Available MCP tools: retrieve, answer, ingest, list_files, delete_files.
Note: Like the SDK, the MCP server requires PostgreSQL with pgvector + AGE, or JSON fallback storage (see Configuration).
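If Claude Desktop needs provider credentials, they can be supplied through the standard `env` field of the MCP server entry (the variable below is one example; any `CORPRAG_` variable from `.env.example` works the same way):

```json
{
  "mcpServers": {
    "corprag": {
      "command": "corprag-mcp",
      "env": {
        "CORPRAG_OPENAI_API_KEY": "sk-..."
      }
    }
  }
}
```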
```shell
git clone https://github.com/hanlianlu/corprag.git
cd corprag

# Configure environment
cp .env.example .env
# Edit .env -- at minimum set CORPRAG_OPENAI_API_KEY

# Install dependencies
uv sync

# Start PostgreSQL only (via Docker)
docker compose up postgres -d

# Or start all services, including PostgreSQL, API, and MCP
docker compose up -d
```

```shell
uv run pytest tests/unit         # unit tests (no external services)
uv run pytest tests/integration  # integration tests (requires PostgreSQL)
uv run pytest                    # all tests
uv run pytest --cov-report=html  # + HTML report -> htmlcov/index.html
```

```shell
uv run ruff check src/ tests/ scripts/           # lint check
uv run ruff format --check src/ tests/ scripts/  # format check
uv run ruff check --fix src/ tests/ scripts/     # auto-fix lint issues
uv run ruff format src/ tests/ scripts/          # apply formatting
```

Tip: To skip PostgreSQL entirely during development, set these in your `.env`:

```shell
CORPRAG_VECTOR_STORAGE=NanoVectorDBStorage
CORPRAG_GRAPH_STORAGE=NetworkXStorage
CORPRAG_KV_STORAGE=JsonKVStorage
CORPRAG_DOC_STATUS_STORAGE=JsonDocStatusStorage
```
Note: Excel-to-PDF conversion requires LibreOffice (`libreoffice` on PATH). If it is not installed, Excel files are ingested as-is without conversion. The Docker image includes LibreOffice.
All configuration is set via `CORPRAG_` environment variables, a `.env` file, or constructor arguments.
Priority order (highest to lowest):
- Constructor args -- `CorpragConfig(openai_api_key="sk-...")`
- Environment variables -- `CORPRAG_OPENAI_API_KEY=sk-...`
- `.env` file
- Defaults
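The priority chain above can be illustrated with a small standalone sketch (this is not corprag's actual implementation; the `resolve` helper is hypothetical):

```python
import os

# Built-in defaults, the lowest-priority layer
DEFAULTS = {"llm_provider": "openai"}


def resolve(name: str, constructor_args: dict) -> str:
    """Resolve one setting: constructor args beat env vars beat defaults."""
    if name in constructor_args:          # 1. constructor args
        return constructor_args[name]
    env_val = os.environ.get(f"CORPRAG_{name.upper()}")
    if env_val is not None:               # 2. environment / .env
        return env_val
    return DEFAULTS[name]                 # 3. built-in default


os.environ["CORPRAG_LLM_PROVIDER"] = "anthropic"
print(resolve("llm_provider", {}))                        # env var beats the default
print(resolve("llm_provider", {"llm_provider": "qwen"}))  # constructor arg beats both
```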
| Variable | Default | Description |
|---|---|---|
| `CORPRAG_LLM_PROVIDER` | `openai` | `openai`, `azure_openai`, `anthropic`, `google_gemini`, `qwen`, `minimax`, `ollama`, `openrouter` |
| `CORPRAG_EMBEDDING_PROVIDER` | (follows `llm_provider`) | Override embedding provider (e.g., `openai` when using Anthropic) |
| `CORPRAG_VISION_PROVIDER` | (follows `llm_provider`) | Override vision provider |
| `CORPRAG_EMBEDDING_MODEL` | `text-embedding-3-large` | Embedding model |
Each provider has its own API key. Model names are unified across providers.
See .env.example for all provider-specific variables.
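For example, a mixed setup following the table's suggestion (Anthropic for chat, OpenAI for embeddings) might look like this in `.env` (a sketch; the exact API-key variable names are assumptions -- see `.env.example` for the authoritative list):

```shell
CORPRAG_LLM_PROVIDER=anthropic
CORPRAG_ANTHROPIC_API_KEY=sk-ant-...
CORPRAG_EMBEDDING_PROVIDER=openai
CORPRAG_OPENAI_API_KEY=sk-...
CORPRAG_EMBEDDING_MODEL=text-embedding-3-large
```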
| Variable | Default | Options |
|---|---|---|
| `CORPRAG_VECTOR_STORAGE` | `PGVectorStorage` | `PGVectorStorage`, `MilvusVectorDBStorage`, `NanoVectorDBStorage`, ... |
| `CORPRAG_GRAPH_STORAGE` | `PGGraphStorage` | `PGGraphStorage`, `Neo4JStorage`, `NetworkXStorage`, ... |
| `CORPRAG_KV_STORAGE` | `PGKVStorage` | `PGKVStorage`, `JsonKVStorage`, `RedisKVStorage`, ... |
| `CORPRAG_DOC_STATUS_STORAGE` | `PGDocStatusStorage` | `PGDocStatusStorage`, `JsonDocStatusStorage`, ... |
See .env.example for all available configuration options.
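As a sketch, pointing the graph and vector stores at external services instead of PostgreSQL (the connection variables those backends need are provider-specific and not shown here; see `.env.example`):

```shell
CORPRAG_GRAPH_STORAGE=Neo4JStorage
CORPRAG_VECTOR_STORAGE=MilvusVectorDBStorage
```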
Five backends are available. The `cohere`, `jina`, and `aliyun` backends use LightRAG's built-in rerank functions and can target any API-compatible service via `CORPRAG_RERANK_BASE_URL`.
| Variable | Default | Description |
|---|---|---|
| `CORPRAG_RERANK_BACKEND` | `llm` | `llm`, `cohere`, `jina`, `aliyun`, `azure_cohere` |
| `CORPRAG_RERANK_MODEL` | (backend default) | Model name sent to the endpoint |
| `CORPRAG_RERANK_BASE_URL` | (provider default) | Custom endpoint URL for any compatible service |
| `CORPRAG_RERANK_API_KEY` | (none) | Generic API key (falls back to provider-specific keys) |
Backend defaults (used when RERANK_MODEL / RERANK_API_KEY are not set):
| Backend | Default model | Provider-specific key |
|---|---|---|
| `llm` | (follows `INGESTION_MODEL`) | (follows `LLM_PROVIDER` credentials) |
| `cohere` | `rerank-v4.0-pro` | `CORPRAG_COHERE_API_KEY` |
| `jina` | `jina-reranker-v2-base-multilingual` | `CORPRAG_JINA_API_KEY` |
| `aliyun` | `gte-rerank-v2` | `CORPRAG_ALIYUN_RERANK_API_KEY` |
| `azure_cohere` | `Cohere-rerank-v4.0-pro` | `CORPRAG_AZURE_COHERE_API_KEY` + `CORPRAG_AZURE_COHERE_ENDPOINT` |
Examples:

```shell
# Cohere (direct)
CORPRAG_RERANK_BACKEND=cohere
CORPRAG_COHERE_API_KEY=your-key

# Local reranker via Xinference / LiteLLM / any Cohere-compatible endpoint
CORPRAG_RERANK_BACKEND=cohere
CORPRAG_RERANK_MODEL=bge-reranker-v2-m3
CORPRAG_RERANK_BASE_URL=http://localhost:9997/v1/rerank

# LLM-based listwise reranker (default -- no extra config needed)
CORPRAG_RERANK_BACKEND=llm
```

See .env.example for all reranking options.
| Method | Endpoint | Description |
|---|---|---|
| POST | `/ingest` | Ingest documents from local, Azure Blob, or Snowflake |
| POST | `/retrieve` | Retrieve contexts and sources (no LLM answer) |
| POST | `/answer` | LLM-generated answer with contexts and sources |
| GET | `/files` | List ingested documents |
| DELETE | `/files` | Delete documents |
| GET | `/health` | Health check with storage status |
Set CORPRAG_API_AUTH_TOKEN to enable bearer token authentication.
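Clients then pass the token as a bearer header. A minimal sketch of building such headers (a hypothetical helper, not part of the SDK):

```python
import os


def auth_headers() -> dict[str, str]:
    """Build request headers, adding Authorization only when a token is set."""
    headers = {"Content-Type": "application/json"}
    token = os.environ.get("CORPRAG_API_AUTH_TOKEN")
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


os.environ["CORPRAG_API_AUTH_TOKEN"] = "secret-token"
print(auth_headers()["Authorization"])  # -> Bearer secret-token
```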
```
┌────────────────────────────────────────────────────────┐
│      Python SDK · REST API (:8100) · MCP (:8101)       │
└───────────────────────────┬────────────────────────────┘
                            │ CorpragConfig
                   ┌────────▼────────┐
                   │   RAGService    │
                   └────┬───────┬────┘
                        │       │
            ┌───────────▼──┐ ┌──▼─────────────┐
            │  Ingestion   │ │   Retrieval    │
            │  Pipeline    │ │  (RAGAnything  │
            │              │ │   + LightRAG)  │
            │  local       │ │                │
            │  azure blob  │ │  retrieve()    │
            │  snowflake   │ │  answer()      │
            └───────┬──────┘ └────────┬───────┘
                    └────────┬────────┘
                             │
            ┌────────────────┴────────────────┐
            │                                 │
 ┌──────────▼─────────┐       ┌───────────────▼───────┐
 │   LLM Providers    │       │        Storage        │
 │   Chat · Embed     │       │      PostgreSQL       │
 │   Vision · Rerank  │       │   (pgvector + AGE)    │
 │                    │       │                       │
 │  OpenAI · Azure    │       │    Neo4j · Milvus     │
 │  Anthropic ·       │       │   Redis · JSON · ...  │
 │  Gemini · Qwen     │       └───────────────────────┘
 │  Ollama · ...      │
 └────────────────────┘
```
Apache License 2.0. See LICENSE for details.
Built by HanlianLyu. Contributions welcome! Please open issues or pull requests on GitHub.