A Model Context Protocol (MCP) server that enables Retrieval-Augmented Generation (RAG): it indexes your documents and serves relevant context to Large Language Models over MCP.

## Integration Examples

Add the server to your MCP client configuration:
```json
{
  "mcpServers": {
    "rag": {
      "command": "npx",
      "args": ["-y", "mcp-rag-server"],
      "env": {
        "BASE_LLM_API": "http://localhost:11434/v1",
        "EMBEDDING_MODEL": "nomic-embed-text",
        "VECTOR_STORE_PATH": "./vector_store",
        "CHUNK_SIZE": "500"
      }
    }
  }
}
```

Example tool calls and a status check:

```
# Index documents (default project)
>> tool:embedding_documents {"path":"./docs"}

# Index documents for a project
>> tool:embedding_documents {"path":"./docs/kore","project":"kore"}

# Sync project documents (only new/changed files)
>> tool:embedding_documents {"path":"./docs/kore","project":"kore","sync":true}

# Sync all previously indexed projects (no path needed)
>> tool:sync_documents {}

# Check status
>> resource:embedding-status
<< rag://embedding/status
Current Path: ./docs/file1.md
Completed: 10
Failed: 0
Total chunks: 15
Failed Reason:
```
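A retrieval call follows the same pattern; the query text and `k` value below are illustrative:

```
# Retrieve the top 5 chunks for project "kore"
>> tool:query_documents {"query":"How do I configure chunk size?","k":5,"project":"kore"}
```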
## Features

- Index documents in `.txt`, `.md`, `.json`, `.jsonl`, and `.csv` formats
- Customizable chunk size for splitting text
- Local vector store powered by SQLite (via LangChain's LibSQLVectorStore; see the sketch after this list)
- Supports multiple embedding providers (OpenAI, Ollama, Granite, Nomic)
- Exposes MCP tools and resources over stdio for seamless integration with MCP clients
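As a rough illustration of how a LibSQL-backed store is used, here is a minimal sketch with LangChain JS. The database path, table and column names, and the pre-created schema are assumptions for illustration, not the server's actual internals:

```ts
import { createClient } from "@libsql/client";
import { LibSQLVectorStore } from "@langchain/community/vectorstores/libsql";
import { OllamaEmbeddings } from "@langchain/ollama";

// Local SQLite file opened through the libSQL client (path is illustrative).
const db = createClient({ url: "file:./vector_store/vectors.db" });

// Embeddings served by a local Ollama instance.
const embeddings = new OllamaEmbeddings({
  model: "nomic-embed-text",
  baseUrl: "http://localhost:11434",
});

// Assumes a table with an F32_BLOB embedding column already exists;
// "vectors" and "embedding" are placeholder names.
const store = new LibSQLVectorStore(embeddings, {
  db,
  table: "vectors",
  column: "embedding",
});

// Index a chunk, then retrieve the nearest matches for a query.
await store.addDocuments([
  { pageContent: "MCP servers expose tools and resources.", metadata: { source: "docs/intro.md" } },
]);
const hits = await store.similaritySearch("What does an MCP server expose?", 3);
console.log(hits.map((d) => d.pageContent));
```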
## Installation

Install globally from npm:

```bash
npm install -g mcp-rag-server
```

Or build from source:

```bash
git clone https://github.com/kwanLeeFrmVi/mcp-rag-server.git
cd mcp-rag-server
npm install
npm run build
npm start
```

## Docker Deployment

```bash
cp .env.example .env
# Edit .env with your real values
docker compose up -d --build
```

For a full server setup and GitHub publishing guide, see GUIA_DESPLIEGUE.md.
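For reference, a minimal `.env` might look like the following; the values mirror the defaults documented in the Configuration section and should be adjusted for your provider:

```bash
BASE_LLM_API=http://localhost:11434/v1
LLM_API_KEY=
EMBEDDING_MODEL=nomic-embed-text
VECTOR_STORE_PATH=./vector_store
CHUNK_SIZE=500
```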
## Quick Start

```bash
export BASE_LLM_API=http://localhost:11434/v1
export EMBEDDING_MODEL=granite-embedding-278m-multilingual-Q6_K-1743674737397:latest
export VECTOR_STORE_PATH=./vector_store
export CHUNK_SIZE=500

# Run (global install)
mcp-rag-server

# Or via npx
npx mcp-rag-server
```

💡 Tip: We recommend using Ollama for embedding. Install Ollama, then pull the `nomic-embed-text` model:

```bash
ollama pull nomic-embed-text
export EMBEDDING_MODEL=nomic-embed-text
```

## Configuration

| Variable | Description | Default |
|---|---|---|
| `BASE_LLM_API` | Base URL for the embedding API | `http://localhost:11434/v1` |
| `LLM_API_KEY` | API key for your LLM provider | (empty) |
| `EMBEDDING_MODEL` | Embedding model identifier | `nomic-embed-text` |
| `VECTOR_STORE_PATH` | Directory for the local vector store | `./vector_store` |
| `CHUNK_SIZE` | Characters per text chunk (number) | `500` |
💡 Recommendation: Use Ollama embedding models such as `nomic-embed-text` for the best performance.
## Usage

Once running, the server exposes these tools via MCP:
- `sync_documents(project?: string)`: Synchronize already indexed documents. If `project` is omitted, synchronizes all known projects.
- `embedding_documents(path: string, project?: string, sync?: boolean)`: Index documents under the given path (isolated by `project`). Set `sync=true` to update only new/changed files.
- `query_documents(query: string, k?: number, project?: string)`: Retrieve the top `k` chunks (default 15) for a project.
- `remove_document(path: string, project?: string)`: Remove a specific document.
- `remove_all_documents(confirm: boolean, project?: string)`: Clear the entire index (requires `confirm=true`).
- `list_documents(project?: string)`: List all indexed document paths.
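For programmatic use, here is a minimal TypeScript client sketch using the official `@modelcontextprotocol/sdk`; the client name/version and the query are placeholders, and the environment variables from the Configuration section are assumed to be set before launching:

```ts
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Launch the server over stdio, the transport this README describes.
const transport = new StdioClientTransport({
  command: "npx",
  args: ["-y", "mcp-rag-server"],
});

const client = new Client({ name: "example-client", version: "0.0.1" });
await client.connect(transport);

// Call the query_documents tool described above.
const result = await client.callTool({
  name: "query_documents",
  arguments: { query: "How does indexing work?", k: 5 },
});
console.log(result.content);
```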
Clients can also read resources via URIs:
- `rag://documents` — List all document URIs (default project)
- `rag://documents/{project}` — List all document URIs for a project
- `rag://document/{path}` — Fetch full content of a document (default project)
- `rag://document/{project}/{path}` — Fetch full content of a document for a project
- `rag://query-document/{numberOfChunks}/{query}` — Query documents as a resource (default project)
- `rag://query-document/{project}/{numberOfChunks}/{query}` — Query documents as a resource for a project
- `rag://embedding/status` — Check current indexing status (default project)
- `rag://embedding/status/{project}` — Check current indexing status for a project
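Continuing the client sketch above, resources are read by URI:

```ts
// Reuses `client` from the previous sketch.
const status = await client.readResource({ uri: "rag://embedding/status" });
console.log(status.contents);
```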
## How RAG Works

- Indexing: Reads files, splits text into chunks based on `CHUNK_SIZE`, and queues them for embedding.
- Embedding: Processes each chunk sequentially against the embedding API, storing vectors in SQLite.
- Querying: Embeds the query and retrieves the nearest text chunks from the vector store, returning them to the client.
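As a simplified illustration of the indexing step, fixed-size character chunking looks like this; the server's actual splitter may add overlap or respect word boundaries:

```ts
// Illustrative only: split text into CHUNK_SIZE-character pieces.
function chunkText(text: string, chunkSize = 500): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += chunkSize) {
    chunks.push(text.slice(i, i + chunkSize));
  }
  return chunks;
}

// Example: a 1200-character document yields chunks of 500, 500, and 200 characters.
console.log(chunkText("x".repeat(1200)).map((c) => c.length)); // [500, 500, 200]
```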
## Development

```bash
npm install
npm run build   # Compile TypeScript
npm start       # Run server
npm run watch   # Watch for changes
```