DocuMind is a fully local RAG agent built with FastMCP, ChromaDB, and Ollama models (phi4-mini:3.8b-q4_K_M and embeddinggemma:300m-qat-q8_0), managed with uv and linted/formatted with Ruff.
All components run on your machine:
- LLM inference via local Ollama
- Embedding generation via local Ollama
- Vector storage/query via local ChromaDB
No cloud APIs are required.
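The README does not show the contents of config.py; as a rough sketch, it might centralize the local endpoint, model tags, and collection names used throughout. Only the two model tags, the chroma_db/ path, and the documents/conversation_memory collection names come from this document — every other constant name is an assumption:

```python
# config.py (hypothetical sketch) -- one place for local endpoints and model tags.
# Constant names are illustrative; only the values are taken from the README.

OLLAMA_BASE_URL = "http://127.0.0.1:11434"        # default local Ollama endpoint
CHAT_MODEL = "phi4-mini:3.8b-q4_K_M"              # local chat/inference model
EMBEDDING_MODEL = "embeddinggemma:300m-qat-q8_0"  # local embedding model

CHROMA_PATH = "./chroma_db"                       # runtime-created, ignored by git
DOCUMENTS_COLLECTION = "documents"                # ingested document chunks
MEMORY_COLLECTION = "conversation_memory"         # client conversation history
```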
```mermaid
C4Container
    title DocuMind - C4 Container Diagram

    Person(user, "User", "Ingests documents and chats with the assistant")
    System_Ext(ollama, "Ollama", "Local model runtime for chat and embeddings")

    System_Boundary(documind, "DocuMind (Local)") {
        Container(client, "Interactive Client (client.py)", "Python CLI", "Runs chat loop, calls MCP tools, stores conversation memory")
        Container(server, "FastMCP Server (server.py)", "Python / FastMCP", "Exposes add_document, semantic_search, collection_stats")
        ContainerDb(chroma, "ChromaDB", "Local vector database", "Stores ingested documents and conversation memory")
    }

    Rel(user, client, "Uses", "CLI")
    Rel(client, server, "Invokes tools", "MCP over stdio or SSE")
    Rel(server, ollama, "Generates embeddings", "HTTP")
    Rel(client, ollama, "Runs chat + embeddings", "HTTP")
    Rel(server, chroma, "Reads/writes document vectors", "Local DB API")
    Rel(client, chroma, "Reads/writes conversation memory", "Local DB API")
```
Install/pull models:

```
ollama pull phi4-mini:3.8b-q4_K_M
ollama pull embeddinggemma:300m-qat-q8_0
ollama list
```

Project layout:

```
<project-root>/
├── pyproject.toml
├── uv.lock
├── .python-version
├── config.py
├── server.py
├── client.py
├── ingest.py
├── scripts/
├── data/
└── chroma_db/   # runtime-created, ignored by git
```
Install dependencies:

```
cd <project-root>
python3 -m uv sync
```

Note: v1 ingestion supports text/markdown files only (.txt, .md, .markdown).
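How ingest.py splits files before embedding is not shown here; a plausible approach for plain-text/markdown input is an overlapping word-window splitter. This is a sketch — the function name and the window/overlap sizes are assumptions, not the project's actual values:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping word windows for embedding.

    chunk_size and overlap are counted in words; the overlap keeps context
    that straddles a chunk boundary retrievable from either side.
    """
    words = text.split()
    if not words:
        return []
    step = max(1, chunk_size - overlap)  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):  # last window reached the end
            break
    return chunks
```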
Ingest one or more files:

```
uv run python ingest.py data/my_notes.txt
uv run python ingest.py data/notes.txt data/report.md
```

Start Ollama if needed:
```
ollama serve
```

Run FastMCP server over stdio (default):
```
cd <project-root>
uv run python server.py --transport stdio
```

Enable verbose MCP context logs on the server:
```
cd <project-root>
uv run python server.py --transport stdio --log-level DEBUG --to-client-debug
```

Run FastMCP server over SSE:
```
cd <project-root>
uv run python server.py --transport sse --host 127.0.0.1 --port 8000
```

The client persists conversation history in Chroma (the conversation_memory collection) and supports stdio and SSE transports.
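The exact record shape the client writes to the conversation_memory collection is not documented here. As an illustration, each turn could be stored as one Chroma entry whose metadata carries the session id (which is what would let --session-id resume a prior conversation). The field names below are assumptions:

```python
import time
import uuid

def memory_record(session_id: str, role: str, content: str) -> dict:
    """Build one conversation-memory entry in the id/document/metadata
    shape that Chroma's collection.add(...) expects.

    Hypothetical sketch: the README only states that history is persisted
    in a conversation_memory collection, not the exact schema.
    """
    return {
        "id": f"{session_id}-{uuid.uuid4().hex[:8]}",  # unique per turn
        "document": content,                            # the text that gets embedded
        "metadata": {
            "session_id": session_id,  # lets --session-id filter one conversation
            "role": role,              # "user" or "assistant"
            "timestamp": time.time(),  # preserves turn ordering
        },
    }
```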
Launch interactive client with default session id (stdio):
```
cd <project-root>
uv run python client.py
```

Launch with a custom persisted session:
```
cd <project-root>
uv run python client.py --session-id my-session
```

Override the server launch command used by the client (stdio mode):
```
cd <project-root>
uv run python client.py --transport stdio --server-command "uv run python server.py --transport stdio"
```

Connect the client to an already running SSE server:
```
cd <project-root>
uv run python client.py --transport sse --sse-url "http://127.0.0.1:8000/sse"
```

Client log forwarding is always enabled; --log-level only changes verbosity:
```
cd <project-root>
uv run python client.py --log-level INFO
uv run python client.py --log-level DEBUG
```

MCP tools exposed by the server:

- `add_document(text, doc_id=None, source="")`
- `semantic_search(query, n_results=5, source_filter="")`
- `collection_stats()`
Run script-based checks:
```
cd <project-root>
./scripts/ruff_check.sh
./scripts/smoke_ingest.sh
```

Direct Ruff commands:
```
cd <project-root>
python3 -m uv run ruff format .
python3 -m uv run ruff format --check . && python3 -m uv run ruff check .
```

Verify collection count:
```
cd <project-root>
uv run python -c "import chromadb; c=chromadb.PersistentClient('./chroma_db'); print(c.get_or_create_collection('documents').count())"
```

Troubleshooting:

- Connection refused on localhost:11434: ensure `ollama serve` is running.
- Missing model errors: re-run `ollama pull phi4-mini:3.8b-q4_K_M` and `ollama pull embeddinggemma:300m-qat-q8_0`.
- Empty search results: check that ingestion completed and the collection count is non-zero.
- ChromaDB embedding dimension mismatch: keep one embedding model per collection; clear `chroma_db/` and re-ingest if the model changes.
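The dimension-mismatch error comes from a simple invariant: a collection is created with vectors of one fixed length, and switching embedding models changes that length. A tiny helper (purely illustrative, not part of the project) makes the invariant explicit:

```python
def check_dims(collection_dim: int, new_embedding: list[float]) -> None:
    """Raise the same class of error Chroma reports when an embedding's
    length doesn't match the collection it is being added to."""
    if len(new_embedding) != collection_dim:
        raise ValueError(
            f"Embedding dimension {len(new_embedding)} does not match "
            f"collection dimensionality {collection_dim}; clear chroma_db/ "
            "and re-ingest with a single embedding model."
        )
```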