The sub-etha MCP is a Retrieval-Augmented Generation (RAG) server designed to provide up-to-date API documentation for fast-moving projects, focused on Microsoft's AutoGen framework.
Large Language Models (LLMs) are often trained on public data that can be months or years out of date. For a rapidly evolving project like AutoGen, this means a model's internal knowledge is frequently behind the latest stable release. This server solves that by providing a current, searchable documentation source that can be queried through a standard MCP interface.
This server depends on a custom extension to AutoGen’s Sphinx build process to produce the structured JSON output used by sub-etha MCP.
To build with the required extension, clone and build AutoGen from this branch:
git clone https://github.com/fortmort/autogen.git
cd autogen
git checkout feature/sphinx-json-extension

Then follow the AutoGen project’s existing Sphinx documentation build instructions to generate the JSON export.
After a successful Sphinx build that includes the patched extension, the structured documentation file will be available at:
autogen/python/docs/build/api_data.json
Point sub-etha MCP at this file via the API_DATA_FILE environment variable.
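As a minimal sketch of how the server might resolve this setting (the `API_DATA_FILE` variable name and default path are taken from this README; the helper function name is illustrative, not the server's actual code):

```python
import os
from pathlib import Path

def resolve_api_data_file():
    """Return the path to api_data.json, preferring the API_DATA_FILE env var.

    Falls back to the default location produced by the patched Sphinx build.
    """
    default = "autogen/python/docs/build/api_data.json"
    return Path(os.environ.get("API_DATA_FILE", default))
```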
The server employs a hybrid search system to deliver the most relevant documentation for a user's query. The process consists of two main phases: data ingestion and query processing.
The server's knowledge comes from a structured JSON file, api_data.json.
- Source data: Generated by the patched Sphinx documentation build of the AutoGen project. It extracts class definitions, method signatures, parameter descriptions, and code examples into a clean, structured format.
- Automated indexing: On its first run, the server processes api_data.json and splits the documentation into semantic chunks (e.g., a class description is one chunk, and each of its examples is a separate chunk).
- Dual indexes: Chunks are ingested into two different search indexes to enable hybrid search:
- ChromaDB (vector/semantic search) for understanding meaning and intent.
- Whoosh (lexical/keyword search) for matching specific terms, class names, and function names.
- Data versioning: On startup, the server computes a checksum of api_data.json. If it has changed since the last run, the server purges the old indexes and re-ingests the data automatically.
When a query arrives, the server runs a multi-stage pipeline:
- Relevance gate: A quick semantic check against the vector index rejects clearly unrelated queries.
- Hybrid search: If relevant, the query runs against both the semantic (ChromaDB) and lexical (Whoosh) indexes.
- Reciprocal Rank Fusion (RRF): Results from both searches are combined; documents ranked highly by both receive a significant boost for improved accuracy.
- Context assembly: The server assembles a clean, readable context string from the top-ranked documentation objects, including relevant code examples, while respecting the configured size limit.
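The RRF step above can be illustrated with a small stdlib-only sketch (the document IDs and the `k=60` constant are illustrative; this is the standard RRF formula, not the server's actual code):

```python
def rrf_fuse(rankings, k=60):
    """Combine ranked result lists via Reciprocal Rank Fusion.

    rankings: list of ranked lists of document IDs, best first.
    Each document's fused score is sum(1 / (k + rank)) over every
    list it appears in, so items ranked highly by both the semantic
    and lexical searches rise to the top.
    """
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results from the two indexes:
semantic = ["guide_chunk", "class_AssistantAgent", "example_1"]   # ChromaDB
lexical = ["class_AssistantAgent", "example_1", "other_doc"]      # Whoosh
fused = rrf_fuse([semantic, lexical])
# "class_AssistantAgent" appears near the top of both lists, so it wins.
```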