The sub-etha MCP is a Retrieval-Augmented Generation (RAG) server designed to provide up-to-date API documentation for fast-moving projects, focused on Microsoft's AutoGen framework.
Large Language Models (LLMs) are often trained on public data that can be months or years out of date. For a rapidly evolving project like AutoGen, this means a model's internal knowledge is frequently behind the latest stable release. This server solves that by providing a current, searchable documentation source that can be queried through a standard MCP interface.
This server depends on a custom extension to AutoGen’s Sphinx build process to produce the structured JSON output used by sub-etha MCP.
To build with the required extension, clone and build AutoGen from this branch:
git clone https://github.com/fortmort/autogen.git
cd autogen
git checkout feature/sphinx-json-extension

Then follow the AutoGen project’s existing Sphinx documentation build instructions to generate the JSON export.
After a successful Sphinx build that includes the patched extension, the structured documentation file will be available at:
autogen/python/docs/build/api_data.json
Point sub-etha MCP at this file via the API_DATA_FILE environment variable.
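As a minimal sketch of how the server might resolve this setting (the `API_DATA_FILE` variable name and default path are taken from this README; the helper function name is illustrative, not the server's actual code):

```python
import os
from pathlib import Path

def resolve_api_data_file():
    """Return the path to api_data.json, preferring the API_DATA_FILE env var.

    Falls back to the default location produced by the patched Sphinx build.
    """
    default = "autogen/python/docs/build/api_data.json"
    return Path(os.environ.get("API_DATA_FILE", default))
```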
The server employs a hybrid search system to deliver the most relevant documentation for a user's query. The process consists of two main phases: data ingestion and query processing.
The server's knowledge comes from a structured JSON file, api_data.json.
- Source data: Generated by the patched Sphinx documentation build of the AutoGen project. It extracts class definitions, method signatures, parameter descriptions, and code examples into a clean, structured format.
- Automated indexing: On its first run, the server processes api_data.json and splits the documentation into semantic chunks (e.g., a class description is one chunk, and each of its examples is a separate chunk).
- Dual indexes: Chunks are ingested into two different search indexes to enable hybrid search:
- ChromaDB (vector/semantic search) for understanding meaning and intent.
- Whoosh (lexical/keyword search) for matching specific terms, class names, and function names.
- Data versioning: On startup, the server computes a checksum of api_data.json. If it has changed since the last run, the server purges the old indexes and re-ingests the data automatically.
When a query arrives, the server runs a multi-stage pipeline:
- Relevance gate: A quick semantic check against the vector index rejects clearly unrelated queries.
- Hybrid search: If relevant, the query runs against both the semantic (ChromaDB) and lexical (Whoosh) indexes.
- Reciprocal Rank Fusion (RRF): Results from both searches are combined; documents ranked highly by both receive a significant boost for improved accuracy.
- Context assembly: The server assembles a clean, readable context string from the top-ranked documentation objects, including relevant code examples, while respecting the configured size limit.
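The RRF step above can be illustrated with a small stdlib-only sketch (the document IDs and the `k=60` constant are illustrative; this is the standard RRF formula, not the server's actual code):

```python
def rrf_fuse(rankings, k=60):
    """Combine ranked result lists via Reciprocal Rank Fusion.

    rankings: list of ranked lists of document IDs, best first.
    Each document's fused score is sum(1 / (k + rank)) over every
    list it appears in, so items ranked highly by both the semantic
    and lexical searches rise to the top.
    """
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results from the two indexes:
semantic = ["guide_chunk", "class_AssistantAgent", "example_1"]   # ChromaDB
lexical = ["class_AssistantAgent", "example_1", "other_doc"]      # Whoosh
fused = rrf_fuse([semantic, lexical])
# "class_AssistantAgent" appears near the top of both lists, so it wins.
```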