CAIT - Core AI Toolkit

A modular Model Context Protocol (MCP) server that extends AI assistants with practical capabilities: file I/O, a persistent Python REPL, AST-aware code analysis, semantic text search, document conversion, Wikipedia & arXiv tools, a persistent vector memory database, and other general utilities.

A total of 37 tools across 9 modules. Each module can be disabled independently via the CAIT_DISABLE environment variable. Made by AI for AI.

Requirements

Python 3.11+
Core: fastmcp, chromadb
Online research: wikipedia-api, arxiv
Document conversion: docling or markitdown
Scientific computing (optional, for REPL use): sympy, scipy, matplotlib, plotly, vispy

Installation

git clone https://github.com/JacobBruce/CAIT
cd CAIT
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install fastmcp chromadb wikipedia-api arxiv markitdown

For more robust document conversion, install docling instead of markitdown. Then copy copilot-instructions.md or CLAUDE.md into the correct location for your client (see Client Configuration below).

The instructions include general guidance for how to behave, how to use CAIT tools, and how to use the Firecrawl search tools. The instructions may need to be adapted to suit different setups.

If you are working in a Python environment you may want to make use of this agent prompt: python-coder.agent.md. There is also research-assistant.agent.md for deep research.

There is also a skill file called project-planning.md which is helpful for planning the implementation details of a project. The agent produces a PLAN.md file and TASKS.md file.

Environment Variables

Variable	Default	Description
`CAIT_FILES_PATH`	`~/.cait/files/`	Directory for downloaded files and document cache
`CAIT_DISABLE`	(empty)	Comma-separated module names to exclude at startup (e.g. `wiki,arxiv`)

Client Configuration

VS Code (GitHub Copilot)

Add to your workspace .vscode/mcp.json or user settings.json:

{
  "servers": {
    "bitfreak/cait": {
      "type": "stdio",
      "command": "/absolute/path/to/.venv/bin/python",
      "args": ["-m", "cait.server"],
      "cwd": "/absolute/path/to/CAIT"
    }
  }
}

For user settings.json, nest the above under "mcp": { ... }.

Copy copilot-instructions.md into your project's .github/ folder to give Copilot guidance on using CAIT tools effectively.

Claude Desktop

Edit claude_desktop_config.json:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Linux: ~/.config/claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "bitfreak/cait": {
      "command": "/absolute/path/to/.venv/bin/python",
      "args": ["-m", "cait.server"],
      "env": {
        "PYTHONPATH": "/absolute/path/to/CAIT"
      }
    }
  }
}

Claude Code

claude mcp add cait -e PYTHONPATH=/absolute/path/to/CAIT \
  -- /absolute/path/to/.venv/bin/python -m cait.server

Copy CLAUDE.md to your project root (or to ~/.claude/CLAUDE.md for global use) to give Claude Code guidance on using CAIT tools effectively.

Recommended MCP Servers

Firecrawl

Firecrawl is a web scraping and search API that pairs naturally with CAIT, adding powerful web search, full-page scraping, and site crawling. A free API key is available at firecrawl.dev.

Serena

Serena provides many tools for semantic code retrieval and editing. CAIT and Serena each include a similar memory system, so it is recommended to disable one of the two (for CAIT, add `memory` to `CAIT_DISABLE`).

Tool Reference

File System — `fs`

Tool	Description
`get_file_info`	Metadata for a single file: size, line count, permissions, timestamps. Does not read content.
`get_dir_info`	Directory listing with per-entry metadata. Supports glob patterns and recursion.
`append_file`	Append text to a file. Useful for NOTES.md, TASKS.md, log files.
`download_file`	Download a URL to `~/.cait/files/` (or `CAIT_FILES_PATH`). Returns the local path.
`fetch_url`	HTTP GET/POST with custom headers and body. Use `save_to` to avoid large responses in context. `convert=True` returns clean markdown via Docling or MarkItDown.

Persistent Python REPL — `repl`

Tool	Description
`repl_exec`	Execute Python code in a persistent session. Variables, imports, and function definitions survive between calls. Returns stdout, stderr, and exception info.
`repl_read`	Inspect a named variable from the REPL session without executing code. Returns repr, type, and JSON value.
`repl_vars`	List all user-defined variables in the current REPL session. Returns name, type, repr, and JSON value for each. Useful for reviewing session state without running code.
`repl_reset`	Clear all variables and imports from the REPL session.
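
The idea of a persistent session can be sketched with a namespace dictionary that is reused across `exec` calls. This is only an illustration of the concept, not CAIT's implementation; the `MiniRepl` class and its method names are hypothetical.

```python
import contextlib
import io
import traceback


class MiniRepl:
    """Hypothetical stand-in for a persistent REPL session (not CAIT's code)."""

    def __init__(self):
        self.namespace = {}  # survives between run() calls

    def run(self, code):
        out, err = io.StringIO(), io.StringIO()
        error = None
        with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
            try:
                exec(code, self.namespace)
            except Exception:
                error = traceback.format_exc()
        return {"stdout": out.getvalue(), "stderr": err.getvalue(), "exception": error}

    def vars(self):
        # Skip names such as __builtins__ that exec() adds on its own.
        return {k: type(v).__name__ for k, v in self.namespace.items() if not k.startswith("_")}

    def reset(self):
        self.namespace.clear()


repl = MiniRepl()
repl.run("import math\nradius = 2")
result = repl.run("print(round(math.pi * radius ** 2, 2))")
print(result["stdout"])  # 12.57
print(repl.vars())       # {'math': 'module', 'radius': 'int'}
```

The second call can use `math` and `radius` because both calls share one namespace; `reset()` discards it.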

Code Analysis — `code`

All code tools perform AST-aware search — they skip occurrences in comments and strings, unlike text grep.

Tool	Description
`find_definitions`	Find all definitions of a function, class, or variable. Returns file, line, docstring, and kind.
`find_calls`	Find all call sites of a function. Matches bare calls, method calls, and chained calls.
`find_imports`	Find all files that import a given module or name.
`find_references`	Find all uses of an identifier (loads, stores, deletes, attribute accesses).
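
To show why AST-aware search differs from text grep, here is a minimal sketch built on Python's standard `ast` module. It is not the code behind the tools above; the local `find_calls` function and the sample `SOURCE` are illustrative only, and this version matches bare and attribute calls by name.

```python
import ast

SOURCE = '''
def greet(name):
    return "hello " + name

# greet("comment")  <- comments never reach the AST
text = "greet('string')"
greet("world")
obj.greet("method")
'''


def find_calls(source, func_name):
    """Return the line numbers of calls to func_name (bare or attribute calls)."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            target = node.func
            name = target.id if isinstance(target, ast.Name) else getattr(target, "attr", None)
            if name == func_name:
                lines.append(node.lineno)
    return sorted(lines)


print(find_calls(SOURCE, "greet"))  # [7, 8]
```

A plain text search for `greet(` would also report lines 5 and 6 (the comment and the string literal), along with the definition on line 2.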

Text Search & Embeddings — `text`

Uses all-MiniLM-L6-v2 (bundled with ChromaDB — no separate download). Chunk embeddings are cached in memory so repeated queries on the same document skip re-embedding.

Tool	Description
`search_text`	Semantically search or summarize a text string or plain text file (`.txt`, `.md`, `.rst`). Query given → extract mode (most relevant chunks). Query empty → summarize mode (most representative chunks).
`encode_text`	Return raw 384-dimensional float embeddings for one or more strings or files.
`text_similarity`	Cosine similarity between two texts (0–1).
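
The two `search_text` modes can be pictured as two ways of ranking chunk embeddings by cosine similarity. The sketch below uses toy 3-dimensional vectors instead of real 384-dimensional embeddings, and ranking against the mean vector in summarize mode is an assumed heuristic, not a description of CAIT's internals. The `rank_chunks` helper is hypothetical.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def rank_chunks(chunks, query=None, top_k=2):
    """chunks maps a label to its embedding vector.

    With a query vector: rank by similarity to the query (extract mode).
    Without one: rank by similarity to the mean vector (assumed summarize heuristic).
    """
    if query is None:
        dims = len(next(iter(chunks.values())))
        query = [sum(v[i] for v in chunks.values()) / len(chunks) for i in range(dims)]
    ranked = sorted(chunks, key=lambda label: cosine(chunks[label], query), reverse=True)
    return ranked[:top_k]


chunks = {
    "install": [1.0, 0.0, 0.0],
    "config": [0.6, 0.8, 0.0],
    "license": [0.0, 0.6, 0.8],
}
print(rank_chunks(chunks, query=[0.0, 0.0, 1.0], top_k=1))  # ['license']
print(rank_chunks(chunks, top_k=1))                         # ['config']
```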

Document Tools — `document`

Tool	Description
`convert_doc`	Convert PDF, DOCX, PPTX, XLSX, HTML, LaTeX, images, audio, and more to markdown or plain text. Backends: `docling` (higher quality, layout-aware), `markitdown` (lighter, better for Office files), `auto` (tries docling, falls back to markitdown). Use `save_to` to write large outputs to a file. `strip_tables=True` removes noisy pipe-table syntax. `rich_pdf=True` enables Docling's code detection and formula extraction (slower).
`search_doc`	Same as `search_text` but handles many document formats (PDF, DOCX, HTML, URLs). Converts via `convert_doc` on first call and caches the result; repeat calls are instant.
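
The `auto` backend's "try one, fall back to the other" behaviour follows a common pattern, sketched below with stand-in converter functions. Nothing here calls the real docling or markitdown libraries; `convert_auto`, `fake_docling`, and `fake_markitdown` are hypothetical names used only to show the control flow.

```python
def convert_auto(path, backends):
    """Try each (name, converter) pair in order and return the first success."""
    errors = []
    for name, convert in backends:
        try:
            return name, convert(path)
        except Exception as exc:  # e.g. a missing dependency or unsupported format
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))


def fake_docling(path):
    # Stand-in that simulates the preferred backend being unavailable.
    raise ImportError("docling is not installed")


def fake_markitdown(path):
    # Stand-in that simulates a successful conversion.
    return f"# Converted {path}"


backend, markdown = convert_auto(
    "report.docx",
    [("docling", fake_docling), ("markitdown", fake_markitdown)],
)
print(backend)   # markitdown
print(markdown)  # # Converted report.docx
```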

Wikipedia — `wiki`

Tool	Description
`wiki_search`	Search Wikipedia. Returns titles, snippets, word counts, and URLs.
`wiki_sections`	List all sections of a page as a table of contents (no text).
`wiki_section`	Get the text of a specific section. Use `wiki_sections` first to find section titles.
`wiki_page`	Get full page text or just the summary (`summary_only=True`). Supports non-English via `language` parameter.

arXiv — `arxiv`

Tool	Description
`arxiv_search`	Search arXiv. Supports field prefixes (`ti:`, `au:`, `abs:`, `cat:`) and boolean operators. Returns metadata for up to 100 papers.
`arxiv_paper`	Fetch a paper by ID. `full_text=False` (default) returns abstract + metadata. `full_text=True` downloads and converts the full PDF. Use `save_to` for large outputs.
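
The field prefixes and boolean operators combine into a single query string. The helper below is a small sketch of how such a string might be assembled before passing it to `arxiv_search`; `build_query` is a hypothetical name, and the tool's own parameter names are not shown here because this document does not list them.

```python
def build_query(title=None, author=None, category=None):
    """Hypothetical helper: join arXiv field-prefixed terms with AND."""
    parts = []
    if title:
        parts.append(f'ti:"{title}"')
    if author:
        parts.append(f'au:"{author}"')
    if category:
        parts.append(f"cat:{category}")
    return " AND ".join(parts)


query = build_query(title="graph neural networks", category="cs.LG")
print(query)  # ti:"graph neural networks" AND cat:cs.LG
```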

Datetime & Utilities — `utils`

Tool	Description
`get_datetime`	Current date, time, timezone, UTC offset, weekday, and Unix timestamp. Accepts any IANA timezone name.
`timer_start`	Start a named wall-clock timer.
`timer_stop`	Stop a timer and return elapsed seconds.
`timer_list`	List all running timers and their current elapsed time.
`diff_text`	Unified diff between two strings or files. Returns diff text plus added/removed line counts.
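
A unified diff with added/removed line counts, as described for `diff_text`, can be approximated with the standard library. This sketch uses `difflib.unified_diff` and is not CAIT's implementation; the function name `unified_diff_with_counts` and the shape of the returned dictionary are assumptions.

```python
import difflib


def unified_diff_with_counts(old, new):
    """Return the unified diff of two strings plus added/removed line counts."""
    diff = list(difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile="old", tofile="new", lineterm="",
    ))
    body = diff[2:]  # skip the '--- old' / '+++ new' header lines
    added = sum(1 for line in body if line.startswith("+"))
    removed = sum(1 for line in body if line.startswith("-"))
    return {"diff": "\n".join(diff), "added": added, "removed": removed}


result = unified_diff_with_counts("a\nb\nc", "a\nB\nc\nd")
print(result["diff"])
print(result["added"], result["removed"])  # 2 1
```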

Memory Database — `memory`

Persistent ChromaDB vector store at ~/.cait/files/ (shared across projects). Content is embedded with all-MiniLM-L6-v2 for semantic retrieval.

Tool	Description
`mem_add`	Add a new entry. Fields: `title`, `content` (embedded), `tags`, `description`, `source`, `entry_id`.
`mem_search`	Find entries by semantic similarity to a query. Optionally filter by tags.
`mem_get`	Retrieve a full entry by ID.
`mem_list`	List entries sorted by date (newest first). Content omitted for brevity.
`mem_set`	Update fields of an existing entry. Only non-empty values are applied.
`mem_edit`	Edit content in place: regex replace when a pattern is given, append otherwise.
`mem_delete`	Permanently delete an entry by ID.
`mem_find`	Fast metadata scan — no embedding. Match by title substring, exact source URL, or tags. Use this for deduplication checks before `mem_add`.
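
The two behaviours of `mem_edit` (regex replace versus append) can be sketched in a few lines. This is an illustration only; `edit_content` and its parameter names are hypothetical and do not reflect the tool's actual signature.

```python
import re


def edit_content(content, new_text, pattern=""):
    """Hypothetical helper: regex-replace when a pattern is given, else append."""
    if pattern:
        return re.sub(pattern, new_text, content)
    return content + new_text


note = "Status: draft"
note = edit_content(note, "final", pattern=r"draft")  # replace mode
note = edit_content(note, "\nReviewed.")              # append mode
print(note)
```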

Disabling Modules

Set CAIT_DISABLE to a comma-separated list of module names to exclude their tools at startup:

CAIT_DISABLE=wiki,arxiv python -m cait.server

Available module names: fs, text, code, repl, wiki, arxiv, utils, memory, document
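
For illustration, a comma-separated `CAIT_DISABLE` value could be turned into a list of enabled modules as follows. This is a sketch rather than the server's actual startup code; `enabled_modules` is a hypothetical name, and tolerating whitespace and uppercase names is an assumption.

```python
import os

# Module names as listed above.
ALL_MODULES = ["fs", "text", "code", "repl", "wiki", "arxiv", "utils", "memory", "document"]


def enabled_modules(env=None):
    """Hypothetical helper: return the modules not named in CAIT_DISABLE."""
    env = os.environ if env is None else env
    raw = env.get("CAIT_DISABLE", "")
    # Assumption: surrounding whitespace and letter case are ignored.
    disabled = {name.strip().lower() for name in raw.split(",") if name.strip()}
    return [module for module in ALL_MODULES if module not in disabled]


print(enabled_modules({"CAIT_DISABLE": "wiki,arxiv"}))
# ['fs', 'text', 'code', 'repl', 'utils', 'memory', 'document']
```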
