A macOS developer toolkit for evaluating AI coding tools, chatting with AI models, and planning implementation work.
Run structured test cases against AI providers (Claude CLI and Codex) to measure how well they handle coding tasks. Define assertions — required text, file changes, command traces, and rubric-based quality checks — then inspect results with per-case grading details and saved artifacts. Compare providers side-by-side across suites of test cases.
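The assertion model described above can be sketched roughly as follows. This is a minimal illustration, not the app's actual schema: every type, field, and name here is hypothetical, and only the text and file-change assertion kinds are shown.

```python
from dataclasses import dataclass, field

@dataclass
class TestCase:
    """Hypothetical test case: a prompt plus assertions used to grade the result."""
    name: str
    prompt: str
    required_text: list[str] = field(default_factory=list)   # substrings the output must contain
    expected_files: list[str] = field(default_factory=list)  # files the provider should touch

@dataclass
class CaseResult:
    """What a provider run produces: its output text and the files it changed."""
    output: str
    changed_files: list[str]

def grade(case: TestCase, result: CaseResult) -> dict[str, bool]:
    """Return per-assertion pass/fail, the kind of detail a grading view might show."""
    checks: dict[str, bool] = {}
    for text in case.required_text:
        checks[f"contains:{text}"] = text in result.output
    for path in case.expected_files:
        checks[f"file:{path}"] = path in result.changed_files
    return checks

case = TestCase(
    name="add-logging",
    prompt="Add a log line to main()",
    required_text=["log"],
    expected_files=["main.swift"],
)
result = CaseResult(output="Added log call to main()", changed_files=["main.swift"])
print(grade(case, result))  # every assertion passes for this toy result
```

Side-by-side provider comparison then reduces to running the same `TestCase` list against each provider and tabulating the pass rates.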
Two chat modes in a toggleable panel:
- API Chat — Talk to Claude directly via the Anthropic API with streaming responses and persistent conversation history.
- Claude Code Chat — Interact with the Claude Code CLI, with session history, slash command autocomplete, image attachments, and message queuing.
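Slash-command autocomplete of the kind mentioned above typically amounts to prefix matching against a known command list. A minimal sketch, with an illustrative command set rather than the CLI's actual one:

```python
def autocomplete(prefix: str, commands: list[str]) -> list[str]:
    """Return the commands matching the typed prefix, sorted for a completion popup."""
    if not prefix.startswith("/"):
        return []  # only text beginning with "/" triggers command completion
    return sorted(c for c in commands if c.startswith(prefix))

COMMANDS = ["/help", "/clear", "/compact", "/cost"]  # illustrative, not exhaustive
print(autocomplete("/c", COMMANDS))  # ['/clear', '/compact', '/cost']
```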
Describe what you want to build in plain language and get a phased implementation plan. Execute phases one at a time with live progress tracking, completion checklists, and elapsed time monitoring. Plans are stored per repository and can be created, resumed, and managed from the app or CLI.