sseshachala/maggy

Maggy

From Claude Bootstrap to autonomous AI engineering.

🤖 AI agents (Claude Code, Codex, Gemini CLI): Read GETSTARTED.md first. Architecture, routing table, commands, conventions: everything you need in 2 minutes.

This project started as Claude Bootstrap, an opinionated set of skills, hooks, and rules for Claude Code. Over time it grew into something much bigger: a multi-model routing system, a persistent memory layer, an intent-tracking code graph, container-based orchestration, and a full engineering command center. The bootstrap scaffolding is still here, but the future of this project is Maggy, an autonomous engineering system that routes work across AI models, learns from outcomes, and manages the full development lifecycle.

We ship mWP (minimum wowable product, 5-7 on the 11-star scale), not MVP. Every feature should make you think "I need this", not just "it works."

62 skills, TDD enforcement via Stop hooks, agent teams, persistent memory (Mnemos), intent tracking (iCPG), and a multi-model AI command center. Works with Claude Code, Kimi CLI, and OpenAI Codex CLI.

Quick Start

git clone https://github.com/alinaqi/maggy.git
cd maggy && ./install.sh

# In any project directory
claude
> /initialize-project

Claude will validate tools, ask about your stack, create the repo structure, copy skills/rules/hooks, and spawn an agent team.

Maggy β€” Autonomous Engineering System

Maggy is the core of this project. It routes tasks across models, tracks performance, learns from outcomes, and manages the full development lifecycle from a local dashboard or CLI REPL.

  • Multi-model routing: semantic blast scoring routes tasks across Claude/Codex/Kimi/Ollama based on complexity, cost, and proven performance
  • Task blueprints: self-learning workflows; Maggy captures tool sequences from successful tasks and replays them with cheaper models
  • Chat: interactive sessions with markdown rendering, streaming, session persistence, and file upload
  • Execute: one-click TDD pipeline with iCPG context enrichment
  • Tasks: AI-prioritized inbox from GitHub Issues or Asana
  • Competitors: auto-discovered competitors + daily AI briefing
  • Insights: CLI session analysis, health signals, reviewer evaluation
  • Reviewer knowledge map: tracks which reviewer (CodeRabbit, Codex, local) is best at which finding category

cd maggy && pip install -e .
maggy serve   # dashboard at localhost:8080
maggy         # CLI REPL (runs from any project directory)

See maggy/README.md for setup and routing details.

Bootstrap Layer

The original scaffolding that sets up any project for AI-assisted development:

| Layer | What | Why |
|---|---|---|
| Skills | 62 skills loaded via @include in CLAUDE.md | Language, framework, security, AI patterns |
| Rules | Conditional rules (activate by file path) | Quality gates, TDD workflow, security, only when relevant |
| Hooks | Stop hooks for TDD loops | Tests run after every Claude response, failures feed back automatically |
| Agents | Team Lead + Quality + Security + Review + Merger + Feature | Coordinated pipeline: spec → test → implement → review → PR |
| Memory | Mnemos (typed graph on disk) | Survives compaction, crashes, restarts |
| Intent | iCPG (code property graph) | Tracks why code exists, detects drift |
| Explore | iCPG-powered code explorer | trace_path, search_graph, query_graph instead of grep |
| Routing | Plan-vs-execute classifier | CLAUDE tier → PLAN FIRST. DEEPSEEK/GEMINI → EXECUTE DIRECTLY |
| Plugins | Event-driven plugin system | Drop folder into ~/.maggy/plugins/, auto-discovered on startup |

Plugin System

Maggy has an mWP-first plugin architecture. Drop a folder with plugin.yaml + plugin.py into ~/.maggy/plugins/ or plugins/ and it is auto-discovered and loaded at startup. Works standalone with Claude Bootstrap (no Maggy server needed).

# plugin.yaml
id: my-plugin
version: 1
entrypoint: plugin.py
hooks:
  - event: on_pr_merged
    handler: handle_pr_merged
  - event: on_feature_shipped
    handler: handle_feature_shipped
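
A matching plugin.py could look like the sketch below. The handler names come from the plugin.yaml above. The single `event` dict argument and its keys (`title`, `name`) are assumptions made for illustration, not Maggy's documented handler signature, so check the plugin loader before relying on them.

```python
# plugin.py - hypothetical handlers for the plugin.yaml above.
# Assumption: each handler receives one dict describing the event.
from typing import Any


def handle_pr_merged(event: dict[str, Any]) -> str:
    """Build a one-line summary for a merged PR."""
    title = event.get("title", "untitled PR")  # "title" is an assumed key
    return f"Merged: {title}"


def handle_feature_shipped(event: dict[str, Any]) -> str:
    """Build a one-line summary for a shipped feature."""
    name = event.get("name", "unnamed feature")  # "name" is an assumed key
    return f"Shipped: {name}"
```

Keeping handlers as plain functions that return a value makes them easy to unit test without a running Maggy server.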

First plugin: Build-in-Public, an autonomous storyteller that notices your work, synthesizes a narrative, and publishes across channels without you asking.

PR merged → AI extracts narrative arc → anonymizes sensitive names
→ formats per channel (LinkedIn teaches, X punches)
→ schedules via Buffer API

  • Multi-channel: LinkedIn (professional deep dives) + X (sharp one-liners), with a different voice per platform
  • Auto-redaction: anonymize.yaml replaces company names, strips revenue/user data
  • AI-powered: DeepSeek synthesizes the story instead of filling templates
  • Zero-click: Triggers from hooks, never asks for manual approval

See skills/build-in-public/SKILL.md for channel best practices.

Skills (62)

Core: TDD, memory, intent tracking, code review, agent teams, security, commit hygiene, cross-agent delegation, Polyphony orchestration

Languages: Python, TypeScript, Node.js, React, React Native, Android (Java/Kotlin), Flutter

Databases: Supabase, Firebase, Cloudflare D1, DynamoDB, Aurora, Cosmos DB

AI: Agentic development, LLM patterns, AI models reference

UI: Web (Tailwind), mobile, visual testing, Playwright, PWA

Integrations: Stripe, Reddit, Shopify, WooCommerce, Medusa, Klaviyo, Teams, PostHog

See full skills catalog for details.

Cross-Tool Compatibility

| Feature | Claude Code | Kimi CLI | Codex CLI | DeepSeek V4 |
|---|---|---|---|---|
| Skills | .claude/skills/ | .kimi/skills/ | .codex/skills/ | via Claude Code |
| Instructions | CLAUDE.md | (uses skills) | AGENTS.md | via Claude Code |
| Memory | 9-section XML summary | None | Encrypted blob / text summary | Mnemos typed graph |
| Routing | Manual | Manual | Manual | 6-tier auto-routing |

install.sh auto-detects installed tools. /sync-agents syncs config across tools on demand.

Memory: Mnemos vs. Codex vs. Claude Code

Every AI coding tool loses context on compaction. The difference is whether it prevents failure or just reacts to it.

Codex compaction is an opaque encrypted blob triggered by a single token counter. When it misfires, the agent enters documented "death spirals": up to 26 compactions per session, re-reading the same files 10-20×, burning 160M+ tokens on work that used to cost 89M. No telemetry surfaces why compaction fired. No memory survives the session.

Claude Code uses 9-section XML summarization at a hardcoded ~95% token threshold. The summary is opaque to the user, discard decisions are invisible, and critical context (active errors, file contents) is silently dropped. No cross-session recall, no team context, no signal that the agent is struggling before the summary happens.

Maggy Mnemos treats memory as a typed graph where goals and constraints are never evicted, while ephemeral context decays by relevance. A 4-dimension fatigue model (token pressure, scope scatter, reread ratio, error density) triggers consolidation early, in the COMPRESS state at 40-60% load, long before a death spiral. Mnemos measures re-read ratio explicitly because it is the leading indicator of a compaction death spiral. When the agent starts re-reading files it already read, fatigue rises and consolidation triggers before the context window is full. Every eviction decision is auditable in SQLite. Cross-session memory via Engram.

| | Codex | Claude Code | Maggy (Mnemos) |
|---|---|---|---|
| Compaction trigger | Configurable token threshold, blind to workload | Hardcoded ~95% token threshold, blind | 4-dimension fatigue score: token-aware but not token-blind |
| What survives | Opaque AES-encrypted blob (both paths) | 9-section XML summary | Typed memory nodes with per-type eviction (goals/constraints never evicted) |
| Transparency | Zero: cannot audit the summary | Readable but discard decisions invisible | Fully auditable: SQLite + JSONL, every node and eviction on disk |
| Death spiral prevention | None: known to compact for hours | None: no pre-failure signal | Re-read ratio + fatigue scoring triggers consolidation at 40-60%, before the window is full |
| Cross-session memory | None | None | Engram store: typed, queryable, persists across sessions |
| Pre-compaction safety | None: compacts reactively | None | Checkpoint written before compaction, so critical nodes survive even if compaction fails |
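
To make the fatigue model concrete, here is a minimal sketch of how four normalized signals could be combined into a score and mapped to a state. The equal weighting, the `FatigueSignals` and `fatigue_state` names, and the "NORMAL" state are assumptions for illustration. Only the four dimensions and the 0.40 / 0.60 / 0.75 thresholds come from this README, and treating "40-60% load" as a fatigue score between 0.40 and 0.60 is itself an interpretation.

```python
from dataclasses import dataclass


@dataclass
class FatigueSignals:
    """The four dimensions named above, each normalized to 0.0-1.0."""
    token_pressure: float
    scope_scatter: float
    reread_ratio: float
    error_density: float


def reread_ratio(reads: list[str]) -> float:
    """Fraction of file reads that repeat a file already read."""
    if not reads:
        return 0.0
    return 1 - len(set(reads)) / len(reads)


def fatigue_score(s: FatigueSignals) -> float:
    # Assumption: equal weights. The real weighting is not documented here.
    parts = (s.token_pressure, s.scope_scatter, s.reread_ratio, s.error_density)
    return sum(parts) / len(parts)


def fatigue_state(score: float) -> str:
    # Thresholds from this README: COMPRESS from 0.40, PRE_SLEEP at 0.60,
    # REM at 0.75. "NORMAL" is a placeholder name for everything below.
    if score >= 0.75:
        return "REM"
    if score >= 0.60:
        return "PRE_SLEEP"
    if score >= 0.40:
        return "COMPRESS"
    return "NORMAL"
```

For example, an agent that has read `a.py` three times and `b.py` once has a re-read ratio of 0.5, which already pushes the score up before the context window fills.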

Routing: Maggy vs. the Landscape

Every AI tool claims to pick the right model. Here's how they actually compare:

| | OpenRouter | Martian | Portkey | Semantic Router | Maggy |
|---|---|---|---|---|---|
| Approach | Performance-based, user-defined fallbacks | LLM-as-Classifier, trained router model | Gateway: retries, load balancing, rule-based | Embedding similarity, pre-defined routes | LLM-as-Classifier with cascading fallback |
| Classification cost | None (user picks) | API call (~$0.001) | None (rule-based) | None (embeddings) | $0 (local qwen3) |
| Classifier resilience | N/A | Single point of failure | N/A | N/A | Cascade: qwen3 → kimi → deepseek → cache |
| Fatigue-aware | No | No | No | No | Yes: 4-dimension fatigue, PRE_SLEEP/REM escalation |
| Mid-task switching | No | No | No | No | Checkpoint-based state transfer (in progress) |
| Memory-aware | Token count only | No | Token count only | No | Semantic: typed nodes, per-type eviction, re-read ratio |
| Self-learning | No | No | No | No | Per-project routing profiles with success/failure tracking |

Three things only Maggy has:

  1. Fatigue-aware routing: nobody routes based on agent state. When Mnemos detects PRE_SLEEP (0.60), Maggy skips cheap tiers. At REM (0.75), it forces premium models. OpenRouter can't do this. Martian can't. No paper proposes it.

  2. Cascading classifier resilience: every other router has a single point of failure. If Martian's classifier is down, routing stops. Maggy cascades through qwen3 → kimi → deepseek-flash → cached tier. The classifier itself is multi-model.

  3. Semantic memory, not token counting: Portkey checks token_count > 8000 to switch context windows. Maggy tracks what KIND of memory matters: goals survive compaction, error traces decay, code-refs persist. Routes based on semantic importance, not a counter.
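
The first two points can be sketched together. The code below is an illustration under stated assumptions, not Maggy's router: the tier names in `TIERS`, the `Classifier` callable type, and both function names are hypothetical. The cascade order and the 0.60 / 0.75 thresholds are taken from the list above.

```python
from typing import Callable, Optional

# A classifier takes a task description and returns a tier name,
# or raises if the model behind it is unavailable.
Classifier = Callable[[str], str]

TIERS = ["cheap", "standard", "premium"]  # hypothetical tier names


def classify_with_cascade(
    task: str,
    classifiers: list[tuple[str, Classifier]],
    cached_tier: Optional[str] = None,
) -> tuple[str, str]:
    """Try each classifier in order; fall back to the cached tier."""
    for name, classify in classifiers:
        try:
            return classify(task), name
        except Exception:
            continue  # this classifier is down, try the next one
    if cached_tier is not None:
        return cached_tier, "cache"
    raise RuntimeError("no classifier available and no cached tier")


def apply_fatigue_floor(tier: str, fatigue: float) -> str:
    """Raise the tier when the agent is fatigued."""
    if fatigue >= 0.75:  # REM: force premium
        return "premium"
    if fatigue >= 0.60:  # PRE_SLEEP: skip the cheap tier
        return max(tier, "standard", key=TIERS.index)
    return tier
```

In use, the list passed as `classifiers` would hold one entry per model in the cascade (qwen3, kimi, deepseek-flash), and the result of `classify_with_cascade` would be passed through `apply_fatigue_floor` before a model is chosen.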

Core Concepts

TDD via Stop Hooks: tests run after every Claude response. Failures feed back automatically. No plugins needed. Details →
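
As a rough illustration of the idea, a Stop hook can be a small script that runs the test suite and refuses to let the agent stop while tests fail. The sketch below is not the hook shipped in this repo. It assumes pytest as the test runner and relies on Claude Code's hook convention that exit code 2 blocks the stop and passes stderr back to the model; `decide`, `main`, and `MAX_CHARS` are hypothetical names.

```python
import subprocess
import sys

MAX_CHARS = 4000  # keep the feedback short; arbitrary limit


def decide(returncode: int, output: str) -> tuple[int, str]:
    """Map a test run to (hook exit code, message for stderr)."""
    if returncode == 0:
        return 0, ""
    # Exit code 2 asks Claude Code to keep going and read stderr.
    return 2, "Tests failed, fix them before stopping:\n" + output[-MAX_CHARS:]


def main() -> int:
    # Assumption: the project uses pytest; swap in your own test command.
    run = subprocess.run(
        [sys.executable, "-m", "pytest", "-q"],
        capture_output=True,
        text=True,
    )
    code, message = decide(run.returncode, run.stdout + run.stderr)
    if message:
        print(message, file=sys.stderr)
    return code
```

To wire this up as a hook script, end the file with `sys.exit(main())`. Keeping the decision in `decide` means the pass/fail logic can be tested without running a real test suite.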

Mnemos Memory: typed graph on disk (goals, constraints, results, context). Survives compaction, crashes, multi-agent failures. 4-dimension fatigue model writes checkpoints before things go wrong. Details →
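
Per-type eviction can be pictured with a few lines of code. This is a simplified sketch, not the Mnemos implementation: `MemoryNode`, `evict`, and the `relevance` field are hypothetical, and a flat list stands in for the on-disk graph. The node types and the rule that goals and constraints are never evicted come from the description above.

```python
from dataclasses import dataclass

# Node types named in the text. Goals and constraints are never evicted.
PROTECTED_TYPES = {"goal", "constraint"}


@dataclass
class MemoryNode:
    node_type: str    # "goal", "constraint", "result", or "context"
    text: str
    relevance: float  # 0.0-1.0; assumed to be updated elsewhere as it decays


def evict(
    nodes: list[MemoryNode], keep: int
) -> tuple[list[MemoryNode], list[MemoryNode]]:
    """Return (kept, evicted), dropping the least relevant evictable nodes first."""
    protected = [n for n in nodes if n.node_type in PROTECTED_TYPES]
    evictable = sorted(
        (n for n in nodes if n.node_type not in PROTECTED_TYPES),
        key=lambda n: n.relevance,
        reverse=True,
    )
    room = max(keep - len(protected), 0)
    return protected + evictable[:room], evictable[room:]
```

Returning the evicted nodes alongside the kept ones lets a caller log every eviction, which is what makes the decisions auditable.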

iCPG Intent Tracking: links every code change to a ReasonNode with intent, postconditions, and invariants. 6-dimension drift detection. Details →

Agent Teams: 6 agents with enforced pipeline (spec → test → implement → review → security → PR). Only Feature agents can edit code. Details →

Usage

# New project
mkdir my-app && cd my-app
claude
> /initialize-project

# Existing project
cd my-existing-app
claude
> /initialize-project    # auto-detects existing code

# Update skills globally
cd "$(cat ~/.claude/.bootstrap-dir)"
git pull && ./install.sh

Docs

License

MIT. See LICENSE


Need help scaling AI in your org? Claude Code & MCP experts
