femto

Socratic pre-emission gating layer for AI coding agents.

A Claude Code plugin that forces the host agent to demonstrate domain-specific mastery, via a Socratic probe graded by a separate model, before it is permitted to emit code against the gated domain. Three layers work together: a skill for content, an MCP server for the probe, and a PreToolUse hook for deterministic enforcement.

Install

/plugin marketplace add patrickmvla/femto
/plugin install femto@femto-marketplace

Opt in per repo by committing a .femto/config.yaml. Disabling femto on a repo requires a committed config change, which is visible in code review, so a pressured engineer cannot silently skip the gate.

Who it's for

Work where the stakes justify the friction: multi-tenancy, auth, data handling, safety-adjacent systems, onboarding engineers to domains they haven't seen before.

For compliance-grade work (HIPAA / PCI / SOC2 / NIST SP800 controls / internal engineering policies) femto composes with packs you author in your own repo under <repo>/.femto/packs/*/ — same schema, same structural guarantees, your content, your trust boundary. See "Three content tiers" on the Concepts page for the design.

Explicitly not for high-time-pressure emergencies: under that pressure the incentive is to disconnect the plugin, and the org will not mandate the tool.

The commitment — strong with transparency

Femto does not claim guaranteed understanding. It commits to:

  • No designed-in bypass. No skip-pedagogy toggle, no "I'm an expert" mode.
  • Stacked grading. The probe and grader run in separate contexts on different models; the hook reads grader.md from disk rather than trusting the probe's own summary.
  • Measured, published per-domain false-pass rate. Every shipped pack has a harness, test cases, and numbers (AUC / FPR / TPR) you can inspect and replicate. Disclosure labels sit next to every number.
  • No "guaranteed understanding" language. Bounded probability, not absolute.
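
To make the published numbers concrete, here is a minimal sketch of how a false-pass rate (FPR) and true-pass rate (TPR) could be computed from labeled harness cases. The `Case` shape and the `rates` function are illustrative assumptions, not femto's harness API, and the data is invented purely for the arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    """One labeled calibration case (hypothetical shape, not femto's schema)."""
    has_mastery: bool   # ground-truth label for the probe transcript
    passed_gate: bool   # whether the grader's verdict cleared the gate


def rates(cases: list[Case]) -> tuple[float, float]:
    """Return (false_pass_rate, true_pass_rate) over the labeled cases."""
    negatives = [c for c in cases if not c.has_mastery]
    positives = [c for c in cases if c.has_mastery]
    if not negatives or not positives:
        raise ValueError("need at least one case of each label")
    # FPR: share of no-mastery transcripts that were wrongly passed.
    fpr = sum(c.passed_gate for c in negatives) / len(negatives)
    # TPR: share of real-mastery transcripts that were correctly passed.
    tpr = sum(c.passed_gate for c in positives) / len(positives)
    return fpr, tpr


# Invented example data: 4 mastery cases (3 pass), 4 no-mastery cases (1 passes).
cases = [
    Case(True, True), Case(True, True), Case(True, False), Case(True, True),
    Case(False, False), Case(False, True), Case(False, False), Case(False, False),
]
fpr, tpr = rates(cases)
print(f"FPR={fpr:.2f} TPR={tpr:.2f}")  # prints "FPR=0.25 TPR=0.75"
```

A false pass is the costly error for a gate like this, which is why the FPR is the headline number: it bounds how often code is let through without demonstrated mastery.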

How one session works

  1. Engineer describes the problem.
  2. Main agent recognises the domain, calls femto_start_session.
  3. Reader subagent (Haiku) writes the pack's required reads with inline citations.
  4. Probe (MCP server) loops over required KCs until every one hits its turn minimum.
  5. Grader subagent (Opus, separate context) scores the serialized probe log and writes grader.md with per-turn correctness + per-KC mastery.
  6. PreToolUse hook reads grader.md, checks per-KC mastery against the pack threshold, and either allows Edit/Write/Bash or blocks with a structured message.
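
The decision in step 6 can be sketched as a pure function. This is a minimal illustration under stated assumptions: the real hook is bash with inline python3 and parses grader.md, whose format is not shown here, so the `mastery` mapping, the `Decision` type, the KC names, and the 0.8 threshold are all hypothetical. Treating a KC that is missing from the grader output as failing is likewise an assumption, chosen to match the hook's fail-closed posture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    allow: bool
    failing_kcs: tuple[str, ...]  # KCs to name in the block message


def gate(mastery: dict[str, float], required_kcs: list[str], threshold: float) -> Decision:
    """Allow only if every required KC meets the pack threshold.

    Assumption: a required KC absent from `mastery` fails (fail closed).
    """
    failing = tuple(
        kc for kc in required_kcs
        if kc not in mastery or mastery[kc] < threshold
    )
    return Decision(allow=not failing, failing_kcs=failing)


# Hypothetical per-KC scores, as if already parsed from grader.md.
required = ["row-level-security", "tenant-scoping"]
decision = gate({"row-level-security": 0.9, "tenant-scoping": 0.6}, required, threshold=0.8)
print(decision.allow)        # prints "False"
print(decision.failing_kcs)  # prints "('tenant-scoping',)"
```

Keeping the check a deterministic comparison over data read from disk is what lets the hook enforce the gate without trusting anything the probe reports about itself.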

Session state lives at .femto/session-<id>/. Gitignored by template; reaped after 30 days by default.
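
As a rough illustration of the retention rule, the sketch below lists session directories older than the retention window. It assumes age is measured by directory modification time, and `expired_sessions` is a hypothetical helper, not part of femto; how the plugin itself decides what to reap is not documented here.

```python
import time
from pathlib import Path
from typing import Optional


def expired_sessions(
    femto_dir: Path,
    retention_days: int = 30,
    now: Optional[float] = None,
) -> list[Path]:
    """Return session-* directories whose mtime is older than the retention window."""
    now = time.time() if now is None else now
    cutoff = now - retention_days * 86400  # seconds per day
    return sorted(
        p for p in femto_dir.glob("session-*")
        if p.is_dir() and p.stat().st_mtime < cutoff
    )
```

Calling `expired_sessions(Path(".femto"))` from a repo root would return the candidates for deletion; the function only reports them and removes nothing.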

Repository layout

.claude-plugin/
  plugin.json            Plugin manifest (userConfig: solo_dev_port, retention, idle timeout)
  marketplace.json       Marketplace listing for /plugin install
.mcp.json                MCP stdio server declaration with ${user_config.*} substitution
agents/
  femto-grader/          Grader subagent (model: opus, tools: Read Write)
  femto-reader/          Reader subagent (model: haiku, tools: Read Grep Glob Write)
skills/
  femto-status/          /femto:status diagnostic
  femto-retry-reads/     /femto:retry-reads recovery
  femto-retry-probe/     /femto:retry-probe recovery
hooks/
  hooks.json             Hook manifest (SessionStart / PreToolUse / SessionEnd)
  session-start.sh       CC-session-id binding bootstrap
  pre-tool-use.sh        The gate (bash + inline python3, 60s fail-closed)
  session-end.sh         Cleanup
packages/
  pack-schema/           Zod schemas + femto-validate-pack CLI
  mcp-server/            6 femto_* tools, session lifecycle, solo-dev HTTP (127.0.0.1 only)
  harness/               Calibration via `claude -p` grader → AUC/FPR/TPR
  site/                  6-page Astro site (public metrics dashboard)
  hook-tests/            End-to-end hook integration tests
packs/
  multi-tenancy/         Reference pack (alpha, row-level-security + tenant-scoping)
templates/
  femto.gitignore        Ship this in adopter repos to gitignore .femto/
docs/
  calibration.md         `claude setup-token` flow + 3 mandatory disclosure conditions
scripts/
  bump-version.sh        Keep plugin.json / marketplace.json / .mcp.json versions in sync

Developing

Requires: Bun (latest), Claude Code CLI, and CLAUDE_CODE_OAUTH_TOKEN set in .env for harness calibration runs.

bun install
bun run --filter '*' test       # 206 tests: pack-schema 15, mcp-server 110,
                                # hook-tests 13, harness 56, site 12
bun run --filter '*' typecheck

Site dev server:

cd packages/site && bun run dev

Harness calibration against the reference pack:

bun run packages/harness/src/cli.ts run packs/multi-tenancy

See docs/calibration.md for the token setup and the three mandatory disclosure conditions.

Contributing

Two paths — pick the one that matches where your pack will live.

Shipped packs: PR to packs/*/ in this repo. These are operator-curated and harness-calibrated, with reliability published on the site's /metrics page. The model follows the OWASP CheatSheetSeries precedent: editor-moderated, PR-driven, citation-required. Every claim bullet needs an inline link to an authoritative source; unsourced assertions fail schema validation.

Org-local packs: author in your own repo under <repo>/.femto/packs/*/. Same schema (@femto/pack-schema validates both), same structural enforcement, adopter-owned trust boundary. Compliance-grade rules live here. These packs do not appear on the hosted /metrics page, so reliability is the adopter's responsibility.
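
The citation requirement can be approximated before opening a PR with a quick local check. This is only a sketch: the actual validation lives in the Zod schemas and the femto-validate-pack CLI, and both the `unsourced_claims` helper and the assumption that claims are Markdown bullets with inline `[text](url)` links are hypothetical.

```python
import re

# Matches an inline Markdown link with an http(s) target, e.g. [docs](https://example.org/page).
INLINE_LINK = re.compile(r"\[[^\]]+\]\(https?://[^\s)]+\)")


def unsourced_claims(claims: list[str]) -> list[str]:
    """Return the claim bullets that carry no inline link."""
    return [c for c in claims if not INLINE_LINK.search(c)]


claims = [
    "Row security policies restrict rows per user "
    "([PostgreSQL docs](https://www.postgresql.org/docs/current/ddl-rowsecurity.html)).",
    "Tenants never share rows.",
]
print(unsourced_claims(claims))  # prints "['Tenants never share rows.']"
```

A check like this catches only a missing link; whether the linked source is authoritative still depends on editorial review.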

Full authoring guide: the Contributing page on the site.

Release flow

scripts/bump-version.sh 0.2.0   # updates plugin.json + marketplace.json + .mcp.json
git add .claude-plugin/*.json .mcp.json
git commit -m "chore: bump femto to v0.2.0"
git tag v0.2.0
git push origin main v0.2.0

Adopters pick up the new version via /plugin marketplace update femto-marketplace.

License

MIT.
