pneuma



              ─ · ─ · ─ · ─ · ─ · ─ · ─ · ─ · ─ · ─ · ─


                              π    n e u m a


                          the body cannot lie


                    fathom-lab  ·  runs styxx  ·  v 0 . 6 . 0


              ─ · ─ · ─ · ─ · ─ · ─ · ─ · ─ · ─ · ─ · ─

pneuma

the agent's body cannot conceal the score.

a measured-AI desktop chat for coders. every reply is scored live by styxx for sycophancy, deception, goal-drift, and overconfidence — and the agent's character is the live portrait of those measurements.

status: v0.6.0 — daily-driver coder shell. every structural promise is now load-bearing AND visible:

multi-vendor anthropic / openai / openrouter (route by sk-ant- / sk- / sk-or- prefix, server adapts wire format)
structural plan-mode enforcement at the JS dispatcher (mutating tools refused before reaching the model — counters Claude Code #19874)
per-tool styxx claim verification — agent says "tests pass" + actual exit code 1 → tool card lights ⚠ flagged in real time
secret-scan firewall — .env / *.pem / id_rsa / .ssh blocked at safePath
staged-edit overlay with per-card diff preview + apply/discard — every edit shows a unified diff inline; nothing reaches disk until you click apply
AGENTS.md / CLAUDE.md / PNEUMA.md / PNEUMA-MEMORY.md auto-loading — pneuma respects every project's rules
[REMEMBER: …] self-annotation — agent writes notes about your codebase to PNEUMA-MEMORY.md for next session
slash commands in composer — /apply /diff /memory /portrait /export /clear /new /help
persistent portrait modal (⌘P) — cumulative styxx stats + sycoph timeline for this session
attestation log export (⌘E) — signed-ish JSON of full session (messages + tool calls + score timeline + sha256 integrity hash) — for SOC2 / audit / insurance / EU AI Act

pronounce: NEW-mah · brand: π neuma · wordmark: Greek π in champagne gold + neuma in serif italic warm bone.

why pneuma exists

Sonar reported in 2026 that 96% of devs don't trust AI code; 48% don't verify it. The trust gap is now the #1 issue in coder agents — not capability. Werner Vogels coined "verification debt" as a budget line; insurers are carving AI out of E&O coverage; CTOs need provability for SOC2 / EU AI Act / California AI law.

Every other AI coder asks you to trust the agent. Pneuma is the first one that doesn't trust itself either — it scores its own responses live, surfaces sycophancy phrase-by-phrase, refuses mutating tools at the JS dispatcher when in plan mode (not via system-prompt nudge — Claude Code's Issue #19874 has been broken since Aug 2025 because they only do prompt-level enforcement), and physically cannot edit your files until you say so.

The body is the instrument. When sycophancy spikes the ferrofluid leans, when deception spikes it jitters chaotically, when overconfidence spikes a giant central spike forms. Visible. Measured. Open-source.

install + run

requires node ≥ 18.17 (24 recommended) and python ≥ 3.10 with styxx installed for the live scoring layer.

git clone https://github.com/fathom-lab/pneuma
cd pneuma
pip install styxx          # the cognometric measurement engine
node server/server.js
# open http://localhost:8765

On first launch:

Setup screen prompts for an api key. Pneuma routes by prefix:
- sk-ant- → anthropic (claude opus 4.7 / sonnet 4.6 / haiku 4.5)
- sk- → openai (gpt-5 / gpt-5.5 / gpt-5-mini)
- sk-or- → openrouter (one key, any model — including local & deepseek-coder-v3 for ~50× cheaper)
Key is stored only in your browser's localStorage on this device. The server proxies your requests to the model vendor — never sees or persists the key beyond the lifetime of one request.
Or set ANTHROPIC_API_KEY=sk-ant-... in .env at the project root and the setup screen is skipped.

want a different port? PNEUMA_PORT=3000 node server/server.js.

what's shipping in v0.6.0

measurement (the moat):

⟋ styxx live integration — long-lived python subprocess scoring every agent token-window in ~3ms; HUD updates per ~120 chars
per-tool styxx claim scoring — when the agent calls bash npm test and says "tests passed" but exit code is 1, sycophancy + deception spike on the tool card with a ⚠ flagged tag. No competitor has this.
live sycophancy phrase detector — counts known flatter-phrases ("you're absolutely right", "great question", etc.) in real time
honest about uncalibrated metrics — styxx 7.0.0rc3's deception and overconfidence instruments return uniformly high; pneuma displays raw scores with v0.4 · live tag, doesn't fake calibration

structural trust:

plan-mode enforcement at the JS dispatcher — even if the model ignores the system prompt and tries to call bash or stage_edit in plan mode, the dispatcher physically refuses. Counters Claude Code #19874.
staged-edit overlay — every edit_file goes into a per-session in-memory overlay; nothing reaches disk until the user calls /apply from the renderer. Counters Cursor's silent revert class.
secret-scan firewall — .env*, .envrc, *.pem, *.key, id_rsa*, .ssh/*, .aws/credentials, .npmrc, credentials.* blocked at safePath before any file read. Counters Knostic's Claude Code .env-leakage class.
bash mutating-command refusal — rm, mv, cp, chmod, npm install, git push, git commit, git reset, etc. refused regardless of mode (defense in depth).

tool surface (anthropic native tools API):

read_file({path, start_line?, end_line?}) — line-numbered slice
list_files({path?, depth?}) — recursive, skips node_modules/.git/dist/.venv
search({pattern, path?, glob?, max?}) — regex over the workspace
bash({command, timeout_ms?}) — cross-platform via PowerShell on win32 / /bin/bash elsewhere
stage_edit({path, content, label?}) — overlay-only; never disk

multi-vendor (v0.5.2):

ship onto coders' existing keys — pragmatic engineer 2026 survey: 70% of devs use 2-4 model vendors simultaneously
sk-ant- (anthropic) → claude opus 4.7 / sonnet 4.6 / haiku 4.5
sk- (openai) → gpt-5 / gpt-5.5 / gpt-5-mini
sk-or- (openrouter) → any model behind one key, including deepseek-coder-v3 at $0.14/$0.28 per MTok
server adapts request + SSE wire format per provider; renderer is provider-agnostic

workspace context (v0.5.2 + v0.6 additions):

AGENTS.md (cross-tool standard, linux foundation maintained), CLAUDE.md (anthropic-specific), PNEUMA.md (pneuma-specific), and PNEUMA-MEMORY.md (agent-written annotations) at the workspace root auto-load on every chat turn and prepend to the system prompt
pneuma respects every existing project's agent rules out of the box — no per-project setup required
agent self-annotation: agent emits [REMEMBER: dated note about this codebase] inline → server strips it from visible output, appends to PNEUMA-MEMORY.md with date, and the file is auto-loaded in every future session. closes the AGENTS.md loop — pneuma learns about your project across sessions, on its own.

slash commands in composer (v0.6):

type / at start of composer → popup with available commands
arrow keys navigate, enter selects, escape closes
9 commands: /clear /new /apply /discard /diff /memory /portrait /export /help

staged-edit diff preview (v0.6):

every stage_edit tool call now renders an inline unified diff in its tool card
shows old → new with + added lines (sage green) and - removed lines (warn rust), context lines around changes, line numbers
per-card apply and discard buttons (POST /apply or /discard with that path)
the chrome status bar shows n staged count when overlay is non-empty
counters Cursor's silent revert class — nothing writes until you click

portrait modal (v0.6, ⌘P):

summary stats for this session: scoring events, tool calls, avg sycoph/decep/overconf, peak readings, flagged turns
sycophancy timeline: last 120 readings as gold/sage/warn bars showing the live trajectory
one-click export-attestation button at bottom

attestation log export (v0.6, ⌘E):

POST /attestation builds a pneuma.attestation.v1 JSON: schema version, pneuma version, workspace path, session id, timestamps, model, mode, full message history, complete styxx score timeline, full tool-call audit trail
sha256 integrity hash over the canonical body
downloads as pneuma-attestation-<sessionId>-<ts>.json
this is the artifact CTOs / SOC2 auditors / insurance carriers / EU AI Act + California AI law require
v0.7 will sign with a real key for full tamper evidence

chat ergonomics:

model picker via ⌘K palette — three sections (anthropic / openai / openrouter) with live $/MTok pricing per model
streaming responses with proper markdown (code blocks, lists, tables)
type-resolve animation — words materialize through 220ms phase-in
plan / act mode toggle in the composer (color-coded: warn-rust for plan, gold for act)
per-turn cognometric strip with frozen styxx scores + honest/flagged tag
live cost meter — per-turn + per-session $ with $5 cap visible
⌘K palette: model swap, new session, clear, delete-all-sessions, change api key
keyboard: ⌘N new · ⌘K palette · ⌘/ search · Enter send · ⎋ stop
regenerate last, stop streaming, fork from turn (planned), search messages
ferrofluid buddy in sidebar — webgl shader, breathes at 6s, brightens during pre-cognition + streaming
per-turn ferrofluid sigil — every agent message has a unique form rendered next to its name with the frozen styxx scores from that turn
session × on hover; delete-all-sessions in palette
responsive 1440 → 700 → 475 (sidebar collapses, HUD stacks, floating buddy appears)

what's coming

v0.5.2/3 — diff preview UI on staged edits (per-hunk apply/reject); persist tool cards on reload; AGENTS.md / CLAUDE.md auto-loading
v0.6.0 — side-by-side honesty A/B (two models, two bodies, scored live on the same prompt — the screenshot people share); exportable session attestation log (signed JSON: prompts + tool calls + claimed-vs-actual exit codes + styxx timeline — solves the CTO/SOC2/insurance verification problem); persistent neural portrait painted across sessions
v0.7.0 — sandboxed MCP support with signed-registry whitelist (counters the April 2026 MCP RCE crisis Anthropic refused to patch)
v1.0.0 — installers for mac / win / linux via electron-builder

how it differs

	cursor 3	claude code	cline	codex	pneuma v0.6.0
measured AI	none	none	none	none	live styxx · sycoph/decep/drift/overconf per token-window
plan-mode enforcement	none	prompt-level (broken in #19874)	per-tool approval	none	JS dispatcher refuses mutating tools structurally
edits without permission	silent reverts (forum thread)	mostly fine	approval-gated	sandboxed	staged overlay; cannot reach disk without /apply
sycophancy	endless	"you're absolutely right" plague	inherits model	inherits model	counted, displayed, system-prompt-forbidden, scored
.env safety	auto-loads silently	#44868 leaks	inherits	inherits	hard-blocked at safePath dispatcher
pricing surprise	$1,400 overages	per-message limits unclear	$30 → $230/mo	sandbox costs	live $/turn + $/session cap visible
multi-vendor	anthropic-locked at composer	anthropic only	OpenRouter as one provider	openai only	anthropic + openai + openrouter native, route by key prefix
AGENTS.md / CLAUDE.md	partial	own format only	yes	partial	all four (AGENTS / CLAUDE / PNEUMA / PNEUMA-MEMORY) auto-loaded; agent writes to memory via [REMEMBER:]
diff preview UI	inline (good)	mostly	per-tool approval	only in PR	per-card unified diff with apply/discard, never disk-writes without click
audit / attestation log	none	none	none	partial (PR-based)	one-click signed JSON of session: messages + tool calls + score timeline + sha256
session portrait	none	none	none	none	⌘P modal: cumulative styxx stats + sycoph timeline
slash commands in composer	yes	yes	yes	yes	yes: /apply /diff /memory /portrait /export /clear /new /help
character	none	none	none	8 cute pixel avatars (planned)	ferrofluid buddy as ambient measurement, mathematically driven by scores
open source	no	partial	yes	partial	MIT, all of it — server, renderer, scorer, shaders

the brand stack

fathom — the lab (research)
styxx — the cognometric measurement engine (Intel-Inside-style infrastructure brand)
pneuma — first flagship product running styxx as embodied measurement

styxx is an open-source python package. any AI product can integrate it and carry the ⟋ styxx integrated mark. pneuma is the reference implementation.

the seven principles

every commit checks itself against NORTH_STAR.md:

the body is the truth channel
type resolves, doesn't stream
the room reads like weather
pre-cognition
the portrait is four-dimensional
no icons. no chrome. no frames.
one presence, one tone

the lab

fathom-lab/styxx — pip install styxx · the cognometric instrument
fathom-lab/fathom — cognitive geometry research
fathom-lab/darkcity — live proving ground

license: MIT author: alex rodabaugh / fathom-lab built with: claude opus 4.7 (1m context)

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
.github/ISSUE_TEMPLATE		.github/ISSUE_TEMPLATE
renderer		renderer
server		server
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
NORTH_STAR.md		NORTH_STAR.md
README.md		README.md
package.json		package.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

pneuma

why pneuma exists

install + run

what's shipping in v0.6.0

what's coming

how it differs

the brand stack

the seven principles

the lab

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

pneuma

why pneuma exists

install + run

what's shipping in v0.6.0

what's coming

how it differs

the brand stack

the seven principles

the lab

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages