Your AI knows what you said. It doesn't know what you decided.
AI memory stores content -- what was discussed, what was mentioned, what came up. None of it stores cognitive structure: which positions are settled, which were rejected and why, which questions are still open. Ask an AI about something you resolved three sessions ago and it surfaces everything -- proposal and rejection, old draft and final decision -- with equal weight and no sense of which is current.
Cairn maintains a typed reasoning graph -- propositions, contradictions, refinements, syntheses, tensions -- with confidence scores and lifecycle status. An LLM with access to this graph knows the state of your thinking, not just a flat log of things you said.
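As an illustration of what "typed with confidence and lifecycle status" means, a graph node might carry fields like the ones below. This is a hypothetical sketch, not Cairn's actual schema; the class and field names are invented for explanation.

```python
from dataclasses import dataclass

# Hypothetical sketch of a typed reasoning-graph node (not Cairn's real schema).
# "type" distinguishes propositions, questions, tensions, syntheses;
# "status" is the lifecycle state derived from later events.
@dataclass
class Node:
    node_id: str
    type: str             # "proposition" | "question" | "tension" | "synthesis"
    text: str
    confidence: float     # 0.0 - 1.0
    status: str = "active"  # may later become "superseded" or "resolved"

n = Node("a1b2c3d4e5f6", "proposition",
         "Ship with project-level .mcp.json as the default configuration", 0.8)
```

The point of the typing is that a query can filter on it: "show me active propositions" is a structural question, not a text search.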
```bash
git clone https://github.com/smcady/Cairn.git cairn && cd cairn
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
```

Create `.env.local` with your API key:

```bash
ANTHROPIC_API_KEY=sk-ant-...
```
Then, from any project you want to use Cairn in:

```bash
cairn init
```

That's it. Cairn uses local embeddings (fastembed) by default, so no additional API keys are needed. For higher-quality embeddings, optionally add a Voyage AI key:

```bash
VOYAGE_API_KEY=pa-...  # optional: auto-detected, upgrades embedding quality
```
Cairn has two integration surfaces. Choose the one that matches how you work.
If you use Claude Code as your AI development tool, Cairn integrates via hooks and an MCP server. No application code is required.
From your project directory:
```bash
cairn init
```

This configures everything in one step:

- `.claude/settings.json` -- Stop hook (captures conversations) + Orient hook (injects prior reasoning before each response)
- `.mcp.json` -- MCP server exposing graph query tools
- Database -- initialized at `./cairn.db`
All paths are resolved automatically. Restart your Claude Code session after running cairn init.
Manual setup (if you prefer not to use `cairn init`)

See `.mcp.json.example` for the MCP server template. Hooks go in `.claude/settings.json`:
```json
{
  "hooks": {
    "Stop": [{"matcher": "", "hooks": [{"type": "command",
      "command": "CAIRN_DB=\"/path/to/db\" /path/to/cairn/.venv/bin/python /path/to/cairn/scripts/hook_ingest.py"}]}],
    "UserPromptSubmit": [{"matcher": "", "hooks": [{"type": "command",
      "command": "CAIRN_DB=\"/path/to/db\" /path/to/cairn/.venv/bin/python /path/to/cairn/scripts/hook_orient.py",
      "timeout": 10000}]}]
  }
}
```

Replace all `/path/to/` values with absolute paths.
| Tool | When to use it |
|---|---|
| `harness_orient(topic)` | Before answering on any topic discussed in prior sessions |
| `harness_query('decision_log')` | "What did we decide about X?" |
| `harness_query('current_state')` | "Where do things stand overall?" |
| `harness_query('disagreement_map')` | "What's still unresolved?" |
| `harness_search(query)` | Find specific nodes before re-opening a discussion |
| `harness_status` | Quick graph overview at session start |
| `harness_trace(node_id)` | "How did we arrive at this position?" |
For a concrete example of what Cairn captures from a 4-turn pricing conversation, including classifier output, confidence changes, and tool responses, see `docs/walkthrough.md`.
After a few conversations, `harness_status` returns the current state of your reasoning graph:

```
## Graph Stats
total_nodes: 42
total_edges: 38
active: 35
resolved: 5
propositions: 28
questions: 8
tensions: 2

## Active Propositions
- [a1b2c3d4e5f6] (confidence: 0.8, support: 2, challenges: 0)
  Ship with project-level .mcp.json as the default configuration
- [b2c3d4e5f6a1] (confidence: 0.7, support: 1, challenges: 1)
  SQLite is sufficient for single-user deployment

## Open Questions
- [c3d4e5f6a1b2] Does the classifier correctly handle purely operational
  exchanges (no reasoning content) by producing zero events?

## Syntheses
- [d4e5f6a1b2c3] The event log is the truth; the graph is a derived view.
  Any node's current status is computed from the full chain of events
  that touch it.
```
If you're building an agent or application with the Anthropic SDK, cairn integrates as a library. You control the capture and retrieval loop.
Capture -- one import change. Every `messages.create()` and `messages.stream()` call auto-ingests into the graph as a background task.

```python
# Before
from anthropic import AsyncAnthropic

# After
from cairn.integrations.anthropic import AsyncAnthropic
```

Retrieval -- query the graph before each turn and inject context into the system prompt. You decide when and how to orient.
```python
import cairn

cairn.init(db_path="./my_project.db")  # or set CAIRN_DB env var

# Orient on a topic: returns structured summary of settled/contested/open
context = await cairn.orient("pricing strategy")

# Query a specific view: current_state, decision_log, disagreement_map, coverage_report
decisions = cairn.query("decision_log")

# Direct engine access for search
engine = cairn.get_engine()
results = await engine.search_nodes("usage-based pricing", k=5)
```

For a complete agent loop with automatic orientation before each turn, see `examples/agent_loop.py`.
Every exchange runs through a classify, resolve, mutate pipeline:
```
Content
  | [classifier]     LLM extracts typed cognitive events
  | [resolver]       Vector search maps descriptions to existing graph nodes
  | [mutator]        Deterministic graph mutations
  | [vector index]   Embed new/updated nodes
  v
Event Log (immutable) + Reasoning Graph (derived)
```
The event log is the truth -- append-only, ordered, every typed cognitive event. The reasoning graph is a view derived from it -- nodes are ideas, edges are typed relationships, every node carries a status computed from the full event chain. Add a contradiction and the original proposal is marked superseded. The AI reads the graph, not the history.
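The "status computed from the full event chain" rule can be sketched in a few lines. This is a deliberate simplification with invented event names, not Cairn's actual mutator logic:

```python
# Simplified sketch of "graph as a view over the event log": a node's status
# is recomputed by replaying every event that touches it, in order.
# Event kinds here are illustrative, not Cairn's real event types.
def derive_status(events, node_id):
    status = "active"
    for ev in events:                  # events are append-only and ordered
        if ev["node"] != node_id:
            continue
        if ev["kind"] == "contradiction":
            status = "superseded"      # a later contradiction supersedes the node
        elif ev["kind"] == "resolution":
            status = "resolved"
    return status

log = [
    {"node": "p1", "kind": "proposed"},
    {"node": "p1", "kind": "contradiction"},
]
```

Because status is derived rather than stored, nothing is ever edited in place: adding one contradiction event is enough to flip what the AI sees, while the full history stays intact underneath.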
Each user builds their own graph; the database is gitignored. For details on scoping options and how capture works across surfaces, see `docs/configuration.md`.
- Python 3.12+
- `ANTHROPIC_API_KEY` -- required; powers the classifier LLM
- `VOYAGE_API_KEY` -- optional; uses Anthropic's embedding partner for higher-quality vector embeddings (local fastembed by default)
```bash
.venv/bin/python -m pytest tests/ -m "not integration"  # unit tests (no API keys needed)
.venv/bin/python -m pytest tests/ -m integration        # integration tests (requires API keys)
.venv/bin/python -m pytest tests/                       # everything
```

See `docs/limitations.md` for a full list. Key points:
- Single-user system -- each person builds their own graph; shared team memory is out of scope for this implementation
- Classifier is domain-dependent -- tested on business strategy conversations; may need tuning for other domains
- Confidence scoring -- evidence-strength-weighted deltas are implemented, but no recency decay yet
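To make the confidence-scoring limitation concrete, here is a hypothetical sketch of an evidence-strength-weighted update. The weight, scale, and clamping are invented for illustration (this is not Cairn's actual formula), and there is deliberately no time term, matching the note above that recency decay is not yet implemented:

```python
# Hypothetical evidence-weighted confidence update (not Cairn's real formula).
# Stronger evidence moves confidence more; no recency decay term.
def update_confidence(confidence, evidence_strength, supports=True):
    delta = 0.1 * evidence_strength        # illustrative scale factor
    confidence += delta if supports else -delta
    return max(0.0, min(1.0, confidence))  # clamp to [0, 1]

c = update_confidence(0.7, evidence_strength=1.0, supports=False)  # a challenge
```

Adding recency decay would mean weighting each delta by the age of its event, which is straightforward once every event in the log is timestamped.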
- Beyond Retrieval: A Case for Reasoning Memory -- the full argument
Questions or ideas? Start a discussion.
MIT
