Pyramid Memory

Agent Quick Start

You are an AI agent. Here is exactly what to do:

# 1. Install dependencies (sqlite-vec, a SQLite extension for vector search, is required)
pip install -r requirements.txt

# 2. Set your API key
export OPENAI_API_KEY=sk-...

# 3. Import your conversation history (pick ONE based on your source)
python cli.py import -w ~/memory --openclaw                              # Default: ~/.openclaw/agents/main/sessions
python cli.py import -w ~/memory --openclaw --source /path/to/sessions   # Custom OpenClaw path
python cli.py import -w ~/memory --glenn --source conversations.db       # Glenn SQLite format
python cli.py import -w ~/memory --claude --source conversations.json    # Claude JSON export

# 4. Sync: summarize, embed, synthesize, and write files
python cli.py sync -w ~/memory

# 5. Review the generated files - read MEMORY.md and each file in models/
#    Check that the synthesized content makes sense and captures key facts
#    The models/ directory contains topic-specific files discovered from conversations

Workspace structure: The -w/--workspace option specifies a directory that contains everything:

~/memory/
  pyramid.db          # Database (filename configurable via --db)
  MEMORY.md           # Generated memory file
  SOUL.md             # Hand-crafted identity (not overwritten)
  USER.md             # Hand-crafted user info (not overwritten)
  models/
    gavrie.md
    project-x.md
    ...

After generation: Read through MEMORY.md and all files in models/. These represent the system's understanding of you, the user, and discovered topics. Verify the facts are accurate and the temporal organization (recent vs. historical) makes sense.

Ongoing Sync

The sync command is your main ongoing mechanism. When --source is given, it first imports new conversations and extracts observations; it then assigns observations to models, updates summaries, embeds, synthesizes, and writes files.

# Basic sync (processes any new observations, updates dirty models)
python cli.py sync -w ~/memory

# With incremental import from OpenClaw sessions
python cli.py sync -w ~/memory --source ~/.openclaw/agents/main/sessions

When to run sync:

After each conversation ends
On a schedule (e.g., every few hours)
Before starting work that needs fresh memory context

If starting fresh with no history: Use internal observe to add observations manually, then sync:

python cli.py internal observe -w ~/memory "User prefers dark mode"
python cli.py internal observe -w ~/memory "User's name is Alex"
python cli.py sync -w ~/memory

Note: observe is an internal command because single observations are assigned in small batches, which limits the system's ability to discover new models. Prefer bulk imports via import or incremental sync via sync --source.

TL;DR

A pyramidal memory system for AI agents. Extracts observations from conversations, organizes them into mental models (assistant, user, and discovered topics), compresses them into tiered summaries, and synthesizes coherent narratives. Query via semantic search or export to markdown files for full context loading in OpenClaw agents.

Key feature: Lazy updates with dirty tracking. Only models and summaries whose inputs have changed are regenerated.

System Architecture

Pyramid Memory implements a hierarchical memory system designed for AI agents to maintain long-term context across conversations. The architecture has five layers:

┌─────────────────────────────────────────────────────────────┐
│                    SYNTHESIS LAYER                          │
│  LLM synthesizes pyramids into coherent mental models       │
│  Results cached in DB (content_dirty tracking)              │
├─────────────────────────────────────────────────────────────┤
│                    RETRIEVAL LAYER                          │
│  Pyramid retrieval, semantic search, markdown export        │
├─────────────────────────────────────────────────────────────┤
│                    COMPRESSION LAYER                        │
│  Tier 0: 10 obs → Tier 1: 10 T0 → Tier 2: 10 T1 → ...      │
│  Sources tracked, is_dirty propagation                      │
├─────────────────────────────────────────────────────────────┤
│                    ORGANIZATION LAYER                       │
│  Mental models: assistant, user, + discovered topics        │
├─────────────────────────────────────────────────────────────┤
│                    EXTRACTION LAYER                         │
│  LLM tool calls extract observations from conversations     │
└─────────────────────────────────────────────────────────────┘

Core Concepts

Observations

Factual statements extracted from conversations.

Schema:

text: String, single factual sentence
timestamp: DateTime, when the observation occurred
model_id: Foreign key to the assigned mental model (initially NULL)

Extraction behavior:

LLM processes conversation chunks via add_observation tool calls
Captures specific facts: names, dates, numbers, places, preferences
Avoids meta-observations ("user shared info") in favor of concrete facts ("User's son Tom is 8")

Example observations:

User prefers dark mode in all applications
User relocated to Austin in May 2025
User is starting a consulting practice focused on AI
User mentioned enjoying coffee in the morning

Mental Models

Categories that organize observations and summaries. Each model represents a conceptual entity.

Base models (always present):

Name	Purpose
`assistant`	Agent's own experience, reflections, insights, preferences, evolving self-understanding
`user`	Primary user's identity, preferences, projects, life events

Discovered models: Created automatically during summarization when the LLM identifies distinct entities (specific people, projects, companies, topics) that warrant separate tracking.

Schema:

name: String, unique identifier (lowercase, hyphenated)
description: String, derived from highest-tier summary
is_base: Boolean, true for assistant/user
synthesized_content: Text, cached synthesis result
content_dirty: Boolean, true when synthesis needs regeneration

Model assignment behavior:

Unassigned observations are processed before tier-0 summarization
LLM calls assign_model for each observation
New models are created on-demand when model_name doesn't exist

Summaries

Narrative prose summaries that compress observations or lower-tier summaries. Summaries are tiered: each summary at a given tier compresses STEP (10) items from the tier below.

Tier structure:

Tier	Compresses	Observations covered
0	10 observations	10
1	10 tier-0 summaries	100
2	10 tier-1 summaries	1000
N	10 tier-(N-1) summaries	10^(N+1)
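The tier arithmetic in the table can be checked with a short sketch. It assumes that summaries are only built from complete groups of STEP items; the function names are illustrative and do not come from the project.

```python
STEP = 10  # items compressed per summary, matching the STEP constant in summarize.py


def observations_covered(tier: int, step: int = STEP) -> int:
    """Observations represented by one complete summary at `tier`."""
    return step ** (tier + 1)


def full_summaries_per_tier(n_observations: int, step: int = STEP) -> dict:
    """Number of complete summaries at each tier for a single model.

    Assumption: leftover items that do not fill a group of `step`
    stay unsummarized until more items arrive.
    """
    counts = {}
    tier, items = 0, n_observations
    while items >= step:
        items //= step  # each summary consumes `step` items from the tier below
        counts[tier] = items
        tier += 1
    return counts


print(observations_covered(2))
print(full_summaries_per_tier(1234))
```

Under these assumptions, a model with 1234 observations has 123 tier-0 summaries, 12 tier-1 summaries, and 1 tier-2 summary, with 4 observations still waiting for a full group.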

Schema:

model_id: Foreign key to mental model
tier: Integer, compression level
text: Summary text in narrative prose
start_timestamp: DateTime, coverage start
end_timestamp: DateTime, coverage end
is_dirty: Boolean, true when summary needs regeneration

Summary sources: The summary_sources table tracks which observations or summaries went into each summary, enabling dirty propagation when inputs change.

Summary format:

Summaries are written in clear, readable narrative prose. Importance is conveyed through word choice (e.g., "significantly", "notably", "critically") rather than markers or scores. Specific facts (names, dates, numbers, places) are preserved.

Dirty Tracking

The system uses lazy updates with dirty propagation:

When observation created/assigned: Model marked content_dirty = True
When summary regenerated: Parent summaries marked is_dirty = True, model marked content_dirty = True
When model synthesized: Result cached in synthesized_content, content_dirty = False

This ensures only affected models and summaries are regenerated during sync.
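The propagation rules above can be sketched in memory as follows. This is a minimal illustration, not the project's code: the dataclasses, the `parent_of` mapping (which stands in for rows of the summary_sources table), and the function name are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Summary:
    id: int
    tier: int
    is_dirty: bool = False


@dataclass
class Model:
    name: str
    content_dirty: bool = True
    summaries: dict = field(default_factory=dict)
    # parent_of[child_summary_id] -> parent summary id (assumed to be
    # derivable from summary_sources rows with source_type = 'summary')
    parent_of: dict = field(default_factory=dict)


def on_summary_regenerated(model: Model, summary_id: int) -> None:
    """Clear the regenerated summary, mark its direct parent dirty,
    and flag the model for re-synthesis."""
    model.summaries[summary_id].is_dirty = False
    parent_id = model.parent_of.get(summary_id)
    if parent_id is not None:
        model.summaries[parent_id].is_dirty = True
    model.content_dirty = True
```

Only the direct parent is marked here; in this sketch the grandparent becomes dirty when the parent itself is regenerated, which keeps each step lazy.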

Pyramid Retrieval

For any model, retrieves all summaries from each tier, ordered by tier (highest first) and timestamp (newest first).

Example pyramid for model with tiers 0-2:

Tier 2:  [summary covering 1000 observations]
Tier 1:  [summary] [summary] ...
Tier 0:  [summary] [summary] [summary] ...

This structure ensures:

Recent events have high granularity
Older events are compressed but retained
Context passed to synthesis grows slowly with history length, since each tenfold increase in observations adds only one tier
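The ordering described above (highest tier first, then newest first) can be sketched with a single sort. The `Summary` tuple is a stand-in for the ORM class, and using end_timestamp as the sort timestamp is an assumption.

```python
from datetime import datetime
from typing import NamedTuple


class Summary(NamedTuple):
    tier: int
    end_timestamp: datetime
    text: str


def order_pyramid(summaries):
    """Sort by tier descending, then by end_timestamp descending."""
    return sorted(summaries, key=lambda s: (s.tier, s.end_timestamp), reverse=True)
```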

Model Synthesis

When exporting to markdown, the pyramid and any unsummarized observations are synthesized into a coherent mental model organized by temporal sections.

Deduplication: Summaries from different tiers cover overlapping time periods by design (a tier-2 summary contains the same information as the tier-1 and tier-0 summaries it was created from). To avoid sending redundant content to the LLM, synthesis uses get_non_overlapping_summaries() which:

Includes all summaries from the highest tier
For lower tiers, only includes summaries whose end_timestamp exceeds all higher-tier coverage
Result: recent periods use lower tiers (more detail), older periods use higher tiers (already compressed)
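A minimal sketch of this filtering rule follows. It is written from the description above rather than from the source of get_non_overlapping_summaries(), so the function name, the `Summary` tuple, and the strict "greater than" comparison are assumptions.

```python
from datetime import datetime
from typing import NamedTuple


class Summary(NamedTuple):
    tier: int
    start_timestamp: datetime
    end_timestamp: datetime
    text: str


def non_overlapping(by_tier: dict) -> list:
    """Keep every summary of the highest tier; for lower tiers keep only
    summaries that end after everything the higher tiers already cover."""
    kept = []
    covered_until = None  # latest end_timestamp among the tiers processed so far
    for tier in sorted(by_tier, reverse=True):
        tier_summaries = by_tier[tier]
        for summary in tier_summaries:
            if covered_until is None or summary.end_timestamp > covered_until:
                kept.append(summary)
        if tier_summaries:
            latest = max(s.end_timestamp for s in tier_summaries)
            covered_until = latest if covered_until is None else max(covered_until, latest)
    return kept
```

With one tier-1 summary covering January 1 to 10 and tier-0 summaries ending on January 5, 10, and 12, only the tier-1 summary and the tier-0 summary ending on January 12 are kept.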

Temporal sections:

Section	Time Range
Last 3 Days	Within 72 hours
This Week	3-7 days ago
This Month	7-30 days ago
This Quarter	30-90 days ago
This Year	90-365 days ago
Earlier	More than a year ago
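The section boundaries in the table can be expressed as a small lookup. This is a sketch only: the function name is hypothetical, and treating each upper bound as inclusive is an assumption, since the table does not say which section an item exactly 3 or 7 days old belongs to.

```python
from datetime import datetime, timedelta

# (maximum age, section name), checked in order from most recent to oldest
SECTIONS = [
    (timedelta(days=3), "Last 3 Days"),
    (timedelta(days=7), "This Week"),
    (timedelta(days=30), "This Month"),
    (timedelta(days=90), "This Quarter"),
    (timedelta(days=365), "This Year"),
]


def section_for(timestamp: datetime, ref_date: datetime) -> str:
    """Return the temporal section for `timestamp` relative to `ref_date`."""
    age = ref_date - timestamp
    for limit, name in SECTIONS:
        if age <= limit:
            return name
    return "Earlier"
```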

Synthesis rules:

Newer details override older ones (e.g., if location changes, use most recent)
Duplicate facts are mentioned only once per section
Each section is self-contained to avoid cross-section repetition
Output is third-person narrative prose (except assistant model which uses first-person)

Database Schema

SQLite with sqlite-vec extension for vector search.

Tables

models

CREATE TABLE models (
    id INTEGER PRIMARY KEY,
    name VARCHAR UNIQUE NOT NULL,
    description TEXT,
    is_base BOOLEAN DEFAULT FALSE,
    synthesized_content TEXT,
    content_dirty BOOLEAN DEFAULT TRUE
);

observations

CREATE TABLE observations (
    id INTEGER PRIMARY KEY,
    text TEXT NOT NULL,
    timestamp DATETIME,
    model_id INTEGER REFERENCES models(id)
);

summaries

CREATE TABLE summaries (
    id INTEGER PRIMARY KEY,
    model_id INTEGER NOT NULL REFERENCES models(id),
    tier INTEGER NOT NULL,
    text TEXT NOT NULL,
    start_timestamp DATETIME NOT NULL,
    end_timestamp DATETIME NOT NULL,
    is_dirty BOOLEAN DEFAULT FALSE
);

summary_sources

CREATE TABLE summary_sources (
    id INTEGER PRIMARY KEY,
    summary_id INTEGER NOT NULL REFERENCES summaries(id),
    source_type TEXT NOT NULL,  -- 'observation' or 'summary'
    source_id INTEGER NOT NULL
);

memory_vec (virtual table for embeddings)

CREATE VIRTUAL TABLE memory_vec USING vec0(
    id INTEGER PRIMARY KEY,
    source_type TEXT,      -- 'observation' or 'summary'
    source_id INTEGER,
    embedding float[1536]  -- text-embedding-3-small dimensions
);

imported_sessions

CREATE TABLE imported_sessions (
    id INTEGER PRIMARY KEY,
    file_path VARCHAR UNIQUE NOT NULL,
    last_size INTEGER NOT NULL,
    last_mtime DATETIME NOT NULL
);
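The imported_sessions table stores a size and modification time per session file, which suggests a change check along the following lines. This is a guess at the intent, not the logic of load_openclaw_incremental; the helper names are hypothetical and timestamps are stored as ISO strings for simplicity.

```python
import sqlite3

SCHEMA = """
CREATE TABLE imported_sessions (
    id INTEGER PRIMARY KEY,
    file_path VARCHAR UNIQUE NOT NULL,
    last_size INTEGER NOT NULL,
    last_mtime DATETIME NOT NULL
);
"""


def needs_import(conn, file_path, size, mtime):
    """True if the file is unknown or its size/mtime differ from the stored row."""
    row = conn.execute(
        "SELECT last_size, last_mtime FROM imported_sessions WHERE file_path = ?",
        (file_path,),
    ).fetchone()
    return row is None or row != (size, mtime)


def record_import(conn, file_path, size, mtime):
    """Insert or overwrite the tracking row for a session file."""
    conn.execute(
        "INSERT OR REPLACE INTO imported_sessions (file_path, last_size, last_mtime) "
        "VALUES (?, ?, ?)",
        (file_path, size, mtime),
    )


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
```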

LLM Integration

Configuration

Parameter	Value
Model	`gpt-4.1-mini`
Max tokens per call	~10,000
Token estimation	~4 chars per token
Embedding model	`text-embedding-3-small`
Embedding dimensions	1536

Tool Definitions

add_observation - Used during import/extraction

{
    "name": "add_observation",
    "parameters": {
        "text": {"type": "string", "description": "Single factual sentence"}
    }
}

assign_model - Used during tier-0 summarization

{
    "name": "assign_model",
    "parameters": {
        "observation_id": {"type": "integer"},
        "model_name": {"type": "string", "description": "assistant, user, or new topic name"}
    }
}
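The definitions above are abbreviated. A sketch of how they might look in the function-tool format accepted by the OpenAI Chat Completions API is shown below. The tool-level descriptions, the `required` lists, and both helper functions are assumptions added for illustration; the project's actual payloads are not shown in this document.

```python
import json


def function_tool(name, description, properties, required):
    """Wrap a parameter map in the Chat Completions function-tool envelope."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


ADD_OBSERVATION = function_tool(
    "add_observation",
    "Record one factual observation.",  # hypothetical description
    {"text": {"type": "string", "description": "Single factual sentence"}},
    ["text"],
)

ASSIGN_MODEL = function_tool(
    "assign_model",
    "Assign an observation to a mental model.",  # hypothetical description
    {
        "observation_id": {"type": "integer"},
        "model_name": {
            "type": "string",
            "description": "assistant, user, or new topic name",
        },
    },
    ["observation_id", "model_name"],
)


def parse_tool_arguments(arguments_json, tool):
    """Decode a tool call's JSON arguments and check the required keys."""
    args = json.loads(arguments_json)
    required = tool["function"]["parameters"]["required"]
    missing = [key for key in required if key not in args]
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    return args
```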

Chunking Strategy

When the input exceeds MAX_TOKENS:

Estimate tokens using 4 chars/token heuristic
Split into chunks under limit
Process each chunk in parallel (default 10 workers)
Aggregate results
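The four steps above can be sketched as follows. The names estimate_tokens, chunk_messages, and process_chunk match functions listed for llm.py, but the bodies here are guesses; messages are treated as plain strings and `process_in_parallel` is a hypothetical helper.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_TOKENS = 10000
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token count using the 4 chars/token heuristic."""
    return len(text) // CHARS_PER_TOKEN


def chunk_messages(messages, max_tokens=MAX_TOKENS):
    """Greedily group messages so each chunk stays under `max_tokens`.

    A single message larger than the limit still gets a chunk of its own.
    """
    chunks, current, current_tokens = [], [], 0
    for message in messages:
        tokens = estimate_tokens(message)
        if current and current_tokens + tokens > max_tokens:
            chunks.append(current)
            current, current_tokens = [], 0
        current.append(message)
        current_tokens += tokens
    if current:
        chunks.append(current)
    return chunks


def process_in_parallel(chunks, process_chunk, max_workers=10):
    """Run `process_chunk` on every chunk and flatten the results in order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(process_chunk, chunks)
        return [item for result in results for item in result]
```

Threads suit this workload because each chunk spends most of its time waiting on an API response rather than using the CPU.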

CLI Reference

All commands require --workspace / -w to specify the workspace directory.

Main Commands

`import`

Extract observations from existing conversation data.

# Glenn format (SQLite database)
python cli.py import -w ~/memory --glenn --source conversations.db \
    --parallel 10 \
    --conversation 42 \
    --limit 1000

# Claude format (JSON export)
python cli.py import -w ~/memory --claude --source conversations.json \
    --parallel 10 \
    --limit 1000

# OpenClaw format (JSONL sessions)
python cli.py import -w ~/memory --openclaw  # uses default ~/.openclaw/agents/main/sessions
python cli.py import -w ~/memory --openclaw --source /path/to/sessions \
    --limit 1000

Flag	Description
`-w`, `--workspace`	Workspace directory (required)
`--db`	Database filename (default: pyramid.db)
`--glenn`	Glenn SQLite database format
`--claude`	Claude JSON export format
`--openclaw`	OpenClaw JSONL session format
`--source`	Path to source file/directory (optional for openclaw)
`--parallel`	Number of parallel workers (default: 10)
`--conversation`	Process specific conversation ID only (glenn only)
`--user`	Filter by username (glenn only)
`--limit`	Limit number of messages

`sync`

Main command for ongoing sync. Processes dirty items and writes files.

python cli.py sync -w ~/memory
python cli.py sync -w ~/memory --source ~/.openclaw/agents/main/sessions
python cli.py sync -w ~/memory --parallel 20

Flag	Description
`-w`, `--workspace`	Workspace directory (required)
`--db`	Database filename (default: pyramid.db)
`--source`	Path to sessions directory for incremental import
`--parallel`, `-p`	Number of parallel workers (default: 10)

What sync does:

If --source provided: incrementally import new messages from OpenClaw sessions
Assign unassigned observations to models
Create new tier-0 summaries (groups of 10 observations)
Create higher-tier summaries (groups of 10 lower-tier summaries)
Process dirty summaries (regenerate if inputs changed)
Embed new observations and summaries
Synthesize dirty models
Write markdown files to workspace

`search`

Semantic search across memory with optional temporal weighting.

python cli.py search -w ~/memory "What programming languages does the user prefer?"
python cli.py search -w ~/memory "user's family" --limit 10 --raw
python cli.py search -w ~/memory "recent projects" --time-weight 0.5  # favor recent results
python cli.py search -w ~/memory "historical facts" --time-weight 0   # pure semantic

Flag	Description
`-w`, `--workspace`	Workspace directory (required)
`--db`	Database filename (default: pyramid.db)
`--limit`	Number of results (default: 20)
`--raw`	Show raw results without LLM synthesis
`--time-weight`	Time decay weight from 0-1 (default: 0.3). 0 = pure semantic similarity, 1 = heavy recency bias.

Internal Commands

For debugging and manual control. Accessed via cli.py internal COMMAND.

`internal observe`

Add a single observation manually. Note: single observations are assigned in small batches, limiting the system's ability to discover new models. Prefer bulk imports.

python cli.py internal observe -w ~/memory "User prefers vim keybindings"

`internal summarize`

Run summarization only.

python cli.py internal summarize -w ~/memory
python cli.py internal summarize -w ~/memory --max-obs 100  # Limit observations
python cli.py internal summarize -w ~/memory --max-tier 1   # Only build up to tier 1

`internal embed`

Generate embeddings only.

python cli.py internal embed -w ~/memory
python cli.py internal embed -w ~/memory --force  # Re-embed everything

`internal generate`

Write markdown files from cached synthesis (no LLM calls).

python cli.py internal generate -w ~/memory

`internal synthesize`

Synthesize dirty models without writing files.

python cli.py internal synthesize -w ~/memory

Output Files

Output	Contents
`MEMORY.md`	Synthesized self + user memories, index of other models
`models/{name}.md`	Individual model files for non-core models

Not generated (identity files, hand-crafted):

SOUL.md - Who the assistant is
USER.md - Who the user is

MEMORY.md Format

# Memory

Synthesized memory from conversations. SOUL.md and USER.md are identity files and not overwritten.

---

## Self

[Synthesized assistant/self observations organized by time...]

---

## User

[Synthesized user observations organized by time...]

---

## Other Models

- [models/project-a.md](models/project-a.md): Project A - description
- [models/person-b.md](models/person-b.md): Person B - description
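A renderer that produces this layout could look like the sketch below. The signature mirrors render_memory as listed for generate.py, but the body and the shape of `other_models` (pairs of name and description) are assumptions.

```python
def render_memory(assistant_content, user_content, other_models):
    """Build MEMORY.md text; `other_models` is an iterable of (name, description)."""
    lines = [
        "# Memory",
        "",
        "Synthesized memory from conversations. "
        "SOUL.md and USER.md are identity files and not overwritten.",
        "",
        "---",
        "",
        "## Self",
        "",
        assistant_content,
        "",
        "---",
        "",
        "## User",
        "",
        user_content,
        "",
        "---",
        "",
        "## Other Models",
        "",
    ]
    for name, description in other_models:
        path = f"models/{name}.md"
        lines.append(f"- [{path}]({path}): {description}")
    return "\n".join(lines) + "\n"
```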

Module Reference

`db.py`

SQLAlchemy models and database initialization.

Model, Observation, Summary, SummarySource, ImportedSession - ORM classes
get_engine(db_path) - Create SQLAlchemy engine
get_session(db_path) - Create session
init_db(db_path) - Initialize tables, run migrations, create base models
migrate_db(db_path) - Add new columns to existing databases

`llm.py`

LLM integration for observation extraction.

client - OpenAI client instance
MODEL - Model name constant
estimate_tokens(text) - Token count estimation
chunk_messages(messages) - Split messages into processable chunks
process_chunk(chunk) - Extract observations from a chunk
extract_observations(messages, on_progress, max_workers) - Main extraction entry point

`summarize.py`

Summarization pipeline with dirty tracking.

STEP - Items per summary (10)
assign_models_to_observations(session, observations) - Model assignment
mark_model_dirty(session, model_id) - Mark model for re-synthesis
mark_overlapping_summaries_dirty(session, model_id, timestamp) - Mark affected summaries
record_summary_sources(session, summary, source_type, source_ids) - Track what went into a summary
propagate_dirty_upward(session, summary) - Mark parent summaries dirty
run_tier0_summarization(db_path, on_progress, max_workers) - Run tier 0
run_higher_tier_summarization(db_path, on_progress, max_workers) - Run tiers 1+
process_dirty_tier0(db_path, on_progress, max_workers) - Regenerate dirty tier-0 summaries
process_dirty_higher_tiers(db_path, on_progress, max_workers) - Regenerate dirty higher-tier summaries
process_all_dirty(db_path, on_progress, max_workers) - Process all dirty summaries

`pyramid.py`

Pyramid retrieval and synthesis.

get_pyramid(session, model_id) - Returns dict of tier → summaries
get_unsummarized_observations(session, model_id, by_tier) - Get observations not yet in tier 0
get_non_overlapping_summaries(by_tier) - Filter summaries to avoid tier overlap
synthesize_model(name, description, by_tier, unsummarized_obs) - Generate coherent mental model narrative
prepare_model_data(session, model, ref_date) - Prepare data for synthesis
synthesize_dirty_models(db_path, on_progress, max_workers) - Synthesize all dirty models, cache results

`embeddings.py`

Vector embedding utilities.

EMBEDDING_MODEL - Model name
EMBEDDING_DIM - Dimension count (1536)
enrich_for_embedding(text, timestamp, end_timestamp) - Prepend temporal context
embed_many(texts, max_workers, on_progress) - Batch embed with parallel processing
search_memory(conn, query_text, limit, time_weight) - Search memory with temporal reranking

`loaders.py`

Message loading from various formats.

load_glenn_messages(source, conversation, user, limit) - Load from Glenn SQLite format
load_claude_messages(source, limit) - Load from Claude JSON export
load_openclaw_messages(source, limit) - Load from OpenClaw JSONL sessions
load_openclaw_incremental(source, session_tracking) - Load only new messages since last sync

`generate.py`

Markdown generation from cached synthesis.

CORE_MODELS - List of core model names (assistant, user)
update_model_descriptions(session, on_progress) - Fill in missing descriptions
render_memory(assistant_content, user_content, other_models) - Generate MEMORY.md content
render_model_file(model) - Generate model file from cached synthesis
export_models(workspace, db_path, on_progress, model_ids) - Main export function

`sync.py`

Orchestration for the sync command.

embed_new_items(db_path, on_progress, max_workers) - Embed items without embeddings
write_model_files(db_path, workspace, on_progress) - Write markdown files
sync(workspace, db, source, on_progress, max_workers) - Main sync function

`cli.py`

Command-line interface with the main commands (import, sync, search) and the internal subgroup.

Processing Flows

Sync Flow

sync command
├── If --source: load_openclaw_incremental → extract_observations → save to DB
├── run_tier0_summarization (assign + create new tier-0 summaries)
├── run_higher_tier_summarization (create new higher-tier summaries)
├── process_all_dirty (regenerate dirty summaries)
├── embed_new_items (embed new observations/summaries)
├── synthesize_dirty_models (synthesize dirty models, cache results)
└── export_models (write markdown files from cached synthesis)

Dirty Propagation Flow

New observation added
└── Model marked content_dirty = True

Summary regenerated
├── Record sources in summary_sources
├── Parent summaries marked is_dirty = True
└── Model marked content_dirty = True

Model synthesized
├── Result cached in synthesized_content
└── content_dirty = False

Search Flow

Query → get_embedding → memory_vec MATCH → candidates (3x limit)
     → fetch timestamps → compute_time_penalty → rerank by combined score
     → fetch Observation/Summary objects → LLM synthesis → answer
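The rerank step can be illustrated with a small sketch. The name compute_time_penalty appears in the flow above, but its formula is not documented here; the exponential decay, the 30-day half-life, and the linear blend with `time_weight` are all assumptions chosen only to show the idea.

```python
from datetime import datetime


def compute_time_penalty(timestamp, now, half_life_days=30.0):
    """0.0 for an item from `now`, approaching 1.0 as the item ages (assumed formula)."""
    age_days = max((now - timestamp).total_seconds() / 86400, 0.0)
    return 1.0 - 0.5 ** (age_days / half_life_days)


def rerank(candidates, now, limit, time_weight=0.3):
    """Order (item, distance, timestamp) candidates by a combined score.

    Lower is better: vector distance blended with the time penalty.
    time_weight = 0 reduces to pure semantic ordering.
    """
    def score(candidate):
        _, distance, timestamp = candidate
        penalty = compute_time_penalty(timestamp, now)
        return (1 - time_weight) * distance + time_weight * penalty

    return [c[0] for c in sorted(candidates, key=score)[:limit]]
```

Fetching more candidates than the requested limit (3x in the flow above) gives the reranker room to promote recent items that were not among the closest matches.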

Configuration

Environment variables (via .env):

OPENAI_API_KEY=sk-...

Constants in code:

# llm.py
MODEL = 'gpt-4.1-mini'
MAX_TOKENS = 10000
CHARS_PER_TOKEN = 4

# summarize.py
STEP = 10

# embeddings.py
EMBEDDING_MODEL = "text-embedding-3-small"
EMBEDDING_DIM = 1536
MAX_TOKENS_PER_REQUEST = 250000
MAX_ITEMS_PER_REQUEST = 2048

Testing

Tests are in the test/ directory. Run with pytest:

pytest test/ -v

File	Coverage
`test_db.py`	ORM models, relationships
`test_llm.py`	Token estimation, message chunking
`test_summarize.py`	Observation grouping, chunking
`test_pyramid.py`	Pyramid retrieval, time bucketing
`test_embeddings.py`	Serialization, constants
`test_loaders.py`	Message loading, week grouping
`test_generate.py`	Index rendering, constants

Dependencies

openai          # LLM and embeddings API
click           # CLI framework
python-dotenv   # Environment variable loading
sqlalchemy      # ORM and database
sqlite-vec      # Vector similarity search (pip installs this, it's a loadable SQLite extension)
pytest          # Testing

Note for agents: pip install -r requirements.txt handles everything including sqlite-vec. The extension is automatically loaded when needed. No manual SQLite compilation required.
