Ingest once with the LLM → recall with graph + vectors + scoring — no LLM on the read path.
🧠 Structured · ⚡ Async-first · 🔌 100+ models via LiteLLM · 🛡️ Decay, cache, rate limits · 🚀 Production-minded
📖 Developer docs · GitHub · Issues
Why Engram · Docs · Verify LiteLLM · Install · Quick start · Contributing · License
Most agent memory is a flat pile of chunks or a single vector index. That breaks down when you need structured relationships, incremental truth, and fast, cheap retrieval at scale.
| Approach | Pain point | Engram |
|---|---|---|
| Chunk + vector RAG | Loses who/what/how; hard to reason about entities | Neo4j graph with dynamic labels & relationships from the LLM |
| One big summary | Goes stale; expensive to rewrite | Per-message extraction + strength decay so unused facts fade |
| “Ask the LLM” for every recall | Cost + latency | recall() = embeddings + vector index + graph traversal + composite score |
| Rigid schema | Doesn’t fit every domain | Schema emerges at runtime from structured JSON extraction |
Engram is built for multi-step agents, copilots, and long-running workflows: isolated user_id namespaces, hooks for audit/telemetry, health checks, and a CLI smoke test (engram_memory-e2e) you can run in CI against a real database.
Developer-focused guides live under docs/:
| Guide | Topics |
|---|---|
| Documentation home | Index, links, orientation |
| Getting started | Install, LiteLLM check, first ingest / recall |
| Configuration | Environment variables, Config, user_id pattern |
| API overview | Clients, models, exceptions |
| Production & operations | Health, engram_memory-e2e, logging |
On PyPI, the package metadata includes a Documentation URL that points to the same docs/ tree on GitHub.
All production LLM traffic in Engram goes through LiteLLM (litellm.acompletion). Before you set LLM_MODEL / LLM_API_KEY in Engram, prove the same route works in a minimal call:
```bash
pip install litellm

# Quick check (adjust model + env for your provider):
python -c "import litellm; print(litellm.completion(model='gpt-4o-mini', messages=[{'role':'user','content':'Say OK.'}]).choices[0].message.content)"
```

Or async, matching what Engram uses internally:
```python
import asyncio
import litellm

async def main():
    r = await litellm.acompletion(
        model="gpt-4o-mini",  # e.g. anthropic/claude-..., azure/deployment-name, openai/...
        messages=[{"role": "user", "content": "Reply OK."}],
        api_key="...",        # or rely on OPENAI_API_KEY / provider-specific env
        # api_base="...",     # enterprise / Azure / custom gateway
    )
    print(r.choices[0].message.content)

asyncio.run(main())
```

Use the exact model id (and api_base / api_version if required) in your LLM_* environment variables — see Quick start.
Integration status: the supported path is LiteLLMAdapter (engram_memory/llm/litellm_adapter.py). Legacy adapters under engram_memory/llm/ may exist for reference; new provider-specific integrations should prefer LiteLLM or land as clean PRs — we are open to contributions (see below).
- 1 LLM call to ingest · 0 LLM calls to recall — extract structured graph once; retrieve with vectors + traversal + scoring
- Slim context, minimal tokens — only node summaries and relationship types are sent to the LLM (~735 tokens/ingest avg), not raw properties or embeddings
- Token tracking & cost estimation — every IngestResult includes tokens_prompt, tokens_completion, tokens_total for precise cost monitoring
- Batched Neo4j writes — nodes grouped by label and relationships grouped by type, written via UNWIND queries to minimize round-trips
- Single-query graph traversal — variable-length Cypher replaces per-node BFS; one round-trip regardless of graph size
- Update-aware extraction — LLM is instructed to update existing entities instead of creating duplicates, preventing graph bloat
- Dynamic graph schema — labels, properties, and relationship types from the model, not hand-maintained DDL
- Async-first — AsyncMemoryClient + sync MemoryClient wrapper
- Composite ranking — α·vector_similarity + β·decay^hops + γ·strength
- Memory decay & archival — strength fades; stale nodes drop out of active recall
- Hierarchical cluster summaries — broad vs detailed search() modes
- Two-tier embeddings — optional int8 + float32 path for speed/quality tradeoffs
- Optimistic locking, rate limiting, circuit breaker — safer under load
- Hooks + observability — lifecycle hooks, JSON logs, metrics, optional OpenTelemetry
- Background tasks — decay, hierarchy rebuild, weight-learning telemetry
PyPI release is in progress. Until the package is published, install from this repository:
```bash
git clone https://github.com/hackdavid/engram-memory.git
cd engram-memory
pip install -e .
```

When Engram is on PyPI, a normal install will be:

```bash
pip install engram-memory-sdk
```

Either path installs the runtime stack: Neo4j driver, Pydantic, LiteLLM, and local embeddings (SentenceTransformers + PyTorch) for EMBEDDING_PROVIDER=local (the default). If you use EMBEDDING_PROVIDER=openai, add the OpenAI SDK: pip install engram-memory-sdk[openai-embed] (after PyPI) or pip install -e ".[openai-embed]" from a clone.
Run from a directory that has .env or engram_memory/.env configured (see .env.example). Install first with pip install -e . from a clone, or from PyPI when it is available.
| Use case | Command |
|---|---|
| Default (recommended) | python -m engram_memory.cli.e2e_validate |
| Same, from a Windows clone | scripts\engram_memory-e2e.cmd (repo root) |
| Pip console script (if on PATH) | engram_memory-e2e |
| Clone, package not installed | python scripts/e2e_validate.py |
On Windows, engram_memory-e2e often fails with “not recognized” because Python’s Scripts folder is not on PATH. Prefer the python -m … row above, or add that Scripts directory to PATH (conda env, %LocalAppData%\Programs\Python\Python3xx\Scripts, etc.).
- Health checks
- Five sequential ingests (one LLM call each), with per-ingest timing logged
- Ten recall / search scenarios and a graph snapshot
| Goal | How |
|---|---|
| Retrieval only (no writes) | python -m engram_memory.cli.e2e_validate --skip-seed --user-id <id> or set E2E_USER_ID |
| One LLM call for all seed text | --batch-seed |
| Tune wall-clock limits | E2E_LLM_TIMEOUT_SEC (default 120), E2E_INGEST_TIMEOUT_SEC (default LLM timeout + 45s) |
More options: python -m engram_memory.cli.e2e_validate --help.
Bolt connectivity without the full SDK or LLM: python scripts/neo4j_verify_connectivity.py (same Neo4j env vars).
```bash
# Required
export NEO4J_URI="bolt://localhost:7687"
export NEO4J_USER="neo4j"
export NEO4J_PASSWORD="your-password"

# LLM -- uses LiteLLM model naming (supports 100+ providers)
export LLM_MODEL="gpt-4o-mini"   # OpenAI
export LLM_API_KEY="sk-..."

# Or Anthropic:
# LLM_MODEL="anthropic/claude-sonnet-4-20250514"
# LLM_API_KEY="sk-ant-..."

# Or Azure OpenAI:
# LLM_MODEL="azure/my-gpt4-deployment"
# LLM_API_KEY="azure-key"
# LLM_API_BASE="https://myresource.openai.azure.com/"
# LLM_API_VERSION="2024-02-01"

# Optional LLM HTTP timeout (seconds) — forwarded to LiteLLM / httpx on ingest
# export LLM_REQUEST_TIMEOUT="120"

# Optional (all have sensible defaults)
export EMBEDDING_PROVIDER="local"       # or "openai"
export CACHE_ENABLED="true"
export ENABLE_BACKGROUND_TASKS="true"
export LOG_FORMAT="json"                # or "text"
```

```python
import asyncio

from engram_memory import AsyncMemoryClient, Config

async def main():
    config = Config()  # reads from environment variables
    async with AsyncMemoryClient(config) as client:
        # await client.health_check(ping_llm=True)  # optional wiring check

        # Ingest a message (1 LLM call, batched Neo4j writes)
        result = await client.ingest(
            user_id="user-123",
            text="I work at Google as a senior engineer in the ML team.",
            reference_id="msg-001",
        )
        print(f"Created {len(result.nodes_created)} nodes, "
              f"{result.relationships_created} relationships, "
              f"{result.tokens_total} tokens used")

        # Recall relevant context (0 LLM calls)
        context = await client.recall(
            user_id="user-123",
            query="What does the user do for work?",
            top_k=5,
        )
        for node in context.nodes:
            print(f"  [{node.score:.2f}] {node.summary}")

asyncio.run(main())
```

Or with the synchronous wrapper:

```python
from engram_memory import MemoryClient, Config

client = MemoryClient(Config())
result = client.ingest(user_id="user-123", text="I love hiking in the mountains.")
context = client.recall(user_id="user-123", query="hobbies")
client.close()
```

Every call to ingest(text) follows this optimised path:
```text
User text
   |
   v
Trivial filter ------> skip ("hi", "ok", "thanks")       [0 LLM calls]
   |
   v
Rate limiter (token bucket)
   |
   v
Step 1: embed(text) -> query_vector                      [1 embedder call, reused below]
   |
   v
Step 2: vector_search(query_vector, top_k=5)             [1 Neo4j call]
        returns: elementId, label, summary, rel_types
        (NO raw properties, NO embeddings -- slim context)
   |
   v
Step 3: build_user_prompt(text + slim context)
        ~50-100 tokens for 5 context nodes
   |
   v
Step 4: LLM extraction -> nodes[] + rels[]               [1 LLM call]
        token usage captured for cost tracking
   |
   v
Step 5: Batch node upsert (UNWIND per label group)       [~2 Neo4j calls]
        reuse text embedding when summary == text
   |
   v
Step 6: Batch relationship MERGE (UNWIND per type)       [~1 Neo4j call]
        resolve temp_N -> real elementIds
   |
   v
Invalidate user cache
   |
   v
Return IngestResult (with token counts)
```
Key design decisions:
- Embedding reuse -- the text embedding from step 1 is used for both context lookup and node storage; fresh embeddings are only computed for nodes whose summary differs from the input text.
- Slim LLM context -- only node summaries and relationship type names are sent to the LLM, keeping prompt tokens minimal (~735 tokens/ingest on GPT-4-32k in benchmarks).
- Batched writes -- nodes are grouped by label and written via UNWIND queries; relationships are grouped by type. A typical 4-node + 3-relationship ingest uses ~4 Neo4j round-trips instead of 7.
- Update-aware extraction -- the LLM prompt explicitly instructs the model to emit "operation": "update" for entities already present in the context, preventing node duplication as the graph grows.
```text
Query text
   |
   v
Check per-user LRU cache ------> cache hit? return immediately   [0 calls]
   |
   v
Step 1: embed(query) -> query_vector                     [1 embedder call]
   |
   v
Step 2: Neo4j vector search (top-K seeds)                [1 Neo4j call]
        properties cleaned: _embedding, _version, etc. stripped
   |
   v
Step 3: Single variable-length Cypher                    [1 Neo4j call]
        MATCH path = (seed)-[*1..3]-(m)
        returns ALL reachable nodes in 1 round-trip
        (replaces N+1 per-node BFS queries)
   |
   v
Step 4: Composite scoring
        final_score = a * similarity + b * decay^hops + g * strength
   |
   v
Rank and return top-K ScoredNode[]
   |
   v
Cache result, return RecallResult
```
Zero LLM calls on the read path. All intelligence was front-loaded at ingestion.
Every node has a strength property (starts at 1.0). A background task periodically multiplies it by a decay factor, simulating memory fading over time.
strength_new = strength_old × decay_factor
Default: decay_factor=0.95, runs every 24 hours.
Nodes that drop below the archive_threshold (default: 0.01) are archived -- they get _archived=true and isCurrent=false, excluding them from recall queries.
Why this matters: Frequently reinforced memories (re-ingested facts) stay strong because each MERGE resets strength to 1.0. Stale facts naturally fade away.
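Under the stated defaults, the decay schedule implies a concrete lifetime for an untouched fact. A minimal sketch (the `cycles_until_archived` helper is ours for illustration, not part of the SDK):

```python
def cycles_until_archived(strength: float = 1.0,
                          decay_factor: float = 0.95,
                          archive_threshold: float = 0.01) -> int:
    """Count decay cycles until an unreinforced node falls below the archive threshold."""
    cycles = 0
    while strength >= archive_threshold:
        strength *= decay_factor
        cycles += 1
    return cycles

# With the defaults (0.95 per 24-hour cycle), an untouched node is archived after 90 cycles.
print(cycles_until_archived())  # 90
```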
```python
# Configure decay via environment variables or Config
config = Config(
    decay_factor=0.90,        # faster decay (memories fade quicker)
    archive_threshold=0.05,   # archive earlier
    decay_interval_hours=12,  # run decay every 12 hours
)
```

Retrieval ranks nodes using three weighted signals:
| Signal | Weight | Description |
|---|---|---|
| vector_similarity | α (0.50) | Cosine similarity between query embedding and node embedding |
| decay^hops | β (0.35) | Proximity penalty -- nodes further from seed get lower scores |
| strength | γ (0.15) | Memory strength -- recently reinforced facts score higher |
final_score = α × vector_similarity + β × (decay ^ hops) + γ × strength
Example: A node with 0.9 similarity, 1 hop away, strength 0.8:
score = 0.5 × 0.9 + 0.35 × (0.5^1) + 0.15 × 0.8
= 0.45 + 0.175 + 0.12
= 0.745
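The worked example above can be checked mechanically. A small sketch of the formula with the default weights (the helper name is ours, not the SDK's):

```python
def composite_score(similarity: float, hops: int, strength: float,
                    alpha: float = 0.50, beta: float = 0.35,
                    gamma: float = 0.15, decay: float = 0.5) -> float:
    """final_score = alpha * similarity + beta * decay**hops + gamma * strength"""
    return alpha * similarity + beta * decay ** hops + gamma * strength

# The node from the example: 0.9 similarity, 1 hop from a seed, strength 0.8.
print(round(composite_score(similarity=0.9, hops=1, strength=0.8), 3))  # 0.745
```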
```python
# Customise weights
config = Config(
    score_alpha=0.6,   # prioritise semantic similarity
    score_beta=0.25,   # less weight on graph distance
    score_gamma=0.15,  # keep strength weight
)
```

After vector search returns seed nodes, a single variable-length Cypher query expands outward through relationships in one Neo4j round-trip:
```cypher
MATCH (seed) WHERE elementId(seed) IN $seedIds
MATCH path = (seed)-[*1..3]-(m)
WHERE m.userId = $userId AND m.isCurrent = true
WITH DISTINCT m, min(length(path)) AS hops
RETURN elementId(m) AS elementId, hops, m.strength AS strength, labels(m)[0] AS label
```

Each hop multiplies the score by a decay factor (score = decay^hops). Nodes whose score drops below min_score are filtered out in Python.
```python
config = Config(
    traversal_decay=0.5,       # each hop halves the score
    traversal_max_depth=3,     # limit to 3 hops
    traversal_min_score=0.15,  # prune weak paths earlier
)
```

This replaces the previous per-node BFS approach (which could generate 100+ individual queries) with a single round-trip regardless of graph size.
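The Python-side pruning described above can be sketched briefly; `prune_by_hops` is an illustrative stand-in for the SDK's internal filtering, assuming each node's hop score is decay^hops:

```python
def prune_by_hops(nodes: list[dict], decay: float = 0.5,
                  min_score: float = 0.1) -> list[dict]:
    """Drop traversal results whose hop-decay score falls below min_score."""
    kept = []
    for node in nodes:
        hop_score = decay ** node["hops"]
        if hop_score >= min_score:
            kept.append({**node, "hop_score": hop_score})
    return kept

rows = [{"id": "a", "hops": 1}, {"id": "b", "hops": 2}, {"id": "c", "hops": 4}]
# 0.5^4 = 0.0625 < 0.1, so the 4-hop node is pruned.
print([n["id"] for n in prune_by_hops(rows)])  # ['a', 'b']
```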
Nodes are grouped into clusters (e.g., "Work", "Hobbies", "Education"). Each cluster has a ClusterSummary node at Level 1 with an aggregated embedding.
Query modes:
| Mode | Behaviour |
|---|---|
| broad | Only search Level 0-1 (cluster summaries) |
| detailed | Search all levels including leaf nodes |
| auto | Start broad; if best score < 0.5, fall through to detailed |
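The auto fall-through in the table can be sketched as follows; `search_auto` and the injected search callables are hypothetical, with only the 0.5 threshold taken from the table above:

```python
def search_auto(query_vec, broad_search, detailed_search, threshold: float = 0.5):
    """Try cluster summaries first; fall through to leaf nodes if the match is weak."""
    results = broad_search(query_vec)
    best = max((r["score"] for r in results), default=0.0)
    if best < threshold:
        return detailed_search(query_vec)  # broad match too weak -> go deep
    return results

broad = lambda q: [{"id": "cluster-work", "score": 0.42}]
detailed = lambda q: [{"id": "leaf-google", "score": 0.81}]
# Best broad score 0.42 < 0.5, so the detailed search is used.
print(search_auto(None, broad, detailed)[0]["id"])  # leaf-google
```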
```python
result = await client.search(
    user_id="user-123",
    query="Tell me about their career",
    detail_level="auto",  # recommended
    top_k=10,
)
```

Before calling the LLM, Engram checks if the text is worth extracting. Short greetings, acknowledgements, and filler messages are skipped automatically, saving LLM costs.
Automatically skipped: "hi", "ok thanks", "lol", "sounds good", "bye", etc.
Not skipped: "I work at Google as a software engineer" (contains entities + facts).
```python
# Add custom trivial patterns
from engram_memory.extractors.trivial_filter import is_trivial

is_trivial("skip this", custom_trivial_patterns=[r"skip this"])  # True
```

Enable coarse (int8 quantized) + fine (float32) embeddings. The coarse embedding can be used for fast approximate filtering before re-ranking with fine embeddings.
```python
config = Config(
    two_tier_embedding=True,
    embedding_provider="local",
    embedding_model="all-MiniLM-L6-v2",
)
```

Every node carries a _version counter. When updating a node, you can pass expected_version to ensure no concurrent write has happened since you last read it. If the version doesn't match, the update is a no-op, preventing data corruption.
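A toy compare-and-set illustrating the idea. This is not the SDK's implementation: in Engram a mismatch may surface as a skipped write or a ConcurrentModificationError depending on the call path, while this sketch simply raises a stand-in exception.

```python
class VersionConflict(Exception):
    """Stand-in for an optimistic-locking conflict."""

def update_node(store: dict, node_id: str, props: dict, expected_version: int) -> None:
    """Apply the update only if no concurrent write bumped _version in between."""
    node = store[node_id]
    if node["_version"] != expected_version:
        raise VersionConflict(node_id)
    node.update(props)
    node["_version"] += 1  # every successful write bumps the counter

store = {"n1": {"_version": 3, "summary": "old"}}
update_node(store, "n1", {"summary": "new"}, expected_version=3)
print(store["n1"]["_version"])  # 4
```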
Extend Engram's lifecycle with custom hooks:
```python
from engram_memory.hooks.base import Hook
from engram_memory.models import IngestResult, RecallResult

class AuditHook:
    """Log all ingest/recall events to an audit service."""

    async def pre_ingest(self, user_id: str, text: str) -> str | None:
        print(f"[AUDIT] Ingesting for {user_id}: {text[:50]}...")
        return None  # return modified text, or None to keep original

    async def post_ingest(self, user_id: str, result: IngestResult) -> None:
        print(f"[AUDIT] Ingested: {len(result.nodes_created)} nodes created")

    async def pre_recall(self, user_id: str, query: str) -> str | None:
        return None

    async def post_recall(self, user_id: str, result: RecallResult) -> None:
        print(f"[AUDIT] Recalled {len(result.nodes)} nodes (cached={result.from_cache})")
```

A built-in LoggerHook is included that logs all lifecycle events at INFO level.
LLM calls are rate-limited using a token-bucket algorithm to prevent cost overruns:
```python
config = Config(
    llm_rate_limit_rpm=60,    # 60 requests per minute
    llm_rate_limit_burst=10,  # allow bursts of up to 10
)
```

When the bucket is empty, RateLimitExceededError is raised.
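A minimal token bucket showing the rpm/burst semantics. This is an illustrative model, not the SDK's rate_limiter module; a real caller would raise RateLimitExceededError where this sketch returns False.

```python
import time

class TokenBucket:
    """Toy token bucket: rpm tokens refill per minute, capped at burst."""

    def __init__(self, rpm: int = 60, burst: int = 10):
        self.rate = rpm / 60.0            # tokens added per second
        self.capacity = burst
        self.tokens = float(burst)        # start full: bursts allowed immediately
        self.last = time.monotonic()

    def try_acquire(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, never past capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # bucket empty: the SDK would raise RateLimitExceededError

bucket = TokenBucket(rpm=60, burst=2)
print([bucket.try_acquire() for _ in range(3)])  # [True, True, False]
```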
LLM adapters include a circuit breaker. After N consecutive failures, the breaker opens and all subsequent calls immediately raise CircuitOpenError instead of hitting the API. This prevents cascading failures and excessive costs.
```python
# Configured per LLM adapter
config = Config(
    llm_max_retries=3,  # retry up to 3 times with exponential backoff
)
```

Recall results are cached per-user with an LRU eviction policy and TTL:
```python
config = Config(
    cache_enabled=True,
    cache_max_size=256,     # max entries in cache
    cache_ttl_seconds=300,  # 5-minute TTL
)
```

Cache is automatically invalidated when new data is ingested for a user.
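A sketch of the cache policy — LRU eviction, TTL expiry, and ingest-triggered invalidation. Class and method names here are ours for illustration, not the SDK's lru_cache module, and the `"user:query"` key scheme is an assumption.

```python
import time
from collections import OrderedDict

class LRUTTLCache:
    """Per-user recall cache sketch: LRU eviction plus per-entry TTL."""

    def __init__(self, max_size: int = 100, ttl_seconds: float = 300.0):
        self.max_size = max_size
        self.ttl = ttl_seconds
        self._data: OrderedDict[str, tuple[float, object]] = OrderedDict()

    def get(self, key: str):
        entry = self._data.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._data[key]          # expired
            return None
        self._data.move_to_end(key)      # mark as recently used
        return value

    def put(self, key: str, value) -> None:
        self._data[key] = (time.monotonic(), value)
        self._data.move_to_end(key)
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)   # evict least recently used

    def invalidate_user(self, user_id: str) -> None:
        # On ingest, all cached recalls for that user are dropped.
        for key in [k for k in self._data if k.startswith(f"{user_id}:")]:
            del self._data[key]

cache = LRUTTLCache(max_size=2)
cache.put("user-123:hobbies", ["hiking"])
cache.put("user-123:work", ["Google"])
cache.invalidate_user("user-123")
print(cache.get("user-123:hobbies"))  # None
```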
Three periodic tasks run in the background:
| Task | Default Interval | Purpose |
|---|---|---|
| Strength Decay | 24 hours | Multiply all node strengths by decay_factor, archive weak nodes |
| Hierarchy Rebuild | 6 hours | Re-compute cluster summary embeddings from member nodes |
| Weight Learning | 12 hours | Log traversal statistics for scoring weight optimisation |
```python
config = Config(
    enable_background_tasks=True,
    decay_interval_hours=24,
    hierarchy_rebuild_interval_hours=6,
    weight_learning_interval_hours=12,
)
```

| Method | Description |
|---|---|
| ingest(user_id, text, reference_id=None) | Embed text, fetch slim context, LLM extraction, batched graph upsert; returns token counts |
| ingest_batch(user_id, items) | Ingest multiple messages (each {"text": "...", "reference_id": "..."}) |
| recall(user_id, query, top_k=10) | Vector search + single-query traversal + composite scoring; 0 LLM calls |
| search(user_id, query, top_k=10, detail_level="auto") | Hierarchical search through summary tree |
| get_graph(user_id, page=1, page_size=100) | Paginated snapshot of a user's memory graph |
| delete_memory(user_id, node_id, cascade=False) | Delete a node (cascade removes relationships too) |
| get_node_history(user_id, node_id) | Get the supersession chain for a node |
| health_check(ping_llm=True) | Check Neo4j, embedder, vector index, schema; optionally ping the LLM via LiteLLM |
| Model | Purpose |
|---|---|
| IngestResult | Response from ingest(): skipped, nodes_created, nodes_updated, relationships_created, tokens_prompt, tokens_completion, tokens_total |
| RecallResult | Response from recall(): nodes (list of ScoredNode), total_candidates, from_cache |
| ScoredNode | A node with element_id, label, summary, score, hops_from_seed, properties |
| GraphSnapshot | Paginated graph view: nodes, relationships, total_nodes, pagination fields |
| HealthStatus | Aggregated health: neo4j_connected, llm_reachable, embedding_model_loaded, etc. |
All exceptions inherit from EngramError:
| Exception | When |
|---|---|
| ConfigurationError | Invalid SDK configuration |
| ExtractionError | LLM extraction fails after all retries |
| CircuitOpenError | LLM circuit breaker is open |
| InvalidUserIdError | user_id fails pattern validation |
| HasRelationshipsError | Non-cascade delete on node with relationships |
| EmbeddingDimensionMismatchError | Vector index dimensions differ from config |
| RateLimitExceededError | Token bucket exhausted |
| MigrationError | Schema migration failed |
| ConcurrentModificationError | Optimistic locking conflict |
All fields can be set via environment variables (case-insensitive):
| Variable | Type | Default | Description |
|---|---|---|---|
| NEO4J_URI | str | required | Neo4j connection URI |
| NEO4J_USER | str | required | Neo4j username |
| NEO4J_PASSWORD | str | required | Neo4j password |
| NEO4J_DATABASE | str | neo4j | Neo4j database name |
| NEO4J_MAX_POOL_SIZE | int | 50 | Connection pool size |
| LLM_MODEL | str | gpt-4o-mini | LiteLLM model string (e.g. gpt-4o, anthropic/claude-sonnet-4-20250514, azure/<deployment>) |
| LLM_API_KEY | str | None | API key for the LLM provider |
| LLM_API_BASE | str | None | Base URL (required for Azure, optional otherwise) |
| LLM_API_VERSION | str | None | API version (required for Azure) |
| LLM_MAX_TOKENS | int | 4096 | Max tokens per LLM response |
| LLM_MAX_RETRIES | int | 3 | Max LLM retry attempts |
| LLM_RATE_LIMIT_RPM | int | 60 | Requests per minute limit |
| LLM_RATE_LIMIT_BURST | int | 10 | Burst capacity |
| LLM_REQUEST_TIMEOUT | float | None | Optional HTTP timeout (seconds) for LLM calls (LiteLLM / httpx) |
| EMBEDDING_PROVIDER | str | local | local or openai |
| EMBEDDING_MODEL | str | all-MiniLM-L6-v2 | Embedding model name |
| EMBEDDING_DIMENSIONS | int | 384 | Embedding vector dimensions |
| EMBEDDING_API_KEY | str | None | API key for OpenAI embeddings |
| TWO_TIER_EMBEDDING | bool | false | Enable coarse+fine embeddings |
| SCORE_ALPHA | float | 0.50 | Vector similarity weight |
| SCORE_BETA | float | 0.35 | Hop decay weight |
| SCORE_GAMMA | float | 0.15 | Strength weight |
| TRAVERSAL_DECAY | float | 0.5 | Score multiplier per hop |
| TRAVERSAL_MAX_DEPTH | int | 5 | Max traversal hops |
| TRAVERSAL_MIN_SCORE | float | 0.1 | Prune below this score |
| DECAY_FACTOR | float | 0.95 | Strength multiplier per cycle |
| ARCHIVE_THRESHOLD | float | 0.01 | Archive nodes below this |
| DECAY_INTERVAL_HOURS | int | 24 | Hours between decay runs |
| CACHE_ENABLED | bool | true | Enable recall caching |
| CACHE_MAX_SIZE | int | 100 | Max cache entries |
| CACHE_TTL_SECONDS | int | 300 | Cache entry lifetime |
| AUTO_MIGRATE | bool | true | Run schema migrations on start |
| LOG_LEVEL | str | INFO | Logging level |
| LOG_FORMAT | str | text | json or text |
| ENABLE_TRACING | bool | false | Enable OpenTelemetry tracing |
```text
engram_memory/
├── __init__.py                  # Public exports + lazy imports
├── _version.py                  # "0.1.0"
├── client.py                    # AsyncMemoryClient + MemoryClient (sync wrapper)
├── config.py                    # Pydantic BaseSettings configuration
├── constants.py                 # SDK-wide defaults
├── exceptions.py                # EngramError hierarchy
├── models.py                    # Pydantic data contracts
├── rate_limiter.py              # Token-bucket rate limiter
├── graph/
│   ├── driver.py                # Async Neo4j driver wrapper
│   ├── engine.py                # Dynamic Cypher generator (single + batched UNWIND)
│   ├── indexes.py               # Vector index management
│   ├── migrations.py            # Schema versioning
│   ├── sanitise.py              # Label/type sanitisation
│   ├── traversal.py             # Single-query variable-length path traversal
│   ├── scorer.py                # Composite scoring
│   └── hierarchy.py             # Cluster summary tree
├── embeddings/
│   ├── base.py                  # ABC for embedders
│   ├── sentence_transformer.py  # Local (SentenceTransformers)
│   ├── openai_embedding.py      # OpenAI API
│   └── two_tier.py              # Coarse + fine embeddings
├── llm/
│   ├── base.py                  # ABC with retry + circuit breaker
│   ├── litellm_adapter.py       # Production path: LiteLLM (100+ providers)
│   ├── openai_adapter.py        # Legacy reference (not wired by default)
│   └── anthropic_adapter.py     # Legacy reference (not wired by default)
├── extractors/
│   ├── base.py                  # ABC for extractors
│   ├── prompts.py               # LLM prompt templates
│   ├── llm_extractor.py         # LLM-powered extraction
│   └── trivial_filter.py        # Skip greetings/filler
├── cache/
│   └── lru_cache.py             # Per-user LRU with TTL
├── hooks/
│   ├── base.py                  # Hook protocol
│   └── logger_hook.py           # Built-in logging hook
├── observability/
│   ├── logging.py               # JSON structured logging
│   ├── metrics.py               # Counters + histograms
│   └── tracing.py               # Optional OpenTelemetry
├── health/
│   └── checks.py                # Aggregated health checks
└── background/
    ├── runner.py                # Asyncio task scheduler
    ├── decay_task.py            # Strength decay + archival
    ├── hierarchy_task.py        # Cluster summary rebuild
    └── weight_learning_task.py  # Scoring weight telemetry
```
| Operation | LLM Calls | Embedder Calls | Neo4j Calls | Typical Latency |
|---|---|---|---|---|
| Ingest (trivial) | 0 | 0 | 0 | ~1 ms |
| Ingest (factual) | 1 | 1 + N (summary differs) | ~4 (batched) | ~5 s |
| Recall (cache hit) | 0 | 0 | 0 | < 1 ms |
| Recall (cache miss) | 0 | 1 | 2 | ~200 ms |
| Search (hierarchical) | 0 | 1 | 1 | ~100 ms |
| get_graph | 0 | 0 | 2 | ~50 ms |
| Metric | Value |
|---|---|
| Avg tokens per ingest | 735 |
| Total tokens (12 docs) | 8,823 |
| Prompt / Completion split | 4,335 / 4,488 |
| Avg nodes per ingest | 2.67 |
| Ingest p50 / p95 | 4,982 ms / 6,680 ms |
| Recall p50 / p95 | 200 ms / 368 ms |
| MRR | 0.83 |
| Precision@3 | 0.72 |
| Recall@3 | 0.67 |
Token usage is logged per-ingest in IngestResult.tokens_prompt, tokens_completion, and tokens_total, enabling precise cost tracking in production.
The benchmark includes configurable per-model pricing. Example with Azure GPT-4-32k:
| Metric | Value |
|---|---|
| Prompt cost | $0.06 / 1K tokens |
| Completion cost | $0.12 / 1K tokens |
| Cost per ingest | ~$0.07 |
| Cost per 1K documents | ~$66.86 |
Switch to a cheaper model (GPT-4o-mini, Claude Haiku) and these numbers drop by 10-50x.
Engram is open source. We want you to use it in production, report rough edges, and ship improvements.
- Issues — bugs, design questions, or provider-specific LiteLLM quirks (include model id, env vars you set, and redacted logs).
- Pull requests — keep changes focused; add or extend tests; clone the repo and use pip install -e ".[dev]" for pytest/ruff, then pytest tests/ -v. Match existing style and typing.
- New LLM backends — the supported integration is LiteLLMAdapter. If you need a path LiteLLM does not cover, open an issue first; we welcome clean adapters that follow engram_memory/llm/base.py and include tests with mocks.
Thank you for helping make agent memory structured, fast, and boringly reliable.
Engram is distributed under the MIT License. You may use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software for commercial or non-commercial purposes, provided that you include the original copyright notice and permission notice in all substantial copies.
The Software is provided “as is”, without warranty of any kind. See the full legal terms in LICENSE (copyright © 2026 Daud Dewan).