feat: Mem0-style two-phase extraction+update pipeline #2

Merged
rolandpg merged 7 commits into master from feat/mem0-two-phase-pipeline
Apr 9, 2026

rolandpg commented Apr 9, 2026

Summary

  • Add Mem0-style two-phase memory pipeline replacing append-only storage with intelligent extraction and update operations
  • Phase 1 (FactExtractor): LLM distills raw content into scored candidate facts with importance (1-10)
  • Phase 2 (MemoryUpdater): Compares each fact against existing notes via vector search; LLM decides ADD/UPDATE/DELETE/NOOP
  • New remember_with_extraction() method on MemoryManager orchestrates both phases
  • Bumps version to v1.4.0
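
The flow described in the bullets above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from this PR: the LLM extractor and the vector-search decision step are replaced by plain-Python stubs, an in-memory list stands in for the note store, and every name except `remember_with_extraction` is hypothetical. The `min_importance` default and the `"superseded"` status string are also assumptions; `"corrected"` follows the soft-delete naming mentioned in the commits.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Fact:
    text: str
    importance: int  # 1-10 scale


@dataclass
class Note:
    text: str
    status: str = "active"


def remember_with_extraction(
    content: str,
    notes: list[Note],
    extract: Callable[[str], list[Fact]],
    decide: Callable[[Fact, list[Note]], tuple[str, Optional[int]]],
    min_importance: int = 5,  # assumed threshold, not taken from the PR
) -> list[str]:
    """Run Phase 1 (extract), then Phase 2 (decide and apply) per fact."""
    ops: list[str] = []
    for fact in extract(content):
        if fact.importance < min_importance:
            continue  # low-importance facts are never stored
        op, idx = decide(fact, notes)
        if op == "ADD":
            notes.append(Note(fact.text))
        elif op == "UPDATE" and idx is not None:
            notes[idx].status = "superseded"  # old note kept, marked stale
            notes.append(Note(fact.text))
        elif op == "DELETE" and idx is not None:
            notes[idx].status = "corrected"  # soft delete only
        # NOOP: the fact is already known, so nothing changes
        ops.append(op)
    return ops


def stub_extract(content: str) -> list[Fact]:
    """Stand-in for the LLM: each line is '<importance>|<text>'."""
    facts = []
    for line in content.splitlines():
        score, _, text = line.partition("|")
        facts.append(Fact(text.strip(), int(score)))
    return sorted(facts, key=lambda f: f.importance, reverse=True)


def stub_decide(fact: Fact, notes: list[Note]) -> tuple[str, Optional[int]]:
    """Stand-in for vector search plus LLM: compare against active notes."""
    for i, note in enumerate(notes):
        if note.status != "active":
            continue
        if note.text == fact.text:
            return "NOOP", i
        if note.text.split()[0] == fact.text.split()[0]:
            return "UPDATE", i  # same subject, different claim
    return "ADD", None
```

With a store holding `Note("server runs v1.3.0")`, feeding in the three lines `9|server runs v1.4.0`, `2|weather is nice`, and `8|db is postgres` supersedes the old server note, adds the database note, and drops the weather line because it falls below the threshold.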

Why

ZettelForge scored 14% on the LOCOMO benchmark, versus Mem0's 66.9% SOTA. Root cause: append-only storage creates redundancy, contradictions, and noise. This pipeline adds selective extraction (only important facts are stored) and coherence operations (UPDATE supersedes stale information, DELETE corrects contradictions, NOOP skips duplicates).

New files

  • src/zettelforge/fact_extractor.py — Phase 1 extraction
  • src/zettelforge/memory_updater.py — Phase 2 operations
  • tests/test_fact_extractor.py — 11 unit tests
  • tests/test_memory_updater.py — 14 unit tests
  • tests/test_two_phase_e2e.py — 5 integration tests

Test plan

  • 42/42 tests passing (test_basic, test_fact_extractor, test_memory_updater, test_two_phase_e2e)
  • LLM parsing handles: valid JSON, markdown-wrapped, garbage input, non-numeric importance
  • Mocked ollama tests verify extraction, all 4 operations, and error fallback
  • E2E tests verify full pipeline: extraction+storage, low-importance filtering, UPDATE supersession, NOOP dedup
  • Re-run LOCOMO benchmark after merge to measure improvement
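
The parsing cases listed in the test plan (valid JSON, Markdown-wrapped output, garbage input, non-numeric importance) could be handled along these lines. This is a sketch, not the parser shipped in `fact_extractor.py`: the function names, the `{"fact": ..., "importance": ...}` response shape, the fallback importance of 5, and the clamping to 1-10 are all assumptions made for illustration.

```python
import json

FENCE = "`" * 3  # a Markdown code fence, built this way to keep the example readable


def strip_fence(raw: str) -> str:
    """Remove a surrounding Markdown code fence and optional language tag."""
    text = raw.strip()
    if len(text) >= 6 and text.startswith(FENCE) and text.endswith(FENCE):
        text = text[3:-3]
        first, sep, rest = text.partition("\n")
        if sep and first.strip().isalpha():
            text = rest  # drop a language tag such as "json"
    return text.strip()


def parse_facts(raw: str, default_importance: int = 5) -> list:
    """Parse LLM output into facts sorted by descending importance.

    Returns an empty list when the output is not a JSON array.
    """
    try:
        data = json.loads(strip_fence(raw))
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return []
    if not isinstance(data, list):
        return []

    facts = []
    for item in data:
        if not isinstance(item, dict) or not item.get("fact"):
            continue
        try:
            importance = int(float(item.get("importance")))
        except (TypeError, ValueError, OverflowError):
            importance = default_importance  # non-numeric score from the LLM
        importance = max(1, min(10, importance))  # clamp to the 1-10 scale
        facts.append({"fact": str(item["fact"]), "importance": importance})
    return sorted(facts, key=lambda f: f["importance"], reverse=True)
```

Returning an empty list on unparseable output keeps the caller's error path simple: no facts means nothing is stored.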

🤖 Generated with Claude Code

rolandpg and others added 7 commits April 9, 2026 09:48
Implements FactExtractor class and ExtractedFact dataclass using ollama
qwen2.5:3b to distill raw content into importance-scored facts (1-10 scale),
with markdown fence stripping, JSON parsing, descending sort, and graceful
error fallback.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Wires FactExtractor (Phase 1) and MemoryUpdater (Phase 2) together via a
new remember_with_extraction() method on MemoryManager, enabling Mem0-style
ADD/UPDATE/DELETE/NOOP decisions per extracted fact.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove dead min_importance param from FactExtractor (filtering done in MemoryManager)
- Harden importance parsing to handle non-numeric LLM output
- Rename DELETE status from "deleted" to "corrected" (soft-delete, not actual deletion)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

rolandpg commented Apr 9, 2026

LGTM. Clean implementation, tests pass. Next step: wire remember_with_extraction() into the LOCOMO benchmark harness and re-run to measure improvement.

rolandpg merged commit b301d31 into master Apr 9, 2026
rolandpg deleted the feat/mem0-two-phase-pipeline branch April 10, 2026 16:59
rolandpg added a commit that referenced this pull request Apr 15, 2026
…mark design

Performance Benchmarker analysis covering:
- Survey of 20+ benchmarks across 7 categories
- Priority matrix: CTIBench ATE fix (#1), CTI-specific v2 (#2),
  scale 10K (#3), evolve quality (#4), RAGAS CTI (#5)
- Custom CTI benchmark design with alias-aware scoring
- Harness architecture for CI-integrated regression detection
- Bug found: auto_ralph.py numpy import at wrong scope

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
rolandpg added a commit that referenced this pull request Apr 15, 2026
Data Engineer analysis of live OpenCTI (114K rels, 140K indicators,
73K observables) vs ZettelForge data model reveals ZettelForge cannot
represent the dominant IOC workload. Key gaps: no indicator type, no
observable types (hashes/IPs/domains), no ATT&CK pattern entity, no
TLP markings, no STIX pattern parser.

6 prioritized fixes identified. IOC regex extractors (#1) and
AttackPattern entity (#2) are highest impact at lowest effort.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
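
As a rough idea of what the IOC regex extractors named as fix #1 might look like, here is a small sketch. It is an assumption-laden illustration rather than anything from the referenced commit: the pattern set, the `extract_iocs` name, and the returned dictionary shape are all hypothetical, and the patterns are deliberately simple (no defanged indicators, no IPv6, no TLD validation).

```python
import re

_OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"

# Hypothetical pattern table: one compiled regex per observable type.
IOC_PATTERNS = {
    "ipv4": re.compile(rf"\b(?:{_OCTET}\.){{3}}{_OCTET}\b"),
    "md5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
    "domain": re.compile(
        r"\b(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,}\b",
        re.IGNORECASE,
    ),
}


def extract_iocs(text: str) -> dict:
    """Return de-duplicated, lowercased matches grouped by observable type."""
    found = {}
    for kind, pattern in IOC_PATTERNS.items():
        matches = sorted({m.group(0).lower() for m in pattern.finditer(text)})
        if matches:
            found[kind] = matches
    return found
```

Keeping the patterns in a single table means a new observable type is one more dictionary entry, not a new code path.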
rolandpg added a commit that referenced this pull request Apr 26, 2026
Three blockers found in code review of the RFC-015 web GUI:

1. config_page (/config HTML route) called _to_dict, which was defined as
   a nested function inside get_config_endpoint. Every render hit a
   NameError that was silently swallowed by a bare except, leaving
   config_yaml = "" and forcing the SPA to fall back to /api/config.
   Promoted _to_dict to a module-level _config_to_dict helper used by
   both routes.

2. update_config compared top-level payload keys against a set of
   dotted-path restart-required fields, so {"embedding": {"provider": "x"}}
   was reported as applied: ["embedding"], pending_restart: [], silently
   telling operators a restart-required change had taken effect.
   Added _flatten_keys to walk the nested payload to dotted leaf paths
   and lifted the restart set to module scope as _RESTART_REQUIRED_FIELDS.

3. /config HTML route had no auth guard. /api/config was protected, but
   the page shell (and once #1 was fixed, the server-rendered YAML body)
   was reachable without an API key. Added Depends(require_api_guard)
   to the route and made the YAML body redact secrets before serializing.

Tests:

Added four regression tests in tests/test_web_api.py covering nested
restart-required flags, nested non-restart application, /config auth
gate (via monkeypatch of API_KEY because it is captured at module import
time), and /config server-side YAML render. 24 passed, 2 skipped.
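
The remark about `API_KEY` being captured at module import time is worth a small demonstration, since it explains why the test patches the module attribute instead of the environment. The stand-in module, the `DEMO_API_KEY` variable, and `is_authorized` below are all hypothetical; the sketch uses `unittest.mock.patch.object` so it runs without pytest, where `monkeypatch.setattr` plays the same role.

```python
import os
import types
from unittest import mock

# Source for a tiny stand-in module that reads its key once, at import time.
SOURCE = (
    "import os\n"
    "API_KEY = os.environ.get('DEMO_API_KEY', '')\n"
    "def is_authorized(key):\n"
    "    return bool(API_KEY) and key == API_KEY\n"
)

os.environ["DEMO_API_KEY"] = "original"
web = types.ModuleType("web_demo")
exec(SOURCE, web.__dict__)  # simulates importing the module

# Changing the environment afterwards has no effect: the value is already captured.
os.environ["DEMO_API_KEY"] = "rotated"
env_change_visible = web.is_authorized("rotated")

# Patching the module attribute does take effect, and is undone on exit.
with mock.patch.object(web, "API_KEY", "test-key"):
    patched_ok = web.is_authorized("test-key")

restored_key = web.API_KEY
```

After this runs, `env_change_visible` is `False`, `patched_ok` is `True`, and `restored_key` is back to `"original"`, so a test that only sets the environment variable would exercise the wrong key.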

Note on the original review's blocker #2 (multi-tenancy regression on
graph and entity endpoints): not a regression. There is no per-tenant
MemoryManager._knowledge_graph anywhere in community or enterprise.
get_knowledge_graph() is a process-wide singleton; get_mm_for_request
is a stub returning the default MemoryManager. The old code's
getattr(tenant_mm, "_knowledge_graph", None) always returned None,
so the new singleton call is a functional improvement, not a regression.
No code change needed.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>