feat: Memory Extraction System (#356) #361

Merged
rjroy merged 12 commits into main from feat/356-memory-discovery
Jan 19, 2026
Conversation

rjroy (Owner) commented Jan 19, 2026

Summary

Implements the Memory Extraction System as specified in Issue #356. This feature enables automatic extraction of facts from meeting transcripts and stores them in a memory file that provides context to Claude.

Key Features:

  • Overnight batch processing of transcripts from all vaults
  • Incremental extraction using SHA-256 checksums to avoid reprocessing
  • Duplicate detection with Levenshtein distance similarity (0.9 threshold)
  • 50KB memory file limit with automatic pruning
  • Customizable extraction prompts with user override support
  • Catch-up extraction when the last run was more than 24 hours ago
  • Settings UI for viewing/editing memory and extraction prompts

Changes

Backend

  • Extraction Pipeline (backend/src/extraction/)

    • extraction-state.ts: State persistence with checksum tracking
    • transcript-reader.ts: Discover transcripts from vault inboxes
    • fact-extractor.ts: Claude Agent SDK integration for fact extraction
    • memory-writer.ts: Sandbox pattern, size limits, duplicate detection
    • extraction-manager.ts: Scheduler with cron-based overnight runs
  • WebSocket Handlers (backend/src/handlers/memory-handlers.ts)

    • get_memory / save_memory: Read/write memory file
    • get_extraction_prompt / save_extraction_prompt / reset_extraction_prompt: Prompt management
    • trigger_extraction: Manual extraction trigger

Frontend

  • Settings Dialog (frontend/src/components/SettingsDialog.tsx)
    • Tabbed interface with Memory and Extraction Prompt tabs
  • Memory Editor (frontend/src/components/MemoryEditor.tsx)
    • Size indicator bar (45KB warning, 50KB limit)
    • Save/discard functionality
  • Extraction Prompt Editor (frontend/src/components/ExtractionPromptEditor.tsx)
    • Default/Custom badge indicator
    • Reset to default functionality

Protocol

  • Added memory-related message types to shared/src/protocol.ts

Test plan

  • All 31 E2E acceptance tests pass (extraction-e2e.test.ts)
  • Memory handlers tests pass (369 lines)
  • Frontend component tests pass (SettingsDialog, MemoryEditor, ExtractionPromptEditor)
  • Full pre-commit suite passes (3121 tests, typecheck, lint)

🤖 Generated with Claude Code

rjroy and others added 11 commits January 18, 2026 20:45
Add TypeScript interfaces and Zod schemas for memory extraction state tracking:
- ExtractionState and ProcessedTranscript data models
- State file persistence at ~/.config/memory-loop/extraction-state.json
- SHA-256 checksum calculation for transcript change detection
- Helper functions for transcript tracking (isTranscriptProcessed, findUnprocessedTranscripts)
- Atomic writes using temp file + rename pattern

Satisfies REQ-F-8 (track processed transcripts) and REQ-NF-2 (idempotent extraction).

Tests: 60 unit tests covering schemas, persistence, and round-trip validation.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
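
For readers skimming the commit above, the checksum-based tracking might look roughly like the sketch below. Only SHA-256 and the helper names `isTranscriptProcessed` and `findUnprocessedTranscripts` come from the commit message; the field names, signatures, and the `sha256Hex` helper are assumptions made for illustration.

```typescript
import { createHash } from "crypto";

// Hypothetical shapes; the real ExtractionState and ProcessedTranscript
// models may carry more fields than shown here.
interface ProcessedTranscript {
  path: string;
  checksum: string; // hex-encoded SHA-256 of the transcript content
}

interface ExtractionState {
  processedTranscripts: ProcessedTranscript[];
}

// Hypothetical helper name.
function sha256Hex(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// A transcript counts as processed only when both its path and its current
// checksum match, so an edited transcript is picked up again on the next run.
function isTranscriptProcessed(
  state: ExtractionState,
  path: string,
  content: string,
): boolean {
  const checksum = sha256Hex(content);
  return state.processedTranscripts.some(
    (entry) => entry.path === path && entry.checksum === checksum,
  );
}

function findUnprocessedTranscripts(
  state: ExtractionState,
  transcripts: { path: string; content: string }[],
): { path: string; content: string }[] {
  return transcripts.filter(
    (t) => !isTranscriptProcessed(state, t.path, t.content),
  );
}
```

Keying on the checksum rather than the path alone is what would make a re-run over unchanged transcripts a no-op.
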
Create the default prompt for memory extraction defining:
- Five fact categories: Identity, Goals, Preferences, Project Context, Recurring Insights
- Hybrid narrative/list output format with examples
- Merge behavior (add, update, preserve, consolidate)
- Security restrictions (never extract credentials or sensitive data)
- Tool guidance for extraction process

Per REQ-F-10 and TD-3 from the Memory Extraction System spec.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Extend WebSocket protocol with messages for Settings dialog operations:

Client messages:
- get_memory: Request memory.md content
- save_memory: Write updated memory content
- get_extraction_prompt: Request prompt with override status
- save_extraction_prompt: Write prompt (creates override if needed)
- trigger_extraction: Manual extraction trigger

Server messages:
- memory_content: Memory text with sizeBytes and exists flag
- extraction_prompt_content: Prompt text with isOverride status
- memory_saved: Write confirmation with size
- extraction_prompt_saved: Write confirmation with override status
- extraction_status: Extraction progress (idle/running/complete/error)

Tests: 40 new test cases covering all schemas.

Per TD-10 from the Memory Extraction System plan.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
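
As a rough picture of the message set listed in the commit above, here is a sketch using plain TypeScript unions in place of the project's schemas. The message names and the `sizeBytes`, `exists`, and `isOverride` fields come from the commit message; every other field name (`content`, `status`) and the helper function are assumptions.

```typescript
// Plain-type stand-ins for the protocol schemas. Field names not mentioned
// in the commit message (content, status) are hypothetical.
type ClientMessage =
  | { type: "get_memory" }
  | { type: "save_memory"; content: string }
  | { type: "get_extraction_prompt" }
  | { type: "save_extraction_prompt"; content: string }
  | { type: "trigger_extraction" };

type ExtractionStatus = "idle" | "running" | "complete" | "error";

type ServerMessage =
  | { type: "memory_content"; content: string; sizeBytes: number; exists: boolean }
  | { type: "extraction_prompt_content"; content: string; isOverride: boolean }
  | { type: "memory_saved"; sizeBytes: number }
  | { type: "extraction_prompt_saved"; isOverride: boolean }
  | { type: "extraction_status"; status: ExtractionStatus };

const CLIENT_MESSAGE_TYPES: ClientMessage["type"][] = [
  "get_memory",
  "save_memory",
  "get_extraction_prompt",
  "save_extraction_prompt",
  "trigger_extraction",
];

function isClientMessageType(value: string): value is ClientMessage["type"] {
  return (CLIENT_MESSAGE_TYPES as string[]).indexOf(value) !== -1;
}

// The discriminated union lets the compiler check that every server
// message variant is handled.
function describeServerMessage(msg: ServerMessage): string {
  switch (msg.type) {
    case "memory_content":
      return msg.exists ? `memory.md (${msg.sizeBytes} bytes)` : "memory.md not created yet";
    case "extraction_prompt_content":
      return msg.isOverride ? "custom prompt" : "default prompt";
    case "memory_saved":
      return `saved ${msg.sizeBytes} bytes`;
    case "extraction_prompt_saved":
      return msg.isOverride ? "saved custom prompt" : "saved default prompt";
    case "extraction_status":
      return `extraction ${msg.status}`;
  }
}
```
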
Discovers and reads transcript markdown files from vault {inbox}/chats/
directories for the memory extraction pipeline. Key features:

- Parse YAML frontmatter and markdown content
- Filter to unprocessed transcripts using checksum-based state
- Handle malformed files gracefully (log warning, skip)
- Reuses getTranscriptsDirectory() from transcript-manager.ts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
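
The "log warning, skip" handling described in the commit above could be sketched as follows. This only separates the frontmatter block from the markdown body and leaves the YAML unparsed; the function names are hypothetical, and treating a file without a closed `---` block as malformed is an assumption.

```typescript
interface SplitTranscript {
  frontmatter: string; // raw YAML text, not parsed in this sketch
  body: string;
}

// Returns null when the file has no well-formed frontmatter block, so the
// caller can log a warning and move on instead of aborting the whole run.
function splitFrontmatter(source: string): SplitTranscript | null {
  const lines = source.split(/\r?\n/);
  if (lines[0] !== "---") return null;
  const end = lines.indexOf("---", 1);
  if (end === -1) return null;
  return {
    frontmatter: lines.slice(1, end).join("\n"),
    body: lines.slice(end + 1).join("\n"),
  };
}

function readableTranscripts(
  files: { path: string; source: string }[],
): { path: string; transcript: SplitTranscript }[] {
  const result: { path: string; transcript: SplitTranscript }[] = [];
  for (const file of files) {
    const transcript = splitFrontmatter(file.source);
    if (transcript === null) {
      console.warn(`Skipping malformed transcript: ${file.path}`);
      continue;
    }
    result.push({ path: file.path, transcript });
  }
  return result;
}
```
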
Calls Claude Agent SDK to analyze transcripts and extract durable facts.
Key features:

- Uses Haiku model for cost efficiency
- Enables tools: Glob, Grep, Read, Edit, Write, Task
- Loads extraction prompt from user override or codebase default
- Handles SDK errors with single retry and 2s backoff
- Builds prompt with specific transcript paths for processing

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
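
The "single retry and 2s backoff" behaviour from the commit above might be shaped like this. It is a generic sketch that wraps any async operation and does not show the actual SDK call; `withSingleRetry` and `sleep` are hypothetical names.

```typescript
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Runs `operation`; if it rejects, waits `backoffMs` and tries exactly once
// more. A second failure propagates to the caller unchanged.
async function withSingleRetry<T>(
  operation: () => Promise<T>,
  backoffMs = 2000,
): Promise<T> {
  try {
    return await operation();
  } catch {
    await sleep(backoffMs);
    return operation();
  }
}
```

Making the delay a parameter lets tests pass `0` instead of waiting two seconds per failure case.
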
Implements TASK-005 from the Memory Extraction System spec.

Key functionality:
- Sandbox pattern: copies memory.md to VAULTS_DIR/.memory-extraction/
  before extraction, copies back after (per TD-12)
- Atomic writes: uses temp file + rename for data integrity
- 50KB limit enforcement with adaptive pruning algorithm that removes
  lines from the largest sections first
- Vault CLAUDE.md section management: isolated "## Memory Loop Insights"
  section for per-vault facts (per TD-7)
- Recovery logic for crashed extractions

Spec requirements satisfied:
- REQ-F-1: Store facts in ~/.claude/rules/memory.md
- REQ-F-3: Write vault-specific insights to CLAUDE.md
- REQ-NF-1: Enforce 50KB memory file limit
- REQ-NF-3: Privacy/safety via sandboxing

Known limitation: commitSandbox and checkAndRecover tests are incomplete
due to hardcoded MEMORY_FILE_PATH. Tracked as technical debt.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
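
One way to read "removes lines from the largest sections first" in the commit above is sketched below. The section delimiter (`## ` headings), the choice to drop the last line of the largest section, and all names are assumptions; the actual adaptive pruning may pick lines differently.

```typescript
const MEMORY_LIMIT_BYTES = 50 * 1024;
const encoder = new TextEncoder();

function byteLength(text: string): number {
  return encoder.encode(text).length;
}

interface Section {
  heading: string | null; // null for text before the first "## " heading
  lines: string[];
}

function splitSections(markdown: string): Section[] {
  const sections: Section[] = [{ heading: null, lines: [] }];
  for (const line of markdown.split("\n")) {
    if (line.startsWith("## ")) {
      sections.push({ heading: line, lines: [] });
    } else {
      sections[sections.length - 1].lines.push(line);
    }
  }
  return sections;
}

function joinSections(sections: Section[]): string {
  const out: string[] = [];
  for (const section of sections) {
    if (section.heading !== null) out.push(section.heading);
    for (const line of section.lines) out.push(line);
  }
  return out.join("\n");
}

// Repeatedly drops the last line of whichever section currently has the
// largest body until the document fits within limitBytes.
function pruneToLimit(markdown: string, limitBytes = MEMORY_LIMIT_BYTES): string {
  const sections = splitSections(markdown);
  while (byteLength(joinSections(sections)) > limitBytes) {
    let largest: Section | null = null;
    let largestSize = -1;
    for (const section of sections) {
      const size = byteLength(section.lines.join("\n"));
      if (section.lines.length > 0 && size > largestSize) {
        largest = section;
        largestSize = size;
      }
    }
    if (largest === null) break; // only headings remain; nothing left to drop
    largest.lines.pop();
  }
  return joinSections(sections);
}
```

Headings are never removed in this sketch, so a limit smaller than the headings themselves ends the loop early rather than spinning.
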
Implements TASK-006 from the Memory Extraction System spec.

Key functionality:
- Text normalization: lowercase, trim, remove punctuation, normalize whitespace
- Levenshtein distance calculation with O(min(m,n)) space optimization
- Similarity ratio (0-1) using normalized edit distance
- Duplicate detection with 0.9 threshold for near-duplicates
- Self-deduplication during batch filtering
- Merge function that filters duplicates before adding new facts

Spec requirements satisfied:
- REQ-F-9: Prevent duplicate facts
- REQ-NF-2: Idempotent extraction (combined with transcript tracking)

Per TD-4: Normalized text comparison with fuzzy matching, 0.9 similarity
threshold. Runs client-side on extracted facts before merge, keeping logic
testable without LLM calls.

Test coverage: 30 new tests for duplicate detection functions.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
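
The duplicate check described in the commit above can be illustrated with the sketch below. The 0.9 threshold, the normalization steps, and the two-row space optimization come from the commit message; the function names, the ASCII-only punctuation stripping, and the inclusive `>=` comparison are assumptions.

```typescript
const DUPLICATE_THRESHOLD = 0.9;

// Lowercase, strip punctuation, collapse whitespace. ASCII-only for brevity.
function normalizeFact(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, "")
    .replace(/\s+/g, " ")
    .trim();
}

// Two-row Levenshtein distance. Rows are sized by the shorter string, so
// the extra space is O(min(m, n)).
function levenshtein(a: string, b: string): number {
  const longer = a.length >= b.length ? a : b;
  const shorter = a.length >= b.length ? b : a;
  let previous: number[] = [];
  for (let j = 0; j <= shorter.length; j++) previous.push(j);
  for (let i = 1; i <= longer.length; i++) {
    const current: number[] = [i];
    for (let j = 1; j <= shorter.length; j++) {
      const substitution =
        previous[j - 1] + (longer[i - 1] === shorter[j - 1] ? 0 : 1);
      current.push(Math.min(previous[j] + 1, current[j - 1] + 1, substitution));
    }
    previous = current;
  }
  return previous[shorter.length];
}

// Similarity ratio in [0, 1]: 1 means identical after normalization.
function similarity(a: string, b: string): number {
  const left = normalizeFact(a);
  const right = normalizeFact(b);
  const maxLength = Math.max(left.length, right.length);
  return maxLength === 0 ? 1 : 1 - levenshtein(left, right) / maxLength;
}

function isDuplicate(candidate: string, existing: string[]): boolean {
  return existing.some(
    (fact) => similarity(candidate, fact) >= DUPLICATE_THRESHOLD,
  );
}

// Keeps incoming facts that duplicate neither existing facts nor each other.
function mergeFacts(existing: string[], incoming: string[]): string[] {
  const merged = existing.slice();
  for (const fact of incoming) {
    if (!isDuplicate(fact, merged)) merged.push(fact);
  }
  return merged;
}
```

Because `mergeFacts` checks each incoming fact against the growing `merged` list, it also covers the self-deduplication case within a single batch.
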
Implements TASK-007 from the Memory Extraction System spec.

Key functionality:
- runExtraction() orchestrates reader → extractor → writer flow
- startScheduler() with node-cron for daily 3am runs (configurable)
- Catch-up detection: triggers extraction if lastRunAt > 24h ago
- Concurrent run prevention with isRunning mutex
- Recovery check on startup for crashed extractions

Configuration:
- EXTRACTION_SCHEDULE: cron expression (default: "0 3 * * *")
- EXTRACTION_CATCHUP_HOURS: catch-up threshold (default: 24)

Spec requirements satisfied:
- REQ-F-4: Overnight batch processing
- REQ-F-5: Process transcripts from all vaults
- REQ-NF-2: Idempotent extraction

Per TD-1: Scheduled batch via node-cron, TD-12: Sandbox pattern.

Dependencies added: cron, @types/cron

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
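
The catch-up check and the concurrent-run guard from the commit above might look like the sketch below. The names `needsCatchUp` and `isRunning` appear in this PR; the signatures, the `runExclusive` wrapper, and the edge-case behaviour (a missing `lastRunAt` triggers catch-up, a run exactly at the threshold does not, future dates do not) are assumptions.

```typescript
const HOUR_MS = 60 * 60 * 1000;

// Assumptions: null means "never ran" and needs catch-up; the comparison is
// strict, so a run exactly thresholdHours ago does not trigger catch-up.
function needsCatchUp(
  lastRunAt: Date | null,
  now: Date,
  thresholdHours = 24,
): boolean {
  if (lastRunAt === null) return true;
  return now.getTime() - lastRunAt.getTime() > thresholdHours * HOUR_MS;
}

let isRunning = false;

// Skips the run (resolves to null) when another extraction is in progress.
// The flag is cleared in `finally` so a failed run cannot leave it stuck.
async function runExclusive<T>(run: () => Promise<T>): Promise<T | null> {
  if (isRunning) return null;
  isRunning = true;
  try {
    return await run();
  } finally {
    isRunning = false;
  }
}
```

A boolean flag is enough here because Node runs the check and the assignment without interleaving; it would not protect against multiple processes sharing the same state file.
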
Completes Phase 3 of the Memory Extraction System implementation:

Backend:
- Add memory WebSocket handlers (get/save memory, get/save/reset prompt, trigger extraction)
- Wire up extraction scheduler startup in index.ts with catch-up check
- Add reset_extraction_prompt message type to protocol
- Add resetManagerState() for test isolation

Frontend:
- Create SettingsDialog component with tabbed interface
- Implement MemoryEditor with size indicator (45KB warning, 50KB limit)
- Implement ExtractionPromptEditor with Default/Custom badge and reset functionality

Testing:
- Add 31 E2E acceptance tests covering all spec requirements
- Add memory-handlers tests for WebSocket message handling
- Fix test isolation issues in extraction-manager and memory-writer tests

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
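
The MemoryEditor size indicator mentioned in the commit above could be driven by a small pure function like this sketch. The 45KB and 50KB thresholds come from the commit message; treating 1KB as 1024 bytes, the boundary handling, and all names are assumptions.

```typescript
const KB = 1024; // assumption: "KB" here means 1024 bytes
const WARNING_BYTES = 45 * KB;
const LIMIT_BYTES = 50 * KB;

type SizeStatus = "ok" | "warning" | "over-limit";

// Assumption: content exactly at the limit is still allowed (warning state).
function sizeStatus(sizeBytes: number): SizeStatus {
  if (sizeBytes > LIMIT_BYTES) return "over-limit";
  if (sizeBytes >= WARNING_BYTES) return "warning";
  return "ok";
}

// Measure encoded bytes rather than string length, since non-ASCII
// characters take more than one byte in UTF-8.
function utf8Size(text: string): number {
  return new TextEncoder().encode(text).length;
}
```
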
codecov Bot commented Jan 19, 2026

Copilot AI left a comment

Pull request overview

This PR implements a comprehensive Memory Extraction System that automatically extracts facts from meeting transcripts and stores them in a memory file for Claude context injection. The implementation includes overnight batch processing, incremental extraction with checksum tracking, duplicate detection, size management, and a settings UI.

Changes:

  • Backend extraction pipeline with scheduler, transcript reader, fact extractor, memory writer, and state management
  • WebSocket handlers for memory file and extraction prompt management
  • Frontend settings dialog with memory and extraction prompt editors
  • Comprehensive test coverage (31 E2E tests, unit tests for all components)

Reviewed changes

Copilot reviewed 32 out of 33 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| shared/src/protocol.ts | Adds 6 client and 6 server message schemas for memory extraction operations |
| shared/src/tests/protocol.test.ts | Adds 400+ lines of protocol schema tests with full coverage |
| backend/src/extraction/* | Core extraction pipeline modules (state, reader, extractor, writer, manager) |
| backend/src/handlers/memory-handlers.ts | WebSocket handlers for memory/prompt get/save/reset/trigger operations |
| backend/src/websocket-handler.ts | Integrates memory handlers into WebSocket router |
| backend/src/index.ts | Starts extraction scheduler on server startup |
| backend/src/prompts/extraction-prompt.md | Default extraction prompt with security guidelines |
| frontend/src/components/SettingsDialog.tsx | Tabbed settings dialog with keyboard navigation and accessibility |
| frontend/src/components/MemoryEditor.tsx | Memory file editor with size indicator and 50KB limit enforcement |
| frontend/src/components/ExtractionPromptEditor.tsx | Extraction prompt editor with default/custom badge |
| backend/package.json | Adds cron dependency for scheduling |

const generator = (async function* () {
  await Promise.resolve();
  throw new Error(errorMessage);
  yield undefined as never;
Copilot AI Jan 19, 2026

This statement is unreachable.

const generator = (async function* () {
  await Promise.resolve();
  throw new Error("First attempt failed");
  yield undefined as never;
Copilot AI Jan 19, 2026

This statement is unreachable.

- Add comprehensive tests for scheduler lifecycle (startScheduler/stopScheduler)
- Add tests for environment variable edge cases (empty string, whitespace, floats)
- Add tests for needsCatchUp edge cases (exactly at threshold, future dates)
- Add tests for runExtraction pipeline (duration tracking, isRunning state)
- Fix unreachable yield statements flagged by Copilot review
- Add eslint-disable comments for require-yield on intentionally throwing generators

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
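
For context on the last two bullets of the commit above, a test helper that throws before ever yielding might be written as in this sketch. It drops the unreachable `yield` and silences ESLint's `require-yield` rule instead; the helper names `failingGenerator` and `firstError` are hypothetical and the project's actual test code may differ.

```typescript
// eslint-disable-next-line require-yield -- fails before yielding on purpose
async function* failingGenerator(
  message: string,
): AsyncGenerator<never, void, unknown> {
  await Promise.resolve();
  throw new Error(message);
}

// Drains an async iterable and returns the message of the first error
// thrown, or null if iteration completes normally.
async function firstError(
  source: AsyncIterable<unknown>,
): Promise<string | null> {
  try {
    for await (const _item of source) {
      // A throwing generator never reaches this point.
    }
    return null;
  } catch (error) {
    return error instanceof Error ? error.message : String(error);
  }
}
```
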
rjroy merged commit 402a240 into main on Jan 19, 2026
2 checks passed
rjroy deleted the feat/356-memory-discovery branch on January 19, 2026 14:14