feat: autoresearch memory quality improvements by kargarisaac · Pull Request #21 · lerim-dev/lerim-cli

kargarisaac · 2026-03-31T06:10:21Z

Summary

- Replaced Jaccard dedup with 3-signature DSPy module for semantic consolidation
- Upgraded summarization from sequential fold to parallel MapReduce tree
- Enhanced extraction signature with tighter quality gates (bug-report exclusion, directive/TODO exclusion, decision-vs-learning test, confidence cap)
- Lowered dedup similarity thresholds and added topic saturation rule
- Shipped memory actions with full metadata to cloud activity feed
- Added extraction pipeline quality unit tests

Commits

refactor(extract): tighten quality gates, dedup thresholds, and topic saturation
refactor(extract): enhance MemoryExtractSignature and similarity handling
feat(activity): ship memory actions with full metadata to cloud
perf(summarize): replace sequential fold with parallel MapReduce tree
refactor(extract): replace Jaccard dedup with 3-signature DSPy module
fix(memory): add similarity normalization, rich metadata, and schema fixes
opt: add positive WHY+HOW TO APPLY example | extraction 0.845
opt: body structure WHY + HOW TO APPLY in schemas.py | extraction 0.848
opt: quality criteria in extraction signature | extraction 0.841

Test plan

- Unit tests pass (test_extract_pipeline_quality.py, test_oai_tools.py)
- Run full sync cycle on a real repo to verify extraction quality
- Verify cloud activity feed receives memory action metadata

🤖 Generated with Claude Code

The lerim-cloud .pth file in the venv makes lerim-cloud's tests/ package shadow lerim-cli's tests/ directory. Adding __init__.py ensures Python resolves lerim-cli's tests first. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Added QUALITY BAR section to MemoryExtractSignature: atomic, actionable, context-independent, structured body, durable. Extraction improved +0.022 on 100 cases. Dedup -0.056 is within 3-case noise. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Updated MemoryCandidate body field description: "lead with rule/fact, then WHY, then HOW TO APPLY". Aligned with Claude Code memory body structure. Extraction +0.007 (cumulative +0.029 from baseline). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Added a positive ✓ example demonstrating body structure: "WHY: mocked tests passed but prod migration failed. HOW TO APPLY: integration tests must hit real database." Reinforces exp021+exp022 quality criteria by demonstration. Extraction 0.845 (within noise of best 0.848). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
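As a rough illustration of the body structure these commits converge on (rule or fact first, then WHY, then HOW TO APPLY), a hypothetical checker might look like the sketch below. The marker strings mirror the ✓ example above; `check_body_structure` is an assumed helper, not code from this PR.

```python
from typing import List


def check_body_structure(body: str) -> List[str]:
    """Hypothetical check: return a list of problems (empty list = body passes)."""
    problems: List[str] = []
    why = body.find("WHY:")
    how = body.find("HOW TO APPLY:")
    if why == -1:
        problems.append("missing WHY")
    if how == -1:
        problems.append("missing HOW TO APPLY")
    if why != -1 and how != -1 and how < why:
        problems.append("HOW TO APPLY comes before WHY")
    # Whatever precedes the first marker is the rule/fact the body must lead with.
    markers = [i for i in (why, how) if i != -1]
    first_marker = min(markers) if markers else len(body)
    if not body[:first_marker].strip():
        problems.append("body should lead with the rule or fact")
    return problems
```

For example, `check_body_structure("WHY: x. HOW TO APPLY: y.")` flags the missing leading rule.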

… short sessions

Updated the iter_sessions function to skip processing of sidechain transcripts and sessions with fewer than 6 conversation turns. This change prevents double-counting of content and ensures only meaningful interactions are considered. Adjusted unit tests to reflect the new minimum turn requirement for session filtering.
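The filtering rule can be sketched as a generator, assuming a hypothetical `Session` record with an `is_sidechain` flag and a list of turns. The real `iter_sessions` signature and session shape are not shown in this PR, so `filter_sessions` is an illustrative name.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List

MIN_TURNS = 6  # sessions with fewer conversation turns are skipped


@dataclass
class Session:
    # Hypothetical shape; the real lerim session record may differ.
    session_id: str
    is_sidechain: bool = False
    turns: List[str] = field(default_factory=list)


def filter_sessions(sessions: Iterable[Session]) -> Iterator[Session]:
    """Yield only non-sidechain sessions with at least MIN_TURNS turns."""
    for session in sessions:
        if session.is_sidechain:
            # Skipped so its content is not counted a second time.
            continue
        if len(session.turns) < MIN_TURNS:
            continue
        yield session
```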

…eletion

Enhanced the memory reset command help text to clarify that it now wipes cache data along with memory, workspace, and index data. Updated the reset_memory_root function to delete the cache directory and added a note about clearing the adapter cache for improved session management. This change ensures users are fully informed about the implications of the reset operation.
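A minimal sketch of the reset behavior, assuming the memory root holds `memory`, `workspace`, `index`, and `cache` subdirectories. These names are taken from the help-text wording and are hypothetical, as is `reset_root`; the actual reset_memory_root layout may differ.

```python
import shutil
from pathlib import Path
from typing import List

# Hypothetical subdirectory names under the memory root.
RESET_SUBDIRS = ("memory", "workspace", "index", "cache")


def reset_root(root: Path) -> List[str]:
    """Delete each known subdirectory under root; return the names removed."""
    removed: List[str] = []
    for name in RESET_SUBDIRS:
        target = root / name
        if target.is_dir():
            shutil.rmtree(target)
            removed.append(name)
    return removed
```

Only the listed subdirectories are touched, so unrelated files under the root survive a reset.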

…fixes

- memory_record: persist source_speaker and durability in frontmatter (data was silently lost)
- memory_index: normalize find_similar output with fused_score, similarity, lexical_similarity
- oai_tools: fix batch_dedup score bug (was returning 0 for everything), add write_memory source_speaker/durability/outcome params with validation
- oai_sync: update dedup thresholds (0.7→0.75, 0.4→0.45), instruct agent to pass rich metadata
- tests: update for new frontmatter keys and similarity fields

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
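One way to normalize `find_similar` hits so every consumer sees the same three keys is sketched below. It assumes each raw hit may carry a vector `similarity` and a `lexical_similarity`, and it fuses them with a fixed weighted sum. The PR does not say how `fused_score` is computed, so the weights and helper names are assumptions.

```python
from typing import Any, Dict, List

# Assumed fusion weights; not stated in the PR.
VECTOR_WEIGHT = 0.7
LEXICAL_WEIGHT = 0.3


def _clamp01(x: Any) -> float:
    return max(0.0, min(1.0, float(x)))


def normalize_hit(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of raw with similarity, lexical_similarity, fused_score in [0, 1]."""
    similarity = _clamp01(raw.get("similarity", 0.0))
    lexical = _clamp01(raw.get("lexical_similarity", 0.0))
    fused = VECTOR_WEIGHT * similarity + LEXICAL_WEIGHT * lexical
    return {**raw, "similarity": similarity,
            "lexical_similarity": lexical, "fused_score": round(fused, 4)}


def normalize_find_similar(raw_hits: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Normalize every hit and rank by fused_score, best first."""
    hits = [normalize_hit(h) for h in raw_hits]
    return sorted(hits, key=lambda h: h["fused_score"], reverse=True)
```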

Replace ~130 lines of regex-based Jaccard word-matching with a DSPy module containing three optimizable signatures:

- MemoryExtractSignature (existing, per-window extraction)
- ConsolidateCandidatesSignature (LLM merges semantic duplicates across windows)
- QualityGateSignature (LLM drops low-value candidates)

Also fixes format detection to handle "type":"human" traces (was silently dropping all user messages, causing extraction to return 0 candidates). Every judgment call is now an LLM call that DSPy can optimize via autoresearch, replacing magic thresholds with model understanding.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
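The control flow of the three-stage module can be sketched in plain Python, treating each DSPy signature as an ordinary callable so the flow runs without an LLM. `Candidate`, `run_memory_pipeline`, and `normalize_role` are hypothetical names. The role mapping only illustrates the "type":"human" fix, not the actual detection code.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class Candidate:
    # Hypothetical candidate shape used only for this sketch.
    title: str
    body: str
    confidence: float


def run_memory_pipeline(
    windows: Iterable[str],
    extract: Callable[[str], List[Candidate]],                  # stands in for MemoryExtractSignature
    consolidate: Callable[[List[Candidate]], List[Candidate]],  # stands in for ConsolidateCandidatesSignature
    keep: Callable[[Candidate], bool],                          # stands in for QualityGateSignature
) -> List[Candidate]:
    """Extract per window, merge duplicates across windows, then gate each candidate."""
    per_window = [c for window in windows for c in extract(window)]
    merged = consolidate(per_window)
    return [c for c in merged if keep(c)]


def normalize_role(message: dict) -> Optional[str]:
    """Map a trace message to "user" or "assistant", accepting "type": "human"."""
    kind = message.get("role") or message.get("type")
    if kind in ("user", "human"):
        return "user"
    if kind == "assistant":
        return "assistant"
    return None
```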

Replace the sequential refine/fold pattern (73 chunks × 45s = hours) with:

- Parallel map: extract lightweight facets per chunk (~80 words each)
- Tree reduce: merge facets hierarchically when they exceed context budget
- Single synthesis: produce final TraceSummaryCandidate from all facets

Also adds transcript formatting before windowing (13MB raw → 1.1MB formatted), reducing 73 windows to 6 and total time from hours to ~33 seconds. Key signatures: ChunkFacetSignature (map), MergeFacetsSignature (reduce), SynthesizeSummarySignature (final). All DSPy-optimizable.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
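A minimal sketch of the map / tree-reduce / synthesize shape, using `concurrent.futures` threads for the parallel steps. The three callables stand in for ChunkFacetSignature, MergeFacetsSignature, and SynthesizeSummarySignature. The function name `tree_summarize`, measuring the budget in characters, and the fan-in of 4 are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def tree_summarize(
    chunks: Sequence[str],
    map_fn: Callable[[str], str],          # per-chunk facet (map)
    merge_fn: Callable[[List[str]], str],  # merge a group of facets (reduce)
    synth_fn: Callable[[List[str]], str],  # final synthesis
    budget_chars: int = 4000,
    fan_in: int = 4,
    max_workers: int = 8,
) -> str:
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2 so each reduce round shrinks the list")
    # Map: one facet per chunk, computed concurrently (pool.map keeps input order).
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        facets = list(pool.map(map_fn, chunks))
    # Tree reduce: while facets exceed the budget, merge them in groups of fan_in.
    while len(facets) > 1 and sum(len(f) for f in facets) > budget_chars:
        groups = [facets[i:i + fan_in] for i in range(0, len(facets), fan_in)]
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            facets = list(pool.map(merge_fn, groups))
    # Synthesis: a single call over whatever facets remain.
    return synth_fn(facets)
```

Threads fit here because each real map or merge call would mostly wait on an LLM request. Each reduce round cuts the facet count by roughly the fan-in, so the loop always terminates.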

Add memory_actions to OperationResult and details_json so the activity feed can show per-session memory lists with titles, body, tags, confidence, source_speaker, and durability. Each memory action includes session_run_id for per-session grouping. The daemon reads frontmatter from written memory files to extract full metadata. Fixes the "0 memories" bug in the activity feed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
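The daemon step (reading frontmatter from a written memory file into a memory action) can be sketched as shown. The sketch assumes memory files use flat `key: value` frontmatter between `---` lines and that tags are comma-separated. `parse_frontmatter` and `memory_action` are hypothetical helpers, and a real implementation would more likely use a YAML parser.

```python
from pathlib import Path
from typing import Dict, Tuple


def parse_frontmatter(text: str) -> Tuple[Dict[str, str], str]:
    """Split flat '---'-delimited 'key: value' frontmatter from the body.

    Deliberately minimal: no nesting, lists, or quoting.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta: Dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return meta, "\n".join(lines[i + 1:]).strip()
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, text  # no closing delimiter: treat the whole file as body


def memory_action(path: Path, action: str, session_run_id: str) -> dict:
    """Build one activity-feed entry (hypothetical shape) from a memory file."""
    meta, body = parse_frontmatter(path.read_text(encoding="utf-8"))
    return {
        "action": action,
        "session_run_id": session_run_id,  # lets the feed group actions per session
        "title": meta.get("title", path.stem),
        "body": body,
        "tags": [t.strip() for t in meta.get("tags", "").split(",") if t.strip()],
        "confidence": float(meta["confidence"]) if "confidence" in meta else None,
        "source_speaker": meta.get("source_speaker"),
        "durability": meta.get("durability"),
    }
```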

…ling

- Updated MemoryExtractSignature to clarify extraction criteria, emphasizing the importance of actionable insights and structured body content.
- Improved similarity handling in MemoryIndex and OAI tools by merging similarity signals and normalizing outputs for better candidate ranking.
- Adjusted examples in the documentation to reflect new extraction rules and quality criteria.

… saturation

- Add bug-report, directive/TODO, and generic-knowledge exclusion rules to MemoryExtractSignature
- Add decision-vs-learning test and cap 0.9+ confidence to max 1 per session
- Require HOW TO APPLY to describe a different action than title (no restating)
- Lower dedup similarity thresholds (0.75→0.65 for no_op, 0.45→0.40 for update)
- Add topic saturation rule: 2+ existing memories on same topic defaults to no_op
- Tighten "update" classification to require at least one concrete absent fact

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
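The dedup decision these rules describe can be sketched as a small classifier. It uses the final thresholds (0.65 for no_op, 0.40 for update) and the topic saturation rule. Several details are assumptions: the thresholds are treated as inclusive, "defaults to no_op" is treated as a hard rule, and an update lacking a new fact falls back to no_op. In the PR these rules are given to an agent as instructions. `classify` and its parameters are hypothetical.

```python
NO_OP_THRESHOLD = 0.65   # at or above: candidate duplicates an existing memory
UPDATE_THRESHOLD = 0.40  # between the thresholds: candidate may update an existing memory
SATURATION_COUNT = 2     # existing memories on the same topic that trigger no_op


def classify(best_similarity: float, same_topic_count: int, has_new_fact: bool) -> str:
    """Return "no_op", "update", or "add" for one extracted candidate."""
    if best_similarity >= NO_OP_THRESHOLD:
        return "no_op"
    if same_topic_count >= SATURATION_COUNT:
        # Topic saturation: the topic is already covered by 2+ memories.
        return "no_op"
    if best_similarity >= UPDATE_THRESHOLD:
        # An update needs at least one concrete fact absent from the existing memory.
        return "update" if has_new_fact else "no_op"
    return "add"
```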

kargarisaac and others added 12 commits March 29, 2026 11:24

kargarisaac merged commit 41bf6e3 into main Mar 31, 2026
1 check passed

kargarisaac deleted the feat/autoresearch-memory-quality branch March 31, 2026 06:10


kargarisaac merged 12 commits into main from feat/autoresearch-memory-quality

kargarisaac commented Mar 31, 2026


1 participant
