feat: autoresearch memory quality improvements #21

Merged
kargarisaac merged 12 commits into main from feat/autoresearch-memory-quality
Mar 31, 2026

Conversation

@kargarisaac
Contributor

Summary

  • Replaced Jaccard dedup with 3-signature DSPy module for semantic consolidation
  • Upgraded summarization from sequential fold to parallel MapReduce tree
  • Enhanced extraction signature with tighter quality gates (bug-report exclusion, directive/TODO exclusion, decision-vs-learning test, confidence cap)
  • Lowered dedup similarity thresholds and added topic saturation rule
  • Shipped memory actions with full metadata to cloud activity feed
  • Added extraction pipeline quality unit tests

Commits

  • refactor(extract): tighten quality gates, dedup thresholds, and topic saturation
  • refactor(extract): enhance MemoryExtractSignature and similarity handling
  • feat(activity): ship memory actions with full metadata to cloud
  • perf(summarize): replace sequential fold with parallel MapReduce tree
  • refactor(extract): replace Jaccard dedup with 3-signature DSPy module
  • fix(memory): add similarity normalization, rich metadata, and schema fixes
  • opt: add positive WHY+HOW TO APPLY example | extraction 0.845
  • opt: body structure WHY + HOW TO APPLY in schemas.py | extraction 0.848
  • opt: quality criteria in extraction signature | extraction 0.841

Test plan

  • Unit tests pass (test_extract_pipeline_quality.py, test_oai_tools.py)
  • Run full sync cycle on a real repo to verify extraction quality
  • Verify cloud activity feed receives memory action metadata

🤖 Generated with Claude Code

kargarisaac and others added 12 commits March 29, 2026 11:24
The lerim-cloud .pth file in the venv makes lerim-cloud's tests/
package shadow lerim-cli's tests/ directory. Adding __init__.py
ensures Python resolves lerim-cli's tests first.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Added QUALITY BAR section to MemoryExtractSignature: atomic, actionable,
context-independent, structured body, durable. Extraction improved +0.022
on 100 cases. Dedup -0.056 is within 3-case noise.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Updated MemoryCandidate body field description: "lead with rule/fact,
then WHY, then HOW TO APPLY". Aligned with Claude Code memory body
structure. Extraction +0.007 (cumulative +0.029 from baseline).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Added a positive ✓ example demonstrating body structure: "WHY: mocked
tests passed but prod migration failed. HOW TO APPLY: integration tests
must hit real database." Reinforces exp021+exp022 quality criteria by
demonstration. Extraction 0.845 (within noise of best 0.848).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… short sessions

Updated the iter_sessions function to skip processing of sidechain transcripts and sessions with fewer than 6 conversation turns. This change prevents double-counting of content and ensures only meaningful interactions are considered. Adjusted unit tests to reflect the new minimum turn requirement for session filtering.
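
The filtering rule described in this commit can be sketched as follows. This is a minimal illustration, not the actual `iter_sessions` code: the `Session` record, its field names, and the `iter_meaningful_sessions` helper are hypothetical, and only the 6-turn minimum and the sidechain skip come from the commit message.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

# From the commit message: sessions with fewer than 6 turns are skipped.
MIN_TURNS = 6


@dataclass
class Session:
    # Hypothetical record; the real session type is not shown in this PR.
    session_id: str
    turns: int
    is_sidechain: bool = False


def iter_meaningful_sessions(sessions: Iterable[Session]) -> Iterator[Session]:
    """Yield only non-sidechain sessions with at least MIN_TURNS turns."""
    for session in sessions:
        if session.is_sidechain:
            continue  # skipped so the same content is not counted twice
        if session.turns < MIN_TURNS:
            continue  # too short to count as a meaningful interaction
        yield session
```

Keeping the threshold in a named constant makes the unit tests that depend on the minimum turn count easier to update in one place.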
…eletion

Enhanced the memory reset command help text to clarify that it now wipes cache data along with memory, workspace, and index data. Updated the reset_memory_root function to delete the cache directory and added a note about clearing the adapter cache for improved session management. This change ensures users are fully informed about the implications of the reset operation.
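
A reset that also removes the cache directory might look like the sketch below. The subdirectory names and the `reset_root` helper are assumptions for illustration; the commit only states that memory, workspace, index, and cache data are wiped, and the real `reset_memory_root` signature is not shown here.

```python
import shutil
from pathlib import Path

# Hypothetical subdirectory names; the actual layout is not shown in this PR.
RESET_SUBDIRS = ("memory", "workspace", "index", "cache")


def reset_root(root: Path) -> list[str]:
    """Delete each known subdirectory under root and return the names removed."""
    removed = []
    for name in RESET_SUBDIRS:
        target = root / name
        if target.is_dir():
            shutil.rmtree(target)  # removes the directory and everything in it
            removed.append(name)
    return removed
```

Returning the list of removed names gives the CLI something concrete to report back to the user after a reset.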
…fixes

- memory_record: persist source_speaker and durability in frontmatter (data was silently lost)
- memory_index: normalize find_similar output with fused_score, similarity, lexical_similarity
- oai_tools: fix batch_dedup score bug (was returning 0 for everything), add write_memory
  source_speaker/durability/outcome params with validation
- oai_sync: update dedup thresholds (0.7→0.75, 0.4→0.45), instruct agent to pass rich metadata
- tests: update for new frontmatter keys and similarity fields

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace ~130 lines of regex-based Jaccard word-matching with a DSPy module
containing three optimizable signatures:
- MemoryExtractSignature (existing, per-window extraction)
- ConsolidateCandidatesSignature (LLM merges semantic duplicates across windows)
- QualityGateSignature (LLM drops low-value candidates)

Also fixes format detection to handle "type":"human" traces (was silently
dropping all user messages, causing extraction to return 0 candidates).

Every judgment call is now an LLM call that DSPy can optimize via
autoresearch, replacing magic thresholds with model understanding.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
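
The control flow of the three-stage module in the commit above can be sketched in plain Python. This is an assumption-laden outline rather than the module itself: each callable stands in for one DSPy signature (`extract` for MemoryExtractSignature, `consolidate` for ConsolidateCandidatesSignature, `keep` for QualityGateSignature), and `run_extraction` and the `Candidate` shape are hypothetical names.

```python
from typing import Callable

# Hypothetical candidate shape; the real MemoryCandidate schema is not shown here.
Candidate = dict


def run_extraction(
    windows: list[str],
    extract: Callable[[str], list[Candidate]],
    consolidate: Callable[[list[Candidate]], list[Candidate]],
    keep: Callable[[Candidate], bool],
) -> list[Candidate]:
    """Extract per window, merge duplicates across windows, then gate on quality."""
    # Stage 1: per-window extraction (stand-in for MemoryExtractSignature).
    per_window = [cand for window in windows for cand in extract(window)]
    # Stage 2: cross-window consolidation (stand-in for ConsolidateCandidatesSignature).
    merged = consolidate(per_window)
    # Stage 3: drop low-value candidates (stand-in for QualityGateSignature).
    return [cand for cand in merged if keep(cand)]
```

Passing the stages in as callables keeps the orchestration testable with deterministic stubs, while the real module would back each stage with an LLM call.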
Replace the sequential refine/fold pattern (73 chunks × 45s = hours) with:
- Parallel map: extract lightweight facets per chunk (~80 words each)
- Tree reduce: merge facets hierarchically when they exceed context budget
- Single synthesis: produce final TraceSummaryCandidate from all facets

Also adds transcript formatting before windowing (13MB raw → 1.1MB formatted),
reducing 73 windows to 6 and total time from hours to ~33 seconds.

Key signatures: ChunkFacetSignature (map), MergeFacetsSignature (reduce),
SynthesizeSummarySignature (final). All DSPy-optimizable.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
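
The map, tree-reduce, and synthesis steps in the commit above can be outlined as below. This is a sketch under stated assumptions: the three callables stand in for ChunkFacetSignature, MergeFacetsSignature, and SynthesizeSummarySignature, while `summarize`, `tree_reduce`, the word-count budget, and the `fan_in` parameter are hypothetical choices not taken from the PR.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def word_count(text: str) -> int:
    return len(text.split())


def tree_reduce(
    facets: list[str],
    merge: Callable[[list[str]], str],
    budget_words: int,
    fan_in: int = 4,
) -> list[str]:
    """Merge facets in groups of fan_in until they fit the budget or one remains."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2 so each level shrinks the list")
    while len(facets) > 1 and sum(word_count(f) for f in facets) > budget_words:
        facets = [merge(facets[i:i + fan_in]) for i in range(0, len(facets), fan_in)]
    return facets


def summarize(
    chunks: list[str],
    extract_facet: Callable[[str], str],
    merge: Callable[[list[str]], str],
    synthesize: Callable[[list[str]], str],
    budget_words: int,
    max_workers: int = 8,
) -> str:
    # Map: facets are extracted concurrently; pool.map preserves chunk order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        facets = list(pool.map(extract_facet, chunks))
    # Reduce: only runs when the facets exceed the context budget.
    facets = tree_reduce(facets, merge, budget_words)
    # Synthesis: a single final call over the remaining facets.
    return synthesize(facets)
```

Threads suit this shape because each map call would spend its time waiting on a model response rather than on CPU work.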
Add memory_actions to OperationResult and details_json so the activity
feed can show per-session memory lists with titles, body, tags,
confidence, source_speaker, and durability.

Each memory action includes session_run_id for per-session grouping.
The daemon reads frontmatter from written memory files to extract
full metadata. Fixes the "0 memories" bug in the activity feed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
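
Reading frontmatter back from written memory files and grouping the resulting actions by session could look like this sketch. The flat `key: value` parser is a deliberate simplification (not full YAML), and `parse_frontmatter`, `build_memory_action`, and `group_by_session` are hypothetical helpers; only the field names come from the commit message.

```python
from collections import defaultdict


def parse_frontmatter(text: str) -> dict:
    """Parse flat `key: value` frontmatter delimited by '---' lines (not full YAML)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def build_memory_action(text: str, session_run_id: str) -> dict:
    """Build one activity-feed entry from a memory file's contents."""
    meta = parse_frontmatter(text)
    return {
        "session_run_id": session_run_id,
        "title": meta.get("title", ""),
        "confidence": float(meta.get("confidence") or 0.0),
        "source_speaker": meta.get("source_speaker", ""),
        "durability": meta.get("durability", ""),
    }


def group_by_session(actions: list[dict]) -> dict:
    """Group memory actions by session_run_id for per-session display."""
    grouped = defaultdict(list)
    for action in actions:
        grouped[action["session_run_id"]].append(action)
    return dict(grouped)
```

Deriving the action from the written file rather than from in-memory state means the feed reflects what was actually persisted.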
…ling

- Updated MemoryExtractSignature to clarify extraction criteria, emphasizing the importance of actionable insights and structured body content.
- Improved similarity handling in MemoryIndex and OAI tools by merging similarity signals and normalizing outputs for better candidate ranking.
- Adjusted examples in the documentation to reflect new extraction rules and quality criteria.
… saturation

- Add bug-report, directive/TODO, and generic-knowledge exclusion rules to MemoryExtractSignature
- Add decision-vs-learning test and cap 0.9+ confidence to max 1 per session
- Require HOW TO APPLY to describe a different action than title (no restating)
- Lower dedup similarity thresholds (0.75→0.65 for no_op, 0.45→0.40 for update)
- Add topic saturation rule: 2+ existing memories on same topic defaults to no_op
- Tighten "update" classification to require at least one concrete absent fact

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
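
Written out as plain code, the dedup rules from the commit above might read as follows. In the PR these appear to be guidance given to the agent, so this is only an illustrative sketch: the `classify` function, the `"add"` action name, the inclusive comparisons, and the choice to fall back to `no_op` when no new fact is present are all assumptions. The 0.65 and 0.40 thresholds and the 2-memory saturation count come from the commit message.

```python
NO_OP_THRESHOLD = 0.65   # from the commit: lowered from 0.75
UPDATE_THRESHOLD = 0.40  # from the commit: lowered from 0.45
SATURATION_COUNT = 2     # from the commit: 2+ existing memories on the same topic


def classify(best_similarity: float, same_topic_count: int, has_new_fact: bool) -> str:
    """Return 'no_op', 'update', or 'add' for a candidate memory."""
    if best_similarity >= NO_OP_THRESHOLD:
        return "no_op"  # near-duplicate of an existing memory
    if same_topic_count >= SATURATION_COUNT:
        return "no_op"  # topic saturation: the topic is already well covered
    if best_similarity >= UPDATE_THRESHOLD:
        # An update must contribute at least one concrete fact that is absent
        # from the existing memory; otherwise nothing is written (assumption).
        return "update" if has_new_fact else "no_op"
    return "add"  # assumed name for writing a new memory
```

For example, `classify(0.5, 0, True)` yields `"update"`, while the same similarity with two existing memories on the topic yields `"no_op"`.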
@kargarisaac kargarisaac merged commit 41bf6e3 into main Mar 31, 2026
1 check passed
@kargarisaac kargarisaac deleted the feat/autoresearch-memory-quality branch March 31, 2026 06:10