feat: improve memory data quality#166
Conversation
- Add anti-patterns to extraction prompt (reject PR statuses, fix plans, deployment states) - Tighten type definitions (decision = architectural choices only, relation = stable entity relationships) - Add post-validation filters: reject content < 15 chars, reject transient patterns, demote state-without-TTL to fact - Auto-promote L0 → L1 when evidence_count reaches 3 (natural confirmation via repeated extraction) - Lower supersede similarity threshold from 0.85 to 0.78 for better dedup - Add one-time data migration (v1): normalize workspace_dir fragments and invalidate transient memories Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Test transient content filtering (PR statuses, deployments) - Test min content length validation - Test state-without-TTL demotion to fact - Test evidence auto-promotion L0 → L1 - Test workspace_dir normalization migration - Test transient memory invalidation migration - Update existing tests for new content length requirements Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Claude finished @lishuceo's task in 1m 26s —— View job PR Review: Memory Data Quality Improvements
|
| Severity | File | Issue | Confidence |
|---|---|---|---|
| 🟡 Warning | src/memory/store.ts:173 |
Stale JSDoc — says "similarity > 0.85 / distance < 0.15" but threshold was changed to 0.78/0.22 | 90 |
Overall Assessment
Solid PR. The changes are well-structured and follow existing patterns:
- Extraction quality: The expanded prompt with anti-examples and the code-level guards (
TRANSIENT_PATTERNS,MIN_CONTENT_LENGTH, state→fact demotion) form a good defense-in-depth — the LLM prompt steers extraction, andvalidateMemories()catches what leaks through. - Auto-promotion: The L0→L1 promotion logic in
updateEvidence()is clean — reads after the synchronousdb.updateEvidence()call, so no race conditions with better-sqlite3's synchronous API. - Data migrations: The
PRAGMA user_versionguard is the right approach. Migration is wrapped in a transaction with idempotent checks. Test coverage for the "should not re-run migrations on subsequent opens" case is a nice touch. - Similarity threshold: Lowering from 0.85→0.78 is a reasonable tuning — just update the JSDoc to match (see inline comment).
- Tests: 12 new test cases cover all new logic paths well. Test content was updated to be ≥15 chars to match the new
MIN_CONTENT_LENGTH, which is good attention to detail.
| for (const vr of vecResults) { | ||
| // cosine distance < 0.15 ≈ similarity > 0.85 | ||
| if (vr.distance >= 0.15) continue; | ||
| // cosine distance < 0.22 ≈ similarity > 0.78 |
There was a problem hiding this comment.
🟡 Stale JSDoc (confidence: 90): The JSDoc on line 173 still says "similarity > 0.85 (distance < 0.15)" but this line changed the threshold to 0.78/0.22. Please update the doc to match.
There was a problem hiding this comment.
cosine distance < 0.22 ≈ similarity > 0.78), but the JSDoc on line 147 still reads similarity > 0.85 (distance < 0.15 for cosine). Please update the JSDoc to match.
There was a problem hiding this comment.
similarity > 0.85 (distance < 0.15 for cosine) but the threshold on line 164-165 is 0.78/0.22.
…rsede chains Remove confidence multiplier from search scoring formula (was `searchScore × typeBoost × recencyDecay × confidence`, now without confidence). All memories compete equally regardless of L0/L1/L2 level. Remove evidence auto-promotion (L0→L1 at count>=3) since data shows only 5/1230 memories ever accumulated evidence. Keep evidence_count as a stat field. Remove bidirectional supersede pointers (supersedes, supersede_reason) and chain walking (getSupersedChain). Keep forward pointer (superseded_by) and invalid_at for basic supersede functionality. Schema fields retained for future product insights layer (see docs). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove tests for auto-promotion, supersede chains, confidence scoring, and supersedeReason. Update assertions to match simplified behavior. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Claude finished @lishuceo's task in 2m 3s —— View job PR Review: Memory Data Quality Improvements (Re-review)
|
| Severity | File | Issue | Confidence |
|---|---|---|---|
| 🟡 Warning | src/memory/store.ts:147 |
Stale JSDoc (still unfixed): JSDoc says "similarity > 0.85 (distance < 0.15 for cosine)" but the threshold was changed to 0.78/0.22 on line 164-165 | 90 |
1 previous issue persists — replied on the existing thread.
Overall Assessment
The two new commits (e6fe310 simplify + cd501b8 test updates) are clean. The supersedes/supersede_reason column removal and confidence-from-scoring removal are well-executed:
- Column removal is safe — existing DBs keep the unused columns silently; fresh DBs never create them.
SELECT *ingetMemory()returns whatever columns exist, androwToMemory()no longer maps them, so no breakage either way. _store: unknownparameter increateMemorySearchMcpServermaintains API compatibility with the caller inexecutor.ts:422while correctly signaling the param is unused.- Score simplification (
search.ts:142) — removing* memory.confidencefromfinalScoreis consistent with the broader theme of dropping confidence as a ranking signal. - Migration guard —
PRAGMA user_versionis the right approach. Existing DBs getuser_version=0by default, so the v1 migration runs exactly once on upgrade. The transaction wrapping is correct. - Tests updated — all test files properly adapted to the simplified API (no
supersedes, nosupersedeReason, nogetSupersedChain).
Only action item: update the JSDoc on store.ts:147 from 0.85/0.15 to 0.78/0.22.
* feat: add skill-creator and nano-pdf skills from openclaw Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: support PDF file messages from Feishu chat - Add DocumentAttachment type and DocumentContentBlock for PDF support - Add downloadMessageFile() to Feishu client (im.messageResource.get with type='file') - Handle 'file' message type in parseMessage() — download PDF, reject non-PDF with hint - Wire documents through queue → executor → buildMultimodalPrompt pipeline - PDF files sent as document content blocks in Anthropic API format Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: add missing documents parameter to executeDirectTask/executeClaudeTask calls Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
Claude finished @lishuceo's task in 1m 39s —— View job PR Review: Memory Data Quality Improvements (3rd review)
|
| Severity | File | Issue | Confidence |
|---|---|---|---|
| 🟡 Warning | src/memory/store.ts:147 |
Stale JSDoc (3rd review, still unfixed): JSDoc says similarity > 0.85 (distance < 0.15 for cosine) but threshold is 0.78/0.22 on lines 164-165 |
90 |
0 previous issues fixed — the stale JSDoc from review #1 persists. Replied on the existing thread.
Overall Assessment
The PR is solid. This review covers 5 commits across 29 files (including the skill-creator/nano-pdf additions from the base branch merge).
Memory changes (core of this PR):
- Extraction quality guards are well-layered: prompt anti-examples +
TRANSIENT_PATTERNSregex +MIN_CONTENT_LENGTH+ state→fact demotion. Defense-in-depth is the right approach. - Supersede chain removal is clean —
supersedes/supersede_reasoncolumns dropped fromMemoryRow,Memory,MemoryCreateInput, and all downstream code. Existing DBs with these columns are unaffected (SQLite ignores extra columns on SELECT). - Data migrations via
PRAGMA user_versionare correctly guarded and wrapped in a transaction. The workspace normalization and transient memory invalidation are idempotent. - Similarity threshold (0.85→0.78) and score simplification (removing
* memory.confidence) are consistent with the broader theme.
Document support (new in this sync):
DocumentAttachmenttype,downloadMessageFile(), and the file parsing inparseMessage()are correctly wired through the entire pipeline:parseMessage→taskQueue.enqueue→processQueue→executeClaudeTask/executeDirectTask→ClaudeExecutor. All call sites match function signatures.- 30MB file size limit and PDF-only restriction are sensible guards.
buildMultimodalPromptcorrectly handles the newdocumentsparameter with proper content block ordering (documents → images → text).
Only action item: Update the JSDoc on store.ts:147 from 0.85/0.15 to 0.78/0.22.
Summary
Test plan
🤖 Generated with Claude Code