
feat: improve memory data quality #166

Closed
lishuceo wants to merge 5 commits into refactor/rename-bootstrap-to-setup-claude from refactor/memory-quality-improvements

@lishuceo
Owner

Summary

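  • Extraction quality: add anti-example rules to the extractor prompt (reject transient matters such as PR status, fix plans, and deployment info), tighten the type definitions (decision is reserved for architectural decisions), and add code-level validation (minimum length, transient-pattern filtering, demoting a state without a TTL to fact)
  • Natural confidence promotion: when evidence_count >= 3, L0 is automatically promoted to L1 (confidence cap raised from 0.7 to 0.9), with no UI interaction required
  • Existing data cleanup: a one-time migration script normalizes fragmented workspace_dir values (5 local paths → 1 canonical identifier) and bulk-invalidates roughly 80 transient memories
  • Supersede threshold tuning: lower the similarity threshold from 0.85 to 0.78 to raise the merge rate of semantically similar memories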
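
The code-level validation in the first bullet could look roughly like the sketch below. Only the names MIN_CONTENT_LENGTH, TRANSIENT_PATTERNS, and validateMemories() and the 15-character minimum come from this PR; the `ExtractedMemory` shape, the `ttlSeconds` field, and the two regexes are assumptions made for illustration, not the actual implementation.

```typescript
// Hypothetical shapes: the PR does not show the real type definitions.
type MemoryType = "fact" | "state" | "decision" | "relation";

interface ExtractedMemory {
  type: MemoryType;
  content: string;
  ttlSeconds?: number; // assumed name for the TTL field
}

// From the commit message: reject content shorter than 15 characters.
const MIN_CONTENT_LENGTH = 15;

// Illustrative patterns only; the real list is not shown in the PR.
const TRANSIENT_PATTERNS: RegExp[] = [
  /\bPR\s*#?\d+\b.*\b(merged|closed|opened|pending)\b/i,
  /\b(deployed|deploying|rolled back)\b/i,
];

function validateMemories(candidates: ExtractedMemory[]): ExtractedMemory[] {
  const accepted: ExtractedMemory[] = [];
  for (const m of candidates) {
    const content = m.content.trim();
    // Guard 1: drop fragments that are too short to be useful.
    if (content.length < MIN_CONTENT_LENGTH) continue;
    // Guard 2: drop transient items such as PR statuses or deployments.
    if (TRANSIENT_PATTERNS.some((p) => p.test(content))) continue;
    // Guard 3: a state with no TTL never expires, so store it as a fact.
    const type: MemoryType =
      m.type === "state" && m.ttlSeconds === undefined ? "fact" : m.type;
    accepted.push({ ...m, content, type });
  }
  return accepted;
}
```

The prompt remains the first line of defense; a filter like this only catches what the LLM lets through.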

Test plan

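  • 12 new test cases added across 3 test files, covering all new logic
  • Full test suite: 1123/1124 passing (the 1 failure is an expected vector-search quality test in quality.test.ts, unrelated to this change)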

🤖 Generated with Claude Code

lishuceo and others added 2 commits March 18, 2026 13:43
- Add anti-patterns to extraction prompt (reject PR statuses, fix plans,
  deployment states)
- Tighten type definitions (decision = architectural choices only,
  relation = stable entity relationships)
- Add post-validation filters: reject content < 15 chars, reject
  transient patterns, demote state-without-TTL to fact
- Auto-promote L0 → L1 when evidence_count reaches 3 (natural
  confirmation via repeated extraction)
- Lower supersede similarity threshold from 0.85 to 0.78 for better
  dedup
- Add one-time data migration (v1): normalize workspace_dir fragments
  and invalidate transient memories

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Test transient content filtering (PR statuses, deployments)
- Test min content length validation
- Test state-without-TTL demotion to fact
- Test evidence auto-promotion L0 → L1
- Test workspace_dir normalization migration
- Test transient memory invalidation migration
- Update existing tests for new content length requirements

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude

claude Bot commented Mar 18, 2026

Claude finished @lishuceo's task in 1m 26s


PR Review: Memory Data Quality Improvements

  • Check for previous review comments
  • Read changed files and understand context
  • Review code for bugs, security, and architecture issues
  • Post review summary

⚠️ Issues Found

| Severity | File | Issue | Confidence |
| --- | --- | --- | --- |
| 🟡 Warning | src/memory/store.ts:173 | Stale JSDoc: says "similarity > 0.85 / distance < 0.15" but threshold was changed to 0.78/0.22 | 90 |

Overall Assessment

Solid PR. The changes are well-structured and follow existing patterns:

  • Extraction quality: The expanded prompt with anti-examples and the code-level guards (TRANSIENT_PATTERNS, MIN_CONTENT_LENGTH, state→fact demotion) form a good defense-in-depth — the LLM prompt steers extraction, and validateMemories() catches what leaks through.
  • Auto-promotion: The L0→L1 promotion logic in updateEvidence() is clean — reads after the synchronous db.updateEvidence() call, so no race conditions with better-sqlite3's synchronous API.
  • Data migrations: The PRAGMA user_version guard is the right approach. Migration is wrapped in a transaction with idempotent checks. Test coverage for the "should not re-run migrations on subsequent opens" case is a nice touch.
  • Similarity threshold: Lowering from 0.85→0.78 is a reasonable tuning — just update the JSDoc to match (see inline comment).
  • Tests: 12 new test cases cover all new logic paths well. Test content was updated to be ≥15 chars to match the new MIN_CONTENT_LENGTH, which is good attention to detail.
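
The version-guarded migration described in the "Data migrations" bullet can be sketched as follows. To stay self-contained this uses an in-memory stand-in rather than a real SQLite connection: `userVersion` plays the role of PRAGMA user_version, and building a copy before committing stands in for the transaction. The `MigrationStore` shape, the canonical identifier, and both regexes are hypothetical.

```typescript
interface MemoryRecord {
  workspaceDir: string;
  content: string;
  invalidAt: number | null;
}

// In-memory stand-in for the database (hypothetical shape).
interface MigrationStore {
  userVersion: number; // plays the role of PRAGMA user_version
  memories: MemoryRecord[];
}

const CANONICAL_WORKSPACE = "workspace:main"; // hypothetical canonical identifier
const LOCAL_PATH = /^(\/|[A-Za-z]:\\)/; // assumed: absolute local paths
const TRANSIENT = /\b(PR\s*#?\d+|deployed)\b/i; // illustrative pattern only

// Returns true if the v1 migration ran, false if it was already applied.
function runMigrations(store: MigrationStore, now: number = Date.now()): boolean {
  if (store.userVersion >= 1) return false; // guard: never re-run v1

  // Build the new state on a copy and commit at the end, so a failure
  // midway leaves the store untouched (stand-in for a DB transaction).
  const next = store.memories.map((m) => ({ ...m }));
  for (const m of next) {
    // Idempotent: canonical values do not match LOCAL_PATH again.
    if (LOCAL_PATH.test(m.workspaceDir)) m.workspaceDir = CANONICAL_WORKSPACE;
    // Idempotent: rows that are already invalid keep their timestamp.
    if (m.invalidAt === null && TRANSIENT.test(m.content)) m.invalidAt = now;
  }
  store.memories = next;
  store.userVersion = 1;
  return true;
}
```

Against a real database, the version read and write would presumably be PRAGMA statements and the loop would run inside a single transaction.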

Comment thread src/memory/store.ts
  for (const vr of vecResults) {
-   // cosine distance < 0.15 ≈ similarity > 0.85
-   if (vr.distance >= 0.15) continue;
+   // cosine distance < 0.22 ≈ similarity > 0.78

🟡 Stale JSDoc (confidence: 90): The JSDoc on line 173 still says "similarity > 0.85 (distance < 0.15)" but this line changed the threshold to 0.78/0.22. Please update the doc to match.


⚠️ Still exists. The inline comment on line 164 was updated correctly (cosine distance < 0.22 ≈ similarity > 0.78), but the JSDoc on line 147 still reads similarity > 0.85 (distance < 0.15 for cosine). Please update the JSDoc to match.


⚠️ Still unfixed after 3 reviews. Line 147 JSDoc says similarity > 0.85 (distance < 0.15 for cosine) but the threshold on line 164-165 is 0.78/0.22.
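
The 0.78/0.22 pairing in this thread follows from cosine distance being 1 minus cosine similarity, which is what the inline comment implies. Deriving the distance cutoff from a single similarity constant is one way to keep code and documentation from drifting apart. The sketch below assumes that definition; the constant name, `VecResult` shape, and helper names are hypothetical.

```typescript
// Similarity threshold from this PR (previously 0.85).
const SUPERSEDE_SIMILARITY_THRESHOLD = 0.78;

// Assumes the vector index reports cosine distance = 1 - cosine similarity.
function similarityToDistance(similarity: number): number {
  return 1 - similarity;
}

// Hypothetical shape of a vector search hit.
interface VecResult {
  id: string;
  distance: number;
}

// Keep only hits close enough to be considered for superseding.
function supersedeCandidates(
  vecResults: VecResult[],
  minSimilarity: number = SUPERSEDE_SIMILARITY_THRESHOLD,
): VecResult[] {
  const maxDistance = similarityToDistance(minSimilarity);
  return vecResults.filter((vr) => vr.distance < maxDistance);
}
```

With one named constant, a JSDoc comment can reference the constant instead of repeating the literal values.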

lishuceo and others added 2 commits March 18, 2026 15:21
…rsede chains

Remove confidence multiplier from search scoring formula (was `searchScore ×
typeBoost × recencyDecay × confidence`, now without confidence). All memories
compete equally regardless of L0/L1/L2 level.

Remove evidence auto-promotion (L0→L1 at count>=3) since data shows only
5/1230 memories ever accumulated evidence. Keep evidence_count as a stat field.

Remove bidirectional supersede pointers (supersedes, supersede_reason) and
chain walking (getSupersedChain). Keep forward pointer (superseded_by) and
invalid_at for basic supersede functionality.

Schema fields retained for future product insights layer (see docs).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove tests for auto-promotion, supersede chains, confidence scoring,
and supersedeReason. Update assertions to match simplified behavior.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude

claude Bot commented Mar 18, 2026

Claude finished @lishuceo's task in 2m 3s


PR Review: Memory Data Quality Improvements (Re-review)

  • Check previous review comments and resolve fixed issues
  • Read changed files and review new commits
  • Post review summary

⚠️ Issues Found

| Severity | File | Issue | Confidence |
| --- | --- | --- | --- |
| 🟡 Warning | src/memory/store.ts:147 | Stale JSDoc (still unfixed): JSDoc says "similarity > 0.85 (distance < 0.15 for cosine)" but the threshold was changed to 0.78/0.22 on line 164-165 | 90 |

1 previous issue persists — replied on the existing thread.

Overall Assessment

The two new commits (e6fe310 simplify + cd501b8 test updates) are clean. The supersedes/supersede_reason column removal and confidence-from-scoring removal are well-executed:

  • Column removal is safe — existing DBs keep the unused columns silently; fresh DBs never create them. SELECT * in getMemory() returns whatever columns exist, and rowToMemory() no longer maps them, so no breakage either way.
  • _store: unknown parameter in createMemorySearchMcpServer maintains API compatibility with the caller in executor.ts:422 while correctly signaling the param is unused.
  • Score simplification (search.ts:142) — removing * memory.confidence from finalScore is consistent with the broader theme of dropping confidence as a ranking signal.
  • Migration guard: PRAGMA user_version is the right approach. Existing DBs get user_version=0 by default, so the v1 migration runs exactly once on upgrade. The transaction wrapping is correct.
  • Tests updated — all test files properly adapted to the simplified API (no supersedes, no supersedeReason, no getSupersedChain).

Only action item: update the JSDoc on store.ts:147 from 0.85/0.15 to 0.78/0.22.
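
For reference, the score simplification discussed in this review amounts to dropping one factor from the product. The factor names come from the commit message (`searchScore × typeBoost × recencyDecay × confidence`); the `ScoredMemory` shape and function names below are assumptions for illustration.

```typescript
// Hypothetical shape holding the factors named in the commit message.
interface ScoredMemory {
  searchScore: number;
  typeBoost: number;
  recencyDecay: number;
  confidence: number; // still stored, but no longer a ranking signal
}

// Before this PR: confidence scaled the final score.
function finalScoreBefore(m: ScoredMemory): number {
  return m.searchScore * m.typeBoost * m.recencyDecay * m.confidence;
}

// After this PR: all memories compete equally regardless of L0/L1/L2.
function finalScore(m: ScoredMemory): number {
  return m.searchScore * m.typeBoost * m.recencyDecay;
}
```

Two memories that differ only in confidence now receive the same final score, where previously the higher-confidence one ranked first.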

* feat: add skill-creator and nano-pdf skills from openclaw

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: support PDF file messages from Feishu chat

- Add DocumentAttachment type and DocumentContentBlock for PDF support
- Add downloadMessageFile() to Feishu client (im.messageResource.get with type='file')
- Handle 'file' message type in parseMessage() — download PDF, reject non-PDF with hint
- Wire documents through queue → executor → buildMultimodalPrompt pipeline
- PDF files sent as document content blocks in Anthropic API format

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add missing documents parameter to executeDirectTask/executeClaudeTask calls

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
@claude

claude Bot commented Mar 18, 2026

Claude finished @lishuceo's task in 1m 39s


PR Review: Memory Data Quality Improvements (3rd review)

  • Check previous review comments and resolve fixed issues
  • Review current diff for new issues
  • Post review summary

⚠️ Issues Found

| Severity | File | Issue | Confidence |
| --- | --- | --- | --- |
| 🟡 Warning | src/memory/store.ts:147 | Stale JSDoc (3rd review, still unfixed): JSDoc says similarity > 0.85 (distance < 0.15 for cosine) but threshold is 0.78/0.22 on lines 164-165 | 90 |

0 previous issues fixed — the stale JSDoc from review #1 persists. Replied on the existing thread.

Overall Assessment

The PR is solid. This review covers 5 commits across 29 files (including the skill-creator/nano-pdf additions from the base branch merge).

Memory changes (core of this PR):

  • Extraction quality guards are well-layered: prompt anti-examples + TRANSIENT_PATTERNS regex + MIN_CONTENT_LENGTH + state→fact demotion. Defense-in-depth is the right approach.
  • Supersede chain removal is clean — supersedes/supersede_reason columns dropped from MemoryRow, Memory, MemoryCreateInput, and all downstream code. Existing DBs with these columns are unaffected (SQLite ignores extra columns on SELECT).
  • Data migrations via PRAGMA user_version are correctly guarded and wrapped in a transaction. The workspace normalization and transient memory invalidation are idempotent.
  • Similarity threshold (0.85→0.78) and score simplification (removing * memory.confidence) are consistent with the broader theme.

Document support (new in this sync):

  • DocumentAttachment type, downloadMessageFile(), and the file parsing in parseMessage() are correctly wired through the entire pipeline: parseMessage → taskQueue.enqueue → processQueue → executeClaudeTask/executeDirectTask → ClaudeExecutor. All call sites match function signatures.
  • 30MB file size limit and PDF-only restriction are sensible guards.
  • buildMultimodalPrompt correctly handles the new documents parameter with proper content block ordering (documents → images → text).

Only action item: Update the JSDoc on store.ts:147 from 0.85/0.15 to 0.78/0.22.
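
The content block ordering and guards described under "Document support" might be sketched as below. The block shapes follow the Anthropic Messages API content block format for base64 documents and images; the function name `buildPromptBlocks`, the attachment field names, and the interpretation of 30MB as 30 × 1024 × 1024 bytes are assumptions, since the actual buildMultimodalPrompt signature is not shown in this PR.

```typescript
type ContentBlock =
  | { type: "document"; source: { type: "base64"; media_type: "application/pdf"; data: string } }
  | { type: "image"; source: { type: "base64"; media_type: string; data: string } }
  | { type: "text"; text: string };

// Hypothetical attachment shapes; the real DocumentAttachment is not shown.
interface DocumentAttachment {
  fileName: string;
  mimeType: string;
  base64: string;
  sizeBytes: number;
}

interface ImageAttachment {
  mediaType: string;
  base64: string;
}

const MAX_DOCUMENT_BYTES = 30 * 1024 * 1024; // assumed reading of the 30MB limit

function buildPromptBlocks(
  text: string,
  images: ImageAttachment[] = [],
  documents: DocumentAttachment[] = [],
): ContentBlock[] {
  const blocks: ContentBlock[] = [];
  // Documents first, so the model sees the PDF before images and the question.
  for (const doc of documents) {
    if (doc.mimeType !== "application/pdf") {
      throw new Error(`Unsupported file type for ${doc.fileName}: only PDF is accepted`);
    }
    if (doc.sizeBytes > MAX_DOCUMENT_BYTES) {
      throw new Error(`${doc.fileName} exceeds the 30MB limit`);
    }
    blocks.push({
      type: "document",
      source: { type: "base64", media_type: "application/pdf", data: doc.base64 },
    });
  }
  // Images second.
  for (const img of images) {
    blocks.push({
      type: "image",
      source: { type: "base64", media_type: img.mediaType, data: img.base64 },
    });
  }
  // The user's text always comes last.
  blocks.push({ type: "text", text });
  return blocks;
}
```

In the PR itself the non-PDF rejection reportedly happens earlier, in parseMessage(); repeating the check here is only a defensive choice for the sketch.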

@lishuceo lishuceo closed this Mar 18, 2026
@lishuceo lishuceo deleted the refactor/memory-quality-improvements branch March 18, 2026 14:03