docs: Phase 0.5 attribution artifact (Vigil v2.4.1 log analysis) #91
Companion artifact to PR #90 (which shipped the phase_timings_ms instrumentation). Analyses 961 real remember() calls from Vigil's live OCSF log and attributes 98.4% of remember() wall-clock to one LanceDB notes_cti Update. Root cause: 7,356 uncompacted fragments on the CTI shard vs 458 on the healthy general shard. Artifact proposes expanding RFC-009 §1.1 F03 to cover periodic compaction. Confidence: high on attribution, medium on mechanism; to be falsified or refined by phase_timings_ms data post-v2.4.2 deploy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4d19e3944d
      ([.[] | select(.event=="ocsf_api_activity" and .activity_name=="remember") | .duration_ms] | sum) as $R |
      ([.[] | select(.event=="ocsf_file_activity" and .file.path=="notes_cti"
                     and .duration_ms != null) | .duration_ms] | sum) as $L |
      {remember_ms: $R, lance_cti_ms: $L, lance_share_pct: ($L*100/$R)}' "$LOG"
Join file events to remember calls by note_id
The reproduction query labeled “LanceDB share of remember()” sums all ocsf_file_activity rows where .file.path=="notes_cti", but it never restricts those rows to the same remember calls (despite the Method section saying events are matched by note_id). If the log contains any other notes_cti writes (for example scripts/rebuild_index.py calling _index_in_lance()), this inflates $L and can overstate the reported 98.42% attribution.
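The difference between the unjoined sum and a note_id-joined sum can be sketched in Python. This is a minimal illustration, not the artifact's method: the field names (`event`, `activity_name`, `file.path`, `duration_ms`, `note_id`) are taken from the jq snippet and the review comments, while the sample events, the `rebuild-7` note id, and the function name `lance_share` are hypothetical.

```python
from typing import Iterable, Optional


def lance_share(events: Iterable[dict], shard: str = "notes_cti") -> dict:
    """Compare the unjoined and note_id-joined LanceDB share of remember() time."""
    events = list(events)

    # All remember() API calls and their total wall-clock time.
    remember = [
        e for e in events
        if e.get("event") == "ocsf_api_activity"
        and e.get("activity_name") == "remember"
    ]
    remember_ms = sum(e.get("duration_ms") or 0 for e in remember)
    remember_ids = {e["note_id"] for e in remember if e.get("note_id") is not None}

    def is_shard_write(e: dict) -> bool:
        return (
            e.get("event") == "ocsf_file_activity"
            and (e.get("file") or {}).get("path") == shard
            and e.get("duration_ms") is not None
        )

    # Unjoined: every file event on the shard, whoever triggered it.
    unjoined_ms = sum(e["duration_ms"] for e in events if is_shard_write(e))
    # Joined: only file events whose note_id belongs to a remember() call.
    joined_ms = sum(
        e["duration_ms"] for e in events
        if is_shard_write(e) and e.get("note_id") in remember_ids
    )

    def pct(ms: float) -> Optional[float]:
        return ms * 100 / remember_ms if remember_ms else None

    return {
        "remember_ms": remember_ms,
        "unjoined_share_pct": pct(unjoined_ms),
        "joined_share_pct": pct(joined_ms),
    }


# Hypothetical events: one remember() call plus an unrelated shard write.
SAMPLE_EVENTS = [
    {"event": "ocsf_api_activity", "activity_name": "remember",
     "note_id": "n1", "duration_ms": 1000},
    {"event": "ocsf_file_activity", "file": {"path": "notes_cti"},
     "note_id": "n1", "duration_ms": 900},
    {"event": "ocsf_file_activity", "file": {"path": "notes_cti"},
     "note_id": "rebuild-7", "duration_ms": 300},
]

print(lance_share(SAMPLE_EVENTS))
```

In this toy log the unrelated write pushes the unjoined share above 100%, which is the kind of inflation the comment describes; if the real log holds only remember()-triggered writes, the two figures coincide.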
Pull request overview
Adds a Phase 0.5 research artifact documenting preliminary remember() latency attribution from Vigil v2.4.1 OCSF logs, intended to inform RFC-009 prioritization (and follow up after v2.4.2 phase_timings_ms lands).
Changes:
- Introduces a new research note summarizing observed latency distribution and attributing most wall-clock time to LanceDB `notes_cti` updates.
- Documents a root-cause hypothesis (fragment accumulation / lack of compaction) and recommended next actions.
- Includes an appendix with `jq`/shell commands to reproduce summary metrics and fragment counts.
    jq -s 'def sum: add;
      ([.[] | select(.event=="ocsf_api_activity" and .activity_name=="remember") | .duration_ms] | sum) as $R |
      ([.[] | select(.event=="ocsf_file_activity" and .file.path=="notes_cti"
                     and .duration_ms != null) | .duration_ms] | sum) as $L |
Appendix "LanceDB share of remember()" jq snippet sums all ocsf_file_activity events for file.path=="notes_cti" without joining them to ocsf_api_activity remember via note_id, which doesn't match the Method section description and could over/under-count if there are unrelated file events in the log. Consider either updating the snippet to perform the note_id correlation (or adjacency match), or clarifying in the Method/Appendix that the log contains only notes_cti file_activity events emitted during remember() so the simple sum is valid.
Suggested change:

    - jq -s 'def sum: add;
    -   ([.[] | select(.event=="ocsf_api_activity" and .activity_name=="remember") | .duration_ms] | sum) as $R |
    -   ([.[] | select(.event=="ocsf_file_activity" and .file.path=="notes_cti"
    -                  and .duration_ms != null) | .duration_ms] | sum) as $L |
    + jq -s 'def sum: (add // 0);
    +   ([.[] | select(.event=="ocsf_api_activity" and .activity_name=="remember")
    +     | .duration_ms] | sum) as $R |
    +   ([.[] | select(.event=="ocsf_api_activity" and .activity_name=="remember"
    +                  and .note_id != null)
    +     | .note_id] | unique) as $remember_note_ids |
    +   ([.[] | . as $e
    +     | select(.event=="ocsf_file_activity" and .file.path=="notes_cti"
    +              and .duration_ms != null and .note_id != null
    +              and (($remember_note_ids | index($e.note_id)) != null))
    +     | .duration_ms] | sum) as $L |
    Two facts in combination:

    1. **Every LanceDB write creates a new fragment.** This is by design: Lance is an append-optimized columnar store.
    2. **ZettelForge never calls `compact_files()`.** A grep for `compact_files`, `optimize`, or any equivalent in `src/zettelforge/` returns zero matches.
The claim that a grep for compact_files, optimize, "or any equivalent" returns zero matches is stronger than what can be validated by a text search ("any equivalent" is subjective and not grep-verifiable). Suggest tightening this to the specific terms searched (e.g., compact_files / optimize) or listing the exact search commands used so the statement remains auditable over time.
Suggested change:

    - 2. **ZettelForge never calls `compact_files()`.** A grep for `compact_files`, `optimize`, or any equivalent in `src/zettelforge/` returns zero matches.
    + 2. **ZettelForge never calls `compact_files()`.** A grep for `compact_files` and `optimize` in `src/zettelforge/` returns zero matches.
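One way to keep the "zero matches" claim auditable is to record the exact search alongside it. The sketch below is a hypothetical Python equivalent of a recursive grep for a fixed list of terms; the function name `find_terms` and the restriction to `.py` files are assumptions, and only the two terms and the `src/zettelforge/` path come from the discussion above.

```python
import re
from pathlib import Path


def find_terms(root, terms, suffix=".py"):
    """Return (path, line_number, line) for every line containing any term."""
    # Escape each term so it is matched literally, as a fixed-string grep would.
    pattern = re.compile("|".join(re.escape(t) for t in terms))
    hits = []
    for path in sorted(Path(root).rglob(f"*{suffix}")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Terms and path as named in the review; an empty result supports the claim.
    for path, lineno, line in find_terms("src/zettelforge", ["compact_files", "optimize"]):
        print(f"{path}:{lineno}: {line}")
```

Listing the terms explicitly means a later reader can rerun the same search and see whether the statement still holds.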
Patch release bundling the RFC-010 enrichment-pipeline hotfix with the RFC-009 Phase 0.5 latency-attribution instrumentation.

Highlights:
- fix(enrichment): RFC-010 - wire OllamaProvider timeout + guard consolidation shutdown race (#88)
- feat(telemetry): RFC-009 Phase 0.5 - per-phase timers in remember() (#90)
- docs: Phase 0.5 preliminary attribution: 98.4% of remember() wall-clock is LanceDB notes_cti writes; 7,356 uncompacted fragments identified (#91)

Does NOT yet address the ~2,329 daily enrichment-job drops or the LanceDB fragmentation itself. Outbox + circuit breaker + compaction are in RFC-009 Phases 1-6 (v2.5.0).

See CHANGELOG.md for the full set of changes.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Companion doc to PR #90 (Phase 0.5 instrumentation). Published as a separate PR because the artifact was authored after #90 merged and didn't make it into that squash.
Analyses Vigil's live v2.4.1 OCSF log (9,510 events, 961 `remember()` calls). Attributes 98.4% of `remember()` wall-clock time to one LanceDB `Update` on the `notes_cti` shard. Root cause candidate: 7,356 uncompacted fragment files on that shard vs 458 on the healthy `notes_general` shard.
Why this matters for RFC-009
The current Phase 1–6 ordering in RFC-009 targets the LLM hang path, queue overflow, and consolidation race. None of those touch the write-path latency that's actually driving the 5.7s average. The artifact proposes expanding RFC-009 §1.1 F03 to cover periodic LanceDB compaction.
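A periodic compaction check of the kind proposed could look roughly like the sketch below. It is an illustration under stated assumptions, not ZettelForge code: `maybe_compact`, its callables, and the threshold of 1,000 fragments are all hypothetical, and the actual compaction call (for example a wrapper around the `compact_files()` the artifact mentions) is injected rather than spelled out. Only the fragment counts 7,356 and 458 come from the artifact.

```python
from typing import Callable


def maybe_compact(
    count_fragments: Callable[[], int],
    compact: Callable[[], None],
    max_fragments: int = 1000,  # hypothetical threshold, not from the artifact
) -> bool:
    """Run `compact` only when the shard holds more than `max_fragments` fragments.

    Returns True if compaction was triggered, False otherwise.
    """
    if count_fragments() <= max_fragments:
        return False
    compact()
    return True


if __name__ == "__main__":
    # Fragment counts reported in the artifact for the two shards.
    shards = {"notes_cti": 7356, "notes_general": 458}
    for name, fragments in shards.items():
        triggered = maybe_compact(
            count_fragments=lambda n=fragments: n,
            compact=lambda s=name: print(f"compacting {s}"),
        )
        print(name, "compaction triggered:", triggered)
```

Injecting the counter and the compaction call keeps the policy testable without a live LanceDB table; the threshold would need tuning against the `phase_timings_ms` data once it is available.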
Status
Preliminary. Refine or falsify once v2.4.2 `phase_timings_ms` data is in hand; the remaining 1.6% of `remember()` time will be attributed to named phases (construct, write_note, consolidation, KG).
Test plan
🤖 Generated with Claude Code