docs: Phase 0.5 attribution artifact (Vigil v2.4.1 log analysis)#91

Merged: rolandpg merged 1 commit into master from docs/phase-0.5-attribution, Apr 24, 2026

@rolandpg
Owner

Summary

Companion doc to PR #90 (Phase 0.5 instrumentation). Published as a separate PR because the artifact was authored after #90 merged and didn't make it into that squash.

Analyses Vigil's live v2.4.1 OCSF log (9,510 events, 961 remember() calls). Attributes 98.4% of remember() wall-clock time to one LanceDB Update on the notes_cti shard. Root cause candidate: 7,356 uncompacted fragment files on that shard vs 458 on the healthy notes_general shard.

Why this matters for RFC-009

The current Phase 1–6 ordering in RFC-009 targets the LLM hang path, queue overflow, and consolidation race. None of those touches the write-path latency that is actually driving the 5.7 s remember() average. The artifact proposes expanding RFC-009 §1.1 F03 to cover periodic LanceDB compaction.

Status

Preliminary. Refine or falsify once v2.4.2 phase_timings_ms data is in hand — the remaining 1.6% of remember() time will be attributed to named phases (construct, write_note, consolidation, KG).
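
As a rough budget check, the percentages above can be turned into per-call time. This is a sketch only: it assumes the 5.7 s figure is the mean remember() duration over the same 961 calls, and the constant names are illustrative, not taken from the artifact.

```python
# Figures quoted in this PR description. Treating 5.7 s as the mean
# remember() duration over the 961 sampled calls is an assumption.
MEAN_REMEMBER_S = 5.7
LANCE_SHARE = 0.984
REMEMBER_CALLS = 961

# A share of total time equals the same share of the mean, because both
# totals are divided by the same number of calls.
lance_s_per_call = MEAN_REMEMBER_S * LANCE_SHARE
other_ms_per_call = (MEAN_REMEMBER_S - lance_s_per_call) * 1000
total_remember_s = MEAN_REMEMBER_S * REMEMBER_CALLS

print(f"LanceDB notes_cti Update: {lance_s_per_call:.2f} s per call")
print(f"Everything else combined: {other_ms_per_call:.0f} ms per call")
print(f"Total remember() time in the sample: {total_remember_s:.0f} s")
```

Under those assumptions the notes_cti Update accounts for roughly 5.61 s per call and everything else for about 91 ms, which is the budget the v2.4.2 phase_timings_ms data would have to split across the named phases.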

Test plan

  • Docs-only (no code change, no CI risk)
  • Merge → include in v2.4.2 release notes

🤖 Generated with Claude Code

Companion artifact to PR #90 (which shipped the phase_timings_ms
instrumentation). Analyses 961 real remember() calls from Vigil's
live OCSF log and attributes 98.4% of remember() wall-clock to one
LanceDB notes_cti Update. Root cause candidate: 7,356 uncompacted fragments
on the CTI shard vs 458 on the healthy general shard. Artifact
proposes expanding RFC-009 §1.1 F03 to cover periodic compaction.

Confidence: high on attribution, medium on mechanism — to be
falsified or refined by phase_timings_ms data post-v2.4.2 deploy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 24, 2026 22:21

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4d19e3944d

Comment on lines +112 to +115
([.[] | select(.event=="ocsf_api_activity" and .activity_name=="remember") | .duration_ms] | sum) as $R |
([.[] | select(.event=="ocsf_file_activity" and .file.path=="notes_cti"
and .duration_ms != null) | .duration_ms] | sum) as $L |
{remember_ms: $R, lance_cti_ms: $L, lance_share_pct: ($L*100/$R)}' "$LOG"

P2: Join file events to remember calls by note_id

The reproduction query labeled “LanceDB share of remember()” sums all ocsf_file_activity rows where .file.path=="notes_cti", but it never restricts those rows to the same remember calls (despite the Method section saying events are matched by note_id). If the log contains any other notes_cti writes (for example scripts/rebuild_index.py calling _index_in_lance()), this inflates $L and can overstate the reported 98.42% attribution.
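
One way to express the note_id join is sketched below in Python. It is a minimal illustration, not the artifact's method: it assumes the log is JSON Lines and that `note_id` appears as a top-level field on both `ocsf_api_activity` and `ocsf_file_activity` events. The function names `load_events` and `lance_share_pct` are hypothetical.

```python
import json


def load_events(path):
    """Read a JSON Lines log into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def lance_share_pct(events, shard="notes_cti"):
    """Percentage of remember() time spent in file events on `shard`,
    counting only file events whose note_id matches a remember() call."""
    events = list(events)
    remember = [
        e for e in events
        if e.get("event") == "ocsf_api_activity"
        and e.get("activity_name") == "remember"
    ]
    remember_ms = sum(e.get("duration_ms") or 0 for e in remember)
    # note_ids seen on remember() calls (assumed top-level field).
    note_ids = {e["note_id"] for e in remember if e.get("note_id") is not None}

    lance_ms = sum(
        e["duration_ms"]
        for e in events
        if e.get("event") == "ocsf_file_activity"
        and (e.get("file") or {}).get("path") == shard
        and e.get("duration_ms") is not None
        and e.get("note_id") in note_ids  # drops writes from other callers
    )
    # Avoid dividing by zero when the log has no remember() calls.
    return lance_ms * 100 / remember_ms if remember_ms else 0.0
```

Comparing this figure with the unjoined sum would show how much, if any, of the reported share comes from notes_cti writes outside remember().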

Contributor

Copilot AI left a comment

Pull request overview

Adds a Phase 0.5 research artifact documenting preliminary remember() latency attribution from Vigil v2.4.1 OCSF logs, intended to inform RFC-009 prioritization (and follow up after v2.4.2 phase_timings_ms lands).

Changes:

  • Introduces a new research note summarizing observed latency distribution and attributing most wall-clock time to LanceDB notes_cti updates.
  • Documents a root-cause hypothesis (fragment accumulation / lack of compaction) and recommended next actions.
  • Includes an appendix with jq/shell commands to reproduce summary metrics and fragment counts.

Comment on lines +111 to +114
jq -s 'def sum: add;
([.[] | select(.event=="ocsf_api_activity" and .activity_name=="remember") | .duration_ms] | sum) as $R |
([.[] | select(.event=="ocsf_file_activity" and .file.path=="notes_cti"
and .duration_ms != null) | .duration_ms] | sum) as $L |
Copilot AI Apr 24, 2026

Appendix "LanceDB share of remember()" jq snippet sums all ocsf_file_activity events for file.path=="notes_cti" without joining them to ocsf_api_activity remember via note_id, which doesn't match the Method section description and could over/under-count if there are unrelated file events in the log. Consider either updating the snippet to perform the note_id correlation (or adjacency match), or clarifying in the Method/Appendix that the log contains only notes_cti file_activity events emitted during remember() so the simple sum is valid.

Suggested change
- jq -s 'def sum: add;
- ([.[] | select(.event=="ocsf_api_activity" and .activity_name=="remember") | .duration_ms] | sum) as $R |
- ([.[] | select(.event=="ocsf_file_activity" and .file.path=="notes_cti"
- and .duration_ms != null) | .duration_ms] | sum) as $L |
+ jq -s 'def sum: (add // 0);
+ ([.[] | select(.event=="ocsf_api_activity" and .activity_name=="remember")
+ | .duration_ms] | sum) as $R |
+ ([.[] | select(.event=="ocsf_api_activity" and .activity_name=="remember"
+ and .note_id != null)
+ | .note_id] | unique) as $remember_note_ids |
+ ([.[] | . as $e
+ | select(.event=="ocsf_file_activity" and .file.path=="notes_cti"
+ and .duration_ms != null and .note_id != null
+ and (($remember_note_ids | index($e.note_id)) != null))
+ | .duration_ms] | sum) as $L |

Two facts in combination:

1. **Every LanceDB write creates a new fragment.** This is by design — Lance is an append-optimized columnar store.
2. **ZettelForge never calls `compact_files()`.** A grep for `compact_files`, `optimize`, or any equivalent in `src/zettelforge/` returns zero matches.
Copilot AI Apr 24, 2026

The claim that a grep for compact_files, optimize, "or any equivalent" returns zero matches is stronger than what can be validated by a text search ("any equivalent" is subjective and not grep-verifiable). Suggest tightening this to the specific terms searched (e.g., compact_files / optimize) or listing the exact search commands used so the statement remains auditable over time.

Suggested change
- 2. **ZettelForge never calls `compact_files()`.** A grep for `compact_files`, `optimize`, or any equivalent in `src/zettelforge/` returns zero matches.
+ 2. **ZettelForge never calls `compact_files()`.** A grep for `compact_files` and `optimize` in `src/zettelforge/` returns zero matches.
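
To keep the zero-match claim auditable over time, the exact search could be scripted and re-run. The following is a minimal sketch, not the search the author ran. The helper name `find_term_matches` is hypothetical, and restricting the search to `.py` files is an assumption that a plain recursive grep would not make.

```python
import re
from pathlib import Path

# The two literal terms named in the suggested wording above.
SEARCH_TERMS = ("compact_files", "optimize")


def find_term_matches(root, terms=SEARCH_TERMS, suffix=".py"):
    """Return (path, line_number, stripped_line) for every line under
    `root` that contains any of `terms` as a literal substring."""
    pattern = re.compile("|".join(re.escape(t) for t in terms))
    hits = []
    for path in sorted(Path(root).rglob(f"*{suffix}")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Path taken from the artifact; run from the repository root.
    for path, lineno, line in find_term_matches("src/zettelforge"):
        print(f"{path}:{lineno}: {line}")
```

An empty result supports the statement for exactly these two terms and nothing broader, which is the scope the suggested wording limits the claim to.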

@rolandpg rolandpg merged commit 5bb0cc5 into master Apr 24, 2026
15 checks passed
@rolandpg rolandpg deleted the docs/phase-0.5-attribution branch April 24, 2026 22:23
rolandpg added a commit that referenced this pull request Apr 24, 2026
Patch release bundling the RFC-010 enrichment-pipeline hotfix with
the RFC-009 Phase 0.5 latency-attribution instrumentation.

Highlights:
- fix(enrichment): RFC-010 — wire OllamaProvider timeout + guard
  consolidation shutdown race (#88)
- feat(telemetry): RFC-009 Phase 0.5 — per-phase timers in
  remember() (#90)
- docs: Phase 0.5 preliminary attribution — 98.4% of remember()
  wall-clock is LanceDB notes_cti writes; 7,356 uncompacted
  fragments identified (#91)

Does NOT yet address the ~2,329 daily enrichment-job drops or the
LanceDB fragmentation itself. Outbox + circuit breaker + compaction
are in RFC-009 Phases 1-6 (v2.5.0).

See CHANGELOG.md for the full set of changes.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>