docs: v2.4.2 live-test evidence log (2026-04-24) by rolandpg · Pull Request #93 · rolandpg/zettelforge

rolandpg · 2026-04-24T23:54:10Z

Summary

Durable evidence doc capturing everything observed during the post-release v2.4.2 test session against Vigil's live CTI workload. Written so the data survives context loss and can be forwarded to Nexus (or any reviewer) to re-verify each claim against the same log files.

Why this exists

Patrick asked for notes + proof in case the session is lost or Nexus needs to review. This commits the session's key findings to the repo with full reproduction queries.

What's in it

Test timeline (v2.4.2 release → Qwen stop → v2.4.2 first emission → DEBUG activation → Nemotron traffic → data cutoff)
Config under test + deployment-path note (openclaw-clone, not PyPI)
Phase 0.5 attribution CONFIRMED on first-party phase_timings_ms data: 98.1% aggregate, 99.66–99.98% on 8 tail events
RFC-010 verified: 0 consolidation_failed events post-v2.4.2 vs 4 on v2.4.1 same day
Root cause for tails: notes_cti has 7,576 uncompacted fragments (grew +220 in ~6h)
Tooling issues found in-session: OCSF version hardcoded (#35), log.py ignoring the config.yaml log level (#41), graph retriever firing on only 1.3% of recalls (#42)
Zero-hit rate unchanged by LLM swap, which falsifies the "Qwen is the culprit" hypothesis and refocuses #40
Reproduction jq queries for every number claimed
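
As a rough illustration of what a phase-share query computes, the sketch below sums the time spent in lance_index and divides it by the total time across all phases. It assumes the telemetry is JSONL with one event per line and that phase_timings_ms is an object mapping phase name to milliseconds; the `embed` phase name and the sample values are invented for the example and are not taken from the evidence log.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Synthetic sample events (NOT real telemetry); the field layout is an assumption.
SAMPLE="$(mktemp)"
cat > "$SAMPLE" <<'EOF'
{"phase_timings_ms": {"lance_index": 980, "embed": 20}}
{"phase_timings_ms": {"lance_index": 990, "embed": 10}}
EOF

# -s slurps the JSONL lines into one array so the sums run over all events.
# $lance is the total ms in lance_index; $total is the total ms over every phase.
share=$(jq -s '
  map(.phase_timings_ms)
  | (map(.lance_index) | add) as $lance
  | (map([.[]] | add) | add) as $total
  | 100 * $lance / $total
' "$SAMPLE")

echo "lance_index share of total phase time: ${share}%"
rm -f "$SAMPLE"
```

Pointing the same filter at the real telemetry file would require adjusting the field names to whatever the log actually emits.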

Review hints (in the doc under "For the reviewer")

Falsify Phase 0.5: rerun the phase-share jq query; an aggregate lance_index share far from the reported 98.1% would falsify the attribution
Falsify RFC-010: jq '[.[] | select(.event=="consolidation_failed" and .time > "2026-04-24T23:11:00")] | length' — any >0 means regression
Hidden issue worth spot-checking: graph retriever at 1.3% utilization is likely a bigger product issue than any single Phase 0.5 number
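
The RFC-010 check above can be sketched as a standalone script. The filter as quoted iterates with `.[]`, so it expects a single JSON array; if the log is JSONL (one event per line, an assumption here), `jq -s` is needed to slurp the lines into an array first. The sample events are synthetic and only demonstrate that the cutoff works; the `.event` and `.time` field names come from the query above.

```bash
#!/usr/bin/env bash
set -euo pipefail

SINCE="2026-04-24T23:11:00"

# Synthetic sample events (NOT real log data): one failure before the cutoff,
# one after it, and one unrelated event.
SAMPLE="$(mktemp)"
cat > "$SAMPLE" <<'EOF'
{"event": "consolidation_failed", "time": "2026-04-24T20:00:00"}
{"event": "consolidation_failed", "time": "2026-04-24T23:30:00"}
{"event": "remember", "time": "2026-04-24T23:40:00"}
EOF

# ISO-8601 timestamps in the same format compare correctly as plain strings,
# so a lexicographic ">" is enough to apply the cutoff.
count=$(jq -s --arg since "$SINCE" \
  '[.[] | select(.event == "consolidation_failed" and .time > $since)] | length' \
  "$SAMPLE")

echo "consolidation_failed events after $SINCE: $count"
rm -f "$SAMPLE"
```

On this sample the count is 1 because only the second event falls after the cutoff; against the real log, any count above 0 would indicate a regression.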

Test plan

Docs-only (no code change, no CI risk)
Merge to preserve evidence in repo history

🤖 Generated with Claude Code

Durable record of the post-release test session observing v2.4.2 + Nemotron-3-nano under Vigil's CTI workload.

Captures:
- Test timeline + config + deployment path (openclaw separate clone)
- Phase 0.5 attribution CONFIRMED: lance_index = 98.1% aggregate across 83 remember() calls, 99.66-99.98% across 8 tail events
- RFC-010 hotfix verified: 0 consolidation_failed post-v2.4.2
- Root-cause evidence: notes_cti has 7,576 uncompacted fragments (grew +220 in ~6h), vs 458 on healthy notes_general
- Observability issues found: OCSF version string hardcoded (#35), log.py ignoring config.yaml (#41), graph retriever firing on only 1.3% of recalls (#42 updated)
- Zero-hit rate unchanged by LLM swap (28% Qwen → 22% Nemotron, within noise); falsifies LLM-driven hypothesis, refocuses #40
- Reproduction jq queries for every claim

Written so Nexus or a future claude-code session can verify every number against the same log files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Adds a durable, repo-persisted evidence log for the v2.4.2 post-release live test against Vigil’s CTI workload, intended to preserve observations and enable later verification against the same log/telemetry artifacts.

Changes:

Introduces a timestamped live-test observation document with timeline, config/deploy notes, and key findings (Phase 0.5 attribution, RFC-010 verification, LanceDB fragment tail evidence).
Captures operational findings and follow-up task list discovered during the session.
Provides a small set of reproduction commands (jq/find) for some of the reported metrics.


+## Reproduction queries
+
+```bash
+LOG=/home/rolandpg/.openclaw/workspace-vigil/.zettelforge_vigil/logs/zettelforge.log
+TEL=/home/rolandpg/.amem/telemetry/telemetry_$(date -u +%F).jsonl
+SINCE="2026-04-24T23:11:00"




Copilot AI review requested due to automatic review settings April 24, 2026 23:54

Copilot started reviewing on behalf of rolandpg April 24, 2026 23:54

Copilot AI reviewed Apr 24, 2026


This was referenced Apr 25, 2026

feat(scripts): compact_lance — one-shot LanceDB shard compaction tool #94

Merged

feat: v2.4.3 prep — OCSF version self-correct + log-level env var + fastembed preload #96

Merged

rolandpg closed this Apr 25, 2026

rolandpg mentioned this pull request Apr 25, 2026

docs: Phase 0.5 attribution PRELIMINARY → CONFIRMED (mechanism corrected) #97

Merged


rolandpg deleted the docs/v242-live-test-evidence branch April 28, 2026 04:17

rolandpg wants to merge 1 commit into master from docs/v242-live-test-evidence
