docs: v2.4.2 live-test evidence log (2026-04-24)#93

Closed
rolandpg wants to merge 1 commit into master from docs/v242-live-test-evidence

Conversation

@rolandpg
Owner

Summary

Durable evidence doc capturing everything observed during the post-release v2.4.2 test session against Vigil's live CTI workload. Written so the data survives context loss and can be forwarded to Nexus (or any reviewer) to re-verify each claim against the same log files.

Why this exists

Patrick asked for notes + proof in case the session is lost or Nexus needs to review. This commits the session's key findings to the repo with full reproduction queries.

What's in it

Review hints (in the doc under "For the reviewer")

  • Falsify Phase 0.5: rerun the phase-share jq query; a lance_index share far from the reported ~98% would falsify the attribution
  • Falsify RFC-010: jq '[.[] | select(.event=="consolidation_failed" and .time > "2026-04-24T23:11:00")] | length'; any count above 0 means a regression
  • Hidden issue worth spot-checking: the graph retriever fires on only 1.3% of recalls, which is likely a bigger product issue than any single Phase 0.5 number
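
For a reviewer without jq at hand, the two falsification checks above can be approximated in Python. This is a minimal sketch, not the project's tooling: it assumes the log is JSON Lines with the `event` and `time` fields used in the jq filter, and the `phase` and `duration_ms` field names in `phase_share` are hypothetical placeholders for whatever the telemetry records actually use.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, Iterator


def iter_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per JSON line; skip blank, malformed, or non-object lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate partial or non-JSON lines in a live log
        if isinstance(record, dict):
            yield record


def count_events_since(records: Iterable[dict], event: str, since: str) -> int:
    """Count records with a matching `event` whose `time` sorts after `since`.

    Like the jq filter, this compares timestamps as plain strings, which is
    only valid when both sides use the same ISO-8601 layout and timezone.
    """
    return sum(
        1
        for r in records
        if r.get("event") == event and str(r.get("time", "")) > since
    )


def phase_share(
    records: Iterable[dict],
    phase_key: str = "phase",           # ASSUMPTION: hypothetical field name
    duration_key: str = "duration_ms",  # ASSUMPTION: hypothetical field name
) -> Dict[str, float]:
    """Return each phase's percentage share of the summed duration."""
    totals: Dict[str, float] = defaultdict(float)
    for r in records:
        duration = r.get(duration_key)
        if phase_key in r and isinstance(duration, (int, float)):
            totals[r[phase_key]] += duration
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {phase: 100.0 * t / grand_total for phase, t in totals.items()}
```

With the log opened as `fh`, `count_events_since(iter_jsonl(fh), "consolidation_failed", "2026-04-24T23:11:00")` should return 0 if the RFC-010 hotfix holds, and `phase_share(...)` over the telemetry records should put `lance_index` near the reported 98.1% if the Phase 0.5 attribution holds.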

Test plan

  • Docs-only (no code change, no CI risk)
  • Merge to preserve evidence in repo history

🤖 Generated with Claude Code

Durable record of the post-release test session observing v2.4.2 +
Nemotron-3-nano under Vigil's CTI workload. Captures:

- Test timeline + config + deployment path (openclaw separate clone)
- Phase 0.5 attribution CONFIRMED: lance_index = 98.1% aggregate
  across 83 remember() calls, 99.66-99.98% across 8 tail events
- RFC-010 hotfix verified: 0 consolidation_failed post-v2.4.2
- Root-cause evidence: notes_cti has 7,576 uncompacted fragments
  (grew +220 in ~6h), vs 458 on healthy notes_general
- Observability issues found: OCSF version string hardcoded (#35),
  log.py ignoring config.yaml (#41), graph retriever firing on
  only 1.3% of recalls (#42 updated)
- Zero-hit rate unchanged by LLM swap (28% Qwen → 22% Nemotron,
  within noise) — falsifies LLM-driven hypothesis, refocuses #40
- Reproduction jq queries for every claim

Written so Nexus or a future claude-code session can verify every
number against the same log files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 24, 2026 23:54
Contributor

Copilot AI left a comment

Pull request overview

Adds a durable, repo-persisted evidence log for the v2.4.2 post-release live test against Vigil’s CTI workload, intended to preserve observations and enable later verification against the same log/telemetry artifacts.

Changes:

  • Introduces a timestamped live-test observation document with timeline, config/deploy notes, and key findings (Phase 0.5 attribution, RFC-010 verification, LanceDB fragment tail evidence).
  • Captures operational findings and follow-up task list discovered during the session.
  • Provides a small set of reproduction commands (jq/find) for some of the reported metrics.

Comment on lines +156 to +161
## Reproduction queries

```bash
LOG=/home/rolandpg/.openclaw/workspace-vigil/.zettelforge_vigil/logs/zettelforge.log
TEL=/home/rolandpg/.amem/telemetry/telemetry_$(date -u +%F).jsonl
SINCE="2026-04-24T23:11:00"
```

Comment on lines +158 to +162
```bash
LOG=/home/rolandpg/.openclaw/workspace-vigil/.zettelforge_vigil/logs/zettelforge.log
TEL=/home/rolandpg/.amem/telemetry/telemetry_$(date -u +%F).jsonl
SINCE="2026-04-24T23:11:00"
```
