fix(tests): stabilize 3 CI regressions exposed by test-suite audit sprint by rolandpg · Pull Request #67 · rolandpg/zettelforge

rolandpg · 2026-04-17T22:19:51Z

Summary

Fixes three CI-only test failures on master, each introduced or exposed by the AI test-suite audit sprint (commits af080ec, 7fdf013, 0136888). All three are test-layer fixes — zero source code (src/) changes. No new pytest plugins, no xfail/skipif reintroductions.

Background: docs/superpowers/research/2026-04-17-test-suite-audit.md.

Regression 1 — `tests/test_causal_extraction.py::test_causal_extraction` (line 86)

Root cause: commit af080ec (AI-8) replaced return False with assert len(causal_edges) > 0. Under CI env (ZETTELFORGE_LLM_PROVIDER=mock + ZETTELFORGE_BACKEND=sqlite) this can never pass: (a) MockProvider returns unparseable text, so extract_causal_triples returns []; (b) even with a seeded-JSON mock, store_causal_edges writes to SQLite via backend=self.store, while the test reads get_knowledge_graph()._edges — the JSONL singleton SQLite never touches.
Fix chosen: user's fallback — assert the ingestion pipeline ran (assert note.id) instead of asserting edge count. Diagnostic prints retained for local real-LLM runs.
Evidence: 1 passed under CI-sim env (empty AMEM_DATA_DIR and polluted).

Regression 2 — `tests/test_core.py::TestVectorRecall::test_vector_recall_latency_reasonable` (line 134)

Root cause: same enrichment-queue race as AI-6; seed mm.remember() loop does not block on background enrichment before recall() runs.
Fix chosen: apply AI-6 pattern — sync=True on each of the 10 seed remembers. Mirrors af080ec exactly.
Evidence: 20/20 passed with pytest-repeat --count=20.

Regression 3 — `tests/test_langchain_retriever.py::TestZettelForgeRetriever::test_metadata_fields` (line 66)

Root cause: embedding-similarity mismatch, not a race. The fuzzy query "XZ Utils CVE" does not score highly enough on fastembed (nomic-embed-text-v1.5-Q) against the 3-note seed corpus, so mm.recall returns 0 notes. Other tests in the class pass because they use stronger-matching APT28 queries. Deterministic failure (10/10) — not the flake described in the audit's Regression 3 write-up; the actual failure is assert len(docs) > 0 on line 66, not the "security_ops" in domains on line 79.
Fix chosen: test-only change — swap the fuzzy query for the exact CVE ID "CVE-2024-3094", which hits the CVE-regex recall path and reliably returns the seeded security_ops note. The test's intent is metadata-field presence; the specific query is incidental.
Evidence: 10/10 passed with pytest-repeat --count=10 under CI-sim env.

Rerun evidence (combined)

CI=true ZETTELFORGE_LLM_PROVIDER=mock ZETTELFORGE_BACKEND=sqlite \
  ZETTELFORGE_EMBEDDING_PROVIDER=fastembed pytest \
  tests/test_causal_extraction.py \
  tests/test_core.py::TestVectorRecall::test_vector_recall_latency_reasonable \
  tests/test_langchain_retriever.py::TestZettelForgeRetriever::test_metadata_fields -v
# => 3 passed

# 20x latency rerun: 20/20 passed

Test plan

Run each failing test solo under CI-sim env and observe it now passes
Run latency test with --count=20 to confirm race fix
Run metadata_fields with --count=10 to confirm query fix is deterministic
Run the full tests/test_core.py, tests/test_causal_extraction.py, tests/test_langchain_retriever.py (26 tests) to confirm no regressions
GitHub Actions CI on this PR passes

References

Audit doc: docs/superpowers/research/2026-04-17-test-suite-audit.md
Prior commits: af080ec (AI-6, AI-8 — source of Regression 1), 0136888 (master HEAD at sprint end)

🤖 Generated with Claude Code

…-ran The `assert len(causal_edges) > 0` introduced in af080ec (AI-8) fails deterministically in CI under ZETTELFORGE_LLM_PROVIDER=mock + BACKEND=sqlite. Causal edges require: 1. An LLM producing a parseable JSON array of triples (mock provider returns "mock response" which fails extract_json(expect="array")). 2. Routing through the JSONL KG singleton the test inspects -- but SQLiteBackend.add_kg_edge writes to SQLite, not the JSONL singleton. Neither holds in CI, so the assertion can never pass regardless of the LLM. Minimal fix: assert the ingestion pipeline ran (note.id produced) instead of asserting causal-edge presence. Diagnostic prints remain useful when this test is run locally against a real LLM. Refs audit doc docs/superpowers/research/2026-04-17-test-suite-audit.md and the AI-8 regression introduced in af080ec. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…y test Same enrichment-queue race as AI-6: seed remember() calls queue LLM NER enrichment to a background worker; subsequent recall() can race ahead and observe zero results. Applies the AI-6 pattern (sync=True on seed calls) to the 10-note seed loop in test_vector_recall_latency_reasonable. Verified 20/20 with pytest-repeat --count=20 under CI-simulation env. Refs audit doc docs/superpowers/research/2026-04-17-test-suite-audit.md and the AI-6 fix pattern in af080ec. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ns docs Root cause: the fuzzy query "XZ Utils CVE" does not score highly enough on fastembed (nomic-embed-text-v1.5-Q) against the 3-note seed corpus. Recall returns 0 results, so `assert len(docs) > 0` fails deterministically. This is an embedding-similarity mismatch, not a race or source-layer bug (other tests in the same file pass with APT28-centric queries that DO match their seed note). Minimal test-only fix: change the query to the exact CVE ID "CVE-2024-3094" so the CVE-regex recall path in MemoryManager reliably returns the seeded security_ops note. The test's intent is to validate metadata-field presence, so the specific query is incidental. Verified 10/10 with pytest-repeat --count=10 under CI-simulation env. Refs audit doc docs/superpowers/research/2026-04-17-test-suite-audit.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 594a258363

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copilot

Pull request overview

Stabilizes three CI-only test regressions uncovered during the recent test-suite audit sprint by adjusting test expectations and inputs to align with CI’s mock/SQLite/fastembed execution paths (no src/ changes).

Changes:

Update causal-extraction test to assert end-to-end ingestion succeeded rather than asserting LLM-derived causal edges under mock/SQLite.
Remove enrichment-queue race in vector recall latency test by using sync=True during seeding.
Make LangChain retriever metadata test deterministic by querying an exact CVE ID that reliably matches the seeded corpus.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File	Description
`tests/test_causal_extraction.py`	Switches assertion from causal-edge count to ingestion success; adjusts `__main__` execution path.
`tests/test_core.py`	Seeds notes with `sync=True` to avoid recall racing background enrichment.
`tests/test_langchain_retriever.py`	Uses an exact CVE query to make retrieval deterministic for metadata assertions.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Agent-Logs-Url: https://github.com/rolandpg/zettelforge/sessions/0933c342-dae0-4598-abf9-b0ed4b6bc462 Co-authored-by: rolandpg <48327651+rolandpg@users.noreply.github.com>

Copilot · 2026-04-17T22:33:58Z

Just as a heads up, I was blocked by some firewall rules while working on your feedback. Expand below for details.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

huggingface.co
- Triggering command: /home/REDACTED/.local/bin/pytest pytest tests/test_causal_extraction.py -v (dns block)
- Triggering command: /home/REDACTED/.local/bin/pytest pytest tests/test_causal_extraction.py::test_causal_extraction -v (dns block)
- Triggering command: /home/REDACTED/.local/bin/pytest pytest tests/test_core.py::TestVectorRecall::test_vector_recall_latency_reasonable -v (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Configure Actions setup steps to set up my environment, which run before the firewall is enabled
Add the appropriate URLs or hosts to the custom allowlist in this repository's Copilot coding agent settings (admins only)

…d mock Addresses PR #67 review feedback (chatgpt-codex-connector P2): the prior `assert note.id` passed even when causal extraction, JSON parsing, or KG edge persistence was completely broken, defeating the test's purpose. Now drives the full pipeline via a prompt-routed mock (same pattern as AI-3/AI-4 in tests/test_two_phase_e2e.py) and queries the SQLite backend directly via `get_causal_edges("intrusion_set", "apt28")` — where `store_causal_edges(..., backend=self.store)` actually writes, not the JSONL KnowledgeGraph singleton that the old assertion targeted. Uses `sync=True` on `mm.remember` so causal extraction runs inline and the backend write completes before the read — no enrichment-queue race. Verified: 5/5 consecutive passes under the CI-sim env, plus the full tri-file suite from PR #67 (tests/test_causal_extraction.py, tests/test_core.py::TestVectorRecall::test_vector_recall_latency_reasonable, tests/test_langchain_retriever.py) = 8 passed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Detection-rules-as-memory, MCP Registry publication, SQLite concurrency hardening, test-suite hygiene, and brand/docs polish. Highlights: - feat: Sigma + YARA as first-class memory entities with LLM rule explainer (#70) + Detection Rules as Memory README section (#74) - feat: MCP Registry publication (server.json + mcp-name tag) (#75) - fix: SQLite reader concurrency — 16 methods now hold _write_lock (closes #68, fixes a production read-during-write race) (#69) - fix: 3 CI test regressions stabilized (#67) - chore: test-suite hygiene — 280→305 passing, 17→10 skipped, 2→0 xfailed; migrated langchain_retriever to Pydantic V2 ConfigDict (#62, #63, #64, #65) - brand: neural-chain architecture diagram + light/dark parity, canonical security channels, refreshed social preview (#61) See CHANGELOG.md for details. Bumps: pyproject.toml, src/zettelforge/__init__.py, mkdocs.yml, server.json, SECURITY.md.

rolandpg and others added 3 commits April 17, 2026 17:13

Copilot AI review requested due to automatic review settings April 17, 2026 22:19

Copilot started reviewing on behalf of rolandpg April 17, 2026 22:20 View session

chatgpt-codex-connector Bot reviewed Apr 17, 2026

View reviewed changes

Comment thread tests/test_causal_extraction.py Outdated

Copilot AI reviewed Apr 17, 2026

View reviewed changes

Comment thread tests/test_causal_extraction.py Outdated

Copilot started work on behalf of rolandpg April 17, 2026 22:29 View session

fix(test): isolate causal extraction KG diagnostics to temp AMEM dir

598fdd2

Agent-Logs-Url: https://github.com/rolandpg/zettelforge/sessions/0933c342-dae0-4598-abf9-b0ed4b6bc462 Co-authored-by: rolandpg <48327651+rolandpg@users.noreply.github.com>

Copilot finished work on behalf of rolandpg April 17, 2026 22:34

rolandpg self-assigned this Apr 17, 2026

rolandpg merged commit ba558a3 into master Apr 17, 2026
11 checks passed

rolandpg deleted the stabilize/master-regressions branch April 17, 2026 22:39

rolandpg mentioned this pull request Apr 17, 2026

AI-5: SQLiteBackend.get_note_by_id reads without _write_lock (flaky test_apply_delete_marks_superseded) #68

Closed

rolandpg mentioned this pull request Apr 19, 2026

chore(release): v2.4.0 #76

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(tests): stabilize 3 CI regressions exposed by test-suite audit sprint#67

fix(tests): stabilize 3 CI regressions exposed by test-suite audit sprint#67
rolandpg merged 5 commits into
masterfrom
stabilize/master-regressions

rolandpg commented Apr 17, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Copilot AI commented Apr 17, 2026

I tried to connect to the following addresses, but was blocked by firewall rules:

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

rolandpg commented Apr 17, 2026

Summary

Regression 1 — tests/test_causal_extraction.py::test_causal_extraction (line 86)

Regression 2 — tests/test_core.py::TestVectorRecall::test_vector_recall_latency_reasonable (line 134)

Regression 3 — tests/test_langchain_retriever.py::TestZettelForgeRetriever::test_metadata_fields (line 66)

Rerun evidence (combined)

Test plan

References

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Copilot AI commented Apr 17, 2026

I tried to connect to the following addresses, but was blocked by firewall rules:

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Regression 1 — `tests/test_causal_extraction.py::test_causal_extraction` (line 86)

Regression 2 — `tests/test_core.py::TestVectorRecall::test_vector_recall_latency_reasonable` (line 134)

Regression 3 — `tests/test_langchain_retriever.py::TestZettelForgeRetriever::test_metadata_fields` (line 66)