docs: graph retriever silence — root cause + remediation paths (#42) by rolandpg · Pull Request #105 · rolandpg/zettelforge

rolandpg · 2026-04-25T04:05:46Z

Summary

Closes the investigation phase of task #42 (graph retriever firing on only 1.3% of recalls in Vigil's 2026-04-24 live test). Static code trace, no behaviour change.

Root cause

`memory_manager.recall()` calls `extractor.extract_all(query, use_llm=False)` — regex only. Natural-language queries lack regex-matchable CTI tokens, so `query_entities = {}`, and `graph_retriever.retrieve_note_ids` short-circuits on line 36 returning `[]` in microseconds. The graph code path is correct; the gap is the seed.

Three remediation options

A — Run LLM-NER on the query when intent is relational/temporal/exploratory. Preferred semantic fix but depends on RFC-009 Phase 1 (LLM hardening).
B — Seed BFS from top-K vector neighbors' entities. No LLM hop, no new failure modes. Recommended.
C — Telemetry marker "no graph signal — query entities empty" so the silence becomes diagnosable. Free; one-line change.

Doc recommends B + C for v2.5.0; A is a v2.5.x experiment.

What this DOES NOT explain

The 21-22% zero-hit recall rate (task #40) is orthogonal — different failure mode. Doc explicitly says don't conflate them.

Test plan

Docs-only (no CI risk)
Merge — closes Add ASN (Autonomous System Number) extraction #42 investigation; implementation becomes a v2.5.0 scope item

🤖 Generated with Claude Code

…#42) Static code trace identifies why Vigil's 2026-04-24 live test showed kg=0ms on 148 of 150 recalls. The graph retriever path is correct; the gap is upstream: memory_manager.recall() calls extractor.extract_all(query, use_llm=False) ← regex only The query-time entity extractor is regex-only (intrusion_set, cve, actor short-list, MITRE TID, IOCs). Natural-language queries like "recent Citrix exploitation" or "who used what tool last week?" match none of those patterns, so query_entities = {}. graph_retriever.retrieve_note_ids() then short-circuits at line 36 (`if not query_entities: return []`) and the BFS never seeds — kg returns in microseconds. The 2 of 150 events that DID fire graph almost certainly contained regex-matchable tokens (test queries like "What malware did APT28 use?" — matches the apt28 intrusion_set pattern). Three fix options ranked in the doc: A. LLM-NER on the query — preferred semantic fix; small change, gated by intent so cost is bounded; depends on RFC-009 Phase 1 (LLM hardening) landing first. B. Seed BFS from top-K vector neighbors' entities — no LLM hop, no new failure modes; produces graph signal even on sparse queries. C. Telemetry tweak — log a 'no graph signal — query entities empty' marker so the silence is diagnosable instead of looking identical to a 0-result graph walk. Recommended: combine B (fix) + C (diagnostic). A is a v2.5.x experiment after Phase 1 LLM hardening. The doc explicitly notes graph-silence is ORTHOGONAL to the 21-22% zero-hit recall rate (task #40) — those are different failure modes and should not be conflated. Closes the investigation phase of task #42; the implementation choice goes into a follow-up RFC update or v2.5.x scope. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d5adaab359

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-25T04:07:52Z

+- `attack_pattern` — `T\d{4}(?:\.\d{3})?`
+- IOCs — `ipv4`, `domain`, `url`, `md5`, `sha1`, `sha256`, `email`
+
+A natural-language query like *"recent Citrix exploitation"* matches **none** of these. `query_entities` returns `{}` and `resolved` is `{}`.


Fix claim that regex extractor returns an empty dict

EntityExtractor.extract_all(..., use_llm=False) does not return {} for non-matching queries; it returns a dict containing all entity types with empty lists, and memory_manager.recall() propagates that shape into resolved. Because of that, GraphRetriever.retrieve_note_ids usually skips work by iterating empty value lists, not by taking the if not query_entities early-return branch, so the root-cause narrative and follow-on remediation around "query entities empty" are inaccurate and can lead to telemetry that never triggers in the intended scenario.

Useful? React with 👍 / 👎.

Copilot

Pull request overview

Adds a research note documenting why graph retrieval contributes kg=0ms for most recalls in the Vigil live test, and proposes remediation options (LLM-NER on query, seeding traversal from vector neighbors, and adding telemetry markers). This is intended as investigation close-out / design guidance, with no runtime behavior change.

Changes:

Add a new research document tracing the recall → entity extraction → graph retrieval call chain.
Document suspected root cause for “graph silence” and rank three remediation options (A/B/C).
Add verification next-steps and cross-references for follow-on implementation work.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-25T04:08:41Z

+  - "Vigil DEBUG telemetry 2026-04-24 23:44Z+ — 150 recalls, 2 with kg>0"
+  - "src/zettelforge/memory_manager.py:580-624 (recall path)"
+  - "src/zettelforge/graph_retriever.py:31-93 (retrieve_note_ids + _bfs_collect)"
+  - "src/zettelforge/entity_indexer.py:69-130 (EntityExtractor regex set)"


The frontmatter data_sources line references look out of date (e.g., memory_manager.py:580-624 but recall() starts earlier, and entity_indexer.py:69-130 doesn’t cover the REGEX_PATTERNS block). Consider updating these line ranges or using stable links/identifiers so the citations don’t drift as code moves.

Copilot · 2026-04-25T04:08:41Z

+The graph retriever fires **only when the query string itself contains a regex-matchable CTI token** (CVE, APTNN, MITRE TID, IP/domain/hash). For natural-language queries that don't (e.g., *"recent Citrix exploitation"*, *"what tools did the actor use?"*, *"timeline of last week's incidents"*), the query-time extractor returns an empty entity dict, the BFS in `_bfs_collect` is never seeded, and `retrieve_note_ids` returns `[]` in microseconds — exactly the `kg=0ms` signal Vigil's telemetry recorded.
+
+This is **not a bug in the graph retrieval path**. The graph code works correctly when given entities. The gap is upstream: the query-time entity extraction is regex-only, while *recall queries* are mostly written in natural language. The 2 of 150 calls that fired graph contained literal regex-matchable strings — likely the recall test that asked *"What malware did APT28 use?"* (matches the `apt28` regex pattern in `EntityExtractor.REGEX_PATTERNS["intrusion_set"]`).


This TL;DR asserts extract_all(query, use_llm=False) returns an empty dict ({}), but EntityExtractor.extract_all() currently returns a dict with many keys and empty lists when there are no matches. In the current recall() path, resolved will also be a non-empty dict of empty lists, so the graph “silence” is due to having no seed values (and graph_latency_ms being truncated to int), not retrieve_note_ids() short-circuiting on an empty dict.

Suggested change

The graph retriever fires **only when the query string itself contains a regex-matchable CTI token** (CVE, APTNN, MITRE TID, IP/domain/hash). For natural-language queries that don't (e.g., *"recent Citrix exploitation"*, *"what tools did the actor use?"*, *"timeline of last week's incidents"*), the query-time extractor returns an empty entity dict, the BFS in `_bfs_collect` is never seeded, and `retrieve_note_ids` returns `[]` in microseconds — exactly the `kg=0ms` signal Vigil's telemetry recorded.

This is **not a bug in the graph retrieval path**. The graph code works correctly when given entities. The gap is upstream: the query-time entity extraction is regex-only, while *recall queries* are mostly written in natural language. The 2 of 150 calls that fired graph contained literal regex-matchable strings — likely the recall test that asked *"What malware did APT28 use?"* (matches the `apt28` regex pattern in `EntityExtractor.REGEX_PATTERNS["intrusion_set"]`).

The graph retriever fires **only when the query string itself contains a regex-matchable CTI token** (CVE, APTNN, MITRE TID, IP/domain/hash). For natural-language queries that don't (e.g., *"recent Citrix exploitation"*, *"what tools did the actor use?"*, *"timeline of last week's incidents"*), the query-time extractor still returns the normal entity-type dict, but with empty lists for every type. In the current `recall()` path, `resolved` therefore becomes a similarly populated dict of empty lists, `_bfs_collect` receives no seed values, and `retrieve_note_ids` returns `[]` quickly because there is nothing to traverse.

This is **not evidence that `retrieve_note_ids()` is short-circuiting on an empty dict**. The graph code is still being called; the observed “silence” comes from having no extracted seed entities for most natural-language recalls, and the `kg=0ms` telemetry is further explained by graph latency being truncated to `int`. The upstream gap remains the same: query-time entity extraction is regex-only, while *recall queries* are mostly written in natural language. The 2 of 150 calls that fired graph contained literal regex-matchable strings — likely the recall test that asked *"What malware did APT28 use?"* (matches the `apt28` regex pattern in `EntityExtractor.REGEX_PATTERNS["intrusion_set"]`).

Copilot · 2026-04-25T04:08:41Z

+`graph_retriever.py:31-37`:
+
+```python
+def retrieve_note_ids(self, query_entities, max_depth=2) -> List[ScoredResult]:
+    if not query_entities:
+        return []
+    ...
+```
+
+When `query_entities` is `{}`, `not query_entities` is `True` and we return `[]` immediately. No `kg.get_node` calls, no BFS, no walk-through. `_graph_latency_ms` measured around the call therefore lands at ~0ms because the function returns in microseconds.


In “Step 2”, the quoted if not query_entities: return [] block is correct, but it won’t trigger with the current recall() → extract_all() output (dict is non-empty, values are empty). To keep this trace accurate, describe the actual path (loop over empty value lists) and/or note that a more precise guard would be if not any(query_entities.values()): ... (and that telemetry currently logs int(_graph_latency_ms), which can show 0ms even when a small amount of work occurs).

Copilot · 2026-04-25T04:08:41Z

+### C. **Tighten the blender contract — accept `query_entities=None`**
+
+A no-op fallback that explicitly logs *"no graph signal — query entities empty"* so future operators see the gap in the telemetry stream rather than a silent 0. Doesn't fix the silence but makes it diagnosable. Free; can ship today as a one-line change to the OCSF event.


Option C mentions “Tighten the blender contract — accept query_entities=None”, but BlendedRetriever.blend() doesn’t take query_entities (only vector_results, graph_results, etc.). This section would be clearer if it framed the change as a graph-retriever/telemetry contract (e.g., explicitly detect “no seed entities” and log/emit a marker) rather than a blender API change.

Suggested change

### C. **Tighten the blender contract — accept `query_entities=None`**

A no-op fallback that explicitly logs *"no graph signal — query entities empty"* so future operators see the gap in the telemetry stream rather than a silent 0. Doesn't fix the silence but makes it diagnosable. Free; can ship today as a one-line change to the OCSF event.

### C. **Tighten the graph-retriever/telemetry contract — explicitly mark empty seed entities**

A no-op fallback that explicitly detects when `query_entities` is empty or absent before `retrieve_note_ids` runs, then logs/emits *"no graph signal — no seed entities"* so future operators see the gap in the telemetry stream rather than a silent 0. This is not a `BlendedRetriever.blend()` API change; it's a graph-retrieval / telemetry marker that distinguishes "graph path skipped because there were no seed entities" from ordinary 0-result cases. Doesn't fix the silence but makes it diagnosable. Free; can ship today as a one-line change to the OCSF event.

Copilot · 2026-04-25T04:08:42Z

+## Cross-references
+
+- Task #42 in this session's task list (now closeable as "investigation complete; implementation tracked separately")
+- Task #40 (zero-hit recall) — orthogonal; do not bundle
+- RFC-009 Phase 1 (LLM hardening) — soft prerequisite for fix A


This doc repeatedly refers to “task #42” for the graph-retriever investigation, but GitHub Issue #42 in this repo is about ASN extraction. To avoid confusion for future readers (and accidental auto-closing of the wrong issue), consider renaming the reference (e.g., “Vigil task 42” or link the correct issue/PR) or adding a one-line disambiguation here.

Copilot AI review requested due to automatic review settings April 25, 2026 04:05

Copilot started reviewing on behalf of rolandpg April 25, 2026 04:06 View session

chatgpt-codex-connector Bot reviewed Apr 25, 2026

View reviewed changes

Copilot AI reviewed Apr 25, 2026

View reviewed changes

rolandpg merged commit 942b746 into master Apr 25, 2026
15 checks passed

rolandpg deleted the docs/graph-retriever-investigation branch April 25, 2026 04:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs: graph retriever silence — root cause + remediation paths (#42)#105

docs: graph retriever silence — root cause + remediation paths (#42)#105
rolandpg merged 1 commit into
masterfrom
docs/graph-retriever-investigation

rolandpg commented Apr 25, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Apr 25, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 25, 2026

Uh oh!

Copilot AI Apr 25, 2026

Uh oh!

Copilot AI Apr 25, 2026

Uh oh!

Copilot AI Apr 25, 2026

Uh oh!

Copilot AI Apr 25, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		The graph retriever fires only when the query string itself contains a regex-matchable CTI token (CVE, APTNN, MITRE TID, IP/domain/hash). For natural-language queries that don't (e.g., "recent Citrix exploitation", "what tools did the actor use?", "timeline of last week's incidents"), the query-time extractor returns an empty entity dict, the BFS in `_bfs_collect` is never seeded, and `retrieve_note_ids` returns `[]` in microseconds — exactly the `kg=0ms` signal Vigil's telemetry recorded.

		This is not a bug in the graph retrieval path. The graph code works correctly when given entities. The gap is upstream: the query-time entity extraction is regex-only, while recall queries are mostly written in natural language. The 2 of 150 calls that fired graph contained literal regex-matchable strings — likely the recall test that asked "What malware did APT28 use?" (matches the `apt28` regex pattern in `EntityExtractor.REGEX_PATTERNS["intrusion_set"]`).

		### C. Tighten the blender contract — accept `query_entities=None`

		A no-op fallback that explicitly logs "no graph signal — query entities empty" so future operators see the gap in the telemetry stream rather than a silent 0. Doesn't fix the silence but makes it diagnosable. Free; can ship today as a one-line change to the OCSF event.

Conversation

rolandpg commented Apr 25, 2026

Summary

Root cause

Three remediation options

What this DOES NOT explain

Test plan

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants