docs: graph retriever silence — root cause + remediation paths (#42)#105
Conversation
…#42) Static code trace identifies why Vigil's 2026-04-24 live test showed kg=0ms on 148 of 150 recalls. The graph retriever path is correct; the gap is upstream: memory_manager.recall() calls extractor.extract_all(query, use_llm=False) ← regex only The query-time entity extractor is regex-only (intrusion_set, cve, actor short-list, MITRE TID, IOCs). Natural-language queries like "recent Citrix exploitation" or "who used what tool last week?" match none of those patterns, so query_entities = {}. graph_retriever.retrieve_note_ids() then short-circuits at line 36 (`if not query_entities: return []`) and the BFS never seeds — kg returns in microseconds. The 2 of 150 events that DID fire graph almost certainly contained regex-matchable tokens (test queries like "What malware did APT28 use?" — matches the apt28 intrusion_set pattern). Three fix options ranked in the doc: A. LLM-NER on the query — preferred semantic fix; small change, gated by intent so cost is bounded; depends on RFC-009 Phase 1 (LLM hardening) landing first. B. Seed BFS from top-K vector neighbors' entities — no LLM hop, no new failure modes; produces graph signal even on sparse queries. C. Telemetry tweak — log a 'no graph signal — query entities empty' marker so the silence is diagnosable instead of looking identical to a 0-result graph walk. Recommended: combine B (fix) + C (diagnostic). A is a v2.5.x experiment after Phase 1 LLM hardening. The doc explicitly notes graph-silence is ORTHOGONAL to the 21-22% zero-hit recall rate (task #40) — those are different failure modes and should not be conflated. Closes the investigation phase of task #42; the implementation choice goes into a follow-up RFC update or v2.5.x scope. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d5adaab359
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| - `attack_pattern` — `T\d{4}(?:\.\d{3})?` | ||
| - IOCs — `ipv4`, `domain`, `url`, `md5`, `sha1`, `sha256`, `email` | ||
|
|
||
| A natural-language query like *"recent Citrix exploitation"* matches **none** of these. `query_entities` returns `{}` and `resolved` is `{}`. |
There was a problem hiding this comment.
Fix claim that regex extractor returns an empty dict
EntityExtractor.extract_all(..., use_llm=False) does not return {} for non-matching queries; it returns a dict containing all entity types with empty lists, and memory_manager.recall() propagates that shape into resolved. Because of that, GraphRetriever.retrieve_note_ids usually skips work by iterating empty value lists, not by taking the if not query_entities early-return branch, so the root-cause narrative and follow-on remediation around "query entities empty" are inaccurate and can lead to telemetry that never triggers in the intended scenario.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Pull request overview
Adds a research note documenting why graph retrieval contributes kg=0ms for most recalls in the Vigil live test, and proposes remediation options (LLM-NER on query, seeding traversal from vector neighbors, and adding telemetry markers). This is intended as investigation close-out / design guidance, with no runtime behavior change.
Changes:
- Add a new research document tracing the recall → entity extraction → graph retrieval call chain.
- Document suspected root cause for “graph silence” and rank three remediation options (A/B/C).
- Add verification next-steps and cross-references for follow-on implementation work.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| - "Vigil DEBUG telemetry 2026-04-24 23:44Z+ — 150 recalls, 2 with kg>0" | ||
| - "src/zettelforge/memory_manager.py:580-624 (recall path)" | ||
| - "src/zettelforge/graph_retriever.py:31-93 (retrieve_note_ids + _bfs_collect)" | ||
| - "src/zettelforge/entity_indexer.py:69-130 (EntityExtractor regex set)" |
There was a problem hiding this comment.
The frontmatter data_sources line references look out of date (e.g., memory_manager.py:580-624 but recall() starts earlier, and entity_indexer.py:69-130 doesn’t cover the REGEX_PATTERNS block). Consider updating these line ranges or using stable links/identifiers so the citations don’t drift as code moves.
| The graph retriever fires **only when the query string itself contains a regex-matchable CTI token** (CVE, APTNN, MITRE TID, IP/domain/hash). For natural-language queries that don't (e.g., *"recent Citrix exploitation"*, *"what tools did the actor use?"*, *"timeline of last week's incidents"*), the query-time extractor returns an empty entity dict, the BFS in `_bfs_collect` is never seeded, and `retrieve_note_ids` returns `[]` in microseconds — exactly the `kg=0ms` signal Vigil's telemetry recorded. | ||
|
|
||
| This is **not a bug in the graph retrieval path**. The graph code works correctly when given entities. The gap is upstream: the query-time entity extraction is regex-only, while *recall queries* are mostly written in natural language. The 2 of 150 calls that fired graph contained literal regex-matchable strings — likely the recall test that asked *"What malware did APT28 use?"* (matches the `apt28` regex pattern in `EntityExtractor.REGEX_PATTERNS["intrusion_set"]`). |
There was a problem hiding this comment.
This TL;DR asserts extract_all(query, use_llm=False) returns an empty dict ({}), but EntityExtractor.extract_all() currently returns a dict with many keys and empty lists when there are no matches. In the current recall() path, resolved will also be a non-empty dict of empty lists, so the graph “silence” is due to having no seed values (and graph_latency_ms being truncated to int), not retrieve_note_ids() short-circuiting on an empty dict.
| The graph retriever fires **only when the query string itself contains a regex-matchable CTI token** (CVE, APTNN, MITRE TID, IP/domain/hash). For natural-language queries that don't (e.g., *"recent Citrix exploitation"*, *"what tools did the actor use?"*, *"timeline of last week's incidents"*), the query-time extractor returns an empty entity dict, the BFS in `_bfs_collect` is never seeded, and `retrieve_note_ids` returns `[]` in microseconds — exactly the `kg=0ms` signal Vigil's telemetry recorded. | |
| This is **not a bug in the graph retrieval path**. The graph code works correctly when given entities. The gap is upstream: the query-time entity extraction is regex-only, while *recall queries* are mostly written in natural language. The 2 of 150 calls that fired graph contained literal regex-matchable strings — likely the recall test that asked *"What malware did APT28 use?"* (matches the `apt28` regex pattern in `EntityExtractor.REGEX_PATTERNS["intrusion_set"]`). | |
| The graph retriever fires **only when the query string itself contains a regex-matchable CTI token** (CVE, APTNN, MITRE TID, IP/domain/hash). For natural-language queries that don't (e.g., *"recent Citrix exploitation"*, *"what tools did the actor use?"*, *"timeline of last week's incidents"*), the query-time extractor still returns the normal entity-type dict, but with empty lists for every type. In the current `recall()` path, `resolved` therefore becomes a similarly populated dict of empty lists, `_bfs_collect` receives no seed values, and `retrieve_note_ids` returns `[]` quickly because there is nothing to traverse. | |
| This is **not evidence that `retrieve_note_ids()` is short-circuiting on an empty dict**. The graph code is still being called; the observed “silence” comes from having no extracted seed entities for most natural-language recalls, and the `kg=0ms` telemetry is further explained by graph latency being truncated to `int`. The upstream gap remains the same: query-time entity extraction is regex-only, while *recall queries* are mostly written in natural language. The 2 of 150 calls that fired graph contained literal regex-matchable strings — likely the recall test that asked *"What malware did APT28 use?"* (matches the `apt28` regex pattern in `EntityExtractor.REGEX_PATTERNS["intrusion_set"]`). |
| `graph_retriever.py:31-37`: | ||
|
|
||
| ```python | ||
| def retrieve_note_ids(self, query_entities, max_depth=2) -> List[ScoredResult]: | ||
| if not query_entities: | ||
| return [] | ||
| ... | ||
| ``` | ||
|
|
||
| When `query_entities` is `{}`, `not query_entities` is `True` and we return `[]` immediately. No `kg.get_node` calls, no BFS, no walk-through. `_graph_latency_ms` measured around the call therefore lands at ~0ms because the function returns in microseconds. |
There was a problem hiding this comment.
In “Step 2”, the quoted if not query_entities: return [] block is correct, but it won’t trigger with the current recall() → extract_all() output (dict is non-empty, values are empty). To keep this trace accurate, describe the actual path (loop over empty value lists) and/or note that a more precise guard would be if not any(query_entities.values()): ... (and that telemetry currently logs int(_graph_latency_ms), which can show 0ms even when a small amount of work occurs).
| ### C. **Tighten the blender contract — accept `query_entities=None`** | ||
|
|
||
| A no-op fallback that explicitly logs *"no graph signal — query entities empty"* so future operators see the gap in the telemetry stream rather than a silent 0. Doesn't fix the silence but makes it diagnosable. Free; can ship today as a one-line change to the OCSF event. |
There was a problem hiding this comment.
Option C mentions “Tighten the blender contract — accept query_entities=None”, but BlendedRetriever.blend() doesn’t take query_entities (only vector_results, graph_results, etc.). This section would be clearer if it framed the change as a graph-retriever/telemetry contract (e.g., explicitly detect “no seed entities” and log/emit a marker) rather than a blender API change.
| ### C. **Tighten the blender contract — accept `query_entities=None`** | |
| A no-op fallback that explicitly logs *"no graph signal — query entities empty"* so future operators see the gap in the telemetry stream rather than a silent 0. Doesn't fix the silence but makes it diagnosable. Free; can ship today as a one-line change to the OCSF event. | |
| ### C. **Tighten the graph-retriever/telemetry contract — explicitly mark empty seed entities** | |
| A no-op fallback that explicitly detects when `query_entities` is empty or absent before `retrieve_note_ids` runs, then logs/emits *"no graph signal — no seed entities"* so future operators see the gap in the telemetry stream rather than a silent 0. This is not a `BlendedRetriever.blend()` API change; it's a graph-retrieval / telemetry marker that distinguishes "graph path skipped because there were no seed entities" from ordinary 0-result cases. Doesn't fix the silence but makes it diagnosable. Free; can ship today as a one-line change to the OCSF event. |
| ## Cross-references | ||
|
|
||
| - Task #42 in this session's task list (now closeable as "investigation complete; implementation tracked separately") | ||
| - Task #40 (zero-hit recall) — orthogonal; do not bundle | ||
| - RFC-009 Phase 1 (LLM hardening) — soft prerequisite for fix A |
There was a problem hiding this comment.
This doc repeatedly refers to “task #42” for the graph-retriever investigation, but GitHub Issue #42 in this repo is about ASN extraction. To avoid confusion for future readers (and accidental auto-closing of the wrong issue), consider renaming the reference (e.g., “Vigil task 42” or link the correct issue/PR) or adding a one-line disambiguation here.
Summary
Closes the investigation phase of task #42 (graph retriever firing on only 1.3% of recalls in Vigil's 2026-04-24 live test). Static code trace, no behaviour change.
Root cause
`memory_manager.recall()` calls `extractor.extract_all(query, use_llm=False)` — regex only. Natural-language queries lack regex-matchable CTI tokens, so `query_entities = {}`, and `graph_retriever.retrieve_note_ids` short-circuits on line 36 returning `[]` in microseconds. The graph code path is correct; the gap is the seed.
Three remediation options
Doc recommends B + C for v2.5.0; A is a v2.5.x experiment.
What this DOES NOT explain
The 21-22% zero-hit recall rate (task #40) is orthogonal — different failure mode. Doc explicitly says don't conflate them.
Test plan
🤖 Generated with Claude Code