Skip to content

docs: graph retriever silence — root cause + remediation paths (#42)#105

Merged
rolandpg merged 1 commit into
masterfrom
docs/graph-retriever-investigation
Apr 25, 2026
Merged

docs: graph retriever silence — root cause + remediation paths (#42)#105
rolandpg merged 1 commit into
masterfrom
docs/graph-retriever-investigation

Conversation

@rolandpg
Copy link
Copy Markdown
Owner

Summary

Closes the investigation phase of task #42 (graph retriever firing on only 1.3% of recalls in Vigil's 2026-04-24 live test). Static code trace, no behaviour change.

Root cause

`memory_manager.recall()` calls `extractor.extract_all(query, use_llm=False)` — regex only. Natural-language queries lack regex-matchable CTI tokens, so `query_entities = {}`, and `graph_retriever.retrieve_note_ids` short-circuits on line 36 returning `[]` in microseconds. The graph code path is correct; the gap is the seed.

Three remediation options

  • A — Run LLM-NER on the query when intent is relational/temporal/exploratory. Preferred semantic fix but depends on RFC-009 Phase 1 (LLM hardening).
  • B — Seed BFS from top-K vector neighbors' entities. No LLM hop, no new failure modes. Recommended.
  • C — Telemetry marker "no graph signal — query entities empty" so the silence becomes diagnosable. Free; one-line change.

Doc recommends B + C for v2.5.0; A is a v2.5.x experiment.

What this DOES NOT explain

The 21-22% zero-hit recall rate (task #40) is orthogonal — different failure mode. Doc explicitly says don't conflate them.

Test plan

🤖 Generated with Claude Code

…#42)

Static code trace identifies why Vigil's 2026-04-24 live test showed
kg=0ms on 148 of 150 recalls. The graph retriever path is correct;
the gap is upstream:

  memory_manager.recall() calls
    extractor.extract_all(query, use_llm=False)  ← regex only

The query-time entity extractor is regex-only (intrusion_set,
cve, actor short-list, MITRE TID, IOCs). Natural-language queries
like "recent Citrix exploitation" or "who used what tool last
week?" match none of those patterns, so query_entities = {}.
graph_retriever.retrieve_note_ids() then short-circuits at line 36
(`if not query_entities: return []`) and the BFS never seeds — kg
returns in microseconds.

The 2 of 150 events that DID fire graph almost certainly contained
regex-matchable tokens (test queries like "What malware did APT28
use?" — matches the apt28 intrusion_set pattern).

Three fix options ranked in the doc:

  A. LLM-NER on the query — preferred semantic fix; small change,
     gated by intent so cost is bounded; depends on RFC-009 Phase 1
     (LLM hardening) landing first.
  B. Seed BFS from top-K vector neighbors' entities — no LLM hop,
     no new failure modes; produces graph signal even on sparse
     queries.
  C. Telemetry tweak — log a 'no graph signal — query entities
     empty' marker so the silence is diagnosable instead of looking
     identical to a 0-result graph walk.

Recommended: combine B (fix) + C (diagnostic). A is a v2.5.x
experiment after Phase 1 LLM hardening.

The doc explicitly notes graph-silence is ORTHOGONAL to the 21-22%
zero-hit recall rate (task #40) — those are different failure modes
and should not be conflated.

Closes the investigation phase of task #42; the implementation
choice goes into a follow-up RFC update or v2.5.x scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 25, 2026 04:05
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d5adaab359

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

- `attack_pattern` — `T\d{4}(?:\.\d{3})?`
- IOCs — `ipv4`, `domain`, `url`, `md5`, `sha1`, `sha256`, `email`

A natural-language query like *"recent Citrix exploitation"* matches **none** of these. `query_entities` returns `{}` and `resolved` is `{}`.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Fix claim that regex extractor returns an empty dict

EntityExtractor.extract_all(..., use_llm=False) does not return {} for non-matching queries; it returns a dict containing all entity types with empty lists, and memory_manager.recall() propagates that shape into resolved. Because of that, GraphRetriever.retrieve_note_ids usually skips work by iterating empty value lists, not by taking the if not query_entities early-return branch, so the root-cause narrative and follow-on remediation around "query entities empty" are inaccurate and can lead to telemetry that never triggers in the intended scenario.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a research note documenting why graph retrieval contributes kg=0ms for most recalls in the Vigil live test, and proposes remediation options (LLM-NER on query, seeding traversal from vector neighbors, and adding telemetry markers). This is intended as investigation close-out / design guidance, with no runtime behavior change.

Changes:

  • Add a new research document tracing the recall → entity extraction → graph retrieval call chain.
  • Document suspected root cause for “graph silence” and rank three remediation options (A/B/C).
  • Add verification next-steps and cross-references for follow-on implementation work.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +8 to +11
- "Vigil DEBUG telemetry 2026-04-24 23:44Z+ — 150 recalls, 2 with kg>0"
- "src/zettelforge/memory_manager.py:580-624 (recall path)"
- "src/zettelforge/graph_retriever.py:31-93 (retrieve_note_ids + _bfs_collect)"
- "src/zettelforge/entity_indexer.py:69-130 (EntityExtractor regex set)"
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The frontmatter data_sources line references look out of date (e.g., memory_manager.py:580-624 but recall() starts earlier, and entity_indexer.py:69-130 doesn’t cover the REGEX_PATTERNS block). Consider updating these line ranges or using stable links/identifiers so the citations don’t drift as code moves.

Copilot uses AI. Check for mistakes.
Comment on lines +18 to +20
The graph retriever fires **only when the query string itself contains a regex-matchable CTI token** (CVE, APTNN, MITRE TID, IP/domain/hash). For natural-language queries that don't (e.g., *"recent Citrix exploitation"*, *"what tools did the actor use?"*, *"timeline of last week's incidents"*), the query-time extractor returns an empty entity dict, the BFS in `_bfs_collect` is never seeded, and `retrieve_note_ids` returns `[]` in microseconds — exactly the `kg=0ms` signal Vigil's telemetry recorded.

This is **not a bug in the graph retrieval path**. The graph code works correctly when given entities. The gap is upstream: the query-time entity extraction is regex-only, while *recall queries* are mostly written in natural language. The 2 of 150 calls that fired graph contained literal regex-matchable strings — likely the recall test that asked *"What malware did APT28 use?"* (matches the `apt28` regex pattern in `EntityExtractor.REGEX_PATTERNS["intrusion_set"]`).
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This TL;DR asserts extract_all(query, use_llm=False) returns an empty dict ({}), but EntityExtractor.extract_all() currently returns a dict with many keys and empty lists when there are no matches. In the current recall() path, resolved will also be a non-empty dict of empty lists, so the graph “silence” is due to having no seed values (and graph_latency_ms being truncated to int), not retrieve_note_ids() short-circuiting on an empty dict.

Suggested change
The graph retriever fires **only when the query string itself contains a regex-matchable CTI token** (CVE, APTNN, MITRE TID, IP/domain/hash). For natural-language queries that don't (e.g., *"recent Citrix exploitation"*, *"what tools did the actor use?"*, *"timeline of last week's incidents"*), the query-time extractor returns an empty entity dict, the BFS in `_bfs_collect` is never seeded, and `retrieve_note_ids` returns `[]` in microseconds — exactly the `kg=0ms` signal Vigil's telemetry recorded.
This is **not a bug in the graph retrieval path**. The graph code works correctly when given entities. The gap is upstream: the query-time entity extraction is regex-only, while *recall queries* are mostly written in natural language. The 2 of 150 calls that fired graph contained literal regex-matchable strings — likely the recall test that asked *"What malware did APT28 use?"* (matches the `apt28` regex pattern in `EntityExtractor.REGEX_PATTERNS["intrusion_set"]`).
The graph retriever fires **only when the query string itself contains a regex-matchable CTI token** (CVE, APTNN, MITRE TID, IP/domain/hash). For natural-language queries that don't (e.g., *"recent Citrix exploitation"*, *"what tools did the actor use?"*, *"timeline of last week's incidents"*), the query-time extractor still returns the normal entity-type dict, but with empty lists for every type. In the current `recall()` path, `resolved` therefore becomes a similarly populated dict of empty lists, `_bfs_collect` receives no seed values, and `retrieve_note_ids` returns `[]` quickly because there is nothing to traverse.
This is **not evidence that `retrieve_note_ids()` is short-circuiting on an empty dict**. The graph code is still being called; the observed “silence” comes from having no extracted seed entities for most natural-language recalls, and the `kg=0ms` telemetry is further explained by graph latency being truncated to `int`. The upstream gap remains the same: query-time entity extraction is regex-only, while *recall queries* are mostly written in natural language. The 2 of 150 calls that fired graph contained literal regex-matchable strings — likely the recall test that asked *"What malware did APT28 use?"* (matches the `apt28` regex pattern in `EntityExtractor.REGEX_PATTERNS["intrusion_set"]`).

Copilot uses AI. Check for mistakes.
Comment on lines +48 to +57
`graph_retriever.py:31-37`:

```python
def retrieve_note_ids(self, query_entities, max_depth=2) -> List[ScoredResult]:
if not query_entities:
return []
...
```

When `query_entities` is `{}`, `not query_entities` is `True` and we return `[]` immediately. No `kg.get_node` calls, no BFS, no walk-through. `_graph_latency_ms` measured around the call therefore lands at ~0ms because the function returns in microseconds.
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In “Step 2”, the quoted if not query_entities: return [] block is correct, but it won’t trigger with the current recall()extract_all() output (dict is non-empty, values are empty). To keep this trace accurate, describe the actual path (loop over empty value lists) and/or note that a more precise guard would be if not any(query_entities.values()): ... (and that telemetry currently logs int(_graph_latency_ms), which can show 0ms even when a small amount of work occurs).

Copilot uses AI. Check for mistakes.
Comment on lines +94 to +96
### C. **Tighten the blender contract — accept `query_entities=None`**

A no-op fallback that explicitly logs *"no graph signal — query entities empty"* so future operators see the gap in the telemetry stream rather than a silent 0. Doesn't fix the silence but makes it diagnosable. Free; can ship today as a one-line change to the OCSF event.
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Option C mentions “Tighten the blender contract — accept query_entities=None”, but BlendedRetriever.blend() doesn’t take query_entities (only vector_results, graph_results, etc.). This section would be clearer if it framed the change as a graph-retriever/telemetry contract (e.g., explicitly detect “no seed entities” and log/emit a marker) rather than a blender API change.

Suggested change
### C. **Tighten the blender contract — accept `query_entities=None`**
A no-op fallback that explicitly logs *"no graph signal — query entities empty"* so future operators see the gap in the telemetry stream rather than a silent 0. Doesn't fix the silence but makes it diagnosable. Free; can ship today as a one-line change to the OCSF event.
### C. **Tighten the graph-retriever/telemetry contract — explicitly mark empty seed entities**
A no-op fallback that explicitly detects when `query_entities` is empty or absent before `retrieve_note_ids` runs, then logs/emits *"no graph signal — no seed entities"* so future operators see the gap in the telemetry stream rather than a silent 0. This is not a `BlendedRetriever.blend()` API change; it's a graph-retrieval / telemetry marker that distinguishes "graph path skipped because there were no seed entities" from ordinary 0-result cases. Doesn't fix the silence but makes it diagnosable. Free; can ship today as a one-line change to the OCSF event.

Copilot uses AI. Check for mistakes.
Comment on lines +112 to +116
## Cross-references

- Task #42 in this session's task list (now closeable as "investigation complete; implementation tracked separately")
- Task #40 (zero-hit recall) — orthogonal; do not bundle
- RFC-009 Phase 1 (LLM hardening) — soft prerequisite for fix A
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doc repeatedly refers to “task #42” for the graph-retriever investigation, but GitHub Issue #42 in this repo is about ASN extraction. To avoid confusion for future readers (and accidental auto-closing of the wrong issue), consider renaming the reference (e.g., “Vigil task 42” or link the correct issue/PR) or adding a one-line disambiguation here.

Copilot uses AI. Check for mistakes.
@rolandpg rolandpg merged commit 942b746 into master Apr 25, 2026
15 checks passed
@rolandpg rolandpg deleted the docs/graph-retriever-investigation branch April 25, 2026 04:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add ASN (Autonomous System Number) extraction

2 participants