feat(observability): Tier 0/1/2 telemetry patches#95
Conversation
Surfaces blind spots discovered during 2026-04-24 Nemotron-3-nano debug
session. ZettelForge had heavy logging at the LLM call layer but parse
failures and empty completions were largely opaque - we could see WHICH
calls failed but never WHAT the model returned.
Tier 1: ollama_provider.py
- Every Ollama call logs duration_ms, prompt_chars, response_chars,
eval_count, prompt_eval_count, done_reason, json_mode, model
- Empty completions promoted to WARNING with full prompt+response
preview (was previously silent at this layer, then silently swallowed
downstream)
- Exceptions logged with classified error type before re-raise
Tier 2a: memory_evolver.py
- evolution_parse_retry and evolution_parse_failed now carry
raw_preview, raw_chars, prompt_preview, new_note_id - was logging
only neighbor_id, leaving the dominant pipeline failure mode
unanalyzable
Tier 2b: fact_extractor.py
- Empty completions now logged (was a silent return, hiding LLM failures
from the audit trail entirely)
- parse_failed gets reason classification: empty_completion vs
json_decode
Tier 2c: entity_indexer.py
- ner_output parse_failed gets reason classification + raw_chars
Tier 2d: memory_manager.py
- request_id (already generated for OCSF audit) now bound to
structlog.contextvars at remember() entry, propagating to every
downstream log line as trace_id automatically. Cleared in finally
block so it doesn't leak across calls. Background workers (evolution,
NER, causal) do NOT yet rebind from job - tracked in RFC-008.
log.py:
- Suppress httpcore/httpx DEBUG noise. Single 17-min test run produced
1,612 connect_tcp/send_request_headers debug lines for zero
diagnostic value.
Verification: Synthetic empty-response smoke test produces full payload
including propagated trace_id, domain, provider, duration_ms,
prompt_eval_count, done_reason, prompt_preview, response_preview.
RFC-008 (in parent repo) covers Tier 3: SQLite/LanceDB/embedding
instrumentation, enrichment-queue heartbeat, background-worker
trace_id rebinding, lifecycle events.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Adds Tier 0/1/2 observability across the synchronous remember() pipeline so LLM calls and downstream parsing failures can be correlated and diagnosed via structured telemetry (including a propagated trace_id).
Changes:
- Bind
trace_id(plusdomain/source_type) intostructlog.contextvarsatMemoryManager.remember()entry and clear on exit. - Add richer LLM-call and parse-failure diagnostics (previews, char counts, reason classification) across evolver, fact extraction, and NER parsing.
- Reduce log noise by raising
httpcore/httpxlogger levels and attempt to bootstrap log level from config.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/zettelforge/memory_manager.py | Binds/unbinds trace_id contextvars around remember() via a new _remember_inner() wrapper. |
| src/zettelforge/memory_evolver.py | Adds retry/failure telemetry payloads (raw/prompt previews, char counts, note IDs). |
| src/zettelforge/log.py | Suppresses noisy HTTP transport debug logs; attempts to set logging level from config during logger bootstrap. |
| src/zettelforge/llm_providers/ollama_provider.py | Logs per-call latency/metadata and escalates empty completions/exceptions with diagnostic fields. |
| src/zettelforge/fact_extractor.py | Classifies parse_failed reasons (empty vs JSON decode) and adds raw size context. |
| src/zettelforge/entity_indexer.py | Improves NER parse_failed telemetry with reason classification and raw size context. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ca8c62f5f1
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| structlog.contextvars.unbind_contextvars( | ||
| "trace_id", "domain", "source_type" |
There was a problem hiding this comment.
Restore parent trace context after nested remember calls
This unconditional unbind_contextvars(...) drops any previously-bound context when remember() is called reentrantly, which happens in the evolve flow (remember_with_extraction() via MemoryUpdater.apply() calls self.mm.remember(...)). In evolve=True requests that produce ADD/UPDATE/DELETE operations, the inner call removes the outer trace_id/domain/source_type, so later log lines from the parent pipeline can no longer be correlated to the original request. Use context restoration (bound_contextvars or token reset) instead of unbinding fixed keys.
Useful? React with 👍 / 👎.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: Patrick Roland <48327651+rolandpg@users.noreply.github.com>
Agent-Logs-Url: https://github.com/rolandpg/zettelforge/sessions/36e27378-e4ad-4f8a-a987-ad95563e19a0 Co-authored-by: rolandpg <48327651+rolandpg@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Adds structured, end-to-end observability for synchronous remember() execution and LLM interactions, enabling correlation via trace_id and providing richer diagnostics for parse/empty-completion failures discovered during large ingestions.
Changes:
- Bind a per-
remember()trace_idintostructlog.contextvarsand unbind on exit to correlate downstream synchronous logs. - Add richer parse-failure telemetry (raw/prompt previews, char counts, reason classification) in evolution, fact extraction, and NER parsing.
- Add Ollama call telemetry (timings, token/eval metadata, empty-completion surfacing) and suppress noisy httpcore/httpx debug logs; attempt to set logging level from config.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| src/zettelforge/memory_manager.py | Wraps remember() with contextvar binding for trace_id propagation into downstream logs. |
| src/zettelforge/memory_evolver.py | Adds diagnostic context to evolution parse retry/failure logs. |
| src/zettelforge/log.py | Suppresses HTTP transport debug noise; attempts to derive logging level from config during lazy logger init. |
| src/zettelforge/llm_providers/ollama_provider.py | Adds per-call telemetry, duration metrics, and empty-response surfacing for Ollama calls. |
| src/zettelforge/fact_extractor.py | Logs and classifies empty completions vs JSON decode failures for fact extraction parsing. |
| src/zettelforge/entity_indexer.py | Classifies NER parse failures and includes raw size/preview diagnostics. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Load config to get logging level (RFC-007 telemetry support) | ||
| try: | ||
| from zettelforge.config import get_config | ||
| cfg = get_config() | ||
| log_level = cfg.logging.level if hasattr(cfg, 'logging') else "INFO" | ||
| except Exception: | ||
| log_level = "INFO" | ||
|
|
||
| configure_logging(level=log_level, log_file=log_file, audit_log_file=audit_log_file) |
| # Load config to get logging level (RFC-007 telemetry support) | ||
| try: | ||
| from zettelforge.config import get_config | ||
| cfg = get_config() | ||
| log_level = cfg.logging.level if hasattr(cfg, 'logging') else "INFO" | ||
| except Exception: | ||
| log_level = "INFO" | ||
|
|
||
| configure_logging(level=log_level, log_file=log_file, audit_log_file=audit_log_file) |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: Patrick Roland <48327651+rolandpg@users.noreply.github.com>
CI lint job 72977891617 failed on ruff format --check. Diff is pure cosmetic — quote style normalization, single-line wrap of unbind args, blank line after import. No behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Surfaces blind spots discovered during the 2026-04-24 Nemotron-3-nano debug session — 7,892 notes ingested, 651
evolution_parse_failedevents, 306 empty completions, and we could not see what the model was actually returning.After this PR, every LLM call logs full I/O telemetry, parse failures carry diagnostic context, and a propagating
trace_idlets you reconstruct a singleremember()call across the synchronous pipeline.Changes
llm_providers/ollama_provider.pymemory_evolver.pyevolution_parse_retry/_failedcarryraw_preview,raw_chars,prompt_preview,new_note_id.fact_extractor.pyempty_completionvsjson_decode.entity_indexer.pyparse_failedgets reason classification +raw_chars.memory_manager.pytrace_idbound tostructlog.contextvarsatremember()entry, auto-propagates to all downstream synchronous log lines. Background workers do not yet rebind — tracked in RFC-008.log.pyVerification
OllamaProvider,MemoryEvolver,FactExtractor,EntityIndexer,MemoryManager)llm_call_empty_responsewith full payload includingtrace_id,domain,provider,duration_ms,prompt_eval_count,done_reason,prompt_preview,response_preview✅Out of Scope
trace_idrebinding (sync path only)All deferred to RFC-008 (in parent repo:
rfc/rfc-008-zettelforge-observability-tier3.md).Test Plan
zettelforge.logtrace_idand confirm it links theremember()entry to its child events (fact extraction, entity indexing, evolution)connect_tcp.startedlines)llm_call_empty_responseevents include actionableprompt_previewfor the failing schemas (causal_triples,ner_output, evolution)🤖 Generated with Claude Code