feat(observability): Tier 0/1/2 telemetry patches by rolandpg · Pull Request #95 · rolandpg/zettelforge

rolandpg · 2026-04-25T00:45:11Z

Summary

Surfaces blind spots discovered during the 2026-04-24 Nemotron-3-nano debug session — 7,892 notes ingested, 651 evolution_parse_failed events, 306 empty completions, and we could not see what the model was actually returning.

After this PR, every LLM call logs full I/O telemetry, parse failures carry diagnostic context, and a propagating trace_id lets you reconstruct a single remember() call across the synchronous pipeline.

Changes

File	Change
`llm_providers/ollama_provider.py`	Every call logs duration, prompt/response chars, eval_count, prompt_eval_count, done_reason. Empty completions WARN with full preview. Exceptions classified before re-raise.
`memory_evolver.py`	`evolution_parse_retry`/`_failed` carry `raw_preview`, `raw_chars`, `prompt_preview`, `new_note_id`.
`fact_extractor.py`	Empty completions logged (was silent). Failures classified: `empty_completion` vs `json_decode`.
`entity_indexer.py`	NER `parse_failed` gets reason classification + `raw_chars`.
`memory_manager.py`	`trace_id` bound to `structlog.contextvars` at `remember()` entry, auto-propagates to all downstream synchronous log lines. Background workers do not yet rebind — tracked in RFC-008.
`log.py`	Suppress httpcore/httpx DEBUG (1,612 lines of noise per 17-min test run).
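
The `empty_completion` vs `json_decode` classification listed for `fact_extractor.py` and `entity_indexer.py` can be illustrated with a minimal sketch. The helper name `classify_parse_failure`, the `PREVIEW_CHARS` cap, and the `error` field are assumptions for illustration; only the reason labels and the `raw_chars`/`raw_preview` field names come from this PR.

```python
import json
from typing import Any, Dict, Optional

PREVIEW_CHARS = 200  # hypothetical preview cap; the PR does not state its limit


def classify_parse_failure(raw: Optional[str]) -> Optional[Dict[str, Any]]:
    """Return diagnostic fields if `raw` is not parseable JSON, else None."""
    text = raw or ""
    if not text.strip():
        # Nothing usable came back, so json.loads is not even attempted.
        return {"reason": "empty_completion", "raw_chars": len(text), "raw_preview": ""}
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return {
            "reason": "json_decode",
            "raw_chars": len(text),
            "raw_preview": text[:PREVIEW_CHARS],
            "error": exc.msg,  # hypothetical extra field
        }
    return None  # parsed cleanly, nothing to report
```

Separating the two reasons matters because they point at different causes: an empty completion suggests the model or provider produced nothing, while a decode error means output exists and the preview shows what it looked like.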

Verification

AST parse: clean on all 6 files
Import smoke test: clean (OllamaProvider, MemoryEvolver, FactExtractor, EntityIndexer, MemoryManager)
Functional smoke test: synthetic empty-response call emits llm_call_empty_response with full payload including trace_id, domain, provider, duration_ms, prompt_eval_count, done_reason, prompt_preview, response_preview ✅
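
A rough sketch of the per-call telemetry that the smoke test exercises, using only the standard library. It is not the provider's actual code: `call_with_telemetry`, the `generate` callable, and the event names `llm_call_complete` and `llm_call_failed` are assumptions; `llm_call_empty_response`, `duration_ms`, `prompt_chars`, `response_chars`, and `prompt_preview` are taken from this PR.

```python
import logging
import time
from typing import Callable

logger = logging.getLogger("llm_telemetry_sketch")


def call_with_telemetry(generate: Callable[[str], str], prompt: str, preview: int = 200) -> str:
    """Call `generate(prompt)` and log size and latency fields around it."""
    start = time.perf_counter()
    try:
        response = generate(prompt)
    except Exception as exc:
        # Record the exception type before re-raising so failures are searchable.
        logger.error("llm_call_failed error_type=%s", type(exc).__name__)
        raise
    fields = {
        "duration_ms": round((time.perf_counter() - start) * 1000, 1),
        "prompt_chars": len(prompt),
        "response_chars": len(response),
    }
    if not response.strip():
        # Empty completions are surfaced at WARNING with a prompt preview.
        logger.warning("llm_call_empty_response %s prompt_preview=%r", fields, prompt[:preview])
    else:
        logger.info("llm_call_complete %s", fields)
    return response
```

A synthetic empty response, as in the smoke test above, can be simulated with `call_with_telemetry(lambda p: "", "some prompt")`.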

Out of Scope

Background-worker trace_id rebinding (sync path only)
SQLite/LanceDB/embedding instrumentation
Enrichment-queue heartbeat (910/878/836 saturation events seen in test, no metrics)
Lifecycle events (start/stop config dump)

All deferred to RFC-008 (in parent repo: rfc/rfc-008-zettelforge-observability-tier3.md).

Test Plan

Re-run Nemotron-3-nano short ingestion (~10 min) and verify the new event names appear in zettelforge.log
Grep one trace_id and confirm it links the remember() entry to its child events (fact extraction, entity indexing, evolution)
Confirm httpcore noise is gone (no connect_tcp.started lines)
Spot-check that llm_call_empty_response events include actionable prompt_preview for the failing schemas (causal_triples, ner_output, evolution)
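
The trace_id check in the plan above can also be approximated in Python. This sketch assumes zettelforge.log holds one JSON object per line with `trace_id` and `event` keys; the PR does not state the on-disk format, and `events_for_trace` is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator


def events_for_trace(lines: Iterable[str], trace_id: str) -> Iterator[dict]:
    """Yield parsed log records whose `trace_id` matches, skipping non-JSON lines."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate plain-text lines mixed into the file
        if isinstance(record, dict) and record.get("trace_id") == trace_id:
            yield record
```

Passing an open file handle as `lines` streams the log without loading it into memory; listing the `event` value of each yielded record shows which child events share the trace.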

🤖 Generated with Claude Code

Surfaces blind spots discovered during the 2026-04-24 Nemotron-3-nano debug session. ZettelForge had heavy logging at the LLM call layer but parse failures and empty completions were largely opaque - we could see WHICH calls failed but never WHAT the model returned.

Tier 1: ollama_provider.py
- Every Ollama call logs duration_ms, prompt_chars, response_chars, eval_count, prompt_eval_count, done_reason, json_mode, model
- Empty completions promoted to WARNING with full prompt+response preview (was previously silent at this layer, then silently swallowed downstream)
- Exceptions logged with classified error type before re-raise

Tier 2a: memory_evolver.py
- evolution_parse_retry and evolution_parse_failed now carry raw_preview, raw_chars, prompt_preview, new_note_id - was logging only neighbor_id, leaving the dominant pipeline failure mode unanalyzable

Tier 2b: fact_extractor.py
- Empty completions now logged (was a silent return, hiding LLM failures from the audit trail entirely)
- parse_failed gets reason classification: empty_completion vs json_decode

Tier 2c: entity_indexer.py
- ner_output parse_failed gets reason classification + raw_chars

Tier 2d: memory_manager.py
- request_id (already generated for OCSF audit) now bound to structlog.contextvars at remember() entry, propagating to every downstream log line as trace_id automatically. Cleared in finally block so it doesn't leak across calls. Background workers (evolution, NER, causal) do NOT yet rebind from job - tracked in RFC-008.

log.py:
- Suppress httpcore/httpx DEBUG noise. Single 17-min test run produced 1,612 connect_tcp/send_request_headers debug lines for zero diagnostic value.

Verification: Synthetic empty-response smoke test produces full payload including propagated trace_id, domain, provider, duration_ms, prompt_eval_count, done_reason, prompt_preview, response_preview.

RFC-008 (in parent repo) covers Tier 3: SQLite/LanceDB/embedding instrumentation, enrichment-queue heartbeat, background-worker trace_id rebinding, lifecycle events.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot

Pull request overview

Adds Tier 0/1/2 observability across the synchronous remember() pipeline so LLM calls and downstream parsing failures can be correlated and diagnosed via structured telemetry (including a propagated trace_id).

Changes:

Bind trace_id (plus domain/source_type) into structlog.contextvars at MemoryManager.remember() entry and clear on exit.
Add richer LLM-call and parse-failure diagnostics (previews, char counts, reason classification) across evolver, fact extraction, and NER parsing.
Reduce log noise by raising httpcore/httpx logger levels and attempt to bootstrap log level from config.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
src/zettelforge/memory_manager.py	Binds/unbinds `trace_id` contextvars around `remember()` via a new `_remember_inner()` wrapper.
src/zettelforge/memory_evolver.py	Adds retry/failure telemetry payloads (raw/prompt previews, char counts, note IDs).
src/zettelforge/log.py	Suppresses noisy HTTP transport debug logs; attempts to set logging level from config during logger bootstrap.
src/zettelforge/llm_providers/ollama_provider.py	Logs per-call latency/metadata and escalates empty completions/exceptions with diagnostic fields.
src/zettelforge/fact_extractor.py	Classifies `parse_failed` reasons (empty vs JSON decode) and adds raw size context.
src/zettelforge/entity_indexer.py	Improves NER `parse_failed` telemetry with reason classification and raw size context.


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ca8c62f5f1


chatgpt-codex-connector · 2026-04-25T00:49:08Z

+            structlog.contextvars.unbind_contextvars(
+                "trace_id", "domain", "source_type"


Restore parent trace context after nested remember calls

This unconditional unbind_contextvars(...) drops any previously-bound context when remember() is called reentrantly, which happens in the evolve flow (remember_with_extraction() via MemoryUpdater.apply() calls self.mm.remember(...)). In evolve=True requests that produce ADD/UPDATE/DELETE operations, the inner call removes the outer trace_id/domain/source_type, so later log lines from the parent pipeline can no longer be correlated to the original request. Use context restoration (bound_contextvars or token reset) instead of unbinding fixed keys.
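
The difference between unbinding fixed keys and restoring the previous context can be shown with the standard-library `contextvars` module. This is a sketch of the token-reset approach the comment mentions, not the fix that was merged, and it deliberately avoids structlog so it runs without dependencies; `trace_id_var`, `bound_trace`, and this simplified `remember()` are hypothetical stand-ins.

```python
import contextvars
from contextlib import contextmanager

# Stand-in for the logging context that structlog.contextvars would manage.
trace_id_var = contextvars.ContextVar("trace_id", default=None)


@contextmanager
def bound_trace(trace_id):
    """Bind trace_id for the duration of the block, then restore the prior value."""
    token = trace_id_var.set(trace_id)
    try:
        yield
    finally:
        # reset() puts back whatever was bound before instead of clearing the key,
        # so an outer caller keeps its own trace_id after a nested call returns.
        trace_id_var.reset(token)


def remember(trace_id, nested=None):
    """Simplified reentrant remember(); returns the trace ids observed in order."""
    seen = []
    with bound_trace(trace_id):
        seen.append(trace_id_var.get())    # before the nested call
        if nested is not None:
            seen.extend(remember(nested))  # inner call binds its own id
        seen.append(trace_id_var.get())    # parent id is visible again
    return seen


print(remember("outer", nested="inner"))  # ['outer', 'inner', 'inner', 'outer']
```

With an unconditional unbind in place of `reset(token)`, the last entry would be empty rather than `'outer'`, which is the correlation loss described above.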


@copilot Make these fixes

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: Patrick Roland <48327651+rolandpg@users.noreply.github.com>

Agent-Logs-Url: https://github.com/rolandpg/zettelforge/sessions/36e27378-e4ad-4f8a-a987-ad95563e19a0 Co-authored-by: rolandpg <48327651+rolandpg@users.noreply.github.com>

Copilot

Pull request overview

Adds structured, end-to-end observability for synchronous remember() execution and LLM interactions, enabling correlation via trace_id and providing richer diagnostics for parse/empty-completion failures discovered during large ingestions.

Changes:

Bind a per-remember() trace_id into structlog.contextvars and unbind on exit to correlate downstream synchronous logs.
Add richer parse-failure telemetry (raw/prompt previews, char counts, reason classification) in evolution, fact extraction, and NER parsing.
Add Ollama call telemetry (timings, token/eval metadata, empty-completion surfacing) and suppress noisy httpcore/httpx debug logs; attempt to set logging level from config.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
src/zettelforge/memory_manager.py	Wraps `remember()` with contextvar binding for `trace_id` propagation into downstream logs.
src/zettelforge/memory_evolver.py	Adds diagnostic context to evolution parse retry/failure logs.
src/zettelforge/log.py	Suppresses HTTP transport debug noise; attempts to derive logging level from config during lazy logger init.
src/zettelforge/llm_providers/ollama_provider.py	Adds per-call telemetry, duration metrics, and empty-response surfacing for Ollama calls.
src/zettelforge/fact_extractor.py	Logs and classifies empty completions vs JSON decode failures for fact extraction parsing.
src/zettelforge/entity_indexer.py	Classifies NER parse failures and includes raw size/preview diagnostics.


+        # Load config to get logging level (RFC-007 telemetry support)
+        try:
+            from zettelforge.config import get_config
+            cfg = get_config()
+            log_level = cfg.logging.level if hasattr(cfg, 'logging') else "INFO"
+        except Exception:
+            log_level = "INFO"
+
+        configure_logging(level=log_level, log_file=log_file, audit_log_file=audit_log_file)
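
For comparison, here is a dependency-free sketch of the same two log.py concerns: resolving a level from a config object with a safe fallback, and quieting the HTTP transport loggers. `resolve_log_level`, `quiet_http_loggers`, and the `SimpleNamespace` config are assumptions for illustration; the real `get_config()` and `configure_logging()` are not reproduced here.

```python
import logging
from types import SimpleNamespace

_VALID_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}


def resolve_log_level(cfg, default="INFO"):
    """Read cfg.logging.level defensively, falling back to `default`."""
    # getattr with a default covers both a missing `logging` section
    # and a missing `level` attribute without a broad try/except.
    level = getattr(getattr(cfg, "logging", None), "level", None)
    if isinstance(level, str) and level.upper() in _VALID_LEVELS:
        return level.upper()
    return default


def quiet_http_loggers(level=logging.WARNING):
    """Raise httpcore/httpx logger levels so their DEBUG lines are dropped."""
    for name in ("httpcore", "httpx"):
        logging.getLogger(name).setLevel(level)


# Usage with a hypothetical config object:
cfg = SimpleNamespace(logging=SimpleNamespace(level="debug"))
print(resolve_log_level(cfg))                # DEBUG
print(resolve_log_level(SimpleNamespace()))  # INFO
quiet_http_loggers()
```

Validating against a fixed set of names means a typo in the config falls back to the default instead of reaching the logging setup.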





CI lint job 72977891617 failed on ruff format --check. The diff is purely cosmetic: quote style normalization, single-line wrap of unbind args, blank line after import. No behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings April 25, 2026 00:45

Copilot started reviewing on behalf of rolandpg April 25, 2026 00:45 View session

Copilot AI reviewed Apr 25, 2026


Comment thread src/zettelforge/log.py Outdated

Comment thread src/zettelforge/log.py Outdated

Comment thread src/zettelforge/log.py Outdated

chatgpt-codex-connector Bot reviewed Apr 25, 2026


Potential fix for pull request finding

44112f1

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: Patrick Roland <48327651+rolandpg@users.noreply.github.com>

Copilot started work on behalf of rolandpg April 25, 2026 01:02 View session

fix: remove trailing whitespace on blank line in log.py (W293)

423dee3

Agent-Logs-Url: https://github.com/rolandpg/zettelforge/sessions/36e27378-e4ad-4f8a-a987-ad95563e19a0 Co-authored-by: rolandpg <48327651+rolandpg@users.noreply.github.com>

Copilot finished work on behalf of rolandpg April 25, 2026 01:03

Copilot started work on behalf of rolandpg April 25, 2026 01:03 View session

Copilot finished work on behalf of rolandpg April 25, 2026 01:04

rolandpg requested a review from Copilot April 25, 2026 01:06

Copilot started reviewing on behalf of rolandpg April 25, 2026 01:07 View session

Copilot AI reviewed Apr 25, 2026


rolandpg and others added 2 commits April 24, 2026 20:33

Potential fix for pull request finding

dc01525

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: Patrick Roland <48327651+rolandpg@users.noreply.github.com>

rolandpg merged commit a7e91a6 into master Apr 25, 2026
11 checks passed

rolandpg deleted the feat/observability-tier-0-1-2 branch April 25, 2026 01:45

rolandpg mentioned this pull request Apr 25, 2026

docs: v2.4.3 CHANGELOG accuracy + RFC-009 Phase 1.5 spec #98

Merged

4 tasks

rolandpg merged 5 commits into master from feat/observability-tier-0-1-2
