feat(observability): Tier 0/1/2 telemetry patches#95

Merged
rolandpg merged 5 commits into master from feat/observability-tier-0-1-2 on Apr 25, 2026

Conversation

@rolandpg
Owner

Summary

Surfaces blind spots discovered during the 2026-04-24 Nemotron-3-nano debug session — 7,892 notes ingested, 651 evolution_parse_failed events, 306 empty completions, and we could not see what the model was actually returning.

After this PR, every LLM call logs full I/O telemetry, parse failures carry diagnostic context, and a propagating trace_id lets you reconstruct a single remember() call across the synchronous pipeline.

Changes

  • llm_providers/ollama_provider.py: Every call logs duration, prompt/response chars, eval_count, prompt_eval_count, done_reason. Empty completions WARN with full preview. Exceptions classified before re-raise.
  • memory_evolver.py: evolution_parse_retry/_failed carry raw_preview, raw_chars, prompt_preview, new_note_id.
  • fact_extractor.py: Empty completions logged (was silent). Failures classified: empty_completion vs json_decode.
  • entity_indexer.py: NER parse_failed gets reason classification + raw_chars.
  • memory_manager.py: trace_id bound to structlog.contextvars at remember() entry, auto-propagates to all downstream synchronous log lines. Background workers do not yet rebind — tracked in RFC-008.
  • log.py: Suppress httpcore/httpx DEBUG (1,612 lines of noise per 17-min test run).

Verification

  • AST parse: clean on all 6 files
  • Import smoke test: clean (OllamaProvider, MemoryEvolver, FactExtractor, EntityIndexer, MemoryManager)
  • Functional smoke test: synthetic empty-response call emits llm_call_empty_response with full payload including trace_id, domain, provider, duration_ms, prompt_eval_count, done_reason, prompt_preview, response_preview

Out of Scope

  • Background-worker trace_id rebinding (sync path only)
  • SQLite/LanceDB/embedding instrumentation
  • Enrichment-queue heartbeat (910/878/836 saturation events seen in test, no metrics)
  • Lifecycle events (start/stop config dump)

All deferred to RFC-008 (in parent repo: rfc/rfc-008-zettelforge-observability-tier3.md).

Test Plan

  • Re-run Nemotron-3-nano short ingestion (~10 min) and verify the new event names appear in zettelforge.log
  • Grep one trace_id and confirm it links the remember() entry to its child events (fact extraction, entity indexing, evolution)
  • Confirm httpcore noise is gone (no connect_tcp.started lines)
  • Spot-check that llm_call_empty_response events include actionable prompt_preview for the failing schemas (causal_triples, ner_output, evolution)

🤖 Generated with Claude Code

Surfaces blind spots discovered during 2026-04-24 Nemotron-3-nano debug
session. ZettelForge had heavy logging at the LLM call layer but parse
failures and empty completions were largely opaque - we could see WHICH
calls failed but never WHAT the model returned.

Tier 1: ollama_provider.py
  - Every Ollama call logs duration_ms, prompt_chars, response_chars,
    eval_count, prompt_eval_count, done_reason, json_mode, model
  - Empty completions promoted to WARNING with full prompt+response
    preview (was previously silent at this layer, then silently swallowed
    downstream)
  - Exceptions logged with classified error type before re-raise
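A minimal sketch of the call-site telemetry, using stdlib logging rather than the project's structlog setup (the event names llm_call_empty_response and llm_call_failed follow this PR; the wrapper shape and preview length are assumptions):

```python
import logging
import time

log = logging.getLogger("zettelforge.llm")

def call_with_telemetry(generate, prompt: str, model: str) -> tuple[str, dict]:
    """Invoke an LLM `generate` callable and build the telemetry payload."""
    start = time.perf_counter()
    try:
        response = generate(prompt)
    except Exception as exc:
        # Classify before re-raising so the caller still sees the original error.
        log.error("llm_call_failed", extra={"error_type": type(exc).__name__})
        raise
    payload = {
        "llm_model": model,
        "duration_ms": round((time.perf_counter() - start) * 1000, 1),
        "prompt_chars": len(prompt),
        "response_chars": len(response),
    }
    if not response.strip():
        # Empty completion: promote to WARNING with previews for diagnosis.
        payload["prompt_preview"] = prompt[:200]
        payload["response_preview"] = response[:200]
        log.warning("llm_call_empty_response", extra=payload)
    else:
        log.debug("llm_call_complete", extra=payload)
    return response, payload
```

The key point is that the empty-completion branch carries the prompt preview, so a failing schema can be identified straight from the log line.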

Tier 2a: memory_evolver.py
  - evolution_parse_retry and evolution_parse_failed now carry
    raw_preview, raw_chars, prompt_preview, new_note_id - was logging
    only neighbor_id, leaving the dominant pipeline failure mode
    unanalyzable

Tier 2b: fact_extractor.py
  - Empty completions now logged (was a silent return, hiding LLM failures
    from the audit trail entirely)
  - parse_failed gets reason classification: empty_completion vs
    json_decode

Tier 2c: entity_indexer.py
  - ner_output parse_failed gets reason classification + raw_chars

Tier 2d: memory_manager.py
  - request_id (already generated for OCSF audit) now bound to
    structlog.contextvars at remember() entry, propagating to every
    downstream log line as trace_id automatically. Cleared in finally
    block so it doesn't leak across calls. Background workers (evolution,
    NER, causal) do NOT yet rebind from job - tracked in RFC-008.
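The propagation mechanism can be sketched with the stdlib contextvars that structlog.contextvars wraps — a simplified stand-in, not the project's actual MemoryManager:

```python
from contextvars import ContextVar

# structlog.contextvars keeps a similar mapping and merges it into every event.
_bound: ContextVar[dict] = ContextVar("log_context", default={})

def current_context() -> dict:
    return dict(_bound.get())

def remember(trace_id: str, domain: str, sink: list) -> None:
    _bound.set({**_bound.get(), "trace_id": trace_id, "domain": domain})
    try:
        # Any synchronous downstream step sees the bound context automatically.
        sink.append({"event": "fact_extraction", **current_context()})
    finally:
        # Cleared on exit so the ids do not leak into the next call.
        ctx = dict(_bound.get())
        for key in ("trace_id", "domain"):
            ctx.pop(key, None)
        _bound.set(ctx)
```

Background workers run in their own contexts, which is why they need an explicit rebind from the job payload (the RFC-008 follow-up).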

log.py:
  - Suppress httpcore/httpx DEBUG noise. Single 17-min test run produced
    1,612 connect_tcp/send_request_headers debug lines for zero
    diagnostic value.
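The suppression itself is a couple of lines with stdlib logging (httpcore and httpx are the real transport logger names; the choice of WARNING as the floor is an assumption):

```python
import logging

# httpcore logs every socket step (connect_tcp.started,
# send_request_headers, ...) at DEBUG, and httpx logs each request.
# Raising their levels keeps transport chatter out of the application log.
for noisy in ("httpcore", "httpx"):
    logging.getLogger(noisy).setLevel(logging.WARNING)
```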

Verification: Synthetic empty-response smoke test produces full payload
including propagated trace_id, domain, provider, duration_ms,
prompt_eval_count, done_reason, prompt_preview, response_preview.

RFC-008 (in parent repo) covers Tier 3: SQLite/LanceDB/embedding
instrumentation, enrichment-queue heartbeat, background-worker
trace_id rebinding, lifecycle events.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 25, 2026 00:45
Contributor

Copilot AI left a comment


Pull request overview

Adds Tier 0/1/2 observability across the synchronous remember() pipeline so LLM calls and downstream parsing failures can be correlated and diagnosed via structured telemetry (including a propagated trace_id).

Changes:

  • Bind trace_id (plus domain/source_type) into structlog.contextvars at MemoryManager.remember() entry and clear on exit.
  • Add richer LLM-call and parse-failure diagnostics (previews, char counts, reason classification) across evolver, fact extraction, and NER parsing.
  • Reduce log noise by raising httpcore/httpx logger levels and attempt to bootstrap log level from config.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file
  • src/zettelforge/memory_manager.py: Binds/unbinds trace_id contextvars around remember() via a new _remember_inner() wrapper.
  • src/zettelforge/memory_evolver.py: Adds retry/failure telemetry payloads (raw/prompt previews, char counts, note IDs).
  • src/zettelforge/log.py: Suppresses noisy HTTP transport debug logs; attempts to set logging level from config during logger bootstrap.
  • src/zettelforge/llm_providers/ollama_provider.py: Logs per-call latency/metadata and escalates empty completions/exceptions with diagnostic fields.
  • src/zettelforge/fact_extractor.py: Classifies parse_failed reasons (empty vs JSON decode) and adds raw size context.
  • src/zettelforge/entity_indexer.py: Improves NER parse_failed telemetry with reason classification and raw size context.


Comment thread src/zettelforge/log.py Outdated

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ca8c62f5f1


Comment thread src/zettelforge/memory_manager.py Outdated
Comment on lines +203 to +204
structlog.contextvars.unbind_contextvars(
"trace_id", "domain", "source_type"


P2: Restore parent trace context after nested remember() calls

This unconditional unbind_contextvars(...) drops any previously-bound context when remember() is called reentrantly, which happens in the evolve flow (remember_with_extraction() via MemoryUpdater.apply() calls self.mm.remember(...)). In evolve=True requests that produce ADD/UPDATE/DELETE operations, the inner call removes the outer trace_id/domain/source_type, so later log lines from the parent pipeline can no longer be correlated to the original request. Use context restoration (bound_contextvars or token reset) instead of unbinding fixed keys.


Owner Author


@copilot Make these fixes

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Patrick Roland <48327651+rolandpg@users.noreply.github.com>
Contributor

Copilot AI left a comment


Pull request overview

Adds structured, end-to-end observability for synchronous remember() execution and LLM interactions, enabling correlation via trace_id and providing richer diagnostics for parse/empty-completion failures discovered during large ingestions.

Changes:

  • Bind a per-remember() trace_id into structlog.contextvars and unbind on exit to correlate downstream synchronous logs.
  • Add richer parse-failure telemetry (raw/prompt previews, char counts, reason classification) in evolution, fact extraction, and NER parsing.
  • Add Ollama call telemetry (timings, token/eval metadata, empty-completion surfacing) and suppress noisy httpcore/httpx debug logs; attempt to set logging level from config.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Show a summary per file
  • src/zettelforge/memory_manager.py: Wraps remember() with contextvar binding for trace_id propagation into downstream logs.
  • src/zettelforge/memory_evolver.py: Adds diagnostic context to evolution parse retry/failure logs.
  • src/zettelforge/log.py: Suppresses HTTP transport debug noise; attempts to derive logging level from config during lazy logger init.
  • src/zettelforge/llm_providers/ollama_provider.py: Adds per-call telemetry, duration metrics, and empty-response surfacing for Ollama calls.
  • src/zettelforge/fact_extractor.py: Logs and classifies empty completions vs JSON decode failures for fact extraction parsing.
  • src/zettelforge/entity_indexer.py: Classifies NER parse failures and includes raw size/preview diagnostics.


Comment thread src/zettelforge/log.py
Comment on lines +169 to +177
# Load config to get logging level (RFC-007 telemetry support)
try:
from zettelforge.config import get_config
cfg = get_config()
log_level = cfg.logging.level if hasattr(cfg, 'logging') else "INFO"
except Exception:
log_level = "INFO"

configure_logging(level=log_level, log_file=log_file, audit_log_file=audit_log_file)
Comment thread src/zettelforge/llm_providers/ollama_provider.py Outdated
rolandpg and others added 2 commits April 24, 2026 20:33
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Patrick Roland <48327651+rolandpg@users.noreply.github.com>
CI lint job 72977891617 failed on ruff format --check. The diff is purely
cosmetic — quote-style normalization, a single-line wrap of the unbind
args, a blank line after an import. No behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rolandpg rolandpg merged commit a7e91a6 into master Apr 25, 2026
11 checks passed
@rolandpg rolandpg deleted the feat/observability-tier-0-1-2 branch April 25, 2026 01:45