Skip to content

Observation Events

synthaicode edited this page Nov 1, 2025 · 6 revisions

Purpose: Summarize the key points of Observation. It includes Hub functionality and is used for lag monitoring and resource allocation decisions, as well as controlling monitoring frequency at startup.

1-Second Summary

  • Observation uses *_1s_rows (row stream) / *_1s_rows_last (latest TABLE) and the live *_live TABLE.
  • Prior DDL preparation is generally unnecessary (missing objects are auto-created with create-if-missing).

Observation Components

  • 1-second row stream (*_1s_rows): Source rows per second; the smallest observation granularity.
  • rows_last (*_1s_rows_last): TABLE that keeps the most recent record. Used by pull queries to check latest/existence.
  • Live (*_live TABLE): Finalized rows after continuation fill. Applications primarily subscribe here.

Representative code (monitoring 1m Live)

  • DSL example: Use Tumbling POCO definitions, ForEachAsync, WaitForEntityReadyAsync, etc.

When to use rows_last

  • Anchor for initial fill after startup (latest bucket/existence check).
  • Determine if data already exists to avoid duplicate generation (idempotent).
  • Confirm a single row can be fetched via a pull query before switching to live subscription.

Start/Stop Control for Row Monitoring and Boot-time Event Interface

  • Creation: KsqlContext.SchemaRegistration prepares *_1s_rows and *_1s_rows_last with create-if-missing.
  • Start/Stop: IRowMonitorCoordinator orchestrates the monitoring lifecycle (StartForResults(results, ct) / StopAsync()).
  • Boot-time event interface: The OSS introduces RowMonitorCoordinator to start monitoring derived entities in bulk at startup.
    • Replacement: Inject a custom coordinator via KsqlContextBuilder.WithRowMonitorCoordinator(...).
    • Subscribe to monitoring events via RowMonitorEvent (internal interface) to understand lag/backlog and graceful delay.
    • References: diff_ksqlcontext_refactor_20251020.md / diff_ksqlcontext_refactor_20251021.md
    • Event list and practical examples: see “Runtime Events” in Operations-Startup-and-Monitoring.

Hub Inclusion and Lag Observation

  • Observation includes Hub functionality (bridging row streams). Monitor consumer lag and leverage it for decisions on CPU/memory/replica allocation.
  • Defensive implementation: Detect and drop old events (time-series inversion) on the hub side (see diff_hub_monotonic_guard_20251019.md).

Monitoring Streamiz/Kafka

  • Streamiz (Kafka Streams.NET)
    • Monitor thread/task state, rebalances, and unhandled exceptions, reflecting them in logs/metrics.
    • Define recovery policies (retry/restart) in the operations runbook.
  • Kafka
    • Periodically observe consumer group lag, partition skew, broker reachability/error rate.
    • Ensure topic partition/replication settings match expectations (harmonize via startup Ensure).

Related

Clone this wiki locally