feat: Observability — OutboxProcessorListener + okapi-micrometer module (KOJAK-44)#27

Merged
endrju19 merged 12 commits into main from observability on Apr 24, 2026
@endrju19
Collaborator

Summary

  • Sealed event hierarchy (OutboxProcessingEvent: Delivered, RetryScheduled, Failed) in okapi-core — enables exhaustive when in Kotlin, compiler warns on missing handlers
  • OutboxProcessorListener interface with default no-op methods — callbacks for per-entry and per-batch processing events
  • OutboxProcessor accepts optional listener + clock — notifies with try-catch isolation (listener exceptions never break processing)
  • New okapi-micrometer module: MicrometerOutboxListener (counters + timer) and MicrometerOutboxMetrics (gauges polling OutboxStore with TransactionRunner + NaN on failure)
  • OkapiMicrometerAutoConfiguration — top-level Spring Boot autoconfiguration, auto-detects MeterRegistry, wires read-only TransactionRunner for gauge queries
  • README updated with Observability section and module table
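
A minimal sketch of the first three points, assuming hypothetical event fields and a hypothetical `onEntryProcessed` callback name (the PR names the types and `onBatchProcessed`, but not their exact signatures). It shows a sealed event hierarchy with an exhaustive `when`, a listener with default no-op methods, and a notification helper that isolates listener exceptions:

```kotlin
import java.time.Duration

// Type names come from the PR; the fields are assumptions for illustration.
sealed interface OutboxProcessingEvent {
    val duration: Duration

    data class Delivered(override val duration: Duration) : OutboxProcessingEvent
    data class RetryScheduled(override val duration: Duration, val attempt: Int) : OutboxProcessingEvent
    data class Failed(override val duration: Duration, val reason: String) : OutboxProcessingEvent
}

// Default no-op bodies let implementors override only what they need.
// onEntryProcessed is a hypothetical name; onBatchProcessed's parameters are assumed.
interface OutboxProcessorListener {
    fun onEntryProcessed(event: OutboxProcessingEvent) {}
    fun onBatchProcessed(processedCount: Int, duration: Duration) {}
}

// No `else` branch: a new subtype is flagged by the compiler until it is handled here.
fun describe(event: OutboxProcessingEvent): String = when (event) {
    is OutboxProcessingEvent.Delivered -> "delivered"
    is OutboxProcessingEvent.RetryScheduled -> "retry scheduled (attempt ${event.attempt})"
    is OutboxProcessingEvent.Failed -> "failed: ${event.reason}"
}

// Returns false when the listener threw; the exception never reaches the caller.
fun notifySafely(listener: OutboxProcessorListener?, event: OutboxProcessingEvent): Boolean =
    try {
        listener?.onEntryProcessed(event)
        true
    } catch (e: Exception) {
        System.err.println("Listener failed, continuing: ${e.message}")
        false
    }
```

A null listener is simply skipped, which matches the "optional listener" wording above.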

Metrics

| Metric | Type | Source |
|---|---|---|
| okapi.entries.delivered | Counter | Listener event |
| okapi.entries.retry_scheduled | Counter | Listener event |
| okapi.entries.failed | Counter | Listener event |
| okapi.batch.duration | Timer | Listener event |
| okapi.entries.count | Gauge (tag: status) | DB poll |
| okapi.entries.lag.seconds | Gauge (tag: status) | DB poll |

Design decisions

  • Sealed events over separate callbacks — new event types cause compiler warnings, not silent misses
  • java.time.Duration over Long millis — type-safe, Micrometer Timer.record(Duration) native
  • Per-entry duration excludes DB write — measures delivery time only, not store.updateAfterProcessing()
  • Top-level OkapiMicrometerAutoConfiguration — inner @Configuration classes don't reliably see @ConditionalOnBean from other autoconfigs in Spring Boot 4
  • RetryScheduled name (not Retried) — semantically correct even on first attempt ("attempt failed, scheduled for retry")

Test plan

  • OutboxProcessorTest — listener events (Delivered, RetryScheduled, Failed), exception isolation, null listener, retry exhaustion → Failed
  • MicrometerOutboxListenerTest — counters per event type, batch timer
  • MicrometerOutboxMetricsTest — gauges per status, lag calculation, TransactionRunner wrapping, store exception → NaN
  • OutboxProcessorAutoConfigurationTest — listener autowired when MeterRegistry present
  • ObservabilityEndToEndTest — full pipeline on live Postgres + WireMock (Testcontainers)
  • Verified on standalone demo app with Spring Boot Actuator + Prometheus endpoint

endrju19 added 11 commits April 15, 2026 15:34
Add Micrometer to version catalog, register okapi-micrometer module in
settings and BOM. Module depends on okapi-core with micrometer-core as
compileOnly.

OutboxProcessor accepts an optional listener and clock. After each entry
is processed, it emits a sealed OutboxProcessingEvent (Delivered, Retried,
Failed) with per-entry Duration. After the batch, it calls onBatchProcessed.
Exceptions in the listener are caught and logged — they never break processing.

Implements OutboxProcessorListener with Micrometer counters for
delivered/retried/failed entries and a timer for batch duration.

Registers count-per-status and lag-per-status gauges that poll
OutboxStore on each Prometheus scrape. Gauge suppliers are wrapped
in an optional TransactionRunner (required for Exposed-backed stores)
with try-catch returning NaN on failure.
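
As a rough illustration of the gauge supplier described in this commit, the sketch below computes lag as the age of the oldest entry and falls back to NaN when the lookup throws. The function name `lagSeconds`, the lambda parameter, and the choice of `0.0` for an empty result are assumptions, not the module's actual code:

```kotlin
import java.time.Clock
import java.time.Duration
import java.time.Instant

// oldestCreatedAt stands in for a store query; it may return null (no entries) or throw.
fun lagSeconds(clock: Clock, oldestCreatedAt: () -> Instant?): Double =
    try {
        val oldest = oldestCreatedAt()
        if (oldest == null) {
            0.0 // assumption: no entries means no lag
        } else {
            Duration.between(oldest, clock.instant()).toMillis() / 1000.0
        }
    } catch (e: Exception) {
        Double.NaN // a failed query yields NaN rather than breaking the scrape
    }
```

Returning NaN keeps a failing query from propagating an exception into the metrics scrape.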
…K-44)

Add MicrometerConfiguration inner class that creates MicrometerOutboxListener
and MicrometerOutboxMetrics beans when MeterRegistry is on the classpath.
OutboxProcessor bean now accepts an optional OutboxProcessorListener.

"Retried" (past tense) implied the retry already happened, but the
event is emitted when a failed delivery attempt is rescheduled for
another try — even on the very first attempt. "RetryScheduled" is
semantically accurate regardless of the attempt number.

Renamed across: sealed event, OutboxProcessor mapping, MicrometerOutboxListener
counter (okapi.entries.retried → okapi.entries.retry_scheduled), and all tests.
…y (KOJAK-44)

outboxProcessor bean now injects ObjectProvider<Clock>, consistent with
all other beans in OutboxAutoConfiguration. Previously it silently fell
back to Clock.systemUTC() even when a custom Clock bean was present.

Per-entry duration now captures only the delivery attempt time
(entryProcessor.process), excluding store.updateAfterProcessing().
This prevents DB write latency from inflating delivery metrics.
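
The measurement boundary can be sketched as follows. `processEntry` and `SteppingClock` are hypothetical names used only for illustration; the point is that the duration is captured before the store update runs:

```kotlin
import java.time.Clock
import java.time.Duration
import java.time.Instant
import java.time.ZoneId
import java.time.ZoneOffset

// Times only the delivery call; the store update runs outside the measured window.
fun processEntry(clock: Clock, deliver: () -> Unit, updateStore: () -> Unit): Duration {
    val start = clock.instant()
    deliver()
    val deliveryDuration = Duration.between(start, clock.instant())
    updateStore() // not included in deliveryDuration
    return deliveryDuration
}

// Manually advanced clock, so the example is deterministic.
class SteppingClock(private var now: Instant) : Clock() {
    fun advance(step: Duration) { now = now.plus(step) }
    override fun instant(): Instant = now
    override fun getZone(): ZoneId = ZoneOffset.UTC
    override fun withZone(zone: ZoneId): Clock = this
}
```

With a delivery that advances the clock by 50 ms and a store update that advances it by 200 ms, the returned duration is 50 ms.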
…eMock (KOJAK-44)

Verifies the full observability pipeline against real infrastructure:
- Retry-then-succeed: RetryScheduled counter + Delivered counter + gauges
- Permanent failure: Failed counter + gauge reflects FAILED status
- Batch duration: timer records realistic HTTP delivery time (50ms stub)
- Lag gauge: reflects real time difference for pending entries in Postgres
…el (KOJAK-44)

Inner @Configuration classes inside @AutoConfiguration do not reliably
see beans from other autoconfigurations via @ConditionalOnBean. This
caused MicrometerConfiguration to never activate because MeterRegistry
was not yet available when the condition was evaluated.

Fix: extract to a separate top-level @AutoConfiguration with its own
@AutoConfigureAfter targeting the correct Spring Boot 4 package
(org.springframework.boot.micrometer.metrics.autoconfigure).
…n (KOJAK-44)

Add Observability section with metrics table and quick-start snippet.
Update module diagram and table to include okapi-micrometer.

Rename okapi.entries.retry_scheduled to okapi.entries.retry.scheduled
(dots-only follows Micrometer naming convention). Clarify README
observability section, add tag names to gauge descriptions, document
duration excludes DB write, single-listener note, autoconfig override.
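
For context on why the rename is safe for Prometheus users: Micrometer's Prometheus naming convention turns dots into underscores and adds type suffixes, so both spellings end up as the same exported name. The function below is a simplified approximation of that mapping, written for illustration only; it is not Micrometer's code, and `MeterKind` and `prometheusName` are hypothetical names:

```kotlin
enum class MeterKind { COUNTER, GAUGE, TIMER }

// Simplified approximation: dots become underscores, counters get a _total suffix,
// timers get a _seconds base unit. Real conventions handle more cases than this.
fun prometheusName(name: String, kind: MeterKind): String {
    val base = name.replace('.', '_')
    return when (kind) {
        MeterKind.COUNTER -> if (base.endsWith("_total")) base else "${base}_total"
        MeterKind.TIMER -> if (base.endsWith("_seconds")) base else "${base}_seconds"
        MeterKind.GAUGE -> base
    }
}
```

Under this approximation, `okapi.entries.retry.scheduled` and `okapi.entries.retry_scheduled` both map to `okapi_entries_retry_scheduled_total`.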
…ics (KOJAK-44)

Addresses PR #27 review comments by ramafasa: gauges previously called
countByStatuses() and findOldestCreatedAt(setOf(status)) once per status,
producing N queries per scrape with inconsistent snapshots between status tags.

Switch from supplier-per-status (pull) to MultiGauge + push refresh, the
canonical Micrometer pattern for DB-backed gauges (per Micrometer docs):

  - MicrometerOutboxMetrics now exposes refresh() which performs a single
    transaction containing both store queries and atomically registers all
    status rows on each MultiGauge — one query per metric per refresh,
    snapshot-consistent across status tags.
  - OutboxMetricsRefresher (new, framework-agnostic): single-thread daemon
    scheduler for non-Spring users (Ktor, plain JVM). Wraps refresh().
  - okapi-spring-boot autoconfig wires a refresher bean with start/close
    lifecycle; refresh interval configurable via okapi.metrics.refresh-interval
    (Duration, default PT15S). No @EnableScheduling required.

okapi-core untouched. okapi-micrometer has zero Spring dependencies.

Multi-instance behaviour documented in README: each instance publishes
identical gauge values (shared DB state); aggregate with max by (status)
in PromQL, not sum. Polling cost: 2 queries per refresh-interval per pod.
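
A framework-agnostic refresher of the kind described above could look roughly like this. It is a sketch using only the JDK, and `PeriodicRefresher` is a hypothetical stand-in rather than the actual `OutboxMetricsRefresher`; the 15-second default mirrors the documented `PT15S`:

```kotlin
import java.time.Duration
import java.util.concurrent.Executors
import java.util.concurrent.ScheduledExecutorService
import java.util.concurrent.TimeUnit

class PeriodicRefresher(
    private val interval: Duration = Duration.ofSeconds(15),
    private val refresh: () -> Unit,
) : AutoCloseable {
    // Single daemon thread, so the scheduler never keeps the JVM alive on its own.
    private val scheduler: ScheduledExecutorService =
        Executors.newSingleThreadScheduledExecutor { runnable ->
            Thread(runnable, "metrics-refresher").apply { isDaemon = true }
        }

    fun start() {
        scheduler.scheduleWithFixedDelay(
            {
                // An uncaught exception would cancel all future runs, so swallow and log.
                try {
                    refresh()
                } catch (e: Exception) {
                    System.err.println("Metrics refresh failed: ${e.message}")
                }
            },
            0L,
            interval.toMillis(),
            TimeUnit.MILLISECONDS,
        )
    }

    override fun close() {
        scheduler.shutdownNow()
    }
}
```

A caller would pass the metrics object's `refresh` as the lambda, call `start()` once at startup, and `close()` on shutdown.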
@endrju19 endrju19 merged commit 474c6c8 into main Apr 24, 2026
8 checks passed
@endrju19 endrju19 deleted the observability branch April 24, 2026 06:09
endrju19 added a commit that referenced this pull request Apr 24, 2026
…terval (#29)

## Summary

Follow-up to #27. Two fixes discovered while testing the merged
observability changes against a standalone Spring Boot demo app.

### 1. `okapi-micrometer` was not being published

The module's `build.gradle.kts` was missing
`id("buildsrc.convention.publish")`. Without this, the module compiles
and ships in source but is **not published to Maven Central**, so
downstream users declaring
`com.softwaremill.okapi:okapi-micrometer:0.1.0` would get an
unresolvable dependency.

Verified by reproducing the issue with `./gradlew publishToMavenLocal`:
before the fix, every other module appeared in
`~/.m2/repository/com/softwaremill/okapi/` except `okapi-micrometer`.
After the fix, all modules publish.
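
For orientation, the fix amounts to a one-line addition in the module's `plugins` block. The fragment below is a sketch: only the plugin id is taken from this description, and the rest of the file's contents are not shown here.

```kotlin
// okapi-micrometer/build.gradle.kts (fragment; other plugins and dependencies omitted)
plugins {
    id("buildsrc.convention.publish")
}
```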

### 2. `okapi.metrics.refresh-interval` lacked IDE autocomplete metadata

The new property was not registered in
`spring-configuration-metadata.json`, so users in IntelliJ / VS Code
wouldn't get autocomplete or hover docs in `application.yml`. Same
pattern as existing `okapi.processor.*` and `okapi.purger.*` entries.

Also added KDoc on `OkapiMetricsProperties` and a Configuration table to
the README Observability section.

## Test plan

- [x] `./gradlew publishToMavenLocal -PskipSigning=true` produces
  `okapi-micrometer-0.1.0.jar` containing both `MicrometerOutboxMetrics`
  and `OutboxMetricsRefresher`
- [x] Demo app at consumer-side (Spring Boot + Postgres + Prometheus
  actuator) successfully imports `okapi-micrometer:0.1.0` and renders all
  expected metrics on `/actuator/prometheus`:
    - Counters: `okapi_entries_delivered_total`,
      `okapi_entries_retry_scheduled_total`, `okapi_entries_failed_total`
    - Timer: `okapi_batch_duration_seconds_*`
    - MultiGauge: `okapi_entries_count{status=...}`,
      `okapi_entries_lag_seconds{status=...}` (3 status rows each, one query
      per refresh)
- [x] `./gradlew ktlintCheck` clean

## Notes

The publish-plugin omission would surface at the next Maven Central
release of okapi (i.e. when `0.1.0` artifacts are pushed). Worth
catching before then.