
v0.5.21


@github-actions github-actions released this 04 May 17:37

What's new in v0.5.21

v0.5.21 instruments the runtime ack endpoints introduced in v0.5.20 with two Prometheus counters on /metrics, so operators running the daemon ack workflow get trend lines on operator-driven activity and a structured signal for failure modes (auth misconfiguration, store saturation, oversized payloads). No HTTP-shape change ships: POST / DELETE / GET /api/findings/{signature}/ack and GET /api/acks keep their v0.5.20 status codes and JSON shapes byte-for-byte.

The two new counters are perf_sentinel_ack_operations_total{action="ack"|"unack"} for successful operations and perf_sentinel_ack_operations_failed_total{action,reason} for failures. The reason label covers nine documented values. Two of them, file_too_large (per-daemon JSONL saturation) and entry_too_large (an oversized by or reason field in a single request), are intentionally kept separate so an operator dashboard can route the two failure modes to different runbooks. Pre-warming covers the fifteen reachable (action, reason) combinations at startup (two success series, thirteen failure series), so dashboards can use rate() queries without absent() guards. Impossible combinations such as action="ack",reason="not_acked" are left out, because publishing them would create series that can never grow and would mislead operators.
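
The label model and the pre-warming rule described above can be sketched with plain enums. This is a minimal, std-only illustration and not the project's code: the variant names follow the nine documented reason values, but is_reachable and prewarmed_failure_series are hypothetical helpers, and the per-action split of reasons is an assumption chosen only to match the stated counts (eight on ack, five on unack). The notes name a single impossible pair explicitly, action="ack" with reason="not_acked".

```rust
/// Label values for `action`, one per ack operation named in the notes.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AckAction {
    Ack,
    Unack,
}

impl AckAction {
    /// Stable label string, so the Prometheus label never depends on `Debug`.
    pub const fn as_str(self) -> &'static str {
        match self {
            AckAction::Ack => "ack",
            AckAction::Unack => "unack",
        }
    }
}

/// The nine documented values of the `reason` label.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AckFailureReason {
    AlreadyAcked,
    NotAcked,
    Unauthorized,
    NoStore,
    InvalidSignature,
    LimitReached,
    FileTooLarge,
    EntryTooLarge,
    InternalError,
}

impl AckFailureReason {
    pub const ALL: [AckFailureReason; 9] = [
        AckFailureReason::AlreadyAcked,
        AckFailureReason::NotAcked,
        AckFailureReason::Unauthorized,
        AckFailureReason::NoStore,
        AckFailureReason::InvalidSignature,
        AckFailureReason::LimitReached,
        AckFailureReason::FileTooLarge,
        AckFailureReason::EntryTooLarge,
        AckFailureReason::InternalError,
    ];

    pub const fn as_str(self) -> &'static str {
        match self {
            AckFailureReason::AlreadyAcked => "already_acked",
            AckFailureReason::NotAcked => "not_acked",
            AckFailureReason::Unauthorized => "unauthorized",
            AckFailureReason::NoStore => "no_store",
            AckFailureReason::InvalidSignature => "invalid_signature",
            AckFailureReason::LimitReached => "limit_reached",
            AckFailureReason::FileTooLarge => "file_too_large",
            AckFailureReason::EntryTooLarge => "entry_too_large",
            AckFailureReason::InternalError => "internal_error",
        }
    }
}

/// ASSUMPTION: this split is illustrative. Only (ack, not_acked) is named as
/// impossible in the release notes; the unack set is a guess matching "five".
pub fn is_reachable(action: AckAction, reason: AckFailureReason) -> bool {
    use AckFailureReason::*;
    match action {
        AckAction::Ack => !matches!(reason, NotAcked),
        AckAction::Unack => matches!(
            reason,
            NotAcked | Unauthorized | NoStore | InvalidSignature | InternalError
        ),
    }
}

/// Failure series a daemon would create at zero on startup (hypothetical helper).
pub fn prewarmed_failure_series() -> Vec<(&'static str, &'static str)> {
    let mut out = Vec::new();
    for action in [AckAction::Ack, AckAction::Unack] {
        for reason in AckFailureReason::ALL {
            if is_reachable(action, reason) {
                out.push((action.as_str(), reason.as_str()));
            }
        }
    }
    out
}
```

Under these assumptions the helper yields thirteen failure pairs, which together with the two success series gives the fifteen pre-warmed combinations.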

The release also ships a small internal refactor: check_ack_preconditions factors the auth-then-store-presence guard shared by handle_ack and handle_unack, register_int_counter_vec deduplicates the create-clone-register boilerplate across the three IntCounterVec registration sites, and #[inline] hints land on the counter-bumping helpers (including the v0.5.19 record_otlp_reject for consistency). Counter-bumping stays branchless on the success path (cached IntCounter children, single relaxed atomic add per call). The failure path takes the label-hashmap lookup instead, since failures are expected to be rare. The release-binary size target is relaxed from < 10 MB to < 15 MB because the statically linked musl Linux binary with mimalloc reached 10.1 MB after recent additions. lto = "thin", strip = true, and panic = "abort" remain unchanged.
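
The split between a cached success path and a map-backed failure path can be illustrated without the prometheus crate. The sketch below uses only std atomics and is not the project's implementation: AckCounters, Action, record_success, and record_failure are hypothetical names, and the RwLock-guarded HashMap merely stands in for the label lookup that a counter vector performs.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::RwLock;

#[derive(Clone, Copy)]
pub enum Action {
    Ack,
    Unack,
}

pub struct AckCounters {
    // Success path: one pre-resolved cell per action, no map lookup needed.
    ack_ok: AtomicU64,
    unack_ok: AtomicU64,
    // Failure path: keyed by (action, reason), resolved on every call.
    failed: RwLock<HashMap<(&'static str, &'static str), AtomicU64>>,
}

impl AckCounters {
    /// `prewarm` lists the failure series that should exist at zero from startup.
    pub fn new(prewarm: &[(&'static str, &'static str)]) -> Self {
        let failed = prewarm
            .iter()
            .map(|&key| (key, AtomicU64::new(0)))
            .collect();
        Self {
            ack_ok: AtomicU64::new(0),
            unack_ok: AtomicU64::new(0),
            failed: RwLock::new(failed),
        }
    }

    /// The match only selects which cached cell to bump; the bump itself is a
    /// single relaxed atomic add.
    #[inline]
    pub fn record_success(&self, action: Action) {
        let cell = match action {
            Action::Ack => &self.ack_ok,
            Action::Unack => &self.unack_ok,
        };
        cell.fetch_add(1, Ordering::Relaxed);
    }

    /// Rare path: look the series up under the read lock, and fall back to the
    /// write lock only if the series was not pre-warmed.
    pub fn record_failure(&self, action: &'static str, reason: &'static str) {
        {
            let map = self.failed.read().unwrap();
            if let Some(cell) = map.get(&(action, reason)) {
                cell.fetch_add(1, Ordering::Relaxed);
                return;
            }
        }
        let mut map = self.failed.write().unwrap();
        map.entry((action, reason))
            .or_insert_with(|| AtomicU64::new(0))
            .fetch_add(1, Ordering::Relaxed);
    }

    pub fn success(&self, action: Action) -> u64 {
        match action {
            Action::Ack => self.ack_ok.load(Ordering::Relaxed),
            Action::Unack => self.unack_ok.load(Ordering::Relaxed),
        }
    }

    /// `None` means the series does not exist at all (never pre-warmed or hit).
    pub fn failure(&self, action: &'static str, reason: &'static str) -> Option<u64> {
        let map = self.failed.read().unwrap();
        map.get(&(action, reason)).map(|c| c.load(Ordering::Relaxed))
    }
}
```

Relaxed ordering is enough here because each counter is an independent monotonic total and no other memory access is synchronized through it.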

Helm chart 0.2.24 ships in lockstep, bumping the default daemon image tag to ghcr.io/robintra/perf-sentinel:0.5.21.

Added

  • Two Prometheus counters on the daemon /metrics endpoint:
    • perf_sentinel_ack_operations_total{action} for successful ack and unack operations. Cached IntCounter children at struct level, branchless match on the two handles, single relaxed atomic add per call.
    • perf_sentinel_ack_operations_failed_total{action,reason} for failures. The reason label covers already_acked, not_acked, unauthorized, no_store, invalid_signature, limit_reached, file_too_large, entry_too_large, internal_error. Failures are rare, so there is no hot-path child cache: each call goes through the prometheus crate's label hashmap (still O(1) under the read lock).
  • Pre-warmed series for the fifteen reachable (action, reason) combinations at startup: two success (action=ack, action=unack) plus thirteen failure (eight reasons on action=ack, five on action=unack). Pre-warming the reachable series makes absent() guards unnecessary in rate() queries. Impossible combinations are intentionally not pre-warmed, since they would publish series that can never grow.
  • AckFailureReason::EntryTooLarge as a distinct variant from FileTooLarge. entry_too_large flags per-request misuse (the caller-supplied by or reason field exceeds the 4 KiB per-record cap), while file_too_large flags per-daemon saturation (the next append would push the JSONL above the 64 MiB cap and a restart-time compaction is needed). Both still return HTTP 507 Insufficient Storage, so the operator-visible HTTP status is unchanged.
  • AckAction::as_str and AckFailureReason::as_str helpers returning &'static str for stable Prometheus label strings, mirroring the v0.5.19 OtlpRejectReason::as_str pattern.
  • #[inline] on counter-bumping methods: record_ack_success, record_ack_failure, and the v0.5.19 record_otlp_reject for consistency. The compiler likely inlines them already at opt-level = 3, but the explicit annotation matches the project inlining policy on critical helpers and pre-empts future regressions.
  • register_int_counter_vec helper in crates/sentinel-core/src/report/metrics.rs factors the create-clone-register pattern across the three IntCounterVec registration sites (otlp_rejected_total, ack_operations_total, ack_operations_failed_total).
  • check_ack_preconditions helper in crates/sentinel-core/src/daemon/query_api.rs factors the auth-then-store-presence guard shared by handle_ack and handle_unack. Records the matching AckFailureReason (Unauthorized or NoStore) before returning, so every error exit stays observable in /metrics.
  • Six unit tests in crates/sentinel-core/src/report/metrics.rs: as_str round-trip across all variants, success-path increments per action, failure-path increments per (action, reason), pre-warmed-zero contract on both counters, impossible-combinations-not-pre-warmed contract, and the /metrics rendered-output contract.
  • Three integration tests in crates/sentinel-core/src/daemon/query_api.rs: a no-store failure increments reason="no_store", a TOML conflict bumps the same reason="already_acked" series as a daemon-side double-ack, and a malformed signature increments reason="invalid_signature". The four pre-existing ack tests gain counter assertions on the success and unauthorized paths.
  • docs/METRICS.md and docs/FR/METRICS-FR.md: new "Ack metrics (since 0.5.21)" section with the label table, the per-reason HTTP-status mapping, the pre-warming contract, and three sample PromQL queries (trend rate, per-reason failure rate, alert-worthy combinations).
  • docs/HELM-DEPLOYMENT.md and docs/FR/HELM-DEPLOYMENT-FR.md: new "Daemon ack runtime store" subsection covering the four operator decisions when running the ack store under Kubernetes (api_key when bound non-loopback, persistence path remap to a PVC, securityContext mode-floor caveat, TOML ConfigMap mount). The existing ### StatefulSet block is repurposed from "reserved for future use" to the live ack-persistence guidance. A new ServiceMonitor warning paragraph notes the v0.5.20 default-filter behavior change for dashboards that scrape /api/findings.
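
The auth-then-store-presence guard described for check_ack_preconditions can be sketched as follows. This is a hedged, std-only illustration rather than the actual helper: DaemonState, AckStore, Reason, and the recorded log are hypothetical stand-ins, and the plain equality check replaces the constant-time comparison (subtle::ConstantTimeEq::ct_eq) that the real code is said to use. The point shown is the ordering (auth first, then store presence) and that each error exit records its reason before returning.

```rust
use std::cell::RefCell;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Reason {
    Unauthorized,
    NoStore,
}

/// Hypothetical placeholder for the runtime ack store.
#[derive(Debug)]
pub struct AckStore;

/// Hypothetical daemon state; `recorded` stands in for the failure counter.
pub struct DaemonState {
    api_key: Option<String>,
    store: Option<AckStore>,
    pub recorded: RefCell<Vec<(&'static str, Reason)>>,
}

impl DaemonState {
    pub fn new(api_key: Option<&str>, store: Option<AckStore>) -> Self {
        Self {
            api_key: api_key.map(str::to_owned),
            store,
            recorded: RefCell::new(Vec::new()),
        }
    }

    fn record_failure(&self, action: &'static str, reason: Reason) {
        self.recorded.borrow_mut().push((action, reason));
    }
}

/// Shared guard for both handlers: returns the store on success, or the
/// failure reason after recording it.
pub fn check_preconditions<'a>(
    state: &'a DaemonState,
    action: &'static str,
    presented_key: Option<&str>,
) -> Result<&'a AckStore, Reason> {
    // Step 1: auth. Only enforced when an api_key is configured.
    if let Some(expected) = state.api_key.as_deref() {
        // NOTE: plain comparison for illustration only; it is NOT constant-time.
        let authorized = presented_key == Some(expected);
        // The counter bump happens strictly after the comparison has finished.
        if !authorized {
            state.record_failure(action, Reason::Unauthorized);
            return Err(Reason::Unauthorized);
        }
    }
    // Step 2: store presence.
    match state.store.as_ref() {
        Some(store) => Ok(store),
        None => {
            state.record_failure(action, Reason::NoStore);
            Err(Reason::NoStore)
        }
    }
}
```

Because the unauthorized reason is only recorded when a key is configured, this sketch also reproduces the auth-presence inference: a non-zero unauthorized count implies that api_key is set.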

Changed

  • AckError::FileTooLarge and AckError::EntryTooLarge no longer fold into the same metric label. The handler match arms in handle_ack map them to distinct reason="file_too_large" and reason="entry_too_large" series. The HTTP error message text also differentiates them ("ack file size cap reached" vs "ack entry size cap reached"). The HTTP status (507 Insufficient Storage) stays the same on both.
  • Binary size target relaxed from < 10 MB to < 15 MB in docs/LIMITATIONS.md, docs/design/02-NORMALIZATION.md, and the FR mirrors. The musl Linux statically-linked binary with mimalloc currently sits at 10.1 MB and the previous target was tight enough that small additions (the new counters, the ack store, the v0.5.20 query API surface) would have pushed it over.
  • Helm chart 0.2.23 to 0.2.24, appVersion 0.5.20 to 0.5.21, default daemon image tag points at ghcr.io/robintra/perf-sentinel:0.5.21. The artifacthub.io/images annotation is updated in lockstep.
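
The split of the two storage-cap failures can be summarized in a small mapping. The status code, reason labels, and message strings below are taken from these notes. The AckError enum shape, the Mapped struct, and map_storage_error are hypothetical names used only for illustration.

```rust
/// Only the two storage-cap variants are modeled in this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AckError {
    FileTooLarge,
    EntryTooLarge,
}

/// What a handler would expose for one error (hypothetical struct).
#[derive(Debug, PartialEq, Eq)]
pub struct Mapped {
    pub status: u16,
    pub reason: &'static str,
    pub message: &'static str,
}

/// Both variants keep HTTP 507; only the metric label and message text differ.
pub fn map_storage_error(err: AckError) -> Mapped {
    match err {
        AckError::FileTooLarge => Mapped {
            status: 507,
            reason: "file_too_large",
            message: "ack file size cap reached",
        },
        AckError::EntryTooLarge => Mapped {
            status: 507,
            reason: "entry_too_large",
            message: "ack entry size cap reached",
        },
    }
}
```

A client that branches on the status code sees no difference between the two, while a dashboard grouping by reason now gets two separate series.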

Behavior

  • No HTTP-shape change. The three ack endpoints (POST / DELETE /api/findings/{signature}/ack and GET /api/acks) plus GET /api/findings, GET /api/findings/{trace_id}, GET /api/explain/{trace_id}, GET /api/correlations, GET /api/status, GET /api/export/report keep their v0.5.20 JSON shapes byte-for-byte. The only HTTP-visible delta is the error-message text on the two storage-cap failures, where "ack file size cap reached" (which previously covered both FileTooLarge and EntryTooLarge) splits into two distinct strings. Clients matching on the status code (507) are unaffected.
  • /metrics endpoint authentication is unchanged. Default --listen-address stays 127.0.0.1. Operators who bind the metrics endpoint to a non-loopback address for cluster-wide scraping should keep the NetworkPolicy plus Prometheus-side mTLS posture from v0.5.19. The new counters carry no PII, no signature labels, and no by field, only the bounded action and reason enum strings.
  • Auth-presence inference via the unauthorized series. perf_sentinel_ack_operations_failed_total{reason="unauthorized"} is pre-warmed to zero unconditionally at startup, but only ever increments when [daemon.ack] api_key is set and a request fails auth. A non-zero value therefore confirms api_key is configured. This inference is documented in docs/METRICS.md and mitigated by the loopback-by-default posture and the Prometheus-side network-policy guidance.
  • Constant-time X-API-Key comparison preserved. The check_ack_auth body is unchanged, and the subtle::ConstantTimeEq::ct_eq call still gates auth. The check_ack_preconditions refactor extracts the call but keeps the comparison itself untouched. The counter increment fires strictly after the comparison returns its result, so no new timing side channel is introduced.
  • Counter integrity under authenticated abuse. A holder of the api_key could trigger record_ack_failure arbitrarily and skew dashboards. Pre-existing risk, not a v0.5.21 regression: any holder can already write or revoke acks at will.

Documentation

  • New "Ack metrics (since 0.5.21)" section in docs/METRICS.md and docs/FR/METRICS-FR.md with the full label set, the reason-to-HTTP-status mapping including the entry_too_large distinction, the pre-warming contract, three sample PromQL queries, and a paragraph on the auth-presence inference signal.
  • New "Daemon ack runtime store" subsection in docs/HELM-DEPLOYMENT.md and docs/FR/HELM-DEPLOYMENT-FR.md covering the four operator decisions for running the ack store under Kubernetes, plus a ServiceMonitor warning on the v0.5.20 default-filter behavior of /api/findings.
  • Binary size target relaxed from < 10 MB to < 15 MB in docs/LIMITATIONS.md, docs/design/02-NORMALIZATION.md, and the FR mirrors.

Install

Prebuilt binaries (Linux amd64 / arm64, macOS arm64, Windows amd64):

curl -LO https://github.com/robintra/perf-sentinel/releases/download/v0.5.21/perf-sentinel-linux-amd64
chmod +x perf-sentinel-linux-amd64
sudo mv perf-sentinel-linux-amd64 /usr/local/bin/perf-sentinel

Linux binaries are statically linked against musl and run on any distribution (Alpine, Debian, RHEL, Ubuntu, any version) regardless of glibc version, and inside FROM scratch images.

From crates.io:

cargo install perf-sentinel --version 0.5.21

Docker:

docker run --rm -p 4317:4317 -p 4318:4318 \
  ghcr.io/robintra/perf-sentinel:0.5.21 watch --listen-address 0.0.0.0

Helm chart 0.2.24 ships alongside, see the matching chart-v0.2.24 release for the chart-side details.

Full Changelog: v0.5.20...v0.5.21