v0.10.0 — observability

incognick released this on 18 Jun at 21:53

The tenth Hamster release: observability — one in-process metrics registry as a node's single source of truth, rendered many ways. A Prometheus scrape endpoint for your existing monitoring, a typed snapshot the CLI (and the coming web console) decode, and a durability summary on cluster status — all gathered once, from the same numbers, so the three views can never disagree.

This is a dev preview, so read the limits below. Observability is real and proven, but the v0.x limits hold: writes still commit only on the Raft leader, multipart and server-side copy are still not on the cluster path (closing that gap is the next release; see Next up below), reads serve the local replica, and on-disk and on-wire formats may change between v0 releases. Hamster is not assessed or certified for any regulation. Please don't trust real regulated data to it yet.

What's in v0.10

Designed in ADR-0035: collect once, render many. A node has one metrics registry — the single source of truth for its signals — and every surface renders the same gather, so the Prometheus text, the CLI, and cluster status are guaranteed consistent.

Prometheus metrics on the admin port

Start a node (or the single-node serve preview) with -admin <addr> and it serves the Prometheus text exposition at GET /metrics — the same admin surface the web console will later share. The exposition is hand-rolled (no external client library — the single-binary, small-module-graph promise holds) and golden-pinned. First signal set: build and node identity, uptime, the cluster-wide gauges (members, voters, is-leader, effective generation), the durability posture, and an S3 request counter.

The same numbers as a typed snapshot — cluster metrics

The registry also gathers into a typed snapshot, encoded as hand-written protobuf and served over the cluster control channel. The cluster metrics command fetches and renders it in the CLI, and it is the exact model the v0.11 web console will decode instead of re-parsing scrape text. One gather, two renderings (Prometheus text and the wire snapshot), both built from the same typed families.

Durability is the headline signal

The numbers that matter most for a storage system are the durability ones: object-version and bucket counts, the active auto storage profile (k+m, where the m parity shards set how many node losses the cluster currently tolerates), and whether a layout transition is open. cluster status now prints a one-line durability summary derived from each node's own replica, and hamster_s3_requests_total{method,code} counts the gateway's request mix.

How it's verified

  • Golden-pinned exposition — the Prometheus text format and the snapshot codec are byte-pinned, so a format drift fails a test.
  • End to end over the real binary — a node started with -admin is scraped at /metrics, the same signals are fetched as the typed snapshot via cluster metrics, and the cluster status durability summary is asserted, all after a live S3 request moves the counter.
  • The aws CLI, rclone, restic, and s3cmd compatibility suites, the race detector, and the deterministic simulation harness all keep passing.
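Golden pinning generally means rendering a known input and comparing the bytes with a checked-in expected copy. A minimal sketch of the comparison step, assuming nothing about Hamster's actual test layout: checkGolden and window are hypothetical helpers, and in a real suite the pinned bytes would typically live under testdata/ and be loaded with os.ReadFile inside a _test.go file. Reporting the first differing byte makes a format drift easy to locate.

```go
package main

import (
	"bytes"
	"fmt"
)

// window returns up to 16 bytes starting at i, for a readable error message.
func window(b []byte, i int) []byte {
	end := i + 16
	if end > len(b) {
		end = len(b)
	}
	return b[i:end]
}

// checkGolden returns nil when got matches the pinned bytes exactly, and
// otherwise an error naming the first byte offset where they diverge.
func checkGolden(got, want []byte) error {
	if bytes.Equal(got, want) {
		return nil
	}
	n := len(got)
	if len(want) < n {
		n = len(want)
	}
	i := 0
	for i < n && got[i] == want[i] {
		i++
	}
	return fmt.Errorf("output drifted from golden at byte %d: got %q, want %q", i, window(got, i), window(want, i))
}

func main() {
	golden := []byte("# TYPE example_gauge gauge\nexample_gauge 1\n")
	render := func(v int) []byte {
		return []byte(fmt.Sprintf("# TYPE example_gauge gauge\nexample_gauge %d\n", v))
	}
	fmt.Println(checkGolden(render(1), golden)) // matches the pin
	fmt.Println(checkGolden(render(2), golden)) // drifted value is reported
}
```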

What this is not

  • No request-latency histograms yet. The registry's first signals are counters and gauges; latency histograms are the next additive increment.
  • No tracing. Distributed tracing is deliberately deferred.
  • Not assessed or certified. Hamster exposes its health honestly, but no one has assessed it against any regulation, and it is v0 and not production-ready, with formats that may still change.

Next up

The next release closes a real gap this work made easy to see: bringing the cluster S3 surface to parity with the single-node store. Today cluster run -s3 refuses multipart uploads and server-side copy and buffers whole objects in memory, while the single-node serve path handles multipart and server-side copy and streams object data. That is why aws s3 cp on a large file works against a single node but not yet against a cluster: the aws CLI switches to multipart upload above its multipart_threshold (8 MB by default). The next milestone lands efficient Range reads, streaming PUT, server-side copy, and erasure-coded multipart on the cluster path.

Binaries below are static (CGO_ENABLED=0), version-stamped (hamster version), with SHA-256 checksums in SHA256SUMS.