v0.9.0 — upgrade machinery

incognick released this 17 Jun 21:51

The ninth Hamster release: the machinery a zero-downtime rolling upgrade needs — a cluster that knows which version every node runs, rolls that version forward on its own as nodes upgrade, and can tell you whether a node is safe to take down right now. The orchestration that drives the roll automatically lands in v0.10; v0.9 builds and proves the foundation it stands on.

This is a dev preview; read the limits below. The upgrade machinery works and is proven end to end, but the v0.x limits hold: it is a cluster feature; the roll is operator-driven (v0.9 makes it observable and checkable but does not yet drive the binary swap for you); writes still commit only on the Raft leader; multipart and server-side copy are still not on the cluster path; and on-disk and on-wire formats may change between v0 releases. Hamster is not assessed or certified for any regulation. Please don't trust real regulated data to it yet.

What's in v0.9

Three pieces, designed in ADR-0034.

The cluster knows its version, and rolls it forward etcd-style

Every node advertises two things into the replicated registry: a binary version string (for display) and a declared protocol generation (a small integer the binary owns, advanced only by a coordinated format change — not every release). The cluster's effective generation is the minimum across live members, exactly as etcd computes its cluster version — and it rolls forward automatically once the last node upgrades. There is no manual finalize step: Hamster's additively-versioned formats make most changes mixed-version-safe, so the common upgrade just works, and a manual gate stays in reserve for the rare non-additive change.

  • The leader keeps the registry current by polling each peer's own version, so a node upgraded in place (stop, swap the binary, start) has its new version picked up without a re-join. There is no proposal forwarding, so a follower cannot write its own version into the registry; the leader learns it by polling instead.
  • cluster status shows each node's version, the effective generation, and — mid-roll — a one-step skew note (upgrade one generation at a time, the analogue of Kubernetes' version-skew policy).
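
The effective-generation rule above can be sketched in a few lines of Python. This is an illustration under assumptions, not Hamster's implementation: the Member record, its field names, and the within_one_step helper are hypothetical, and the list passed in is assumed to already contain only the members the cluster counts as live.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    """One registry entry. Field names are illustrative, not Hamster's schema."""
    name: str
    binary_version: str  # display only; never compared
    generation: int      # declared protocol generation (small integer)


def effective_generation(live_members: list[Member]) -> int:
    """Effective generation: the minimum declared generation across live members."""
    if not live_members:
        raise ValueError("no live members")
    return min(m.generation for m in live_members)


def within_one_step(live_members: list[Member]) -> bool:
    """Assumed reading of the skew rule: generations span at most one step."""
    gens = [m.generation for m in live_members]
    return max(gens) - min(gens) <= 1
```

With two members on generation 2 and one still on generation 1, effective_generation returns 1; it moves to 2 only once the last member declares generation 2.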

The health interlock: cluster can-stop <node>

A new advisory check that answers whether taking a node down for maintenance or upgrade is safe right now. It passes only when three conditions hold: Raft quorum survives without the node (a voter can stop only if a majority of voters remains), no other node is already down, and no data migration (layout transition) is open. It exits 0 (safe) or 1 (not safe) with a reason, so a roll script can gate on hamster cluster can-stop <node> && upgrade <node>.
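
To make the three conditions concrete, here is a minimal Python sketch of the decision. It is a model of the rule as described in these notes, not Hamster's code: the Node record, the can_stop function, the order of the checks, and the reason strings are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical view of one cluster node."""
    name: str
    voter: bool  # participates in Raft quorum
    up: bool     # currently running


def can_stop(target: str, nodes: list[Node],
             layout_transition_open: bool) -> tuple[bool, str]:
    """Return (safe, reason) for stopping `target` right now."""
    by_name = {n.name: n for n in nodes}
    if target not in by_name:
        return False, f"unknown node {target}"
    # Condition: no other node is already down.
    if any(not n.up for n in nodes if n.name != target):
        return False, "another node is already down"
    # Condition: no data migration (layout transition) is open.
    if layout_transition_open:
        return False, "a layout transition is open"
    # Condition: a voter may stop only if a majority of voters remains.
    if by_name[target].voter:
        voters = [n for n in nodes if n.voter]
        remaining = sum(1 for n in voters if n.up and n.name != target)
        majority = len(voters) // 2 + 1
        if remaining < majority:
            return False, (f"quorum would be lost: {remaining} of "
                           f"{len(voters)} voters would remain, need {majority}")
    return True, "safe to stop"
```

A command-line wrapper could map the first element of the result to an exit status of 0 or 1, matching the behaviour described above. Under this model a three-voter cluster can lose one voter, while a one-voter or two-voter cluster cannot.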

It is advisory — it never refuses a shutdown; a node must always be stoppable in an emergency. The data-plane dimension (erasure-coding tolerance, not just Raft quorum) is the part a bare etcd interlock can't express; it's the analogue of a Kubernetes PodDisruptionBudget.
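
Because the check is advisory, the gating lives in the operator's own roll script. The Python sketch below shows one way that loop might look. The hamster cluster can-stop invocation and its exit-status convention come from these notes; the roll function, the upgrade callback (your out-of-band binary swap and restart), and the choice to halt at the first rejection are assumptions.

```python
import subprocess
from typing import Callable, Iterable


def hamster_can_stop(node: str) -> bool:
    """Run the advisory check; exit status 0 means safe to stop."""
    result = subprocess.run(["hamster", "cluster", "can-stop", node])
    return result.returncode == 0


def roll(nodes: Iterable[str],
         upgrade: Callable[[str], None],
         can_stop: Callable[[str], bool] = hamster_can_stop) -> list[str]:
    """Upgrade nodes one at a time; halt at the first node the check rejects.

    Returns the names of the nodes that were upgraded.
    """
    done: list[str] = []
    for node in nodes:
        if not can_stop(node):
            break  # not safe right now; leave the rest untouched
        upgrade(node)  # hypothetical: swap the binary and restart the node
        done.append(node)
    return done
```

A production script would more likely wait and retry than give up, and would confirm the restarted node is healthy before moving on. The can_stop parameter is injectable so the loop can be exercised without a running cluster.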

The end-to-end upgrade suite

The proof. Two binaries are built from one source at generations N and N+1; a three-node cluster starts on N holding versioned and COMPLIANCE-locked data, then each node is taken down (honoring can-stop) and restarted on N+1 from its own disk. Throughout, the suite asserts continuous availability, zero data loss, that the COMPLIANCE lock stays WORM across the upgrade (invariant 4), and that the effective generation holds until the last node lands, then auto-rolls forward.
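
The generation assertion in that suite can be pictured with a toy model. The Python sketch below is not the suite itself; roll_trace is a hypothetical helper that records the effective generation after each node comes back on the new binary, and it ignores the window in which a node is down.

```python
def roll_trace(node_count: int, old: int, new: int) -> list[int]:
    """Effective generation observed after each node restarts on `new`."""
    generations = [old] * node_count
    trace = []
    for i in range(node_count):
        generations[i] = new            # node i is back on the new binary
        trace.append(min(generations))  # effective generation is the minimum
    return trace
```

For a three-node cluster moving from generation 1 to 2, roll_trace(3, 1, 2) returns [1, 1, 2]: the effective generation holds at 1 through the first two restarts and rolls forward only when the last node lands.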

How it's verified

  • Real-process cluster tests (loopback mTLS): a three-node in-place roll proves the effective generation holds at 1 until the last node moves to 2, then rolls; the interlock's quorum math is checked at one, two, and three voters.
  • End to end over the real binary: the rolling-upgrade suite above, plus the existing erasure-coded object, versioning, object-lock, encryption, and rotation suites — all still green.
  • The aws CLI, rclone, restic, s3cmd, the race detector, and the deterministic simulation harness keep passing.

What this is not

  • It is a cluster feature. The single-node serve preview has one version and nothing to roll.
  • The roll is not yet automated. v0.9 gives you the version view, the interlock, and the proof; you still swap binaries out of band (a new container image, a package upgrade, a new chart) and restart node by node. v0.10 drives that loop for you and turns the advisory interlock and skew rule into enforced gates.
  • Not assessed or certified. Hamster upgrades cleanly, but it has not been assessed for any regulation by anyone, and it is v0, not production-ready, with formats that may still change.

Binaries below are static (CGO_ENABLED=0), version-stamped (hamster version), with SHA-256 checksums in SHA256SUMS. Next up, v0.10: zero-downtime rolling upgrades — the orchestration that drives the roll over this release's machinery.
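
A downloaded binary can be checked against SHA256SUMS before use. The Python sketch below assumes the file follows the common sha256sum layout (a hex digest, whitespace, then a file name per line); these notes do not state the format, so treat that as an assumption. The parse_sums and verify helpers are illustrative names.

```python
import hashlib
from pathlib import Path


def parse_sums(text: str) -> dict[str, str]:
    """Parse sha256sum-style lines into {file name: hex digest}."""
    sums: dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2:
            digest, name = parts
            # A leading '*' marks binary mode in sha256sum output.
            sums[name.strip().lstrip("*")] = digest.lower()
    return sums


def verify(path: Path, sums: dict[str, str]) -> bool:
    """True if the file's SHA-256 matches its entry; False if absent or different."""
    expected = sums.get(path.name)
    if expected is None:
        return False
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == expected
```

Typical use would be verify(Path(downloaded_file), parse_sums(Path("SHA256SUMS").read_text())), refusing to install when it returns False.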