
v0.3.0 — erasure coding and self-healing repair

@incognick incognick released this 13 Jun 02:35

The third Hamster release: erasure coding and self-healing repair — and the S3 endpoint joins the cluster.

Dev preview. Objects now live erasure-coded across a real cluster, but several limits remain: the S3 write path commits only on the Raft leader (clients retry elsewhere); multipart and server-side copy are not on the cluster path yet; data-plane membership is effectively fixed once data exists (rebalance lands in v0.4); and on-disk and on-wire formats may still change between v0 releases. Please don't trust real data to it. That said, the cluster preview now stores objects, not just metadata.

What's in v0.3

  • Erasure-coded objects, end to end. A PUT erasure-codes the body into k+m self-describing shards spread across k+m distinct nodes (no two shards of one object ever share a node); a GET reconstructs from any k. Storage profiles follow an auto ladder as the cluster grows, with a small-object k=1 rule. Object data never passes through the Raft log — only the small metadata commit does, which is the design's first invariant.
  • The S3 endpoint joins the cluster. hamster cluster run -s3 <addr> puts the full S3 API on every node: reads from the local replica, mutations as Raft proposals, objects through the erasure-coded data path. (Leader-only writes for now — a non-leader answers 503 SlowDown and clients retry elsewhere; multipart and server-side copy are refused on this path until their erasure-coded design lands.)
  • The write-ack rule, mechanically. On the healthy path, all k+m shards must be durable before a write is acknowledged; a degraded write acks at a hard floor of k+1 durable shards and is refused below that floor with SlowDown. The metadata commit is the linearization point and happens only after the ack rule is met, so an object's durability budget is honest at the moment it is acknowledged.
  • Self-healing repair. A repair sweep scrubs every shard against its replicated checksum and rebuilds missing or bit-rotted shards from any k verified survivors. Corruption is found by hashing and never laundered into a rebuild; a rotted shard is healed without anyone having to read the object first.
  • Bounded memory, random-access reads. Windowed shard transfer keeps memory bounded per in-flight transfer regardless of object size, and a ranged GET moves only the shard slices it actually covers.
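
As a rough illustration of the repair sweep described in the list above, the sketch below hashes each shard against a recorded checksum, separates verified survivors from missing or rotted shards, and checks whether at least k verified survivors remain to rebuild from. It is not Hamster's code: the `shard` type, `newShard`, `scrub`, and `repairable` are hypothetical names, SHA-256 is an assumed checksum algorithm (the notes do not name one), and the actual erasure-code rebuild is left out.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// shard is a hypothetical in-memory stand-in for one stored shard.
type shard struct {
	data []byte            // nil models a missing shard
	want [sha256.Size]byte // checksum from replicated metadata (SHA-256 is an assumption)
}

// newShard records the checksum of data at write time.
func newShard(data []byte) shard {
	return shard{data: data, want: sha256.Sum256(data)}
}

// scrub hashes every shard and returns the indices of verified survivors
// and of shards that are missing or fail their checksum.
func scrub(shards []shard) (verified, damaged []int) {
	for i, s := range shards {
		if s.data != nil && sha256.Sum256(s.data) == s.want {
			verified = append(verified, i)
		} else {
			damaged = append(damaged, i)
		}
	}
	return verified, damaged
}

// repairable reports whether enough verified shards survive to rebuild:
// only shards that passed the hash check count toward k.
func repairable(k int, verified []int) bool {
	return len(verified) >= k
}

func main() {
	const k, m = 4, 2
	shards := make([]shard, k+m)
	for i := range shards {
		shards[i] = newShard([]byte(fmt.Sprintf("shard-%d payload", i)))
	}

	// Flip one bit in two shards to model bitrot.
	shards[1].data[0] ^= 0x01
	shards[4].data[0] ^= 0x01
	verified, damaged := scrub(shards)
	fmt.Println("verified:", verified, "damaged:", damaged, "repairable:", repairable(k, verified))

	// Lose a third shard: beyond tolerance for 4+2.
	shards[5].data = nil
	verified, damaged = scrub(shards)
	fmt.Println("verified:", verified, "damaged:", damaged, "repairable:", repairable(k, verified))
}
```

Because a rotted shard never enters the `verified` set, it cannot be used as rebuild input, which is the sense in which corruption is not laundered into a rebuild.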

How it's verified

  • The deterministic simulation harness drives the whole data path — placement, shard transfer, the ack rule, repair — through seeded cluster schedules: crashed receivers, down nodes, floor refusals, mid-PUT coordinator loss, degraded reads through crashed holders, an emptied node healed, two-shard bitrot rebuilt without a read, beyond-tolerance loss reported honestly, and crash-mid-sweep convergence. Durability is checked by decoding shard files off the surviving disks, not by trusting an ack.
  • A six-node end-to-end suite runs real processes over loopback mTLS, stores objects 4+2, and kills nodes mid-workload: reads reconstruct around the loss, writes ack at the floor, and once the cluster is below the floor, writes refuse honestly while reads keep serving at exactly k.
  • The race detector and the v0.1 compatibility suite (aws CLI, rclone, restic, s3cmd) keep passing.
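
The 4+2 behaviour exercised by the six-node suite follows directly from the write-ack rule. The sketch below is a minimal model of that rule, assuming the decision depends only on k, m, and the number of shards made durable; `AckDecision` and `decideAck` are hypothetical names for illustration, not Hamster's API.

```go
package main

import "fmt"

// AckDecision is a hypothetical name for the outcome of the write-ack rule.
type AckDecision int

const (
	Refuse      AckDecision = iota // fewer than k+1 durable shards: answer SlowDown
	AckDegraded                    // at least k+1 but fewer than k+m durable shards
	AckHealthy                     // all k+m shards durable
)

func (d AckDecision) String() string {
	switch d {
	case AckHealthy:
		return "ack (healthy)"
	case AckDegraded:
		return "ack (degraded)"
	default:
		return "refuse (SlowDown)"
	}
}

// decideAck applies the rule as the notes describe it: ack when all k+m
// shards are durable, ack degraded down to a hard floor of k+1, and
// refuse below that floor.
func decideAck(k, m, durable int) AckDecision {
	switch {
	case durable >= k+m:
		return AckHealthy
	case durable >= k+1:
		return AckDegraded
	default:
		return Refuse
	}
}

func main() {
	const k, m = 4, 2
	for durable := k + m; durable >= k; durable-- {
		fmt.Printf("%d/%d shards durable: %s\n", durable, k+m, decideAck(k, m, durable))
	}
}
```

For 4+2, six durable shards ack on the healthy path, five ack at the floor, and four are refused, even though four verified shards are still enough to serve a read at exactly k.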

Binaries below are static (CGO_ENABLED=0), version-stamped (hamster version), with SHA-256 checksums in SHA256SUMS.

Next up, v0.4: partitioned placement made real, with a stored, versioned cluster layout, zone-aware spread, capacity weighting, and rebalance, so a cluster can finally grow its data-plane membership after data exists.