Release v0.3.0 — erasure coding and self-healing repair · hamster-storage/hamster

The third Hamster release: erasure coding and self-healing repair — and the S3 endpoint joins the cluster.

Dev preview. Objects now live erasure-coded across a real cluster, but several limits remain: the S3 write path commits only on the Raft leader (clients retry elsewhere); multipart and server-side copy are not on the cluster path yet; the data-plane membership is effectively fixed once data exists (rebalance lands in v0.4); and on-disk and on-wire formats may still change between v0 releases. Please don't trust real data to it. That said, the cluster preview now stores objects, not just metadata.

What's in v0.3

Erasure-coded objects, end to end. A PUT erasure-codes the body into k+m self-describing shards spread across k+m distinct nodes (no two shards of one object ever share a node); a GET reconstructs from any k. Storage profiles follow an auto ladder as the cluster grows, with a small-object k=1 rule. Object data never passes through the Raft log — only the small metadata commit does, which is the design's first invariant.
The S3 endpoint joins the cluster. hamster cluster run -s3 <addr> puts the full S3 API on every node: reads from the local replica, mutations as Raft proposals, objects through the erasure-coded data path. (Leader-only writes for now — a non-leader answers 503 SlowDown and clients retry elsewhere; multipart and server-side copy are refused on this path until their erasure-coded design lands.)
The write-ack rule, mechanically. On the healthy path a write acks once all k+m shards are durable; a degraded write acks at a hard floor of k+1 durable shards and is refused below it with SlowDown. The metadata commit is the linearization point and happens only after the ack rule is met, so an object's durability budget is honest at the moment it is acknowledged.
Self-healing repair. A repair sweep scrubs every shard against its replicated checksum and rebuilds missing or bit-rotted shards from any k verified survivors. Corruption is found by hashing and never laundered into a rebuild; a rotted shard is healed without anyone having to read the object first.
Bounded memory, random-access reads. Windowed shard transfer keeps memory bounded per in-flight transfer regardless of object size, and a ranged GET moves only the shard slices it actually covers.
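
The verify-then-rebuild decision behind the repair sweep can be illustrated with a minimal sketch. This is not Hamster's code: the shard type, the scrub and repairable helpers, and the choice of SHA-256 as the checksum are all assumptions made for illustration. It shows only the rule stated above, that a shard counts as a survivor only if it hashes to its recorded checksum, and that a rebuild needs at least k such survivors.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// shard is a hypothetical in-memory view of one stored shard.
type shard struct {
	index int    // position in the k+m stripe
	data  []byte // nil stands for a missing shard file
}

// checksum is the hash assumed for this sketch (SHA-256); the release
// notes do not say which hash Hamster actually uses.
func checksum(data []byte) [32]byte {
	return sha256.Sum256(data)
}

// scrub hashes every shard and compares it with the expected checksum.
// It returns the indexes that verified and the indexes that need a rebuild.
// A corrupted shard is never counted as a survivor.
func scrub(shards []shard, want map[int][32]byte) (verified, damaged []int) {
	for _, s := range shards {
		if s.data != nil && checksum(s.data) == want[s.index] {
			verified = append(verified, s.index)
		} else {
			damaged = append(damaged, s.index)
		}
	}
	return verified, damaged
}

// repairable reports whether at least k verified survivors remain,
// which is the minimum needed to reconstruct the damaged shards.
func repairable(verified []int, k int) bool {
	return len(verified) >= k
}

func main() {
	const k, m = 4, 2
	shards := make([]shard, k+m)
	want := make(map[int][32]byte)
	for i := range shards {
		shards[i] = shard{index: i, data: []byte(fmt.Sprintf("shard-%d", i))}
		want[i] = checksum(shards[i].data)
	}
	shards[1].data[0] ^= 0xFF // simulate bitrot in one shard
	shards[4].data = nil      // simulate a missing shard file

	verified, damaged := scrub(shards, want)
	fmt.Println(verified, damaged, repairable(verified, k))
	// prints: [0 2 3 5] [1 4] true
}
```

With two of six shards damaged in a 4+2 stripe, four verified survivors remain, so the sketch reports the object as repairable. A third damaged shard would leave only three survivors and the same check would report loss beyond tolerance.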

How it's verified

The deterministic simulation harness drives the whole data path — placement, shard transfer, the ack rule, repair — through seeded cluster schedules: crashed receivers, down nodes, floor refusals, mid-PUT coordinator loss, degraded reads through crashed holders, an emptied node healed, two-shard bitrot rebuilt without a read, beyond-tolerance loss reported honestly, and crash-mid-sweep convergence. Durability is checked by decoding shard files off the surviving disks, not by trusting an ack.
A six-node end-to-end suite runs real processes over loopback mTLS, stores objects 4+2, and kills nodes mid-workload: reads reconstruct around the loss, writes ack at the floor, and once the cluster is below the floor, writes refuse honestly while reads keep serving at exactly k.
The race detector and the v0.1 compatibility suite (aws CLI, rclone, restic, s3cmd) keep passing.
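
The floor behaviour the six-node suite exercises follows directly from the write-ack rule, and a small sketch makes the arithmetic concrete. The names decideAck, canRead, and the ackDecision constants are hypothetical, not Hamster's API; the sketch assumes one shard per node, so each node that is down removes exactly one durable shard.

```go
package main

import "fmt"

// ackDecision is a hypothetical outcome type for a PUT.
type ackDecision int

const (
	ackFull        ackDecision = iota // all k+m shards durable
	ackDegraded                       // at least k+1 shards durable
	refuseSlowDown                    // below the floor: refuse the write
)

func (d ackDecision) String() string {
	switch d {
	case ackFull:
		return "ack-full"
	case ackDegraded:
		return "ack-degraded"
	default:
		return "refuse-SlowDown"
	}
}

// decideAck applies the write-ack rule from the release notes:
// ack with all k+m shards durable, ack degraded down to a hard
// floor of k+1, and refuse below that.
func decideAck(k, m, durable int) ackDecision {
	switch {
	case durable >= k+m:
		return ackFull
	case durable >= k+1:
		return ackDegraded
	default:
		return refuseSlowDown
	}
}

// canRead reports whether a GET can reconstruct: any k shards suffice.
func canRead(k, available int) bool {
	return available >= k
}

func main() {
	const k, m = 4, 2
	const nodes = k + m // six nodes, one shard each (assumed)
	for down := 0; down <= 3; down++ {
		up := nodes - down
		fmt.Printf("down=%d write=%s read=%t\n", down, decideAck(k, m, up), canRead(k, up))
	}
}
```

For 4+2 the floor is five shards. With one node down, writes still ack in degraded mode. With two down, writes are refused while reads keep working at exactly k = 4. With three down, reads fail too.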

Binaries below are static (CGO_ENABLED=0), version-stamped (hamster version), with SHA-256 checksums in SHA256SUMS. Next up, v0.4: partitioned placement made real — a stored, versioned cluster layout with zone-aware spread, capacity weighting, and rebalance, so a cluster can finally grow its data-plane membership after data exists.
