clay-codes

A Rust implementation of Clay (Coupled-Layer) erasure codes, based on the FAST'18 paper "Clay Codes: Moulding MDS Codes to Yield an MSR Code" (PDF). The construction was originally implemented in Ceph and is now part of its master codebase.

Why Clay codes?

In distributed storage, node failures are routine. When a node goes down, the system must repair it by fetching data from surviving nodes. With Reed-Solomon codes, repairing a single node means downloading k full chunks across the network -- even though you only need to reconstruct one.

Clay codes are MSR (Minimum Storage Regenerating) codes: they provide the same fault tolerance and storage overhead as Reed-Solomon, but repair a failed node by downloading only a fraction of each helper's data. In practice, this reduces repair network traffic by up to 2.9x and repair time by up to 3x.

Quickstart

Add to your Cargo.toml:

[dependencies]
clay-codes = "0.1"

Encode and Decode

use clay_codes::ClayCode;
use std::collections::HashMap;

// 4 data chunks, 2 parity chunks, 5 helpers for repair
let clay   = ClayCode::new(4, 2, 5).unwrap();
let data   = b"Hello, Clay codes!";
let chunks = clay.encode(data);

// Simulate losing node 0 -- collect the surviving chunks
let mut available: HashMap<usize, Vec<u8>> = HashMap::new();
for (node, chunk) in chunks.iter().enumerate() {
    if node != 0 {
        available.insert(node, chunk.clone());
    }
}

// Recover the original data from the remaining 5 chunks
let recovered = clay.decode(&available, &[0]).unwrap();
assert_eq!(&recovered[..data.len()], &data[..]);

Bandwidth-Optimal Repair

This is the core advantage over Reed-Solomon. Rather than downloading full chunks from k nodes, minimum_to_repair tells you exactly which sub-chunks to fetch from each helper, and repair reconstructs the lost node from that partial data.

use clay_codes::ClayCode;
use std::collections::HashMap;

let clay           = ClayCode::new(4, 2, 5).unwrap();
let data           = b"Hello, Clay codes!";
let chunks         = clay.encode(data);
let chunk_size     = chunks[0].len();
let sub_chunk_size = chunk_size / clay.sub_chunk_no;

// Which sub-chunks do we need from each helper to repair node 0?
let helpers    = vec![1, 2, 3, 4, 5];
let repair_map = clay.minimum_to_repair(0, &helpers).unwrap();

// Fetch only those sub-chunks from each helper node
let mut partial: HashMap<usize, Vec<u8>> = HashMap::new();

for (helper, sub_chunks) in &repair_map {
    let mut buf = Vec::new();
    for &sc in sub_chunks {
        let start = sc * sub_chunk_size;
        let end   = start + sub_chunk_size;
        buf.extend_from_slice(&chunks[*helper][start..end]);
    }
    partial.insert(*helper, buf);
}

// Reconstruct node 0 from the partial data
let recovered = clay.repair(0, &partial, chunk_size).unwrap();
assert_eq!(recovered, chunks[0]);

Parameters

Clay codes are configured with three values:

Parameter	Description
`k`	Number of data chunks
`m`	Number of parity chunks (tolerates up to `m` simultaneous failures)
`d`	Number of helper nodes for repair (`k+1 <= d <= k+m-1`)

The remaining parameters are derived automatically:

Derived	Formula	Meaning
`q`	`d - k + 1`	Coupling factor
`alpha`	`q^t`	Sub-chunks per chunk (sub-packetization level)
`beta`	`alpha / q`	Sub-chunks downloaded per helper during repair

Example Configurations

Config (k, m, d)	alpha	beta	Repair BW vs RS
(4, 2, 5)	8	4	37.5% savings
(9, 3, 11)	81	27	59.3% savings
(10, 4, 13)	256	64	67.5% savings

Higher d - k means more savings, at the cost of larger sub-packetization.

Documentation

API reference: docs.rs/clay-codes
Implementation guide: docs/clay-practical-implementation.md -- storage layouts, repair access patterns, metadata schemas, and performance considerations for integrating Clay codes into a distributed storage system.
Paper notes: docs/clay-codes-fast18.md -- detailed notes on the FAST'18 paper including construction, algorithms, and experimental results.

License

Licensed under the Apache License, Version 2.0. See LICENSE for details.

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
benches		benches
docs		docs
examples		examples
src		src
tests		tests
.gitignore		.gitignore
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

clay-codes

Why Clay codes?

Quickstart

Encode and Decode

Bandwidth-Optimal Repair

Parameters

Example Configurations

Documentation

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

clay-codes

Why Clay codes?

Quickstart

Encode and Decode

Bandwidth-Optimal Repair

Parameters

Example Configurations

Documentation

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages