Reference Rust workspace for course c9 — Shipping Rust: Cargo, CI, Benchmarks & Containers in the Coursera Applied AI Engineering / Rust Data Engineering specializations.
This repo is a complete, opinionated example of what shipping a small Rust binary looks like end-to-end:
- a Cargo workspace with a typed core library, a CLI, and a benches crate
- explicit error and reporting types, not strings or panics
- 100% line coverage gated by `cargo llvm-cov`
- single-job CI that aggregates fmt / clippy / test / coverage / audit / deny / release-build / binary-size budget / bench-smoke
- a multi-stage Docker build that produces a <2 MB scratch container (stock Docker layer caching only, no external build-cache helpers)
- supply-chain hygiene via `cargo-audit` and `cargo-deny`
- repo-shape gates on stable: `bashrs lint` (Makefile + Dockerfiles), `pv lint contracts/` (provable-contract YAML), and `pmat comply`
- a tag-driven release workflow that ships prebuilt binaries to the GitHub Release page (4 Linux targets) and container images to `ghcr.io/paiml/shipping-rust` (scratch + distroless)
- dual MIT / Apache-2.0 licensing

It is the companion to course c12's `paiml/zig-from-zero`: same scope, same shape, in Rust.
| Crate | Kind | Purpose |
|---|---|---|
| `etl-core` | library | Typed CSV → JSON Lines pipeline. Rejects malformed rows into a row-aligned `Report`; never panics on bad input. |
| `etl-cli` | binary (`etl`) | Reads CSV from `--input` (path or `-`), writes JSON Lines to `--output` (path or `-`), emits the report on stderr. |
| `etl-bench` | library + bench | Synthetic CSV fixture generator (`synth_csv(n)`) used by criterion benches at 1k / 10k / 100k row sizes. |
The example dataset is fruit measurements: each input row is id,fruit,weight_g,
and each output record carries a size_bucket of Small (<100 g), Medium
(100–299 g), Large (≥300 g), or Unknown (weight missing).
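The bucketing rule above can be sketched in a few lines of Rust; this is a hypothetical rendering of the rule, not the actual etl-core source:

```rust
// Hypothetical sketch of the size_bucket rule described above; the real
// etl-core implementation may differ in names and types.
fn size_bucket(weight_g: Option<u32>) -> &'static str {
    match weight_g {
        Some(w) if w < 100 => "Small",   // < 100 g
        Some(w) if w < 300 => "Medium",  // 100–299 g
        Some(_) => "Large",              // >= 300 g
        None => "Unknown",               // weight missing
    }
}

fn main() {
    assert_eq!(size_bucket(Some(150)), "Medium");
    assert_eq!(size_bucket(Some(7800)), "Large");
    assert_eq!(size_bucket(None), "Unknown");
}
```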
The etl binary asserts two named contracts at runtime on every
successful run:
- `ROWS_IN_EQUALS_ROWS_OUT`: `rows_in == rows_out + rows_rejected`. The `Report` is row-aligned: the rows we read must equal the rows we wrote plus the rows we rejected, with nothing falling through silently.
- `REPORT_JSON_ROUNDTRIPS`: `serde_json::to_string(&report)` then `serde_json::from_str::<Report>` parses back to a value equal to the original. The report is a structured artifact, not a debug dump.
Both contracts also hold under unit and integration tests (cargo test).
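A minimal sketch of the row-alignment contract, assuming field names taken from the report JSON (the real `Report` type in etl-core may differ):

```rust
// Hypothetical sketch of ROWS_IN_EQUALS_ROWS_OUT; field names follow the
// report JSON in this README, not the actual etl-core definitions.
#[derive(Debug, PartialEq)]
struct Report {
    rows_in: u64,
    rows_out: u64,
    rows_rejected: u64,
}

impl Report {
    /// ROWS_IN_EQUALS_ROWS_OUT: every row read is either written or rejected.
    fn is_row_aligned(&self) -> bool {
        self.rows_in == self.rows_out + self.rows_rejected
    }
}

fn main() {
    let report = Report { rows_in: 4, rows_out: 2, rows_rejected: 2 };
    // The binary refuses to exit cleanly when this contract is violated.
    assert!(report.is_row_aligned());
}
```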
Requires Rust 1.95.0 (pinned by rust-toolchain.toml).
```bash
# Build everything
cargo build --workspace

# Test (unit + integration)
cargo test --workspace

# 100% line coverage gate
cargo llvm-cov --workspace --fail-under-lines 100

# Lints: clippy with -D warnings, plus the workspace lints in
# Cargo.toml (unsafe_code = "forbid", unwrap_used = "warn", panic = "warn",
# pedantic enabled)
cargo clippy --workspace --all-targets -- -D warnings

# Doc build with -D warnings
RUSTDOCFLAGS="-D warnings" cargo doc --workspace --no-deps

# Supply chain
cargo audit --deny warnings
cargo deny check

# Benches (1k / 10k / 100k rows, criterion)
cargo bench --workspace
```

```console
# CSV in, JSON Lines out, report on stderr
$ printf 'id,fruit,weight_g\n1,apple,150\n2,watermelon,7800\n' | cargo run -q --bin etl
{"id":1,"fruit":"apple","size_bucket":"Medium"}
{"id":2,"fruit":"watermelon","size_bucket":"Large"}
{"rows_in":2,"rows_out":2,"rows_rejected":0,"errors_by_kind":{}}

# Reject paths produce structured rejections, not panics
$ printf 'id,fruit,weight_g\n1,apple,150\nbad_id,banana,118\n3,,77\n4,grape,5\n' | cargo run -q --bin etl
{"id":1,"fruit":"apple","size_bucket":"Medium"}
{"id":4,"fruit":"grape","size_bucket":"Small"}
{"rows_in":4,"rows_out":2,"rows_rejected":2,"errors_by_kind":{"empty_fruit":1,"invalid_id":1}}
```

The included Dockerfile is a plain musl + scratch
multi-stage build — no external Rust build-cache helpers, just stock
Docker layer caching. The pattern follows
paiml/forjar's own Dockerfile: for a
small workspace the extra dependency is not worth the layer savings. We
rely on Docker's stock layer cache by copying workspace manifests first
so cargo fetch --locked is reused whenever sources change but
Cargo.lock does not. The final image runs as user 65532 and contains
nothing but the static etl binary.
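The manifest-first layering might look like this hypothetical sketch (the repo's actual Dockerfile will differ in detail):

```dockerfile
# Hypothetical sketch of the manifest-first layering described above,
# not the repo's actual Dockerfile. The base-image tag is assumed.
FROM rust:1.95.0-alpine AS build
WORKDIR /src
# Manifests first: this layer (and the fetch below) is reused whenever
# sources change but Cargo.lock does not. Real setups usually stub out
# src/ files here so the member manifests parse.
COPY Cargo.toml Cargo.lock ./
RUN cargo fetch --locked
# Sources second: only the layers from here down rebuild on code changes.
COPY . .
RUN cargo build --release --locked --target x86_64-unknown-linux-musl

# Final stage: nothing but the static binary, running as a non-root uid.
FROM scratch
COPY --from=build /src/target/x86_64-unknown-linux-musl/release/etl /etl
USER 65532
ENTRYPOINT ["/etl"]
```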
```console
$ docker build -t shipping-rust:latest .
$ docker images shipping-rust:latest --format "{{.Size}}"
1.5MB
$ printf 'id,fruit,weight_g\n1,apple,150\n' | docker run --rm -i shipping-rust:latest
{"id":1,"fruit":"apple","size_bucket":"Medium"}
{"rows_in":1,"rows_out":1,"rows_rejected":0,"errors_by_kind":{}}
```

A glibc-linked variant is available at Dockerfile.distroless-cc
for workloads that need a libc / TLS roots / /etc/passwd (Google's
distroless cc-debian12:nonroot base, ~25 MB). Same plain-multi-stage
layering strategy, different runtime base.
The gate job in CI runs cargo bench -- --test on every PR — that's
"smoke mode": compile each criterion bench, run its body once, fail if
it panics, don't record samples. It proves the harness works without
paying the 30s+ per-bench statistical-sampling cost on every push.
The full criterion suite — warmup + 100 samples × 3 input sizes — runs
in a separate workflow,
.github/workflows/bench.yml, on a
self-hosted intel runner from the paiml org runner pool. Triggers:
- `workflow_dispatch`: manual run from the Actions tab
- weekly cron: Sundays 06:00 UTC, drift check against `main`
- push to `main` when `etl-core/`, `etl-bench/`, or `Cargo.{toml,lock}` change
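Those triggers correspond to an `on:` block along these lines (a hypothetical sketch, not the actual bench.yml):

```yaml
on:
  workflow_dispatch: {}          # manual run from the Actions tab
  schedule:
    - cron: "0 6 * * 0"          # Sundays 06:00 UTC
  push:
    branches: [main]
    paths:
      - "etl-core/**"
      - "etl-bench/**"
      - "Cargo.toml"
      - "Cargo.lock"
```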
Why a self-hosted runner? GitHub-hosted shared VMs run with 5-15% coefficient of variation, which makes anything below a ~10% regression invisible. On bare metal CV stays under 1%, so a 2% regression is real signal. For a teaching repo, signal quality is worth the operational cost.
The workflow commits results back into bench-results/latest/,
keeping the repo as the source of truth for "is this fast?":
| Size | Mean | Throughput |
|---|---|---|
| 1k rows | ~111 µs | ~9.0M rows/sec |
| 10k rows | ~1.0 ms | ~9.7M rows/sec |
| 100k rows | ~10.4 ms | ~9.6M rows/sec |
(Numbers from a Threadripper 7960X seed run — see
bench-results/latest/SUMMARY.md
for the full meta + JSON.) Constant throughput across input sizes is
the key signal — confirms the pipeline is linear in input size, with
per-call overhead amortized away by 10k rows.
```bash
make bench         # full criterion suite locally (~30s)
make bench-smoke   # CI smoke mode (compile + run-once)
```

After a workflow run, pull and diff against the saved baseline:

```bash
git pull
cargo bench --workspace -- --baseline main
```

Tag-driven releases (see `.github/workflows/release.yml`) publish to two places on every `vX.Y.Z` tag.
1. GitHub Release: prebuilt `etl` binaries for four Linux targets, each with a `.sha256` companion:

   ```text
   etl-vX.Y.Z-x86_64-unknown-linux-musl.tar.gz
   etl-vX.Y.Z-x86_64-unknown-linux-gnu.tar.gz
   etl-vX.Y.Z-aarch64-unknown-linux-musl.tar.gz
   etl-vX.Y.Z-aarch64-unknown-linux-gnu.tar.gz
   ```

   aarch64 targets are cross-compiled with `cross` v0.2.5; x86_64 targets are native.
2. GitHub Container Registry: both Dockerfile variants are pushed as `linux/amd64` images. The package is public; anonymous `docker pull` works without a GHCR login:

   ```text
   ghcr.io/paiml/shipping-rust:latest              # newest scratch (default flavor)
   ghcr.io/paiml/shipping-rust:distroless          # newest distroless
   ghcr.io/paiml/shipping-rust:vX.Y.Z              # versioned scratch
   ghcr.io/paiml/shipping-rust:vX.Y.Z-distroless   # versioned distroless
   ```
Cutting a release: bump `[workspace.package].version` in `Cargo.toml`, tag, and push:

```bash
git tag v0.1.0
git push origin v0.1.0
```

The workflow verifies the tag against the workspace version, builds all four binaries plus both container variants, and attaches everything to an auto-generated GitHub Release. There is no crates.io publish step: shipping-rust is a teaching reference, not a published crate.
A single GitHub Actions job named gate runs the full pipeline against
both MSRV (1.95.0) and stable. Three repo-shape steps
(bashrs lint, pv lint contracts/, pmat comply) are gated to the
stable matrix entry only — the MSRV entry still proves the workspace
itself builds on the pinned channel:
```text
gate (1.95.0 | stable)
├── bashrs lint Makefile + Dockerfiles   (stable only)
├── pv lint contracts/                   (stable only)
├── pmat comply                          (stable only)
├── cargo fmt --check
├── cargo clippy -D warnings
├── cargo doc -D warnings
├── cargo test --workspace --all-targets
├── cargo test --workspace --doc
├── cargo llvm-cov --fail-under-lines 100
├── cargo audit --deny warnings
├── cargo deny check
├── cargo build --release
├── binary-size budget (<8 MB)
└── cargo bench -- --test (smoke)
```
If gate is green, the workspace is ship-ready. See
.github/workflows/ci.yml.
- Cargo workspaces are the unit of distribution, not single crates. `[workspace.package]`, `[workspace.dependencies]`, and `[workspace.lints]` are non-optional.
- Errors are types, not strings. `EtlError` (an enum) and `ErrorKind` (a discriminator the report uses) draw the line cleanly.
- Reports are row-aligned. If `rows_in != rows_out + rows_rejected`, something fell through silently, and the binary refuses to exit cleanly when that happens (the contract asserts at runtime).
- 100% line coverage is achievable when you write to it. The uncovered regions in `cargo llvm-cov`'s output are macro-expansion artifacts (clap derive, thiserror, serde derive); every executable line in our own source has a test.
- CI should be one job, not nine. A single `gate` matrix entry that aggregates fmt / clippy / doc / test / coverage / audit / deny / size-budget / bench is far easier to read at a glance than a fan-out graph.
- Containers should be smaller than your CSV input. The scratch + musl pattern lands a static Rust binary at <2 MB without any external Rust build-cache helper; Docker's stock layer cache is enough when the workspace is this size.
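The typed-error split can be sketched as follows; the variant names mirror this README's `invalid_id` / `empty_fruit` report keys, but the types are assumptions, not the actual etl-core definitions:

```rust
// Hypothetical sketch of the EtlError / ErrorKind split described above.
use std::collections::BTreeMap;

// The discriminator the report uses to bucket rejection counts.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum ErrorKind {
    InvalidId,
    EmptyFruit,
}

// The full error carries row-level context for diagnostics...
#[derive(Debug)]
enum EtlError {
    InvalidId { raw: String },
    EmptyFruit { row: usize },
}

impl EtlError {
    // ...while the report only needs the discriminator.
    fn kind(&self) -> ErrorKind {
        match self {
            EtlError::InvalidId { .. } => ErrorKind::InvalidId,
            EtlError::EmptyFruit { .. } => ErrorKind::EmptyFruit,
        }
    }
}

fn main() {
    let rejections = vec![
        EtlError::InvalidId { raw: "bad_id".into() },
        EtlError::EmptyFruit { row: 3 },
    ];
    // Aggregate into the errors_by_kind counts the report serializes.
    let mut by_kind: BTreeMap<ErrorKind, u64> = BTreeMap::new();
    for e in &rejections {
        *by_kind.entry(e.kind()).or_insert(0) += 1;
    }
    assert_eq!(by_kind[&ErrorKind::InvalidId], 1);
}
```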
Dual MIT / Apache-2.0, the standard pattern for the Rust ecosystem. See LICENSE-MIT and LICENSE-APACHE.
