Release v0.9.0 — Π.15 (parallel multi-RG decode + NUMA awareness) · ryan-evans-git/ematix-parquet

Highlights

Ships parallel multi-row-group (multi-RG) decode end-to-end. Multi-RG files are embarrassingly parallel because each row group decodes independently; v0.9 moves the threading and NUMA awareness into the codec so consumers don't have to roll their own.

New surface

New parallel feature on ematix-parquet-codec (rayon optional; default builds stay rayon-free).
parallel::read_columns_parallel(file, &targets, opts, decode_one) — decodes a slice of (row_group, column) targets concurrently. Generic over caller closure so the same primitive handles homogeneous + heterogeneous workloads. Output preserves input order.
CancellationToken (AtomicBool, Arc-cloneable). Cooperative semantics: the token is checked at target boundaries; targets that have not started when it fires surface the new CodecError::Cancelled; in-flight decodes run to completion.
ParallelDecodeOptions { pool, cancel } — optional caller-owned Arc<rayon::ThreadPool> + cancellation handle.
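
The cooperative-cancellation and order-preserving behaviour described above can be sketched with the standard library alone. This is not the crate's implementation: `CancelFlag`, `SketchError`, and `decode_targets` are hypothetical names, and scoped threads stand in for the rayon pool. It only illustrates the stated semantics: the flag is checked once per target, a running decode is never interrupted, and results come back in input order.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a cancellation token: a shared, cloneable flag.
#[derive(Clone, Default)]
pub struct CancelFlag(Arc<AtomicBool>);

impl CancelFlag {
    pub fn cancel(&self) {
        self.0.store(true, Ordering::Release);
    }
    pub fn is_cancelled(&self) -> bool {
        self.0.load(Ordering::Acquire)
    }
}

/// Hypothetical error type; only the cancelled case matters for this sketch.
#[derive(Debug, PartialEq)]
pub enum SketchError {
    Cancelled,
}

/// Run `decode_one` over every (row_group, column) target on `workers` threads.
/// The result at index i always corresponds to targets[i].
pub fn decode_targets<T, F>(
    targets: &[(usize, usize)],
    workers: usize,
    cancel: &CancelFlag,
    decode_one: F,
) -> Vec<Result<T, SketchError>>
where
    T: Send,
    F: Fn(usize, usize) -> T + Sync,
{
    let next = AtomicUsize::new(0); // index of the next unclaimed target
    let slots: Vec<Mutex<Option<Result<T, SketchError>>>> =
        targets.iter().map(|_| Mutex::new(None)).collect();

    thread::scope(|s| {
        for _ in 0..workers.max(1) {
            s.spawn(|| loop {
                let i = next.fetch_add(1, Ordering::Relaxed);
                if i >= targets.len() {
                    break;
                }
                // The flag is consulted only here, at the target boundary.
                let out = if cancel.is_cancelled() {
                    Err(SketchError::Cancelled)
                } else {
                    let (rg, col) = targets[i];
                    Ok(decode_one(rg, col))
                };
                *slots[i].lock().unwrap() = Some(out);
            });
        }
    });

    slots
        .into_iter()
        .map(|m| m.into_inner().unwrap().expect("every slot is filled"))
        .collect()
}
```

Writing each result into a slot indexed by the target's position is what keeps the output in input order regardless of which worker finishes first.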

Linux-only NUMA (`parallel::numa` under `cfg(target_os = "linux")`)

NumaTopology::detect() — via /sys/devices/system/node/node*/cpulist.
pin_current_thread_to_node(&topology, node) — via sched_setaffinity.
build_numa_pinned_pool(num_threads) — rayon pool with workers pinned round-robin to NUMA nodes.
current_node() — via getcpu(2) syscall.
alloc_local_buffer(size) — 4 KiB-stride first-touch so the buffer lands on the calling thread's node. Combined with worker pinning, chunk bytes land on the right node — no libnuma C dep needed.
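
Two of the mechanisms listed above are simple enough to sketch without any dependencies. The code below is an illustration only, not the crate's code: `parse_cpulist` and `alloc_first_touch` are hypothetical names. It assumes the usual sysfs cpulist syntax (comma-separated ids and inclusive ranges such as `0-3,8-11`) and a 4 KiB page size.

```rust
/// Parse a Linux cpulist string such as "0-3,8-11" into individual CPU ids.
/// Returns None on malformed input. An empty list (a node with no CPUs) is valid.
pub fn parse_cpulist(s: &str) -> Option<Vec<usize>> {
    let mut cpus: Vec<usize> = Vec::new();
    for part in s.trim().split(',').filter(|p| !p.is_empty()) {
        match part.split_once('-') {
            Some((lo, hi)) => {
                let lo = lo.trim().parse::<usize>().ok()?;
                let hi = hi.trim().parse::<usize>().ok()?;
                if lo > hi {
                    return None;
                }
                cpus.extend(lo..=hi);
            }
            None => cpus.push(part.trim().parse::<usize>().ok()?),
        }
    }
    Some(cpus)
}

/// Assumed page size for the first-touch stride.
const PAGE: usize = 4096;

/// Allocate a zeroed buffer and write one byte per page from the calling thread.
/// Under Linux's default first-touch policy, the page is placed on the node of
/// the thread that first writes it, so a pinned thread gets node-local memory.
pub fn alloc_first_touch(size: usize) -> Vec<u8> {
    // vec![0; n] may be backed by lazily mapped zero pages, so each page is
    // touched explicitly; the volatile write keeps the store from being elided.
    let mut buf = vec![0u8; size];
    for off in (0..size).step_by(PAGE) {
        // SAFETY: off < size, so the pointer stays inside the allocation.
        unsafe { std::ptr::write_volatile(buf.as_mut_ptr().add(off), 0) };
    }
    buf
}
```

Placement only follows the calling thread if that thread stays on one node, which is why the first-touch buffer is paired with worker pinning.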

Bench harness

examples/bench_parallel_scaling.rs — synthetic 50-RG Snappy-compressed i64 file, sweeps thread counts 1, 2, 4, 8, … capped at the host CPU count; reports speedup + efficiency vs sequential. On Linux also exercises the NUMA-pinned pool. Ready to drop into a multi-socket AWS box for the plan acceptance numbers.
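
For readers interpreting the bench output, the two reported figures are assumed here to follow the conventional definitions: speedup is sequential time divided by parallel time, and efficiency is speedup divided by thread count. The sketch below uses hypothetical helper names and also assumes the sweep stops at the largest power of two that does not exceed the CPU count; the example file itself may handle the cap differently.

```rust
/// Thread counts 1, 2, 4, 8, ... that do not exceed `max_threads`.
/// Assumption: the cap itself is not appended when it is not a power of two.
pub fn thread_sweep(max_threads: usize) -> Vec<usize> {
    let mut counts = Vec::new();
    let mut n = 1;
    while n <= max_threads {
        counts.push(n);
        n *= 2;
    }
    counts
}

/// Returns (speedup, efficiency) for one run against the sequential baseline.
pub fn scaling(sequential_secs: f64, parallel_secs: f64, threads: usize) -> (f64, f64) {
    let speedup = sequential_secs / parallel_secs;
    (speedup, speedup / threads as f64)
}
```

With these definitions, a run that reaches 1.8× on 4 threads has an efficiency of 0.45. The host CPU count for the cap can be obtained with `std::thread::available_parallelism()`.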

Constraints

NUMA module is cfg(target_os = "linux") — portable callers stay on read_columns_parallel; NUMA-aware callers cfg-gate their own usage.
Multi-socket scaling validation (plan acceptance #1) is deferred to AWS infra. On a local single-NUMA-node host, speedup peaks at ~1.8× because reads serialize on the ParquetFile.file: Mutex<File> lock; the bench docstring documents this. Switching to pread-based unlocked I/O is a separate optimisation.
Cancellation is at target boundaries only — not within a single (rg, col) decode.
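
The pread-based alternative mentioned above can be illustrated with the standard library's Unix-only `FileExt::read_exact_at`, which reads at an explicit offset without touching the shared file cursor. This is a sketch of the general technique rather than a planned design for the crate; `read_ranges` is a hypothetical helper, and spawning one thread per range is for brevity only.

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::FileExt; // Unix-only positional reads (pread)
use std::thread;

/// Read each (offset, len) byte range concurrently from one shared file handle.
/// Positional reads carry their own offset, so no Mutex around the File is
/// needed and the reads do not serialize on a lock in user space.
pub fn read_ranges(file: &File, ranges: &[(u64, usize)]) -> io::Result<Vec<Vec<u8>>> {
    thread::scope(|s| {
        // Spawn every reader before joining any, so the reads overlap.
        let handles: Vec<_> = ranges
            .iter()
            .map(|&(offset, len)| {
                s.spawn(move || {
                    let mut buf = vec![0u8; len];
                    file.read_exact_at(&mut buf, offset)?;
                    Ok::<_, io::Error>(buf)
                })
            })
            .collect();

        // Joining in spawn order keeps the output aligned with `ranges`.
        handles
            .into_iter()
            .map(|h| h.join().expect("reader thread panicked"))
            .collect::<io::Result<Vec<_>>>()
    })
}
```

By contrast, a `Mutex<File>` with seek-then-read must hold the lock across both calls, so only one chunk read can be in flight at a time.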

Crates published

ematix-parquet-format 0.9.0
ematix-parquet-io 0.9.0
ematix-parquet-crypto 0.9.0
ematix-parquet-codec 0.9.0
ematix-parquet-async 0.9.0

🤖 Generated with Claude Code
