v0.8.0 — Π.14 (adaptive predicate-dispatch)
Highlights
Ships adaptive predicate-dispatch — the codec now probes the first N pages of a chunk with the fused kernel, measures selectivity, and decides per-chunk whether to emit a bitmap (fused, wins at low selectivity) or a values vector (materialised, wins at high selectivity).
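The probe-then-decide step described above can be sketched roughly as follows. This is a minimal illustration, not the crate's implementation: `SketchDispatch`, `popcount_prefix`, and `choose_dispatch` are hypothetical names, the bitmap is assumed to be LSB-first packed into `u64` words, and both the tie-breaking at the threshold and the empty-probe default are assumptions.

```rust
/// Hypothetical stand-in for the per-chunk decision (not the crate's `Dispatch` type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SketchDispatch {
    /// Emit a bitmap via the fused kernel (wins at low selectivity).
    FusedBitmap,
    /// Emit a materialised values vector (wins at high selectivity).
    MaterialisedValues,
}

/// Count set bits among the first `n_bits` bits of an LSB-first packed bitmap.
pub fn popcount_prefix(bitmap: &[u64], n_bits: usize) -> usize {
    assert!(n_bits <= bitmap.len() * 64, "prefix longer than bitmap");
    let full_words = n_bits / 64;
    let tail_bits = n_bits % 64;
    let mut count: usize = bitmap[..full_words]
        .iter()
        .map(|w| w.count_ones() as usize)
        .sum();
    if tail_bits > 0 {
        // Mask off bits beyond the prefix in the last, partially used word.
        let mask = (1u64 << tail_bits) - 1;
        count += (bitmap[full_words] & mask).count_ones() as usize;
    }
    count
}

/// Decide the output shape from the rows probed so far.
/// Assumption: selectivity at or above `threshold` picks the materialised path,
/// and an empty probe defaults to the fused bitmap.
pub fn choose_dispatch(probe_bitmap: &[u64], probed_rows: usize, threshold: f64) -> SketchDispatch {
    if probed_rows == 0 {
        return SketchDispatch::FusedBitmap;
    }
    let selected = popcount_prefix(probe_bitmap, probed_rows);
    let selectivity = selected as f64 / probed_rows as f64;
    if selectivity < threshold {
        SketchDispatch::FusedBitmap
    } else {
        SketchDispatch::MaterialisedValues
    }
}
```

In this sketch the probe cost is a popcount over the bitmap the fused kernel already produced, so the decision itself adds little work beyond the probed pages.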
New surface
- ematix_parquet_codec::adaptive module: Dispatch, PageProbe, AdaptiveDictPredicate, AdaptiveDispatchOptions, AdaptiveChunkOutput<T> / AdaptiveOutputKind<T>, SelectivityProbe, popcount_bitmap_prefix, probe_page_fused, run_adaptive_dict_chunk<T: Copy>.
- ematix_parquet_codec::read::read_column_{i32,i64,f64}_predicate_adaptive: per-type façade entry points. The predicate is evaluated against dict entries (≤ dict.len() invocations per chunk); returns AdaptiveChunkOutput<T>.
- Optional Fn(SelectivityProbe) telemetry callback exposes the per-chunk dispatch decision.
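To illustrate why the predicate runs at most dict.len() times per chunk, here is a small sketch of dictionary-level predicate evaluation. It assumes the chunk has already been decoded into a dictionary slice plus one `u32` index per row; `dict_predicate_bitmap` is a hypothetical helper, not one of the functions listed above, and the real kernels work on RLE/bit-packed pages rather than a plain index slice.

```rust
/// Evaluate `pred` once per dictionary entry, then translate row indices
/// into an LSB-first packed bitmap of matching rows.
/// Hypothetical helper for illustration only.
pub fn dict_predicate_bitmap<T: Copy>(
    dict: &[T],
    indices: &[u32],
    pred: impl Fn(T) -> bool,
) -> Vec<u64> {
    // One predicate call per dictionary entry, regardless of row count.
    let matches: Vec<bool> = dict.iter().map(|&v| pred(v)).collect();

    let mut bitmap = vec![0u64; (indices.len() + 63) / 64];
    for (row, &idx) in indices.iter().enumerate() {
        // Per-row work is a table lookup, not a predicate call.
        if matches[idx as usize] {
            bitmap[row / 64] |= 1u64 << (row % 64);
        }
    }
    bitmap
}
```

With a three-entry dictionary and six rows, for example, the predicate is invoked three times and the per-row loop only consults the precomputed match table.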
Bench-derived threshold
DEFAULT_THRESHOLD = 0.10, from a 7-point sweep on TPC-H SF1 lineitem l_shipdate (aarch64, median of 51 release-mode iterations). The crossover where materialised first beats fused+gather for a values-consuming caller lands right at 10% selectivity. The full table is baked into the AdaptiveDictPredicate::DEFAULT_THRESHOLD docstring as an in-code reference for future retuning.
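For anyone retuning the threshold on other hardware, the crossover can be read off a sweep with something like the sketch below. `SweepPoint` and `crossover` are hypothetical names, and the timing fields are placeholders for whatever the benchmark harness reports; no measured values from the release are reproduced here.

```rust
/// One row of a hypothetical selectivity sweep (timings in nanoseconds).
#[derive(Debug, Clone, Copy)]
pub struct SweepPoint {
    pub selectivity: f64,
    pub fused_gather_ns: f64,
    pub materialised_ns: f64,
}

/// Return the lowest swept selectivity at which the materialised path
/// is strictly faster than fused+gather, or `None` if it never wins.
pub fn crossover(points: &[SweepPoint]) -> Option<f64> {
    let mut sorted = points.to_vec();
    // Sort ascending by selectivity so `find` returns the first win.
    sorted.sort_by(|a, b| a.selectivity.total_cmp(&b.selectivity));
    sorted
        .iter()
        .find(|p| p.materialised_ns < p.fused_gather_ns)
        .map(|p| p.selectivity)
}
```

A caller who lands on a different crossover would pass it through the custom threshold override rather than relying on the default.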
Constraints
- Adaptive entry points are dict-only. Chunks with PLAIN-encoded data pages return InvalidInput; callers should fall back to read_column_*_masked_into for those.
- BYTE_ARRAY is not covered (T: Copy doesn't hold for Vec<u8>); follow-up if requested.
- Bitmap-consuming callers (filter chains, COUNT aggregator) should stay on the static decode_rle_dictionary_predicate_bitmap entry point: fused always wins for that output shape, and the adaptive runner would add per-chunk dispatch overhead for no gain.
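The dict-only fallback in the first constraint might be wrapped on the caller side along these lines. This is a sketch under assumptions: `SketchError` is a hypothetical stand-in for the codec's error type (the notes only say the adaptive path returns InvalidInput), and the two closures stand in for the adaptive and masked readers, whose real signatures are not shown here.

```rust
/// Hypothetical error type; only the `InvalidInput` case is taken from the notes.
#[derive(Debug, PartialEq)]
pub enum SketchError {
    InvalidInput,
    Other(String),
}

/// Try the adaptive reader first; if it rejects the chunk as `InvalidInput`
/// (for example, PLAIN-encoded data pages), run the masked reader instead.
/// Any other error is returned unchanged.
pub fn read_with_fallback<T>(
    adaptive: impl FnOnce() -> Result<T, SketchError>,
    masked: impl FnOnce() -> Result<T, SketchError>,
) -> Result<T, SketchError> {
    match adaptive() {
        Err(SketchError::InvalidInput) => masked(),
        other => other,
    }
}
```

Only InvalidInput triggers the fallback in this sketch, so genuine I/O or corruption errors are not masked by a second read attempt.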
Test coverage
20 tests across the Π.14 surface: 4 unit + 4 happy-path oracle + 7 extended acceptance (widths bw ∈ {14, 16, 18}, probe-pages edges, custom threshold override, mid-chunk selectivity shift) + 5 façade integration tests via the codec's own dict writer.
Crates published
- ematix-parquet-format 0.8.0
- ematix-parquet-io 0.8.0
- ematix-parquet-crypto 0.8.0
- ematix-parquet-codec 0.8.0
- ematix-parquet-async 0.8.0
🤖 Generated with Claude Code