v0.9.1 — u8 dict + BYTE_ARRAY adaptive + bloom builder
Highlights
Three additive, opportunistic features bundled as a patch release on top of v0.9.0. No API breaks and no behaviour changes for existing callers.
u8 dict-indices reader for bw ≤ 8 columns
- read::DictPreservedColumnU8 { dict_bytes, dict_offsets, indices: Vec<u8> }
- read::read_column_byte_array_dict_preserved_u8 + ..._u8_into
- dict::decode_rle_dictionary_indices_u8 + ..._u8_into
Saves 3 bytes/row (u8 instead of u32 indices) on dict-encoded columns with ≤ 256 unique values, which covers most TPC-H string columns: l_returnflag, l_linestatus, status enums, etc. Unlocks Arrow DictionaryArray<UInt8, T> materialisation in ematix-flow with a 4× smaller indices buffer. Errors deterministically on (a) a dictionary with more than 256 entries, (b) any data page with bw > 8, or (c) a missing dictionary page. In each case the caller falls back to the u32 variant.
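The eligibility rules and the u32 fallback described above can be sketched as follows. This is a minimal illustration, not the crate's implementation: `U8PathError`, `check_u8_eligible`, `Indices`, and `narrow_or_fallback` are hypothetical names, and the real reader decodes straight into `Vec<u8>` rather than narrowing an already decoded `Vec<u32>`.

```rust
/// Hypothetical error type mirroring conditions (a) to (c) above.
#[derive(Debug, PartialEq)]
pub enum U8PathError {
    DictTooLarge(usize),
    BitWidthTooWide(u8),
    NoDictionaryPage,
}

/// Hypothetical pre-flight check: may this column chunk use u8 indices?
/// `dict_len` is None when the chunk has no dictionary page.
pub fn check_u8_eligible(
    dict_len: Option<usize>,
    page_bit_widths: &[u8],
) -> Result<(), U8PathError> {
    let len = dict_len.ok_or(U8PathError::NoDictionaryPage)?;
    if len > 256 {
        return Err(U8PathError::DictTooLarge(len));
    }
    if let Some(&bw) = page_bit_widths.iter().find(|&&bw| bw > 8) {
        return Err(U8PathError::BitWidthTooWide(bw));
    }
    Ok(())
}

/// Indices in whichever width the chunk qualifies for.
#[derive(Debug, PartialEq)]
pub enum Indices {
    U8(Vec<u8>),
    U32(Vec<u32>),
}

/// Use 1 byte per row when eligible, otherwise keep the 4-byte indices.
pub fn narrow_or_fallback(
    dict_len: Option<usize>,
    page_bit_widths: &[u8],
    decoded: Vec<u32>,
) -> Indices {
    match check_u8_eligible(dict_len, page_bit_widths) {
        // With bw <= 8 every index is below 256, so the cast is lossless.
        Ok(()) => Indices::U8(decoded.into_iter().map(|i| i as u8).collect()),
        Err(_) => Indices::U32(decoded),
    }
}
```

Because the checks depend only on dictionary size and page bit widths, the same chunk always takes the same path, which is what makes the fallback predictable for callers.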
BYTE_ARRAY adaptive façade (closes the v0.8 gap)
- read::read_column_byte_array_predicate_adaptive(file, rg, col, predicate, opts, telemetry)
- read::AdaptiveByteArrayChunkOutput + read::AdaptiveByteArrayOutputKind { Bitmap, Values { bytes, offsets } }
Same dispatch contract as the scalar adaptive entry points (i32/i64/f64) introduced in v0.8.0. Materialized output is Arrow-style (bytes, offsets) so consumers get the same shape as read_column_byte_array_offsets. Predicate is Fn(&[u8]) -> bool evaluated against dict entries (≤ dict.len() calls per chunk).
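To illustrate why the predicate runs at most dict.len() times per chunk, here is a minimal sketch of dictionary-side predicate evaluation. The `Dict` struct and both functions are hypothetical stand-ins rather than the crate's API, and a `Vec<bool>` stands in for a packed bitmap to keep the example short.

```rust
/// Arrow-style dictionary: entry i is bytes[offsets[i]..offsets[i + 1]].
pub struct Dict<'a> {
    pub bytes: &'a [u8],
    pub offsets: &'a [u32],
}

/// Evaluate the predicate once per dictionary entry.
pub fn dict_entry_matches<P>(dict: &Dict, predicate: P) -> Vec<bool>
where
    P: Fn(&[u8]) -> bool,
{
    dict.offsets
        .windows(2)
        .map(|w| predicate(&dict.bytes[w[0] as usize..w[1] as usize]))
        .collect()
}

/// Expand per-entry results to per-row results with a table lookup,
/// so no row ever invokes the predicate directly.
pub fn row_bitmap(indices: &[u32], matches: &[bool]) -> Vec<bool> {
    indices.iter().map(|&i| matches[i as usize]).collect()
}
```

For a column such as l_returnflag with three distinct values, the predicate runs three times no matter how many rows the chunk holds.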
Split-Block Bloom Filter builder
- bloom::SplitBlockBloomFilterBuilder with insert_hash / insert_bytes / into_bytes
- bloom::optimal_num_blocks(n, fpp) -> u32 (rounds to next power of two)
Symmetric to the Π.6c decoder. Round-trips byte-stable through SplitBlockBloomFilter::from_bytes. Full writer-integration (emitting filters into a parquet file's body + setting ColumnMetaData.bloom_filter_offset) is a deferred follow-up that needs format-crate metadata-writer work.
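For readers unfamiliar with the format, the sketch below shows one way a split-block bloom filter builder can work, following the block layout and salt constants of the Parquet specification. `SbbfSketch` is a hypothetical type, not the crate's `SplitBlockBloomFilterBuilder`. The sizing formula is the one commonly used for 8-probe block filters and is assumed, not confirmed, to match the crate's `optimal_num_blocks`. Hashing raw bytes (xxHash64 in the spec) is omitted, so only pre-computed 64-bit hashes are inserted.

```rust
/// Salt constants from the Parquet split-block bloom filter specification.
const SALT: [u32; 8] = [
    0x47b6137b, 0x44974d91, 0x8824ad5b, 0xa2b7289d,
    0x705495c7, 0x2df1424b, 0x9efc4947, 0x5c6bfb31,
];

/// Assumed sizing rule: bits = -8n / ln(1 - fpp^(1/8)), expressed in
/// 256-bit blocks and rounded up to the next power of two.
pub fn optimal_num_blocks(n: u64, fpp: f64) -> u32 {
    let bits = -8.0 * (n as f64) / (1.0 - fpp.powf(1.0 / 8.0)).ln();
    let blocks = (bits / 256.0).ceil().max(1.0) as u32;
    blocks.next_power_of_two()
}

/// Hypothetical builder: each block is eight 32-bit words (256 bits).
pub struct SbbfSketch {
    blocks: Vec<[u32; 8]>,
}

impl SbbfSketch {
    pub fn new(num_blocks: u32) -> Self {
        Self { blocks: vec![[0u32; 8]; num_blocks.max(1) as usize] }
    }

    /// One bit per word, chosen by the top 5 bits of x * salt.
    fn mask(x: u32) -> [u32; 8] {
        let mut m = [0u32; 8];
        for i in 0..8 {
            m[i] = 1u32 << (x.wrapping_mul(SALT[i]) >> 27);
        }
        m
    }

    /// The upper 32 bits of the hash select the block.
    fn block_index(&self, hash: u64) -> usize {
        (((hash >> 32) * self.blocks.len() as u64) >> 32) as usize
    }

    pub fn insert_hash(&mut self, hash: u64) {
        let idx = self.block_index(hash);
        let mask = Self::mask(hash as u32);
        for (word, bit) in self.blocks[idx].iter_mut().zip(mask) {
            *word |= bit;
        }
    }

    pub fn check_hash(&self, hash: u64) -> bool {
        let block = &self.blocks[self.block_index(hash)];
        let mask = Self::mask(hash as u32);
        block.iter().zip(mask).all(|(word, bit)| word & bit != 0)
    }

    /// Serialise the words little-endian, block after block.
    pub fn into_bytes(self) -> Vec<u8> {
        self.blocks.iter().flatten().flat_map(|w| w.to_le_bytes()).collect()
    }

    /// Inverse of `into_bytes`; rejects lengths that are not whole blocks.
    pub fn from_bytes(bytes: &[u8]) -> Option<Self> {
        if bytes.is_empty() || bytes.len() % 32 != 0 {
            return None;
        }
        let blocks = bytes
            .chunks_exact(32)
            .map(|chunk| {
                let mut block = [0u32; 8];
                for (word, b) in block.iter_mut().zip(chunk.chunks_exact(4)) {
                    *word = u32::from_le_bytes([b[0], b[1], b[2], b[3]]);
                }
                block
            })
            .collect();
        Some(Self { blocks })
    }
}
```

Serialising and re-parsing this sketch yields identical bytes, which is the round-trip property the release note describes for the real builder and decoder.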
Crates published
- ematix-parquet-format 0.9.1
- ematix-parquet-io 0.9.1
- ematix-parquet-crypto 0.9.1
- ematix-parquet-codec 0.9.1
- ematix-parquet-async 0.9.1