Release v0.11.0 — full SIMD specialisation parity (NEON + AVX2) · ryan-evans-git/ematix-parquet

v0.11.0 closes the SIMD specialisation table on both architectures. Every production bit width that the scalar fallback was serving at ~7-9 GB/s now has a hand-tuned SIMD kernel on both AArch64 NEON and x86_64 AVX2.

Coverage delta vs v0.10.0

Width	NEON v0.10	NEON v0.11	AVX2 v0.10	AVX2 v0.11
1	scalar	✓ added	scalar	✓ added
4	✓ shipped	✓	scalar	✓ added
5	scalar	✓ added	scalar	✓ added
8	✓ shipped	✓	scalar	✓ added
12, 14-18	✓	✓	✓	✓
20	scalar	✓ added	scalar	✓ added
21	scalar	✓ added	scalar	✓ added

All new kernels landed in #64; the release PR is #65.

Per-width strategy

bw=1 — broadcast each source byte to 8 lanes, AND with per-lane bit-mask, compare-eq → 0/1 outputs
bw=4 — nibble extract (AND 0x0F + shift-right-4), interleave per parquet LSB-first packing, widen
bw=5 — extract one u16 per lane via shuffle, variable-shift, mask, widen
bw=8 — trivial byte-aligned widen (vmovl chain / _mm256_cvtepu8_epi32)
bw=20 — mirrors bw=17/18: two 16-byte loads (offsets 0, 10), alternating shifts, mask 0x0F_FFFF
bw=21 — like bw=20 but lo/hi halves use different shuffles + every lane has a distinct shift
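
The bw=1, bw=4, and bw=20 strategies above can be modelled in portable scalar Rust. The sketch below is illustrative only: the function names (`unpack_bw1`, `unpack_bw4`, `unpack_bw20`) and the per-lane byte-offset and shift tables are assumptions derived from Parquet's LSB-first bit packing. It is not the code shipped in #64, which uses NEON and AVX2 intrinsics and processes all lanes at once.

```rust
/// bw=1 model: every source byte yields 8 outputs. Each "lane" ANDs the
/// byte with its own single-bit mask and compares for equality (0 or 1).
/// Hypothetical name, not part of the crate's API.
fn unpack_bw1(src: &[u8], out: &mut Vec<u32>) {
    for &byte in src {
        for lane in 0..8 {
            let mask = 1u8 << lane; // per-lane bit mask
            out.push(((byte & mask) == mask) as u32); // compare-eq -> 0/1
        }
    }
}

/// bw=4 model: LSB-first packing puts the first value in the low nibble,
/// the second in the high nibble.
fn unpack_bw4(src: &[u8], out: &mut Vec<u32>) {
    for &byte in src {
        out.push(u32::from(byte & 0x0F)); // low nibble
        out.push(u32::from(byte >> 4)); // high nibble
    }
}

/// bw=20 model: 8 values occupy 20 bytes, handled as two 10-byte halves
/// (the "offsets 0, 10" in the notes). Within a half, value i starts at
/// bit 20*i, i.e. bytes 0, 2, 5, 7 with shifts 0, 4, 0, 4.
fn unpack_bw20(src: &[u8; 20]) -> [u32; 8] {
    const BYTE_OFF: [usize; 4] = [0, 2, 5, 7]; // assumed lane byte offsets
    const SHIFT: [u32; 4] = [0, 4, 0, 4]; // alternating shifts
    let mut out = [0u32; 8];
    for i in 0..8 {
        let p = (i / 4) * 10 + BYTE_OFF[i % 4];
        // 20 bits plus a shift of at most 4 fit in three bytes.
        let word = u32::from(src[p])
            | (u32::from(src[p + 1]) << 8)
            | (u32::from(src[p + 2]) << 16);
        out[i] = (word >> SHIFT[i % 4]) & 0x0F_FFFF;
    }
    out
}
```

In the bw=20 model, four values occupy exactly 10 bytes, which is why the second half starts at offset 10, and the shifts alternate because every other value begins on a nibble boundary rather than a byte boundary.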

Widths 12, 14, 15, 16, 17, and 18 have had full NEON + AVX2 coverage since the Π.12 cycle. Widths still on the scalar const-generic path (bw=2, 3, 6, 7, 9, 10, 11, 13, 19, 22..32) get SIMD coverage on demand; v0.12.0 has since added bw=2 and bw=3.
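
For readers unfamiliar with the term, a scalar const-generic path can look roughly like the sketch below: one generic LSB-first unpacker, monomorphised per bit width, sitting behind a runtime dispatch. The names `unpack_scalar` and `unpack` and the `bool` return convention are hypothetical and chosen for illustration; the crate's actual signatures may differ.

```rust
/// Generic LSB-first unpacker. `BW` is a compile-time constant, so the
/// compiler can specialise the mask and shifts for each width.
/// Hypothetical helper, not the crate's actual API.
fn unpack_scalar<const BW: u32>(src: &[u8], out: &mut [u32]) {
    assert!(BW >= 1 && BW <= 32, "bit width must be in 1..=32");
    assert!(
        src.len() * 8 >= out.len() * BW as usize,
        "source too short for requested output"
    );
    let mask: u64 = (1u64 << BW) - 1;
    let mut acc: u64 = 0; // bit accumulator, low bits are the oldest
    let mut bits: u32 = 0; // number of valid bits currently in `acc`
    let mut bytes = src.iter();
    for slot in out.iter_mut() {
        // Refill until one whole value is available.
        while bits < BW {
            let b = *bytes.next().expect("length checked above");
            acc |= u64::from(b) << bits;
            bits += 8;
        }
        *slot = (acc & mask) as u32;
        acc >>= BW;
        bits -= BW;
    }
}

/// Runtime dispatch over a few of the widths listed above. A SIMD kernel,
/// once written for a width, would replace the scalar call in its arm.
/// Returns false for widths this sketch does not list.
fn unpack(bit_width: u32, src: &[u8], out: &mut [u32]) -> bool {
    match bit_width {
        2 => unpack_scalar::<2>(src, out),
        3 => unpack_scalar::<3>(src, out),
        6 => unpack_scalar::<6>(src, out),
        7 => unpack_scalar::<7>(src, out),
        _ => return false,
    }
    true
}
```

With this layout, adding SIMD coverage for a width is a local change to one match arm, which fits the on-demand approach described above.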

Full test suite (680 tests) green; CI ubuntu-latest exercises the AVX2 path.
