perf: Add bulkGet64WithBaseline and 8-byte fast path for FixedBitWidthEncoding (#641) by xiaoxmeng · Pull Request #641 · facebookincubator/nimble

xiaoxmeng · 2026-04-05T17:15:50Z

Summary:

Referenced from MRS AusList decode optimization D98819389 (AusLongListForBitpackEncoder). Ports the key branchless byte-aligned load technique to Nimble's FixedBitWidthEncoding for general use.

Add bulk decode optimizations for 64-bit types in FixedBitWidthEncoding, targeting the selective reader and serializer/deserializer materialize() paths.

Changes:

FixedBitArray: Add bulkGet64WithBaseline() for 64-bit output with arbitrary bitWidth. Three code paths by bit width:

- bitWidth <= 32: delegates to the optimized template-unrolled 32-bit path (bulkGetWithBaseline32Into64).
- bitWidth 33-57: branchless byte-aligned loads. The sub-byte bit offset of a value is at most 7, so bitWidth + offset <= 57 + 7 = 64 and each value fits in a single 64-bit load starting at its byte address. This removes the cross-word boundary branch from the hot loop and enables better instruction-level parallelism.
- bitWidth > 57: falls back to per-element get(), which handles values that span two 64-bit words.
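
The byte-aligned load described for the 33-57 range can be sketched as below. This is a minimal illustration, not the code from this PR: `packBits` and `bulkGet64Sketch` are hypothetical names, and the sketch assumes a little-endian host, LSB-first packing, and a buffer padded with 8 trailing bytes so the final 8-byte load stays in bounds. The same arithmetic holds for any bitWidth up to 57; the PR only routes widths 33-57 through this kind of path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not part of Nimble): packs values LSB-first at
// bitWidth bits each. Appends 8 padding bytes so that an 8-byte load
// starting at the last value's byte address never reads out of bounds.
std::vector<uint8_t> packBits(const std::vector<uint64_t>& values, int bitWidth) {
  assert(bitWidth >= 1 && bitWidth <= 57);
  const uint64_t mask = (uint64_t{1} << bitWidth) - 1;
  std::vector<uint8_t> out((values.size() * bitWidth + 7) / 8 + 8, 0);
  for (size_t i = 0; i < values.size(); ++i) {
    const uint64_t bitPos = static_cast<uint64_t>(i) * bitWidth;
    uint64_t word;
    std::memcpy(&word, out.data() + (bitPos >> 3), sizeof(word));
    // Sub-byte offset is at most 7, so the shifted value still fits in 64 bits.
    word |= (values[i] & mask) << (bitPos & 7);
    std::memcpy(out.data() + (bitPos >> 3), &word, sizeof(word));
  }
  return out;
}

// Hypothetical sketch of a branchless bulk decode: one unaligned 64-bit load
// per value, then shift, mask, and add the baseline. No cross-word branch.
void bulkGet64Sketch(
    const uint8_t* data,
    size_t start,
    size_t count,
    int bitWidth,
    uint64_t baseline,
    uint64_t* out) {
  assert(bitWidth >= 1 && bitWidth <= 57);
  const uint64_t mask = (uint64_t{1} << bitWidth) - 1;
  uint64_t bitPos = static_cast<uint64_t>(start) * bitWidth;
  for (size_t i = 0; i < count; ++i, bitPos += bitWidth) {
    uint64_t word;
    // memcpy expresses an unaligned load without undefined behavior.
    std::memcpy(&word, data + (bitPos >> 3), sizeof(word));
    out[i] = ((word >> (bitPos & 7)) & mask) + baseline;
  }
}
```

For widths above 57 the shifted value can exceed 64 bits (for example 58 + 7 = 65), which is why a single load no longer suffices and a separate path is needed.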

FixedBitWidthEncoding: Extend the selective reader fast path (bulkScan + readWithVisitorFast) from 4-byte-only to also support 8-byte integral types (int64/uint64). Previously, 64-bit columns always used the slow per-element path.
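
One way to picture the widened eligibility check is a compile-time predicate on the physical type. The trait below is a hypothetical sketch (the name `kUsesBulkFastPath` does not come from Nimble), assuming the fast path is selected purely by integral type size:

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical trait: true for the integral widths the description says the
// fast path covers after this change (4-byte and 8-byte types).
template <typename T>
inline constexpr bool kUsesBulkFastPath =
    std::is_integral_v<T> && (sizeof(T) == 4 || sizeof(T) == 8);

static_assert(kUsesBulkFastPath<int32_t>);
static_assert(kUsesBulkFastPath<uint64_t>);
static_assert(!kUsesBulkFastPath<int16_t>);
static_assert(!kUsesBulkFastPath<double>);
```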

Legacy FixedBitWidthEncoding: Update materialize() to use bulkGet64WithBaseline for 8-byte types.

Differential Revision: D99154749

meta-codesync · 2026-04-05T17:15:58Z

@xiaoxmeng has exported this pull request. If you are a Meta employee, you can view the originating Diff in D99154749.

meta-codesync · 2026-04-06T04:15:11Z

This pull request has been merged in a7f0acd.

meta-cla Bot added the CLA Signed label Apr 5, 2026

meta-codesync Bot added fb-exported meta-exported labels Apr 5, 2026

meta-codesync Bot changed the title ~~perf: Add bulkGet64WithBaseline and 8-byte fast path for FixedBitWidthEncoding~~ perf: Add bulkGet64WithBaseline and 8-byte fast path for FixedBitWidthEncoding (#641) Apr 5, 2026

xiaoxmeng force-pushed the export-D99154749 branch from 136b8c8 to 33752c3 April 5, 2026 17:16

xiaoxmeng force-pushed the export-D99154749 branch from 33752c3 to 34c2b18 April 5, 2026 18:11

meta-codesync Bot closed this in a7f0acd Apr 6, 2026

facebook-github-tools Bot added the Merged label Apr 6, 2026

xiaoxmeng wants to merge 1 commit into facebookincubator:main from xiaoxmeng:export-D99154749