Use clz32 for counting trailing zeroes. #9340

jussi-kalliokoski · 2023-10-24T14:35:42Z

This changes the implementation of BitSet to use Math.clz32 to count trailing zeroes instead of the current WebAssembly-based approach. This makes the implementation IMO slightly more readable to those of us without a brain to parse hex-encoded WebAssembly. 😄

Additionally this approach should yield slightly better performance for the iteration of the BitSets. At least that's what running the following micro-benchmark on my machine indicates:

Benchmark Source Code

const wasmBuf = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, 0x01, 0x06, 0x01, 0x60, 0x01,
  0x7f, 0x01, 0x7f, 0x03, 0x02, 0x01, 0x00, 0x07, 0x0d, 0x01, 0x09, 0x74, 0x72,
  0x61, 0x69, 0x6c, 0x69, 0x6e, 0x67, 0x30, 0x00, 0x00, 0x0a, 0x07, 0x01, 0x05,
  0x00, 0x20, 0x00, 0x68, 0x0b, 0x00, 0x0f, 0x04, 0x6e, 0x61, 0x6d, 0x65, 0x02,
  0x08, 0x01, 0x00, 0x01, 0x00, 0x03, 0x6e, 0x75, 0x6d,
]);

const {trailing0} = new WebAssembly.Instance(new WebAssembly.Module(wasmBuf)).exports;

function ctz32(n) {
  if (n === 0) {
    return 32;
  }
  n = n & -n;
  return 31 - Math.clz32(n);
}

const MAX_U32 = 0xffffffff;

function expectEqual(a, b) {
  if (a !== b) {
    throw new Error(`${a} !== ${b}`);
  }
}

function bench(name, f, N = 1e6) {
  console.time(name);
  f(N);
  console.timeEnd(name);
}

for (let i = 0; i < 1e6; i++) {
  expectEqual(trailing0(i), ctz32(i));
  expectEqual(trailing0(MAX_U32 - i), ctz32(MAX_U32 - i));
}

const { platform, arch, release, cpus } = require("os");
console.log("node", process.version, platform(), arch(), release());
console.log(cpus()[0].model);

bench("trailing0", function (N) {
  for (let i = 0; i < N; i++) {
    trailing0(i);
    trailing0(MAX_U32 - i);
  }
});

bench("ctz32", function (N) {
  for (let i = 0; i < N; i++) {
    ctz32(i);
    ctz32(MAX_U32 - i);
  }
});

Giving the output:

node v16.16.0 darwin x64 23.0.0
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
trailing0: 18.292ms
ctz32: 3.429ms

devongovett · 2023-10-24T16:35:48Z

ooh nice, I'll run a full end-to-end benchmark but looks promising. I had considered this but I guess I assumed that the wasm instruction would be faster since ideally it would be a single CPU instruction. Maybe there's some overhead though if the engine doesn't inline it.

jussi-kalliokoski · 2023-10-25T13:55:45Z

Yeah, based on the numbers I’ve been seeing I think the current implementations never end up inlining JS->Wasm calls and instead even calls to simple functions end up allocating an execution context, so for single word computations it rarely is worth it, especially in this case as the JS implementation should end up being JIT compiled in the asmjs-like path on V8 (only arithmetic and no compute + assign operators like &=, at least that was the heuristic a few years ago IIRC). An ephemeral micro-optimization for sure, but one that IMO doesn’t compromise on readability meaningfully and gets the job done in current runtimes.

This is my first PR to Parcel btw (about time..), what’s up with the integration tests? Is it expected that they’re failing?

mischnic · 2023-10-25T16:09:41Z

~~That was fixed recently, so all tests should pass after rebasing~~ That is unfortunately our usual flaky-ness, so don't worry about it

jussi-kalliokoski · 2023-10-30T08:34:56Z

Not sure how to interpret the benchmark results as I'm not familiar with how much variance is expected 🤔

Another question: what's the desired process from here on out, i.e. should I keep the branch up-to-date until reviewed or wait for review and rebase once the changes are up-to-par? :)

mischnic · 2023-10-31T09:01:47Z

Don't worry about the CI benchmark action, the actual numbers aren't really reliable.

No need to keep this branch updated/rebased, this PR's commit history will get squashed on merge anyway, so we'll handle that.

devongovett

Tested on the React Spectrum website. I didn't see a meaningful overall build time change on that project, but this is definitely easier to read. I think it just isn't the main bottleneck anymore. Could have a bigger effect on an even larger project.

Use clz32 for counting trailing zeroes.

a788031

jussi-kalliokoski force-pushed the native_ctz32 branch from e3b7259 to a788031 Compare October 24, 2023 16:23

devongovett approved these changes Nov 2, 2023

View reviewed changes

devongovett merged commit 7d924d2 into parcel-bundler:v2 Nov 2, 2023
13 of 16 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use clz32 for counting trailing zeroes. #9340

Use clz32 for counting trailing zeroes. #9340

jussi-kalliokoski commented Oct 24, 2023

devongovett commented Oct 24, 2023

jussi-kalliokoski commented Oct 25, 2023

mischnic commented Oct 25, 2023 •

edited

Loading

jussi-kalliokoski commented Oct 30, 2023

mischnic commented Oct 31, 2023

devongovett left a comment

Use clz32 for counting trailing zeroes. #9340

Use clz32 for counting trailing zeroes. #9340

Conversation

jussi-kalliokoski commented Oct 24, 2023

devongovett commented Oct 24, 2023

jussi-kalliokoski commented Oct 25, 2023

mischnic commented Oct 25, 2023 • edited Loading

jussi-kalliokoski commented Oct 30, 2023

mischnic commented Oct 31, 2023

devongovett left a comment

Choose a reason for hiding this comment

mischnic commented Oct 25, 2023 •

edited

Loading