Iteration perf improvements #25


Merged: 3 commits merged into lemire:master on May 8, 2023

Conversation

benlesh (Contributor) commented May 8, 2023

Here are two commits that drastically improve the performance of using TypedFastBitSet and SparseTypedFastBitSet in for..of loops.

Just to explain: *[Symbol.iterator]() implementations, while terse, leverage JavaScript generators, which are notably slower than returning a plain iterator object. This is because generators also support coroutines, a capability that isn't being used at all here.
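To illustrate the distinction in isolation (a hypothetical sketch, not the library's actual code): both functions below iterate the set bits of a Uint32Array of bitmap words, the first via a generator and the second via a hand-rolled iterator object whose next() is an ordinary function call, avoiding the generator machinery.

```javascript
// Generator-based: terse, but every next() resumes the generator's coroutine state.
function* generatorIterator(words) {
  for (let k = 0; k < words.length; k++) {
    let w = words[k];
    while (w !== 0) {
      const t = w & -w;                      // isolate the lowest set bit
      yield (k << 5) + (31 - Math.clz32(t)); // word offset + bit position
      w ^= t;                                // clear that bit
    }
  }
}

// Plain iterator object: same traversal, but next() is a regular function
// call, which engines optimize far more aggressively.
function plainIterator(words) {
  let k = 0;
  let w = words.length > 0 ? words[0] : 0;
  return {
    [Symbol.iterator]() {
      return this; // lets the object itself be used in for..of
    },
    next() {
      while (w === 0) {
        k++;
        if (k >= words.length) return { done: true, value: undefined };
        w = words[k];
      }
      const t = w & -w;
      w ^= t;
      return { done: false, value: (k << 5) + (31 - Math.clz32(t)) };
    },
  };
}

// bits 1 and 3 set in word 0; bit 0 set in word 1 (i.e. overall index 32)
const words = Uint32Array.from([0b1010, 0b1]);
console.log([...generatorIterator(words)]); // [ 1, 3, 32 ]
console.log([...plainIterator(words)]);     // [ 1, 3, 32 ]
```

Both produce identical sequences; the only difference is the cost per next() call, which is exactly the overhead this PR targets.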

I'm making this PR because our team has been favoring forEach over for..of for performance reasons, and if the point of this type is to make things fast, I think it should be fast in all cases. forEach(fn), in particular when used in a nested fashion, doesn't perform as well as inlined for loops. :)

Thank you for this library BTW!

lemire (Owner) commented May 8, 2023

if the point of this type is to make things fast, I think it should be fast in all cases

I have checked in a new benchmark script at https://github.com/lemire/TypedFastBitSet.js/blob/master/benchmark/issue25.js

Could you run it?

I am getting that the for-of construction, with your PR, is still 2x slower than forEach.

TypedFastBitSet-forof x 121 ops/sec ±0.55% (79 runs sampled)
TypedFastBitSet-foreach x 248 ops/sec ±0.79% (92 runs sampled)
SparseTypedFastBitSet-forof x 123 ops/sec ±0.78% (80 runs sampled)
SparseTypedFastBitSet-foreach x 198 ops/sec ±0.28% (88 runs sampled)

I think we will merge your PR, but if your expectation is that you have bridged the performance gap... I am not sure that's the case. You should still use forEach if you care about performance, it seems.

Or do I misunderstand?


benlesh (Contributor, Author) commented May 8, 2023

:) Okay, so this explanation will get weird, so I'm including an update to your benchmark.

The implementation of forEach will ALWAYS be faster in the naive case, for the exact same reason I'd like to make this change: inlined loops are way faster than repeated function calls.

While the iterator might still be 2x slower than forEach, 2x slower than lightning fast is still better than 10x slower than lightning fast (which is what we had with the generator-based implementation).

And, when the only "fast" tool provided is forEach, that pushes developers toward nested forEach loops, which don't optimize well in the runtime. This is because the most obvious/readable implementations involve inlined functions (usually arrow functions) that must be allocated and GC'ed on each loop iteration. The deeper the nesting, the worse it gets.

I've included benchmark examples showing for..of and forEach nested 3 levels deep over a smaller set of data. You'll see that the performance is the opposite. This is because, as stated above, nested inlined loops beat repeated function calls.

  .add("TypedFastBitSet-nested-forof", function () {
    let card = 0;
    for (const x of tb_small) {
      for (const y of tb_small) {
        for (const z of tb_small) {
          card += x + y + z;
        }
      }
    }
    return card;
  })
  .add("TypedFastBitSet-nested-forEach", function () {
    let card = 0;
    tb_small.forEach(function (x) {
      tb_small.forEach(function (y) {
        tb_small.forEach(function (z) {
          card += x + y + z;
        });
      });
    });
    return card;
  })

Results:

TypedFastBitSet-nested-forof x 103 ops/sec ±0.40% (75 runs sampled)
TypedFastBitSet-nested-forEach x 205 ops/sec ±0.29% (87 runs sampled)

:) I hope this helps explain the intent behind the requested change.

benlesh (Contributor, Author) commented May 8, 2023

Interestingly, the benchmarks you have don't show a huge difference in the naive cases between master and this PR... but they show a HUGE improvement when nested.

master:

TypedFastBitSet-forof x 97.89 ops/sec ±1.77% (71 runs sampled)
TypedFastBitSet-foreach x 186 ops/sec ±1.75% (78 runs sampled)
SparseTypedFastBitSet-forof x 100 ops/sec ±1.77% (74 runs sampled)
SparseTypedFastBitSet-foreach x 127 ops/sec ±1.82% (73 runs sampled)
TypedFastBitSet-nested-forof x 17.34 ops/sec ±2.30% (47 runs sampled)
TypedFastBitSet-nested-forEach x 203 ops/sec ±0.57% (86 runs sampled)

this PR:

TypedFastBitSet-forof x 96.65 ops/sec ±1.84% (71 runs sampled)
TypedFastBitSet-foreach x 196 ops/sec ±1.42% (83 runs sampled)
SparseTypedFastBitSet-forof x 98.03 ops/sec ±1.48% (72 runs sampled)
SparseTypedFastBitSet-foreach x 130 ops/sec ±1.76% (74 runs sampled)
TypedFastBitSet-nested-forof x 101 ops/sec ±0.55% (74 runs sampled)
TypedFastBitSet-nested-forEach x 205 ops/sec ±0.25% (87 runs sampled)

Anecdotally, while testing the implementation in arbitrary Stackblitz experiments, I was seeing a 4x-5x improvement even in the naive case. 🤷‍♂️

lemire (Owner) commented May 8, 2023

Merging.

lemire merged commit 31c64b3 into lemire:master on May 8, 2023
benlesh (Contributor, Author) commented May 8, 2023

Thank you very much, @lemire! LMK when you have time to publish. ❤️

benlesh deleted the iteration-perf branch on May 8, 2023 at 20:50
lemire (Owner) commented May 10, 2023

It should have been published by now.
