Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specialize BitArray set ops to improve performance #18

Open
modulovalue opened this issue Dec 22, 2023 · 9 comments
Open

Specialize BitArray set ops to improve performance #18

modulovalue opened this issue Dec 22, 2023 · 9 comments

Comments

@modulovalue
Copy link

Hey @isoos,

I think there's a simple way to improve the performance of all the set operations of BitArray on other BitArrays by specializing them to take a BitArray and using that knowledge to avoid using iterators.

That is, I've observed that we can be more efficient if we replace the following:

  // ... and and/andNot/xor
  void or(BitSet set) {
    final iter = set.asUint32Iterable().iterator;
    for (var i = 0; i < _data.length && iter.moveNext(); i++) {
      _data[i] |= iter.current;
    }
  }

with

  // ... and and/andNot/xor
  void or(BitArray set) {
    final Uint32List data2 = set._data;
    for (var i = 0; i < _data.length && i < data2.length; i++) {
      _data[i] |= data2[i];
    }
  }

I'm not proposing to replace the existing set operations, but to add new ones that work on BitArrays only.

What do you think?

@isoos
Copy link
Owner

isoos commented Dec 22, 2023

I think this could be done within the same method, no need to change the API for it:

if (set is BitArray) {
  // new code
} else {
  super.and(set);9
}

Or maybe the API could have a method that return Uint32List? which is there for arrays, that is absent for other types, and we don't need the explicit type check, rather either work on the list or fall back to iterator use.

@modulovalue
Copy link
Author

modulovalue commented Dec 22, 2023

Or maybe the API could have a method that return Uint32List? which is there for arrays, that is absent for other types

This would allow others to use that Uint32List in unsafe ways (unless we wrap it in UnmodifiableListView or something like that, but this, again, introduces a level of indirection that might negatively impact performance).

One use case that I have for BitArray takes 2 seconds~ (In Java, with its built in BitSet) and uses roughly 17 million ors and ands. I think it would be nice if we could avoid the type checks and any branching that would be needed for that.

What do you think about something like the following:

// Alongside BitArray
abstract class BitArrayOps {
  // XOR reference https://github.com/RoaringBitmap/CRoaring/blob/13407ae912cbe16d28f196966a1989b714b4996d/src/bitset.c#L346-L357 (not or) 
  // @pragma(...inline...) ... does inlining work if private members are being used ...?
  void inplaceOr(BitArray subject, BitArray object) {
    final left = subject._data;
    final right = object._data;
    final length = min(left.length, right.length);
    for (var i = 0; i < length; i++) {
      left[i] |= right[i];
    }
  }
  
  // ... Others ...
}

@isoos
Copy link
Owner

isoos commented Dec 22, 2023

Couple of options so far:

  • Type check for BitArray (keeps the implementation concise).
  • Type check for asUint32Iterable() is Uint32List (allows non-BitArray class to be efficient)
  • Uint32List? with UnmodifiableUint32ListView wrapper (allows exposing it without modification).
  • specialized methods (including separate helper class, further methods on BitArray or BitSet (e.g. andAllBitArrays(List<BitArray>)

It is important to recognize that the primary reason for this package is to be efficient, so maybe specialized methods would be preferable compared to API simplicity. However, it would be nice to see how much overhead e.g. the is BitArray check adds for your use case, before putting that off the table. Could you please run an experiment that runs that 17 mill. operation for like 10 times, but also add an is BitArray check for each and() call? Couple of measurements and picking the lowest numbers? May not be scientific, but if it is visible difference, I'll put the is <X> checks away...

@modulovalue
Copy link
Author

I agree with everything you said.

I'll give it a go and let you know how it went.

@modulovalue
Copy link
Author

Here are some results:

set.or(other)


JIT: 
Measure: run: took: 9977.85400 ms
Measure: run: took: 9306.82000 ms
Measure: run: took: 9278.14700 ms
Measure: run: took: 9343.11800 ms
Measure: run: took: 9267.38300 ms
Measure: run: took: 9246.39700 ms
Measure: run: took: 9267.77700 ms
Measure: run: took: 9228.74600 ms
Measure: run: took: 9322.24500 ms
Measure: run: took: 9316.60400 ms
time = 93.559s

AOT
Measure: run: took: 19549.53800 ms
Measure: run: took: 19582.27600 ms


BitArrayOps.inplaceOr

abstract class BitArrayOps {
  static void inplaceOr(BitArray a, BitArray b) {
    final aData = a._data;
    final bData = b._data;
    final min = math.min(aData.length, bData.length);
    for (var i = 0; i < min; i++) {
      aData[i] |= bData[i];
    }
  }
}
JIT
Measure: run: took: 9212.79800 ms
Measure: run: took: 9117.07000 ms
Measure: run: took: 9075.82800 ms
Measure: run: took: 9041.75700 ms
Measure: run: took: 9063.17500 ms
Measure: run: took: 9118.70700 ms
Measure: run: took: 9092.03800 ms
Measure: run: took: 9110.16900 ms
Measure: run: took: 9143.65100 ms
Measure: run: took: 9084.90500 ms
time = 91.061s

AOT
Measure: run: took: 15792.41000 ms
Measure: run: took: 15746.63000 ms


set.or(set) with type check:

    if (set is BitArray) {
      return BitArrayOps.inplaceOr(this, set);
    } else {
      final iter = set.asUint32Iterable().iterator;
      for (var i = 0; i < _data.length && iter.moveNext(); i++) {
        _data[i] |= iter.current;
      }
    }
JIT
Measure: run: took: 9033.32300 ms
Measure: run: took: 9079.60100 ms
Measure: run: took: 9056.57700 ms
Measure: run: took: 9208.21700 ms
Measure: run: took: 9074.69000 ms
Measure: run: took: 9113.01300 ms
Measure: run: took: 9092.11300 ms
Measure: run: took: 9098.37700 ms
Measure: run: took: 9260.31700 ms
Measure: run: took: 9087.77400 ms
time = 91.106s

AOT
Measure: run: took: 15899.48300 ms
Measure: run: took: 15731.78600 ms
Measure: run: took: 15774.08800 ms

I draw the following conclusion: the is check does not introduce much overhead. Specializing via is is probably fine.

@isoos
Copy link
Owner

isoos commented Dec 23, 2023

Thank you for the benchmarks! I think it is reassuring that we can keep the current API surface, while also improve performance.

What's your plan: would you have the time to prepare PRs (and also hopefully tests) for these?

@modulovalue
Copy link
Author

modulovalue commented Dec 23, 2023

My goal right now is to see whether a Dart-native bit array can be competitive with other bit arrays.

I've tried binding CRoaring to Dart https://github.com/modulovalue/roaring_dart, but neither its compressed bitmaps nor its plain old bitset were able to outperform your bit array, which I'm attributing to FFI overhead.

It doesn't look like the existing tests will be able to cover the changes that this would introduce. I agree, it would be good to have some additional ones.


Specializing and can also be observed to lead to an improvement:

JIT • current and
Measure: run: took: 8400.85700 ms
Measure: run: took: 8377.36500 ms
Measure: run: took: 8336.81800 ms
Measure: run: took: 8294.66400 ms
Measure: run: took: 8271.60500 ms
Measure: run: took: 8387.44400 ms

JIT • `is` specialized and
Measure: run: took: 8403.58200 ms
Measure: run: took: 8320.60800 ms
Measure: run: took: 8234.92000 ms
Measure: run: took: 8246.11100 ms
Measure: run: took: 8242.61800 ms
Measure: run: took: 8250.43100 ms

AOT • current and
Measure: run: took: 10401.46000 ms
Measure: run: took: 10562.89300 ms
Measure: run: took: 10432.88500 ms

AOT • `is` specialized and
Measure: run: took: 9292.28000 ms
Measure: run: took: 9223.08200 ms
Measure: run: took: 9258.39600 ms
 
  static void inplaceAnd(BitArray a, BitArray b) {
    final aData = a._data;
    final bData = b._data;
    final min = math.min(aData.length, bData.length);
    var i = 0;
    for (; i < min; i++) {
      aData[i] &= bData[i];
    }
    for (; i < aData.length; i++) {
      aData[i] = 0;
    }
  }

@modulovalue
Copy link
Author

What's your plan: would you have the time to prepare PRs

There are quite a few issues that I've opened. I think I'll try to submit a PR with some tests once you've had the time to look through them and give a LGTM.

(and Happy Holidays!)

@isoos
Copy link
Owner

isoos commented Dec 23, 2023

I think you are doing a great investigation and preparation work here, I'll be happy to integrate it into the package. I'll have a bit hectic holiday schedule, but I'll set out time to review PRs if/when you happen to submit them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants