Hi - thanks for creating a library for Bloom filters (with a great API!)
I have run some experiments using your scalable Bloom filters, but they do not seem to perform very well :(
I created a fork containing code for benchmarking the use cases I want to use your library for, as well as experiments with optimizing some of your code.
One performance bottleneck is the method you use to convert objects to bytes before hashing them. I have created a project for benchmarking various conversion methods, showing that the `BinaryFormatter` used in your library performs horribly compared to the alternatives.
The benchmarks in my fork show the performance improvements achievable by replacing `BinaryFormatter` with alternatives (I have included the benchmarking results at the bottom of this issue - notice the memory usage column on the far right).
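For concreteness, here is a small Python analogue of the bottleneck (the library itself is C#, so this is only an illustration of the principle, not its code): a general-purpose object serializer such as `pickle` - playing the role `BinaryFormatter` plays in .NET - does far more work and allocation per call than a fixed-layout conversion, even though both end up feeding the same hash function.

```python
import hashlib
import pickle
import struct
import timeit

def hash_via_pickle(x: int) -> bytes:
    # General-purpose serializer (analogous to BinaryFormatter in .NET):
    # walks the object graph and emits a self-describing byte stream.
    return hashlib.sha256(pickle.dumps(x)).digest()

def hash_via_struct(x: int) -> bytes:
    # Fixed-layout conversion: just the 8 raw little-endian bytes of the int.
    return hashlib.sha256(struct.pack("<q", x)).digest()

# Both routes produce a valid digest; the cheap conversion simply
# spends almost all of its time in the hash rather than the serializer.
t_pickle = timeit.timeit(lambda: hash_via_pickle(12345), number=10_000)
t_struct = timeit.timeit(lambda: hash_via_struct(12345), number=10_000)
```

The same shape of experiment (swap only the object-to-bytes step, keep the hash fixed) is what the benchmarks in the fork measure.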
Despite the new optimizations, the memory usage of your Bloom filters (especially the scalable version, which I really want to use) is very high compared to an alternative like a `HashSet`.
Is this just the nature of the implementation, or can it be improved?
(The table below is part of the output of the benchmarking code. In the table, your original implementation of a scalable Bloom filter is called `ScalableBloomFilter`.)
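For reference, the theoretical size of a (single, non-scalable) Bloom filter is about -n·ln(p)/(ln 2)² bits for n items at false-positive rate p - roughly 9.6 bits per item at p = 1% - so memory usage far above that floor points at implementation overhead rather than the data structure itself. A quick Python check of that formula:

```python
import math

def bloom_bits(n: int, p: float) -> int:
    """Optimal bit count for a Bloom filter holding n items
    with target false-positive rate p."""
    return math.ceil(-n * math.log(p) / (math.log(2) ** 2))

# One million items at a 1% false-positive rate needs
# about 9.6 million bits, i.e. roughly 1.2 MB of bit array.
bits = bloom_bits(1_000_000, 0.01)
```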
Thanks for this input! I really appreciate the thorough data. I've been aware of this issue for a while now, but frankly it's been tough for me to decide what the right solution is exactly. The answer that I see requires some breaking changes in the API and more importantly also requires that consumers handle serialization prior to passing objects to any of the data structures that require hashing (almost all of them). I've got some code where I've tinkered with making the change as a new major version and adding some serialization helper methods as extension methods on the data structures. I think you've convinced me that now is the time to make that code production-ready and get it out for folks to use. I appreciate your feedback, and I'll keep this issue updated as I work through this update.
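For what it's worth, a byte-first design along those lines might look something like the following Python sketch (names and structure are purely illustrative, not the planned API): the filter only ever accepts bytes, so serialization - and its cost - lives entirely on the caller's side.

```python
import hashlib

class ByteBloomFilter:
    """Minimal byte-first Bloom filter sketch: callers serialize their
    objects to bytes themselves, so the filter never picks a serializer.
    (Hypothetical illustration only - not this library's actual API.)"""

    def __init__(self, m_bits: int, k_hashes: int):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _indexes(self, data: bytes):
        # Derive k probe positions from one SHA-256 digest via
        # double hashing (h1 + i*h2 mod m).
        d = hashlib.sha256(data).digest()
        h1 = int.from_bytes(d[:8], "little")
        h2 = int.from_bytes(d[8:16], "little") | 1
        return ((h1 + i * h2) % self.m for i in range(self.k))

    def add(self, data: bytes) -> None:
        for i in self._indexes(data):
            self.bits[i // 8] |= 1 << (i % 8)

    def contains(self, data: bytes) -> bool:
        return all((self.bits[i // 8] >> (i % 8)) & 1
                   for i in self._indexes(data))
```

Extension methods layered on top (e.g. an overload that serializes for you) could then restore the convenience of the current API without hiding the serialization cost inside the core data structure.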