addAll is much faster in sparse mode since only existing indexes have to be updated. When other is in sparse mode, we can be smarter than creating a copy of other and converting it to normal mode: we can traverse the other sparseSet and update only the relevant indexes. This patch speeds up the scenario where many small-cardinality HLL++ estimators are aggregated into coarser-grained buckets.
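The idea can be sketched roughly as follows. This is a simplified illustration, not the library's code: ToyRegisters, addAllSparse, and the plain (register index -> rank) map are hypothetical stand-ins, and the real sparse representation uses an encoded integer set rather than a map.

```java
import java.util.Map;

/** Toy register store; all names here are hypothetical, not the library's API. */
final class ToyRegisters {
    /** Dense ("normal" mode) representation: one rank per register index. */
    final byte[] dense;

    ToyRegisters(int registerCount) {
        this.dense = new byte[registerCount];
    }

    /**
     * Merges a sparse estimator, given as (register index -> rank), into this
     * dense one. Only the indexes present in the sparse map are visited, so
     * the cost depends on the sparse size, not on the register count.
     * Returns the number of sparse entries visited.
     */
    int addAllSparse(Map<Integer, Byte> sparse) {
        int visited = 0;
        for (Map.Entry<Integer, Byte> entry : sparse.entrySet()) {
            int idx = entry.getKey();
            byte rank = entry.getValue();
            if (rank > dense[idx]) {
                dense[idx] = rank; // keep the maximum rank per register
            }
            visited++;
        }
        return visited;
    }
}
```

With 16384 registers and a sparse input holding two entries, the loop runs twice instead of scanning a converted 16384-entry copy.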
under the Apache License, Version 2.0.
…ntics - A new estimator is always created - Neither this nor the estimators are ever modified
- addAll is similar to Set.addAll. It performs a mutable union: the current HLL(++) is modified, and the elements of the other HLL(++) are added to it. Other is never modified. We cannot reuse the merge name since it would break compatibility. - Updated merge to have a consistent behavior. A new HLL(++) is always created. Neither this nor the estimators are ever modified. The issues were that: - this was returned when estimators was null or empty. This is unsafe behavior. It could easily lead to corruption since the user has no way to know whether it is safe to modify the returned instance or not. - The HLL++ implementation was modifying this. This behavior is not consistent with HLL and is unsafe. - The HLL++ implementation was converting estimators to normal mode if this is in normal mode. This can be avoided, and is costly in both time and memory.
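The two contracts described above can be illustrated with a minimal sketch. ToyEstimator and its register array are hypothetical and only mirror the semantics stated in this message; they are not the HLL(++) classes themselves.

```java
/** Toy estimator illustrating the two contracts; not the library's classes. */
final class ToyEstimator {
    private final byte[] registers;

    ToyEstimator(int registerCount) {
        this.registers = new byte[registerCount];
    }

    private ToyEstimator(byte[] registers) {
        this.registers = registers;
    }

    void setRegister(int idx, int rank) {
        registers[idx] = (byte) rank;
    }

    int getRegister(int idx) {
        return registers[idx];
    }

    /** Mutable union: modifies this, never modifies other. */
    void addAll(ToyEstimator other) {
        if (other.registers.length != registers.length) {
            throw new IllegalArgumentException("register count mismatch");
        }
        for (int i = 0; i < registers.length; i++) {
            if (other.registers[i] > registers[i]) {
                registers[i] = other.registers[i];
            }
        }
    }

    /**
     * Immutable union: always returns a new instance, even when estimators
     * is null or empty, so the caller may freely modify the result.
     */
    ToyEstimator merge(ToyEstimator... estimators) {
        ToyEstimator result = new ToyEstimator(registers.clone());
        if (estimators != null) {
            for (ToyEstimator e : estimators) {
                result.addAll(e);
            }
        }
        return result;
    }
}
```

Because merge starts from a copy of the registers, a caller who mutates the returned instance cannot corrupt this, which is the hazard the old null-or-empty shortcut created.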
Use "mvn -Pgpg" to enable gpg for a specific build. By default gpg is disabled.
- only sort values in the encodedSet up to a specified index to prevent uninitialized values from polluting result - fix failing tests and add a new test of a single element
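A minimal sketch of the bounded sort, assuming the set is backed by a preallocated int array with a separate fill index. PartialSort, sortedPrefix, and the parameter names are hypothetical; only the range overload of Arrays.sort is a real JDK call.

```java
import java.util.Arrays;

/** Hypothetical helper showing a sort limited to the filled part of a buffer. */
final class PartialSort {
    /**
     * Sorts only values[0..used) in place and returns a copy of that prefix.
     * Slots at index >= used are unused capacity (still 0) and are ignored.
     */
    static int[] sortedPrefix(int[] values, int used) {
        Arrays.sort(values, 0, used); // toIndex is exclusive
        return Arrays.copyOf(values, used);
    }
}
```

For a buffer {9, 3, 7, 0, 0} with three values in use, sorting the whole array would put the two unused zeros ahead of the real values, while the bounded sort yields {3, 7, 9}.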
Update travis configuration file, -Dgpg.skip=true no longer required.
…proves memory efficiency. This commit does introduce a few more member variables that track HyperLogLogPlus' state. These member variables, such as the sparseSet index and tmpSet index, will need to be synchronized in some way to make the class thread-safe.
… when the version number is not present in the first byte of the stream. All encodes will use the new encoding scheme, so the result should be a one-time transformation from the legacy to the new encoding format.
…o prevent sparseSet from growing too large - improve space efficiency for encoding/decoding Sparse and Normal representations of HLL. Significant space savings but this is a breaking change and not compatible with previous encoding schemes
Combined redundant classes (ScoredItem and ErrorAndCount); prevented the variable "size" from growing without bound.