Commits on Jul 30, 2013
  1. Merge branch 'serialization_improvements'

    abramsm committed Jul 30, 2013
    Conflicts:
    	src/main/java/com/clearspring/analytics/stream/cardinality/HyperLogLogPlus.java
  2. Merge branch 'serialization_improvements'

    abramsm committed Jul 30, 2013
    Conflicts:
    	src/main/java/com/clearspring/analytics/stream/cardinality/HyperLogLogPlus.java
Commits on Jul 23, 2013
  1. Merge branch 'cleaner_merge' of https://github.com/cykl/stream-lib in…

    abramsm committed Jul 23, 2013
    …to cykl-cleaner_merge
  2. fixes #44

    abramsm committed Jul 23, 2013
Commits on Jul 19, 2013
  1. Faster addAll when other is in sparse mode.

    cykl committed Jul 19, 2013
    addAll is much faster in sparse mode since only existing indexes have to be
    updated. When other is in sparse mode, we can be smarter than creating a copy
    of other and converting it to normal mode: we can traverse the other sparseSet
    and update only the relevant indexes.
    
    This patch speeds up the scenario where many small-cardinality HLL++ are
    aggregated into coarser-grained buckets.
Commits on Jul 18, 2013
  1. Merge pull request #43 from mspiegel/master

    abramsm committed Jul 18, 2013
    Adding support for insertions and lookups for Strings in CountMinSketch
  2. Clarify that CountMinSketch and associated unit test are licensed

    Michael Spiegel committed Jul 18, 2013
    under the Apache License, Version 2.0.
Commits on Jul 17, 2013
  1. Merge pull request #41 from cykl/travis-ci

    abramsm committed Jul 17, 2013
    Enable Travis CI
  2. Ensure that all the cardinality estimator follows the same merge sema…

    cykl committed Jul 17, 2013
    …ntics
    
    - A new estimator is always created
    - This and estimators are never modified
  3. Add addAll method to HLL & HLL++

    cykl committed Jul 17, 2013
    - addAll is similar to Set.addAll. It performs a mutable union. The current
      HLL(++) is modified, the elements of the other HLL(++) are added to the
      current HLL(++). Other is never modified.
      We cannot reuse the merge name since it would break compatibility.
    
    - Updated merge to have a consistent behavior. A new HLL(++) is always created.
      Neither this nor the estimators are ever modified. The issues were that:
    
        - this was returned when estimators was null or empty. This is an unsafe
          behavior. It could easily lead to corruption since the user has no way
          to know if it is safe to modify the returned instance or not.
    
        - The HLL++ implementation was modifying this. This behavior is not
          consistent with HLL and is unsafe.
    
        - The HLL++ implementation was converting estimators to normal mode
          if this is in normal mode. This can be avoided, and is costly both
          in time and memory.
  4. Adding support for insertions and lookups for String keys in CountMin…

    Michael Spiegel committed Jul 17, 2013
    …Sketch class.
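The difference between addAll and merge described in the "Add addAll method to HLL & HLL++" commit above can be illustrated with a minimal sketch. ToyEstimator is a hypothetical stand-in written for this example, not the stream-lib class; it assumes registers are combined by taking the per-index maximum, as HLL-style estimators do, and the method bodies are only an approximation of the semantics the commit message states.

```java
// Hypothetical stand-in for an HLL-style estimator; not the stream-lib class.
public class ToyEstimator {
    private final int[] registers;

    public ToyEstimator(int size) {
        this.registers = new int[size];
    }

    // Record a value for a register, keeping the maximum seen so far.
    public void set(int index, int value) {
        registers[index] = Math.max(registers[index], value);
    }

    public int get(int index) {
        return registers[index];
    }

    // Mutable union, similar to Set.addAll: this is modified, other is only read.
    public void addAll(ToyEstimator other) {
        if (other.registers.length != registers.length) {
            throw new IllegalArgumentException("size mismatch");
        }
        for (int i = 0; i < registers.length; i++) {
            registers[i] = Math.max(registers[i], other.registers[i]);
        }
    }

    // Immutable union: a new instance is always returned, even when
    // estimators is null or empty. Neither this nor the estimators change.
    public ToyEstimator merge(ToyEstimator... estimators) {
        ToyEstimator merged = new ToyEstimator(registers.length);
        merged.addAll(this);
        if (estimators != null) {
            for (ToyEstimator e : estimators) {
                merged.addAll(e);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        ToyEstimator a = new ToyEstimator(4);
        a.set(0, 3);
        ToyEstimator b = new ToyEstimator(4);
        b.set(1, 5);

        ToyEstimator m = a.merge(b);   // a and b are left untouched
        a.addAll(b);                   // a now also holds b's registers
        System.out.println(m != a);
    }
}
```

Because merge never returns this, a caller can modify the result without risking corruption of either input, which is the safety concern the commit message raises.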
Commits on Jul 16, 2013
  1. Update mvn profile to enable gpg signing on demand.

    cykl committed Jul 16, 2013
    Use "mvn -Pgpg" to enable gpg for a specific build. By default gpg is disabled.
Commits on Jul 15, 2013
  1. - ensure tmpIndex is reset when the tmpSet is merged with the sparseSet

    abramsm committed Jul 15, 2013
    - only sort values in the encodedSet up to a specified index to prevent uninitialized values from polluting result
    - fix failing tests and add a new test of a single element
  2. Travis CI does not provide oraclejdk6

    cykl committed Jul 15, 2013
    "Travis CI provides OpenJDK 6, OpenJDK 7 and Oracle JDK 7. Sun JDK 6 is not provided and because it is EOL as of November 2012."
  3. Add a profile to enable gpg signing only for releases.

    cykl committed Jul 15, 2013
    Update travis configuration file, -Dgpg.skip=true no longer required.
  4. Add Travis CI configuration file

    cykl committed Jul 15, 2013
Commits on Jul 14, 2013
  1. convert sparseSet from a List of byte arrays to an int array. this im…

    abramsm committed Jul 14, 2013
    …proves memory efficiency. This commit does introduce a few more member variables that track HyperLogLogPlus' state. These member variables, such as the sparseSet index and tmpSet index, will need to be synchronized in some way to make the class thread-safe.
Commits on Jul 12, 2013
  1. Merge pull request #40 from cykl/fix_java6

    abramsm committed Jul 12, 2013
    Fix java 6 regression introduced by 0fcb805
Commits on Jul 10, 2013
  1. add version to the codec and automatically degrade to legacy decoding…

    abramsm committed Jul 10, 2013
    … when the version number is not present in the first byte of the stream. All encodes will use the new encoding scheme, so the result should be a one-time transformation from the legacy to the new encoding format.
  2. - get more accurate sparseSet size when comparing to sort threshold t…

    abramsm committed Jul 10, 2013
    …o prevent sparseSet from growing too large
    
    - improve space efficiency for encoding/decoding Sparse and Normal representations of HLL.  Significant space savings but this is a breaking change and not compatible with previous encoding schemes
Commits on Jun 20, 2013
  1. add missing imports

    abramsm committed Jun 20, 2013
  2. Merge branch 'master' of https://github.com/eric-vlaanderen/stream-lib

    abramsm committed Jun 20, 2013
    …into eric-vlaanderen-master
Commits on Jun 14, 2013
  1. Improvements to prevent corruption of the minimum value,

    Eric committed Jun 14, 2013
    Combined redundant classes (ScoredItem and ErrorAndCount),
    Prevent ever-increasing variable "size".
Commits on Jun 13, 2013
  1. Fix.

    Eric committed Jun 13, 2013
  2. Implement ConcurrentStreamSummary

    Eric committed Jun 13, 2013
Commits on Jun 5, 2013
  1. Merge pull request #36 from cykl/faster_hll_card

    abramsm committed Jun 5, 2013
    Faster hll card
Commits on Jun 3, 2013
  1. Faster HLL cardinality for small sets

    cykl committed Jun 3, 2013
    Avoid traversing the register set twice for small sets.
    This does not cause any slowdown for large sets.
  2. Faster HLL cardinality

    cykl committed Jun 3, 2013
    Math.pow(2, (-1 * X)) is the same as 1.0 / (1 << X) but much slower.
    
    The new implementation is 2 to 50 times faster, depending on the set cardinality.
  3. Merge pull request #35 from cykl/faster_hll

    abramsm committed Jun 3, 2013
    Faster hll
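The equivalence stated in the "Faster HLL cardinality" commit above can be checked with a small sketch. The class and method names are hypothetical and chosen for this example. It assumes X is a non-negative int no larger than 30, since 1 << 31 overflows to a negative int; whether the library guards that range is not stated in the commit message.

```java
// Hypothetical helper comparing the two ways of computing 2^(-x).
public class InversePowerOfTwo {
    // Original form: a general-purpose floating-point power.
    static double viaPow(int x) {
        return Math.pow(2, (-1 * x));
    }

    // Replacement form: an integer shift followed by one division.
    // Only valid for 0 <= x <= 30 with an int shift.
    static double viaShift(int x) {
        return 1.0 / (1 << x);
    }

    public static void main(String[] args) {
        // Both results are exact powers of two, so they should compare equal.
        for (int x = 0; x <= 30; x++) {
            if (viaPow(x) != viaShift(x)) {
                throw new AssertionError("mismatch at x=" + x);
            }
        }
        System.out.println("checked x in [0, 30]");
    }
}
```

The shift form avoids the cost of a general pow call inside the per-register loop, which is where the commit reports its speedup.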