Commits on Feb 16, 2012
Commits on Feb 15, 2012
  1. Fix MSVC++ compile errors.

    committed Feb 15, 2012
Commits on Feb 7, 2012
  1. Use BDELTA_GLOBAL as a flag, instead of BDELTA_LOCAL. Also, fix BDelta's Python wrapper for flag handling.

    committed Jan 13, 2012
  2. Add ability to require that the hole sides be ordered or the hole side be under a specified maximum.

    committed Jan 13, 2012
  3. Use Cython to create a new Python wrapper which is much more functional than the old one.

    committed Jan 11, 2012
  4. Checkpoint 5

    committed Jan 10, 2012
  5. Small optimization.

    committed Jan 9, 2012
  6. More Cleanups.

    committed Jan 7, 2012
  7. Fixes

    committed Jan 6, 2012
  8. Checkpoint 4

    committed Jan 10, 2012
  9. Checkpoint 3

    committed Jan 5, 2012
  10. Fix hash table issues.

    Most importantly, make sure identical checksums are always adjacent
    when the list is sorted, since our "removal of repeated checksums" code
    depends on it.
    committed Jan 10, 2012
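
The adjacency requirement in the "Fix hash table issues" message can be illustrated with a small sketch. This is not BDelta's code: the `(checksum, offset)` tuples, the `drop_repeated` helper, and the "sort by bucket" ordering are all hypothetical, and the sketch assumes that "removal of repeated checksums" means discarding every checksum that occurs more than once. It only shows why a single pass over a sorted list needs identical checksums to sit next to each other.

```python
from itertools import groupby


def drop_repeated(entries, sort_key):
    """Sort (checksum, offset) pairs, then drop every run of equal checksums.

    The removal pass only compares neighbours, so it is correct only if the
    sort places identical checksums next to each other.
    """
    ordered = sorted(entries, key=sort_key)
    kept = []
    for _, run in groupby(ordered, key=lambda e: e[0]):
        run = list(run)
        if len(run) == 1:  # checksum seen exactly once in this run
            kept.append(run[0])
    return kept


# Hypothetical data: checksum 5 occurs twice, checksum 9 once.
entries = [(5, 0), (9, 16), (5, 32)]

# Ordering by a hash bucket only (assumed here: checksum % 4) leaves the two
# 5s separated by the 9, so the neighbour-based pass misses the repeat.
by_bucket = drop_repeated(entries, sort_key=lambda e: e[0] % 4)

# Ordering by the full checksum makes the repeats adjacent, so they are removed.
by_checksum = drop_repeated(entries, sort_key=lambda e: e[0])
```

With the bucket-only ordering all three entries survive; with the full-checksum ordering only `(9, 16)` remains.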
  11. Checkpoint 2

    committed Jan 4, 2012
  12. Checkpoint 1

    committed Jan 4, 2012
  13. New experimental ideas...

    This commit isn't meant to be used publicly.
    committed Jan 3, 2012
  14. Clean "THOROUGH" mode code.

    committed Jan 2, 2012
  15. Use STL list to store "potential" checksum matches.

    Besides the advantage of using the standard, this change actually
    reduces the code size on my machine - both source and compiled.
    committed Jan 2, 2012
  16. Select matches more carefully.

    Only recognize matches that are at least "blocksize * 2" bytes.
    Sometimes the "next pass" will find a better-suited match.
    
    Sample data sets:
    1: "*-12(Benefits for Organizations*-12(Benefits for Countries..."
    2: "-12(Benefits for Organizations-12(Benefits for Countries..."
    
    Without this patch, and with a blocksize of 16, the following blocks
    are hashed:
    "*-12(Benefits fo", "r Organizations*", "-12(Benefits for"
    This causes the front of dataset 2 to be linked against the third
    hashed block of dataset 1. (BAD!)
    
    With this patch, the hashes match, but the match won't be recognized
    because it's only 17 bytes (needs to be 32).
    The next pass hashes the following blocks:
    "*-12(Ben", "efits fo", "r Organi", "zations*", "-12(Bene", "fits for"
    Now there are two reliable hashes to latch onto.
    committed Jan 2, 2012
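
The worked example in the "Select matches more carefully" message can be reproduced with a short sketch. The helper names (`fixed_blocks`, `forward_match_length`, `is_recognized`) are hypothetical and are not taken from BDelta. The sketch also simplifies in two ways: it compares block text directly instead of checksums, and it extends a match forward only. The sample strings are copied verbatim from the commit message, including the trailing "...".

```python
def fixed_blocks(data, blocksize):
    """Split data into consecutive, non-overlapping blocks; a short tail is dropped."""
    return [data[i:i + blocksize]
            for i in range(0, len(data) - blocksize + 1, blocksize)]


def forward_match_length(a, a_pos, b, b_pos):
    """Count how many characters agree starting at a[a_pos] and b[b_pos]."""
    n = 0
    while (a_pos + n < len(a) and b_pos + n < len(b)
           and a[a_pos + n] == b[b_pos + n]):
        n += 1
    return n


def is_recognized(match_len, blocksize):
    """The rule from the commit message: a match must span at least blocksize * 2."""
    return match_len >= 2 * blocksize


DATA1 = "*-12(Benefits for Organizations*-12(Benefits for Countries..."
DATA2 = "-12(Benefits for Organizations-12(Benefits for Countries..."

# Pass with blocksize 16: the third block of DATA1 equals the front of DATA2.
blocks16 = fixed_blocks(DATA1, 16)[:3]
len16 = forward_match_length(DATA1, 32, DATA2, 0)

# Pass with blocksize 8: the block "efits fo" (offset 8 in DATA1) also occurs in DATA2.
blocks8 = fixed_blocks(DATA1, 8)[:6]
pos_in_data2 = DATA2.find(blocks8[1])
len8 = forward_match_length(DATA1, 8, DATA2, pos_in_data2)
```

Here `len16` is 17, which fails the 32-byte requirement, while `len8` is 23, which passes the 16-byte requirement, matching the outcome described in the message.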