Commits on Feb 7, 2012
  1. Checkpoint 5

    committed Jan 10, 2012
  2. Small optimization.

    committed Jan 9, 2012
  3. More Cleanups.

    committed Jan 7, 2012
  4. Fixes

    committed Jan 6, 2012
  5. Checkpoint 4

    committed Jan 10, 2012
  6. Checkpoint 3

    committed Jan 5, 2012
  7. Fix hash table issues.

    Most importantly, make sure identical checksums are always adjacent
    when the list is sorted, since our "removal of repeated checksums" code
    depends on it.
    committed Jan 10, 2012
  8. Checkpoint 2

    committed Jan 4, 2012
  9. Checkpoint 1

    committed Jan 4, 2012
  10. New experimental ideas...

    This commit isn't meant to be used publicly.
    committed Jan 3, 2012
  11. Clean "THOROUGH" mode code.

    committed Jan 2, 2012
  12. Use STL list to store "potential" checksum matches.

    Besides the benefit of using the standard library, this change actually
    reduces the code size on my machine, both source and compiled.
    committed Jan 2, 2012
  13. Select matches more carefully.

    Only recognize matches that are at least "blocksize * 2" bytes.
    Sometimes the "next pass" will find a better-suited match.
    
    Sample data sets:
    1: "*-12(Benefits for Organizations*-12(Benefits for Countries..."
    2: "-12(Benefits for Organizations-12(Benefits for Countries..."
    
    Without this patch, and with a blocksize of 16, the following blocks
    are hashed:
    "*-12(Benefits fo", "r Organizations*", "-12(Benefits for"
    This causes the front of dataset 2 to be linked against the third
    hashed block of dataset 1. (BAD!)
    
    With this patch, the hashes match, but the match won't be recognized
    because it's only 17 bytes (needs to be 32).
    The next pass hashes the following blocks:
    "*-12(Ben", "efits fo", "r Organi", "zations*", "-12(Bene", "fits for"
    Now there are two reliable hashes to latch onto.
    committed Jan 2, 2012
Commits on Jan 10, 2012
  1. Fix hash table issues.

    Most importantly, make sure identical checksums are always adjacent
    when the list is sorted, since our "removal of repeated checksums" code
    depends on it.
    committed Jan 10, 2012
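
The adjacency requirement in the commit above can be illustrated with a minimal sketch. It is not the project's actual code: the `Entry` struct, the function name `sort_and_drop_repeats`, and the choice to keep the first copy of each repeated checksum are all assumptions made for illustration. The point is that the sort key must be the full checksum, because the repeat-removal pass only compares neighbouring entries.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical entry type: a block checksum and the offset of the block.
struct Entry {
    std::uint64_t checksum;
    std::size_t offset;
};

// Sort so that identical checksums are adjacent, then collapse each run of
// equal checksums to its first entry. Assumption: keeping one copy per run;
// the real code may instead discard every repeated checksum.
std::vector<Entry> sort_and_drop_repeats(std::vector<Entry> entries) {
    std::sort(entries.begin(), entries.end(),
              [](const Entry& a, const Entry& b) {
                  // Primary key is the full checksum, so equal checksums
                  // always end up next to each other.
                  if (a.checksum != b.checksum) return a.checksum < b.checksum;
                  return a.offset < b.offset;  // deterministic tie-break
              });
    // std::unique only compares neighbours, which is why adjacency matters.
    auto last = std::unique(entries.begin(), entries.end(),
                            [](const Entry& a, const Entry& b) {
                                return a.checksum == b.checksum;
                            });
    entries.erase(last, entries.end());
    return entries;
}
```

If the comparator ordered entries by anything coarser or different from the full checksum (for example a bucket index), equal checksums could be separated and the neighbour-only pass would miss them.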
Commits on Jan 3, 2012
  1. Clean "THOROUGH" mode code.

    committed Jan 2, 2012
Commits on Jan 2, 2012
  1. Use STL list to store "potential" checksum matches.

    Besides the benefit of using the standard library, this change actually
    reduces the code size on my machine, both source and compiled.
    committed Jan 2, 2012
  2. Select matches more carefully.

    Only recognize matches that are at least "blocksize * 2" bytes.
    Sometimes the "next pass" will find a better-suited match.
    
    Sample data sets:
    1: "*-12(Benefits for Organizations*-12(Benefits for Countries..."
    2: "-12(Benefits for Organizations-12(Benefits for Countries..."
    
    Without this patch, and with a blocksize of 16, the following blocks
    are hashed:
    "*-12(Benefits fo", "r Organizations*", "-12(Benefits for"
    This causes the front of dataset 2 to be linked against the third
    hashed block of dataset 1. (BAD!)
    
    With this patch, the hashes match, but the match won't be recognized
    because it's only 17 bytes (needs to be 32).
    The next pass hashes the following blocks:
    "*-12(Ben", "efits fo", "r Organi", "zations*", "-12(Bene", "fits for"
    Now there are two reliable hashes to latch onto.
    committed Jan 2, 2012
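
The "blocksize * 2" rule from the commit above can be checked against its own sample data with a small sketch. The helper names `forward_match_length` and `accept_match` are hypothetical and not taken from the bdelta source, and the sample strings are used with the trailing "..." dropped.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: number of bytes that agree when comparing a (from
// a_pos) and b (from b_pos) forward until the first mismatch or end of data.
std::size_t forward_match_length(const std::string& a, std::size_t a_pos,
                                 const std::string& b, std::size_t b_pos) {
    std::size_t n = 0;
    while (a_pos + n < a.size() && b_pos + n < b.size() &&
           a[a_pos + n] == b[b_pos + n]) {
        ++n;
    }
    return n;
}

// The rule described in the commit message: only recognize a match that is
// at least twice the block size used for hashing in the current pass.
bool accept_match(std::size_t match_len, std::size_t blocksize) {
    assert(blocksize > 0);
    return match_len >= blocksize * 2;
}
```

With a block size of 16, the third hashed block of dataset 1 (offset 32) lines up with the front of dataset 2 for only 17 bytes, so `accept_match(17, 16)` rejects it. In the block-size-8 pass, the alignment of dataset 1 at offset 1 with dataset 2 at offset 0 runs for 30 bytes, which clears the threshold of 16.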
Commits on Dec 30, 2011
  1. Clean up the main loop.

    - Properly fix the bug that was causing fatal read errors.
    
    - Tweak for better results.
    
    - Reorganize code.
    committed Dec 30, 2011
Commits on Dec 17, 2011
  1. HACK / DIRTY WORKAROUND to keep bdelta from dying

    Found an out-of-bounds file read on the following samples today:
    
        http://anti-virus.by/samples/bdelta/2011-12-16/from.bin
        http://anti-virus.by/samples/bdelta/2011-12-16/to.bin
    
    The patch attempts to skip tail checksumming, but I don't
    know whether the rest of the code behaves correctly.
    
    NEEDS PROPER FIX
    
    Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
    trofi committed with Dec 16, 2011