Very simple fast test of hashing performance for moderate sized files #241

inorton · 2016-11-02T17:26:25Z

I noticed the conversation about hashing in one of the other issues. Perhaps this would be useful?

It creates a directory tree of 1000 files, each 256K in size, and then simply hashes each one. Because the files are created fresh on every run, any distortion from warm or cold disk caches should at least be consistent between runs. The whole test takes only 1-2 seconds for me.
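For readers without the diff at hand, here is a minimal sketch of a benchmark along these lines. It is not the code from this PR: the function names, the subdirectory layout, and the use of MD5 via `hashlib` are assumptions made for illustration.

```python
import hashlib
import os
import tempfile
import time


def create_files(root, count=1000, size=256 * 1024, per_dir=100):
    """Create `count` files of `size` random bytes, spread over subdirectories.

    The layout (100 files per directory) is an assumption for illustration.
    """
    paths = []
    for i in range(count):
        subdir = os.path.join(root, "dir{:03d}".format(i // per_dir))
        os.makedirs(subdir, exist_ok=True)
        path = os.path.join(subdir, "file{:04d}.bin".format(i))
        with open(path, "wb") as f:
            f.write(os.urandom(size))
        paths.append(path)
    return paths


def hash_file(path):
    """Return the hex MD5 digest of a file, read in 64K chunks."""
    hasher = hashlib.md5()  # assumption: the hash under test may differ
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(64 * 1024), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def run_benchmark(count=1000, size=256 * 1024):
    """Create fresh files, hash each one, return (digests, elapsed seconds).

    Only the hashing loop is timed, not the file creation.
    """
    with tempfile.TemporaryDirectory() as root:
        paths = create_files(root, count, size)
        start = time.perf_counter()
        digests = [hash_file(p) for p in paths]
        elapsed = time.perf_counter() - start
    return digests, elapsed


if __name__ == "__main__":
    digests, elapsed = run_benchmark()
    print("hashed {} files in {:.2f}s".format(len(digests), elapsed))
```

Since the files are written immediately before they are hashed, they will most likely still be in the OS page cache, so the figure mostly reflects hashing cost rather than disk reads.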

codecov-io · 2016-11-02T17:32:30Z

Current coverage is 88.86% (diff: 100%)

Merging #241 into master will decrease coverage by 0.55%

@@             master       #241   diff @@
==========================================
  Files             1          1          
  Lines          1040        997    -43   
  Methods           0          0          
  Messages          0          0          
  Branches        166        158     -8   
==========================================
- Hits            930        886    -44   
- Misses           82         83     +1   
  Partials         28         28

Powered by Codecov. Last update 1e7b28a...7c8f2d1

frerich · 2016-11-03T21:19:32Z

Thanks, I think it's a good idea to have some sort of standard benchmark. I suppose instead of creating a new file, this could be part of the existing performancetests.py test case?

Alas (?), the discussion in #239 suggests that the slow hashing is related to concurrent clcache instances, at least for @akleber's setup -- so I'm not sure the test code as it stands reproduces that issue.

In any case, I very much agree that some sort of performance test for this functionality would be good -- it's not clear to me though which scenarios to benchmark.

inorton · 2016-11-05T08:40:38Z

I started this with a theory that we could compute hashes in parallel using concurrent.futures if we had multiple CPUs (some doing IO, some doing hashing). My tests here showed that it actually made things worse by a considerable margin.
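A rough sketch of such a comparison follows. The comment above does not say whether threads or processes were used, so the thread pool, the worker count, and all function names here are assumptions, and no particular timings are implied.

```python
import concurrent.futures
import hashlib
import time


def hash_file(path):
    """Return the hex MD5 digest of a whole file (MD5 is an assumption)."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()


def hash_sequential(paths):
    """Hash the files one after another."""
    return [hash_file(p) for p in paths]


def hash_parallel(paths, workers=4):
    """Hash the files on a thread pool; results keep the input order."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(hash_file, paths))


def timed(func, *args):
    """Call func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start
```

Calling `timed(hash_sequential, paths)` and `timed(hash_parallel, paths)` on the same list of files gives the two figures to compare. Both variants should return identical digests, which is a cheap sanity check before looking at the timings.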

frerich · 2016-11-05T10:22:09Z

Indeed, it matches @akleber's observation that hashing files concurrently is substantially slower than hashing them sequentially.

Maybe this is another argument in favor of some sort of server process which acts as the sole instance that hashes files sequentially (and potentially caches the hashes).

frerich · 2016-11-14T07:51:25Z

I think a performance test to check how fast cache hits and cache misses are (both concurrently as well as sequentially) would be a nice thing to have, but that should probably go into performancetests.py.

Very simple fast test of hashing performance for moderate sized files

7c8f2d1

hubx mentioned this pull request Nov 12, 2016

Switch file hashing to xxhash #243

Closed

frerich added the test label Nov 14, 2016
