Update and complete benchmarks #19

Merged
ato merged 1 commit into iipc:master from jwarc-benchmark-improve
Jan 20, 2020

sebastian-nagel (Contributor)

A few updates to the comparison/benchmarking tool, partly as a basis for discussing possible further performance improvements:

  • upgrade the compared libs/tools to their most recent versions
  • for easier profiling:
    • wrap each profiled tool and its parameters in a separate method so that it gets its own stack in the profile
    • sleep 1 second at startup to leave time to attach a profiler
  • catch exceptions and keep going, so the remaining tools are still benchmarked
  • webarchive-commons: add a variant with the digest check disabled
  • gzip: add a variant with a larger buffer (64 KiB)
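
To illustrate the profiling and error-handling changes listed above, here is a minimal sketch of such a harness. It is not the code from this PR: the class name `BenchHarness`, the helper `timeTool`, and the two stand-in tools are hypothetical, and the real benchmark reads a WARC file instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;

class BenchHarness {
    /** Runs one named tool and returns the elapsed milliseconds, or -1 if it threw. */
    static long timeTool(String name, Callable<Long> tool) {
        long start = System.nanoTime();
        try {
            long records = tool.call();
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println(name + " " + records + " in " + ms + "ms");
            return ms;
        } catch (Exception e) {
            // Report the failure and return, so the remaining tools still run.
            System.out.println(name + " failed: " + e);
            return -1;
        }
    }

    // Hypothetical stand-ins for the real readers. Each tool lives in its own
    // method so that it appears as a separate frame in a profiler's stacks.
    static long fastTool() {
        long n = 0;
        for (int i = 0; i < 1000; i++) {
            n++;
        }
        return n;
    }

    static long brokenTool() throws Exception {
        throw new java.io.IOException("simulated failure");
    }

    /** Runs all tools in order and collects their timings. */
    static Map<String, Long> runAll() {
        Map<String, Long> results = new LinkedHashMap<>();
        results.put("fast", timeTool("fast", BenchHarness::fastTool));
        results.put("broken", timeTool("broken", BenchHarness::brokenTool));
        results.put("fast-again", timeTool("fast-again", BenchHarness::fastTool));
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        // Leave a moment to attach a profiler before any work starts.
        Thread.sleep(1000);
        runAll();
    }
}
```

In this sketch a failing tool is reported with a timing of -1 and the tools after it still run.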

Briefly, the results (on a gzipped WARC file):

  • a larger buffer size (8 -> 64 KiB) seems to speed up gzip (by roughly 4 to 11% in the runs below); it may be worth trying this for jwarc as well
  • webarchive-commons takes roughly 41 to 46% less time (in the three iterations below) if no digest is calculated while reading the WARC file
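
For reference, the gzip buffer size can be set through the two-argument `java.util.zip.GZIPInputStream` constructor, whose second parameter is the size of the internal input buffer in bytes. The following is a minimal sketch, not the benchmark code from this PR; the class name `GzipBufferBench` and its helper methods are hypothetical, and `main` assumes the path of a gzipped file as its first argument.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipBufferBench {
    /** Decompresses the stream with the given input buffer size and counts the bytes. */
    static long countBytes(InputStream compressed, int bufferSize) {
        long total = 0;
        byte[] chunk = new byte[8192];
        try (GZIPInputStream gz = new GZIPInputStream(compressed, bufferSize)) {
            int n;
            while ((n = gz.read(chunk)) != -1) {
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    /** Helper for small in-memory experiments: gzips a byte array. */
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: args[0] is the path to a gzipped file, e.g. a .warc.gz.
        for (int size : new int[] {8 * 1024, 64 * 1024}) {
            long start = System.nanoTime();
            long bytes;
            try (InputStream in = new FileInputStream(args[0])) {
                bytes = countBytes(in, size);
            }
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println("buffer " + size + " B: " + bytes + " bytes in " + ms + "ms");
        }
    }
}
```

A single pass like this is subject to file-cache and JIT warm-up effects, so repeating the loop several times, as the benchmark does, gives more trustworthy numbers.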

Output:

Benchmarking CC-MAIN-20191207160050-20191207184050-00031.warc.gz
iteration 1
gzipinputstream (buffer 8kB)  in 12253ms
gzipinputstream (buffer 64kB)  in 10904ms
webarchive-commons 133945 in 40756ms
webarchive-commons (no digest check) 133945 in 24176ms
jwat buff 133945 in 21584ms
jwarc 133945 in 14623ms

iteration 2
gzipinputstream (buffer 8kB)  in 12583ms
gzipinputstream (buffer 64kB)  in 11482ms
webarchive-commons 133945 in 43104ms
webarchive-commons (no digest check) 133945 in 23460ms
jwat buff 133945 in 20800ms
jwarc 133945 in 14962ms

iteration 3
gzipinputstream (buffer 8kB)  in 12953ms
gzipinputstream (buffer 64kB)  in 12496ms
webarchive-commons 133945 in 44103ms
webarchive-commons (no digest check) 133945 in 24895ms
jwat buff 133945 in 19573ms
jwarc 133945 in 13978ms
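
The relative time saved by each variant can be derived from the timings printed above. A small sketch of that arithmetic follows; the class name `SpeedupCalc` and the method `percentTimeSaved` are hypothetical, and the numbers are copied from the three iterations.

```java
class SpeedupCalc {
    /** Percentage of the baseline run time that the variant saves. */
    static double percentTimeSaved(long baselineMs, long variantMs) {
        return 100.0 * (baselineMs - variantMs) / baselineMs;
    }

    static void report(String label, long[][] pairs) {
        for (int i = 0; i < pairs.length; i++) {
            double saved = percentTimeSaved(pairs[i][0], pairs[i][1]);
            System.out.println(String.format("%s, iteration %d: %.1f%% less time", label, i + 1, saved));
        }
    }

    public static void main(String[] args) {
        // {baseline ms, variant ms} per iteration, taken from the output above.
        long[][] gzipBuffer = {{12253, 10904}, {12583, 11482}, {12953, 12496}};
        long[][] noDigest = {{40756, 24176}, {43104, 23460}, {44103, 24895}};
        report("gzip 8kB -> 64kB", gzipBuffer);
        report("webarchive-commons without digest check", noDigest);
    }
}
```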

Profile taken with async-profiler (interactive SVG: bench.2020-01-17-17-57.async-prof.svg.gz)
[Image: bench 2020-01-17-17-57 async-prof]

sebastian-nagel force-pushed the jwarc-benchmark-improve branch 2 times, most recently from 7c295b6 to 3899b72, on January 20, 2020 09:29
- upgrade to most recent versions of compared libs/tools
- for easier profiling
  - wrap profiled tools and parameters into methods to
    get a separate stack
  - sleep 1 sec. at start to allow to attach a profiler
- catch exceptions and keep going to benchmark remaining tools
- webarchive-commons: add variant with disabled digesting
- gzip: add variant with increased buffer size (64 kB)
ato merged commit f954123 into iipc:master on Jan 20, 2020

ato (Member) commented Jan 20, 2020

Interesting. Thanks!
