This repo contains benchmarking scripts for compression libraries found in CPython.
Right now things are a bit messy because I set this up specifically for python/cpython#139877, but I plan to refactor it into a generic compression benchmark suite.
To run the benchmarks, first tune the system for benchmarking.
Then download the English wiki data (or supply your own data):
curl -SLo "enwik9.zip" "https://enwik9.zip"
Unzip the data:
unzip enwik9.zip
Then prepare the data used by the benchmarks (so it doesn't need to be recalculated):
python prepare_data.py
Then, run a benchmark script:
python bench_zstd.py -o zstd.json
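For a rough idea of what a compression benchmark measures, here is a minimal sketch using only the standard library. It is not the code in `bench_zstd.py`: the `bench_compress` helper, the sample data, and the choice of `zlib` are all assumptions made for illustration.

```python
import time
import zlib


def bench_compress(data: bytes, level: int = 6, repeats: int = 5) -> dict:
    """Time zlib.compress on `data` and keep the fastest of `repeats` runs.

    Hypothetical helper for illustration; not part of this repo.
    """
    best = float("inf")
    compressed = b""
    for _ in range(repeats):
        start = time.perf_counter()
        compressed = zlib.compress(data, level)
        best = min(best, time.perf_counter() - start)
    return {
        "level": level,
        "seconds": best,
        # Ratio of original size to compressed size (higher is better).
        "ratio": len(data) / len(compressed),
    }


if __name__ == "__main__":
    # Synthetic, highly repetitive sample; real runs would use the prepared data.
    sample = b"the quick brown fox jumps over the lazy dog\n" * 10_000
    for level in (1, 6, 9):
        result = bench_compress(sample, level)
        print(
            f"level={result['level']} "
            f"time={result['seconds']:.4f}s "
            f"ratio={result['ratio']:.1f}x"
        )
```

Taking the minimum over several runs reduces noise from other processes, which is also why tuning the system beforehand matters.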
- Unify benchmarks into one CLI program
 - Select settings for benchmark (which modules to include, what settings per module)
 - Try the Silesia corpus?