Threaded Java implementation of GZIP output stream for high-performance compression.
High-level features:
- Exposes the standard `OutputStream` interface for writes.
- Configurable buffer sizes, compression level, and thread pooling.
- Fails fast on any I/O error in the underlying stream or in compression.
- Speeds up compression of larger content close to linearly with the number of threads.
- Implements the pigz technique of priming each block's compression dictionary with the last 1/4 of the previous block for better compression.
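The dictionary-priming idea from the last bullet can be sketched with the JDK's `Deflater` alone. This is a minimal illustration, not this library's code: `PrimedBlockSketch`, `compressBlock`, `gzip`, and `gunzip` are hypothetical names, blocks are compressed sequentially rather than on a thread pool, and the gzip header is written with fixed default fields. Each block is compressed as raw deflate by its own `Deflater`, primed with the last quarter of the previous block; non-final blocks end with a sync flush so the pieces concatenate into one valid stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;

public class PrimedBlockSketch {

    /** Compresses one block as raw deflate, primed with the last 1/4 of the previous block. */
    static byte[] compressBlock(byte[] block, byte[] previous, boolean last) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true); // true = raw deflate
        try {
            if (previous != null && previous.length >= 4) {
                int dictLen = previous.length / 4;
                deflater.setDictionary(previous, previous.length - dictLen, dictLen);
            }
            deflater.setInput(block);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            if (last) {
                // Final block: terminate the deflate stream.
                deflater.finish();
                while (!deflater.finished()) {
                    out.write(buf, 0, deflater.deflate(buf));
                }
            } else {
                // Non-final block: sync flush ends on a byte boundary without closing the stream.
                int n;
                do {
                    n = deflater.deflate(buf, 0, buf.length, Deflater.SYNC_FLUSH);
                    out.write(buf, 0, n);
                } while (n == buf.length);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib memory
        }
    }

    /** Builds a single-member gzip file from independently compressed, primed blocks. */
    static byte[] gzip(byte[] input, int blockSize) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Minimal gzip header: magic, CM=8 (deflate), no flags, no mtime, XFL=0, OS=0.
        out.write(new byte[] {0x1f, (byte) 0x8b, 8, 0, 0, 0, 0, 0, 0, 0}, 0, 10);
        byte[] previous = null;
        int off = 0;
        do {
            int end = Math.min(input.length, off + blockSize);
            byte[] block = Arrays.copyOfRange(input, off, end);
            byte[] compressed = compressBlock(block, previous, end == input.length);
            out.write(compressed, 0, compressed.length);
            previous = block;
            off = end;
        } while (off < input.length);
        // Trailer: CRC-32 and length of the uncompressed data, both little-endian.
        CRC32 crc = new CRC32();
        crc.update(input);
        writeIntLE(out, (int) crc.getValue());
        writeIntLE(out, input.length);
        return out.toByteArray();
    }

    static void writeIntLE(ByteArrayOutputStream out, int v) {
        for (int i = 0; i < 4; i++) {
            out.write((v >>> (8 * i)) & 0xff);
        }
    }

    /** Decompresses with the stock JDK reader to confirm the output is ordinary gzip. */
    static byte[] gunzip(byte[] gz) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] input = new byte[512 * 1024];
        for (int i = 0; i < input.length; i++) {
            input[i] = (byte) ('a' + i % 23);
        }
        byte[] gz = gzip(input, 128 * 1024);
        System.out.println("round trip ok: " + Arrays.equals(input, gunzip(gz)));
    }
}
```

Because every block depends only on its own bytes and the tail of its predecessor, the `compressBlock` calls could in principle run concurrently, with only the final concatenation kept in order. The decoder needs no dictionary of its own, since the primed bytes are already in its sliding window when it reaches the next block.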
Accepting all internal defaults:
// * Buffer sizes of 128 kB
// * Default `Deflater` compression level
// * Re-usable fixed thread pool of size = # processors.
final OutputStream out = new ConcurrentGZIPOutputStream(new ByteArrayOutputStream());
// write bytes to the stream however you like
out.write(someBytesToCompress);
out.close();
Benchmarks performed on a 24-core KnownHost VPS-2 instance with JDK 7u21. See ConcurrentGZIPPerformanceTest for the test case, which can be run with:
$ mvn clean test -DincludePerfTests=true
Random input pattern designed for poor compression:
Sequential input pattern designed to compress well:
Compression ratios in both cases are within ~0.1% of each other, with the JRE producing slightly better compression. Priming each block with the last 1/4 of the previous block helps offset some of the disadvantage, but is still not as good as using a single dictionary for the entire input.
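To see why priming narrows the gap, the following sketch compares raw-deflate output sizes for three strategies on synthetic data: one stream over the whole input, independent blocks, and blocks primed with the last 1/4 of their predecessor. It is not the project's benchmark; `DictionaryRatioSketch` and its methods are hypothetical names, the input is artificial, and each block is finished on its own, so the sizes are approximate.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.zip.Deflater;

public class DictionaryRatioSketch {

    /** Returns the raw-deflate size of data, optionally primed with a preset dictionary. */
    static int deflatedSize(byte[] data, byte[] dictionary) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        try {
            if (dictionary != null && dictionary.length > 0) {
                deflater.setDictionary(dictionary);
            }
            deflater.setInput(data);
            deflater.finish();
            byte[] buf = new byte[8192];
            int total = 0;
            while (!deflater.finished()) {
                total += deflater.deflate(buf);
            }
            return total;
        } finally {
            deflater.end(); // release native zlib memory
        }
    }

    /** Sums per-block sizes; when primed, each block sees the last 1/4 of its predecessor. */
    static int blockwiseSize(byte[] input, int blockSize, boolean primed) {
        int total = 0;
        byte[] previous = null;
        for (int off = 0; off < input.length; off += blockSize) {
            byte[] block = Arrays.copyOfRange(input, off, Math.min(input.length, off + blockSize));
            byte[] dict = null;
            if (primed && previous != null) {
                int dictLen = previous.length / 4;
                dict = Arrays.copyOfRange(previous, previous.length - dictLen, previous.length);
            }
            total += deflatedSize(block, dict);
            previous = block;
        }
        return total;
    }

    public static void main(String[] args) {
        // One 16 KiB random chunk repeated 8 times: all redundancy lies across block boundaries.
        byte[] chunk = new byte[16 * 1024];
        new Random(42).nextBytes(chunk);
        byte[] input = new byte[chunk.length * 8];
        for (int i = 0; i < 8; i++) {
            System.arraycopy(chunk, 0, input, i * chunk.length, chunk.length);
        }
        System.out.println("single stream:      " + deflatedSize(input, null));
        System.out.println("independent blocks: " + blockwiseSize(input, chunk.length, false));
        System.out.println("primed blocks:      " + blockwiseSize(input, chunk.length, true));
    }
}
```

On this input the primed variant can only recover the part of each block that overlaps the dictionary, while the single stream sees all earlier data within its window. That is the same trade-off described above, exaggerated by the artificial data.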
See the ConcurrentGZIPOutputStreamTest unit test for more examples.
- Moderate amount of overhead: for single-threaded use, the JRE's implementation performs better.