[feature] Move to rayon backend (#7)
sstadick committed Aug 15, 2021
1 parent 8ba3e7f commit a664a7d
Showing 10 changed files with 358 additions and 333 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ci.yaml
@@ -80,4 +80,4 @@ jobs:
uses: Swatinem/rust-cache@v1

- name: Run tests
- run: cargo test --verbose --features parsnap_default,pargz_default
+ run: cargo test --verbose --features parsnap_default,pargz_default -- test_simple test_regression
169 changes: 54 additions & 115 deletions Cargo.lock

5 changes: 3 additions & 2 deletions Cargo.toml
@@ -15,6 +15,7 @@ description = "Parallel Gzip Compression"
[lib]
name = "gzp"
path = "src/lib.rs"
bench = false

[features]
default = []
@@ -31,11 +32,11 @@ rust_backend = ["flate2/miniz_oxide"]
[dependencies]
bytes = "1.0.1"
flate2 = { version = "~1", default-features = false, optional = true }
futures = "0.3.16"
num_cpus = "1.13.0"
thiserror = "1.0.26"
tokio = { version = "1.9.0", features = ["sync", "rt-multi-thread"] }
snap = { version = "~1", optional = true }
flume = { version = "0.10.8", features = ["async"] }
rayon = "1.5.1"

[dev-dependencies]
proptest = "1.0.0"
21 changes: 3 additions & 18 deletions README.md
@@ -147,25 +147,10 @@ fn main() {

## Benchmarks

- All benchmarks were run on the file in `./bench-data/shakespeare.txt` catted together 100 times which creats a rough
+ All benchmarks were run on the file in `./bench-data/shakespeare.txt` catted together 100 times which creates a rough
550Mb file.

- Note that there are far more comprehensive comparisons of the tradeoffs in differet compression algorithms / compression
- levels elsewhere on the interent. This is meant to give rough understanding of the tradoffs involved.
-
- | Name | Num Threads | Compression Level | Buffer Size | Time | File Size |
- | --- | - | ----------------- | -----------| ---- | --------- |
- | Gzip Only | NA | 3 | 128 Kb | 6.6s | 218 Mb |
- | Gzip | 1 | 3 | 128 Kb | 2.4s | 223 Mb |
- | Gzip | 4 | 3 | 128 Kb | 1.2s | 223 Mb |
- | Gzip | 8 | 3 | 128 Kb | 0.8s | 223 Mb |
- | Gzip | 16 | 3 | 128 Kb | 0.6s | 223 Mb |
- | Gzip | 30 | 3 | 128 Kb | 0.6s | 223 Mb |
- | Snap Only | NA | NA | 128 Kb | 1.6s | 333 Mb |
- | Snap | 1 | NA | 128 Kb | 0.7s | 333 Mb |
- | Snap | 4 | NA | 128 Kb | 0.5s | 333 Mb |
- | Snap | 8 | NA | 128 Kb | 0.4s | 333 Mb |
- | Snap | 16 | NA | 128 Kb | 0.4s | 333 Mb |
- | Snap | 30 | NA | 128 Kb | 0.4s | 333 Mb |
+ The primary takeaway here is that you probably want to give `gzp` at least 4 threads. 2 threads breaks even with the
+ overhead of orchestrating the multi-threadedness, 4 gives a roughly 3x improvement.

![benchmarks](./violin.svg)
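
To illustrate the thread-count takeaway, here is a minimal sketch of block-parallel processing that uses only the standard library. It is not how `gzp` is implemented: `process_block` and `process_parallel` are hypothetical names, a wrapping byte sum stands in for real compression, and plain scoped threads stand in for the rayon backend. It only shows the general shape of the work: split the input into fixed-size blocks, hand runs of blocks to workers, and reassemble the results in input order.

```rust
use std::thread;

/// Hypothetical stand-in for per-block compression: a wrapping sum of the bytes.
fn process_block(block: &[u8]) -> u64 {
    block
        .iter()
        .fold(0u64, |acc, &b| acc.wrapping_add(u64::from(b)))
}

/// Split `data` into `block_size`-byte blocks, process them on up to
/// `num_threads` scoped worker threads, and return one result per block
/// in the same order as the input.
fn process_parallel(data: &[u8], block_size: usize, num_threads: usize) -> Vec<u64> {
    assert!(block_size > 0 && num_threads > 0);

    let blocks: Vec<&[u8]> = data.chunks(block_size).collect();
    if blocks.is_empty() {
        return Vec::new();
    }

    // Ceiling division: each worker receives one contiguous run of blocks.
    let per_worker = (blocks.len() + num_threads - 1) / num_threads;

    thread::scope(|s| {
        // Spawn all workers first so they run concurrently.
        let handles: Vec<_> = blocks
            .chunks(per_worker)
            .map(|run| {
                s.spawn(move || run.iter().map(|b| process_block(b)).collect::<Vec<u64>>())
            })
            .collect();

        // Join in spawn order, which preserves the input order of the blocks.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect::<Vec<u64>>()
    })
}
```

Calling `process_parallel(&data, 64 * (1 << 10), 4)` returns the same per-block values as a call with one worker; only the wall-clock time differs. With few workers, the fixed cost of spawning, splitting, and joining can offset the gain, which is consistent with the break-even described above for 2 threads.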
2 changes: 1 addition & 1 deletion benches/bench.rs
@@ -99,7 +99,7 @@ fn criterion_benchmark(c: &mut Criterion) {
let buffersize = 64 * (1 << 10);
let compression_level = 3;
let mut group = c.benchmark_group("Compression");
- for num_cpus in [1usize, 4, 8, 16, 30] {
+ for num_cpus in [2, 4, 8, 16, 30] {
group.bench_with_input(
BenchmarkId::new("Gzip", num_cpus),
&num_cpus,