
Add Zstandard compression option for wlog #11666

Merged

Conversation

@leizor (Contributor) commented Dec 3, 2022

#11223

This PR experiments with the Zstandard compression algorithm as an option for the WAL, alongside the existing option of compressing with Snappy.

In order to compare, I've converted about 16GB of Snappy-compressed WAL into Zstandard-compressed WAL and non-compressed WAL:

[~/tmp/prometheus/data]$ du -h .
 32G	./wal-none
 16G	./wal-snappy
 10G	./wal-zstd
 58G	.

[~/tmp/prometheus/data]$ du .
66303808	./wal-none
33764616	./wal-snappy
21022240	./wal-zstd
121090664	.

Benchmarks w/ the new Zstandard compression option added in:

name                               time/op
WAL_LogBatched/compress=none-10      1.01µs ± 9%
WAL_LogBatched/compress=snappy-10     213ns ± 1%
WAL_LogBatched/compress=zstd-10       894ns ± 2%

name                               speed
WAL_LogBatched/compress=none-10    2.04GB/s ± 8%
WAL_LogBatched/compress=snappy-10  9.61GB/s ± 1%
WAL_LogBatched/compress=zstd-10    2.29GB/s ± 2%

In summary, it looks like zstd reduced storage by ~37.7% relative to snappy but is ~76.2% slower. I suspect throughput matters more than storage size for this application, so I've left snappy as the default compression method; in cases where the opposite trade-off is desired, this PR allows the compression method to be switched to zstd.

Signed-off-by: Justin Lei <justin.lei@grafana.com>
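As a rough illustration of the two encode paths compared above, here is a minimal sketch using github.com/golang/snappy and github.com/klauspost/compress/zstd (the zstd library noted later in this thread). This is not the PR's actual code; the real WAL writer's record framing and buffer reuse are more involved.

```go
package main

import (
	"fmt"

	"github.com/golang/snappy"
	"github.com/klauspost/compress/zstd"
)

func main() {
	// Illustrative only: real WAL records are framed and batched differently.
	record := []byte("example WAL record payload")

	// Snappy block compression: stateless, one call per record.
	snappyOut := snappy.Encode(nil, record)

	// Zstandard: create one Encoder and reuse it across records.
	// Passing a nil writer is the documented pattern for EncodeAll-only use.
	enc, err := zstd.NewWriter(nil)
	if err != nil {
		panic(err)
	}
	defer enc.Close()
	zstdOut := enc.EncodeAll(record, nil)

	fmt.Printf("raw=%dB snappy=%dB zstd=%dB\n", len(record), len(snappyOut), len(zstdOut))
}
```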

@leizor force-pushed the leizor/prometheus/prometheus/issues/11223 branch 5 times, most recently from 5aa4204 to e05ce6b (December 5, 2022 17:24)
@leizor marked this pull request as ready for review (December 6, 2022 00:32)
@roidelapluie (Member):

I will let @codesome (and maybe @SuperQ ) judge, but the flag change would not be allowed by our stability guarantees.

@leizor (Contributor, author) commented Dec 12, 2022

Oop, thanks for pointing that out @roidelapluie! I think I'll leave it for now, as I'm not sure it's clear-cut whether it's even desirable to include zstd as an option.

If we do decide we'd like to, I'll revisit and get that fixed up.

@codesome (Member):

How does WAL replay compare with this compression?

@codesome (Member):

/prombench main

I am curious to see how it affects write latency.

@prombot (Contributor) commented Dec 14, 2022

⏱️ Welcome to Prometheus Benchmarking Tool. ⏱️

Compared versions: PR-11666 and main

After successful deployment, the benchmarking metrics can be viewed at:

Other Commands:
To stop benchmark: /prombench cancel
To restart benchmark: /prombench restart main

@SuperQ (Member) commented Dec 14, 2022

I agree, we need to keep the flags backwards compatible. But I think it's great to add better compression options.

The most important improvements to me are on-disk size and startup speed.

@codesome (Member):

/prombench cancel

No noticeable difference in write latency, and no spikes.

@prombot (Contributor) commented Dec 15, 2022

Benchmark cancel is in progress.

@leizor (Contributor, author) commented Dec 15, 2022

Yay, cool to see that there's some interest in this!

As far as the flag change goes, I'm thinking I'll go back and revert the --compress flag to what it used to do, but add a new --compress-type flag that can be either snappy or zstd.

I'll also work on grabbing some benchmark numbers for WAL replay (is this also what plays into startup speed reduction?).
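A hypothetical sketch of what such a flag could look like with kingpin, the flag library Prometheus uses. The flag name, help text, and default below are illustrative assumptions, not necessarily what the PR shipped:

```go
package main

import (
	"fmt"

	"gopkg.in/alecthomas/kingpin.v2"
)

// Hypothetical flag shape; name and wiring are illustrative only.
// Enum restricts the accepted values, with snappy kept as the default.
var walCompressionType = kingpin.Flag(
	"storage.tsdb.wal-compression-type",
	"Compression algorithm for the WAL: snappy or zstd.",
).Default("snappy").Enum("snappy", "zstd")

func main() {
	kingpin.Parse()
	fmt.Println("WAL compression:", *walCompressionType)
}
```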

@leizor (Contributor, author) commented Dec 15, 2022

Oop, and just thought of this regarding the prombench run: this branch still defaults to snappy, so I'm not sure it would have shown much of a difference. I'll also plan on temporarily defaulting it to zstd and re-running prombench to see what it looks like.

@SuperQ (Member) commented Dec 15, 2022

Yup, WAL replay speed would be interesting to compare. We chose snappy originally due to the decompress speed.
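For context, replay decompresses every record it reads, so replay speed tracks the decoder. A minimal sketch of the two decode paths, again illustrative rather than the PR's code:

```go
package main

import (
	"fmt"

	"github.com/golang/snappy"
	"github.com/klauspost/compress/zstd"
)

// decodeRecord mimics the shape of a replay-time decompress step:
// a stateless Snappy call, or a reused zstd Decoder.
func decodeRecord(compressed []byte, useZstd bool, dec *zstd.Decoder) ([]byte, error) {
	if useZstd {
		return dec.DecodeAll(compressed, nil)
	}
	return snappy.Decode(nil, compressed)
}

func main() {
	// A nil reader is allowed when the Decoder is only used via DecodeAll.
	dec, err := zstd.NewReader(nil)
	if err != nil {
		panic(err)
	}
	defer dec.Close()

	raw := []byte("example WAL record")
	out, err := decodeRecord(snappy.Encode(nil, raw), false, dec)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out) == string(raw)) // true
}
```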

@SuperQ (Member) commented Dec 15, 2022

It might also be interesting to try klauspost/compress for snappy.

@codesome (Member):

> Oop, and just thought of this regarding the prombench run: this branch still defaults to snappy, so I'm not sure it would have shown much of a difference. I'll also plan on temporarily defaulting it to zstd and re-running prombench to see what it looks like.

🤦 oh right. Yes, let's use zstd by default for now; we can revert it after prombench is done with it. You can hardcode it for the test.

@leizor force-pushed the leizor/prometheus/prometheus/issues/11223 branch 4 times, most recently from b650fad to bddd6b8 (December 19, 2022 21:59)
@leizor (Contributor, author) commented Dec 19, 2022

/prombench main

@prombot (Contributor) commented Dec 19, 2022

@leizor is not an org member nor a collaborator and cannot execute benchmarks.

@SuperQ (Member) commented Dec 19, 2022

/prombench main

@prombot (Contributor) commented Dec 19, 2022

⏱️ Welcome to Prometheus Benchmarking Tool. ⏱️

Compared versions: PR-11666 and main

After successful deployment, the benchmarking metrics can be viewed at:

Other Commands:
To stop benchmark: /prombench cancel
To restart benchmark: /prombench restart main

@codesome (Member) commented Dec 20, 2022

/prombench cancel

Things are looking fine. CPU usage is a tiny bit higher, but nothing concerning. However, the BenchmarkLoadWAL results are not promising. This compares snappy and zstd.
Edit: tested on a MacBook with an M1 Pro chip.

benchmark                                                                                                             old ns/op      new ns/op      delta
BenchmarkLoadWAL/batches=10,seriesPerBatch=100,samplesPerSeries=7200,exemplarsPerSeries=0,mmappedChunkT=0-10          327833114      380143722      +15.96%
BenchmarkLoadWAL/batches=10,seriesPerBatch=100,samplesPerSeries=7200,exemplarsPerSeries=36,mmappedChunkT=0-10         324585052      385304667      +18.71%
BenchmarkLoadWAL/batches=10,seriesPerBatch=100,samplesPerSeries=7200,exemplarsPerSeries=72,mmappedChunkT=0-10         321583427      389826500      +21.22%
BenchmarkLoadWAL/batches=10,seriesPerBatch=100,samplesPerSeries=7200,exemplarsPerSeries=360,mmappedChunkT=0-10        369224250      435253056      +17.88%
BenchmarkLoadWAL/batches=10,seriesPerBatch=10000,samplesPerSeries=50,exemplarsPerSeries=0,mmappedChunkT=0-10          309922875      288768906      -6.83%
BenchmarkLoadWAL/batches=10,seriesPerBatch=10000,samplesPerSeries=50,exemplarsPerSeries=2,mmappedChunkT=0-10          331404073      320740906      -3.22%
BenchmarkLoadWAL/batches=10,seriesPerBatch=1000,samplesPerSeries=480,exemplarsPerSeries=0,mmappedChunkT=0-10          144374446      199462917      +38.16%
BenchmarkLoadWAL/batches=10,seriesPerBatch=1000,samplesPerSeries=480,exemplarsPerSeries=2,mmappedChunkT=0-10          146159190      202149097      +38.31%
BenchmarkLoadWAL/batches=10,seriesPerBatch=1000,samplesPerSeries=480,exemplarsPerSeries=5,mmappedChunkT=0-10          148677530      201612483      +35.60%
BenchmarkLoadWAL/batches=10,seriesPerBatch=1000,samplesPerSeries=480,exemplarsPerSeries=24,mmappedChunkT=0-10         174990403      240130500      +37.22%
BenchmarkLoadWAL/batches=100,seriesPerBatch=1000,samplesPerSeries=480,exemplarsPerSeries=0,mmappedChunkT=3800-10      1180189584     1700711375     +44.10%
BenchmarkLoadWAL/batches=100,seriesPerBatch=1000,samplesPerSeries=480,exemplarsPerSeries=2,mmappedChunkT=3800-10      1198851458     1712634292     +42.86%
BenchmarkLoadWAL/batches=100,seriesPerBatch=1000,samplesPerSeries=480,exemplarsPerSeries=5,mmappedChunkT=3800-10      1224151834     1754925833     +43.36%
BenchmarkLoadWAL/batches=100,seriesPerBatch=1000,samplesPerSeries=480,exemplarsPerSeries=24,mmappedChunkT=3800-10     1446964709     2051575375     +41.78%

@prombot (Contributor) commented Dec 20, 2022

Benchmark cancel is in progress.

@SuperQ (Member) commented Dec 20, 2022

Yes, zstd is slower and more CPU-heavy than snappy. Snappy was originally chosen to provide reasonable compression with minimal overhead.

One interesting option is extending our snappy implementation with klauspost S2.

@leizor (Contributor, author) commented Dec 20, 2022

> One interesting option is extending our snappy implementation with klauspost S2.

I did throw together a benchmark for S2 after I was disappointed with how zstd performed. I might've done something wrong, though, as it still turned out to be slower than snappy. I'll add it back in so we can double-check!
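For reference, a minimal sketch of what an S2 encode path could look like, using github.com/klauspost/compress/s2. This is an assumption about usage, not the benchmark code from this thread:

```go
package main

import (
	"bytes"
	"fmt"

	"github.com/klauspost/compress/s2"
)

func main() {
	// Sketch only; not the PR's benchmark code.
	record := []byte("example WAL record payload")

	// EncodeSnappy emits output that standard Snappy decoders can read,
	// which would keep previously written WAL segments compatible.
	compressed := s2.EncodeSnappy(nil, record)

	// s2.Decode handles both S2 and Snappy block formats.
	decoded, err := s2.Decode(nil, compressed)
	if err != nil {
		panic(err)
	}
	fmt.Println(bytes.Equal(decoded, record)) // true
}
```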

@SuperQ (Member) commented Dec 21, 2022

I'm not surprised that zstd is slower, and that's fine. It's still a good option to have for users that are willing to trade off CPU / speed for better compression.

I think this feature is worth having, but maybe we stick with snappy as the default for now until we get some more experience with zstd.

@leizor (Contributor, author) commented Dec 22, 2022

Alright, in that case, sounds good! I'll just roll back the commit setting zstd as default and resolve the merge conflicts, and then this PR can be considered ready for review.

I think that way we can address the other compression options in follow-up PRs?

@leizor force-pushed the leizor/prometheus/prometheus/issues/11223 branch from bddd6b8 to 4920330 (December 22, 2022 23:51)
@codesome (Member):

Hmmm, I mostly hear complaints about how slow the WAL replay is, not how huge it is. I am not sure if we want to add zstd. @SuperQ, I might not be exposed to the relevant audience; do you hear from many users who want this and would be OK with this trade-off? I can post to the prometheus-developers mailing list and get feedback if needed.

@SuperQ (Member) commented Dec 23, 2022

Yes, I hear about WAL replay being slow. But I don't think decompression is really a major source of the slowdown.

For example, one source of WAL replay slowdown happens when Prometheus gets into a crash loop (OOM, etc.) and generates many hours (12+) of WAL files. Prometheus then tries to replay all of these in a single go, never stopping to compact them out, so the problem gets worse: it's trying to play everything back into memory no matter how old it is. I'm pretty sure this is still a problem.

@codesome (Member):

True, I understand that we should be able to do something about very long WALs, but we are mixing different things here: zstd won't fix the memory problem. Replay of a 2-3h WAL is still not the best it can be. Do we really want a much slower WAL replay with zstd, which would make the 10-12h WAL problem worse by delaying the OOM that is eventually going to happen?

@roidelapluie (Member):

cc @SuperQ @codesome @jesusvazquez can we come to a conclusion here whether we want this?

@SuperQ (Member) commented Jul 4, 2023

I would like to see support for zstd: higher compression ratios with only a bit more overhead.

Signed-off-by: Justin Lei <justin.lei@grafana.com>
Signed-off-by: Justin Lei <justin.lei@grafana.com>
@leizor force-pushed the leizor/prometheus/prometheus/issues/11223 branch from 4920330 to 85492c1 (July 4, 2023 19:15)
@leizor requested a review from jesusvazquez as a code owner (July 4, 2023 19:15)
@codesome (Member) commented Jul 5, 2023

I would not like to block this one, so since @SuperQ feels strongly about this, we can try it out! It is behind a flag anyway.

@jesusvazquez (Member) left a comment:

I concur with @codesome, we are fine having this under a flag. I'll do a thorough review of the test side of this tomorrow.

One thing I have noticed is that the lib we're importing for the zstd compression, https://github.com/klauspost/compress, also has a snappy implementation claiming to be more performant than golang/snappy. I think this is something to investigate in a different issue/PR 👍

@jesusvazquez (Member) left a comment:

This took me a bit longer than expected, but I have reviewed the changes and I think the test coverage is good enough, especially considering this is only enabled optionally and snappy remains the default compression algorithm.

One note of clarification: there are no plans to change the default algorithm; it will continue to be snappy for now.

Nice work @leizor and everyone else that came by 💪

@jesusvazquez merged commit 32d8728 into prometheus:main (July 11, 2023)
23 checks passed
@SuperQ (Member) commented Jul 11, 2023

I would love to see benchmark comparisons of klauspost/compress for snappy next.

@jesusvazquez (Member):

Agreed!

@leizor deleted the leizor/prometheus/prometheus/issues/11223 branch (July 11, 2023 15:38)