
Add Zstandard compression option for wlog #11666

Merged

Conversation

@leizor (Contributor) commented Dec 3, 2022

#11223

This PR experiments with the Zstandard compression algorithm as an option for the WAL, alongside the existing option of compressing with Snappy.

In order to compare, I've converted about 16GB of Snappy-compressed WAL into Zstandard-compressed WAL and non-compressed WAL:

[~/tmp/prometheus/data]$ du -h .
 32G	./wal-none
 16G	./wal-snappy
 10G	./wal-zstd
 58G	.

[~/tmp/prometheus/data]$ du .
66303808	./wal-none
33764616	./wal-snappy
21022240	./wal-zstd
121090664	.

Benchmarks w/ the new Zstandard compression option added in:

name                               time/op
WAL_LogBatched/compress=none-10      1.01µs ± 9%
WAL_LogBatched/compress=snappy-10     213ns ± 1%
WAL_LogBatched/compress=zstd-10       894ns ± 2%

name                               speed
WAL_LogBatched/compress=none-10    2.04GB/s ± 8%
WAL_LogBatched/compress=snappy-10  9.61GB/s ± 1%
WAL_LogBatched/compress=zstd-10    2.29GB/s ± 2%

In summary, it looks like zstd reduced storage by ~37.7% relative to snappy but is ~76.2% slower. I suspect throughput matters more than storage size for this application, so I've left snappy as the default compression method; in cases where the opposite trade-off is desired, this PR allows the compression method to be switched to zstd.

Signed-off-by: Justin Lei <justin.lei@grafana.com>
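As a rough illustration of the two encode paths compared above, here is a minimal sketch using github.com/golang/snappy and github.com/klauspost/compress/zstd (the zstd library noted later in this thread). This is not the PR's actual code; the real WAL writer's record framing and buffer reuse are more involved.

```go
package main

import (
	"fmt"

	"github.com/golang/snappy"
	"github.com/klauspost/compress/zstd"
)

func main() {
	// Illustrative only: real WAL records are framed and batched differently.
	record := []byte("example WAL record payload")

	// Snappy block compression: stateless, one call per record.
	snappyOut := snappy.Encode(nil, record)

	// Zstandard: create one Encoder and reuse it across records.
	// Passing a nil writer is the documented pattern for EncodeAll-only use.
	enc, err := zstd.NewWriter(nil)
	if err != nil {
		panic(err)
	}
	defer enc.Close()
	zstdOut := enc.EncodeAll(record, nil)

	fmt.Printf("raw=%dB snappy=%dB zstd=%dB\n", len(record), len(snappyOut), len(zstdOut))
}
```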

@leizor force-pushed the leizor/prometheus/prometheus/issues/11223 branch 5 times, most recently from 5aa4204 to e05ce6b (December 5, 2022 17:24)
@leizor marked this pull request as ready for review (December 6, 2022 00:32)
@roidelapluie (Member):

I will let @codesome (and maybe @SuperQ ) judge, but the flag change would not be allowed by our stability guarantees.

@leizor (Contributor, author) commented Dec 12, 2022

Oop, thanks for pointing that out @roidelapluie! I think I'll leave it for now, as I'm not sure it's clear-cut whether it's even desirable to include zstd as an option.

If we do decide we'd like to, I'll revisit and get that fixed up.

@codesome (Member):

How does WAL replay compare with this compression?

@codesome (Member):

/prombench main

I am curious to see how it affects write latency.

@prombot (Contributor) commented Dec 14, 2022

⏱️ Welcome to Prometheus Benchmarking Tool. ⏱️

Compared versions: PR-11666 and main

After successful deployment, the benchmarking metrics can be viewed at:

Other Commands:
To stop benchmark: /prombench cancel
To restart benchmark: /prombench restart main

@SuperQ (Member) commented Dec 14, 2022

I agree, we need to keep the flags backwards compatible. But I think it's great to add better compression options.

The most important improvements to me are on-disk size and startup speed.

@codesome (Member):

/prombench cancel

No noticeable difference in write latency, and no spikes.

@prombot (Contributor) commented Dec 15, 2022

Benchmark cancel is in progress.

@leizor (Contributor, author) commented Dec 15, 2022

Yay, cool to see that there's some interest in this!

As far as the flag change goes, I'm thinking I'll go back and revert the --compress flag to what it used to do, but add a new --compress-type flag that can be either snappy or zstd.

I'll also work on grabbing some benchmark numbers for WAL replay (is this also what plays into startup speed reduction?).
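A hypothetical sketch of what such a flag could look like with kingpin, the flag library Prometheus uses. The flag name, help text, and default below are illustrative assumptions, not necessarily what the PR shipped:

```go
package main

import (
	"fmt"

	"gopkg.in/alecthomas/kingpin.v2"
)

// Hypothetical flag shape; name and wiring are illustrative only.
// Enum restricts the accepted values, with snappy kept as the default.
var walCompressionType = kingpin.Flag(
	"storage.tsdb.wal-compression-type",
	"Compression algorithm for the WAL: snappy or zstd.",
).Default("snappy").Enum("snappy", "zstd")

func main() {
	kingpin.Parse()
	fmt.Println("WAL compression:", *walCompressionType)
}
```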

@leizor (Contributor, author) commented Dec 15, 2022

Oop, and just thought of this regarding the prombench run: this branch still defaults to snappy, so I'm not sure it would have shown much of a difference. I'll also plan on temporarily defaulting it to zstd and re-running prombench to see what it looks like.

@SuperQ (Member) commented Dec 15, 2022

Yup, WAL replay speed would be interesting to compare. We chose snappy originally due to the decompress speed.
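For context, replay decompresses every record it reads, so replay speed tracks the decoder. A minimal sketch of the two decode paths, again illustrative rather than the PR's code:

```go
package main

import (
	"fmt"

	"github.com/golang/snappy"
	"github.com/klauspost/compress/zstd"
)

// decodeRecord mimics the shape of a replay-time decompress step:
// a stateless Snappy call, or a reused zstd Decoder.
func decodeRecord(compressed []byte, useZstd bool, dec *zstd.Decoder) ([]byte, error) {
	if useZstd {
		return dec.DecodeAll(compressed, nil)
	}
	return snappy.Decode(nil, compressed)
}

func main() {
	// A nil reader is allowed when the Decoder is only used via DecodeAll.
	dec, err := zstd.NewReader(nil)
	if err != nil {
		panic(err)
	}
	defer dec.Close()

	raw := []byte("example WAL record")
	out, err := decodeRecord(snappy.Encode(nil, raw), false, dec)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out) == string(raw)) // true
}
```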

@SuperQ (Member) commented Dec 15, 2022

It might also be interesting to try klauspost/compress for snappy.

@codesome (Member):

> Oop, and just thought of this regarding the prombench run: this branch still defaults to snappy, so I'm not sure it would have shown much of a difference. I'll also plan on temporarily defaulting it to zstd and re-running prombench to see what it looks like.

🤦 oh right. Yes, let's use zstd by default for now; we can revert it after prombench is done with it. You can hardcode it for the test.

@leizor force-pushed the leizor/prometheus/prometheus/issues/11223 branch 4 times, most recently from b650fad to bddd6b8 (December 19, 2022 21:59)
@leizor (Contributor, author) commented Dec 19, 2022

/prombench main

@prombot (Contributor) commented Dec 19, 2022

@leizor is not an org member nor a collaborator and cannot execute benchmarks.

@SuperQ (Member) commented Dec 19, 2022

/prombench main

@prombot (Contributor) commented Dec 19, 2022

⏱️ Welcome to Prometheus Benchmarking Tool. ⏱️

Compared versions: PR-11666 and main

After successful deployment, the benchmarking metrics can be viewed at:

Other Commands:
To stop benchmark: /prombench cancel
To restart benchmark: /prombench restart main

@codesome (Member) commented Dec 20, 2022

/prombench cancel

Things are looking fine. CPU usage is a tiny bit higher, but nothing concerning. However, the BenchmarkLoadWAL results are not promising. This compares snappy and zstd.
Edit: tested on a MacBook with an M1 Pro chip.

benchmark                                                                                                             old ns/op      new ns/op      delta
BenchmarkLoadWAL/batches=10,seriesPerBatch=100,samplesPerSeries=7200,exemplarsPerSeries=0,mmappedChunkT=0-10          327833114      380143722      +15.96%
BenchmarkLoadWAL/batches=10,seriesPerBatch=100,samplesPerSeries=7200,exemplarsPerSeries=36,mmappedChunkT=0-10         324585052      385304667      +18.71%
BenchmarkLoadWAL/batches=10,seriesPerBatch=100,samplesPerSeries=7200,exemplarsPerSeries=72,mmappedChunkT=0-10         321583427      389826500      +21.22%
BenchmarkLoadWAL/batches=10,seriesPerBatch=100,samplesPerSeries=7200,exemplarsPerSeries=360,mmappedChunkT=0-10        369224250      435253056      +17.88%
BenchmarkLoadWAL/batches=10,seriesPerBatch=10000,samplesPerSeries=50,exemplarsPerSeries=0,mmappedChunkT=0-10          309922875      288768906      -6.83%
BenchmarkLoadWAL/batches=10,seriesPerBatch=10000,samplesPerSeries=50,exemplarsPerSeries=2,mmappedChunkT=0-10          331404073      320740906      -3.22%
BenchmarkLoadWAL/batches=10,seriesPerBatch=1000,samplesPerSeries=480,exemplarsPerSeries=0,mmappedChunkT=0-10          144374446      199462917      +38.16%
BenchmarkLoadWAL/batches=10,seriesPerBatch=1000,samplesPerSeries=480,exemplarsPerSeries=2,mmappedChunkT=0-10          146159190      202149097      +38.31%
BenchmarkLoadWAL/batches=10,seriesPerBatch=1000,samplesPerSeries=480,exemplarsPerSeries=5,mmappedChunkT=0-10          148677530      201612483      +35.60%
BenchmarkLoadWAL/batches=10,seriesPerBatch=1000,samplesPerSeries=480,exemplarsPerSeries=24,mmappedChunkT=0-10         174990403      240130500      +37.22%
BenchmarkLoadWAL/batches=100,seriesPerBatch=1000,samplesPerSeries=480,exemplarsPerSeries=0,mmappedChunkT=3800-10      1180189584     1700711375     +44.10%
BenchmarkLoadWAL/batches=100,seriesPerBatch=1000,samplesPerSeries=480,exemplarsPerSeries=2,mmappedChunkT=3800-10      1198851458     1712634292     +42.86%
BenchmarkLoadWAL/batches=100,seriesPerBatch=1000,samplesPerSeries=480,exemplarsPerSeries=5,mmappedChunkT=3800-10      1224151834     1754925833     +43.36%
BenchmarkLoadWAL/batches=100,seriesPerBatch=1000,samplesPerSeries=480,exemplarsPerSeries=24,mmappedChunkT=3800-10     1446964709     2051575375     +41.78%

@prombot (Contributor) commented Dec 20, 2022

Benchmark cancel is in progress.

@SuperQ (Member) commented Dec 20, 2022

Yes, zstd is slower and more CPU-heavy than snappy. Snappy was originally chosen to provide reasonable compression with minimal overhead.

One interesting option is extending our snappy implementation with klauspost S2.

@leizor (Contributor, author) commented Dec 20, 2022

> One interesting option is extending our snappy implementation with klauspost S2.

I did throw together a benchmark for S2 after I was disappointed with how zstd performed. I might've done something wrong, though, as it still turned out to be slower than snappy. I'll add it back in so we can double-check!
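For reference, a minimal sketch of what an S2 encode path could look like, using github.com/klauspost/compress/s2. This is an assumption about usage, not the benchmark code from this thread:

```go
package main

import (
	"bytes"
	"fmt"

	"github.com/klauspost/compress/s2"
)

func main() {
	// Sketch only; not the PR's benchmark code.
	record := []byte("example WAL record payload")

	// EncodeSnappy emits output that standard Snappy decoders can read,
	// which would keep previously written WAL segments compatible.
	compressed := s2.EncodeSnappy(nil, record)

	// s2.Decode handles both S2 and Snappy block formats.
	decoded, err := s2.Decode(nil, compressed)
	if err != nil {
		panic(err)
	}
	fmt.Println(bytes.Equal(decoded, record)) // true
}
```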

@SuperQ (Member) commented Dec 21, 2022

I'm not surprised that zstd is slower, and that's fine. It's still a good option to have for users that are willing to trade off CPU / speed for better compression.

I think this feature is worth having, but maybe we stick with snappy as the default for now until we get some more experience with zstd.

@leizor (Contributor, author) commented Dec 22, 2022

Alright, in that case, sounds good! I'll just roll back the commit setting zstd as default and resolve the merge conflicts, and then this PR can be considered ready for review.

I think that way we can address the other compression options in follow-up PRs?

@leizor force-pushed the leizor/prometheus/prometheus/issues/11223 branch from bddd6b8 to 4920330 (December 22, 2022 23:51)
@codesome (Member):

Hmmm, I mostly hear complaints about how slow the WAL replay is, not how huge it is. I am not sure if we want to add zstd. @SuperQ, I might not be exposed to the relevant audience; do you hear from many users who want this and would be OK with this trade-off? I can post to the prometheus-developers mailing list and get feedback if needed.

@SuperQ (Member) commented Dec 23, 2022

Yes, I hear about WAL replay being slow. But I don't think decompression is really a major source of the slowdown.

For example, one source of WAL replay slowdown happens when Prometheus gets into a crash loop (OOM, etc.) and generates many hours (12+) of WAL files. Prometheus then tries to replay all of these in a single go, never stopping to compact them out, so the problem gets worse: it's trying to play everything back into memory no matter how old it is. I'm pretty sure this is still a problem.

@codesome (Member):

True, I understand that we should be able to do something about very long WALs, but we are mixing different things here: zstd won't fix the memory problem. Replay of a 2-3h WAL is still not the best it can be. Do we really want a much slower WAL replay with zstd, which would make the 10-12h WAL problem worse by delaying the OOM that is eventually going to happen?

@roidelapluie (Member):

cc @SuperQ @codesome @jesusvazquez can we come to a conclusion here whether we want this?

@SuperQ (Member) commented Jul 4, 2023

I would like to see support for zstd: higher compression ratios with only a bit more overhead.

Signed-off-by: Justin Lei <justin.lei@grafana.com>
Signed-off-by: Justin Lei <justin.lei@grafana.com>
@leizor force-pushed the leizor/prometheus/prometheus/issues/11223 branch from 4920330 to 85492c1 (July 4, 2023 19:15)
@leizor requested a review from jesusvazquez as a code owner (July 4, 2023 19:15)
@codesome (Member) commented Jul 5, 2023

I would not like to block this one, so since @SuperQ feels strongly about this, we can try it out! It is behind a flag anyway.

@jesusvazquez (Member) left a comment:

I concur with @codesome, we are fine having this under a flag. I'll do a thorough review of the test side of this tomorrow.

One thing I have noticed is that the lib we're importing for the zstd compression, https://github.com/klauspost/compress, also has a snappy implementation claiming to be more performant than golang/snappy. I think this is something to investigate in a different issue/PR 👍

@jesusvazquez (Member) left a comment:

This took me a bit longer than expected, but I have reviewed the changes and I think the test coverage is good enough, especially considering this is only enabled optionally and snappy remains the default compression algorithm.

One note of clarification: there are no plans to change the default algorithm; it will continue to be snappy for now.

Nice work @leizor and everyone else that came by 💪

@jesusvazquez merged commit 32d8728 into prometheus:main (July 11, 2023)
23 checks passed
@SuperQ (Member) commented Jul 11, 2023

I would love to see benchmark comparisons of klauspost/compress for snappy next.

@jesusvazquez (Member):

Agreed!

@leizor deleted the leizor/prometheus/prometheus/issues/11223 branch (July 11, 2023 15:38)