
Excessively high memory usage when using client-side zstd compression in confighttp #8216

Closed
swiatekm-sumo opened this issue Aug 10, 2023 · 10 comments
Labels
bug Something isn't working

Comments

@swiatekm-sumo
Contributor

Describe the bug
We've added zstd support to the server side of confighttp in #7927. I rolled this out for OTLP traffic between an agent and a gateway in a K8s environment and saw a very significant increase in memory consumption on the client side.

Steps to reproduce
I have some memory profiles saved from this, and I'm planning to create a self-contained reproduction, ideally using just synthetic benchmarks.

What did you expect to see?
Memory consumption in the ballpark of what other compression methods use.

What did you see instead?
Memory consumption for the otlphttp exporter using zstd was more than 10x that of gzip.

What version did you use?
Version: 0.82.0

What config did you use?
The relevant part:

exporters:
  otlphttp:
    endpoint: ...
    compression: zstd
processors:
  batch:
    send_batch_max_size: 2000
    send_batch_size: 1000
    timeout: 1s

Environment
Kubernetes 1.24, EKS to be precise.

Additional context
I'm reporting this as-is so it doesn't get lost, and to consolidate reports in case other users experience this issue. I'll update with more data once I'm able to.

@swiatekm-sumo
Contributor Author

swiatekm-sumo commented Sep 26, 2023

I managed to reproduce this in a test K8s cluster. I've also written some benchmarks which demonstrate the problem, although this turned out to be somewhat challenging for reasons I'll elaborate on below.

Presentation

The test cluster ran a synthetic workload generating 10k log lines per second. The resource consumption reported by kubectl top was, respectively:

With gzip compression

otelcol-logs-collector-5mdw2     593m         79Mi            

With zstd compression

otelcol-logs-collector-4jc6b     526m         469Mi           

(attached memory profile screenshot: zstd-profile-k8s-logs)

Nothing in the confighttp code indicates why there would be such a large difference, or why the zstd encoder would allocate so much memory.

Root cause

I believe the root cause is a combination of the zstd encoder allocating a fair amount of memory by default and our pooling mechanism for encoders not working as expected. We put encoders in a sync.Pool, but in most practical circumstances we don't make requests frequently enough to keep the pool hot. As a result, we create a new encoder for each request, and the zstd encoder handles this particularly poorly.

I've sketched out and tested a solution using a different pooling mechanism here: main...swiatekm-sumo:opentelemetry-collector:fix/zstd-encoder-pooling. With this change, the memory usage is reasonable again.
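
To illustrate the general idea (this is a sketch of the approach, not the code in the branch above): instead of sync.Pool, use a bounded channel that keeps a fixed set of encoders alive for the lifetime of the client, so encoder buffers are allocated once rather than on every request. The names here (encoderPool, newEncoderPool, compress) are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"io"

	"github.com/klauspost/compress/zstd"
)

// encoderPool holds a fixed number of zstd encoders that live for the
// lifetime of the client, unlike sync.Pool, which drops idle entries.
type encoderPool struct {
	encoders chan *zstd.Encoder
}

func newEncoderPool(size int) (*encoderPool, error) {
	p := &encoderPool{encoders: make(chan *zstd.Encoder, size)}
	for i := 0; i < size; i++ {
		// Created once with a nil writer; Reset attaches the destination later.
		enc, err := zstd.NewWriter(nil, zstd.WithEncoderConcurrency(1))
		if err != nil {
			return nil, err
		}
		p.encoders <- enc
	}
	return p, nil
}

// compress writes the zstd-compressed form of src to dst using a pooled encoder.
func (p *encoderPool) compress(dst io.Writer, src []byte) error {
	enc := <-p.encoders                  // block until an encoder is free
	defer func() { p.encoders <- enc }() // return it to the pool

	enc.Reset(dst)
	if _, err := enc.Write(src); err != nil {
		return err
	}
	return enc.Close()
}

func main() {
	pool, err := newEncoderPool(4)
	if err != nil {
		panic(err)
	}
	var buf bytes.Buffer
	if err := pool.compress(&buf, []byte("example payload")); err != nil {
		panic(err)
	}
	fmt.Printf("compressed to %d bytes\n", buf.Len())
}
```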

@atoulme
Contributor

atoulme commented Dec 19, 2023

Would you like to submit a PR to fix the issue?

@swiatekm-sumo
Contributor Author

@atoulme I'll try when I get the time. I still don't understand why the problem is as severe as it is. My fix is effective, so I must be roughly correct about the root cause, but I'd like to try to understand it better before submitting.

@rnishtala-sumo
Contributor

@swiatekm-sumo @atoulme here are some of my findings on this. I deployed the OTel collector on a k8s cluster today with at least 50 nginx pods, each generating a new log line every second. The difference in memory usage between the two compression types is below:

  • gzip: sumo-sumologic-otelcol-logs-collector-vqp5z 124m 74Mi
  • zstd: sumo-sumologic-otelcol-logs-collector-4r92s 138m 106Mi

The difference in memory usage wasn't 10x as seen previously. I used 0.94.0 for testing and let the collector run for at least an hour.

@rnishtala-sumo
Contributor

rnishtala-sumo commented Mar 7, 2024

@swiatekm-sumo @atoulme

Here are more findings and results from tests

v0.92.0
zstd: sumo-sumologic-otelcol-logs-collector-2lk4k 245m 250Mi
gzip: sumo-sumologic-otelcol-logs-collector-x2wmp 291m 84Mi

v0.94.0
gzip: sumo-sumologic-otelcol-logs-collector-djwmz 199m 73Mi
zstd: sumo-sumologic-otelcol-logs-collector-v8pjr 198m 118Mi

There seems to be some improvement in 0.94.0, which uses a later version (v1.17.5) of the compress library that includes some zstd changes:
https://github.com/klauspost/compress/releases/tag/v1.17.5

Apart from the above, there was a known memory leak when using zstd with a sync.Pool; it was addressed in v1.15.0 by adding an option to disable concurrency, which avoids spawning new goroutines, as explained here.

A possible change we could make here is to simply use zstd.WithDecoderConcurrency(1), as mentioned in segmentio/kafka-go#889, which prevents goroutine leaks.
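
For reference, a minimal sketch (assuming the klauspost/compress API, not confighttp's actual wiring) of passing those options when constructing the writer and reader; with concurrency set to 1, compression and decompression run synchronously in the calling goroutine, so no background goroutines are spawned that could leak:

```go
package main

import (
	"bytes"
	"fmt"
	"io"

	"github.com/klauspost/compress/zstd"
)

func main() {
	// Encoder with concurrency disabled: compression happens synchronously
	// in the calling goroutine, so no background goroutines are spawned.
	var compressed bytes.Buffer
	enc, err := zstd.NewWriter(&compressed, zstd.WithEncoderConcurrency(1))
	if err != nil {
		panic(err)
	}
	if _, err := enc.Write([]byte("example payload")); err != nil {
		panic(err)
	}
	if err := enc.Close(); err != nil {
		panic(err)
	}

	// Decoder with concurrency disabled, the option referenced above.
	dec, err := zstd.NewReader(&compressed, zstd.WithDecoderConcurrency(1))
	if err != nil {
		panic(err)
	}
	defer dec.Close()

	out, err := io.ReadAll(dec)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```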

@atoulme
Contributor

atoulme commented Mar 7, 2024

Thank you for this inquiry and for providing data, I appreciate it. Do we have appropriate benchmarks for zstd usage that we could use to test the simple change you mention?

@rnishtala-sumo
Contributor

I can work on the benchmarks to test zstd compression with and without concurrency enabled. I do want to emphasize that zstd memory usage in v1.17.5 of the compress package seems to be lower than in v1.17.4, based on tests on a k8s cluster.

@swiatekm-sumo
Contributor Author

Ideally we'd have a benchmark showing the difference, though from trying to create one myself, this may not be so easy to do. The behaviour is timing-sensitive due to the use of sync.Pool.
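
To make that concrete, here is a small, hypothetical demonstration of why sync.Pool makes this timing-sensitive: pooled items that stay idle across two garbage collections are discarded, so a tight benchmark loop almost always reuses its encoder, while a client sending a request every second or so may construct a fresh encoder each time.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

func main() {
	var constructed int
	pool := sync.Pool{New: func() any {
		constructed++
		return make([]byte, 1<<20) // stand-in for an expensive-to-build encoder
	}}

	for i := 0; i < 5; i++ {
		buf := pool.Get().([]byte)
		pool.Put(buf)
		// Simulate idle time between requests: after two GC cycles the
		// pooled item (including the victim cache) is discarded.
		runtime.GC()
		runtime.GC()
	}
	// Typically prints 5, i.e. a new "encoder" per iteration, even though
	// we put it back into the pool every time.
	fmt.Println("items constructed:", constructed)
}
```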

@rnishtala-sumo
Contributor

Created the following draft PR for this, to show the difference in memory allocation with concurrency disabled - #9749

dmitryax pushed a commit that referenced this issue Apr 12, 2024
**Description:** zstd benchmark tests added
The goal of this PR is to disable concurrency in zstd compression to
reduce its memory footprint and avoid a known issue with goroutine
leaks. Please see - klauspost/compress#264

**Link to tracking Issue:**
#8216

**Testing:** Benchmark test results below
```
BenchmarkCompression/zstdWithConcurrency/compress-10         	   21392	     55855 ns/op	187732.88 MB/s	 2329164 B/op	      28 allocs/op
BenchmarkCompression/zstdNoConcurrency/compress-10           	   29526	     39902 ns/op	262787.42 MB/s	 1758988 B/op	      15 allocs/op
input => 10.00 MB
```
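
For context, a benchmark along these lines (a sketch with assumed names and payload size, not the actual test code in #9749) compares the encoder's default concurrency against zstd.WithEncoderConcurrency(1):

```go
package compressiontest

import (
	"bytes"
	"testing"

	"github.com/klauspost/compress/zstd"
)

// benchmarkZstd compresses a fixed payload repeatedly with the given encoder options.
func benchmarkZstd(b *testing.B, opts ...zstd.EOption) {
	payload := bytes.Repeat([]byte("0123456789abcdef"), 64*1024) // ~1 MiB of input (hypothetical size)
	enc, err := zstd.NewWriter(nil, opts...)
	if err != nil {
		b.Fatal(err)
	}
	b.ReportAllocs()
	b.SetBytes(int64(len(payload)))
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		var buf bytes.Buffer
		enc.Reset(&buf)
		if _, err := enc.Write(payload); err != nil {
			b.Fatal(err)
		}
		if err := enc.Close(); err != nil {
			b.Fatal(err)
		}
	}
}

func BenchmarkZstdWithConcurrency(b *testing.B) {
	benchmarkZstd(b) // default: concurrency derived from GOMAXPROCS
}

func BenchmarkZstdNoConcurrency(b *testing.B) {
	benchmarkZstd(b, zstd.WithEncoderConcurrency(1))
}
```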
@swiatekm-sumo
Contributor Author

Resolved in #9749
