
Excessively high memory usage when using client-side zstd compression in confighttp #8216

Closed
swiatekm-sumo opened this issue Aug 10, 2023 · 10 comments
Labels
bug Something isn't working

Comments

@swiatekm-sumo
Contributor

Describe the bug
We've added zstd support to the server side of confighttp in #7927. I rolled this out for OTLP traffic between an agent and a gateway in a K8s environment and saw a very significant increase in memory consumption on the client side.

Steps to reproduce
I have some memory profiles saved from this, and I'm planning to create a self-contained reproduction, ideally using just synthetic benchmarks.

What did you expect to see?
Memory consumption in the ballpark of what other compression methods use.

What did you see instead?
Memory consumption for the otlphttp exporter using zstd was more than 10x that of gzip.

What version did you use?
Version: 0.82.0

What config did you use?
The relevant part:

exporters:
  otlphttp:
    endpoint: ...
    compression: zstd
processors:
  batch:
    send_batch_max_size: 2000
    send_batch_size: 1000
    timeout: 1s

Environment
Kubernetes 1.24, EKS to be precise.

Additional context
I'm reporting this as-is so it doesn't get lost, and to consolidate reports in case other users experience this issue. I'll update with more data once I'm able to.

@swiatekm-sumo
Contributor Author

swiatekm-sumo commented Sep 26, 2023

I managed to reproduce this in a test K8s cluster. I've also written some benchmarks which demonstrate the problem, although this turned out to be somewhat challenging for reasons I'll elaborate on below.

Presentation

The test cluster ran a synthetic workload generating 10k log lines per second. The resource consumption reported by kubectl top was, respectively:

With gzip compression

otelcol-logs-collector-5mdw2     593m         79Mi            

With zstd compression

otelcol-logs-collector-4jc6b     526m         469Mi           

(attached memory profile screenshot: zstd-profile-k8s-logs)

Nothing in the confighttp code indicates why there would be such a large difference, or why the zstd encoder would allocate so much memory.

Root cause

I believe the root cause is a combination of the zstd encoder allocating a fair amount of memory by default and our pooling mechanism for encoders not working as expected. We put encoders in a sync.Pool, but in most practical circumstances we don't make requests frequently enough to keep the pool hot. As a result, we create a new encoder for each request, and the zstd encoder handles this particularly poorly.

I've sketched out and tested a solution using a different pooling mechanism here: main...swiatekm-sumo:opentelemetry-collector:fix/zstd-encoder-pooling. With this change, the memory usage is reasonable again.
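
To illustrate the general idea (this is a sketch of the approach, not the code in the branch above): instead of sync.Pool, use a bounded channel that keeps a fixed set of encoders alive for the lifetime of the client, so encoder buffers are allocated once rather than on every request. The names here (encoderPool, newEncoderPool, compress) are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"io"

	"github.com/klauspost/compress/zstd"
)

// encoderPool holds a fixed number of zstd encoders that live for the
// lifetime of the client, unlike sync.Pool, which drops idle entries.
type encoderPool struct {
	encoders chan *zstd.Encoder
}

func newEncoderPool(size int) (*encoderPool, error) {
	p := &encoderPool{encoders: make(chan *zstd.Encoder, size)}
	for i := 0; i < size; i++ {
		// Created once with a nil writer; Reset attaches the destination later.
		enc, err := zstd.NewWriter(nil, zstd.WithEncoderConcurrency(1))
		if err != nil {
			return nil, err
		}
		p.encoders <- enc
	}
	return p, nil
}

// compress writes the zstd-compressed form of src to dst using a pooled encoder.
func (p *encoderPool) compress(dst io.Writer, src []byte) error {
	enc := <-p.encoders                  // block until an encoder is free
	defer func() { p.encoders <- enc }() // return it to the pool

	enc.Reset(dst)
	if _, err := enc.Write(src); err != nil {
		return err
	}
	return enc.Close()
}

func main() {
	pool, err := newEncoderPool(4)
	if err != nil {
		panic(err)
	}
	var buf bytes.Buffer
	if err := pool.compress(&buf, []byte("example payload")); err != nil {
		panic(err)
	}
	fmt.Printf("compressed to %d bytes\n", buf.Len())
}
```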

@atoulme
Contributor

atoulme commented Dec 19, 2023

Would you like to submit a PR to fix the issue?

@swiatekm-sumo
Contributor Author

@atoulme I'll try when I get the time. I still don't understand why the problem is as severe as it is. My fix is effective, so I must be roughly correct about the root cause, but I'd like to try to understand it better before submitting.

@rnishtala-sumo
Contributor

@swiatekm-sumo @atoulme here are some of my findings on this. I deployed the OTel collector on a k8s cluster today with at least 50 nginx pods, each generating a new log line every second. The difference in memory usage between the two compression types is below:

  • gzip: sumo-sumologic-otelcol-logs-collector-vqp5z 124m 74Mi
  • zstd: sumo-sumologic-otelcol-logs-collector-4r92s 138m 106Mi

The difference in memory usage wasn't 10x as seen previously. I used 0.94.0 for testing and let the collector run for at least an hour.

@rnishtala-sumo
Contributor

rnishtala-sumo commented Mar 7, 2024

@swiatekm-sumo @atoulme

Here are more findings and results from tests

v0.92.0
zstd: sumo-sumologic-otelcol-logs-collector-2lk4k 245m 250Mi
gzip: sumo-sumologic-otelcol-logs-collector-x2wmp 291m 84Mi

v0.94.0
gzip: sumo-sumologic-otelcol-logs-collector-djwmz 199m 73Mi
zstd: sumo-sumologic-otelcol-logs-collector-v8pjr 198m 118Mi

There seems to be some improvement in 0.94.0, which uses a later version (v1.17.5) of the compress library that includes some zstd changes:
https://github.com/klauspost/compress/releases/tag/v1.17.5

Apart from the above, there was a known memory leak when using zstd with a sync.Pool; it was addressed in v1.15.0 by adding an option to disable concurrency, which avoids spawning new goroutines, as explained here.

A possible change we could make here is to simply use zstd.WithDecoderConcurrency(1), as mentioned in segmentio/kafka-go#889, which prevents goroutine leaks.
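
For reference, a minimal sketch (assuming the klauspost/compress API, not confighttp's actual wiring) of passing those options when constructing the writer and reader; with concurrency set to 1, compression and decompression run synchronously in the calling goroutine, so no background goroutines are spawned that could leak:

```go
package main

import (
	"bytes"
	"fmt"
	"io"

	"github.com/klauspost/compress/zstd"
)

func main() {
	// Encoder with concurrency disabled: compression happens synchronously
	// in the calling goroutine, so no background goroutines are spawned.
	var compressed bytes.Buffer
	enc, err := zstd.NewWriter(&compressed, zstd.WithEncoderConcurrency(1))
	if err != nil {
		panic(err)
	}
	if _, err := enc.Write([]byte("example payload")); err != nil {
		panic(err)
	}
	if err := enc.Close(); err != nil {
		panic(err)
	}

	// Decoder with concurrency disabled, the option referenced above.
	dec, err := zstd.NewReader(&compressed, zstd.WithDecoderConcurrency(1))
	if err != nil {
		panic(err)
	}
	defer dec.Close()

	out, err := io.ReadAll(dec)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```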

@atoulme
Contributor

atoulme commented Mar 7, 2024

Thank you for this inquiry and for providing data, I appreciate it. Do we have appropriate benchmarks for zstd usage that we could use to test the simple change you mention?

@rnishtala-sumo
Contributor

I can work on the benchmarks to test zstd compression with and without concurrency enabled. I do want to emphasize that zstd memory usage in v1.17.5 of the compress package seems to be lower than in v1.17.4, based on tests on a k8s cluster.

@swiatekm-sumo
Contributor Author

Ideally we'd have a benchmark showing the difference, though from trying to create one myself, this may not be so easy to do. The behaviour is timing-sensitive due to the use of sync.Pool.
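
To make that concrete, here is a small, hypothetical demonstration of why sync.Pool makes this timing-sensitive: pooled items that stay idle across two garbage collections are discarded, so a tight benchmark loop almost always reuses its encoder, while a client sending a request every second or so may construct a fresh encoder each time.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

func main() {
	var constructed int
	pool := sync.Pool{New: func() any {
		constructed++
		return make([]byte, 1<<20) // stand-in for an expensive-to-build encoder
	}}

	for i := 0; i < 5; i++ {
		buf := pool.Get().([]byte)
		pool.Put(buf)
		// Simulate idle time between requests: after two GC cycles the
		// pooled item (including the victim cache) is discarded.
		runtime.GC()
		runtime.GC()
	}
	// Typically prints 5, i.e. a new "encoder" per iteration, even though
	// we put it back into the pool every time.
	fmt.Println("items constructed:", constructed)
}
```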

@rnishtala-sumo
Contributor

Created the following draft PR for this, to show the difference in memory allocation with concurrency disabled - #9749

dmitryax pushed a commit that referenced this issue Apr 12, 2024
**Description:** zstd benchmark tests added
The goal of this PR is to disable concurrency in zstd compression to
reduce its memory footprint and avoid a known issue with goroutine
leaks. Please see - klauspost/compress#264

**Link to tracking Issue:**
#8216

**Testing:** Benchmark test results below
```
BenchmarkCompression/zstdWithConcurrency/compress-10         	   21392	     55855 ns/op	187732.88 MB/s	 2329164 B/op	      28 allocs/op
BenchmarkCompression/zstdNoConcurrency/compress-10           	   29526	     39902 ns/op	262787.42 MB/s	 1758988 B/op	      15 allocs/op
input => 10.00 MB
```
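
For context, a benchmark along these lines (a sketch with assumed names and payload size, not the actual test code in #9749) compares the encoder's default concurrency against zstd.WithEncoderConcurrency(1):

```go
package compressiontest

import (
	"bytes"
	"testing"

	"github.com/klauspost/compress/zstd"
)

// benchmarkZstd compresses a fixed payload repeatedly with the given encoder options.
func benchmarkZstd(b *testing.B, opts ...zstd.EOption) {
	payload := bytes.Repeat([]byte("0123456789abcdef"), 64*1024) // ~1 MiB of input (hypothetical size)
	enc, err := zstd.NewWriter(nil, opts...)
	if err != nil {
		b.Fatal(err)
	}
	b.ReportAllocs()
	b.SetBytes(int64(len(payload)))
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		var buf bytes.Buffer
		enc.Reset(&buf)
		if _, err := enc.Write(payload); err != nil {
			b.Fatal(err)
		}
		if err := enc.Close(); err != nil {
			b.Fatal(err)
		}
	}
}

func BenchmarkZstdWithConcurrency(b *testing.B) {
	benchmarkZstd(b) // default: concurrency derived from GOMAXPROCS
}

func BenchmarkZstdNoConcurrency(b *testing.B) {
	benchmarkZstd(b, zstd.WithEncoderConcurrency(1))
}
```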
@swiatekm-sumo
Contributor Author

Resolved in #9749
