
fix: improve memory usage by using metadata informer #5424

Merged

Conversation

sthaha
Contributor

@sthaha sthaha commented Mar 20, 2023

This patch uses the metadatainformer package to watch secrets and configmaps so that their full content does not need to be loaded into memory, thus reducing the amount of memory consumed.
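
For illustration, here is a minimal, self-contained sketch (not the operator's actual wiring; the in-cluster config, resync interval, and handler are placeholders) of how client-go's metadatainformer package builds a metadata-only informer for Secrets:

package main

import (
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/metadata"
	"k8s.io/client-go/metadata/metadatainformer"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}

	// The metadata client lists/watches only PartialObjectMetadata, so the
	// secrets' Data is never transferred or cached.
	mClient, err := metadata.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	factory := metadatainformer.NewSharedInformerFactory(mClient, 5*time.Minute)
	secrets := factory.ForResource(schema.GroupVersionResource{Version: "v1", Resource: "secrets"})

	secrets.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			// Cached objects are *metav1.PartialObjectMetadata: name,
			// namespace, labels, annotations, etc., but no Data field.
			m := obj.(*metav1.PartialObjectMetadata)
			fmt.Printf("secret added: %s/%s\n", m.Namespace, m.Name)
		},
	})

	stopCh := make(chan struct{})
	defer close(stopCh)
	factory.Start(stopCh)
	factory.WaitForCacheSync(stopCh)

	// Only object metadata is available from the lister.
	objs, err := secrets.Lister().List(labels.Everything())
	if err != nil {
		panic(err)
	}
	fmt.Println("secrets in cache:", len(objs))
}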

Type of change

What type of changes does your code introduce to the Prometheus operator? Put an x in the box that applies.

  • CHANGE (fix or feature that would cause existing functionality to not work as expected)
  • FEATURE (non-breaking change which adds functionality)
  • BUGFIX (non-breaking change which fixes an issue)
  • ENHANCEMENT (non-breaking change which improves existing functionality)
  • NONE (if none of the other choices apply; for example: tooling, build system, CI, docs, etc.)

Changelog entry

Prometheus Operator now consumes less memory than before when there are lots of secrets and config-maps
present in the cluster.

- Optimised to consume less memory when there are lots of secrets or configmaps in the cluster. 

@sthaha sthaha force-pushed the fix-5410-metadata-informers branch 2 times, most recently from 71f8be6 to 6c31947 on March 20, 2023 07:17

@simonpasquier simonpasquier left a comment

lgtm. Do you have numbers comparing memory usage before/after the PR?
If this is useful, the same change should be done for the alertmanager and thanosruler controllers (but can be follow-up PRs).

}

// NewMetadataInformerFactory creates factories for kube resources without
// loading only the metadata information of the resource for the given allowed,
Contributor

it doesn't read correctly?

Contributor Author

Good catch! :) Sometimes, find and replace just doesn't cut it. 😂


@JoaoBraveCoding JoaoBraveCoding left a comment

lgtm! Awesome work, I wasn't aware that such informers existed 🤯

@sthaha
Contributor Author

sthaha commented Mar 20, 2023

lgtm! Awesome work, I wasn't aware that such informers existed 🤯

Thanks to @simonpasquier for pointing it out! 🙇

@sthaha
Contributor Author

sthaha commented Mar 20, 2023

lgtm. Do you have numbers comparing memory usage before/after the PR?

@simonpasquier yes, I have been updating #5410 with new numbers. I will update here as well. Thanks a lot for the feedback

@simonpasquier simonpasquier left a comment

lgtm. Waiting for the latest numbers before merging :)
As I wrote earlier, we probably need the same change for Alertmanager & ThanosRuler but it can be a follow-up PR.

@sthaha
Contributor Author

sthaha commented Mar 21, 2023

Here is the difference between the two versions after adding 100 secrets; tl;dr:

Current Operator :  Showing nodes accounting for 302.26MB, 97.87% of 308.83MB total
New Operator     :  Showing nodes accounting for 4860.48kB, 100% of 4860.48kB total
create-secrets.sh
#!/usr/bin/env bash

set -eu -o pipefail

main() {
	local ns=$1
	shift

	local num=$1
	shift

	# Generate ~1MB of base64-encoded random data to use as the secret payload.
	mkdir -p tmp
	dd if=/dev/urandom bs=756439 count=1 | base64 >tmp/random.txt
	kubectl create ns "$ns" || true
	for i in $(seq "$num"); do
		{
			kubectl delete -n "$ns" secret "foo-${i}" || true
			kubectl create -n "$ns" secret generic "foo-${i}" --from-file=key=tmp/random.txt
		} &
		# Keep at most 30 kubectl invocations running concurrently.
		while [[ $(jobs -r | wc -l) -gt 30 ]]; do wait -n; done
	done
	wait
	rm -f tmp/random.txt
}
main "$@"
./create-secrets.sh foobar 1000

New Operator using MetadataInformer

File: operator
Type: inuse_space
Time: Mar 22, 2023 at 9:44am (AEST)
Showing nodes accounting for 5695.21kB, 100% of 5695.21kB total
      flat  flat%   sum%        cum   cum%
 1057.98kB 18.58% 18.58%  1057.98kB 18.58%  regexp/syntax.(*compiler).inst (inline)
  528.17kB  9.27% 27.85%  1556.17kB 27.32%  runtime/pprof.writeHeapInternal
  520.04kB  9.13% 36.98%   520.04kB  9.13%  k8s.io/utils/buffer.NewRingGrowing (inline)
     514kB  9.03% 46.01%     1028kB 18.05%  runtime/pprof.(*profileBuilder).emitLocation
     514kB  9.03% 55.03%      514kB  9.03%  runtime/pprof.(*profileBuilder).stringIndex (inline)
  512.56kB  9.00% 64.03%   512.56kB  9.00%  github.com/prometheus/client_golang/prometheus.(*metricMap).getOrCreateMetricWithLabelValues
  512.19kB  8.99% 73.03%   512.19kB  8.99%  golang.org/x/net/http2.(*ClientConn).RoundTrip
  512.14kB  8.99% 82.02%   512.14kB  8.99%  k8s.io/api/apps/v1.init
  512.07kB  8.99% 91.01%   512.07kB  8.99%  github.com/aws/aws-sdk-go/aws/endpoints.init
  512.05kB  8.99%   100%   512.05kB  8.99%  regexp/syntax.(*parser).newRegexp (inline)
         0     0%   100%   512.05kB  8.99%  github.com/asaskevich/govalidator.init
         0     0%   100%   512.56kB  9.00%  github.com/prometheus-operator/prometheus-operator/pkg/client/informers/externalversions/monitoring/v1.NewFilteredThanosRulerInformer.func1
         0     0%   100%   512.56kB  9.00%  github.com/prometheus-operator/prometheus-operator/pkg/client/versioned/typed/monitoring/v1.(*thanosRulers).List
         0     0%   100%   512.56kB  9.00%  github.com/prometheus-operator/prometheus-operator/pkg/k8sutil.(*clientGoHTTPMetricAdapter).Observe
         0     0%   100%   512.19kB  8.99%  github.com/prometheus-operator/prometheus-operator/pkg/operator.(*instrumentedListerWatcher).Watch
         0     0%   100%   520.04kB  9.13%  github.com/prometheus-operator/prometheus-operator/pkg/prometheus/server.(*Operator).Run
         0     0%   100%   520.04kB  9.13%  github.com/prometheus-operator/prometheus-operator/pkg/prometheus/server.(*Operator).addHandlers
         0     0%   100%   512.56kB  9.00%  github.com/prometheus/client_golang/prometheus.(*MetricVec).GetMetricWithLabelValues
         0     0%   100%   512.56kB  9.00%  github.com/prometheus/client_golang/prometheus.(*SummaryVec).GetMetricWithLabelValues

The current operator

File: operator
Type: inuse_space
Time: Mar 22, 2023 at 9:45am (AEST)
Showing nodes accounting for 302.26MB, 97.87% of 308.83MB total
Dropped 67 nodes (cum <= 1.54MB)
      flat  flat%   sum%        cum   cum%
  293.73MB 95.11% 95.11%   294.75MB 95.44%  io.ReadAll
    6.02MB  1.95% 97.06%     6.02MB  1.95%  golang.org/x/net/http2.glob..func3
    2.50MB  0.81% 97.87%     2.50MB  0.81%  runtime.allocm
         0     0% 97.87%     6.52MB  2.11%  golang.org/x/net/http2.(*ClientConn).readLoop
         0     0% 97.87%     6.02MB  1.95%  golang.org/x/net/http2.(*clientConnReadLoop).processData
         0     0% 97.87%     6.52MB  2.11%  golang.org/x/net/http2.(*clientConnReadLoop).run
         0     0% 97.87%     6.02MB  1.95%  golang.org/x/net/http2.(*dataBuffer).Write
         0     0% 97.87%     6.02MB  1.95%  golang.org/x/net/http2.(*dataBuffer).lastChunkOrAlloc
         0     0% 97.87%     6.02MB  1.95%  golang.org/x/net/http2.(*pipe).Write
         0     0% 97.87%     6.02MB  1.95%  golang.org/x/net/http2.getDataBufferChunk
         0     0% 97.87%   294.75MB 95.44%  k8s.io/client-go/informers/core/v1.NewFilteredSecretInformer.func1
         0     0% 97.87%   294.75MB 95.44%  k8s.io/client-go/kubernetes/typed/core/v1.(*secrets).List
         0     0% 97.87%   294.75MB 95.44%  k8s.io/client-go/rest.(*Request).Do
         0     0% 97.87%   294.75MB 95.44%  k8s.io/client-go/rest.(*Request).Do.func1
         0     0% 97.87%   294.75MB 95.44%  k8s.io/client-go/rest.(*Request).request
         0     0% 97.87%   294.75MB 95.44%  k8s.io/client-go/rest.(*Request).request.func3
         0     0% 97.87%   294.75MB 95.44%  k8s.io/client-go/rest.(*Request).request.func3.1 (inline)
         0     0% 97.87%   294.75MB 95.44%  k8s.io/client-go/rest.(*Request).transformResponse
         0     0% 97.87%   294.75MB 95.44%  k8s.io/client-go/tools/cache.(*ListWatch).List
         0     0% 97.87%   294.75MB 95.44%  k8s.io/client-go/tools/cache.(*Reflector).list.func1
         0     0% 97.87%   294.75MB 95.44%  k8s.io/client-go/tools/cache.(*Reflector).list.func1.2
         0     0% 97.87%   294.75MB 95.44%  k8s.io/client-go/tools/pager.(*ListPager).List
         0     0% 97.87%   294.75MB 95.44%  k8s.io/client-go/tools/pager.SimplePageFunc.func1
         0     0% 97.87%     3.55MB  1.15%  runtime.doInit
         0     0% 97.87%     3.55MB  1.15%  runtime.main
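
For contrast, the 293.73MB attributed to io.ReadAll in the profile above is the typed Secret informer reading the full LIST response and caching complete Secret objects (the k8s.io/client-go/informers/core/v1.NewFilteredSecretInformer frames). A rough sketch of that conventional, full-object setup (again illustrative, not the operator's exact code):

package main

import (
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}

	kubeClient, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// A typed shared informer caches complete v1.Secret objects: the LIST
	// response body is read in full (the io.ReadAll frame) and every
	// secret's Data stays resident in the informer cache.
	factory := informers.NewSharedInformerFactory(kubeClient, 5*time.Minute)
	factory.Core().V1().Secrets().Informer()

	stopCh := make(chan struct{})
	defer close(stopCh)
	factory.Start(stopCh)
	factory.WaitForCacheSync(stopCh)
}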

This patch uses the metadatainformer package to watch secrets and
configmaps so that their full content need not be loaded into memory,
thus reducing the amount of memory consumed.

Fixes: prometheus-operator#5410

Signed-off-by: Sunil Thaha <sthaha@redhat.com>
@sthaha sthaha force-pushed the fix-5410-metadata-informers branch from 6c31947 to 4ea54d8 on March 22, 2023 06:35
@sthaha sthaha changed the title from "fix: new metadata informer" to "fix: improve memory usage by using metadata informer" on Mar 22, 2023
@sthaha sthaha marked this pull request as ready for review March 22, 2023 07:58
@sthaha sthaha requested a review from a team as a code owner March 22, 2023 07:58

@simonpasquier simonpasquier left a comment

great!

@simonpasquier simonpasquier merged commit a89e8ad into prometheus-operator:main Mar 22, 2023
@simonpasquier simonpasquier deleted the fix-5410-metadata-informers branch March 22, 2023 17:05
simonpasquier added a commit to simonpasquier/prometheus-operator that referenced this pull request Oct 9, 2023
The Prometheus agent controller didn't use the metadata informer for
secrets and configmaps as implemented in prometheus-operator#5424 and prometheus-operator#5448 for the other
controllers.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>