Reduce memory footprint for GCP admission component #143
/assign @ialidzhikov |
Thank you @timuthy for opening this issue. The avg memory usage now seems to be stable at ~80Mb, with spikes to 100Mb. These numbers are from a landscape with ~10000 Secrets and ~7000 SecretBindings. I can give a try for
and see whether it improves the memory usage. |
I had a look into this issue today and tried to outline future improvements.
My last change on this issue was to disable the cache on Secrets (#253 (comment)), and it had a positive effect, reducing the memory usage to |
I guess this won't hurt but in the end I don't think that it will save much memory, especially if we use protobuf, right?
Does it make sense to use profiling (gardener/gardener#4568) here? |
Hmm, why do you think so? The shoot resource can grow quite large and has to be stored somewhere in memory, which is independent of the transport protocol used. |
I checked the rough size of shoots on our biggest landscape and yes, it's a number, but I cannot deduce that this will save memory in the
Please note that I never claimed that filtering out the shoots can be disregarded. I was rather wondering whether it will save a considerable amount of the memory increase that we have experienced for quite a while. |
Got your point. Well, I guess we just have to try and see 😄 |
After more thoughts on how to tackle this issue, I came up with the following proposal: |
Thanks @ialidzhikov!
This should probably be done by the GCM since it's in no way seed-specific and can be handled centrally. |
I thought about it but faced the following problem. If it is a new controller in GCM that is watching Shoots and maintaining the cloud provider Secret label, then while GCM is down it will lose track of the Shoots that were deleted in the meantime (=> it fails to remove the label from the Secret). Of course, this can be handled by adding another finalizer to the Shoot metadata, but I was not quite sure whether we want to do this. I can think again and try to find something better... |
Good point, indeed. Yeah, the additional finalizer doesn't sound optimal. Still, I can imagine that having such logic in gardenlet might lead to other challenges w.r.t. the seed authorization and multiple gardenlets chasing the same secret, or? I haven't thought about it much, but from a first gut feeling, putting such logic in GCM would feel more natural (if we find good ways to overcome the mentioned problems, of course). Let's see whether there'll be ideas about it. |
+1 for the proposal from @ialidzhikov, I like it :)
We should be able to construct a controller that does not rely only on watch events (controllers generally should be able to compensate for restarts) even without using finalizers. |
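The point above, that a controller can compensate for missed delete events without finalizers, boils down to doing a full reconciliation pass over current state rather than reacting only to events. A toy stdlib-only sketch of that idea (the data structures are hypothetical stand-ins, not Gardener types):

```go
package main

import "fmt"

// reconcileLabels compares the Secrets currently labeled as "in use" against
// the Secrets actually referenced by existing Shoots, and returns the stale
// labels to remove. Because it works from observed state, it recovers labels
// that became stale while the controller was down, with no finalizer needed.
func reconcileLabels(labeled map[string]bool, referenced map[string]bool) []string {
	removed := []string{}
	for secret := range labeled {
		if !referenced[secret] {
			removed = append(removed, secret)
		}
	}
	return removed
}

func main() {
	labeled := map[string]bool{"gcp-account": true, "old-account": true}
	// "old-account"'s Shoot was deleted while the controller was down.
	referenced := map[string]bool{"gcp-account": true}
	fmt.Println(reconcileLabels(labeled, referenced)) // [old-account]
}
```

A real controller would run such a pass on startup (and periodically), listing both sides from the API server, in addition to handling watch events.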
When I take one step back, I think the issue boils down to the fact that there is no "free" way (by free I mean without additional requests/watches) to determine the provider type of a cloud provider Secret. Currently it is really nice that the provider type is part of the Shoot and CloudProfile specs. This allows extension admission controllers to pick only the requests that match their provider type. I think that the part that contributed to the memory increase in the admission controller is the part that determines the cloud provider Secret type and, in general, whether the Secret is in use by a Shoot of the given provider type (of course, we can try to enable profiling as suggested by @timuthy to confirm/reject this).

```yaml
apiVersion: core.gardener.cloud/v1beta1
kind: SecretBinding
metadata:
  name: my-provider-account
  namespace: garden-dev
provider:
  type: gcp
secretRef:
  name: my-provider-account
```

Pros:
Cons:
|
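The filtering logic described above, determining whether a Secret is in use by a Shoot of a given provider type via SecretBindings, can be sketched with plain maps. The struct below is a pared-down, hypothetical stand-in for the Gardener SecretBinding resource, just to illustrate the selection logic:

```go
package main

import "fmt"

// secretBinding models only the two fields relevant here: the provider type
// (per the proposal above) and the referenced Secret's name.
type secretBinding struct {
	providerType string
	secretName   string
}

// relevantSecrets returns the names of Secrets referenced by a SecretBinding
// of the given provider type. This is the idea behind caching/watching only
// provider-relevant Secrets instead of every Secret in the Garden cluster.
func relevantSecrets(bindings []secretBinding, providerType string) map[string]bool {
	out := map[string]bool{}
	for _, b := range bindings {
		if b.providerType == providerType {
			out[b.secretName] = true
		}
	}
	return out
}

func main() {
	bindings := []secretBinding{
		{providerType: "gcp", secretName: "gcp-account"},
		{providerType: "aws", secretName: "aws-account"},
	}
	gcp := relevantSecrets(bindings, "gcp")
	fmt.Println(gcp["gcp-account"]) // true
	fmt.Println(gcp["aws-account"]) // false
}
```

With the provider type available on the SecretBinding itself, this lookup needs no extra requests against Shoots or CloudProfiles.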
I updated my previous comment #143 (comment) with an implementation proposal on how to introduce a |
I like the proposal @ialidzhikov!
Could we use a list of providers instead of only one?
What are valid values?
How would the label be named? |
My current assumption is that, in general, it is not good practice to reuse a single Secret. At least I would vote for separation of concerns, and in a personal setup I wouldn't put such sensitive data into a single Secret. Let me know if there are cases where we recommend this approach. I agree with you in general that it is kind of an incompatible change. On the other side, we can use the opportunity to impose the good practice if we really have only exceptional cases of misuse. So, I'd better check how many folks depend on this reuse of a single cloud provider Secret for multiple provider types.
Good point. Initially I had in mind to check whether the provider type is a registered one (similar to what we do for Shoot and Seed). But I see that we allow, for example, a CloudProfile to be created with a non-registered provider type. We'd better follow the CloudProfile approach and not introduce a validation for this.
I had in mind |
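Once cloud provider Secrets carry a provider-type label as discussed above, a webhook could restrict itself to matching Secrets via an objectSelector, so the API server never even sends unrelated Secrets its way. A sketch of a ValidatingWebhookConfiguration excerpt; the webhook name and the label key are assumptions for illustration, not necessarily the names that were finally chosen:

```yaml
# Hypothetical excerpt: only Secrets carrying the provider label reach the
# GCP admission webhook, so it never has to cache or inspect other Secrets.
webhooks:
- name: validation.gcp.provider.extensions.gardener.cloud  # assumed name
  objectSelector:
    matchLabels:
      provider.shoot.gardener.cloud/gcp: "true"  # assumed label key
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    resources: ["secrets"]
    operations: ["CREATE", "UPDATE"]
```

objectSelector is evaluated by the API server before the request is sent, which is what makes this filtering "free" for the admission component.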
/component gardener |
With #396, on a not-that-busy landscape, the avg memory utilization of admission-gcp dropped from ~40MiB to ~21MiB. Let's see what the effect will be for busy landscapes where admission-gcp memory usage is > 200MiB. |
On a large landscape with the rollout of
We can say that the issue is fixed for admission-gcp. However, I will keep it open to track the progress for the other admission components that have to be adapted in a similar way. |
Very nice, well done @ialidzhikov |
The other admission components have been adapted in a similar way. Hence, we can close this issue. /close |
How to categorize this issue?
/area robustness
/area cost
/priority critical
/platform gcp
What would you like to be added:
Since the validation of cloud provider secrets was introduced (#112), the GCP admission plugin's memory footprint has increased by a considerable amount, mainly because of the added caches for Shoots, SecretBindings and Secrets. The required memory depends on the K8s cluster, i.e. on the number/size of the stored resources, but we've observed increases from ~20Mb to ~500Mb. Considering that such an admission component has multiple replicas and is only responsible for one provider, the runtime costs are too high.
Hence, we should try to:
Secrets, because we expect that Secrets are the most frequently occurring resource kind in the Garden cluster.
are the most occurring resource kinds in the Garden cluster.The text was updated successfully, but these errors were encountered: