make memory bloat debuggable #1703
It looks like syncing has a Prometheus metric (see gatekeeper/pkg/controller/sync/stats_reporter.go, lines 20 to 33 at ad30ce0).
Though that wouldn't help if it was a large object (say, a config map) that was causing the memory usage. Is it only the webhook pod and not the audit pod? If so, I wonder if capping the number of webhook goroutines would help. @willbeason may also be aware of efficiency improvements coming down the pipe. |
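As a rough way to eyeball those sync metrics (a sketch, not an official recipe: the 8888 metrics port and grepping for "sync" are assumptions based on common Gatekeeper defaults, so check stats_reporter.go and your deployment flags):
kubectl -n gatekeeper-system port-forward deploy/gatekeeper-controller-manager 8888:8888 &
curl -s http://localhost:8888/metrics | grep -i sync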
Yep, we've got efficiency improvements in the works which should reduce memory usage by 50-60% (or more, depending on use case). We're doing a lot of performance measuring, so we'll have to think about which metrics are helpful (that is, the ones we know correlate with memory/CPU usage and not just noise). We're also planning on adding ways of measuring memory/CPU usage for ConstraintTemplates in the Gator CLI. For now (before the efficiency improvements):
So if you add a lot of ConstraintTemplates at once (~100) you will experience high memory usage. Each object evaluated requires allocating and deallocating at least 70 MB memory per 1,000 Constraints for simple ConstraintTemplates. Complex ConstraintTemplates use significantly more. "Constraint Template complexity" is a very rough term, can only be experimentally measured, and has variable impact depending on use. A rough approximation is "longer ConstraintTemplates use more memory to execute queries". These are intended as rough ways of reasoning about performance, and are not completely generalizable. As with all performance advice, characteristics are dependent on use case. |
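As a rough back-of-envelope using the figure above (an estimate, not a measurement): a cluster with 3,000 simple Constraints would allocate and free roughly 3 × 70 MB ≈ 210 MB per object evaluated, so ten requests served concurrently could churn through about 2 GB of transient allocations, which is why capping webhook concurrency (discussed below) has such a direct effect on peak memory.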
memory bloat seems to only affect the webhook; audit is happy at ~2 GB |
I hadn't thought of that - that will indeed limit memory throughput since at most that number of requests will be served at a time. |
same issue with …
... anyone got more ideas, or is this a feature that's missing? |
Try using the --max-serving-threads flag (see gatekeeper/pkg/webhook/policy.go, line 61 at fbb5d2b).
|
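For reference, a minimal sketch of wiring that flag into the webhook Deployment; the container name and the value 8 are placeholders to tune against your CPU/memory budget:
spec:
  containers:
  - name: manager               # assumed container name for Gatekeeper's controller
    args:
    - --max-serving-threads=8   # caps concurrent webhook goroutines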
Thx, that sounds good
|
@grosser Memory consumption for audit should be greatly reduced with Gatekeeper v3.8.x - our benchmarks saw memory usage reduced by 10x or more. Has this improved your situation at all? The main memory improvements we see aren't debuggable with the Gator CLI, so I'm removing it from the Gator milestone. |
we were not able to update to 3.8.x yet since it causes lots of oomkills, not a priority atm, will report back when we finally do 🤞 |
to clarify for v3.8.x a little: audit seems fine but we ran into memory issues with webhook in high volume clusters so had to revert the upgrade. we'll likely have a better idea once #2060 is addressed |
For webhook memory usage, can you set GOMAXPROCS to the number of CPUs in your pod, per #1907? Also, for high volume usage, setting …
The above won't fix lock contention, but it should prevent OOMing |
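A minimal sketch of that suggestion on the webhook Deployment; the CPU limit and container name are placeholders, and the point is that GOMAXPROCS tracks the pod's CPU limit rather than the node's core count:
- name: manager
  resources:
    limits:
      cpu: "4"
      memory: 4Gi
  env:
  - name: GOMAXPROCS
    value: "4"   # keep in step with the CPU limit above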
We currently have … We did try setting … |
I think ours is just processing stuff that doesn't even have a constraint related to it. I tinkered with the admission webhook, changing it from * to a more explicit list, and memory went down by about 500 MB per pod. But why is this even processing things that have no rules? I wrote a quick-n-dirty admission webhook and it's sitting at 100m CPU and ~125 MB memory receiving everything (but doing nothing). So GK must be receiving everything but also running it through all constraints. Here are some examples of things I don't see anybody ever having a rule for, which are at the top:
kubectl -n gatekeeper-system logs gatekeeper-controller-manager-8c874cbc4-bnl9j |grep received |tr ' ' '\n' |grep Kind |sort |uniq -c |sort -nr
598 Kind=Lease",
322 Kind=Event",
|
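For illustration, a hedged sketch of narrowing the rules from a wildcard to an explicit list in the ValidatingWebhookConfiguration; the webhook name is the usual Gatekeeper one and the resource list is made up, so it has to mirror whatever your constraints actually match:
webhooks:
- name: validation.gatekeeper.sh
  rules:
  - apiGroups: ["", "apps"]
    apiVersions: ["*"]
    operations: ["CREATE", "UPDATE"]
    resources: ["pods", "deployments"]   # explicit list instead of "*", so Leases, Events, etc. are never sent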
FYI, for our setup I added a validation that always makes sure the admission webhook is in sync with the resources the constraints need, so there is no overhead or unenforced constraints.
|
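Not the author's actual implementation, but a rough sketch of such a check with standard tooling; it assumes Gatekeeper's constraints category is registered and that the webhook configuration has its usual name:
# kinds the constraints actually match on
kubectl get constraints -o json | jq -r '.items[].spec.match.kinds[]?.kinds[]?' | sort -u
# resources the webhook currently intercepts
kubectl get validatingwebhookconfiguration gatekeeper-validating-webhook-configuration -o jsonpath='{.webhooks[*].rules[*].resources}'
# anything intercepted above that no constraint matches (mind Kind vs. plural resource names) is pure overhead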
Thanks for sharing this data @tehlers320!
Was this with constraints? If so, how many constraints and constraint templates?
@grosser Where did you add this validation? |
In our custom code that generates the webhook
|
I'm not sure if we are super complex. |
you can see how many constraints (policies) you have with something like this (see the sketch after this comment): … Another thing to evaluate is whether all constraints have match kinds. If they do, Gatekeeper should only evaluate requests that match those kinds; if no match kind is provided, then all requests could be hitting the constraint and constraint template for evaluation. E.g., this constraint ensures that only requests containing a Pod will be evaluated against this constraint and constraint template:
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sAllowedRepos
metadata:
name: prod-repo-is-openpolicyagent
spec:
match:
kinds:
- apiGroups: [""]
kinds: ["Pod"] |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions. |
we run with a 10 GB limit and the webhook still gets OOMKilled a lot
it would be nice to have things like "synced objects", "requests in queue", or whatever else could bloat the memory be queryable, so we can alert and debug when they grow too high
version: afc9fe2
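Until metrics like that exist, a hedged interim option is to alert on the container's working-set memory from cAdvisor, e.g. a Prometheus expression along the lines of the one below (the label values and the 8 GiB threshold are placeholders for a typical kube-prometheus setup; the container name is an assumption):
container_memory_working_set_bytes{namespace="gatekeeper-system", container="manager"} > 8 * 1024 * 1024 * 1024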