
added: support for metrics configuration, periodic metrics cleanup and selective namespace whitelisting and blacklisting with respect to metrics registration #2288

Conversation

@yashvardhan-kukreja (Contributor) commented Aug 21, 2021

Signed-off-by: Yashvardhan Kukreja yash.kukreja.98@gmail.com

Related issue

closes #2268

Milestone of this PR

What type of PR is this

/kind feature

Proposed Changes

With this change, values.yaml also encapsulates the configuration for metrics exposure:

config:
  metricsConfig:
    namespaces: {
      "include": [], # list of namespaces to capture metrics for. Default: metrics are captured for all namespaces except those listed in "exclude".
      "exclude": []  # list of namespaces to NOT capture metrics for. Default: []
    }
    metricsRefreshInterval: 24h # interval at which metrics are reset to limit the memory footprint of Kyverno's metrics. Default: null (no refresh of metrics)
  # Or provide an existing metrics ConfigMap by uncommenting the line below.
  # Refer to ./templates/metricsconfigmap.yaml for the structure of the metrics ConfigMap.
  # existingMetricsConfig: sample-metrics-configmap
  • namespaces.include - the namespaces for which metrics will be collected. Default: metrics are collected for all namespaces.
  • namespaces.exclude - the namespaces for which metrics will NOT be collected. Default: no namespaces are excluded.
  • metricsRefreshInterval - the interval at which Kyverno's in-memory metrics are reset. This helps reduce the memory footprint associated with metrics collection and exposure in Kyverno. The cleanup does not lose any data, because the metrics are persistently stored on the end user's Prometheus server anyway. A sample override is sketched after this list.
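
For illustration, here is a minimal override file a user could pass to Helm to exclude a couple of namespaces and enable a 12-hour refresh. The file name and the excluded namespaces are hypothetical; only the structure mirrors the values.yaml block above.

# values-metrics.yaml (hypothetical file name)
# Collect metrics for every namespace except kube-system and kyverno,
# and reset the in-memory metrics every 12 hours.
config:
  metricsConfig:
    namespaces: {
      "include": [],
      "exclude": ["kube-system", "kyverno"]
    }
    metricsRefreshInterval: 12h

This could then be applied with something like helm install kyverno kyverno/kyverno -n kyverno -f values-metrics.yaml (the chart reference and namespace here are assumptions, not taken from this PR).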

Proof Manifests

Checklist

  • [ ] I have read the contributing guidelines.
  • [ ] I have added tests that prove my fix is effective or that my feature works.
  • [ ] My PR contains new or altered behavior to Kyverno and
    • [ ] I have added or changed the documentation myself in an existing PR and the link is:
    • [ ] I have raised an issue in kyverno/website to track the doc update and the link is:
    • [ ] I have read the PR documentation guide and followed the process including adding proof manifests to this PR.

Further Comments

@yashvardhan-kukreja force-pushed the issue-2268/selective-metric-exposure branch 3 times, most recently from 8bb7a37 to 8cf8f78 on August 22, 2021 11:04
@realshuting (Member)

Hi @yashvardhan-kukreja - have you tested it locally? Seems like the Kyverno Pod never enters running & ready state. Can you please verify locally?

@yashvardhan-kukreja (Contributor, Author) commented Aug 23, 2021

Yes, I tested both cases: with a definite metrics refresh interval and with no metrics refresh. It was running perfectly fine on my end.

I'll test it again. Meanwhile, do you mind sending the logs and the output of kubectl describe for the Kyverno pod?

Also, can you share the values.yaml file against which you are testing this PR?

@realshuting (Member)

Here's the log. I think it uses the direct manifests to install Kyverno, and the Pod was in an Error state:

Run echo ">>> Install Kyverno"
  echo ">>> Install Kyverno"
  sed 's/imagePullPolicy:.*$/imagePullPolicy: IfNotPresent/g' ${GITHUB_WORKSPACE}/definitions/install.yaml | kubectl apply -f -
  kubectl apply -f ${GITHUB_WORKSPACE}/definitions/github/rbac.yaml
  chmod a+x ${GITHUB_WORKSPACE}/scripts/verify-deployment.sh
  sleep 50
  echo ">>> Check kyverno"
  kubectl get pods -n kyverno
  ${GITHUB_WORKSPACE}/scripts/verify-deployment.sh -n kyverno  kyverno
  sleep 20
  
  echo ">>> Expose the Kyverno's service's metric server to the host"
  kubectl port-forward svc/kyverno-svc-metrics -n kyverno 8000:8000 &
  echo ">>> Run Kyverno e2e test"
  make test-e2e
  kubectl delete -f ${GITHUB_WORKSPACE}/definitions/install.yaml
  shell: /usr/bin/bash -e {0}
  env:
    GOROOT: /opt/hostedtoolcache/go/1.16.7/x64
    CT_CONFIG_DIR: /opt/hostedtoolcache/ct/v3.3.0/x86_64/etc
    VIRTUAL_ENV: /opt/hostedtoolcache/ct/v3.3.0/x86_64/venv
>>> Install Kyverno
namespace/kyverno created
customresourcedefinition.apiextensions.k8s.io/clusterpolicies.kyverno.io created
customresourcedefinition.apiextensions.k8s.io/clusterpolicyreports.wgpolicyk8s.io created
customresourcedefinition.apiextensions.k8s.io/clusterreportchangerequests.kyverno.io created
customresourcedefinition.apiextensions.k8s.io/generaterequests.kyverno.io created
customresourcedefinition.apiextensions.k8s.io/policies.kyverno.io created
customresourcedefinition.apiextensions.k8s.io/policyreports.wgpolicyk8s.io created
customresourcedefinition.apiextensions.k8s.io/reportchangerequests.kyverno.io created
serviceaccount/kyverno-service-account created
clusterrole.rbac.authorization.k8s.io/kyverno:admin-policies created
clusterrole.rbac.authorization.k8s.io/kyverno:admin-policyreport created
clusterrole.rbac.authorization.k8s.io/kyverno:admin-reportchangerequest created
clusterrole.rbac.authorization.k8s.io/kyverno:customresources created
clusterrole.rbac.authorization.k8s.io/kyverno:generatecontroller created
clusterrole.rbac.authorization.k8s.io/kyverno:leaderelection created
clusterrole.rbac.authorization.k8s.io/kyverno:policycontroller created
clusterrole.rbac.authorization.k8s.io/kyverno:userinfo created
clusterrole.rbac.authorization.k8s.io/kyverno:webhook created
clusterrolebinding.rbac.authorization.k8s.io/kyverno:customresources created
clusterrolebinding.rbac.authorization.k8s.io/kyverno:generatecontroller created
clusterrolebinding.rbac.authorization.k8s.io/kyverno:leaderelection created
clusterrolebinding.rbac.authorization.k8s.io/kyverno:policycontroller created
clusterrolebinding.rbac.authorization.k8s.io/kyverno:userinfo created
clusterrolebinding.rbac.authorization.k8s.io/kyverno:webhook created
configmap/init-config created
service/kyverno-svc created
service/kyverno-svc-metrics created
deployment.apps/kyverno created
clusterrole.rbac.authorization.k8s.io/kyverno:userinfo configured
clusterrole.rbac.authorization.k8s.io/kyverno:customresources configured
clusterrole.rbac.authorization.k8s.io/kyverno:policycontroller configured
clusterrole.rbac.authorization.k8s.io/kyverno:generatecontroller configured
>>> Check kyverno
NAME                       READY   STATUS   RESTARTS   AGE
kyverno-64d6687bc9-jbvdl   0/1     Error    2          43s
Waiting for deployment of kyverno in namespace kyverno with a timeout 60 seconds
Expected generation for deployment kyverno: 1
Observed expected generation: 1
Specified replicas: 1
current/updated/available replicas: 1/1/, waiting
current/updated/available replicas: 1/1/, waiting
current/updated/available replicas: 1/1/, waiting

@yashvardhan-kukreja (Contributor, Author) commented Aug 23, 2021

Hi @yashvardhan-kukreja - have you tested it locally? Seems like the Kyverno Pod never enters running & ready state. Can you please verify locally?

Shuting, I also looked into this. I installed Kyverno around 10-15 times, and the issue occurred only once: the Pod entered the "Running" state but showed 0/1 ready containers, meaning the Kyverno container was failing. Upon describing it, I found that the container had failed the liveness probe check, which is why Kyverno was registered as unhealthy.
However, the logs of that container looked perfectly fine, so the root cause was clear: the liveness probe checks ran before Kyverno's webhook server had started (a pretty rare event).

As a resolution, I increased the initialDelaySeconds of the Kyverno container from 10 seconds to 15 seconds, to ensure Kyverno gets enough time to start the webhook server before the liveness probe checks begin.
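
For context, this is roughly the shape of the change on the deployment's container spec. It is a minimal sketch only: the probe path, port, and the other probe fields below are placeholders, not the exact values from Kyverno's manifest; the relevant change is the bumped initialDelaySeconds.

livenessProbe:
  httpGet:
    path: /health/liveness   # placeholder health endpoint
    port: 9443               # placeholder port
    scheme: HTTPS
  initialDelaySeconds: 15    # raised from 10 so the webhook server can come up before the first check
  periodSeconds: 30
  failureThreshold: 2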

@yashvardhan-kukreja (Contributor, Author)

Another thing, Shuting: although I had programmed the default-case handling for the namespaces and metricsRefreshInterval fields under config.metricsConfig in values.yaml, the Helm install of Kyverno still crashed if the user explicitly removed or commented out the entire config.metricsConfig block. So, in the second-to-last commit, I programmed the default-case handling for that as well.
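
For illustration, this is the kind of guard a Helm template can use so a missing config.metricsConfig block falls back to sane defaults instead of failing the render. It is a minimal sketch of the technique, not the actual Kyverno chart template.

{{- /* Illustrative guard only: fall back to empty values when config.metricsConfig
      has been removed or commented out in values.yaml. */}}
{{- $config := .Values.config | default (dict) }}
{{- $metricsConfig := $config.metricsConfig | default (dict) }}
namespaces: {{ $metricsConfig.namespaces | default (dict "include" (list) "exclude" (list)) | toJson | quote }}
{{- with $metricsConfig.metricsRefreshInterval }}
metricsRefreshInterval: {{ . | quote }}
{{- end }}

Guarding each lookup with default means a removed block degrades to an empty value instead of a nil-pointer error during helm install.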

@yashvardhan-kukreja (Contributor, Author) commented Aug 23, 2021

Here's the log, I think it uses the direct manifest to install Kyverno, and Pod was in an Error state:

I think I've found the issue. I forgot to update the metrics-config-related manifests and kustomize definitions under definitions/k8s-resources/. Working on it!

@yashvardhan-kukreja (Contributor, Author)

All done @realshuting :)

@yashvardhan-kukreja (Contributor, Author) commented Sep 2, 2021

@realshuting once you're done reviewing this PR, please do not merge it yet. Please review and merge #2351 first. Once that gets merged into the main branch, I'll make a very tiny change to this PR and rebase it; then it should be in a mergeable state.

@realshuting self-assigned this Sep 2, 2021

@realshuting (Member) left a comment


/lgtm

@yashvardhan-kukreja - do we have issues tracking website updates? We need to document how to customize ConfigMap using Helm and direct manifests.
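
For reference, a hedged sketch of what such an existing metrics ConfigMap could look like when managed via direct manifests. The ConfigMap name is taken from the existingMetricsConfig example in the values.yaml above, and the data keys are assumptions mirroring that structure; see ./templates/metricsconfigmap.yaml in the chart for the real layout.

apiVersion: v1
kind: ConfigMap
metadata:
  name: sample-metrics-configmap   # name reused from the values.yaml example above
  namespace: kyverno
data:
  namespaces: '{"include": [], "exclude": ["kube-system"]}'   # assumed key layout
  metricsRefreshInterval: 24h                                 # assumed key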

@yashvardhan-kukreja (Contributor, Author)

/lgtm

@yashvardhan-kukreja - do we have issues tracking website updates? We need to document how to customize ConfigMap using Helm and direct manifests.

Yeah Shuting, I'll get onto documenting that.
Meanwhile, I just have to make one last commit to this PR, commenting out the following line in values.yaml:

metricsRefreshInterval: 24h

That way, by default, Kyverno's metrics exporter won't reset and clean up the metrics in its buffer every 24 hours.
If users want that behavior, they can still uncomment the line and feed it to Helm to make Kyverno perform the metrics refresh.
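
Roughly, the resulting default block in values.yaml would look like the sketch below (illustrative, not an exact copy of the chart):

config:
  metricsConfig:
    namespaces: {
      "include": [],
      "exclude": []
    }
    # Left commented out by default, so metrics are never reset unless the user opts in:
    # metricsRefreshInterval: 24h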

@yashvardhan-kukreja force-pushed the issue-2268/selective-metric-exposure branch 3 times, most recently from 657c424 to bfb149a on September 4, 2021 03:01
@yashvardhan-kukreja (Contributor, Author)

@realshuting do you mind running the e2e tests corresponding to this branch on your end? I tried running them on my end and they passed, so I am not able to figure out what the concern is here and why the e2e tests in GitHub Actions are reporting a failure.

@realshuting (Member)

@yashvardhan-kukreja - verified locally, all looked good!

Can you please rebase onto the main branch? And sorry for the late response.

…d selective namespace whitelisting and blacklisting for metrics

Signed-off-by: Yashvardhan Kukreja <yash.kukreja.98@gmail.com>
@realshuting (Member)

Thank you @yashvardhan-kukreja !

@realshuting merged commit 5fcd9b8 into kyverno:main on Sep 10, 2021
@realshuting (Member)

Hi @yashvardhan-kukreja - following up on the doc update, do we have an issue logged to track it?

Merging this pull request closes issue #2268: support for configurable automatic refresh of metrics and selective exposure of metrics at namespace-level.