Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resource Metrics for CRDs #1210

Closed
mrueg opened this issue Aug 18, 2020 · 30 comments
Closed

Resource Metrics for CRDs #1210

mrueg opened this issue Aug 18, 2020 · 30 comments
Labels
after-2.0 kind/feature Categorizes issue or PR as related to a new feature. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@mrueg
Copy link
Member

mrueg commented Aug 18, 2020

/kind feature

With the growing use of CRDs in K8s it might be useful to monitor those as well and provide some initial metrics for them.

Idea:
Have kube-state-metrics read a configuration file that allows the user to list further resources.

A list of crds to monitor:

- servicemonitors.monitoring.coreos.com
- certificaterequests.cert-manager.io

could create metrics similar to https://github.com/kubernetes/kube-state-metrics/blob/master/docs/configmap-metrics.md

kube_CRD_NAME_info
kube_CRD_NAME_created
kube_CRD_NAME_metadata_resource_version

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Aug 18, 2020
@mrueg mrueg changed the title Ressource Metrics for CRDs Resource Metrics for CRDs Aug 20, 2020
@tariq1890
Copy link
Contributor

Looks like this PR would need to be revived: #515

I agree that it would be a great addition to kube-state-metrics. I am also wondering if we should take out VPA metrics and have them be monitored as CRDs. VPAs aren't considered as k8s primitive APIs.

/help

@k8s-ci-robot
Copy link
Contributor

@tariq1890:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

Looks like this PR would need to be revived: #515

I agree that it would be a great addition to kube-state-metrics. I am also wondering if we should take out VPA metrics and have them be monitored as CRDs. VPAs aren't considered as k8s primitive APIs.

/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Aug 25, 2020
@dgrisonnet
Copy link
Member

Is anyone currently working on this? I would be interested in implementing this feature.

@lilic
Copy link
Member

lilic commented Oct 22, 2020

Hey @dgrisonnet there is no real plan on the roadmap for this. I would prefer native instrumentation for CRs, while I understand its not always possible, but have my doubts on CR instance metrics being added in kube-state-metrics.

In any case I would prefer to wait for release-2.0 to be out before we do this.

@lilic lilic added after-2.0 and removed help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. labels Oct 22, 2020
@dgrisonnet
Copy link
Member

I completely agree with you that CR should not be monitored by kube-state-metrics. However, in my opinion, Custom Resource Definition should be as it is a native Kubernetes resource.
Following what was done in #515 and @mrueg proposition, I would say that adding the following metrics seems useful:

kube_customresourcedefinition_info{group="", name=""}
kube_customresourcedefinition_created{group="", name=""}
kube_customresourcedefinition_metadata_resource_version{group="", name=""}

In any case I would prefer to wait for release-2.0 to be out before we do this.

Sure, no worries. I wasn't intending to put pressure on getting this feature in 2.0.

@lilic
Copy link
Member

lilic commented Oct 23, 2020

Sure, no worries. I wasn't intending to put pressure on getting this feature in 2.0.

No pressure at all! :)

Seems interesting. @mrueg @dgrisonnet what value does this bring to you, can you describe some use cases how you would use those metrics in queries, alerts or dashboards, etc.? Thanks! We like to get some use cases so we don't just add things for the sake of adding them but folks don't use it and its just extra series that are produced, hope that makes sense.

@mrueg
Copy link
Member Author

mrueg commented Oct 23, 2020

@lilic
As mentioned in the initial issue post, what immediately comes to my mind are prometheus-operator's servicemonitors.
Having those metrics would allow to alert on a bad configuration if the target does not appear in the prometheus instance it should be (which for example can happen easily if you set up prometheus{,-operator} to only select servicemonitors with specific labels and the servicemonitor is missing it).

Another use case are cert-manager's CRDs where one could use the crd metrics to verify that the actual secrets including the certificate and key get created.

@lilic
Copy link
Member

lilic commented Oct 23, 2020

As mentioned in the initial issue post, what immediately comes to my mind are prometheus-operator's servicemonitors.
Having those metrics would allow to alert on a bad configuration if the target does not appear in the prometheus instance it should be

Not sure how kube_customresourcedefinition_created{group="monitoring.coreos.com", name="servicemonitors"} 1603447067 , which just gives you when the custom resource definition was registered with the API, can give you that info? Can you explain a bit more, not saying its a bad idea just want a bit more details. You planning on combing that with another metrics for example? I think for this you would need CR instance metrics, no?

@brancz
Copy link
Member

brancz commented Oct 23, 2020

The prometheus operator already exposes metrics about it's CRs itself, which is how I believe it should be. Certmanager should do the same.

@mrueg
Copy link
Member Author

mrueg commented Oct 26, 2020

As mentioned in the initial issue post, what immediately comes to my mind are prometheus-operator's servicemonitors.
Having those metrics would allow to alert on a bad configuration if the target does not appear in the prometheus instance it should be

Not sure how kube_customresourcedefinition_created{group="monitoring.coreos.com", name="servicemonitors"} 1603447067 , which just gives you when the custom resource definition was registered with the API, can give you that info? Can you explain a bit more, not saying its a bad idea just want a bit more details. You planning on combing that with another metrics for example? I think for this you would need CR instance metrics, no?

Apologies, should have read more closely. yes I was thinking about instance level metrics.

The generic ones could be interesting to provide info when a CRD was updated (unless the tools using them already provide that info)

@lilic
Copy link
Member

lilic commented Oct 27, 2020

Apologies, should have read more closely. yes I was thinking about instance level metrics.

Not sure we would do instance level metrics at this points for custom resources, but if you come up with a mini proposal on the pros and cons we can have a look, google doc is fine or just more detailed description in a new issue as I would like to keep this one for CRD metrics. Thanks!

@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 25, 2021
@lilic
Copy link
Member

lilic commented Jan 25, 2021

/remove-lifecycle stale

I believe this is still valid. @mrueg do you want me to label this as help wanted or do you want to tackle the proposal part yourself? Thanks!

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 25, 2021
@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 25, 2021
@grzesuav
Copy link
Contributor

Was it implemented ? If not - upvoting ;)

@grzesuav
Copy link
Contributor

Maybe my use-case - I have dashboards showing information about resources related to one of the operators - ConfigMap, Secrets and its leaks information about CR's

@lilic
Copy link
Member

lilic commented Apr 26, 2021

@grzesuav this was not implemented, it's up for grabs if you are interested in tackling it?

@grzesuav
Copy link
Contributor

hi @lilic , I can try, I am not familiar with the project though, will try to look at it next weekend and ask more question about feasibility of implementation if that's ok

@lilic
Copy link
Member

lilic commented Apr 29, 2021

@grzesuav sounds great! Best would be to come up with just a small proposal here on the issue before you start doing a PR, might be good to sync with @mrueg who had an initial idea and wants this as well, so both requests are answered. Thanks again! 🎉

@grzesuav
Copy link
Contributor

just one thing @lilic , as I noticed kube-state-metrics uses typed API, is it ok to focus on CRD v1 introduced in k8s 1.16 and omit v1beta1 ?

@lilic
Copy link
Member

lilic commented Jun 3, 2021

Yes that sounds great! 👍

grzesuav added a commit to grzesuav/kube-state-metrics that referenced this issue Jun 29, 2021
@grzesuav
Copy link
Contributor

hi, I have question related to implementation, I have some easy draft here - #1517.

The challenge I currently have is that typed client for CRD is in k8s.io/apiextensions-apiserver v0.21.0 package, also its interface is bit different :

type func(kubeClient "k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset".Interface, ns string) "k8s.io/client-go/tools/cache".ListerWatcher)

than

func(kubeClient "k8s.io/client-go/kubernetes".Interface, ns string) "k8s.io/client-go/tools/cache".ListerWatcher

therefore I have some questions :

  1. is there nice way to abstract this ? The problem I see it that whole design here heavily uses typed builder, therefore unsure what would be the best way - create internal client interface and make both clienset interfaces compatible// wrapped by it (but it would require a bit of changes with typing in many places)
  2. another idea which I have is to use Unstructured API to get CRD's without using apiextensions dependency - but it will require more extractions and tooling in crd package (however leaving rest of the design untouched)

additional questions :

  1. for now I am adding metric for crd objects, I think it would be beneficial to have also particular cr's in, but would keep it as separate PR - to not make it too big
  2. making prom labels from kubernetes labels - from what I noticed there is certain allowlist to filter which labels should be included - how does it work ? I noticed my metrics have additional labels (on the Secrets) so it is configurable ? I mean how I can extend list of labels to be attached there (on cluster)

That is all what I have, please let me know whad do you think,

Cheers

@grzesuav
Copy link
Contributor

⬆️ @lilic @mrueg would appreciate some feedback on that ;)

@robbie-demuth
Copy link

It looks like @mrueg's feature request was for instance-level (i.e. custom resource) metrics, but the conversation (and in-flight PR) has shifted to focus on custom resource definition metrics. I'm not going to get into technical details, level of effort, etc, but I'd like to give a plus 1 for instance-level metrics. All custom resources have the same metadata (i.e. ObjectMeta). I think it'd be incredibly valuable to serve metrics about that metadata for free (or via some opt-in API)

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 5, 2021
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Copy link
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@RafaeLeal
Copy link

Hey, was this implemented? I'm not familiar with the codebase, but I can come up with a proposal if that's something the maintainers want to put forward. I'm interested in instance-level metadata metrics, especially kube_CRD_NAME_labels would be very useful for me.

My use case: I maintain a CICD cluster using TektonCD. Tekton controllers manage TaskRuns and PipelineRuns, which are CRDs to use as building blocks for a CI/CD system. Tekton already has some metrics at the PipelineRun and TaskRun levels, but I'd like to be able to add labels on them and aggregate them arbitrarily. To give a quick example, I could add a label on PipelineRuns generated by a repository and calculate the average deployment time, and things like that. Joining the existing metrics with kube_pipelineruns_labels would be enough for me.

@grzesuav
Copy link
Contributor

@RafaeLeal
Copy link

@grzesuav reading this I'm not sure if we can implement a labels metric with it..
I could have something like this:

kind: CustomResourceStateMetrics
spec:
  resources:
    - groupVersionKind:
        group: myteam.io
        kind: "Foo"
        version: "v1"
      metrics:
        - name: "labels"
          help: "Foo labels"
          each: ???
          labelsFromPath:
            "*": [metadata, labels]

But I don't know exactly what to put on each, because the value is a constant 1 in the kube_*_labels 🤔

Maybe we should extend the API to allow this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
after-2.0 kind/feature Categorizes issue or PR as related to a new feature. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.
Projects
None yet
Development

Successfully merging a pull request may close this issue.