[k8scluster] add k8s.container.status_waiting_reason metric #32457

Open
ElfoLiNk opened this issue Apr 16, 2024 · 2 comments

ElfoLiNk commented Apr 16, 2024

Component(s)

receiver/k8scluster

Is your feature request related to a problem? Please describe.

I would like to get container state metrics that expose the waiting reason. One use case is detecting whether a container is in CrashLoopBackOff.

Example from a pod in this state:

kubectl get pod X -o yaml

...
apiVersion: v1
kind: Pod
...
status:
  conditions:
  containerStatuses:
  - containerID: containerd://e7d1583c9d91178c1f649d5d5a4d38f10decbd4a2d921976909e9d6ab5f3ac23
    image: docker.io/otel/opentelemetry-collector-contrib:0.97.0
    imageID: docker.io/otel/opentelemetry-collector-contrib@sha256:42a27d048c35720cf590243223543671e9d9f1ad8537d5a35c4b748fc8ebe873
    lastState:
      terminated:
        containerID: containerd://e7d1583c9d91178c1f649d5d5a4d38f10decbd4a2d921976909e9d6ab5f3ac23
        exitCode: 2
        finishedAt: "2024-04-16T17:30:04Z"
        reason: Error
        startedAt: "2024-04-16T17:29:35Z"
    name: opentelemetry-collector
    ready: false
    restartCount: 11
    started: false
    state:
      waiting:
        message: back-off 5m0s restarting failed container=opentelemetry-collector
          pod=opentelemetry-obs-col-2_obs(58012348-343b-4895-a39e-27e49f014ae8)
        reason: CrashLoopBackOff
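
To make the use case concrete, here is a minimal Go sketch of reading the waiting reason from a container status. The struct types are hypothetical, trimmed-down mirrors of the fields in the YAML above (a real receiver would presumably use the `k8s.io/api/core/v1` types), and `waitingReason` is an illustrative helper name, not existing receiver code.

```go
package main

import "fmt"

// Hypothetical, trimmed-down mirrors of the container status fields shown in
// the YAML above. They exist only to keep this sketch self-contained.
type ContainerStateWaiting struct {
	Reason  string
	Message string
}

type ContainerState struct {
	Waiting *ContainerStateWaiting // nil unless the container is waiting
}

type ContainerStatus struct {
	Name         string
	RestartCount int32
	State        ContainerState
}

// waitingReason returns the waiting reason and true when the container is
// currently in the waiting state, or "" and false otherwise.
func waitingReason(cs ContainerStatus) (string, bool) {
	if cs.State.Waiting == nil {
		return "", false
	}
	return cs.State.Waiting.Reason, true
}

func main() {
	// Values taken from the pod status in this issue.
	cs := ContainerStatus{
		Name:         "opentelemetry-collector",
		RestartCount: 11,
		State: ContainerState{
			Waiting: &ContainerStateWaiting{Reason: "CrashLoopBackOff"},
		},
	}
	if reason, ok := waitingReason(cs); ok {
		fmt.Printf("%s is waiting: %s\n", cs.Name, reason)
	}
}
```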

kube-state-metrics models this as the following Prometheus metric:

kube_pod_container_status_waiting_reason
container=<container-name>
pod=<pod-name>
namespace=<pod-namespace>
reason=<container-waiting-reason>
uid=<pod-uid>

Ref: https://github.com/kubernetes/kube-state-metrics/blob/main/docs/metrics/workload/pod-metrics.md

It would be great to have a similar metric in the k8scluster receiver.

Describe the solution you'd like

  k8s.container.status_waiting_reason:
    enabled: false
    description: Describes the reason the container is currently in waiting state.
    unit: ""
    attributes:
      - reason
    gauge:
      value_type: int
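
As a rough sketch of how such a gauge could be populated, the Go snippet below emits one data point with value 1 and a `reason` attribute for each waiting container, and nothing for containers that are not waiting. This emission rule is an assumption for discussion, not an agreed design, and `containerInfo`, `dataPoint`, and `waitingReasonPoints` are hypothetical names rather than collector APIs. The container name is kept outside the attribute map because the proposal lists only `reason` as a metric attribute.

```go
package main

import "fmt"

// containerInfo is a hypothetical input type: the container name plus its
// waiting reason ("" when the container is not waiting).
type containerInfo struct {
	Name          string
	WaitingReason string
}

// dataPoint is a hypothetical stand-in for an int gauge data point.
type dataPoint struct {
	Metric     string
	Container  string // identifies the container; not a metric attribute here
	Value      int64
	Attributes map[string]string
}

const metricName = "k8s.container.status_waiting_reason"

// waitingReasonPoints returns one point with value 1 per waiting container.
// Containers that are not waiting produce no point (an assumed design choice).
func waitingReasonPoints(containers []containerInfo) []dataPoint {
	var points []dataPoint
	for _, c := range containers {
		if c.WaitingReason == "" {
			continue
		}
		points = append(points, dataPoint{
			Metric:     metricName,
			Container:  c.Name,
			Value:      1,
			Attributes: map[string]string{"reason": c.WaitingReason},
		})
	}
	return points
}

func main() {
	points := waitingReasonPoints([]containerInfo{
		{Name: "opentelemetry-collector", WaitingReason: "CrashLoopBackOff"},
		{Name: "healthy-sidecar"}, // not waiting, so no point is emitted
	})
	for _, p := range points {
		fmt.Printf("%s container=%s reason=%s value=%d\n",
			p.Metric, p.Container, p.Attributes["reason"], p.Value)
	}
}
```

Whether non-waiting containers should instead report 0 for each known reason is one of the points that would need agreement before implementing.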

Related kube-state-metrics code: https://github.com/kubernetes/kube-state-metrics/blob/main/internal/store/pod.go#L554-L578

Describe alternatives you've considered

No response

Additional context

No response

ElfoLiNk added the labels enhancement (New feature or request) and needs triage (New item requiring triage) on Apr 16, 2024

povilasv commented Apr 24, 2024

FYI, I've opened a PR on semconv for the last terminated reason (open-telemetry/semantic-conventions#922), and it looks like some refactoring is needed on my PR. So this time, let's first agree on whether we want this, and then make a PR to semconv.
