[Kubernetes] Enable state_cronjob toggle breaks data collection #10845
@graphaelli @mlunadia sorry to bother you, but I do not really know which label I should apply to this in order to get someone working on it.
I ran a first round of testing locally with kind 1.20.15 and kube-state-metrics:v2.4.2. As you can see, support for the batch/v1 CronJob API was added in KSM v2.4.0. Some more details from the k8s cluster:

```
❯ kubectl api-resources -o wide | grep -i cronjob
cronjobs   cj   batch/v1beta1   true   CronJob   create,delete,deletecollection,get,list,patch,update,watch   all

❯ kubectl version
Server Version: v1.20.15
```

This shows that batch/v1beta1 is the API serving CronJobs on this cluster, NOT the batch/v1 we expect in our implementation. In the KSM logs we can also see:

```
W0827 14:49:00.631664 1 reflector.go:324] pkg/mod/k8s.io/client-go@v0.23.3/tools/cache/reflector.go:167: failed to list *v1.CronJob: the server could not find the requested resource
E0827 14:49:00.631716 1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.23.3/tools/cache/reflector.go:167: Failed to watch *v1.CronJob: failed to list *v1.CronJob: the server could not find the requested resource
W0827 14:49:03.377860 1 reflector.go:324] pkg/mod/k8s.io/client-go@v0.23.3/tools/cache/reflector.go:167: failed to list *v1.PodDisruptionBudget: the server could not find the requested resource
E0827 14:49:03.377887 1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.23.3/tools/cache/reflector.go:167: Failed to watch *v1.PodDisruptionBudget: failed to list *v1.PodDisruptionBudget: the server could not find the requested resource
```
Now to our ES setup with the k8s integration installed. You can see that we receive metrics, but note that state_cronjob never appears among the kubernetes metricsets, and the same reflector errors show up in the Agent logs:

```
{"log.level":"info","@timestamp":"2024-08-27T14:50:36.569Z","message":"Non-zero metrics in the last 30s","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"kubernetes/metrics-default","type":"kubernetes/metrics"},"log":{"source":"kubernetes/metrics-default"},"log.logger":"monitoring","log.origin":{"file.line":187,"file.name":"log/log.go","function":"github.com/elastic/beats/v7/libbeat/monitoring/report/log.(*reporter).logSnapshot"},"service.name":"metricbeat","monitoring":{"ecs.version":"1.6.0","metrics":{"beat":{"cgroup":{"memory":{"mem":{"usage":{"bytes":2813865984}}}},"cpu":{"system":{"ticks":4740,"time":{"ms":40}},"total":{"ticks":11710,"time":{"ms":120},"value":11710},"user":{"ticks":6970,"time":{"ms":80}}},"handles":{"limit":{"hard":1048576,"soft":1048576},"open":19},"info":{"ephemeral_id":"b94ac108-249f-4c69-9708-f7393f4aa056","uptime":{"ms":3630055},"version":"8.14.3"},"memstats":{"gc_next":80328416,"memory_alloc":39507344,"memory_total":1538689880,"rss":208138240},"runtime":{"goroutines":252}},"filebeat":{"harvester":{"open_files":0,"running":0}},"libbeat":{"config":{"module":{"running":14}},"output":{"events":{"acked":161,"active":0,"batches":3,"total":161},"read":{"bytes":4639,"errors":3},"write":{"bytes":40944,"latency":{"histogram":{"count":332,"max":92,"mean":21.656626506024097,"median":20,"min":12,"p75":24,"p95":30.349999999999984,"p99":38.669999999999995,"p999":92,"stddev":5.892320768697601}}}},"pipeline":{"clients":14,"events":{"active":37,"published":150,"total":150},"queue":{"acked":161}}},"metricbeat":{"kubernetes":{"state_container":{"events":42,"success":42},"state_daemonset":{"events":9,"success":9},"state_deployment":{"events":9,"success":9},"state_job":{"events":9,"success":9},"state_namespace":{"events":15,"success":15},"state_node":{"events":3,"success":3},"state_pod":{"events":42,"success":42},"state_replicaset":{"events":9,"success":9},"state_service":{"events":9,"success":9},"state_storageclass":{"events":3,"success":3}}},"registrar":{"states":{"current":0}},"system":{"load":{"1":1.64,"15":0.7,"5":0.93,"norm":{"1":0.328,"15":0.14,"5":0.186}}}}},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-08-27T14:54:02.373Z","message":"W0827 14:54:02.373061 33100 reflector.go:324] k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167: failed to list *v1.CronJob: the server could not find the requested resource","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"kubernetes/metrics-default","type":"kubernetes/metrics"},"log":{"source":"kubernetes/metrics-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-08-27T14:54:02.373Z","message":"E0827 14:54:02.373092 33100 reflector.go:138] k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167: Failed to watch *v1.CronJob: failed to list *v1.CronJob: the server could not find the requested resource","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"kubernetes/metrics-default","type":"kubernetes/metrics"},"log":{"source":"kubernetes/metrics-default"},"ecs.version":"1.6.0"}
```

Then ingestion stops! @jlind23 did you see the same behaviour?
@gizas yes, this is exactly the behaviour we observed!
While working on debugging a case, we found out that enabling state_cronjob put the Elastic Agent leader into some sort of frozen state, and no error/warn logs were generated (the sketch after the version details below illustrates a likely mechanism).
Kubernetes version: 1.20.11
Kube state metrics: 2.4.2
Stack version: 8.14.3
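This frozen behaviour matches how client-go informers act when the watched resource is not served: the reflector retries the failing list forever, the cache never syncs, and anything blocked on the sync simply hangs without a terminal error. Below is a minimal reproduction sketch, assuming client-go and a default kubeconfig (illustrative only, not the actual Agent code), that hangs on a cluster serving CronJobs only under batch/v1beta1:

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	factory := informers.NewSharedInformerFactory(client, 10*time.Minute)
	// Batch().V1() targets batch/v1; on Kubernetes v1.20 CronJob is only
	// served under batch/v1beta1, so every list/watch attempt fails.
	informer := factory.Batch().V1().CronJobs().Informer()

	stop := make(chan struct{})
	factory.Start(stop)

	// The reflector logs the "failed to list *v1.CronJob" warnings seen
	// above and retries forever; this call never returns because the stop
	// channel stays open and the cache can never sync.
	if !cache.WaitForCacheSync(stop, informer.HasSynced) {
		fmt.Println("cache never synced")
	}
}
```

No error is surfaced to the caller here, which would explain a leader Agent that appears frozen without emitting any error/warn logs of its own.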