kube-state-metrics - write: broken pipe #2261
Comments
/triage accepted
@logicalhan @dgrisonnet any estimate of when you are planning to fix this?
I don't have any bandwidth to investigate right now. @CatherineF-dev do you perhaps have some time to take a look at this?
Ok
This is my deployment:
We're also using VictoriaMetrics and see the same issue. Scraping component is "VMAgent".
We always see this when the cluster is heavily loaded with a bazillion Pods running and other churn. We use k-s-m metrics to identify nodes in a JOIN, and the metrics always drop out when the churn rises. So it must be related to "amount of stuff happening", for lack of a more cogent description. See drop-outs here: Correlated error logs:
Facing the same issue, current KSM version: v2.8.2. Logs:
E0219 03:57:11.745859 1 metrics_handler.go:213] "Failed to write metrics" err="failed to write help text: write tcp 1xx.xxx.63.7:8080->1xx.168.34.xx:55444: write: broken pipe"
E0219 03:57:11.745869 1 metrics_handler.go:213] "Failed to write metrics" err="failed to write help text: write tcp 1xx.xxx.63.7:8080->1xx.168.34.xx:55444: write: broken pipe"
E0219 03:57:11.745878 1 metrics_handler.go:213] "Failed to write metrics" err="failed to write help text: write tcp 1xx.xxx.63.7:8080->1xx.168.34.xx:55444: write: broken pipe"
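(In these messages the first address, 1xx.xxx.63.7:8080, is the kube-state-metrics pod's own metrics port and the second is the scraper's ephemeral port, so the write appears to be failing on the connection back to whatever is scraping /metrics.)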
KSM version: v2.10.0. I have been observing the same when there are a lot of pods in the cluster (over 5K). It works properly when the number of pods is lower (under 500).
qq: could anyone help provide detailed steps to reproduce this issue? Thx!
cc @bengoldenberg09, could you paste the deployment yaml again? The code formatting above got messed up. cc @naweeng, do you remember how to reproduce?
This error is thrown from https://github.com/kubernetes/kube-state-metrics/blob/v2.8.2/pkg/metricshandler/metrics_handler.go#L210-L215:
// writer here is the http.ResponseWriter
for _, w := range m.metricsWriters {
	err := w.WriteAll(writer)
	if err != nil {
		klog.ErrorS(err, "Failed to write metrics")
	}
}
The error itself is returned by the underlying io.Writer's Write() call.
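For what it's worth, here is a minimal, self-contained sketch of how a handler ends up logging errors like this when the scraper hangs up mid-response. This assumes stock net/http behaviour and is not kube-state-metrics' actual code; the metric chunk below is a made-up stand-in for the real metrics writers.

package main

import (
	"log"
	"net/http"
	"strings"
)

func metricsHandler(w http.ResponseWriter, r *http.Request) {
	// Stand-in for the output of one metrics writer (hypothetical data).
	chunk := []byte(strings.Repeat("kube_pod_info{pod=\"example\"} 1\n", 1024))
	for i := 0; i < 10000; i++ {
		select {
		case <-r.Context().Done():
			// The request context is cancelled when the client goes away
			// (scrape timeout, agent restart, ...), so we can stop early.
			log.Printf("client disconnected before metrics were fully written: %v", r.Context().Err())
			return
		default:
		}
		if _, err := w.Write(chunk); err != nil {
			// On a connection the peer has already closed, this Write is
			// where errors such as "write: broken pipe" surface.
			log.Printf("failed to write metrics: %v", err)
			return
		}
	}
}

func main() {
	http.HandleFunc("/metrics", metricsHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}

In other words, a broken pipe here normally means the client side closed the connection first; kube-state-metrics was still producing output when the scraper gave up.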
I feel this might be related to the Go version. Could you try v2.9.0+, which uses Go 1.19? cc @naweeng
Building with go 1.19, still the same error @CatherineF-dev |
Can you check the scrape_duration_seconds for the job that scrapes it? If there's a broken pipe, the TCP/HTTP connection might get terminated early.
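For example, a query like max_over_time(scrape_duration_seconds{job="kube-state-metrics"}[1h]) (the job label is a guess, adjust it to your setup), compared against the scraper's configured scrape timeout, should show whether scrapes are being cut off before the full payload is written.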
@bxy4543, could you help reproduce with detailed steps?
I'm seeing the same issue on our dev cluster (which is fairly large).
scrape_duration_seconds{cluster="cluster-name",container="kube-state-metrics",endpoint="http",instance="xxx:8080",job="kube-state-metrics",namespace="vm",pod="victoria-metrics-k8s-stack-kube-state-metrics-847ddd64-splbf",prometheus="vm/victoria-metrics-k8s-stack",service="victoria-metrics-k8s-stack-kube-state-metrics"} 1.602
I found the cause of the problem:
So I expanded the vmagent parameter.
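The exact parameter isn't named above, but with vmagent the usual knobs for this symptom are the per-job scrape_timeout in the Prometheus-compatible scrape config and the -promscrape.maxScrapeSize command-line flag, which caps how large a /metrics response vmagent will read; with several thousand pods the kube-state-metrics payload can easily outgrow the defaults. (Both of these are guesses on my part, not necessarily the parameter that was changed here.)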
Thx!
qq: where did you find this error log? Is it from prometheus? |
I am using VictoriaMetrics; I found it on the vmagent targets page.
@jgagnon44 @decipher27 @naweeng @towolf could you try the above solution? If there are no issues, I will close this.
/close |
@CatherineF-dev: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
What's the equivalent of
What happened:
After noticing some "anomalies" in some metrics, I did a little Googling and checked the kube-state-metrics pod log. I found numerous messages like the following:
What you expected to happen:
Not really sure, but I am sure these error messages are not good.
How to reproduce it (as minimally and precisely as possible):
I am not sure how to reproduce this, as I don't know what causes it to occur. I found it in the pod logs, then decided to restart the pod. Unfortunately this blew away the old logs and it has yet to appear in the new pod logs.
Anything else we need to know?:
Environment:
I used the kube-prometheus-stack Helm chart (https://github.com/prometheus-community/helm-charts/) to install Prometheus, Grafana, etc. on our Kubernetes cluster. The Helm chart version is 52.1.0. After inspecting the Helm chart dependencies, I believe the kube-state-metrics subchart is version 5.14.0.
I did a quick inspection of the pod spec and found the following:
Kubernetes version (use kubectl version): Our K8s cluster is an on-premises installation. The cluster nodes are running Ubuntu 20.04.