1.66.0 - Storage & CPU information no longer present #556

Closed
robertgates55 opened this issue Sep 29, 2020 · 4 comments

@robertgates55

Just upgraded from 1.59 to 1.66 using the helm chart. I'm using an external prometheus and previously have had no problems. I'm running on AWS EKS.

I've now noticed, however, that the only costs being synced seem to be 'Memory', causing a large underestimate of the cluster cost! Could anyone point me to how to get Storage & Compute costs back?

[image: screenshot of the cost overview showing only Memory costs]

@AjayTripathy (Contributor)

Hi @robertgates55, thanks for this report.

Regarding the missing Compute graph:

Looks like the kubernetes_node label may be missing from your node-exporter metrics. We should still be collecting costs, just not displaying them correctly on that page. We have a staging build with a fix to display the total cost even when these breakdowns don't exist. Instructions on getting that build are available here: https://github.com/kubecost/docs/blob/master/staging.md

However, you also want to be picking up these metrics so that the breakdown is displayed. Can you also run `node_cpu_seconds_total` and send us a few rows of the results so we can see if this is the cause? If it is, you'll want to add our relabeling rules here:
https://github.com/kubecost/cost-analyzer-helm-chart/blob/develop/cost-analyzer/charts/prometheus/values.yaml#L1257
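
For context, a minimal sketch of the kind of relabeling those rules perform, assuming a standard endpoints-based node-exporter scrape job (the canonical rules live in the linked values.yaml and may differ):

```yaml
# Hypothetical sketch only; see the linked values.yaml for the actual rules.
scrape_configs:
  - job_name: kubernetes-service-endpoints
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      # Copy the node name from the endpoint's service-discovery metadata
      # into the kubernetes_node label that Kubecost's queries expect.
      - source_labels: [__meta_kubernetes_endpoint_node_name]
        target_label: kubernetes_node
```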

Regarding the missing Storage graph:
Can you run

```
sum_over_time(sum(container_fs_usage_bytes{device!="tmpfs", id="/"}) by (instance, cluster_id)[24h:1m]) / 1024 / 1024 / 1024
avg_over_time(sum(container_fs_limit_bytes{device!="tmpfs", id="/"}) by (instance, cluster_id)[24h:1m])
```

and see if these are working for you?

@robertgates55 (Author) commented Sep 30, 2020

Top result from each query:

node_cpu_seconds_total:

```
node_cpu_seconds_total{cpu="0",endpoint="metrics",instance="10.209.114.50:9100",job="node-exporter",mode="idle",namespace="monitoring",pod="prometheus-operator-prometheus-node-exporter-p926c",service="prometheus-operator-prometheus-node-exporter"}    365414.98
```

sum_over_time(sum(container_fs_usage_bytes{device!="tmpfs", id="/"}) by (instance, cluster_id)[24h:1m]) / 1024 / 1024 / 1024:

```
{instance="10.209.178.235:10250"}    27483.565273284912
```

avg_over_time(sum(container_fs_limit_bytes{device!="tmpfs", id="/"}) by (instance, cluster_id)[24h:1m]):

```
{instance="10.209.88.181:10250"}    107361579008
```

@robertgates55 (Author)

As the complexity of the relabelings etc. looks to be increasing (I can imagine maintaining backwards compatibility is a continual battle), I was looking at letting Kubecost deploy its own Prometheus.

However, I already have a prometheus-operator install in each of our clusters, and would like to reuse the node-exporter and kube-state-metrics deployments if possible, as per the docs:

"For the best experience, we generally recommend teams use the bundled prometheus-server & grafana but reuse their existing kube-state-metrics and node-exporter deployments if they already exist. This setup allows for the easiest installation process, easiest on-going maintenance, minimal duplication of metrics, and more flexible metric retention."

Could you advise what additional configuration is required to achieve this? I presume `prometheus.kubeStateMetrics.enabled: false` and `prometheus.nodeExporter.enabled: false`, but how do I then make the Kubecost Prometheus scrape the necessary metrics from the existing deployments?
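
A hypothetical values.yaml sketch of one way this could be wired up, assuming the bundled prometheus subchart exposes an extraScrapeConfigs passthrough and using the namespace/service names visible in the query results above; option names may differ by chart version:

```yaml
# Hypothetical sketch; option names assume the bundled prometheus subchart
# and may differ by chart version.
prometheus:
  kubeStateMetrics:
    enabled: false
  nodeExporter:
    enabled: false
  serviceAccounts:
    nodeExporter:
      create: false
  extraScrapeConfigs: |
    - job_name: existing-node-exporter
      kubernetes_sd_configs:
        - role: endpoints
      relabel_configs:
        # Keep only the node-exporter endpoints from the existing
        # prometheus-operator install (namespace/service names assumed).
        - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name]
          regex: monitoring;prometheus-operator-prometheus-node-exporter
          action: keep
        # Attach the node name label that Kubecost's queries expect.
        - source_labels: [__meta_kubernetes_endpoint_node_name]
          target_label: kubernetes_node
```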

@robertgates55 (Author)

Right. I've moved to using a Kubecost-deployed Prometheus instead, as it feels like aligning everything would be too tricky to maintain. I also found the docs section on using my own node-exporter/kube-state-metrics.

All good now - thanks!

"Using an existing node exporter
You can use an existing node exporter DaemonSet by setting the prometheus.nodeExporter.enabled and prometheus.serviceAccounts.nodeExporter.create Kubecost helm chart config options to false. More configs options shown here. Note: this requires your existing node exporter to be configured to export metrics on the default port."
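
A minimal usage sketch of applying those two options, assuming a Helm release named kubecost installed from the kubecost/cost-analyzer chart (adjust names to your install):

```sh
# Release, repo, and namespace names are assumptions, not from the docs.
helm upgrade kubecost kubecost/cost-analyzer \
  --namespace kubecost \
  --set prometheus.nodeExporter.enabled=false \
  --set prometheus.serviceAccounts.nodeExporter.create=false
```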
