1.66.0 - Storage & CPU information no longer present #556

Closed
robertgates55 opened this issue Sep 29, 2020 · 4 comments

@robertgates55

Just upgraded from 1.59 to 1.66 using the helm chart. I'm using an external prometheus and previously have had no problems. I'm running on AWS EKS.

I've now noticed, however, that the only costs being synced seem to be 'Memory', causing a large underestimate of the cluster cost! Could anyone point me to how to get Storage & Compute costs back?

[image: screenshot of the cost overview showing only Memory costs]

@AjayTripathy (Contributor)

Hi @robertgates55, thanks for this report.

Regarding the missing Compute graph:

Looks like the kubernetes_node label may be missing from your node-exporter metrics. We should still be collecting costs, just not displaying them correctly on that page. We have a staging build with a fix to display the total cost even when these breakdowns don't exist. Instructions on getting that build are available here: https://github.com/kubecost/docs/blob/master/staging.md

However, you also want to be picking up these metrics so that the breakdown is displayed. Can you also run `node_cpu_seconds_total` and send us a few rows of the results so we can see if this is the cause? If it is, you'll want to add our relabeling rules here:
https://github.com/kubecost/cost-analyzer-helm-chart/blob/develop/cost-analyzer/charts/prometheus/values.yaml#L1257
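
For context, a minimal sketch of the kind of relabeling those rules perform, assuming a standard endpoints-based node-exporter scrape job (the canonical rules live in the linked values.yaml and may differ):

```yaml
# Hypothetical sketch only; see the linked values.yaml for the actual rules.
scrape_configs:
  - job_name: kubernetes-service-endpoints
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      # Copy the node name from the endpoint's service-discovery metadata
      # into the kubernetes_node label that Kubecost's queries expect.
      - source_labels: [__meta_kubernetes_endpoint_node_name]
        target_label: kubernetes_node
```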

Regarding the missing Storage graph:
Can you run

```
sum_over_time(sum(container_fs_usage_bytes{device!="tmpfs", id="/"}) by (instance, cluster_id)[24h:1m]) / 1024 / 1024 / 1024
avg_over_time(sum(container_fs_limit_bytes{device!="tmpfs", id="/"}) by (instance, cluster_id)[24h:1m])
```

and see if these are working for you?

@robertgates55 (Author) commented Sep 30, 2020

Top result from each query:

node_cpu_seconds_total:

```
node_cpu_seconds_total{cpu="0",endpoint="metrics",instance="10.209.114.50:9100",job="node-exporter",mode="idle",namespace="monitoring",pod="prometheus-operator-prometheus-node-exporter-p926c",service="prometheus-operator-prometheus-node-exporter"}    365414.98
```

sum_over_time(sum(container_fs_usage_bytes{device!="tmpfs", id="/"}) by (instance, cluster_id)[24h:1m]) / 1024 / 1024 / 1024:

```
{instance="10.209.178.235:10250"}    27483.565273284912
```

avg_over_time(sum(container_fs_limit_bytes{device!="tmpfs", id="/"}) by (instance, cluster_id)[24h:1m]):

```
{instance="10.209.88.181:10250"}    107361579008
```

@robertgates55 (Author)

As the complexity of the relabelings etc. looks to be increasing (I can imagine maintaining backwards compatibility is a continual battle), I was looking at letting Kubecost deploy its own Prometheus.

However, I already have a prometheus-operator install in each of our clusters, and would like to reuse the node-exporter and kube-state-metrics deployments if possible, as per the docs:

"For the best experience, we generally recommend teams use the bundled prometheus-server & grafana but reuse their existing kube-state-metrics and node-exporter deployments if they already exist. This setup allows for the easiest installation process, easiest on-going maintenance, minimal duplication of metrics, and more flexible metric retention."

Could you advise what additional configuration is required to achieve this? I presume `prometheus.kubeStateMetrics.enabled: false` and `prometheus.nodeExporter.enabled: false`, but how do I then make the Kubecost Prometheus scrape the necessary metrics from the existing deployments?
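
A hypothetical values.yaml sketch of one way this could be wired up, assuming the bundled prometheus subchart exposes an extraScrapeConfigs passthrough and using the namespace/service names visible in the query results above; option names may differ by chart version:

```yaml
# Hypothetical sketch; option names assume the bundled prometheus subchart
# and may differ by chart version.
prometheus:
  kubeStateMetrics:
    enabled: false
  nodeExporter:
    enabled: false
  serviceAccounts:
    nodeExporter:
      create: false
  extraScrapeConfigs: |
    - job_name: existing-node-exporter
      kubernetes_sd_configs:
        - role: endpoints
      relabel_configs:
        # Keep only the node-exporter endpoints from the existing
        # prometheus-operator install (namespace/service names assumed).
        - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name]
          regex: monitoring;prometheus-operator-prometheus-node-exporter
          action: keep
        # Attach the node name label that Kubecost's queries expect.
        - source_labels: [__meta_kubernetes_endpoint_node_name]
          target_label: kubernetes_node
```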

@robertgates55 (Author)

Right. I've moved to using a Kubecost-deployed Prometheus instead, as it feels like aligning everything would be too tricky to maintain. I also found the docs section on using my own node-exporter/kube-state-metrics.

All good now - thanks!

"Using an existing node exporter
You can use an existing node exporter DaemonSet by setting the prometheus.nodeExporter.enabled and prometheus.serviceAccounts.nodeExporter.create Kubecost helm chart config options to false. More configs options shown here. Note: this requires your existing node exporter to be configured to export metrics on the default port."
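
A minimal usage sketch of applying those two options, assuming a Helm release named kubecost installed from the kubecost/cost-analyzer chart (adjust names to your install):

```sh
# Release, repo, and namespace names are assumptions, not from the docs.
helm upgrade kubecost kubecost/cost-analyzer \
  --namespace kubecost \
  --set prometheus.nodeExporter.enabled=false \
  --set prometheus.serviceAccounts.nodeExporter.create=false
```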
