Fix prometheus operator metrics to work out of the box #4617

Merged
merged 1 commit into from Jan 4, 2022

Conversation

nevalla
Contributor

@nevalla nevalla commented Dec 28, 2021

This PR brings back the group_left joins removed by #3653. The joins are not working because the prometheus-community/kube-prometheus-stack Helm chart has an invalid kube-state-metrics ServiceMonitor custom resource, so the kube_pod_info metric is not available (prometheus-community/helm-charts#1631).

Metrics in Lens IDE should work without any additional relabeling, which is why this PR brings the group_left joins back.
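
For illustration, the two query shapes at issue can be sketched as below. This is only a sketch: `cpuUsageQuery` and its parameters are hypothetical names, not part of the Lens codebase. The query strings mirror the `cpuUsage` case changed in this PR.

```typescript
// Hypothetical helper (not from the Lens codebase): builds either variant
// of the node CPU usage query discussed in this PR.
function cpuUsageQuery(nodes: string, rateAccuracy: string, useJoin: boolean): string {
  if (useJoin) {
    // Variant with the join: the node label is taken from kube_pod_info,
    // matched on the pod and namespace labels of node_cpu_seconds_total.
    const rate = `rate(node_cpu_seconds_total{mode=~"user|system"}[${rateAccuracy}])`;

    return `sum(${rate} * on (pod,namespace) group_left(node) kube_pod_info{node=~"${nodes}"})`;
  }

  // Variant without the join: assumes node_cpu_seconds_total already
  // carries a node label (for example through relabeling).
  return `sum(rate(node_cpu_seconds_total{node=~"${nodes}", mode=~"user|system"}[${rateAccuracy}]))`;
}

// Example usage with made-up values.
const joinedQuery = cpuUsageQuery("worker-1", "1m", true);
const directQuery = cpuUsageQuery("worker-1", "1m", false);
console.log(joinedQuery);
console.log(directQuery);
```

The join variant depends on kube_pod_info being scraped. The direct variant depends on a node label being present on the node exporter series.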

Fixes: #4435

Signed-off-by: Lauri Nevala <lauri.nevala@gmail.com>
@nevalla nevalla added the bug (Something isn't working) and area/metrics (All the things related to metrics) labels Dec 28, 2021
@nevalla nevalla requested a review from a team as a code owner December 28, 2021 20:37
@nevalla nevalla requested review from jweak, jansav and Nokel81 and removed request for a team December 28, 2021 20:37
@aleksfront aleksfront added this to the 5.3.4 milestone Dec 29, 2021
@nevalla nevalla merged commit 02056a6 into master Jan 4, 2022
@nevalla nevalla deleted the re-make-prometheus-operator-work-ootb branch January 4, 2022 13:43
jim-docker pushed a commit that referenced this pull request Jan 19, 2022
Signed-off-by: Lauri Nevala <lauri.nevala@gmail.com>
@jim-docker jim-docker mentioned this pull request Jan 19, 2022
jim-docker pushed a commit that referenced this pull request Jan 19, 2022
Signed-off-by: Lauri Nevala <lauri.nevala@gmail.com>
Signed-off-by: Jim Ehrismann <jehrismann@mirantis.com>
jim-docker added a commit that referenced this pull request Jan 20, 2022
* Use electron.clipboard for all clipboard uses (#4535)

Signed-off-by: Jim Ehrismann <jehrismann@mirantis.com>

* Fix ERR_UNSAFE_PORT from LensProxy (#4558)

* Fix ERR_UNSAFE_PORT from LensProxy

- Use the current list of ports from chromium as it is much easier to
  just reject using one of those instead of trying to handle the
  ERR_UNSAFE_PORT load error from a BrowserWindow.on("did-fail-load")

Signed-off-by: Sebastian Malton <sebastian@malton.name>

* Move all port handling into LensProxy

Signed-off-by: Sebastian Malton <sebastian@malton.name>

* don't use so many exceptions

Signed-off-by: Sebastian Malton <sebastian@malton.name>
Signed-off-by: Jim Ehrismann <jehrismann@mirantis.com>

* Fix not being able to clear set cluster icon (#4555)

Signed-off-by: Jim Ehrismann <jehrismann@mirantis.com>

* Fix extension engine range not working for some ^ ranges (#4554)

Signed-off-by: Jim Ehrismann <jehrismann@mirantis.com>

* Fix crash on NetworkPolicy when matchLabels is missing (#4500)

Signed-off-by: Jim Ehrismann <jehrismann@mirantis.com>

* Replace all uses of promiseExec with promiseExecFile (#4514)

Signed-off-by: Jim Ehrismann <jehrismann@mirantis.com>

* Less noisy metrics-not-available error logging (#4602)

Signed-off-by: Jari Kolehmainen <jari.kolehmainen@gmail.com>
Signed-off-by: Jim Ehrismann <jehrismann@mirantis.com>

* Fix close button overflow in Preferences (#4611)

* Adding basic colors to tailwind theme

Signed-off-by: Alex Andreev <alex.andreev.email@gmail.com>

* Using tailwind inline to style close button

Signed-off-by: Alex Andreev <alex.andreev.email@gmail.com>

* Make Select look similar to inputs

Signed-off-by: Alex Andreev <alex.andreev.email@gmail.com>

* Moving styles into separate module

Signed-off-by: Alex Andreev <alex.andreev.email@gmail.com>

* Convert tailwind commands to css

Signed-off-by: Alex Andreev <alex.andreev.email@gmail.com>
Signed-off-by: Jim Ehrismann <jehrismann@mirantis.com>

* Fix prometheus operator metrics work out of the box (#4617)

Signed-off-by: Lauri Nevala <lauri.nevala@gmail.com>
Signed-off-by: Jim Ehrismann <jehrismann@mirantis.com>

* Fix CRD.getPreferedVersion() to work based on apiVersion (#4553)

* Fix CRD.getPreferedVersion() to work based on apiVersion

Signed-off-by: Sebastian Malton <sebastian@malton.name>

* Add tests

Signed-off-by: Sebastian Malton <sebastian@malton.name>
Signed-off-by: Jim Ehrismann <jehrismann@mirantis.com>

* Fix crash for KubeObjectStore.loadAll() (#4675)

Signed-off-by: Sebastian Malton <sebastian@malton.name>
Signed-off-by: Jim Ehrismann <jehrismann@mirantis.com>

* Convert CloseButton styles out from css modules (#4723)

* Convert CloseButton styles out from css modules

Signed-off-by: Alex Andreev <alex.andreev.email@gmail.com>

* Fix close button styling

Signed-off-by: Alex Andreev <alex.andreev.email@gmail.com>

* release v5.3.4

Signed-off-by: Jim Ehrismann <jehrismann@mirantis.com>

Co-authored-by: Sebastian Malton <sebastian@malton.name>
Co-authored-by: Jari Kolehmainen <jari.kolehmainen@gmail.com>
Co-authored-by: Alex Andreev <alex.andreev.email@gmail.com>
Co-authored-by: Lauri Nevala <lauri.nevala@gmail.com>
@@ -50,7 +50,7 @@ export class PrometheusOperator extends PrometheusProvider {
case "memoryAllocatableCapacity":
return `sum(kube_node_status_allocatable{node=~"${opts.nodes}", resource="memory"})`;
case "cpuUsage":
- return `sum(rate(node_cpu_seconds_total{node=~"${opts.nodes}", mode=~"user|system"}[${this.rateAccuracy}]))`;
+ return `sum(rate(node_cpu_seconds_total{mode=~"user|system"}[${this.rateAccuracy}])* on (pod,namespace) group_left(node) kube_pod_info{node=~"${opts.nodes}"})`;
Contributor

The queries are now wrong again 😉
It should be reversed: join on node and take the specific labels.

on (node) group_left(pod, namespace)

Or does the query work for you?

Collaborator

It does not work for us. We received many reports that the queries you provided were failing, which is why we reverted them.

Contributor

I agree that they need to be reintroduced to fix compatibility, but they need to be applied correctly to work.

Contributor

Ahh, so node_cpu_seconds_total should contain the pod and namespace labels. Maybe a comment about that would be useful. The node exporter in my setup doesn't have those labels. That also belongs in the documentation.
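
To make the label requirement concrete, here is a simplified TypeScript model of how `on (pod,namespace) group_left(node)` pairs samples. It is a sketch under assumptions, not Prometheus code: `groupLeftMultiply`, the `Sample` type, and all label values are hypothetical. It assumes at most one right-hand sample per match key and ignores details such as metric name handling.

```typescript
type Labels = Record<string, string>;

interface Sample {
  labels: Labels;
  value: number;
}

// Simplified model of `left * on (onLabels) group_left(extra) right`.
// A missing label is treated like an empty string, as in PromQL matching.
function groupLeftMultiply(left: Sample[], right: Sample[], onLabels: string[], extra: string[]): Sample[] {
  const key = (labels: Labels): string => onLabels.map((name) => labels[name] ?? "").join("\u0000");
  const index = new Map<string, Sample>();

  for (const r of right) {
    index.set(key(r.labels), r); // assumption: one right-hand sample per key
  }

  const out: Sample[] = [];

  for (const l of left) {
    const match = index.get(key(l.labels));

    if (!match) {
      continue; // no partner on the right-hand side: the sample is dropped
    }

    const labels: Labels = { ...l.labels };

    for (const name of extra) {
      if (match.labels[name] !== undefined) {
        labels[name] = match.labels[name]; // copy the label named in group_left(...)
      }
    }

    out.push({ labels, value: l.value * match.value });
  }

  return out;
}

// Made-up data: one in-cluster node exporter sample with pod and namespace
// labels, and one external exporter sample that only has a node label.
const cpuSamples: Sample[] = [
  { labels: { pod: "node-exporter-abc", namespace: "monitoring", mode: "user" }, value: 0.5 },
  { labels: { node: "db-host-1", mode: "user" }, value: 0.7 },
];
const podInfo: Sample[] = [
  { labels: { pod: "node-exporter-abc", namespace: "monitoring", node: "worker-1" }, value: 1 },
];

const joinedSamples = groupLeftMultiply(cpuSamples, podInfo, ["pod", "namespace"], ["node"]);
console.log(joinedSamples);
```

In this model only the in-cluster sample survives the join and picks up `node="worker-1"`. The external exporter sample has no pod or namespace label to match on, so it is dropped, which is consistent with the join not working for a node exporter that lacks those labels.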

Collaborator

Ah, thanks for the info. Would you mind opening an issue to add that to the documentation?

Contributor Author

@marcbachmann how did you install prometheus operator?

Contributor

I have an external node exporter running natively on the hosts.
Therefore it only has the node label. I'm doing this because I still have some other services, like databases, outside Kubernetes.

This might be an edge case, and I could migrate that into the cluster.

Labels
area/metrics (All the things related to metrics)
bug (Something isn't working)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

No metrics with 5.3.0-latest.2021125.2
5 participants