How to debug a missing external metric? #605

Open
pablokbs opened this issue Aug 30, 2023 · 3 comments
Assignees
Labels
kind/support Categorizes issue or PR as a support question. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@pablokbs

Hello, I've been trying to expose a new (external) metric for a few days now, and I can't figure out why it is missing.

This is my current externalRules section in my configmap:

externalRules:
    - metricsQuery: label_replace(label_replace(node_nf_conntrack_entries, "internal_ip",
        "$1", "instance", "([^:]+)(:[0-9]+)?"),"node", "ip-$1-$2-$3-$4.ec2.internal",
        "internal_ip", "(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})") * on (node)
        group_left(label_node_kubernetes_io_node_type) kube_node_labels{label_node_kubernetes_io_node_type="nginx"}
      name:
        as: node_nf_conntrack_entries_nginx
        matches: ""
      resources:
        template: <<.Resource>>
      seriesQuery: '{__name__=~"node_nf_conntrack_entries"}'
    - metricsQuery: label_replace(aws_networkelb_tcp_client_reset_count_sum, "load_balancer_name",
        "${2}_${3}", "load_balancer", "(.*)/(.*)/(.*)")
      name:
        as: aws_networkelb_tcp_client_reset_count_underscore_sum
        matches: ""
      resources:
        template: <<.Resource>>
      seriesQuery: '{__name__=~"aws_networkelb_tcp_client_reset_count_sum"}'
    - metricsQuery: '${1}'
      seriesQuery: '{__name__=~"confluent_kafka_server_consumer_lag_offsets"}'
      resources:
        overrides:
          pod:
            resource: pod
      name:
        matches: ""
        as: "kafka_consumer_lag"

In this example I have 3 rules, but only 2 of the metrics show up when I query the list of external metrics:

➜  ~ curl -k http://127.0.0.1:8001/apis/external.metrics.k8s.io/v1beta1/ | jq
 % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                Dload  Upload   Total   Spent    Left  Speed
100   536  100   536    0     0    811      0 --:--:-- --:--:-- --:--:--   822
{
 "kind": "APIResourceList",
 "apiVersion": "v1",
 "groupVersion": "external.metrics.k8s.io/v1beta1",
 "resources": [
   {
     "name": "node_nf_conntrack_entries_nginx",
     "singularName": "",
     "namespaced": true,
     "kind": "ExternalMetricValueList",
     "verbs": [
       "get"
     ]
   },
   {
     "name": "aws_networkelb_tcp_client_reset_count_underscore_sum",
     "singularName": "",
     "namespaced": true,
     "kind": "ExternalMetricValueList",
     "verbs": [
       "get"
     ]
   }
 ]
}
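The comparison between the configured rules and the listed metrics can also be done programmatically. The following is a minimal sketch, not part of prometheus-adapter: the helper name `missing_external_metrics` is hypothetical, and the JSON body is a trimmed copy of the response above, keeping only the `name` fields.

```python
import json
from typing import List


def missing_external_metrics(api_resource_list: str, expected: List[str]) -> List[str]:
    """Return the expected metric names that are absent from an APIResourceList body."""
    resources = json.loads(api_resource_list).get("resources", [])
    listed = {resource["name"] for resource in resources}
    return [name for name in expected if name not in listed]


# Trimmed copy of the response from /apis/external.metrics.k8s.io/v1beta1/
body = """
{
  "kind": "APIResourceList",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": [
    {"name": "node_nf_conntrack_entries_nginx"},
    {"name": "aws_networkelb_tcp_client_reset_count_underscore_sum"}
  ]
}
"""

# The `name.as` values from the three externalRules entries
expected = [
    "node_nf_conntrack_entries_nginx",
    "aws_networkelb_tcp_client_reset_count_underscore_sum",
    "kafka_consumer_lag",
]

print(missing_external_metrics(body, expected))  # prints ['kafka_consumer_lag']
```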

I enabled debug logging in prometheus-adapter, and I can see it querying Prometheus for the metric and getting a 200:

I0830 14:38:08.269232       1 api.go:74] GET http://prometheus-operated.monitoring.svc:9090/api/v1/series?match%5B%5D=%7B__name__%3D~%22confluent_kafka_server_consumer_lag_offsets%22%7D&start=1693406228.268 200 OK

Besides that, I don't see anything else in the logs (for example, when grepping for "confluent") that explains why the metric is not showing up in the external metrics list.

How can I debug this?

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Aug 30, 2023
@dgrisonnet
Member

/kind support
/assign

@k8s-ci-robot k8s-ci-robot added the kind/support Categorizes issue or PR as a support question. label Sep 7, 2023
@dgrisonnet
Member

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 7, 2023
@houms-sony

@pablokbs make sure to set your logLevel to 10 (-v=10). We were having the same issue and were able to determine that empty data was being returned, which is why no metric was listed in the external metrics endpoint.
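A `200 OK` on the series request only means that Prometheus answered. The `data` array in the body can still be empty. As a rough way to check this outside the adapter, the sketch below rebuilds the series request seen in the adapter log and inspects a response body. The helper names `build_series_url` and `series_is_empty` are hypothetical, and the sample bodies are illustrative values that follow the Prometheus `/api/v1/series` response shape, not output captured from this cluster.

```python
import json
from urllib.parse import urlencode


def build_series_url(base_url, matcher, start=None):
    """Build a Prometheus /api/v1/series URL for a single match[] selector."""
    params = [("match[]", matcher)]
    if start is not None:
        params.append(("start", start))
    return f"{base_url}/api/v1/series?{urlencode(params)}"


def series_is_empty(response_body):
    """Return True when the request succeeded but matched no series."""
    payload = json.loads(response_body)
    return payload.get("status") == "success" and not payload.get("data")


# Values taken from the adapter log line in the original report
url = build_series_url(
    "http://prometheus-operated.monitoring.svc:9090",
    '{__name__=~"confluent_kafka_server_consumer_lag_offsets"}',
    start="1693406228.268",
)
print(url)

# Illustrative response bodies (assumed, not captured from the cluster)
empty_body = '{"status": "success", "data": []}'
populated_body = (
    '{"status": "success", "data": '
    '[{"__name__": "confluent_kafka_server_consumer_lag_offsets"}]}'
)
print(series_is_empty(empty_body))
print(series_is_empty(populated_body))
```

With these inputs the printed URL matches the request in the adapter log, so fetching it (for example with curl from a pod that can reach Prometheus) and passing the body to `series_is_empty` shows whether the selector matched any series in the queried window.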
