[Bug] Some values produced by recording rules are inaccurate. #4860

Open
kminehart opened this Issue Nov 12, 2018 · 10 comments

kminehart commented Nov 12, 2018

Bug Report

What did you do?

One metric I've been collecting via recording rules has been inaccurate by up to 500 (in this case, seconds), which is a pretty big difference.

The rule is taking less than 10 ms to complete, so I don't think there's much of a "time drift" issue happening here.

This is the difference between the value recorded by the rule and the result of the query the rule uses. Since these are, to my understanding, essentially the same thing, the difference should be 0, or at least within ±1.

(sum(company:node_cpu_seconds_total) by(mode)) - (sum(node_cpu_seconds_total * on(instance) group_left(node) label_replace(max by(pod_ip, node) (kube_pod_info{pod=~"node-exporter.*"}), "instance", "$1:9100", "pod_ip", "(.*)") * on(node) group_left(label_cs_role) kube_node_labels) by(mode))

The second parenthesized block is the query that the rule contains, wrapped in sum(...) by(mode).
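One way to rule out evaluation-time skew when comparing a recorded series with its source expression is to evaluate both at the same explicit timestamp through the Prometheus HTTP API (`/api/v1/query` with the `query` and `time` parameters). The sketch below only builds the two request URLs; the base URL, the helper name `instant_query_url`, and the chosen timestamp are assumptions for illustration, not part of the original report.

```python
from urllib.parse import urlencode


def instant_query_url(base_url, expr, ts):
    """Build an instant-query URL that pins evaluation to the timestamp ts."""
    params = urlencode({"query": expr, "time": f"{ts:.3f}"})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


# Hypothetical server address and evaluation time (assumptions).
BASE = "http://localhost:9090"
TS = 1542015544.653

recorded = "sum(company:node_cpu_seconds_total) by(mode)"
raw = (
    "sum(node_cpu_seconds_total * on(instance) group_left(node) "
    'label_replace(max by(pod_ip, node) (kube_pod_info{pod=~"node-exporter.*"}), '
    '"instance", "$1:9100", "pod_ip", "(.*)") '
    "* on(node) group_left(label_cs_role) kube_node_labels) by(mode)"
)

recorded_url = instant_query_url(BASE, recorded, TS)
raw_url = instant_query_url(BASE, raw, TS)
print(recorded_url)
print(raw_url)
```

Fetching both URLs and comparing the results per `mode` would show whether the two still differ once both are evaluated at the same instant.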

image

What did you expect to see?

I expected a reasonable margin of error; this much of a difference in values results in a huge difference in rates.

These graphs tell 2 completely different stories:

This one uses inaccurate data produced by the Rule:

image

This one uses data collected directly from the node_exporters.

image

Environment

  • System information:

I'm using the Prometheus Operator on Kubernetes; however, I don't think this issue is related to the operator at all.

  • Prometheus version:

    v2.3.2

  • Prometheus configuration file:

global:
  scrape_interval: 30s
  scrape_timeout: 10s
  evaluation_interval: 30s
  external_labels:
    prom_environment: stage
    prometheus: monitoring/prometheus-remote
    prometheus_replica: prometheus-prometheus-remote-0
alerting:
  alert_relabel_configs:
  - separator: ;
    regex: prometheus_replica
    replacement: $1
    action: labeldrop
rule_files:
- /etc/prometheus/rules/prometheus-prometheus-remote-rulefiles-0/*.yaml
scrape_configs:
- job_name: monitoring/kube-controller-manager-remote/0
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - api_server: null
    role: endpoints
    namespaces:
      names:
      - kube-system
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_label_k8s_app]
    separator: ;
    regex: kube-controller-manager
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: http-metrics
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: service
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: job
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_service_label_kube_controller_manager]
    separator: ;
    regex: (.+)
    target_label: job
    replacement: ${1}
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: http-metrics
    action: replace
- job_name: monitoring/kube-dns-remote/0
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - api_server: null
    role: endpoints
    namespaces:
      names:
      - kube-system
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_label_app]
    separator: ;
    regex: monitoring
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_service_label_environment]
    separator: ;
    regex: stage
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_service_label_service]
    separator: ;
    regex: kube-dns
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: http-metrics-dnsmasq
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: service
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: job
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_service_label_kube_dns]
    separator: ;
    regex: (.+)
    target_label: job
    replacement: ${1}
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: http-metrics-dnsmasq
    action: replace
- job_name: monitoring/kube-dns-remote/1
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - api_server: null
    role: endpoints
    namespaces:
      names:
      - kube-system
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_label_app]
    separator: ;
    regex: monitoring
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_service_label_environment]
    separator: ;
    regex: stage
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_service_label_service]
    separator: ;
    regex: kube-dns
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: http-metrics-skydns
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: service
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: job
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_service_label_kube_dns]
    separator: ;
    regex: (.+)
    target_label: job
    replacement: ${1}
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: http-metrics-skydns
    action: replace
- job_name: monitoring/kube-scheduler-prometheus-remote/0
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - api_server: null
    role: endpoints
    namespaces:
      names:
      - kube-system
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_label_k8s_app]
    separator: ;
    regex: kube-scheduler
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: http-metrics
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: service
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: job
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_service_label_kube_scheduler]
    separator: ;
    regex: (.+)
    target_label: job
    replacement: ${1}
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: http-metrics
    action: replace
- job_name: monitoring/kube-state-metrics-remote/0
  honor_labels: true
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - api_server: null
    role: endpoints
    namespaces:
      names:
      - monitoring
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_label_app]
    separator: ;
    regex: monitoring
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_service_label_environment]
    separator: ;
    regex: stage
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_service_label_service]
    separator: ;
    regex: kube-state-metrics
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: http
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: service
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_label_app]
    separator: ;
    regex: (.+)
    target_label: app
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_service_label_environment]
    separator: ;
    regex: (.+)
    target_label: environment
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_service_label_service]
    separator: ;
    regex: (.+)
    target_label: service
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: job
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_service_label_kube_state_metrics]
    separator: ;
    regex: (.+)
    target_label: job
    replacement: ${1}
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: http
    action: replace
- job_name: monitoring/kubelet-remote/0
  honor_labels: true
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: https
  kubernetes_sd_configs:
  - api_server: null
    role: endpoints
    namespaces:
      names:
      - kube-system
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    insecure_skip_verify: true
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_label_k8s_app]
    separator: ;
    regex: kubelet
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: https-metrics
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: service
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: job
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_service_label_kubelet]
    separator: ;
    regex: (.+)
    target_label: job
    replacement: ${1}
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: https-metrics
    action: replace
- job_name: monitoring/kubelet-remote/1
  honor_labels: true
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics/cadvisor
  scheme: https
  kubernetes_sd_configs:
  - api_server: null
    role: endpoints
    namespaces:
      names:
      - kube-system
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    insecure_skip_verify: true
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_label_k8s_app]
    separator: ;
    regex: kubelet
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: https-metrics
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: service
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: job
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_service_label_kubelet]
    separator: ;
    regex: (.+)
    target_label: job
    replacement: ${1}
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: https-metrics
    action: replace
- job_name: monitoring/node-exporter-remote/0
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - api_server: null
    role: endpoints
    namespaces:
      names:
      - monitoring
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_label_app]
    separator: ;
    regex: monitoring
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_service_label_environment]
    separator: ;
    regex: stage
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_service_label_service]
    separator: ;
    regex: node-exporter
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: metrics
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: service
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: job
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_service_label_node_exporter]
    separator: ;
    regex: (.+)
    target_label: job
    replacement: ${1}
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: metrics
    action: replace
- job_name: monitoring/prometheus-operator-remote/0
  honor_labels: true
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - api_server: null
    role: endpoints
    namespaces:
      names:
      - monitoring
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_label_app]
    separator: ;
    regex: monitoring
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_service_label_environment]
    separator: ;
    regex: stage
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_service_label_service]
    separator: ;
    regex: prometheus-operator
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: http
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: service
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_label_app]
    separator: ;
    regex: (.+)
    target_label: app
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_service_label_environment]
    separator: ;
    regex: (.+)
    target_label: environment
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_service_label_service]
    separator: ;
    regex: (.+)
    target_label: service
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: job
    replacement: ${1}
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: http
    action: replace
- job_name: monitoring/prometheus-remote/0
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - api_server: null
    role: endpoints
    namespaces:
      names:
      - monitoring
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_label_app]
    separator: ;
    regex: prometheus
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_service_label_environment]
    separator: ;
    regex: stage
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_service_label_service]
    separator: ;
    regex: prometheus-remote
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: web
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: service
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_label_app]
    separator: ;
    regex: (.+)
    target_label: app
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_service_label_environment]
    separator: ;
    regex: (.+)
    target_label: environment
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_service_label_service]
    separator: ;
    regex: (.+)
    target_label: service
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: job
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_service_label_prometheus_remote]
    separator: ;
    regex: (.+)
    target_label: job
    replacement: ${1}
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: web
    action: replace
remote_write: <omitted>

Rule config:

- record: company:node_cpu_seconds_total
  expr: node_cpu_seconds_total
    * on(instance) group_left(node) label_replace(max by(pod_ip, node) (kube_pod_info{pod=~"node-exporter.*"}),
    "instance", "$1:9100", "pod_ip", "(.*)") * on(node)
    group_left(label_cs_role) kube_node_labels

@kminehart kminehart changed the title [Bug] Some values produced by Rules are inaccurate. [Bug] Some values produced by recording rules are inaccurate. Nov 13, 2018

steven-aerts commented Nov 19, 2018

We are seeing a similar problem on prometheus 2.4.3.

We have a recording rule:

  - record: beagle_register:consumer_in_sync_duration:max_week
    expr: max(max_over_time(beagle_register_consumer_in_sync_duration_seconds{type="effective"}[1w]))

The current value of beagle_register:consumer_in_sync_duration:max_week{} is 152849271.75363848, while running the expr directly returns 68.377.

We have two prometheus servers running in parallel, and they both give exactly the same values for both queries.

We never saw a deviation in the values before, which the following graph shows by subtracting one value from the other:
image

We see that the deviation happened on exactly the same timestamp (2018-11-15 06-18-54 UTC) on both servers.

Is there anything we can do to get more information so we can debug the issue?

Author

kminehart commented Nov 21, 2018

> Is there anything we can do to get more information so we can debug the issue?

Would also like to know. I'd be happy to debug it if I knew where to start.

Member

juliusv commented Nov 29, 2018

Weird.

@steven-aerts Do you know if there was anything special happening around the timestamp where the rule and normal expression evaluation started diverging? Like a Prometheus version upgrade?

steven-aerts commented Nov 29, 2018

@juliusv 2.4.3 had already been running for 3 days when we saw this issue.

After four days the issue disappeared again, and the diff went back to 0:
image

As this is our only recording rule I cannot compare the behaviour of other rules at that moment in time.

I do not think it is related to server restarts or config reloads, as none of them happened around that time (plotting prometheus_config_last_reload_success_timestamp_seconds):
image

I still find it very suspicious that both replicated Prometheus servers saw the issue at exactly the same time.

Thanks,

Steven

steven-aerts commented Nov 29, 2018

@juliusv I now see that the start of the diff has shifted over time. In the graph I posted 10 days ago, the diff starts at 2018-11-15 06-18-54 UTC.
When I look now on the same servers, the start has shifted three days earlier, to 2018-11-12 06-51-50 UTC.

That is exactly the moment when we upgraded our servers (verified by looking at prometheus_build_info).

So you are probably right with your guess that it might be upgrade related.

I cannot explain the shift, but for me that is not a problem.

Thanks,

Steven

Member

brian-brazil commented Nov 29, 2018

The original issue has a sum before rate, so that's likely it.

For the second issue, what does the raw data say?
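To illustrate the "sum before rate" point in general terms: when counters are summed first, a reset of one counter makes the summed series drop, and the reset handling then treats the whole sum as having restarted. The sketch below uses a simplified `increase` (reset handling only, no extrapolation, so not Prometheus's exact algorithm) on made-up sample values; both are assumptions for illustration.

```python
def increase(samples):
    """Simplified counter increase: a drop is treated as a reset to zero.

    This ignores the extrapolation Prometheus applies; it only models
    the reset handling that matters for this comparison.
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        total += cur if cur < prev else cur - prev
    return total


# Two hypothetical counters scraped at the same instants.
a = [1000.0, 1010.0, 1020.0, 1030.0]
b = [500.0, 510.0, 5.0, 15.0]  # b restarts between the 2nd and 3rd sample

# Correct order: per-series increase first, then aggregate.
increase_then_sum = increase(a) + increase(b)

# Problematic order: aggregate first, then take the increase of the sum.
summed = [x + y for x, y in zip(a, b)]
sum_then_increase = increase(summed)

print(increase_then_sum)   # 55.0
print(sum_then_increase)   # 1065.0
```

The single reset of `b` turns into a spike of roughly the size of the whole sum when the aggregation happens first, which is why the aggregation is normally applied after `rate`.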

steven-aerts commented Nov 29, 2018

Looking at the raw data for the recording rule, I get the following when I query beagle_register:consumer_in_sync_duration:max_week[1h] offset 409h:

30.175 @1542013519.653
30.175 @1542013534.653
...
30.175 @1542015514.653
30.175 @1542015529.653
19328319.86419672 @1542015544.653
19328319.86419672 @1542015559.653
...
19328319.86419672 @1542017089.653
19328319.86419672 @1542017104.653
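
A small sketch for locating the jump in samples of this `value @timestamp` form and converting it to UTC, so it can be lined up against other events. The helper names `parse_samples` and `first_change` are hypothetical, and the input is just the excerpt shown above.

```python
from datetime import datetime, timezone

RAW = """\
30.175 @1542013519.653
30.175 @1542013534.653
...
30.175 @1542015514.653
30.175 @1542015529.653
19328319.86419672 @1542015544.653
19328319.86419672 @1542015559.653
"""


def parse_samples(text):
    """Parse 'value @unix_timestamp' lines into (timestamp, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "...":  # skip blanks and elided ranges
            continue
        value, ts = line.split(" @")
        samples.append((float(ts), float(value)))
    return samples


def first_change(samples):
    """Return (timestamp, old, new) for the first sample whose value differs."""
    for (_, prev), (ts, cur) in zip(samples, samples[1:]):
        if cur != prev:
            return ts, prev, cur
    return None


ts, before, after = first_change(parse_samples(RAW))
when = datetime.fromtimestamp(ts, tz=timezone.utc)
print(when.strftime("%Y-%m-%d %H:%M:%S"), before, "->", after)
```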

I looked at the raw data for the underlying series by querying beagle_register_consumer_in_sync_duration_seconds{type="effective"}[1h] offset 409h.

I see at the time the recording rule breaks, two new series start publishing data (some data anonymized):

beagle_register_consumer_in_sync_duration_seconds{availability_zone="eu-west-1b",environment="product",instance="***",instance_id="***417",job="beagle-register",role="beagle-register-node",setup="ap",type="effective"}
22.313 @1542015553.56
22.313 @1542015568.56
22.313 @1542015583.56
22.313 @1542015598.56
beagle_register_consumer_in_sync_duration_seconds{availability_zone="eu-west-1b",environment="product",instance="***",instance_id="***9bb",job="beagle-register",role="beagle-register-node",setup="ap",tier_id="tier-ap-b02",type="effective"}
30.289 @1542015558.156
30.289 @1542015573.156
30.289 @1542015588.156

Can this be some kind of race condition?

There are two other series which keep on sending.

Best regards,

Steven

Author

kminehart commented Nov 29, 2018

@brian-brazil

> The original issue has a sum before rate, so that's likely it.

Can you elaborate on this?

@steven-aerts

This comment has been minimized.

Author

kminehart commented Nov 29, 2018

oh that makes sense, but I don't think that applies in my case.

In my example, my recording rule is recorded as:

company:node_cpu_seconds_total

and records the value of:

  node_cpu_seconds_total * on(instance) group_left(node) label_replace(
    max by(pod_ip, node) (
        kube_pod_info{pod=~"node-exporter.*"}
    ), "instance", "$1:9100", "pod_ip", "(.*)"
  ) * on(node) group_left(label_cs_role) kube_node_labels

Which is just node_cpu_seconds_total with some extra labels.

There's no rate being applied to my recording rule to show the inaccuracy being recorded.


We also don't apply a rate to this value in our graphs. We do this label joining so that we can see these metrics without the master node, which is unschedulable.
