
min_over_time query not working as expected #4367

Closed
eriksjoberg opened this Issue Jul 10, 2018 · 11 comments


eriksjoberg commented Jul 10, 2018

Bug Report

What did you do?
We had a Java-based test application that increments/decrements a specific gauge named metric_1.
Prometheus is configured to scrape this test application every 15 seconds.
When we tried to get the minimum value of the metric over a 1-minute period, we observed that the value returned by Prometheus was not correct.

What did you expect to see?

Here is the data available in internal time series database:
curl -g 'http://prometheus:9090/api/v1/query_range?query=metric_1&start=2018-07-10T16:12:00.000Z&end=2018-07-10T16:13:00.000Z&step=1s'

{"status":"success","data":{"resultType":"matrix","result":[{"metric":{"name":"metric_1","app":"pm-testapp","chart":"pm-testapp-1.0.0-1","heritage":"Tiller","instance":"192.168.125.191:8080","job":"pm-testapp","kubernetes_name":"pm-testapp","kubernetes_namespace":"ns","release":"testapp"},"values":[[1531239120,"40"],[1531239121,"40"],[1531239122,"40"],[1531239123,"40"],[1531239124,"40"],[1531239125,"40"],[1531239126,"40"],[1531239127,"40"],[1531239128,"40"],[1531239129,"40"],[1531239130,"40"],[1531239131,"40"],[1531239132,"40"],[1531239133,"40"],[1531239134,"40"],[1531239135,"40"],[1531239136,"40"],[1531239137,"40"],[1531239138,"80"],[1531239139,"80"],[1531239140,"80"],[1531239141,"80"],[1531239142,"80"],[1531239143,"80"],[1531239144,"80"],[1531239145,"80"],[1531239146,"80"],[1531239147,"80"],[1531239148,"80"],[1531239149,"80"],[1531239150,"80"],[1531239151,"80"],[1531239152,"80"],[1531239153,"70"],[1531239154,"70"],[1531239155,"70"],[1531239156,"70"],[1531239157,"70"],[1531239158,"70"],[1531239159,"70"],[1531239160,"70"],[1531239161,"70"],[1531239162,"70"],[1531239163,"70"],[1531239164,"70"],[1531239165,"70"],[1531239166,"70"],[1531239167,"70"],[1531239168,"60"],[1531239169,"60"],[1531239170,"60"],[1531239171,"60"],[1531239172,"60"],[1531239173,"60"],[1531239174,"60"],[1531239175,"60"],[1531239176,"60"],[1531239177,"60"],[1531239178,"60"],[1531239179,"60"],[1531239180,"60"]]}]}}

When we ran min_over_time over the above time period, we expected to get something like:
curl -g 'http://prometheus:9090/api/v1/query?query=min_over_time(metric_1[1m])&time=2018-07-10T16:13:00.000Z'

{"status":"success","data":{"resultType":"vector","result":[{"metric":{"app":"pm-testapp","chart":"pm-testapp-1.0.0-1","heritage":"Tiller","instance":"192.168.125.191:8080","job":"pm-testapp","kubernetes_name":"pm-testapp","kubernetes_namespace":"ns","release":"testapp"},"value":[1531239180,"40"]}]}}

What did you see instead? Under which circumstances?

curl -g 'http://prometheus:9090/api/v1/query?query=min_over_time(metric_1[1m])&time=2018-07-10T16:13:00.000Z'

{"status":"success","data":{"resultType":"vector","result":[{"metric":{"app":"pm-testapp","chart":"pm-testapp-1.0.0-1","heritage":"Tiller","instance":"192.168.125.191:8080","job":"pm-testapp","kubernetes_name":"pm-testapp","kubernetes_namespace":"ns","release":"testapp"},"value":[1531239180,"60"]}]}}

60 is an incorrect value considering the time series data available in Prometheus.

Please find the attached graph confirming the same.
[attached graph: metric_1_graph]

Environment
Kubernetes

  • System information:
    Running on Kubernetes 1.9.6

  • Prometheus version:
    2.2.0

  • Prometheus configuration file:

global:
  scrape_interval: 15s
  scrape_timeout: 10s
  evaluation_interval: 1m
scrape_configs:
- job_name: prometheus
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  static_configs:
  - targets:
    - localhost:9090
- job_name: kubernetes-nodes
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: https
  kubernetes_sd_configs:
  - api_server: null
    role: node
    namespaces:
      names: []
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  relabel_configs:
  - separator: ;
    regex: __meta_kubernetes_node_label_(.+)
    replacement: $1
    action: labelmap
  - separator: ;
    regex: (.*)
    target_label: __address__
    replacement: kubernetes.default.svc:443
    action: replace
  - source_labels: [__meta_kubernetes_node_name]
    separator: ;
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics
    action: replace
- job_name: kubernetes-nodes-cadvisor
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: https
  kubernetes_sd_configs:
  - api_server: null
    role: node
    namespaces:
      names: []
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  relabel_configs:
  - separator: ;
    regex: __meta_kubernetes_node_label_(.+)
    replacement: $1
    action: labelmap
  - separator: ;
    regex: (.*)
    target_label: __address__
    replacement: kubernetes.default.svc:443
    action: replace
  - source_labels: [__meta_kubernetes_node_name]
    separator: ;
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    action: replace
- job_name: kubernetes-service-endpoints
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - api_server: null
    role: endpoints
    namespaces:
      names: []
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: job
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    separator: ;
    regex: "true"
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
    separator: ;
    regex: (https?)
    target_label: __scheme__
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
    separator: ;
    regex: (.+)
    target_label: __metrics_path__
    replacement: $1
    action: replace
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    separator: ;
    regex: (.+)(?::\d+);(\d+)
    target_label: __address__
    replacement: $1:$2
    action: replace
  - separator: ;
    regex: __meta_kubernetes_service_label_(.+)
    replacement: $1
    action: labelmap
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: kubernetes_namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: kubernetes_name
    replacement: $1
    action: replace
- job_name: kubernetes-services
  params:
    module:
    - http_2xx
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /probe
  scheme: http
  kubernetes_sd_configs:
  - api_server: null
    role: service
    namespaces:
      names: []
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
    separator: ;
    regex: "true"
    replacement: $1
    action: keep
  - source_labels: [__address__]
    separator: ;
    regex: (.*)
    target_label: __param_target
    replacement: $1
    action: replace
  - separator: ;
    regex: (.*)
    target_label: __address__
    replacement: blackbox
    action: replace
  - source_labels: [__param_target]
    separator: ;
    regex: (.*)
    target_label: instance
    replacement: $1
    action: replace
  - separator: ;
    regex: __meta_kubernetes_service_label_(.+)
    replacement: $1
    action: labelmap
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: kubernetes_namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: kubernetes_name
    replacement: $1
    action: replace
- job_name: kubernetes-pods
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - api_server: null
    role: pod
    namespaces:
      names: []
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    separator: ;
    regex: "true"
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    separator: ;
    regex: (.+)
    target_label: __metrics_path__
    replacement: $1
    action: replace
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    separator: ;
    regex: (.+):(?:\d+);(\d+)
    target_label: __address__
    replacement: ${1}:${2}
    action: replace
  - separator: ;
    regex: __meta_kubernetes_pod_label_(.+)
    replacement: $1
    action: labelmap
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: kubernetes_namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: kubernetes_pod_name
    replacement: $1
    action: replace

brian-brazil (Member) commented Jul 10, 2018

The data shows that the minimum over the minute is 40.

hoffie commented Jul 10, 2018

The data shows that the minimum over the minute is 40.

... yet Prometheus seems to return 60 according to the report.

@eriksjoberg: Maybe the output of curl -g 'http://prometheus:9090/api/v1/query?query=metric_1[1m]&time=2018-07-10T16:13:00.000Z' could help find an explanation for this (see also https://www.robustperception.io/extracting-raw-samples-from-prometheus/)?
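
For reference, here is a minimal sketch of pulling those raw samples programmatically instead of via curl (assuming the standard Prometheus HTTP API and the Python requests library; the prometheus:9090 address is taken from the commands above):

# Hypothetical sketch: fetch the raw samples that metric_1[1m] would select
# at the evaluation time used in the report, via the Prometheus HTTP API.
import requests

resp = requests.get(
    "http://prometheus:9090/api/v1/query",
    params={
        "query": "metric_1[1m]",             # a range selector returns the raw samples
        "time": "2018-07-10T16:13:00.000Z",  # evaluation timestamp from the report
    },
)
resp.raise_for_status()
for series in resp.json()["data"]["result"]:
    for ts, value in series["values"]:
        print(ts, value)  # only samples inside the 1m look-back window are printed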

tejaswiniVadlamudi commented Jul 10, 2018

Prometheus should return 40, but it returns 60 here. Maybe the min_over_time function is not working as expected. Is there any way to get more logs at DEBUG level for this case?

eriksjoberg (Author) commented Jul 11, 2018

@hoffie @brian-brazil Hi, yes, as Teja wrote, I would like to get more DEBUG info. I could not find any useful debug information related to metrics collection or the min value over time using the existing debug.loglevel setting.

The query_range command returned the following data:

1000 1531239027.674
1000 1531239042.674
1000 1531239057.674
65 1531239072.674
15 1531239087.674
40 1531239102.674
40 1531239117.674
80 1531239132.674
70 1531239147.674
60 1531239162.674
60 1531239177.674

Technically, we should get 40 as mentioned in the issue description, but we got 60.

brian-brazil (Member) commented Jul 11, 2018

The query result is as expected; the 40 data point is 3 seconds outside the range vector.
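
For illustration, a minimal sketch of the window arithmetic, using the raw sample timestamps from the query_range output above (a hand check of which samples fall inside the window, not the exact internal selection logic):

# Most recent raw samples from the comment above (timestamp, value), scraped every ~15s.
samples = [
    (1531239102.674, 40),
    (1531239117.674, 40),
    (1531239132.674, 80),
    (1531239147.674, 70),
    (1531239162.674, 60),
    (1531239177.674, 60),
]
eval_time = 1531239180.0  # 2018-07-10T16:13:00Z, the instant query's evaluation time
window = 60.0             # the [1m] range

# metric_1[1m] only considers samples no older than eval_time - window.
in_range = [v for t, v in samples if eval_time - window <= t <= eval_time]
print(min(in_range))  # 60 -- the last "40" sample (1531239117.674) misses the window by just under 3 seconds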

eriksjoberg (Author) commented Jul 11, 2018

@brian-brazil Hi, thanks for the answer. I am new to Prometheus, but I think I understand now. This is due to the scrape intervals: if the value 40 is there at the beginning of that granularity period but changes before the first scrape within that period happens, it will not be included in the result, right?

brian-brazil (Member) commented Jul 11, 2018

It's not to do with scraping; you're most likely getting confused by staleness.

eriksjoberg (Author) commented Jul 12, 2018

Hi,
OK, thanks.
I have attached a picture with an example of what we want to confirm; please check it.
[attached image: min_over_time_question]

brian-brazil (Member) commented Jul 12, 2018

That image is confusing, but it'll be 66 if I'm reading it right.

eriksjoberg (Author) commented Jul 12, 2018

Ok thanks Brian.

lock bot commented Mar 22, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 22, 2019
