
rate function and telegraf client #1904

Closed
freeseacher opened this Issue Aug 21, 2016 · 4 comments


freeseacher commented Aug 21, 2016

What did you do?
I've set up Telegraf and Prometheus, then collected data from some Telegraf clients into the database. Then I ran a query like this:
rate(nsq_client_message_count{topic="metrics",host="wrk01"}[10s])
What did you expect to see?
Results from the rate() function.
What did you see instead? Under which circumstances?
No datapoints found.

Environment

# docker images | grep prom
prom/prometheus                        latest               62b473b89d8d        4 weeks ago         43.23 MB
prom/alertmanager                      latest               43783869ba8c        6 weeks ago         16.85 MB
  prom:
    image: prom/prometheus:latest
    restart: always
    networks:
      - influx 
    volumes:
      - "/etc/docker-compose/mon/prom:/etc/prometheus"
      - "/data/metrics/prom/:/prometheus-data"
    ports:
      - 9090:9090
    command: -log.level debug -config.file /etc/prometheus/prometheus.yml -storage.local.path /prometheus-data -storage.local.chunk-encoding-version 2  -storage.local.retention 2160h -query.max-concurrency 40 -alertmanager.url=http://alertmanager
  • Prometheus version:
# prometheus -version
prometheus, version 1.0.1 (branch: master, revision: be40190)
  build user:       root@e881b289ce76
  build date:       20160722-19:54:46
  go version:       go1.6.2
  • Prometheus configuration file:
# my global config
global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.
  evaluation_interval: 10s # Evaluate rules every 10 seconds.
  # scrape_timeout is set to the global default (10s).

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
      monitor: 'tower'

# Load and evaluate rules in this file every 'evaluation_interval' seconds.
rule_files:

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
        labels:
           env: 'infrastructure'
  - job_name: 'wrk01'
    scrape_interval: 10s
    static_configs:
      - targets: ['10.36.129.11:9126']

telegraf config

# Telegraf configuration

# Telegraf is entirely plugin driven. All metrics are gathered from the
# declared inputs, and sent to the declared outputs.

# Plugins must be declared in here to be active.
# To deactivate a plugin, comment out the name and any variables.

# Use 'telegraf -config telegraf.conf -test' to see what metrics a config
# file would generate.

# Global tags can be specified here in key="value" format.
[global_tags]
dc = "BAL" # will tag all metrics with dc=BAL
env = "test"

# Configuration for telegraf agent
[agent]
  ## Default data collection interval for all inputs
  interval = "10s"
  ## Rounds collection interval to 'interval'
  ## ie, if interval="10s" then always collect on :00, :10, :20, etc.
  round_interval = true

  ## Telegraf will cache metric_buffer_limit metrics for each output, and will
  ## flush this buffer on a successful write.
  metric_buffer_limit = 10000
  ## Flush the buffer whenever full, regardless of flush_interval.
  flush_buffer_when_full = true

  ## Collection jitter is used to jitter the collection by a random amount.
  ## Each plugin will sleep for a random time within jitter before collecting.
  ## This can be used to avoid many plugins querying things like sysfs at the
  ## same time, which can have a measurable effect on the system.
  collection_jitter = "0s"

  ## Default flushing interval for all outputs. You shouldn't set this below
  ## interval. Maximum flush_interval will be flush_interval + flush_jitter
  flush_interval = "10s"
  ## Jitter the flush interval by a random amount. This is primarily to avoid
  ## large write spikes for users running a large number of telegraf instances.
  ## ie, a jitter of 5s and interval 10s means flushes will happen every 10-15s
  flush_jitter = "0s"

  ## Run telegraf in debug mode
  debug = false
  ## Run telegraf in quiet mode
  quiet = true
  ## Override default hostname, if empty use os.Hostname()
  hostname = "wrk01"

# Configuration for the Prometheus client to spawn
[[outputs.prometheus_client]]
  ## Address to listen on
  listen = "0.0.0.0:9126"

[[inputs.nsq]]
  ## An array of NSQD HTTP API endpoints
  endpoints = ["http://wrk01:4151"]

Graphs are shown fine if I change the range to [11s] or more, but I'm afraid the data are already resampled. This isn't a problem with just one metric; the same behaviour occurs for every rate() query.
I've already checked my time settings per #1022.

brian-brazil (Member) commented Aug 21, 2016

A 10s rate over a 10s scrape interval will usually not work. I'd suggest at least a 25s rate window for this setup, so the window can cover two samples plus one failed scrape.
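Applied to the query from the original report, the suggestion above would look something like this (a sketch; the exact window is a judgment call, as long as it spans at least two 10s samples plus slack for a missed scrape):

```promql
# Window of 25s >= 2 * scrape_interval + margin for one failed scrape.
rate(nsq_client_message_count{topic="metrics",host="wrk01"}[25s])
```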

juliusv (Member) commented Aug 21, 2016

To expand on this: you can only compute a rate over at least two samples of a counter (this is not a Prometheus restriction, it's a mathematical one). Those two samples will be at least 10 seconds apart when you scrape every 10s, so a 10s window is too small. To consider a larger window of samples but still use only the latest two points within that interval, you can use irate(foo[1m]) instead of rate(foo[1m]). See also https://prometheus.io/docs/querying/functions/#irate()
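A sketch of the two alternatives described above, using the metric from the original report (the 1m window is illustrative):

```promql
# Per-second rate averaged over all samples in the last minute;
# needs at least two samples inside the window.
rate(nsq_client_message_count{topic="metrics",host="wrk01"}[1m])

# Per-second rate from only the last two samples inside the window;
# reacts faster to changes but is noisier.
irate(nsq_client_message_count{topic="metrics",host="wrk01"}[1m])
```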

freeseacher (Author) commented Aug 21, 2016

OK, so just to be sure I understand correctly: I can't calculate a rate over a window equal to the scrape interval, because the scrape interval is nominally 10 seconds but each scrape itself always takes some time. So if my metrics arrive rounded to 10s boundaries, I will get good results, yes?

lock bot commented Mar 24, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 24, 2019
