Telegraf is not flushing the buffer in downtime scenarios #16615

@bogdanml999

Description

Relevant telegraf.conf

[agent]
  interval = "1s"
  metric_batch_size = 5
  flush_interval = "5s"
  flush_jitter = "5s"
  metric_buffer_limit = 10
  buffer_strategy = "disk"
  buffer_directory = "/data/var/telegraf/data"
  ## Log at debug level
  debug = true
  ## Log only error level messages
  quiet = false

# Read metrics from MQTT
[[inputs.mqtt_consumer]]
  servers = ["localhost:1883"]
  ## Topics subscribed to.
  topics = [
    "MON"
  ]
  data_format = "json"

# Send metrics to InfluxDB
[[outputs.influxdb_v2]]
  urls = ["http://url:8086"]
  token = "<token>"
  organization = "<org>"
  bucket = "<bucket name>"

Logs from Telegraf

2025-03-11T14:00:40Z I! Loading config: /data/config/telegraf/telegraf.conf
2025-03-11T14:00:40Z W! Using disk buffer strategy for plugin outputs.influxdb_v2, this is an experimental feature
2025-03-11T14:00:40Z I! Starting Telegraf 1.34.0 brought to you by InfluxData the makers of InfluxDB
2025-03-11T14:00:40Z I! Available plugins: 239 inputs, 9 aggregators, 33 processors, 26 parsers, 63 outputs, 6 secret-stores
2025-03-11T14:00:40Z I! Loaded inputs: mqtt_consumer
2025-03-11T14:00:40Z I! Loaded aggregators:
2025-03-11T14:00:40Z I! Loaded processors:
2025-03-11T14:00:40Z I! Loaded secretstores:
2025-03-11T14:00:40Z I! Loaded outputs: influxdb_v2
2025-03-11T14:00:40Z I! Tags enabled: host=xxx
2025-03-11T14:00:40Z I! [agent] Config: Interval:1s, Quiet:false, Hostname:"xxx", Flush Interval:5s
2025-03-11T14:00:40Z W! [agent] The default value of 'skip_processors_after_aggregators' will change to 'true' with Telegraf v1.40.0! If you need the current default behavior, please explicitly set the option to 'false'!
2025-03-11T14:00:40Z D! [agent] Initializing plugins
2025-03-11T14:00:40Z W! [inputs.mqtt_consumer] Server "localhost:1883" should be updated to use `scheme://host:port` format
2025-03-11T14:00:40Z D! [agent] Connecting outputs
2025-03-11T14:00:40Z D! [agent] Attempting connection to [outputs.influxdb_v2]
2025-03-11T14:00:40Z D! [agent] Successfully connected to outputs.influxdb_v2
2025-03-11T14:00:40Z D! [agent] Starting service inputs
2025-03-11T14:00:40Z I! [inputs.mqtt_consumer] Connected [localhost:1883]
2025-03-11T14:00:50Z D! [outputs.influxdb_v2] Buffer fullness: 55 metrics
2025-03-11T14:00:56Z D! [outputs.influxdb_v2] Buffer fullness: 55 metrics
2025-03-11T14:01:03Z D! [outputs.influxdb_v2] Buffer fullness: 55 metrics
2025-03-11T14:01:09Z D! [outputs.influxdb_v2] Buffer fullness: 55 metrics
2025-03-11T14:01:16Z D! [outputs.influxdb_v2] Buffer fullness: 55 metrics
2025-03-11T14:01:24Z D! [outputs.influxdb_v2] Buffer fullness: 55 metrics
2025-03-11T14:01:32Z D! [outputs.influxdb_v2] Buffer fullness: 55 metrics
2025-03-11T14:01:42Z D! [outputs.influxdb_v2] Buffer fullness: 55 metrics
2025-03-11T14:01:50Z D! [outputs.influxdb_v2] Buffer fullness: 55 metrics
2025-03-11T14:01:57Z D! [outputs.influxdb_v2] Buffer fullness: 55 metrics
2025-03-11T14:02:04Z D! [outputs.influxdb_v2] Buffer fullness: 55 metrics
2025-03-11T14:02:12Z D! [outputs.influxdb_v2] Buffer fullness: 55 metrics

System info

Telegraf 1.33.3 and Telegraf 1.34.0

Docker

No response

Steps to reproduce

  1. Run Telegraf using the configuration given.
  2. Simulate a broken connection between Telegraf and InfluxDB, either by stopping InfluxDB or by changing the token in the Telegraf configuration to an invalid one.
  3. Undo step 2.

Expected behavior

I am trying to configure Telegraf so that, in a downtime scenario (the machine running Telegraf has no internet connection, or InfluxDB is down), the metrics gathered during the outage are sent after recovery.
I intended to use buffer_strategy = "disk" for this: while InfluxDB is down the buffer grows, and once InfluxDB is back Telegraf flushes the buffer until it is empty.
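
To make the expectation above concrete, here is a minimal Python sketch of the buffer behavior described. It is only a toy model under stated assumptions, not Telegraf's actual implementation: the class `ToyOutputBuffer`, the function `simulate`, and the arrival rate of 5 metrics per flush cycle are all hypothetical, and the sketch deliberately ignores `metric_buffer_limit` and on-disk persistence.

```python
from collections import deque


class ToyOutputBuffer:
    """Toy model of the expected behavior; NOT Telegraf's real buffer."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.pending = deque()   # metrics waiting to be written
        self.delivered = []      # metrics the simulated output accepted

    def add(self, metric):
        self.pending.append(metric)

    def flush(self, output_up):
        """Write pending metrics in batches; keep them all if the output is down."""
        if not output_up:
            return 0  # write failed: nothing is removed, the buffer keeps growing
        written = 0
        while self.pending:
            size = min(self.batch_size, len(self.pending))
            batch = [self.pending.popleft() for _ in range(size)]
            self.delivered.extend(batch)
            written += len(batch)
        return written


def simulate():
    """Three flush cycles with the output down, then two with it back up."""
    buf = ToyOutputBuffer(batch_size=5)  # mirrors metric_batch_size = 5
    fullness = []
    for cycle, output_up in enumerate([False, False, False, True, True]):
        # Assumption: 5 metrics arrive between two flushes.
        for i in range(5):
            buf.add((cycle, i))
        buf.flush(output_up)
        fullness.append(len(buf.pending))
    return fullness, len(buf.delivered)


if __name__ == "__main__":
    print(simulate())
```

In this model the buffer fullness after each cycle is 5, 10, 15, 0, 0 and all 25 metrics end up delivered: the buffer grows during the outage and drains completely on the first successful flush. That draining step is what does not happen in the logs above, where the fullness stays at 55.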

Actual behavior

After running this configuration, I observed two different outcomes after the simulated downtime:

  1. Telegraf sends new metrics to InfluxDB, but the buffer fullness stays the same and the buffer is never flushed.
  2. All metrics that Telegraf should send to InfluxDB stay in the buffer, so the buffer keeps growing and is never flushed.

Additional info

Did I misunderstand how buffer_strategy = "disk" works? Is there another approach to this problem?

Thank you,
Bogdan

Labels

bug (unexpected problem or unintended behavior), waiting for response (waiting for response from contributor)
