
Telegraf fails to write data to Elasticsearch #5676

Closed
aurimasplu opened this issue Apr 4, 2019 · 4 comments
Labels
area/elasticsearch · bug

Comments

@aurimasplu

aurimasplu commented Apr 4, 2019

Relevant telegraf.conf:

[global_tags]
[agent]
  interval = "1s" 
  round_interval = true
  metric_batch_size = 1000 
  metric_buffer_limit = 1000000 
  collection_jitter = "0s"
  flush_interval = "120s"
  flush_jitter = "0s"
  precision = ""
  debug = false
  quiet = false
  logfile = "TELEGRAF-LOG"
  hostname = ""
  omit_hostname = true

[[outputs.elasticsearch]]
  urls = [ "https://my-elastic-node1:9200", "https://my-elastic-node2:9200" ]
  timeout = "5s"
  enable_sniffer = false
  health_check_interval = "0s"
  username = "elasticuser"
  password = "elasticpass"
  index_name = "telegraf-{{measurement_tag}}-%Y.%m.%d"
  default_tag_value = "interface"
  manage_template = true
  template_name = "telegraf"
  overwrite_template = false
  insecure_skip_verify = true
  namepass = ["interface", "LoadbalancerVserver"]

[[outputs.elasticsearch]]
  urls = [ "https://my-elastic-node1:9200", "https://my-elastic-node2:9200" ]
  timeout = "5s"
  enable_sniffer = false
  health_check_interval = "0s"
  username = "elasticuser"
  password = "elasticpass"
  index_name = "telegraf-{{measurement_tag}}-%Y.%m"
  default_tag_value = ""
  manage_template = true
  template_name = "telegraf"
  overwrite_template = false
  insecure_skip_verify = true
  namedrop = ["interface", "LoadbalancerVserver"]

System info:

OS: RHEL 7.6
CPU: 8 cores
RAM: 16G
Telegraf versions:
Issue noticed in: 1.9.4-1, 1.10.1-1
Everything was working fine in: 1.7.4-1 and lower.

Steps to reproduce:

  1. Run telegraf version 1.9.4-1 or 1.10.1-1.
  2. We have identical RHEL-7.6 virtual servers with identical telegraf configuration. The only difference is that one server runs telegraf-1.7.4-1 and the other 1.10.1-1.
    On the server with telegraf-1.10.1-1 we don't get most of the metrics, and telegraf starts generating the log output provided below. It seems that telegraf fails to communicate with Elasticsearch.

I am also collecting Telegraf self-monitoring metrics with inputs.internal (a minimal sketch of that input is shown below), and buffer_size behaves differently between the two versions; see the screenshot.
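
For reference, the self-monitoring input is just the stock inputs.internal plugin; a minimal sketch of how it is enabled (collect_memstats is optional and shown here only for illustration):

[[inputs.internal]]
  # collects internal Telegraf metrics such as buffer_size; memstats is optional
  collect_memstats = true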

Expected behavior:

We run thousands of inputs.snmp instances to collect various SNMP counters and write the data to two Elasticsearch outputs (a minimal sketch of one such input is included below for context). With telegraf-1.7.4-1 and lower everything worked fine: metrics were collected and written to the outputs successfully.
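
Each of those SNMP inputs follows roughly this shape; the agent address, community string and OID below are placeholders for illustration, not our production values:

[[inputs.snmp]]
  agents = ["192.0.2.1:161"]          # placeholder agent address
  version = 2
  community = "public"                # placeholder community string
  name = "interface"
  [[inputs.snmp.field]]
    name = "ifHCInOctets"
    oid = ".1.3.6.1.2.1.31.1.1.1.6.1" # placeholder OID (IF-MIB::ifHCInOctets.1)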

Actual behavior:

After upgrading to telegraf-1.10.1-1 we noticed that we are missing data: some random points get written, but most are missing. We also checked telegraf-1.9.4-1 and saw the same behavior.

Additional info:

[root@my-server PROD:log]# cat messages | grep telegraf | grep elasticsearch
Apr  4 08:14:51 my-server telegraf: 2019-04-04T06:14:51Z I! Loaded outputs: elasticsearch elasticsearch
Apr  4 08:16:05 my-server telegraf: 2019-04-04T06:16:05Z E! [agent] Error writing to output [elasticsearch]: Error sending bulk request to Elasticsearch: Post https://my-elastic-node1:9200/_bulk: context deadline exceeded
Apr  4 08:16:05 my-server telegraf: 2019-04-04T06:16:05Z E! [agent] Error writing to output [elasticsearch]: Error sending bulk request to Elasticsearch: Post https://my-elastic-node1:9200/_bulk: context deadline exceeded
Apr  4 08:16:10 my-server telegraf: 2019-04-04T06:16:10Z E! [agent] Error writing to output [elasticsearch]: Error sending bulk request to Elasticsearch: Post https://my-elastic-node2:9200/_bulk: context deadline exceeded
Apr  4 08:16:36 my-server telegraf: 2019-04-04T06:16:36Z E! [agent] Error writing to output [elasticsearch]: Error sending bulk request to Elasticsearch: Post https://my-elastic-node2:9200/_bulk: context deadline exceeded
Apr  4 08:18:01 my-server telegraf: at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@785c460b; line: 1, column: 119], i_o_exception
Apr  4 08:18:01 my-server telegraf: at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@567dd72e; line: 1, column: 119], i_o_exception
Apr  4 08:18:01 my-server telegraf: at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@3745d633; line: 1, column: 119], i_o_exception
Apr  4 08:18:01 my-server telegraf: at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@147d29e5; line: 1, column: 120], i_o_exception
Apr  4 08:18:01 my-server telegraf: at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@4e3a341b; line: 1, column: 119], i_o_exception
Apr  4 08:18:01 my-server telegraf: 2019-04-04T06:18:01Z E! [agent] Error writing to output [elasticsearch]: W! Elasticsearch failed to index 5 metrics
Apr  4 08:18:04 my-server telegraf: at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@2bb94494; line: 1, column: 119], i_o_exception
Apr  4 08:18:04 my-server telegraf: at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@7a6a2327; line: 1, column: 119], i_o_exception
Apr  4 08:18:04 my-server telegraf: at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@7eefb0f6; line: 1, column: 119], i_o_exception
Apr  4 08:18:04 my-server telegraf: at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@51d4f645; line: 1, column: 119], i_o_exception
Apr  4 08:18:04 my-server telegraf: at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@3ded514; line: 1, column: 120], i_o_exception
Apr  4 08:18:04 my-server telegraf: 2019-04-04T06:18:04Z E! [agent] Error writing to output [elasticsearch]: W! Elasticsearch failed to index 5 metrics
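
The "context deadline exceeded" errors line up with the 5s output timeout in the config above (the bulk request is cancelled once that timeout expires). Raising it would look like the sketch below; the value is illustrative and we have not confirmed that it resolves the problem:

[[outputs.elasticsearch]]
  # ...same settings as above...
  timeout = "30s"   # illustrative value; our current config sets timeout = "5s"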

[screenshot: inputs.internal buffer_size comparison]

One more thing that we had not seen before: when trying to stop the service, telegraf-1.10.1 does not shut down correctly:

systemd[1]: telegraf.service stop-sigterm timed out. Killing.
systemd[1]: telegraf.service: main process exited, code=killed, status=9/KILL
systemd[1]: Stopped The plugin-driven server agent for reporting metrics into InfluxDB.
systemd[1]: Unit telegraf.service entered failed state.
systemd[1]: telegraf.service failed.
@danielnelson added the area/elasticsearch and bug labels on Apr 23, 2019
@danielnelson
Contributor

What version of Elasticsearch are you using? Did reverting to Telegraf 1.7 fix the issue?

@deepaksood619

deepaksood619 commented Apr 24, 2019

I am facing the same issue.

2019-04-24T06:10:29Z W! [agent] output "elasticsearch" did not complete within its flush interval
2019-04-24T06:10:59Z W! [agent] output "elasticsearch" did not complete within its flush interval
2019-04-24T06:11:00Z E! [agent] Error writing to output [elasticsearch]: Error sending bulk request to Elasticsearch: Post http://elasticsearch.example.com:9200/_bulk: context deadline exceeded

Elasticsearch version: 6.4.2

Telegraf versions (same error with both):

Telegraf 1.9.2 (git: HEAD dda80799)
Telegraf 1.10.2 (git: HEAD 3303f5c3)

Config:

[[outputs.elasticsearch]]
  urls = ["http://elasticsearch.example.com:9200"]
  timeout = "1m"
  flush_interval = "30s"
  enable_sniffer = false
  health_check_interval = "0s"
  index_name = "device_log-%Y.%m.%d"
  manage_template = true
  template_name = "telegraf"
  overwrite_template = false
  namepass = ["tail"]

[[inputs.tail]]
  files = ["/var/log/electric_meter.log", "/var/log/telegraf/telegraf.log", "/var/log/health-log", "/var/log/syslog"]
  from_beginning = false
  interval = "10s"
  pipe = false
  watch_method = "inotify"
  data_format = "value"
  data_type = "string"

@aurimasplu
Author

@danielnelson we are using Elasticsearch 6.1.1, and we write directly to it; we are not using Logstash or any other queuing product.
And yes, reverting back to Telegraf 1.7 fixed the issue.

@sjwang90
Contributor

sjwang90 commented Dec 4, 2020

I came across this issue and wanted to check whether it still occurs with the latest version of Telegraf, as it's been quite a while.
