fix: sort logs by timestamp before writing to Loki #9571

Merged: 4 commits, Aug 5, 2021

jhychan (Contributor) commented Aug 1, 2021

Required for all PRs:

  • Updated associated README.md.
  • Wrote appropriate unit tests.

Related to #9114

Loki has very strict ordering requirements: for a matching label set, you cannot submit a log line with a timestamp older than what already exists. One way around the issue is to overwrite the timestamp using a processor, as discussed in the related PR. However, this is not ideal when we want to send our logs with the timestamp of the input source rather than the time Telegraf processed the metric.

This PR adds a step to the Loki plugin that sorts the metrics in each log stream batch by timestamp before they are serialised and written to Loki.

@eraac - looping you in as the author of the original Loki plugin.
@loganmc10 - may be of interest to you

telegraf-tiger (bot) added the "feat" label (Improvement on an existing feature such as adding a new setting/mode to an existing plugin) on Aug 1, 2021
jhychan (Contributor, Author) commented Aug 1, 2021

The last commit fixes a mistake: the Unix timestamp in a Loki stream is a string, so logs were being sorted lexically. The updated code sorts the entire metrics batch on the native time type, which avoids messing around with type conversions.

Edit: I'm unsure whether sorting the entire batch of metrics raises performance concerns on resource-constrained systems or when the buffer builds up. I'm not a programmer, so I'm very open to suggestions for improving performance or efficiency.

jhychan changed the title from "Lokiordered" to "Sort logs by timestamp before writing to Loki" on Aug 2, 2021
jhychan (Contributor, Author) commented Aug 3, 2021

Any idea what's wrong with the tail plugin? I have not changed any of the code there and the test passed on my earlier commits.

jhychan changed the title from "Sort logs by timestamp before writing to Loki" to "fix: sort logs by timestamp before writing to Loki" on Aug 3, 2021
ssoroka (Contributor) left a comment

Might not fix issues with metrics arriving late, but I see no harm in making this change anyway.

ssoroka merged commit e6abb46 into influxdata:master on Aug 5, 2021
jhychan (Contributor, Author) commented Aug 6, 2021

> Might not fix issues with metrics arriving late, but I see no harm in making this change anyway.

Yeah, that's right. From my point of view that's a problem with the log source, and also a server-side Loki problem. It's not unique to the Telegraf agent, as other agents would have exactly the same problem dealing with those scenarios. Thanks for the approval and merge!

jhychan deleted the lokiordered branch on August 9, 2021 21:10
phemmer added a commit to phemmer/telegraf that referenced this pull request Aug 13, 2021
* origin/master: (183 commits)
  fix: CrateDB replace dots in tag keys with underscores (influxdata#9566)
  feat: Pull metrics from multiple AWS CloudWatch namespaces (influxdata#9386)
  fix: improve Clickhouse corner cases for empty recordset in aggregation queries, fix dictionaries behavior (influxdata#9401)
  fix(opcua): clean client on disconnect so that connect works cleanly (influxdata#9583)
  fix: Refactor ec2 init for config-api (influxdata#9576)
  fix: sort logs by timestamp before writing to Loki (influxdata#9571)
  fix: muting tests for udp_listener (influxdata#9578)
  fix: Do not return on disconnect to avoid breaking reconnect (influxdata#9524)
  fix: Fixing k8s nodes and pods parsing error (influxdata#9581)
  feat: OpenTelemetry output plugin (influxdata#9228)
  feat: Support AWS Web Identity Provider (influxdata#9411)
  fix: upgraded sensu/go to v2.9.0 (influxdata#9577)
  fix: Normalize unix socket path (influxdata#9554)
  docs: fix aws ec2 readme inconsistency (influxdata#9567)
  feat: Modbus Rtu over tcp enhancement (influxdata#9570)
  docs: information on new conventional commit format (influxdata#9573)
  docs: Add logo (influxdata#9574)
  docs: Adding links to net_irtt and dht_sensor external plugins (influxdata#9569)
  Upgrade hashicorp/consul/api to 1.9.1 (influxdata#9565)
  Update vmware/govmomi to v0.26.0 (influxdata#9552)
  Do not skip good quality nodes after a bad quality node is encountered (influxdata#9550)
  fix test so it hits a fake service (influxdata#9564)
  Update changelog
  Fix procstat plugin README to match sample config (influxdata#9553)
  Fix metrics reported as written but not actually written  (influxdata#9526)
  Prevent segfault in persistent volume claims (influxdata#9549)
  Update procstat to support cgroup globs & include systemd unit children (Copy of influxdata#7890) (influxdata#9488)
  Fix attempt to connect to an empty list of servers. (influxdata#9503)
  Fix handling bool in sql input plugin (influxdata#9540)
  Suricata alerts (influxdata#9322)
  Linter fixes for plugins/inputs/[fg]* (influxdata#9387)
  For Prometheus Input add ability to query Consul Service catalog (influxdata#5464)
  Support Landing page on Prometheus landing page (influxdata#8641)
  [Docs] Clarify tagging behavior (influxdata#9461)
  Change the timeout from all queries to per query (influxdata#9471)
  Attach the pod labels to the `kubernetes_pod_volume` & `kubernetes_pod_network` metrics. (influxdata#9438)
  feat(http_listener_v2): allows multiple paths and add path_tag (influxdata#9529)
  Bug Fix Snmp empty metric name (influxdata#9519)
  Worktable workfile stats (influxdata#8587)
  Update Go to v1.16.6 (influxdata#9542)
  ...
reimda pushed a commit that referenced this pull request Aug 19, 2021
2 participants