fix: sort logs by timestamp before writing to Loki #9571
Conversation
The last commit fixes a mistake: the Unix timestamp in the Loki stream is a string, so logs were being sorted lexically. The updated code sorts the entire metrics batch against the native time type, which avoids type conversions. Edit: I'm unsure whether sorting the entire batch of metrics raises performance concerns on resource-constrained systems or when the buffer builds up. I'm not a programmer, so I'm very open to suggestions for improving performance or efficiency.
Any idea what's wrong with the tail plugin? I have not changed any of the code there, and the test passed on my earlier commits.
Might not fix issues with metrics arriving late, but I see no harm in making this change anyway.
Yeah, that's right. From my point of view, that's a problem with the log source and also a server-side Loki problem. It's not unique to the Telegraf agent, as other agents would have exactly the same problem dealing with those scenarios. Thanks for the approval and merge!
* origin/master: (183 commits)
  fix: CrateDB replace dots in tag keys with underscores (influxdata#9566)
  feat: Pull metrics from multiple AWS CloudWatch namespaces (influxdata#9386)
  fix: improve Clickhouse corner cases for empty recordset in aggregation queries, fix dictionaries behavior (influxdata#9401)
  fix(opcua): clean client on disconnect so that connect works cleanly (influxdata#9583)
  fix: Refactor ec2 init for config-api (influxdata#9576)
  fix: sort logs by timestamp before writing to Loki (influxdata#9571)
  fix: muting tests for udp_listener (influxdata#9578)
  fix: Do not return on disconnect to avoid breaking reconnect (influxdata#9524)
  fix: Fixing k8s nodes and pods parsing error (influxdata#9581)
  feat: OpenTelemetry output plugin (influxdata#9228)
  feat: Support AWS Web Identity Provider (influxdata#9411)
  fix: upgraded sensu/go to v2.9.0 (influxdata#9577)
  fix: Normalize unix socket path (influxdata#9554)
  docs: fix aws ec2 readme inconsistency (influxdata#9567)
  feat: Modbus Rtu over tcp enhancement (influxdata#9570)
  docs: information on new conventional commit format (influxdata#9573)
  docs: Add logo (influxdata#9574)
  docs: Adding links to net_irtt and dht_sensor external plugins (influxdata#9569)
  Upgrade hashicorp/consul/api to 1.9.1 (influxdata#9565)
  Update vmware/govmomi to v0.26.0 (influxdata#9552)
  Do not skip good quality nodes after a bad quality node is encountered (influxdata#9550)
  fix test so it hits a fake service (influxdata#9564)
  Update changelog
  Fix procstat plugin README to match sample config (influxdata#9553)
  Fix metrics reported as written but not actually written (influxdata#9526)
  Prevent segfault in persistent volume claims (influxdata#9549)
  Update procstat to support cgroup globs & include systemd unit children (Copy of influxdata#7890) (influxdata#9488)
  Fix attempt to connect to an empty list of servers. (influxdata#9503)
  Fix handling bool in sql input plugin (influxdata#9540)
  Suricata alerts (influxdata#9322)
  Linter fixes for plugins/inputs/[fg]* (influxdata#9387)
  For Prometheus Input add ability to query Consul Service catalog (influxdata#5464)
  Support Landing page on Prometheus landing page (influxdata#8641)
  [Docs] Clarify tagging behavior (influxdata#9461)
  Change the timeout from all queries to per query (influxdata#9471)
  Attach the pod labels to the `kubernetes_pod_volume` & `kubernetes_pod_network` metrics. (influxdata#9438)
  feat(http_listener_v2): allows multiple paths and add path_tag (influxdata#9529)
  Bug Fix Snmp empty metric name (influxdata#9519)
  Worktable workfile stats (influxdata#8587)
  Update Go to v1.16.6 (influxdata#9542)
  ...
(cherry picked from commit e6abb46)
Related to #9114
Loki has very strict ordering requirements. You cannot submit a log line with a timestamp older than what already exists (for a matching label set). One way around the issue is to overwrite the timestamp using a processor, as discussed in the related PR. However, this is not ideal when we want to send our logs with the timestamp of the input source rather than the time Telegraf processed the metric.
This PR adds an additional step in the Loki plugin to sort the batch of metrics by timestamp before they are serialised and written to Loki.
@eraac - looping you in as the author of the original Loki plugin.
@loganmc10 - may be of interest to you