loki.source.kubernetes should handle log rotation #5040
Comments
Is there any potential for a workaround on this? My fear is that even if k8s makes a fix in a future version, this will still be a problem for years after that until all the cloud providers make that version GA, and everybody upgrades, which can be really slow.
@slim-bean has a potential workaround involving detecting when logs drop off and force resetting the connection. It's nasty, but it works.
Nice to hear there is a potential workaround. @slim-bean, would you be able to share what you have so far?
Any news here?
btw. I see the same behaviour with
I don't have an exact timeline, but I've been told the code that was written to fix this should be shared within the next few weeks. loki.source.podlogs and loki.source.kubernetes share the same code for tailing logs, so the fix will resolve the behavior seen in both components.
kubernetes/kubernetes#115702 got merged, which will fix the problem for us, but it's unclear how many versions of Kubernetes the fix will be backported to, and how quickly users will upgrade to get the fix. In general, we'll still need a workaround for versions of Kubernetes that don't have the fix available.
Versions of Kubernetes that do not contain kubernetes/kubernetes#115702 will fail to detect rolled log files, causing the API to stop sending logs to the agent for processing. To work around this, this commit introduces a rolling average calculator to determine the average delta between log entries per target. If 3x the normal delta time has elapsed since the last entry, the tailer is restarted. False positives here are acceptable, but false negatives mean that log lines may not appear for an extended period of time until the rolling detection succeeds. Closes grafana#5040
Versions of Kubernetes that do not contain kubernetes/kubernetes#115702 will fail to detect rolled log files, causing the API to stop sending logs to the agent for processing. To work around this, this commit introduces a rolling average calculator to determine the average delta between log entries per target. If 3x the normal delta time has elapsed since the last entry, the tailer is restarted. False positives here are acceptable, but false negatives mean that log lines may not appear for an extended period of time until the rolling detection succeeds. Closes grafana#5040 Co-authored-by: Edward Welch <edward.welch@grafana.com>
* component/prometheus: fix panic in interceptor when child isn't set
  This commit fixes a panic in prometheus.Interceptor where an interceptor which doesn't forward samples to another appendable panics when appending data.
  Co-authored-by: Edward Welch <edward.welch@grafana.com>
* loki.source.kubernetes: improve detection of rolled log files
  Versions of Kubernetes that do not contain kubernetes/kubernetes#115702 will fail to detect rolled log files, causing the API to stop sending logs to the agent for processing. To work around this, this commit introduces a rolling average calculator to determine the average delta between log entries per target. If 3x the normal delta time has elapsed since the last entry, the tailer is restarted. False positives here are acceptable, but false negatives mean that log lines may not appear for an extended period of time until the rolling detection succeeds.
  Closes #5040
  Co-authored-by: Edward Welch <edward.welch@grafana.com>
* loki.source.kubernetes: support clustering
  Add support for loki.source.kubernetes to distribute targets using clustering.
  Closes #4502
  Co-authored-by: Edward Welch <edward.welch@grafana.com>
* loki.source.podlogs: support clustering
  Add support for loki.source.podlogs to distribute targets using clustering.
* service/cluster: add common block for clustering arguments
* remove irrelevant TODO comment #5623 (comment)
---------
Co-authored-by: Edward Welch <edward.welch@grafana.com>
Background
Over on the k8s monitoring helm repo, we found that loki.source.kubernetes stops tailing logs after log file rotation.
Proposal
Test again once the underlying issue in Kubernetes being tracked here is resolved: