loki.source.kubernetes should handle log rotation #5040

Closed
skl opened this issue Aug 31, 2023 · 7 comments · Fixed by #5623
Labels
frozen-due-to-age Locked due to a period of inactivity. Please open new issues or PRs if more discussion is needed. proposal Proposal or RFC

Comments

@skl
Contributor

skl commented Aug 31, 2023

Background

Over on the k8s monitoring helm repo, we found that loki.source.kubernetes stops tailing logs after log file rotation.

Proposal

Test again once the underlying issue in Kubernetes being tracked here is resolved:

@captncraig
Contributor

Is there any potential for a workaround on this? My fear is that even if k8s makes a fix in a future version, this will still be a problem for years after that until all the cloud providers make that version GA, and everybody upgrades, which can be real slow.

@rfratto
Member

rfratto commented Aug 31, 2023

@slim-bean has a potential workaround involving detecting when logs drop off and force resetting the connection. It's nasty, but it works.

@n888

n888 commented Sep 11, 2023

Nice to hear there is a potential workaround. @slim-bean, would you be able to share what you have so far?

@sharovmerk

any news here?

@sharovmerk

btw. I see the same behaviour with loki.source.podlogs

@rfratto
Member

rfratto commented Oct 11, 2023

I don't have an exact timeline, but I've been told the code that was written to fix this should be shared within the next few weeks.

> btw. I see the same behaviour with loki.source.podlogs

loki.source.podlogs and loki.source.kubernetes share the same code for tailing logs, so the fix will resolve the behavior seen in both components.

@rfratto
Member

rfratto commented Oct 26, 2023

kubernetes/kubernetes#115702 got merged, which will fix the problem for us, but it's unclear how many versions of Kubernetes the fix will be backported to, and how quickly users will upgrade to get the fix.

In general, we'll still need a workaround for versions of Kubernetes that don't have the fix available.

rfratto added a commit to rfratto/agent that referenced this issue Oct 26, 2023
Versions of Kubernetes that do not contain kubernetes/kubernetes#115702
will fail to detect rolled log files, causing the API to stop sending
logs to the agent for processing.

To work around this, this commit introduces a rolling average calculator
to determine the average delta between log entries per target. If 3x the
normal delta time has elapsed since the last entry, the tailer is
restarted.

False positives here are acceptable, but false negatives mean that log
lines may not appear for an extended period of time until the rolling
detection succeeds.

Closes grafana#5040
rfratto added a commit to rfratto/agent that referenced this issue Oct 26, 2023
Versions of Kubernetes that do not contain kubernetes/kubernetes#115702
will fail to detect rolled log files, causing the API to stop sending
logs to the agent for processing.

To work around this, this commit introduces a rolling average calculator
to determine the average delta between log entries per target. If 3x the
normal delta time has elapsed since the last entry, the tailer is
restarted.

False positives here are acceptable, but false negatives mean that log
lines may not appear for an extended period of time until the rolling
detection succeeds.

Closes grafana#5040

Co-authored-by: Edward Welch <edward.welch@grafana.com>
rfratto added a commit that referenced this issue Oct 30, 2023
* component/prometheus: fix panic in interceptor when child isn't set

This commit fixes a panic in prometheus.Interceptor where an interceptor
which doesn't forward samples to another appendable panics when
appending data.

Co-authored-by: Edward Welch <edward.welch@grafana.com>

* loki.source.kubernetes: improve detection of rolled log files

Versions of Kubernetes that do not contain kubernetes/kubernetes#115702
will fail to detect rolled log files, causing the API to stop sending
logs to the agent for processing.

To work around this, this commit introduces a rolling average calculator
to determine the average delta between log entries per target. If 3x the
normal delta time has elapsed since the last entry, the tailer is
restarted.

False positives here are acceptable, but false negatives mean that log
lines may not appear for an extended period of time until the rolling
detection succeeds.

Closes #5040

Co-authored-by: Edward Welch <edward.welch@grafana.com>

* loki.source.kubernetes: support clustering

Add support for loki.source.kubernetes to distribute targets using
clustering.

Closes #4502

Co-authored-by: Edward Welch <edward.welch@grafana.com>

* loki.source.podlogs: support clustering

Add support for loki.source.podlogs to distribute targets using
clustering.

* service/cluster: add common block for clustering arguments

* remove irrelevant TODO comment

#5623 (comment)

---------

Co-authored-by: Edward Welch <edward.welch@grafana.com>
@github-actions github-actions bot added the frozen-due-to-age Locked due to a period of inactivity. Please open new issues or PRs if more discussion is needed. label Feb 21, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 21, 2024

6 participants