[loki.source.kubernetes] decrease logLevel of log msg #264

Closed
TheRealNoob opened this issue Jan 25, 2024 · 12 comments · Fixed by grafana/agent#6263
Labels: bug (Something isn't working), frozen-due-to-age

Comments

@TheRealNoob

What's wrong?

I am receiving the following log message multiple times a second. This seems to be expected behavior from this component; however, the quiet stretches it reacts to are also expected behavior from my pods (dozens of Ceph OSDs). I feel the appropriate solution here would be to decrease this log to level=debug, or alternatively to make it configurable somehow.

ts=2024-01-17T01:28:07.837938857Z level=info msg="have not seen a log line in 3x average time between lines, closing and re-opening tailer" target=ceph/rook-ceph-osd-16-6d87ddf895-kghz8:osd component=loki.source.kubernetes.allPods rolling_average=2s time_since_last=6.739761385s
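
For context, a minimal sketch of the restart heuristic the message describes (illustrative only, not the actual loki.source.kubernetes code; the function name is made up):

// Sketch of the heuristic implied by the log line: if no line has arrived
// in more than 3x the rolling average gap between lines, the tailer is
// closed and re-opened.
package main

import (
	"fmt"
	"time"
)

func shouldRestartTailer(rollingAverage, timeSinceLast time.Duration) bool {
	// Only meaningful once enough lines have been seen to form an average.
	if rollingAverage <= 0 {
		return false
	}
	return timeSinceLast > 3*rollingAverage
}

func main() {
	// Values taken from the log line above: 6.74s since the last line vs. a 2s rolling average.
	fmt.Println(shouldRestartTailer(2*time.Second, 6739761385*time.Nanosecond)) // true
}

With a 2s rolling average, any pod that goes quiet for more than about 6 seconds trips the restart, which would explain why bursty-then-quiet workloads like the Ceph OSDs hit it constantly.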

Steps to reproduce

NA

System information

No response

Software version

docker.io/grafana/agent:v0.39.0

Configuration

Running the agent via the Helm chart, as a DaemonSet in Flow mode

Logs

NA
@TheRealNoob added the bug label on Jan 25, 2024
@mattdurham
Collaborator

How many times is this being logged for how many targets over 15 minutes?

@TheRealNoob
Author

TheRealNoob commented Jan 26, 2024

The screenshot shows the results of this query, i.e. how many matching log lines per minute over a 3-hour time window. Note this is an empty cluster running just system components, so the workload is quite light at 176 running pods. I included the "tailer stopped; will retry" log message because it appears to be a reaction to the reported event (log message).

sum by(level) (count_over_time({namespace="monitoring", pod=~"grafana-agent.+"} | logfmt | msg =~ `(have not seen a log line in 3x average time between lines, closing and re-opening tailer|tailer stopped; will retry)` [1m]))

(Screenshot: per-minute count of matching log lines by level over the 3-hour window)

@TheRealNoob
Author

It looks like 55 targets are graphed in this 3-hour window, so almost 1/3 of my pods. That seems excessive.

On second thought, is this feature really that useful? Can it be disabled or configured?

@hainenber
Contributor

> It looks like 55 targets are graphed in this 3-hour window, so almost 1/3 of my pods. That seems excessive.
>
> On second thought, is this feature really that useful? Can it be disabled or configured?

If your K8s version is < v1.29.1, it's required. From that version onward it's not; see kubernetes/kubernetes#115702 and grafana/agent#5623.

I made a PR to address the issue. PTAL if you have time :D

@TheRealNoob
Author

Thank you for the explanation, that clarifies a lot. I will test it out today or tomorrow. The only thing I spotted in the PR is that the log message is still level=info. Is that intentional? Does it make sense to change it to debug?

@TheRealNoob
Author

TheRealNoob commented Jan 30, 2024

@hainenber I upgraded my cluster from 1.28.1 to 1.29.1 today and it still looks to be closing the connection after 3x the average duration. It seems to be doing it significantly less often, but that might be the pods; I don't think I had built the image and rolled it into my values file by the timestamp I'm seeing, so I will follow up on this part. On that topic, I couldn't find a pre-built image for your branch in the CI/CD, so I built one myself. Below are the steps I took; perhaps I did it wrong?

# Clone the PR branch and build the agent image locally
git clone git@github.com:hainenber/agent.git
cd agent
git checkout not-restart-tailers-for-k8s-v1.29.1+
DOCKER_BUILDKIT=1 docker build --file cmd/grafana-agent/Dockerfile -t <repo:tag> .

The second thing I noticed is that the kubernetes/kubernetes/pull/115702 PR looks to have been released in 1.29.0, not 1.29.1 (see the changelog and search for 115702).

@mattdurham
Collaborator

I agree this should be dropped down to debug.
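
For illustration, the requested change amounts to something like the following in go-kit/log terms (a sketch assuming the component logs via github.com/go-kit/log; the code below is illustrative, not the actual PR diff):

package main

import (
	"os"

	"github.com/go-kit/log"
	"github.com/go-kit/log/level"
)

func main() {
	logger := log.NewLogfmtLogger(os.Stderr)

	// Before: emitted at info, so it shows up with the default log level.
	level.Info(logger).Log("msg", "have not seen a log line in 3x average time between lines, closing and re-opening tailer")

	// After: demoted to debug, so it is only visible when debug logging is enabled.
	level.Debug(logger).Log("msg", "have not seen a log line in 3x average time between lines, closing and re-opening tailer")
}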

@hainenber
Contributor

@TheRealNoob @mattdurham thank you for the feedback! I've made the corrections accordingly.

Btw, re: building an Agent image, I'd suggest using make agent-image :D (at least that's what I've been using)

@TheRealNoob
Author

Thank you @hainenber. I rebuilt my image using your latest commit and it seems to work as expected. However, looking at the code, I think I see why it didn't work for me before (again, I'm running 1.29.1) and why it's still not quite right: this line checks whether the k8s version is less than or equal to 1.29. It should just be less than, since 1.29.0 is when the bug was fixed.
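
For illustration, a minimal sketch of the corrected gate (the helper name is hypothetical, not the code from the PR): the workaround should only apply when the cluster is strictly older than 1.29.0.

package main

import "fmt"

// needsTailerRestartWorkaround reports whether the rolling-average tailer
// restart should be applied: only on clusters strictly older than v1.29.0,
// since kubernetes/kubernetes#115702 shipped in 1.29.0.
func needsTailerRestartWorkaround(major, minor int) bool {
	// Strictly less than 1.29, not less than or equal to it.
	return major < 1 || (major == 1 && minor < 29)
}

func main() {
	fmt.Println(needsTailerRestartWorkaround(1, 28)) // true: workaround still needed
	fmt.Println(needsTailerRestartWorkaround(1, 29)) // false: upstream fix is in-tree
}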

A second small thing: the changelog and a few code comments need to be updated to reflect the above.

Thank you

@hainenber
Contributor

Thanks @TheRealNoob for the testing and findings! I've addressed all the items you found :D

Once again, thanks 🙏

github-actions (bot)

This issue has not had any activity in the past 30 days, so the needs-attention label has been added to it.
If the opened issue is a bug, check to see if a newer release fixed your issue. If it is no longer relevant, please feel free to close this issue.
The needs-attention label signals to maintainers that something has fallen through the cracks. No action is needed by you; your issue will be kept open and you do not have to respond to this comment. The label will be removed the next time this job runs if there is new activity.
Thank you for your contributions!

@rfratto
Member

rfratto commented Apr 11, 2024

Hi there 👋

On April 9, 2024, Grafana Labs announced Grafana Alloy, the spiritual successor to Grafana Agent and the final form of Grafana Agent Flow mode. As a result, Grafana Agent has been deprecated and will only receive bug and security fixes until its end-of-life around November 1, 2025.

To make things easier for maintainers, we're in the process of migrating all issues tagged variant/flow to the Grafana Alloy repository to have a single home for tracking issues. This issue is likely something we'll want to address in both Grafana Alloy and Grafana Agent, so just because it's being moved doesn't mean we won't address the issue in Grafana Agent :)

@rfratto transferred this issue from grafana/agent on Apr 11, 2024
The github-actions bot locked this issue as resolved and limited the conversation to collaborators on May 19, 2024