[loki.source.kubernetes] decrease logLevel of log msg #264

Closed
TheRealNoob opened this issue Jan 25, 2024 · 12 comments · Fixed by grafana/agent#6263
Labels: bug (Something isn't working), frozen-due-to-age

Comments

@TheRealNoob

What's wrong?

I am receiving the following log message multiple times a second. This seems to be expected behavior from this component; however, the quiet stretches it reacts to are also expected behavior from my pods (dozens of Ceph OSDs). I feel the appropriate solution here would be to decrease this log to level=debug, or alternatively to make it configurable somehow.

ts=2024-01-17T01:28:07.837938857Z level=info msg="have not seen a log line in 3x average time between lines, closing and re-opening tailer" target=ceph/rook-ceph-osd-16-6d87ddf895-kghz8:osd component=loki.source.kubernetes.allPods rolling_average=2s time_since_last=6.739761385s
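
For context, a minimal sketch of the restart heuristic the message describes (illustrative only, not the actual loki.source.kubernetes code; the function name is made up):

// Sketch of the heuristic implied by the log line: if no line has arrived
// in more than 3x the rolling average gap between lines, the tailer is
// closed and re-opened.
package main

import (
	"fmt"
	"time"
)

func shouldRestartTailer(rollingAverage, timeSinceLast time.Duration) bool {
	// Only meaningful once enough lines have been seen to form an average.
	if rollingAverage <= 0 {
		return false
	}
	return timeSinceLast > 3*rollingAverage
}

func main() {
	// Values taken from the log line above: 6.74s since the last line vs. a 2s rolling average.
	fmt.Println(shouldRestartTailer(2*time.Second, 6739761385*time.Nanosecond)) // true
}

With a 2s rolling average, any pod that goes quiet for more than about 6 seconds trips the restart, which would explain why bursty-then-quiet workloads like the Ceph OSDs hit it constantly.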

Steps to reproduce

NA

System information

No response

Software version

docker.io/grafana/agent:v0.39.0

Configuration

Running the agent via the Helm chart, as a DaemonSet in Flow mode

Logs

NA
@TheRealNoob added the bug label on Jan 25, 2024
@mattdurham
Collaborator

How many times is this being logged for how many targets over 15 minutes?

@TheRealNoob
Author

TheRealNoob commented Jan 26, 2024

The screenshot shows the results of this query, i.e. how many matching log lines per minute over a 3-hour time window. Note this is an empty cluster running just system components, so the workload is quite light at 176 running pods. I included the "tailer stopped; will retry" log message because it appears to be a reaction to the reported event (log message).

sum by(level) (count_over_time({namespace="monitoring", pod=~"grafana-agent.+"} | logfmt | msg =~ `(have not seen a log line in 3x average time between lines, closing and re-opening tailer|tailer stopped; will retry)` [1m]))

(Screenshot: per-minute count of matching log lines by level over the 3-hour window)

@TheRealNoob
Author

It looks like 55 targets are graphed in this 3-hour window, so almost 1/3 of my pods. That seems excessive.

On second thought, is this feature really that useful? Can it be disabled or configured?

@hainenber
Contributor

> It looks like 55 targets are graphed in this 3-hour window, so almost 1/3 of my pods. That seems excessive.
>
> On second thought, is this feature really that useful? Can it be disabled or configured?

If your K8s version is < v1.29.1, it's required. From that version onward it's not; see kubernetes/kubernetes#115702 and grafana/agent#5623.

I made a PR to address the issue. PTAL if you have time :D

@TheRealNoob
Author

Thank you for the explanation, that clarifies a lot. I will test it out today or tomorrow. The only thing I spotted in the PR is that the log message is still level=info. Is that intentional? Does it make sense to change it to debug?

@TheRealNoob
Author

TheRealNoob commented Jan 30, 2024

@hainenber I upgraded my cluster from 1.28.1 to 1.29.1 today and it still looks to be closing the connection after 3x the average duration. It seems to be doing it significantly less often, but that might be the pods; I don't think I had built the image and rolled it into my values file by the timestamp I'm seeing, so I will follow up on this part. On that topic, I couldn't find a pre-built image for your branch in the CI/CD, so I built one myself. Below are the steps I took; perhaps I did it wrong?

# Clone the PR branch and build the agent image locally
git clone git@github.com:hainenber/agent.git
cd agent
git checkout not-restart-tailers-for-k8s-v1.29.1+
DOCKER_BUILDKIT=1 docker build --file cmd/grafana-agent/Dockerfile -t <repo:tag> .

The second thing I noticed is that the kubernetes/kubernetes/pull/115702 PR looks to have been released in 1.29.0, not 1.29.1 (see the changelog and search for 115702).

@mattdurham
Collaborator

I agree this should be dropped down to debug.
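
For illustration, the requested change amounts to something like the following in go-kit/log terms (a sketch assuming the component logs via github.com/go-kit/log; the code below is illustrative, not the actual PR diff):

package main

import (
	"os"

	"github.com/go-kit/log"
	"github.com/go-kit/log/level"
)

func main() {
	logger := log.NewLogfmtLogger(os.Stderr)

	// Before: emitted at info, so it shows up with the default log level.
	level.Info(logger).Log("msg", "have not seen a log line in 3x average time between lines, closing and re-opening tailer")

	// After: demoted to debug, so it is only visible when debug logging is enabled.
	level.Debug(logger).Log("msg", "have not seen a log line in 3x average time between lines, closing and re-opening tailer")
}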

@hainenber
Contributor

@TheRealNoob @mattdurham thank you for the feedback! I've made the corrections accordingly.

Btw, re: building an Agent image, I'd suggest using make agent-image :D (at least that's what I've been using)

@TheRealNoob
Author

Thank you @hainenber. I rebuilt my image using your latest commit and it seems to work as expected. However, looking at the code, I think I see why it didn't work for me before (again, I'm running 1.29.1) and why it's still not quite right: this line checks whether the k8s version is less than or equal to 1.29. It should just be less than, since 1.29.0 is when the bug was fixed.
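
For illustration, a minimal sketch of the corrected gate (the helper name is hypothetical, not the code from the PR): the workaround should only apply when the cluster is strictly older than 1.29.0.

package main

import "fmt"

// needsTailerRestartWorkaround reports whether the rolling-average tailer
// restart should be applied: only on clusters strictly older than v1.29.0,
// since kubernetes/kubernetes#115702 shipped in 1.29.0.
func needsTailerRestartWorkaround(major, minor int) bool {
	// Strictly less than 1.29, not less than or equal to it.
	return major < 1 || (major == 1 && minor < 29)
}

func main() {
	fmt.Println(needsTailerRestartWorkaround(1, 28)) // true: workaround still needed
	fmt.Println(needsTailerRestartWorkaround(1, 29)) // false: upstream fix is in-tree
}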

A second small thing: the changelog and a few code comments need to be updated to reflect the above.

Thank you

@hainenber
Contributor

Thanks @TheRealNoob for the testing and findings! I've addressed all the items you found :D

Once again, thanks 🙏

github-actions (bot)

This issue has not had any activity in the past 30 days, so the needs-attention label has been added to it.
If the opened issue is a bug, check to see if a newer release fixed your issue. If it is no longer relevant, please feel free to close this issue.
The needs-attention label signals to maintainers that something has fallen through the cracks. No action is needed by you; your issue will be kept open and you do not have to respond to this comment. The label will be removed the next time this job runs if there is new activity.
Thank you for your contributions!

@rfratto
Member

rfratto commented Apr 11, 2024

Hi there 👋

On April 9, 2024, Grafana Labs announced Grafana Alloy, the spiritual successor to Grafana Agent and the final form of Grafana Agent Flow mode. As a result, Grafana Agent has been deprecated and will only receive bug and security fixes until its end-of-life around November 1, 2025.

To make things easier for maintainers, we're in the process of migrating all issues tagged variant/flow to the Grafana Alloy repository to have a single home for tracking issues. This issue is likely something we'll want to address in both Grafana Alloy and Grafana Agent, so just because it's being moved doesn't mean we won't address the issue in Grafana Agent :)

@rfratto transferred this issue from grafana/agent on Apr 11, 2024
The github-actions bot locked this issue as resolved and limited the conversation to collaborators on May 19, 2024