Fix duplicate logs from docker containers #11563

Merged
merged 4 commits into from Jan 19, 2024
Conversation

Contributor

@ptqa ptqa commented Dec 29, 2023

This fixes various issues with docker_sd on promtail, mostly related to duplicate logs being sent to Loki from promtail.

The root cause is that the positions file is updated after the process function runs, but the in-memory field `since` of the `Target` struct is not updated.
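
To make the root cause above concrete, here is a minimal Go sketch. It is not the promtail code: the `entry` and `target` types, the `process` signature, and the `fixed` flag are hypothetical stand-ins, and it assumes that each run of the process loop reads logs starting from the in-memory `since` value.

```go
package main

import "fmt"

// entry is a hypothetical log line tagged with a Unix timestamp.
type entry struct {
	ts   int64
	line string
}

// target is a simplified, hypothetical stand-in for the docker Target struct.
type target struct {
	since     int64 // in-memory lower bound for the next log read
	positions int64 // stand-in for the value persisted in the positions file
}

// process forwards every entry newer than t.since and persists its timestamp.
// With fixed == false it mimics the bug: positions advances, t.since does not.
func (t *target) process(logs []entry, fixed bool) []string {
	from := t.since
	var sent []string
	for _, e := range logs {
		if e.ts <= from {
			continue // already shipped in an earlier run
		}
		sent = append(sent, e.line)
		t.positions = e.ts
		if fixed {
			t.since = e.ts // keep the in-memory field in sync
		}
	}
	return sent
}

// replay runs process twice over the same logs, as happens when a container restarts.
func replay(fixed bool) (first, second int) {
	logs := []entry{{1, "a"}, {2, "b"}}
	t := &target{}
	first = len(t.process(logs, fixed))
	second = len(t.process(logs, fixed))
	return first, second
}

func main() {
	for _, fixed := range []bool{false, true} {
		first, second := replay(fixed)
		fmt.Printf("fixed=%v first=%d second=%d\n", fixed, first, second)
	}
}
```

In this sketch the unfixed variant forwards both lines again on the second run, while the fixed variant forwards none, which is the behaviour this change is aiming for.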

Which issue(s) this PR fixes:
Fixes #7382 #7103 and some others

Special notes for your reviewer:

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • CHANGELOG.md updated
    • If the change is worth mentioning in the release notes, add add-to-release-notes label
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

@ptqa ptqa requested a review from a team as a code owner December 29, 2023 14:04

CLAassistant commented Dec 29, 2023

CLA assistant check
All committers have signed the CLA.

Contributor

Trivy scan found the following vulnerabilities:

  • HIGH, Target: docker.io/grafana/loki:main-76bf505 (alpine 3.18.4), Type: alpine openssl: Incorrect cipher key and IV length processing in libcrypto3 v3.1.3-r0. Fixed in v3.1.4-r0
  • HIGH, Target: docker.io/grafana/loki:main-76bf505 (alpine 3.18.4), Type: alpine openssl: Incorrect cipher key and IV length processing in libssl3 v3.1.3-r0. Fixed in v3.1.4-r0

To see more details on these vulnerabilities, and how/where to fix them, please run `docker build -t grafana/loki:main-76bf505 -f cmd/loki/Dockerfile .` and `trivy i grafana/loki:main-76bf505` on your branch. If these were not introduced by your PR, please consider fixing them in a subsequent PR. Thanks!


choooze commented Jan 9, 2024

bump

Contributor

cstyan commented Jan 10, 2024

A test should be added here to show the bug and that this is fixing it.

Contributor Author

ptqa commented Jan 17, 2024

@cstyan I would like to point out a few things:

  1. This bug was confirmed a year ago by one of the contributors (see Duplicate log entries for short-lived container #7382 (comment))
  2. Since we can only test Docker API interactions via a mock API, to reproduce this bug with the mock API I would need to:
    • fully reproduce the log endpoint's behavior with timestamp support
    • make the mock API dynamic, so that it simulates a container start/stop/start

This is non-trivial and requires a lot of work for a simple logic bug. It feels like too much for this.

Contributor

cstyan commented Jan 17, 2024

Given that it's been more than a year since someone on our team looked into it and we don't use promtail with the docker target ourselves (and actually aren't running promtail anymore; we run https://github.com/grafana/agent, so this fix is likely relevant to them as well), some kind of test to validate the behaviour feels like a reasonable ask.

On paper this does seem like the right fix; your explanation of the root cause makes sense. There's an existing test in target_test.go that creates an instance of the docker client, a positions file, and a set of fake data to send to the client via a mock entry handler. It seems like this could be extended to then stop the target's process loop, start it again, resend the same dataset, and ensure that we threw away all of those log lines because t.since was set properly. If you take a look at everything here and it still doesn't seem feasible to add a test then please let me know.
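
The test shape described above (run the loop, stop it, start it again with the same data, expect nothing new) could look roughly like the following Go sketch. Everything in it is a hypothetical stand-in: `logLine`, `mockHandler`, `fakeTarget`, and `duplicatesAfterRestart` are illustrative names, not the helpers that exist in target_test.go, and the docker client and positions file are left out entirely.

```go
package main

import "fmt"

// logLine is a hypothetical log record: a Unix timestamp plus the message.
type logLine struct {
	ts  int64
	msg string
}

// mockHandler is a hypothetical stand-in for a mock entry handler;
// it simply records every message it receives.
type mockHandler struct {
	received []string
}

func (h *mockHandler) handle(msg string) {
	h.received = append(h.received, msg)
}

// fakeTarget is a simplified stand-in for the docker target under test.
type fakeTarget struct {
	since   int64 // in-memory timestamp of the last forwarded line
	handler *mockHandler
}

// run starts one process loop over data in a goroutine and returns a
// function that blocks until that loop has stopped.
func (t *fakeTarget) run(data []logLine) (wait func()) {
	done := make(chan struct{})
	from := t.since
	go func() {
		defer close(done)
		for _, l := range data {
			if l.ts <= from {
				continue // older than since: thrown away
			}
			t.handler.handle(l.msg)
			t.since = l.ts // the behaviour the test wants to pin down
		}
	}()
	return func() { <-done }
}

// duplicatesAfterRestart runs the loop, waits for it to stop, restarts it
// with the same dataset, and reports how many extra lines were forwarded.
func duplicatesAfterRestart(data []logLine) int {
	h := &mockHandler{}
	t := &fakeTarget{handler: h}
	t.run(data)() // first run, then stop
	before := len(h.received)
	t.run(data)() // restart and resend the same data
	return len(h.received) - before
}

func main() {
	data := []logLine{{1, "a"}, {2, "b"}, {3, "c"}}
	fmt.Println("duplicates after restart:", duplicatesAfterRestart(data))
}
```

A real test would assert that the duplicate count is zero, and would be expected to fail if the `t.since = l.ts` line were removed.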

Contributor Author

ptqa commented Jan 19, 2024

@cstyan gotcha, your suggested approach is indeed simpler and doesn't require as much work. I've added tests for this bug, can you check?

Contributor

cstyan commented Jan 19, 2024

Very nice, confirmed the test fails without the t.since change and passes with it. Just waiting to confirm that the drone build is green 👍

Contributor

@cstyan cstyan left a comment

green, lgtm

Contributor

cstyan commented Jan 19, 2024

@ptqa not sure what the cause of the conflict is but since there isn't a file specified my first assumption is that it's within your fork somehow? I'd suggest merging grafana/loki:main into your lidofinance:main branch.

Contributor Author

ptqa commented Jan 19, 2024

@cstyan yeah, it's the CHANGELOG file; I've merged the main branch

@cstyan cstyan merged commit b4b0bd7 into grafana:main Jan 19, 2024
7 checks passed
Contributor

cstyan commented Jan 19, 2024

@ptqa awesome, thanks for your contribution and patience

Contributor Author

ptqa commented Jan 19, 2024

@cstyan thank you for the guidance and merging this PR

rhnasc pushed a commit to inloco/loki that referenced this pull request Apr 12, 2024
Successfully merging this pull request may close these issues.

Duplicate log entries for short-lived container
4 participants