New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[25.0 backport] logger/journald: fix tailing logs with systemd 255 #47243
Merged
thaJeztah
merged 10 commits into
moby:25.0
from
corhere:backport-25.0/fix-journald-logs-systemd-255
Feb 3, 2024
Merged
[25.0 backport] logger/journald: fix tailing logs with systemd 255 #47243
thaJeztah
merged 10 commits into
moby:25.0
from
corhere:backport-25.0/fix-journald-logs-systemd-255
Feb 3, 2024
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Following logs with a non-negative tail when the container log is empty is broken on the journald driver when used with systemd 255. Add tests which cover this edge case to our loggertest suite. Signed-off-by: Cory Snider <csnider@mirantis.com> (cherry picked from commit 9315680) Signed-off-by: Cory Snider <csnider@mirantis.com>
Synthesize a boot ID for journal entries fed into systemd-journal-remote, as required by systemd 255. Signed-off-by: Cory Snider <csnider@mirantis.com> (cherry picked from commit 71bfffd) Signed-off-by: Cory Snider <csnider@mirantis.com>
The journald reader uses a timer to set an upper bound on how long to wait for the final log message of a stopped container. However, the timer channel is only received from in non-blocking select statements! There isn't enough benefit of using a timer to offset the cost of having to manage the timer resource. Setting a deadline and comparing the current time is just as effective, without having to manage the lifecycle of any runtime resources. Signed-off-by: Cory Snider <csnider@mirantis.com> (cherry picked from commit e94ec80) Signed-off-by: Cory Snider <csnider@mirantis.com>
While it doesn't really matter if the reader waits for an extra arbitrary period beyond an arbitrary hardcoded timeout, it's also trivial and cheap to implement, and nice to have. Signed-off-by: Cory Snider <csnider@mirantis.com> (cherry picked from commit d70fe88) Signed-off-by: Cory Snider <csnider@mirantis.com>
errDrainDone is a sentinel error which is never supposed to escape the package. Consequently, it needs to be filtered out of returns all over the place, adding boilerplate. Forgetting to filter out these errors would be a logic bug which the compiler would not help us catch. Replace it with boolean multi-valued returns as they can't be accidentally ignored or propagated. Signed-off-by: Cory Snider <csnider@mirantis.com> (cherry picked from commit 905477c) Signed-off-by: Cory Snider <csnider@mirantis.com>
corhere
added
area/logging
area/systemd
impact/changelog
kind/bugfix
PR's that fix bugs
labels
Jan 29, 2024
vvoland
approved these changes
Jan 29, 2024
cpuguy83
approved these changes
Jan 29, 2024
Moving this to draft, as there's a follow-up that should be included with this; |
The Go race detector was detecting a data race when running the TestLogRead/Follow/Concurrent test against the journald logging driver. The race was in the test harness, specifically syncLogger. The waitOn field would be reassigned each time a log entry is sent to the journal, which is not concurrency-safe. Make it concurrency-safe using the same patterns that are used in the log follower implementation to synchronize with the logger. Signed-off-by: Cory Snider <csnider@mirantis.com> (cherry picked from commit 982e777) Signed-off-by: Cory Snider <csnider@mirantis.com>
Writing the systemd-journal-remote command output directly to os.Stdout and os.Stderr makes it nearly impossible to tell which test case the output is related to when the tests are not run in verbose mode. Extend the journald sender fake to redirect output to the test log so they interleave with the rest of the test output. Signed-off-by: Cory Snider <csnider@mirantis.com> (cherry picked from commit 5792bf7) Signed-off-by: Cory Snider <csnider@mirantis.com>
- Check the return value when logging messages - Log the stream (stdout/stderr) and list of messages that were not read - Wait until the logger is closed before returning early (panic/fatal) Signed-off-by: Cory Snider <csnider@mirantis.com> (cherry picked from commit 39c5c16) Signed-off-by: Cory Snider <csnider@mirantis.com>
The journald reader test harness injects an artificial asynchronous delay into the logging pipeline: a logged message won't be written to the journal until at least 150ms after the Log() call returns. If a test returns while log messages are still in flight to be written, the logs may attempt to be written after the TempDir has been cleaned up, leading to spurious errors. The logger read tests which interleave writing and reading have to include explicit synchronization points to work reliably with this delay in place. On the other hand, tests should not be required to sync the logger explicitly before returning. Override the Close() method in the test harness wrapper to wait for in-flight logs to be flushed to disk. Signed-off-by: Cory Snider <csnider@mirantis.com> (cherry picked from commit d53b7d7) Signed-off-by: Cory Snider <csnider@mirantis.com>
If a reader has caught up to the logger and is waiting for the next message, it should stop waiting when the logger is closed. Otherwise the reader will unnecessarily wait the full closedDrainTimeout for no log messages to arrive. This case was overlooked when the journald reader was recently overhauled to be compatible with systemd 255, and the reader tests only failed when a logical race happened to settle in such a way to exercise the bugged code path. It was only after implicit flushing on close was added to the journald test harness that the Follow tests would repeatably fail due to this bug. (No new regression tests are needed.) Signed-off-by: Cory Snider <csnider@mirantis.com> (cherry picked from commit 987fe37) Signed-off-by: Cory Snider <csnider@mirantis.com>
thaJeztah
approved these changes
Feb 3, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
- What I did
- How I did it
By fixing the journald reader to no longer rely upon an invalid assumption about the
sd-journal
reader API. See the individual commit messages for more details.- How to verify it
The journald reader tests pass in the dev container / CI (systemd 253) and in an archlinux container with systemd 255 installed.
- Description for the changelog
- A picture of a cute animal (not mandatory but encouraged)