Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try to avoid getting stuck when draining the journal #298

Merged

Conversation

nalind
Copy link

@nalind nalind commented Feb 8, 2018

- What I did

Try to fix dockerd-keeps-deleted-journal-files-open-forever problems by keeping the reading goroutine from blocking indefinitely.

- How I did it

When draining the journal, if the output message channel is full, stop reading, at least for now. This keeps us from blocking indefinitely when a client hangs up on us and the channel is full. Doing this requires that we make multiple calls to drainJournal(), even in non-follow mode.

In non-follow mode, keep trying to read until we run out of entries or hit the "until" cutoff, which if not specified, we set to "now". If we don't do that, and we start reading but can't keep up with the journal,
we'll never catch up even after we get to journal messages which were logged long after we started looking for the then-present logs.

The journal API opens the journal files when the journal handle is opened by sd_journal_open(), and uses an inotify descriptor to notice when files have been added or removed, but it doesn't set up that
inotify descriptor until the first time that sd_journal_get_fd() is called (either by a client, or as part of sd_journal_wait()). We hadn't been doing that until our initial read-through of entries was done,
meaning that we've been missing file deletion events that occurred during our first pass at reading the journal. Make that window shorter.

Periodically call sd_journal_process() when reading the journal, to let it skip over and close handles to journal files which have been deleted since we started reading the journal, for cases where our keeping them open contributes to a shortage of disk space.

Treat SD_JOURNAL_INVALIDATE like SD_JOURNAL_APPEND, since sd_journal_process() prefers to return INVALIDATE over APPEND when both are applicable.

Clean up a deferred function call in the journal reading logic.

- How to verify it

One way to force a reasonably high turnover rate in the journal is to set RateLimitBurst=0 and SystemMaxFileSize=10M in /etc/systemd/journald.conf, and use the busybox image running seq with a large number (say, a million or ten million) as its argument.

Running docker logs -f should follow the container's output until the container exits or the daemon closes the logging handle internally. docker logs should follow along until the container exits or it hits the end of the log. After the container exits, either command should run to completion and not hang. If either command is paused at the client (using ctrl-s, waiting several seconds, then ctrl-q to unpause) while running, possibly with an intervening journalctl --vacuum-size1 pruning down the size of the journal, despite any gaps in the output, the ordering between messages received by the client should be preserved. If instead of unpausing the client, the client is stopped with ctrl-c, open descriptors that dockerd was using for reading the journal should be closed.

At pretty much any time, we shouldn't be holding open descriptors to deleted journal files for more than a fraction of a second.

- Description for the changelog

journald log reader: avoid blocking indefinitely / holding open deleted files

When draining the journal, if the output message channel is full, stop
reading.  In non-follow mode, keep reading until we start seeing entries
that post-date the time when we started reading the journal.  Otherwise,
if we can't keep up with the journal, we never actually stop reading.

Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
The journal API opens the journal files when the journal handle is
opened by sd_journal_open(), and uses an inotify descriptor to notice
when files have been removed, but it doesn't set up that descriptor
until the first time that sd_journal_get_fd() is called (either by a
client, or as part of sd_journal_wait()).

We hadn't been doing that until our initial read-through of entries was
done, meaning that we've been missing file deletion events that occurred
while we were reading the journal.  Make that window shorter.

Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
Periodically call sd_journal_process() when reading the journal, to skip
over journal files which have been deleted since we started reading the
journal, for cases where our keeping them open contributes to a shortage
of disk space.

Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
When checking for updates to the journal, treat INVALIDATE results like
we do APPEND results, because the library can't signal both, and in
either case we need to figure out what's going on.

Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
Clean up a deferred function call in the journal reading logic.

Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
When reading from the journal, handle seeking to the read location
outside of the main reading loop.

Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
If the output channel gets stalled while reading the journal in
non-follow mode, don't stop right away.  Doing this correctly means that
we need to detect when we've hit the end of the log, so start returning
a value indicating whether or not we ran out of entries to read.

Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
Fix a comment.

Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
Log an error if sd_journal_previous() returns an error code.  If it
returns 0, keep accepting that as an indication that we've hit the
beginning of the journal, and that there's nothing to be done.

Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
@rhatdan
Copy link
Member

rhatdan commented Feb 11, 2018

LGTM

@rhatdan rhatdan merged commit f43d177 into projectatomic:docker-1.13.1 Feb 11, 2018
@nalind nalind deleted the docker-1.13.1-journal-process branch August 23, 2019 16:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants