Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Getting stuck on Journal Rotation #70

Closed
errm opened this issue Feb 9, 2017 · 8 comments
Closed

Getting stuck on Journal Rotation #70

errm opened this issue Feb 9, 2017 · 8 comments

Comments

@errm
Copy link
Contributor

errm commented Feb 9, 2017

Hi I am trying to debug some issues with my fluentd plugin: fluent-plugins-nursery/fluent-plugin-systemd#27

Essentially when the journal is rotated we stop receiving events... my first go at trying to fix this was to new up a Systemd::Journal every time #move_next returned false... this does not seem to work however.

I am not sure if this is something that should be handled by my plugin or Systemd::Journal, but I hoped you might have some insight that would help fix this?

@ledbettj
Copy link
Owner

ledbettj commented Feb 9, 2017

Hi! That's a good question. It seems like you can tell when an "invalidate" event occurs if you are using wait, which the sd_journal_get_events man page says is:

If SD_JOURNAL_INVALIDATE, journal files were added
or removed (possibly due to rotation). In the latter event, live-view
UIs should probably refresh their entire display, while in the case of
SD_JOURNAL_APPEND, it is sufficient to simply continue reading at the
previous end of the journal.

So it should be sufficient to just call sd_journal_open again (ie, create a new Journal instance) when invalidate returns. However, it also seems like:

my first go at trying to fix this was to new up a Systemd::Journal every time #move_next returned false

Should have solved it as well, although a little heavy-handed.

@ledbettj
Copy link
Owner

ledbettj commented Feb 9, 2017

Let me see if I can reproduce in testing and come up with anything concrete.

@ledbettj
Copy link
Owner

ledbettj commented Feb 9, 2017

I was able to reproduce the issue like so:

#!/usr/bin/env ruby

require 'systemd/journal'

j = Systemd::Journal.new
j.seek(:end)
j.move_previous

loop do
  j.wait 
  while j.move_next
    puts j.current_entry.inspect
  end
end
  • trigger an event that should display (e.g, sudo ls should log to journal).
  • sudo journalctl --rotate will then force a rotation and log it to the current journal.
  • subsequent writes to the journal will not be reported, e.g. sudo ls.

My initial attempt to solve this was the following:

#!/usr/bin/env ruby

require 'systemd/journal'

j = Systemd::Journal.new

j.seek(:end)
j.move_previous
cursor = j.cursor

loop do
  if j.wait == :invalidate
    puts 'received invalidate'
    j = Systemd::Journal.new
    puts "seek: #{j.seek(cursor)}"
    puts "move next: #{j.move_next}"
  end

  while j.move_next
    puts j.current_entry.inspect
    cursor = j.cursor
  end
end

However, it seems the very first time you call sd_journal_wait(0) it _always returns :invalidate. So this blew up pretty hard.

This eventually worked for me:

#!/usr/bin/env ruby

require 'systemd/journal'

j = Systemd::Journal.new

j.seek(:end)
j.move_previous
j.wait(0) # why do we need a wait here?
cursor = j.cursor

loop do
  if j.wait == :invalidate
    puts 'received invalidate'
    j = Systemd::Journal.new
    puts "fake wait returned #{j.wait(0)} " # why do we need a wait here?
    puts "seek: #{j.seek(cursor)}"
    puts "move next: #{j.move_next}"
  end

  while j.move_next
    puts j.current_entry.inspect
    cursor = j.cursor
  end
end

After forcing a rotation with sudo journalctl --rotate, the journal is re-opened (several times, strangely...) and new events are picked up.

So, this is a long winded way of saying, I'm not sure what the fix is here :) It might be worthwhile trying to get some information about why sd_journal_wait is always returning SD_JOURNAL_INVALIDATE on the first invocation, and what the proper response is to that event.

@errm
Copy link
Contributor Author

errm commented Feb 11, 2017

Thanks for your help... digging into the documentation but it is not entirely clear about when invalidate SD_JOURNAL_INVALIDATE is returned https://www.freedesktop.org/software/systemd/man/sd_journal_get_fd.html

I guess I am not that bothered, so long as the behaviour is something I know about...

I will do some more testing, but it is still strange that my heavy handed approach did not work...

@ledbettj
Copy link
Owner

ledbettj commented Feb 13, 2017

I browsed through the source a bit and found that it's expected that the very first call to sd_journal_wait after opening a new journal instance returns SD_JOURNAL_INVALIDATE. journalctl handles this by explicitly opening the fd first.

You can mimic this either by calling journal.wait(0) or journal.send(:file_descriptor) right after opening the journal (and ignoring the results).

FWIW I asked on IRC about the right behaviour when the journal is rotated, but didn't ever get an answer. I would be inclined to try something like this and see if it solves for you:

#!/usr/bin/env ruby

require 'systemd/journal'

j = Systemd::Journal.new
j.wait(0) # make sure initial call to wait doesn't return :invalidate

j.seek(:end)
j.move_previous
cursor = j.cursor

loop do
  r = j.wait
  puts '==================================================================='
  puts "got #{r}"
  puts '==================================================================='

  if r == :invalidate
    sleep(0.25) # without this we see several invalidates in a row.
                # prevent creating new instances unecessarily.
    
    j = Systemd::Journal.new
    j.wait(0) # make sure the next wait returns a useful result.
    j.seek(cursor)
    j.move_next
  end

  while j.move_next
    puts j.current_entry.inspect
    cursor = j.cursor
  end
end

I will look into adding functionality to handle this automatically since it seems like a useful usecase. If we're going to be creating new journal instances frequently, I'll probably also need to update the cleanup code to run immediately instead of relying on a finalizer.

errm added a commit to fluent-plugins-nursery/fluent-plugin-systemd that referenced this issue Feb 17, 2017
@portante
Copy link

@ledbettj, the above code does not seem right, given systemd/systemd#7998 (comment). It seems that as long as sd_journal_get_fd() is called immediately after calling sd_journal_open(), clients can ignore invalidate since the inotify FD is properly tracking the change. And re-opening the journal will likely cause you to miss journal entries, or re-read journal entries at times (we saw that with rsyslog), since the cursor state can become invalid between close and open.

If we instead, expose sd_journal_get_fd() for clients to call, or recommend they fetch the file_descriptor, or always call sd_journal_get_fd() after sd_journal_open(), we'd likely resolve these issues with clients.

errm added a commit to errm/systemd-journal that referenced this issue Jun 12, 2018
This ensures that inotify properly tracks journal rotation fixing ledbettj#70
and should also guard against the issues in systemd/systemd#7998 and rsyslog/rsyslog#2437
@errm
Copy link
Contributor Author

errm commented Jun 12, 2018

So in fluent-plugin-systemd where I am using the new version of systemd-journal that calls file_descriptor on open we no longer need to make the extra call to wait directly after opening...

We do however still need to reload on SD_JOURNAL_INVALIDATE ...

So it seems that this:

clients can ignore invalidate since the inotify FD is properly tracking the change

Is unfortunately not the case (unless I am missing something)

@portante
Copy link

Why do we still need to reload on invalidate? The C code in systemd/systemd#7998 (comment) seems to show that it is not needed, right?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants