Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

logging: add comment about use of unconnected sockets #138

Closed
wants to merge 1 commit into from

Conversation

dfreese
Copy link
Contributor

@dfreese dfreese commented Dec 23, 2022

This references some earlier discussion as to why connect is not used when setting up the socket used to write to the systemd journal. Looking at error reports, it seems as though most of the restarts to journald were due to watchdog timeouts when calling fsync under heavy I/O load. This appears to be fixed in v230, however, the same rationale still holds; defensively specify the address each time instead of relying on journald to stay running.

See: https://www.suse.com/support/kb/doc/?id=000019921

@dfreese
Copy link
Contributor Author

dfreese commented Dec 23, 2022

Looks like connected_to_journal returns false on CI despite the socket being available. I've gone ahead and removed that check (and the nested Option it implied) for now.

@swsnr
Copy link
Collaborator

swsnr commented Dec 23, 2022

Are you looking for a Log implementation for the journal? In this case I've written one already: https://github.com/swsnr/systemd-journal-logger.rs

@swsnr
Copy link
Collaborator

swsnr commented Dec 23, 2022

@dfreese By your commit and your comment, I think you misunderstood what connected_to_journal does. You can't use this function to check that journald runs on the current system. To check that you'd just check that the socket exists, and–if you'd like to be extra sure–send an empty message over the socket (journald explicitly ignores this, so an empty message is the way to check that journald's listening on the other side).

Instead connected_to_journal checks whether the current process can safely upgrade its logging from unstructured text to stderr to structured messages for journald. A process can only do this of course, if its logging output ends up in the journal anyway, i.e. if the stderr of the process directly writes to the systemd journal (instead of, say, a tty or a log file), and that's what connected_to_journal checks. It's normally only ever true for processes directly spawned by systemd (or indirectly if they inherit their parents stderr).

That's a very specific use case, and shouldn't be done in general, because there are valid reasons to write to the journal in other circumstances as well: For instance, if a configuration file explicitly configures logging to journal, a process would not check connected_to_journal but would log to journald anyway.

Hence, adding a connected_to_journal to your new struct would have been quite wrong.

Edit: You can see how this works in our corresponding test. It explicitly starts the test executable under systemd-run to test the function.

Copy link
Collaborator

@swsnr swsnr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure whether I like this change, to be honest.

It makes the internal code more complex, but doesn't add anything in return, in my opinion. new only checks whether we can create unbound datagram sockets. This call will realistically only ever fail if the process has run out of fds (which in and by itself is quite unlikely on modern systems), and it's moot to try and catch this error here, because FDs are a dynamic resource.

Effectively we don't even move the time of this check: The once_cell still exists, and we still initialize the socket on the first call to a public function, so we're not even handling this error early.

Since JournalWriter isn't pub, and application can't even use it directly if it wants to check logging early on.

The other new thing is that we can now create JournalWriters for different socket addresses, but then again we can't, because JournalWriter isn't exposed, and not used like this anywhere in the current module. I also don't see a use case to do this, not even for testing (we're trying to talk to journald, so the only way we can test it is with the real journald).

src/logging.rs Outdated
@@ -14,10 +14,10 @@ use std::os::unix::net::UnixDatagram;
use std::str::FromStr;

/// Default path of the systemd-journald `AF_UNIX` datagram socket.
pub static SD_JOURNAL_SOCK_PATH: &str = "/run/systemd/journal/socket";
pub const SD_JOURNAL_SOCK_PATH: &str = "/run/systemd/journal/socket";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change? Is that a good idea? As far as I understand const it gets inlined into callers, which I don't think is a good thing to do for global environment definitions (what if this path changes), and wouldn't that constitute an ABI break?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically it allows a broader set of usages and optimizations.

Rust doesn't have a stable ABI, but I don't think it would be considered an API break to change the value of a const variable. The API is that it's a const &str. You're guaranteeing that any value there will be created using a const value or function, which is a stronger guarantee than static. All that said, I've reverted this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry you're right. So perhaps this is a good change; I believe I don't know Rust so well as to say. @lucab I think this is your call 🙂

@lucab
Copy link
Owner

lucab commented Dec 23, 2022

I basically agree with @swsnr's review above, I wouldn't land this change.

This mostly touches on the same topic of #37. I'm still of the idea that a log interface should be built on top of this, not directly baked in. This is what @swsnr did, but we should probably tweak the docs to advertise this a bit more prominently.

Copy link
Contributor Author

@dfreese dfreese left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I conflated my common use-case, when it came to connected_to_journal, (i.e. running as a systemd unit) with how the library should work, sorry about that.

I've gone ahead and added a commit that better describes my final intention here, and I think why this is different that #37. My intention was for JournalWriter to eventually be public, but I held back on making an API change right away. I've just gone ahead and added what I'd expect to be public eventually.

@lucab, I agree with you that this should be a tool for people like @swsnr to build log adapters on top of it. However, one of the problems that I have with the current API is that no potential errors are checked until the first call to journal_send, which I feel like is a problem.

I feel like this also exists in https://github.com/swsnr/systemd-journal-logger.rs, which stores itself in pub static LOG: JournalLog. How I've designed similar loggers to that is using OnceCell and having the init functions call get_or_try_init with the equivalent of JournalWriter:: system_default. This allows programs to fail fast when there's an issue with logging, rather than waiting until the first log invocation to panic.

The goal with this is that users like systemd-journal-logger.rs would be able to call self.journal.send(...) instead of libsystemd::logging::journal_send(...).

src/logging.rs Outdated
@@ -14,10 +14,10 @@ use std::os::unix::net::UnixDatagram;
use std::str::FromStr;

/// Default path of the systemd-journald `AF_UNIX` datagram socket.
pub static SD_JOURNAL_SOCK_PATH: &str = "/run/systemd/journal/socket";
pub const SD_JOURNAL_SOCK_PATH: &str = "/run/systemd/journal/socket";
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically it allows a broader set of usages and optimizations.

Rust doesn't have a stable ABI, but I don't think it would be considered an API break to change the value of a const variable. The API is that it's a const &str. You're guaranteeing that any value there will be created using a const value or function, which is a stronger guarantee than static. All that said, I've reverted this.

@swsnr
Copy link
Collaborator

swsnr commented Dec 23, 2022

However, one of the problems that I have with the current API is that no potential errors are checked until the first call to journal_send, which I feel like is a problem.

I understand the idea but as said in my previous comment, I don't think these changes check any relevant errors upfront. As said, I believe all you'll ever see with your changes is a process having exhausted its file descriptors, but you'll likely not see this as early in a process as logging setup (which is likely the very first thing you'll do in main()), and then I'd say it's the applications fault for not having raised its FD soft limits early on.

Any "real" error (i.e. journald not running) will still not surface before the first call to journal_send, because even with these changes you still create an unbound socket.

And the unbound socket is by design to allow application logging survive journald crashes, and it's also how upstream does it, so binding early to see if journald is running is not an option, in my opinion.

You could just check that the socket exists early on, e.g. by sending an empty payload (that's what tracing-journald does), but I don't think it's worth a major refactoring, because it doesn't guarantee that journald is still running by the time you try to send something 🤷🏿

We already discussed to wrap the socket in a struct in #87, and didn't see a convincing use case back then. While I see that your change is kinda the "natural" way to design an API around the journald socket, I also believe that it's an API without a use case 🤷🏿

This references some earlier discussion as to why connect is not used
when setting up the socket used to write to the systemd journal.
Looking at error reports, it seems as though most of the restarts to
journald were due to watchdog timeouts when calling fsync under heavy
I/O load.  This appears to be fixed in v230, however, the same rationale
still holds; defensively specify the address each time instead of
relying on journald to stay running.

See: https://www.suse.com/support/kb/doc/?id=000019921
@dfreese dfreese changed the title logging: consolidate internal logic to use JournalWriter struct logging: add comment about use of unconnected sockets Dec 23, 2022
@dfreese
Copy link
Contributor Author

dfreese commented Dec 23, 2022

@swsnr thanks for the context. I didn't realize that was the reason an unconnected socket was used. It does seem like something is being pessimized by not using a reconnect-on-failure approach, but it seems like most implementations disagree with me, and I haven't measured that.

Since it seems votes are typically against wrapping the socket in a struct I've updated this to just add a comment providing the context you provided here.

@dfreese dfreese closed this Jan 1, 2023
@dfreese dfreese deleted the struct branch January 1, 2023 21:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants