logging: add comment about use of unconnected sockets #138

dfreese · 2022-12-23T05:21:20Z

This references some earlier discussion as to why connect is not used when setting up the socket used to write to the systemd journal. Looking at error reports, it seems as though most of the restarts to journald were due to watchdog timeouts when calling fsync under heavy I/O load. This appears to be fixed in v230, however, the same rationale still holds; defensively specify the address each time instead of relying on journald to stay running.

See: https://www.suse.com/support/kb/doc/?id=000019921

dfreese · 2022-12-23T05:34:32Z

Looks like connected_to_journal returns false on CI despite the socket being available. I've gone ahead and removed that check (and the nested Option it implied) for now.

swsnr · 2022-12-23T07:34:24Z

Are you looking for a Log implementation for the journal? In this case I've written one already: https://github.com/swsnr/systemd-journal-logger.rs

swsnr · 2022-12-23T08:10:29Z

@dfreese By your commit and your comment, I think you misunderstood what connected_to_journal does. You can't use this function to check that journald runs on the current system. To check that you'd just check that the socket exists, and–if you'd like to be extra sure–send an empty message over the socket (journald explicitly ignores this, so an empty message is the way to check that journald's listening on the other side).

Instead connected_to_journal checks whether the current process can safely upgrade its logging from unstructured text to stderr to structured messages for journald. A process can only do this of course, if its logging output ends up in the journal anyway, i.e. if the stderr of the process directly writes to the systemd journal (instead of, say, a tty or a log file), and that's what connected_to_journal checks. It's normally only ever true for processes directly spawned by systemd (or indirectly if they inherit their parents stderr).

That's a very specific use case, and shouldn't be done in general, because there are valid reasons to write to the journal in other circumstances as well: For instance, if a configuration file explicitly configures logging to journal, a process would not check connected_to_journal but would log to journald anyway.

Hence, adding a connected_to_journal to your new struct would have been quite wrong.

Edit: You can see how this works in our corresponding test. It explicitly starts the test executable under systemd-run to test the function.

swsnr

I'm unsure whether I like this change, to be honest.

It makes the internal code more complex, but doesn't add anything in return, in my opinion. new only checks whether we can create unbound datagram sockets. This call will realistically only ever fail if the process has run out of fds (which in and by itself is quite unlikely on modern systems), and it's moot to try and catch this error here, because FDs are a dynamic resource.

Effectively we don't even move the time of this check: The once_cell still exists, and we still initialize the socket on the first call to a public function, so we're not even handling this error early.

Since JournalWriter isn't pub, and application can't even use it directly if it wants to check logging early on.

The other new thing is that we can now create JournalWriters for different socket addresses, but then again we can't, because JournalWriter isn't exposed, and not used like this anywhere in the current module. I also don't see a use case to do this, not even for testing (we're trying to talk to journald, so the only way we can test it is with the real journald).

swsnr · 2022-12-23T08:14:19Z

src/logging.rs

@@ -14,10 +14,10 @@ use std::os::unix::net::UnixDatagram;
 use std::str::FromStr;

 /// Default path of the systemd-journald `AF_UNIX` datagram socket.
-pub static SD_JOURNAL_SOCK_PATH: &str = "/run/systemd/journal/socket";
+pub const SD_JOURNAL_SOCK_PATH: &str = "/run/systemd/journal/socket";


Why this change? Is that a good idea? As far as I understand const it gets inlined into callers, which I don't think is a good thing to do for global environment definitions (what if this path changes), and wouldn't that constitute an ABI break?

Basically it allows a broader set of usages and optimizations.

Rust doesn't have a stable ABI, but I don't think it would be considered an API break to change the value of a const variable. The API is that it's a const &str. You're guaranteeing that any value there will be created using a const value or function, which is a stronger guarantee than static. All that said, I've reverted this.

Ah, sorry you're right. So perhaps this is a good change; I believe I don't know Rust so well as to say. @lucab I think this is your call 🙂

lucab · 2022-12-23T08:48:51Z

I basically agree with @swsnr's review above, I wouldn't land this change.

This mostly touches on the same topic of #37. I'm still of the idea that a log interface should be built on top of this, not directly baked in. This is what @swsnr did, but we should probably tweak the docs to advertise this a bit more prominently.

dfreese

I conflated my common use-case, when it came to connected_to_journal, (i.e. running as a systemd unit) with how the library should work, sorry about that.

I've gone ahead and added a commit that better describes my final intention here, and I think why this is different that #37. My intention was for JournalWriter to eventually be public, but I held back on making an API change right away. I've just gone ahead and added what I'd expect to be public eventually.

@lucab, I agree with you that this should be a tool for people like @swsnr to build log adapters on top of it. However, one of the problems that I have with the current API is that no potential errors are checked until the first call to journal_send, which I feel like is a problem.

I feel like this also exists in https://github.com/swsnr/systemd-journal-logger.rs, which stores itself in pub static LOG: JournalLog. How I've designed similar loggers to that is using OnceCell and having the init functions call get_or_try_init with the equivalent of JournalWriter:: system_default. This allows programs to fail fast when there's an issue with logging, rather than waiting until the first log invocation to panic.

The goal with this is that users like systemd-journal-logger.rs would be able to call self.journal.send(...) instead of libsystemd::logging::journal_send(...).

dfreese · 2022-12-23T12:26:33Z

src/logging.rs

@@ -14,10 +14,10 @@ use std::os::unix::net::UnixDatagram;
 use std::str::FromStr;

 /// Default path of the systemd-journald `AF_UNIX` datagram socket.
-pub static SD_JOURNAL_SOCK_PATH: &str = "/run/systemd/journal/socket";
+pub const SD_JOURNAL_SOCK_PATH: &str = "/run/systemd/journal/socket";


Basically it allows a broader set of usages and optimizations.

Rust doesn't have a stable ABI, but I don't think it would be considered an API break to change the value of a const variable. The API is that it's a const &str. You're guaranteeing that any value there will be created using a const value or function, which is a stronger guarantee than static. All that said, I've reverted this.

swsnr · 2022-12-23T13:28:19Z

However, one of the problems that I have with the current API is that no potential errors are checked until the first call to journal_send, which I feel like is a problem.

I understand the idea but as said in my previous comment, I don't think these changes check any relevant errors upfront. As said, I believe all you'll ever see with your changes is a process having exhausted its file descriptors, but you'll likely not see this as early in a process as logging setup (which is likely the very first thing you'll do in main()), and then I'd say it's the applications fault for not having raised its FD soft limits early on.

Any "real" error (i.e. journald not running) will still not surface before the first call to journal_send, because even with these changes you still create an unbound socket.

And the unbound socket is by design to allow application logging survive journald crashes, and it's also how upstream does it, so binding early to see if journald is running is not an option, in my opinion.

You could just check that the socket exists early on, e.g. by sending an empty payload (that's what tracing-journald does), but I don't think it's worth a major refactoring, because it doesn't guarantee that journald is still running by the time you try to send something 🤷🏿

We already discussed to wrap the socket in a struct in #87, and didn't see a convincing use case back then. While I see that your change is kinda the "natural" way to design an API around the journald socket, I also believe that it's an API without a use case 🤷🏿

This references some earlier discussion as to why connect is not used when setting up the socket used to write to the systemd journal. Looking at error reports, it seems as though most of the restarts to journald were due to watchdog timeouts when calling fsync under heavy I/O load. This appears to be fixed in v230, however, the same rationale still holds; defensively specify the address each time instead of relying on journald to stay running. See: https://www.suse.com/support/kb/doc/?id=000019921

dfreese · 2022-12-23T18:54:37Z

@swsnr thanks for the context. I didn't realize that was the reason an unconnected socket was used. It does seem like something is being pessimized by not using a reconnect-on-failure approach, but it seems like most implementations disagree with me, and I haven't measured that.

Since it seems votes are typically against wrapping the socket in a struct I've updated this to just add a comment providing the context you provided here.

swsnr reviewed Dec 23, 2022

View reviewed changes

dfreese commented Dec 23, 2022

View reviewed changes

dfreese force-pushed the struct branch from eb7a6ae to ef81f83 Compare December 23, 2022 18:13

dfreese changed the title ~~logging: consolidate internal logic to use JournalWriter struct~~ logging: add comment about use of unconnected sockets Dec 23, 2022

dfreese closed this Jan 1, 2023

dfreese deleted the struct branch January 1, 2023 21:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

logging: add comment about use of unconnected sockets #138

logging: add comment about use of unconnected sockets #138

dfreese commented Dec 23, 2022 •

edited

Loading

dfreese commented Dec 23, 2022

swsnr commented Dec 23, 2022

swsnr commented Dec 23, 2022 •

edited

Loading

swsnr left a comment

swsnr Dec 23, 2022

dfreese Dec 23, 2022

swsnr Dec 23, 2022

lucab commented Dec 23, 2022

dfreese left a comment

dfreese Dec 23, 2022

swsnr commented Dec 23, 2022 •

edited

Loading

dfreese commented Dec 23, 2022

logging: add comment about use of unconnected sockets #138

logging: add comment about use of unconnected sockets #138

Conversation

dfreese commented Dec 23, 2022 • edited Loading

dfreese commented Dec 23, 2022

swsnr commented Dec 23, 2022

swsnr commented Dec 23, 2022 • edited Loading

swsnr left a comment

Choose a reason for hiding this comment

swsnr Dec 23, 2022

Choose a reason for hiding this comment

dfreese Dec 23, 2022

Choose a reason for hiding this comment

swsnr Dec 23, 2022

Choose a reason for hiding this comment

lucab commented Dec 23, 2022

dfreese left a comment

Choose a reason for hiding this comment

dfreese Dec 23, 2022

Choose a reason for hiding this comment

swsnr commented Dec 23, 2022 • edited Loading

dfreese commented Dec 23, 2022

dfreese commented Dec 23, 2022 •

edited

Loading

swsnr commented Dec 23, 2022 •

edited

Loading

swsnr commented Dec 23, 2022 •

edited

Loading