man pages, journald: document the relationship between journald restarts, stdout/stderr log stream termination and SIGPIPE #6620

AndriiNikitin · 2017-08-15T16:37:46Z

Submission type

Bug report

systemd version the issue has been seen with

systemd-233-6.fc26.x86_64

Used distribution

Fedora 26 (Server Edition)

In case of bug report: Expected behaviour you didn't see

my service's messages in journal

In case of bug report: Unexpected behaviour you saw

In case of bug report: Steps to reproduce the problem

Issue is similar to https://bugs.freedesktop.org/show_bug.cgi?id=84923

$ cat /a.sh
#!/bin/bash
while true ; do sleep 5; echo still alive ; done
$ cat /etc/systemd/system/a.service
[Unit]
Description=stillalive

[Service]
User=myuser
ExecStart=/a.sh

[Install]
WantedBy=multi-user.target

So I configured simple service above, started it as below and see expected messages every 5 sec in journal:

$ sudo systemctl start a
$ sudo systemctl status a
● a.service - stillalive
   Loaded: loaded (/etc/systemd/system/a.service; disabled; vendor preset: disabled)
   Active: active (running) since Tue 2017-08-15 16:16:01 UTC; 19s ago
 Main PID: 1391 (a.sh)
    Tasks: 2 (limit: 4915)
   CGroup: /system.slice/a.service
           ├─1391 /bin/bash /a.sh
           └─1404 sleep 5

Aug 15 16:16:01 fedora26-amd64 systemd[1]: Started stillalive.
Aug 15 16:16:06 fedora26-amd64 a.sh[1391]: still alive
Aug 15 16:16:11 fedora26-amd64 a.sh[1391]: still alive
Aug 15 16:16:16 fedora26-amd64 a.sh[1391]: still alive

Now I restart journal and new messages are not arriving to journal until my custom service is restarted again:

$ sudo systemctl stop systemd-journald
Warning: Stopping systemd-journald.service, but it can still be activated by:
  systemd-journald.socket
  systemd-journald-dev-log.socket
  systemd-journald-audit.socket
$ sudo systemctl status a
● a.service - stillalive
   Loaded: loaded (/etc/systemd/system/a.service; disabled; vendor preset: disabled)
   Active: active (running) since Tue 2017-08-15 16:16:01 UTC; 5min ago
 Main PID: 1391 (a.sh)
    Tasks: 2 (limit: 4915)
   CGroup: /system.slice/a.service
           ├─1391 /bin/bash /a.sh
           └─1498 sleep 5

Aug 15 16:20:21 fedora26-amd64 a.sh[1391]: still alive
Aug 15 16:20:26 fedora26-amd64 a.sh[1391]: still alive
Aug 15 16:20:31 fedora26-amd64 a.sh[1391]: still alive
Aug 15 16:20:36 fedora26-amd64 a.sh[1391]: still alive
Aug 15 16:20:41 fedora26-amd64 a.sh[1391]: still alive
Aug 15 16:20:46 fedora26-amd64 a.sh[1391]: still alive
Aug 15 16:20:51 fedora26-amd64 a.sh[1391]: still alive
Aug 15 16:20:56 fedora26-amd64 a.sh[1391]: still alive
Aug 15 16:21:01 fedora26-amd64 a.sh[1391]: still alive
Aug 15 16:21:06 fedora26-amd64 a.sh[1391]: still alive
$ sudo systemctl status a
● a.service - stillalive
   Loaded: loaded (/etc/systemd/system/a.service; disabled; vendor preset: disabled)
   Active: active (running) since Tue 2017-08-15 16:16:01 UTC; 7min ago
 Main PID: 1391 (a.sh)
    Tasks: 2 (limit: 4915)
   CGroup: /system.slice/a.service
           ├─1391 /bin/bash /a.sh
           └─1528 sleep 5

Aug 15 16:20:21 fedora26-amd64 a.sh[1391]: still alive
Aug 15 16:20:26 fedora26-amd64 a.sh[1391]: still alive
Aug 15 16:20:31 fedora26-amd64 a.sh[1391]: still alive
Aug 15 16:20:36 fedora26-amd64 a.sh[1391]: still alive
Aug 15 16:20:41 fedora26-amd64 a.sh[1391]: still alive
Aug 15 16:20:46 fedora26-amd64 a.sh[1391]: still alive
Aug 15 16:20:51 fedora26-amd64 a.sh[1391]: still alive
Aug 15 16:20:56 fedora26-amd64 a.sh[1391]: still alive
Aug 15 16:21:01 fedora26-amd64 a.sh[1391]: still alive
Aug 15 16:21:06 fedora26-amd64 a.sh[1391]: still alive

The text was updated successfully, but these errors were encountered:

evverx · 2017-08-15T18:10:44Z

Could you clarify why systemctl stop is being used instead of systemctl restart to restart journald ?

AndriiNikitin · 2017-08-15T19:53:43Z

Since the service was automatically started anyway - I didn't see much difference. Please provide confirmation if I need to confirm the testcase with 'restart' instead of 'stop' and/or if you believe it is user error to use 'stop' here

evverx · 2017-08-15T21:04:47Z

journald stores its file descriptors in systemd and systemd closes all of them when systemctl stop is used to stop the daemon. That is briefly mentioned in the documentation:

systemd/man/systemd.service.xml

Lines 862 to 866 in 43ee6a8

    
           descriptor. Defaults to 0, i.e. no file descriptors may be stored in the service manager. All file descriptors 
        
           passed to the service manager from a specific service are passed back to the service's main process on the next 
        
           service restart. Any file descriptors passed to the service manager are automatically closed when 
        
           <constant>POLLHUP</constant> or <constant>POLLERR</constant> is seen on them, or when the service is fully 
        
           stopped and no job is queued or being executed for it.</para></listitem>

I don't think you need to check whether systemctl restart works properly, because the CI regularly runs the following tests:

systemd/test/TEST-04-JOURNAL/test-journal.sh

Lines 54 to 68 in 43ee6a8

    
           # Don't lose streams on restart 
        
           systemctl start forever-print-hola 
        
           sleep 3 
        
           systemctl restart systemd-journald 
        
           sleep 3 
        
           systemctl stop forever-print-hola 
        
           [[ ! -f "/i-lose-my-logs" ]] 
        
           # https://github.com/systemd/systemd/issues/4408 
        
           rm -f /i-lose-my-logs 
        
           systemctl start forever-print-hola 
        
           sleep 3 
        
           systemctl kill --signal=SIGKILL systemd-journald 
        
           sleep 3 
        
           [[ ! -f "/i-lose-my-logs" ]]

AndriiNikitin · 2017-08-16T07:37:06Z

Thank you for clarification - I should agree that from abstract systemd point of view everything looks good. And I just verified for myself that 'restart' shows correct behavior and logging continues properly.

But conceptually from OS point of view: is it expected that services will be muted if journalctl is stopped by human mistake, by some script bug or intentional attack? Do you see any security issue here? Any advice except 'dont stop it'? (Because e.g. I don't always control everything but am always interested in reliable logging).

evverx · 2017-08-16T09:38:07Z

Everything might stop working suddenly, so to minimise the risk of data loss it probably makes sense to try sending messages to many different places such as syslog, journal, a bunch of remote servers and local disks. In addition, loggers probably shouldn't ruin everything when they fail to write something to wherever they write, but it depends on what is logged of course. I've read https://jira.mariadb.org/browse/MDEV-13517 and I'm not sure that I understand why sending an informational message to stderr is such a critical thing that failing to do that stops everything.

evverx · 2017-08-16T10:07:17Z

On the other hand, in my opinion, a service with StandardError=syslog should be connected to /dev/log directly, because it seems that destroying /dev/log by mistake is much harder than stopping journald accidentally.

evverx · 2017-08-16T12:55:55Z

Unfortunately, by design it doesn't seem to be possible to connect a service directly to /dev/log without breaking too many things, so, on second thought, I think that adding RefuseManualStop=yes to systemd-journald.service should be good enough. The downsides of stopping systemd-journald should probably be documented as well.

evverx · 2017-08-16T13:14:46Z

RefuseManualStop=yes also prevents systemctl restart, so it is not as good as I thought. I'm out of ideas.

moschlar · 2017-08-16T13:19:16Z

Thank you both for properly investigating this issue!

zdzichu · 2017-08-16T17:40:04Z

So restart no longer equals stop+start? Sad.

evverx · 2017-08-17T15:16:54Z

It is even sadder that the man page states the following

systemd/man/systemctl.xml

Lines 779 to 785 in 43ee6a8

    
           <term><command>restart <replaceable>PATTERN</replaceable>…</command></term> 
        
           <listitem> 
        
             <para>Stop and then start one or more units specified on the 
        
             command line. If the units are not running yet, they will 
        
             be started.</para> 
        
           </listitem>

Talking of stopping services, maybe, something like AllowManualRestarts= could be implemented so that any service containing RefuseManualStop=yes and AllowManualRestart=yes cannot be stopped but can be restarted. I'm not sure that I like that idea though.

AndriiNikitin · 2017-08-18T13:46:46Z

I can think about new parameter StopIsRestart= , there yes would indicate that service cannot be manually stopped, but only restarted.

evverx · 2017-08-18T21:42:45Z

By the way, even if systemd refuses to stop journald, it will be possible for any service with Standard[Output|Error] to get EPIPE, so it would probably make sense to keep the workaround mentioned in https://jira.mariadb.org/browse/MDEV-13517 until EPIPE is ignored by sst. Also, calling syslog or/and anything else that doesn't rely on the streams of journald might make logging a bit more robust.

poettering · 2017-08-30T09:54:23Z

But conceptually from OS point of view: is it expected that services will be muted if journalctl is stopped by human mistake, by some script bug or intentional attack? Do you see any security issue here? Any advice except 'dont stop it'? (Because e.g. I don't always control everything but am always interested in reliable logging).

Well, if you shut down the logging service fully, then all ongoing connections to it will be severed, that's expected and I don't think too surprising.

Restarts are handled in a special way though: journald stores all fds of ongoing connections in PID 1, and they aren't flushed out unless a unit is fully stopped and no jobs are queued anymore for it. As during a restart that's never the case they aren't lost across a restart.

But yeah, this needs better documentation, and we probably should add a restart mode which will flush out all saved fds before starting again.

poettering · 2017-08-30T09:57:47Z

I filed #6697 as an RFE issue asking for a way to restart a service with flushing of the fdstore.

poettering · 2017-08-30T10:00:57Z

On the other hand, in my opinion, a service with StandardError=syslog should be connected to /dev/log directly, because it seems that destroying /dev/log by mistake is much harder than stopping journald accidentally.

/dev/log expects complete syslog datagrams. Applications otoh usually expect something free-form character based as stderr. i.e. lines written to stderr are usually not prefixed by syslog-compatibile date/hostname/priority/facility and so on, and might appear piecemeal, while syslog only accepts full records in each datagram, hence it's not possible connect this directly.

Moreover, if the server side fd of /dev/log is closed, then all clients that used connect() on it earlier will also lose their communication for good, and if /dev/log is restored later on the old inode cannot be reused, a new one is created, which means there's really no improvement in this regard over the status quo with /dev/log

poettering · 2017-08-30T10:05:31Z

I think that adding RefuseManualStop=yes to systemd-journald.service should be good enough.

I am not convinced this would be a good idea. I think if people in emergency mode (or something similarly minimal) should have every right to stop and start journald at any point in time they like, prohibiting that appears to heavy weight.

In general, the same way as people probably generally understand that terminating httpd severs all ongoing HTTP downloads it's not too hard to grok that terminating the logging service severs all ongoing stdout/stderr log streams, and is hence not only expected but probably even the desired effect in this case.

The downsides of stopping systemd-journald should probably be documented as well.

Yes, definitely, we should add a comment about that somewhree. In fact I think we should turn this bug into a documentation bug about this I think.

poettering · 2017-08-30T10:09:05Z

By the way, even if systemd refuses to stop journald, it will be possible for any service with Standard[Output|Error] to get EPIPE, so it would probably make sense to keep the workaround mentioned in https://jira.mariadb.org/browse/MDEV-13517 until EPIPE is ignored by sst. Also, calling syslog or/and anything else that doesn't rely on the streams of journald might make logging a bit more robust.

Note that systemd will by default disable SIGPIPE by default for all services it starts, so that journald terminating doesn't generate SIGPIPE all over the place, bringing the whole system down. If services unblock that again, then they have to do that explicitly, and hence it's kind on them to deal with it.

(Note that there's very little point in unblocking SIGPIPE, it's only purpose are shell pipelines. Hence, unless your daemon is actually a shell it should never unblock it)

But yeah, we really should document what it means to have stdout/stderr connected to the journald means regarding SIGPIPE

poettering · 2017-08-30T10:11:20Z

Also, calling syslog() or/and anything else that doesn't rely on the streams of journald might make logging a bit more robust.

note that using /dev/log is only more robust than logging to stdout/stderr regarding journald termination if you log using sendto(), i.e. each time specify the address to send your logs to anew. If you use connect()+send()/write() (i.e. only specify the address once, and not each time) you win very little.

poettering · 2017-08-30T10:13:41Z

Anyway, summarizing: I think we need to document the following better:

document that stopping journald will sever all stdout/stderr logging connections
document that restarting journald, won't sever them
document the relationship between blocking/unmasking SIGPIPE and stdout/stderr logging

Anything I forgot?

evverx · 2017-08-30T11:45:12Z

@poettering my point is that those who care about their logs should probably send them not only to journald (or wherever they send logs to) but also to other places.

Note that systemd will by default disable SIGPIPE by default for all services it starts

I wrote about EPIPE, which is returned by write when it tries to send data to a closed pipe.

Anything I forgot?

I would probably document that the number of streams that can be handled by journald is somewhat limited, so that people won't try to do something like #6599.

evverx · 2017-08-30T12:02:30Z

I also think that the documentation should make it clear that every service has to be ready to get EPIPE when writing to stderr/stdout even if journald is not stopped.

Fixes systemd#6620.

Details about EPIPE/SIGPIPE handling, metadata and more. Fixes: systemd#6620

poettering · 2017-09-15T12:19:28Z

Fix waiting in #6839.

evverx added the journal label Aug 16, 2017

poettering changed the title ~~services writing to stdout become silent after journal restart~~ man pages, journald: document the relationship between journald restarts, stdout/stderr log stream termination and SIGPIPE Aug 30, 2017

poettering added this to the v235 milestone Aug 30, 2017

poettering added the documentation label Aug 30, 2017

keszybz added a commit to keszybz/systemd that referenced this issue Sep 15, 2017

man: add discussion of restarting systemd-journald

8166219

Fixes systemd#6620.

poettering added a commit to poettering/systemd that referenced this issue Sep 15, 2017

man: add a whole section detailing journal stdout/stderr stream logging

f310183

Details about EPIPE/SIGPIPE handling, metadata and more. Fixes: systemd#6620

poettering mentioned this issue Sep 15, 2017

man: add a whole section detailing journal stdout/stderr stream logging #6839

Closed

poettering added the has-pr label Sep 15, 2017

poettering closed this as completed in 157148d Sep 17, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

man pages, journald: document the relationship between journald restarts, stdout/stderr log stream termination and SIGPIPE #6620

man pages, journald: document the relationship between journald restarts, stdout/stderr log stream termination and SIGPIPE #6620

AndriiNikitin commented Aug 15, 2017 •

edited

Loading

evverx commented Aug 15, 2017

AndriiNikitin commented Aug 15, 2017 •

edited

Loading

evverx commented Aug 15, 2017 •

edited

Loading

AndriiNikitin commented Aug 16, 2017 •

edited

Loading

evverx commented Aug 16, 2017 •

edited

Loading

evverx commented Aug 16, 2017

evverx commented Aug 16, 2017

evverx commented Aug 16, 2017 •

edited

Loading

moschlar commented Aug 16, 2017

zdzichu commented Aug 16, 2017

evverx commented Aug 17, 2017

AndriiNikitin commented Aug 18, 2017 •

edited

Loading

evverx commented Aug 18, 2017 •

edited

Loading

poettering commented Aug 30, 2017

poettering commented Aug 30, 2017

poettering commented Aug 30, 2017 •

edited

Loading

poettering commented Aug 30, 2017

poettering commented Aug 30, 2017 •

edited

Loading

poettering commented Aug 30, 2017

poettering commented Aug 30, 2017

evverx commented Aug 30, 2017 •

edited

Loading

evverx commented Aug 30, 2017 •

edited

Loading

poettering commented Sep 15, 2017

man pages, journald: document the relationship between journald restarts, stdout/stderr log stream termination and SIGPIPE #6620

man pages, journald: document the relationship between journald restarts, stdout/stderr log stream termination and SIGPIPE #6620

Comments

AndriiNikitin commented Aug 15, 2017 • edited Loading

Submission type

systemd version the issue has been seen with

Used distribution

In case of bug report: Expected behaviour you didn't see

In case of bug report: Unexpected behaviour you saw

In case of bug report: Steps to reproduce the problem

evverx commented Aug 15, 2017

AndriiNikitin commented Aug 15, 2017 • edited Loading

evverx commented Aug 15, 2017 • edited Loading

AndriiNikitin commented Aug 16, 2017 • edited Loading

evverx commented Aug 16, 2017 • edited Loading

evverx commented Aug 16, 2017

evverx commented Aug 16, 2017

evverx commented Aug 16, 2017 • edited Loading

moschlar commented Aug 16, 2017

zdzichu commented Aug 16, 2017

evverx commented Aug 17, 2017

AndriiNikitin commented Aug 18, 2017 • edited Loading

evverx commented Aug 18, 2017 • edited Loading

poettering commented Aug 30, 2017

poettering commented Aug 30, 2017

poettering commented Aug 30, 2017 • edited Loading

poettering commented Aug 30, 2017

poettering commented Aug 30, 2017 • edited Loading

poettering commented Aug 30, 2017

poettering commented Aug 30, 2017

evverx commented Aug 30, 2017 • edited Loading

evverx commented Aug 30, 2017 • edited Loading

poettering commented Sep 15, 2017

AndriiNikitin commented Aug 15, 2017 •

edited

Loading

AndriiNikitin commented Aug 15, 2017 •

edited

Loading

evverx commented Aug 15, 2017 •

edited

Loading

AndriiNikitin commented Aug 16, 2017 •

edited

Loading

evverx commented Aug 16, 2017 •

edited

Loading

evverx commented Aug 16, 2017 •

edited

Loading

AndriiNikitin commented Aug 18, 2017 •

edited

Loading

evverx commented Aug 18, 2017 •

edited

Loading

poettering commented Aug 30, 2017 •

edited

Loading

poettering commented Aug 30, 2017 •

edited

Loading

evverx commented Aug 30, 2017 •

edited

Loading

evverx commented Aug 30, 2017 •

edited

Loading