New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Journald doesn't receive the FDSTOREd file descriptor back after a crash #4408

Closed
utezduyar opened this Issue Oct 19, 2016 · 6 comments

Comments

3 participants
@utezduyar
Contributor

utezduyar commented Oct 19, 2016

Submission type

  • Bug report

NOTE: Do not submit anything other than bug reports or RFEs via the issue tracker!

systemd version the issue has been seen with

225 & 231

NOTE: Do not submit bug reports about anything but the two most recently released systemd versions upstream!

Used distribution

Own Distro & Debian Testing

In case of bug report: Expected behaviour you didn't see

I have terminated systemd-journald with kill -9. I was expecting to receive the sockets that had been sent to systemd back.

  1. Journald is up, has a STREAM socket for another service ABC..
  2. Same socket is sent to PID 1 and stored by PID 1 via FDSTORE=
  3. Journald is killed with kill -9.
  4. Journald comes up again but without the FDSTOREd file descriptors.

I have tracked it down that PID 1 calls service_release_resources() for the dead service which terminates the only copy of the stored fds.

The bad part is, service ABCs stdout still points to a socket. Write on the socket doesn't yield anything since journal is not aware of it. My guess is eventually the socket buffer will fill up and all prints to stdout will halt.

Am I missing something here?

In case of bug report: Unexpected behaviour you saw

In case of bug report: Steps to reproduce the problem

@poettering

This comment has been minimized.

Show comment
Hide comment
@poettering

poettering Oct 19, 2016

Member

hmm, journald uses Restart= and the stored fds should really stick around in that case. Basically, they are supposed to stick around as long as the service is around or there's a job queued for it. The moment that a service is stopped and no job is queued anymore for it they are closed.

Given your described behaviour this suggests the logic is borked somehow and not implemented correctly...

Member

poettering commented Oct 19, 2016

hmm, journald uses Restart= and the stored fds should really stick around in that case. Basically, they are supposed to stick around as long as the service is around or there's a job queued for it. The moment that a service is stopped and no job is queued anymore for it they are closed.

Given your described behaviour this suggests the logic is borked somehow and not implemented correctly...

@poettering poettering added this to the v232 milestone Oct 19, 2016

evverx added a commit to evverx/systemd that referenced this issue Oct 21, 2016

keszybz added a commit to keszybz/systemd that referenced this issue Oct 23, 2016

keszybz added a commit to keszybz/systemd that referenced this issue Oct 23, 2016

core: when restarting services, don't close fds
We would close all the stored fds in service_release_resources(), which of
course broke the whole concept of storing fds over service restart.

Fixes #4408.

@keszybz keszybz added the has-pr label Oct 23, 2016

@utezduyar

This comment has been minimized.

Show comment
Hide comment
@utezduyar

utezduyar Oct 23, 2016

Contributor

Thanks for looking in to this and fixing it @keszybz. I am confused though, maybe I misunderstood @poettering's comment but I thought some kind of simple line has been deleted or something like that. Seems like you have done a full-blown implementation. Was this functionality ever working?

Contributor

utezduyar commented Oct 23, 2016

Thanks for looking in to this and fixing it @keszybz. I am confused though, maybe I misunderstood @poettering's comment but I thought some kind of simple line has been deleted or something like that. Seems like you have done a full-blown implementation. Was this functionality ever working?

@keszybz

This comment has been minimized.

Show comment
Hide comment
@keszybz

keszybz Oct 25, 2016

Member
Member

keszybz commented Oct 25, 2016

keszybz added a commit to keszybz/systemd that referenced this issue Oct 26, 2016

keszybz added a commit to keszybz/systemd that referenced this issue Oct 26, 2016

core: when restarting services, don't close fds
We would close all the stored fds in service_release_resources(), which of
course broke the whole concept of storing fds over service restart.

Fixes #4408.

keszybz added a commit to keszybz/systemd that referenced this issue Oct 29, 2016

keszybz added a commit to keszybz/systemd that referenced this issue Oct 29, 2016

core: when restarting services, don't close fds
We would close all the stored fds in service_release_resources(), which of
course broke the whole concept of storing fds over service restart.

Fixes #4408.
@utezduyar

This comment has been minimized.

Show comment
Hide comment
@utezduyar

utezduyar Oct 31, 2016

Contributor

I have tested this on one of our product which has systemd 208 on it and it is same problem there.

Contributor

utezduyar commented Oct 31, 2016

I have tested this on one of our product which has systemd 208 on it and it is same problem there.

@poettering poettering closed this in f0bfbfa Nov 2, 2016

@utezduyar

This comment has been minimized.

Show comment
Hide comment
@utezduyar

utezduyar Nov 3, 2016

Contributor

@poettering @keszybz I don't understand why the client side of the journald-server connections didn't get notified of the closure of the server side of the socket. If we had a way to enable that we would have caught this bug long time ago. Care to explain?

Contributor

utezduyar commented Nov 3, 2016

@poettering @keszybz I don't understand why the client side of the journald-server connections didn't get notified of the closure of the server side of the socket. If we had a way to enable that we would have caught this bug long time ago. Care to explain?

@keszybz keszybz removed the has-pr label Nov 3, 2016

@poettering

This comment has been minimized.

Show comment
Hide comment
@poettering

poettering Nov 3, 2016

Member

@utezduyar so, the stream socket from the daemons to journald are only open for writing, not for reading. When they are closed on the server side, then writing to them will normally raise a SIGPIPE. However, in systemd we decided to mask SIGPIPE by default for all system services, as such behaviour only makes sense in shell pipelines really. If SIGPIPE is turned off writing to the socket will result in an EPIPE error, which is good. However, pretty much nobody actually checks the error code when logging with fprintf(stderr, "…"); or something similar, hence you'll never see any error raised by this. And it's actually the right thing to ignore any errors then, because what could you do in this case: log another error? then you'd enter an endless loop: you want to log that logging didn't work, which won't work either, so you log about it which won't work, so you log about it which won't work, so you log about it which won't work, ad infinitum.

Member

poettering commented Nov 3, 2016

@utezduyar so, the stream socket from the daemons to journald are only open for writing, not for reading. When they are closed on the server side, then writing to them will normally raise a SIGPIPE. However, in systemd we decided to mask SIGPIPE by default for all system services, as such behaviour only makes sense in shell pipelines really. If SIGPIPE is turned off writing to the socket will result in an EPIPE error, which is good. However, pretty much nobody actually checks the error code when logging with fprintf(stderr, "…"); or something similar, hence you'll never see any error raised by this. And it's actually the right thing to ignore any errors then, because what could you do in this case: log another error? then you'd enter an endless loop: you want to log that logging didn't work, which won't work either, so you log about it which won't work, so you log about it which won't work, so you log about it which won't work, ad infinitum.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment