New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gracefully handle SIGPIPE #7040

Closed
rossmcdonald opened this Issue Jul 20, 2016 · 11 comments

Comments

Projects
None yet
5 participants
@rossmcdonald
Copy link
Contributor

rossmcdonald commented Jul 20, 2016

Earlier versions of journald would send a SIGPIPE to any systemd processes when journald was restarted. So far we have bypassed this issue by redirecting STDOUT and STDERR to the traditional log paths, however since we're no longer doing this it would make sense to handle SIGPIPEs in a graceful manner to ensure the service doesn't crash when journald is restarted.

A similar fix was applied to Docker here: jwhonce/docker@b7f60f7

@jwilder jwilder added this to the 1.0.0 milestone Jul 21, 2016

@jsternberg jsternberg self-assigned this Jul 21, 2016

@jsternberg

This comment has been minimized.

Copy link
Contributor

jsternberg commented Jul 21, 2016

I concur with @nathanielc's conclusion here for Kapacitor: influxdata/kapacitor#737 (comment)

@rossmcdonald I think the proper answer is to upgrade systemd since it seems to be their problem. Unfortunately, we can't modify Go to notify us of a SIGPIPE on only fd 1 or 2 so that we can exit gracefully with an error exit code so systemd causes us to restart and I don't think ignoring SIGPIPE is correct. If this is absolutely necessary though, we can still just ignore the signal and warn people that InfluxDB will not stop if the file descriptor dies. Either way, there's going to need to be a note in the documentation about this issue with a caveat.

@rossmcdonald

This comment has been minimized.

Copy link
Contributor Author

rossmcdonald commented Jul 21, 2016

I'm actually seeing this issue on both systemd 208 and 219, so this might be something we need to investigate further. Here is the output from 219:

$ systemctl restart systemd-journald
$ systemctl status influxdb
● influxdb.service - InfluxDB is an open-source, distributed, time series database
   Loaded: loaded (/usr/lib/systemd/system/influxdb.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Thu 2016-07-21 19:58:25 CEST; 1min 37s ago
     Docs: https://docs.influxdata.com/influxdb/
 Main PID: 2573 (code=killed, signal=PIPE)

Depending on how we want to handle this, I think we can either:

  • Go the Docker route, and simply ignore SIGPIPE signals. I'm concerned that this means that logs may not be emitted correctly if journald is restarted, though.
  • Add a Restart=always to the unit file, so that InfluxDB will be restarted even in the event of a SIGPIPE. The downside here is that users will have to use systemctl to stop the daemon, as opposed to sending a SIGKILL SIGTERM.
  • Add a RestartForceExitStatus=SIGPIPE to the unit file, which tells the service to interpret SIGPIPE as a failure status and restart the process.

I'm thinking the third option is the cleanest, as we're not changing any other behavior apart from making SIGPIPE an exit status.

@jsternberg

This comment has been minimized.

Copy link
Contributor

jsternberg commented Jul 21, 2016

I would favor the third option, but for the second option, I don't think we should encourage people to use SIGKILL. It's generally very bad to use that signal. I don't think it would cause any corruption, but it's still not something we want to encourage people to use.

@rossmcdonald

This comment has been minimized.

Copy link
Contributor Author

rossmcdonald commented Jul 21, 2016

Oops, sorry, I meant SIGTERM. Agreed that people shouldn't be using SIGKILL on a normal basis. 😄

@sparrc

This comment has been minimized.

Copy link
Contributor

sparrc commented Jul 22, 2016

So what is the conclusion? Add RestartForceExitStatus=SIGPIPE to the systemd file? If anyone goes ahead and does this please apply patch to telegraf and kapacitor as well :)

cc @nathanielc

@rossmcdonald

This comment has been minimized.

Copy link
Contributor Author

rossmcdonald commented Jul 22, 2016

Well it turns out the RestartForceExitStatus= option was not added until systemd 215, so the third option might not be very helpful for users stuck on older versions. I think in that case Restart=always might be the better option.

@jsternberg

This comment has been minimized.

Copy link
Contributor

jsternberg commented Jul 22, 2016

I think Restart=always should be fine. We shouldn't encourage people to kill the daemon with SIGTERM but tell them to use the appropriate service manager anyway.

@nathanielc

This comment has been minimized.

Copy link
Contributor

nathanielc commented Jul 22, 2016

Which versions are affected by this issue? I just tried it with systemd 229 and no issue. Restarting journald did not send a SIGPIPE and logs are still arriving in the journal after the restart.

Why can't we instruct uses with broken versions of systemd to add RestartForceExitStatus=SIGPIPE or Restart=always as an override config?

http://askubuntu.com/questions/659267/how-do-i-override-or-configure-systemd-services

Seems like a simple solution that doesn't risk creating issues for our wider user base.

@rossmcdonald

This comment has been minimized.

Copy link
Contributor Author

rossmcdonald commented Jul 22, 2016

So far I've only tested with versions 208 and 219, which both exhibited this issue. We can certainly add some documentation around overriding the default unit file if using below a specific systemd version, I was just hoping to find a clean out-of-the-box fix we could use that would have minimal/no impact to current and outdated systemd users alike.

Since this isn't as simple of a fix as I was thinking it might be, we might just want to move this over as a docs issue.

@jwilder

This comment has been minimized.

Copy link
Contributor

jwilder commented Jul 25, 2016

@rossmcdonald @jsternberg @nathanielc Should we close this an re-open as a docs issue?

@rossmcdonald

This comment has been minimized.

Copy link
Contributor Author

rossmcdonald commented Jul 26, 2016

Moving this to a doc issue here: influxdata/docs.influxdata.com#548

We can create a separate issue if we want to change the default restart behavior when running on systemd.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment