Pass correct OS PID to systemd-notify #664
Some testing reveals that there is a problem with guessing MAINPID. `os:cmd("systemd-notify --ready MAINPID=" ++ os:getpid())`, because it's the top-level shell process PID that should be passed to systemd.
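To illustrate the payload in question, here is a minimal shell sketch of the newline-separated message that a notifier hands to systemd, with the PID passed in explicitly rather than guessed. The function name `notify_payload` is hypothetical and not part of RabbitMQ; `READY=1` and `MAINPID=` are the keys defined by the sd_notify protocol.

```shell
# Hypothetical helper (not part of RabbitMQ): build an sd_notify-style
# payload. Assignments are separated by newlines.
notify_payload() {
    # $1: PID that systemd should treat as the service's main process
    printf 'READY=1\nMAINPID=%s' "$1"
}
```

A wrapper script would call `notify_payload "$$"` from its top-level shell, since that is the process systemd started.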
Maybe additional argument for |
After some investigation I think that even with `sd_notify`, MAINPID detection didn't work 100% correctly.

The foreground broker was started through a shell process that converted signals into graceful shutdown commands, and it is the PID of that proxy that had to be communicated to systemd. Automatic PID inference worked only if epmd was already running on the machine; otherwise epmd was chosen as the main process.

This commit introduces a new environment variable to prevent creation of such a proxy process. This variable should be specified in the unit file (an example is added to docs/). A new variable is needed for the following reasons:

- There is no way to detect whether we are running under systemd or not.
- I was thinking about adding a command-line switch instead, but there is no way to work with "$@" in a portable shell script.

Another improvement is writing a message to stderr about graceful shutdown, just to be sure that the shutdown was indeed graceful. Under systemd it will automatically appear in the journal.

Fixes rabbitmq#664
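The idea in the commit message above can be sketched roughly as follows. This is only an illustration under stated assumptions: the variable name `DEMO_UNDER_SYSTEMD` and the functions `start_mode` and `launch` are hypothetical (the commit's actual variable name is not given here), and the proxy branch simply forwards the signal instead of issuing a graceful shutdown command.

```shell
# Hypothetical launcher sketch; names are illustrative, not RabbitMQ's.

# Print how the broker should be started, based on an opt-in variable.
start_mode() {
    if [ -n "${DEMO_UNDER_SYSTEMD:-}" ]; then
        echo "exec"
    else
        echo "proxy"
    fi
}

# Start the given command either directly or behind a signal proxy.
launch() {
    case "$(start_mode)" in
        exec)
            # Replace the shell with the command: the PID that systemd
            # started remains the PID of the broker itself.
            exec "$@"
            ;;
        proxy)
            # Keep the shell around as a proxy for signals.
            "$@" &
            child=$!
            # Simplification: forward the signal to the child.
            trap 'kill -TERM "$child" 2>/dev/null' TERM INT
            wait "$child"
            ;;
    esac
}
```

With `exec`, no intermediate shell survives, so there is no ambiguity about which PID is the main process.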
Hi! So I'm testing your pull request on Debian Jessie (8.3, systemd), freshly upgraded from Wheezy (sysvinit). As this is the first time I play with systemd, I'm going to post comments which will probably seem obvious to you, @binarin. I just want to do it because it helps me to understand the problem and correctly test the PR. Feel free to correct me if what I say doesn't make any sense :-)
Now, I'm going to install the unit file and test further. In the end, the probable conclusion is that our RPM and Debian packages should install this unit file out-of-the-box, but I need to see how to handle both init systems with a single package if possible. |
Just a note while I'm on it: Debian official package of |
epmd dependency isn't a problem - it's implicit because jessie uses socket activation for epmd. |
I will continue to debug this and learn more about systemd. |
Ok, perfect :-) Thanks! |
So looks |
It's some sort of permission problems with |
|
Good job finding the reported issue. I tried to debug this with strace(1), but the problem never occurred with it. This led me in the direction of a race with the exit of the shell, but I couldn't find anything else. I'm not a fan of the Here are two comments:
|
I was thinking about Perl myself, but wasn't sure it would be an acceptable solution 😄 Should I amend this PR with your suggestions from above or are you going to do it yourself?
Another alternative to perl is using I'm using following implementation of
|
@binarin is |
|
@binarin I agree. |
I tried socat(1) too yesterday and it worked fine in my tests, but I guess we could still hit the problem. Here is the paragraph from the systemd issue (systemd/systemd#2737) I'm referring to:
About the availability, socat is obviously available as a package on all distributions, but it's not installed by default. Our RPM/Debian package can depend on it of course. |
About the footprint on disk: Perl is already installed on the Debian Jessie VM I used to test. |
By the way, we may have another issue waiting for us after this systemd bug: systemd kills the service if it didn't report readiness in time. However, RabbitMQ may take several dozen minutes to load queue data or sync mirrored queues, all this before reporting "I'm ready!" to systemd. I don't know if we can disable this timeout from systemd for |
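One way the timeout concern above could be handled is by raising systemd's `TimeoutStartSec=` for the unit through a drop-in file. The sketch below is an assumption, not something this thread settles on: the helper name `write_timeout_dropin`, the file name `timeout.conf`, and the value passed in are all illustrative.

```shell
# Hypothetical helper: write a systemd drop-in that raises the start timeout.
# $1: drop-in directory, $2: timeout in seconds
write_timeout_dropin() {
    dropin_dir=$1
    timeout=$2
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/timeout.conf" <<EOF
[Service]
TimeoutStartSec=$timeout
EOF
}
```

For example, `write_timeout_dropin /etc/systemd/system/rabbitmq-server.service.d 3600` (assuming the unit is named `rabbitmq-server.service`), followed by `systemctl daemon-reload`.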
|
Nice! Your solution is better. I didn't think of making the state check in the pipe itself to maintain A few comments:
About the startup timeout, I saw the |
Hopefully this patch will fix all systemd-related problems:

- The proxy shell process (which converts signals to `rabbitmqctl stop`) will no longer be started under systemd. There is no need for it, as the systemd unit already contains instructions for graceful shutdown.
- The ready notification for systemd will be sent with the help of `socat`, as `systemd-notify` is currently broken for non-root users. `socat` is the most lightweight way to do it (other options are using the NIF or some external helper in Perl).
- epmd will not be able to interfere by sending its own ready notifications.
- The systemd journal will contain some additional messages about the startup and shutdown sequence, just to be sure that everything is working correctly.

Fixes rabbitmq#664
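The `socat`-based notification described in this commit message could look roughly like the sketch below. It is not the patch's actual code: the function names `socat_address` and `notify_ready` are hypothetical, and the sketch assumes systemd has exported `NOTIFY_SOCKET` (a leading `@` in its value denotes an abstract socket).

```shell
# Hypothetical helpers, not the patch itself.

# Map a NOTIFY_SOCKET value to a socat datagram address.
socat_address() {
    case "$1" in
        @*) printf 'ABSTRACT-SENDTO:%s\n' "${1#@}" ;;  # abstract namespace
        *)  printf 'UNIX-SENDTO:%s\n' "$1" ;;          # filesystem path
    esac
}

# Send READY=1 to systemd if a notification socket was provided.
notify_ready() {
    # Nothing to do when no notification socket is set.
    [ -n "${NOTIFY_SOCKET:-}" ] || return 0
    # Fail if socat is not installed.
    command -v socat >/dev/null 2>&1 || return 1
    # -u: one-way transfer from stdin to the datagram socket.
    printf 'READY=1' | socat -u STDIO "$(socat_address "$NOTIFY_SOCKET")"
}
```

Keeping the address mapping in its own function makes it easy to check without a running systemd.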
Nice patch, thank you! I didn't test it yet, but is there any benefit from keeping support for the NIF? |
Compatibility. The NIF was never a requirement, but let's keep it around for another feature release or so.
|
I'm still debugging one more issue with this patch, so please don't merge even if it passes your tests ) Initially the idea was to add |
Now the patch is ready. Unless I've missed something this time ) |
Added |
Testing uncovered some more issues, fixed them also. |
I merged the pull request. Thank you! |
Change the default install version of RabbitMQ Server to 3.6.2. Tests have been updated to upgrade from 3.6.1 to 3.6.2 and an unused test vars folder has been removed. socat has been added as a new dependency to work around a known issue with systemd: rabbitmq/rabbitmq-server#664 Change-Id: I2a4d127a852e7780dd95f905f3e1d5d1df60028a
See the discussion in #573.