example rabbitmq-server.service.example systemd service unit should automatically restart rmq #1359
lukebakken added the question and wontfix labels on Sep 12, 2017
I suspect it is because there are (rare) failure situations where auto-restart is not a good solution, or where an auto-restart may clobber data that could be used to diagnose the failure. @michaelklishin probably has more historical information about this.
@rgl there is no big idea behind not having that line. I recall a similar discussion, and a similar question about the Windows service. Feel free to submit a PR that adds […]
michaelklishin added the effort-tiny and usability labels and removed the question and wontfix labels on Sep 12, 2017
We may want to use […]. There is some discussion here.
@rgl would you like to submit a PR, or should our team handle this?
lukebakken referenced this issue on Sep 14, 2017 in "rabbitmqctl stop" should always exit with 0 #1362 (closed)
rgl commented on Sep 17, 2017
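Ah, those are good points! Now I'm in a catch-22 "to restart, or not to restart, that is the question" moment! But I'm leaning towards restarting indefinitely, with a unit like this throwaway test:

```bash
# Throwaway unit that fails immediately. StartLimitIntervalSec=0 disables
# start rate limiting, so systemd restarts it every RestartSec, forever.
cat >/etc/systemd/system/test.service <<'EOF'
[Unit]
After=network.target
StartLimitIntervalSec=0
[Service]
Type=simple
# In unit files "%" introduces a specifier, so each literal "%" is written "%%".
ExecStart=/bin/bash -c "date '+%%F %%T.%%N'; exit 1"
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable test
systemctl start test
journalctl --follow -u test
# After stopping the journal follow (Ctrl+C), clean up the test unit:
systemctl stop test
systemctl disable test
```

But I'm not really sure what the default for rabbitmq should be :-/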
Restarting forever may be OK, but not rate limiting (in particular, not limiting the concurrency of possible restart attempts) sounds like a very bad idea to me. I […]
rgl commented on Sep 18, 2017
But is there another way to keep restarting forever without disabling rate limiting via StartLimitIntervalSec=0?
Why do we need to restart forever? Is it really such a good idea? I don't know. I'm no systemd expert, but my reading of the docs is that systemd has the same "restart intensity" settings as Erlang (StartLimitIntervalSec= and StartLimitBurst=): a time interval, and how many restart attempts ("burst") within that interval are considered reasonable. The only caveat is […]
and
[…]
So it sounds like this will effectively restart forever unless things are so broken that it restarts more than N times in T seconds. In that case, maybe "1 restart per second" (or one every two seconds) should be the limit, because […]
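To make the "N restarts in T seconds" reasoning concrete, here is a minimal bash sketch. It is a simplified model, assumed rather than taken from systemd's source: starts are counted in a fixed window that opens at the first start and resets once it has expired, the service is assumed to fail the instant it starts, and each restart happens exactly RestartSec later. The function name `hits_start_limit` is hypothetical. It also suggests one answer to the question above: restarts can continue indefinitely with rate limiting left on, as long as they are spaced out enough.

```bash
#!/usr/bin/env bash
# Hypothetical helper: a simplified, assumed model of a fixed-window start
# rate limit (not systemd's actual code). Assumes the service fails the
# instant it starts and is restarted exactly RestartSec later.
# Arguments: RestartSec (ms), StartLimitIntervalSec (ms), StartLimitBurst,
# and optionally how many start attempts to simulate.
# Prints "yes" if a start would eventually be refused, "no" otherwise.
hits_start_limit() {
  local restart_ms=$1 interval_ms=$2 burst=$3 attempts=${4:-1000}
  # In this model a zero interval or burst means no limiting; systemd
  # documents StartLimitIntervalSec=0 as disabling rate limiting.
  if (( interval_ms <= 0 || burst <= 0 )); then
    echo no
    return
  fi
  local t=0 window_start=-1 count=0 i
  for (( i = 0; i < attempts; i++ )); do
    # Open a new window if none is open yet or the current one has expired.
    if (( window_start < 0 || t - window_start > interval_ms )); then
      window_start=$t
      count=0
    fi
    # The window already holds StartLimitBurst starts: this one is refused.
    if (( count >= burst )); then
      echo yes
      return
    fi
    count=$((count + 1))
    # The next attempt happens RestartSec after the (instant) failure.
    t=$((t + restart_ms))
  done
  echo no
}

# systemd defaults: RestartSec=100ms, StartLimitIntervalSec=10s, StartLimitBurst=5
hits_start_limit 100 10000 5    # prints: yes
# Same limits, but with RestartSec=10 as in the test unit in this thread
hits_start_limit 10000 10000 5  # prints: no
```

Under this model, a crash-looping unit with the default 100 ms RestartSec exhausts the default burst in about half a second, while RestartSec=10 never fills a 10-second window with 5 starts. Check systemd.unit(5) for the exact boundary semantics before relying on specific numbers.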
rgl commented on Sep 18, 2017
Oh, I now realize that I failed to mention why I initially created this issue... I had a disk outage, freed up the disk, much later noticed that rmq was stopped, and scratched my head over why rmq was never (re)started by systemd. It turns out that rmq stopped due to the disk outage and was never restarted because of the systemd unit configuration. This made me realize that I needed alerting in place, and perhaps a change to the systemd unit so that it keeps restarting rmq forever. Hence this issue. In my particular case, having systemd restart rmq (forever) would have helped.
RabbitMQ does not intentionally stop when the disk is full; rather, enough I/O operations fail that certain critically important parts shut down. Restarting forever is a great idea on the surface, not so much in practice. We highly recommend replacing nodes that ran out of disk space.
With a full disk, restarts would have failed as well, possibly forever. Picking a cut-off frequency is pretty challenging in the general case, but according to the doc quotes above, one restart per second should be reasonable.
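One way to combine restarts with a cut-off, without editing the packaged unit, is a systemd drop-in. The bash sketch below is hypothetical: it assumes the unit is installed as `rabbitmq-server.service`, and the numbers (10 seconds between restarts, at most 5 starts within 60 seconds) are illustrative only, not values recommended by the RabbitMQ team. It places StartLimitIntervalSec= in `[Unit]`, as the test unit in this thread does, which requires a systemd version that supports that placement.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a drop-in that restarts the RabbitMQ unit after
# failures while keeping start rate limiting enabled.
# Assumptions: the unit is named rabbitmq-server.service; all values are
# illustrative. Pass a different base directory to write somewhere else.
write_rabbitmq_restart_dropin() {
  local unit_dir=${1:-/etc/systemd/system}
  local dropin_dir="$unit_dir/rabbitmq-server.service.d"
  mkdir -p "$dropin_dir"
  cat >"$dropin_dir/restart.conf" <<'EOF'
[Unit]
# Refuse further starts once 5 starts have happened within 60 seconds.
StartLimitIntervalSec=60
StartLimitBurst=5

[Service]
# Restart after unclean exit codes, unclean signals, timeouts and watchdog
# failures, waiting 10 seconds before each attempt.
Restart=on-failure
RestartSec=10
EOF
}
```

Run it as root with no argument to write under `/etc/systemd/system`, then run `systemctl daemon-reload`. With these numbers a node that crashes immediately on every start is given up on after a handful of attempts, while a node that runs for a while between failures keeps being restarted.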
At this time, I don't see a way in the […]
A commit that referenced this issue was added on Sep 19, 2017.
lukebakken referenced this issue on Sep 19, 2017 in Add optional Restart and RestartSec configuration #1368 (merged)
michaelklishin added this to the 3.6.13 milestone on Sep 19, 2017
@rgl we updated the example and will make […]
rgl commented on Sep 12, 2017
The rabbitmq-server.service.example is not configured to automatically restart when there is an error. Is there a reason not to?
That is, it should contain the line `Restart=always` or `Restart=on-failure`, and maybe `RestartSec=10`. For reference, see the `Restart=` documentation.
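The practical difference between `Restart=always` and `Restart=on-failure` is which kinds of exit trigger a restart. The bash sketch below encodes the table from the systemd.service(5) documentation as a hypothetical helper, `would_restart`. Here "clean" means exit code 0 (or one listed in SuccessExitStatus=) or one of SIGHUP, SIGINT, SIGTERM, SIGPIPE. The table only covers unexpected exits: a stop requested through systemd (e.g. `systemctl stop`) never triggers a restart, whatever the policy.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the Restart= table in systemd.service(5).
# Usage: would_restart POLICY CAUSE
#   CAUSE is one of: clean, unclean-exit, unclean-signal, timeout, watchdog
# Returns 0 if systemd would restart the unit, 1 if not, 2 on bad input.
would_restart() {
  local policy=$1 cause=$2
  case "$cause" in
    clean|unclean-exit|unclean-signal|timeout|watchdog) ;;
    *) echo "unknown exit cause: $cause" >&2; return 2 ;;
  esac
  case "$policy" in
    no)          return 1 ;;
    always)      return 0 ;;
    on-success)  [ "$cause" = clean ] ;;
    on-failure)  [ "$cause" != clean ] ;;
    on-abnormal) [ "$cause" = unclean-signal ] || [ "$cause" = timeout ] ||
                   [ "$cause" = watchdog ] ;;
    on-abort)    [ "$cause" = unclean-signal ] ;;
    on-watchdog) [ "$cause" = watchdog ] ;;
    *) echo "unknown Restart= value: $policy" >&2; return 2 ;;
  esac
}

# Compare the two policies proposed in this issue for every exit cause.
for policy in always on-failure; do
  for cause in clean unclean-exit unclean-signal timeout watchdog; do
    if would_restart "$policy" "$cause"; then
      echo "$policy / $cause: restart"
    else
      echo "$policy / $cause: stay down"
    fi
  done
done
```

The only row where the two proposals differ is a clean exit: `Restart=always` restarts the node even then, while `Restart=on-failure` leaves it down.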