Systemd startup timeout on 7.6 #60140
Pinging @elastic/es-core-infra (:Core/Infra/Packaging)
Thanks very much for your interest in Elasticsearch. Your issue is most likely a configuration and/or environmental problem, as we have tests for basic cases like "does the systemd service start?". While it may still be a bug on our end, we like to direct these kinds of things to the forums. There's an active community there that can help diagnose your issue and suggest changes if necessary. This allows us to use GitHub for verified bug reports, feature requests, and pull requests. Additionally, please note that images of text are both difficult to read and not searchable, so we ask that any text, such as systemd output, be provided as quoted text. And even if "no error messages" exist in the … As this is unlikely to be a bug in Elasticsearch, I hope you don't mind that I close this issue. It can always be reopened in the future if further evidence points to a bug.
Hi @rjernst, the elasticsearch systemd service file and the notify integration are shipped by Elasticsearch, so this is definitely something to look at. The reason I included a screenshot is that I was on a console, and obtaining text output would have been too cumbersome. All relevant information is included, and this is easily reproducible. I am incredibly disappointed by your response to this report.
@rossengeorgiev After looking back at our notify callback, I do see one potential edge case. We currently schedule a timer to fire every 15 seconds to extend the systemd timeout, but the first invocation is not immediate. So, if it takes more than 15 seconds to reach plugin initialization, we never get to our first extension, and the default systemd timeout triggers. I'm going to reopen this and open a PR to set the initial systemd timeout to 75 seconds. This will give us 60 seconds to reach plugin initialization. However, note that something is still odd on your system. Not a lot happens before plugin initialization, so something else is likely going on, which you should investigate. It is also concerning that your Elasticsearch log shows nothing. It would be helpful to turn on trace logging (…
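For readers unfamiliar with the mechanism: a `Type=notify` service talks to systemd by sending short `KEY=VALUE` datagrams to the Unix socket named in the `NOTIFY_SOCKET` environment variable (the sd_notify protocol). The Python sketch below is not Elasticsearch's code, which is Java. It only illustrates the kind of message a timeout-extension callback sends. `EXTEND_TIMEOUT_USEC` is the variable documented in `sd_notify(3)` for this purpose. Whether Elasticsearch sends exactly this string and value is an assumption here. Older systemd releases may not recognise it, so check `man sd_notify` on the target host. The helper names `sd_notify` and `extend_startup_timeout` are hypothetical.

```python
import os
import socket


def sd_notify(state: str) -> bool:
    """Send one sd_notify state string to systemd (hypothetical helper).

    Returns False, and sends nothing, when NOTIFY_SOCKET is unset, i.e. when
    the process was not started by systemd as a Type=notify service.
    """
    addr = os.environ.get("NOTIFY_SOCKET")
    if not addr:
        return False
    # systemd writes abstract-namespace sockets with a leading "@";
    # Python expects a leading NUL character for those instead.
    if addr.startswith("@"):
        addr = "\0" + addr[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode("utf-8"), addr)
    return True


def extend_startup_timeout(seconds: int) -> bool:
    """Ask systemd for `seconds` more startup time (hypothetical helper).

    EXTEND_TIMEOUT_USEC is expressed in microseconds.
    """
    return sd_notify(f"EXTEND_TIMEOUT_USEC={seconds * 1_000_000}")
```

A periodic callback would call something like `extend_startup_timeout(30)` on each tick. Once startup completes, a notify service sends `READY=1` instead, e.g. `sd_notify("READY=1")`.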
For systemd, while we are starting up, we notify the system every 15 seconds that we are still in the middle of starting up. However, if initial startup before plugin initialization is slower than 15 seconds, we won't ever get the chance to run the first timeout extension. This commit sets the initial timeout to 75 seconds, up from the default 30 seconds used by systemd. closes elastic#60140
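The arithmetic behind the 75-second figure can be checked with a small model. As described above, it assumes the extension timer is only scheduled once plugin initialization is reached, and that its first run fires one full interval (15 s) later rather than immediately. systemd kills the unit if that first extension has not arrived before the initial `TimeoutStartSec` expires. The function names, and the simplification that an extension is received the instant it is sent, are assumptions of this sketch, not Elasticsearch code.

```python
EXTENSION_INTERVAL_S = 15  # delay before the first (and each) extension


def survives_first_deadline(time_to_plugin_init_s: float,
                            timeout_start_s: float,
                            interval_s: float = EXTENSION_INTERVAL_S) -> bool:
    """Return True if the first timeout extension reaches systemd in time.

    The timer starts at plugin initialization, and its first run is delayed
    by one interval, so the first extension lands at init time + interval.
    Arriving exactly at the deadline is treated as in time.
    """
    first_extension_at = time_to_plugin_init_s + interval_s
    return first_extension_at <= timeout_start_s


def max_time_to_plugin_init(timeout_start_s: float,
                            interval_s: float = EXTENSION_INTERVAL_S) -> float:
    """Longest pre-plugin startup that the initial timeout can tolerate."""
    return timeout_start_s - interval_s
```

With the 30-second initial timeout, `max_time_to_plugin_init(30)` is 15, so a node that needs 20 seconds to reach plugin initialization is killed. With 75 seconds, the budget becomes 60 seconds, matching the figure in the comment above.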
Hi @rjernst, I appreciate you coming back to the issue and taking a second look. For more context, I encounter the issue on a VM after it has been restored from a snapshot. Note that stopping elasticsearch before taking the snapshot doesn't seem to make a difference. When I restore the VM, startup is slow, which I assume is due to data checking. Anyway, I went back and tried again, and I've got you a trace log:
Thanks for filing and fixing this. FWIW, I think I ran into it when upgrading from 6.8.6 to 7.5.2. The debug logs didn't show anything wrong when "stopping ...", and I didn't even realize that systemd itself was stopping the service. After a few service restarts it stayed up. Judging from the TRACE logs below, maybe the upgrade/cleanup of the 6.8.6 indices took unexpectedly long? I'll try adding the 75s timeout, but I'm not sure that's enough.
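On an installation that has not yet picked up the fix, the same 75-second initial timeout can be applied locally with a systemd drop-in. A drop-in overrides `TimeoutStartSec` without editing the packaged unit file. The usual route is `sudo systemctl edit elasticsearch`; the Python sketch below simply writes the equivalent file. The file name `startup-timeout.conf` and the helper name are hypothetical choices. The directory assumes the unit is named `elasticsearch.service` and uses the standard `/etc/systemd/system/<unit>.d/` location.

```python
from pathlib import Path

# Administrator drop-in directory for elasticsearch.service (assumed unit name).
DEFAULT_DROPIN_DIR = Path("/etc/systemd/system/elasticsearch.service.d")


def write_start_timeout_dropin(seconds: int,
                               dropin_dir: Path = DEFAULT_DROPIN_DIR,
                               name: str = "startup-timeout.conf") -> Path:
    """Write a drop-in that overrides TimeoutStartSec (hypothetical helper).

    Writing under /etc requires root. Run `systemctl daemon-reload`
    afterwards so systemd re-reads the unit configuration.
    """
    if seconds <= 0:
        # Only accept an explicit positive number of seconds here.
        raise ValueError("timeout must be a positive number of seconds")
    dropin_dir.mkdir(parents=True, exist_ok=True)
    path = dropin_dir / name
    path.write_text(f"[Service]\nTimeoutStartSec={seconds}\n")
    return path
```

If 75 seconds turns out not to be enough, as suspected above, the value can be raised. After reloading, `systemctl show elasticsearch -p TimeoutStartUSec` reports the timeout systemd is actually using.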
Elasticsearch version (`bin/elasticsearch --version`): Version: 7.6.0, Build: default/rpm/7f634e9f44834fbc12724506cc1da681b0c3b1e3/2020-02-06T00:09:00.449973Z, JVM: 13.0.2
Plugins installed: None
JVM version (`java -version`): 13.0.2
OS version (`uname -a` if on a Unix-like system): CentOS 7

Description of the problem including expected versus actual behavior:
The elasticsearch service is killed after a timeout when I attempt to start it. The expected behaviour is that it simply starts.
I noticed that `TimeoutStartSec` is not set and that the service is using `notify`. I'm guessing the service is not sending a notification to extend the default timeout, so systemd kills it.

Steps to reproduce:
There is no error message or anything else in `elasticsearch.log`.

Provide logs (if relevant):