Extend systemd startup timeout to 900s #91338

Merged

DaveCTurner
Contributor

Extends the default systemd startup timeout from 75s to 900s.

Relates #86476

@DaveCTurner DaveCTurner added the :Core/Infra/Core, Supportability and v8.6.0 labels Nov 7, 2022
@elasticsearchmachine elasticsearchmachine added the Team:Core/Infra label Nov 7, 2022
@elasticsearchmachine
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@DaveCTurner
Contributor Author

We seem to have quite a few users running older systemd versions that don't support the `EXTEND_TIMEOUT_USEC` feature, and in larger clusters 75s is not nearly enough time for Elasticsearch to start up today. I'm opening this to prompt a discussion about making the default timeout much longer. Do we need it to be so short? I am not aware of any cases where Elasticsearch gets completely stuck during startup and needs to be killed so soon, nor of any cases where killing it promptly and retrying would help.
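For readers unfamiliar with the mechanism: `EXTEND_TIMEOUT_USEC` is one of the variables a `Type=notify` service can send to systemd over the `sd_notify` datagram protocol, alongside `READY=1`. Below is a minimal Python sketch of that protocol, not Elasticsearch's own implementation. The helper names `extend_timeout_message` and `sd_notify` are hypothetical, and the sketch assumes a Linux host where systemd passes a Unix-socket address in `$NOTIFY_SOCKET`.

```python
import os
import socket
from typing import Optional


def extend_timeout_message(seconds: float) -> bytes:
    """sd_notify payload asking systemd to extend the current start timeout.

    EXTEND_TIMEOUT_USEC is expressed in microseconds, so 900s -> 900000000.
    """
    return f"EXTEND_TIMEOUT_USEC={int(seconds * 1_000_000)}".encode("ascii")


def sd_notify(payload: bytes, socket_path: Optional[str] = None) -> bool:
    """Send one notification datagram; return False if there is no socket.

    By default the address comes from $NOTIFY_SOCKET, which systemd sets for
    Type=notify units. A leading '@' denotes a Linux abstract-namespace socket.
    Only Unix-socket addresses are handled in this sketch.
    """
    path = socket_path if socket_path is not None else os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False  # not running under a notify-type systemd unit
    if path.startswith("@"):
        path = "\0" + path[1:]  # abstract namespace: replace '@' with a NUL byte
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, path)
    return True
```

A service would call `sd_notify(extend_timeout_message(30))` periodically while it loads, then `sd_notify(b"READY=1")` once it is up. On systemd versions without `EXTEND_TIMEOUT_USEC` support the extension messages have no effect, so only the unit's fixed start timeout applies, which is the situation this PR addresses.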

@elasticsearchmachine
Collaborator

Hi @DaveCTurner, I've created a changelog YAML for you.

@grcevski
Contributor

grcevski commented Nov 7, 2022

I guess one issue I see is that if we have a wrong systemd configuration, such that the signal that ES has started never makes it to systemd, it will take 900 seconds to kill the instance. At that much later point, killing the instance might cause even more confusion on the customer side.
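To make the trade-off concrete, here is a simplified Python model of how long systemd waits for `READY=1`. It follows the `sd_notify(3)` description of `EXTEND_TIMEOUT_USEC` (each message opens a new window, but a timeout only fires once the original `TimeoutStartSec=` has also elapsed). The function names and the `(sent_at, extend_by)` input format are hypothetical, and real systemd behaviour may differ in edge cases.

```python
from typing import Iterable, Optional, Tuple

# (sent_at, extend_by) in seconds since the unit started; hypothetical format.
Extension = Tuple[float, float]


def start_deadline(timeout_s: float,
                   extensions: Iterable[Extension],
                   supports_extend: bool = True) -> float:
    """Time (seconds after start) at which systemd stops waiting for READY=1."""
    deadline = timeout_s
    if not supports_extend:
        # Older systemd: EXTEND_TIMEOUT_USEC is not acted on, only TimeoutStartSec= counts.
        return deadline
    for sent_at, extend_by in sorted(extensions):
        if sent_at >= deadline:
            break  # the start timeout already fired before this message arrived
        # A new window opens, but it never ends before the original timeout.
        deadline = max(timeout_s, sent_at + extend_by)
    return deadline


def is_killed(timeout_s: float,
              ready_at: Optional[float],
              extensions: Iterable[Extension],
              supports_extend: bool = True) -> bool:
    """True if systemd gives up before READY=1 (None: READY=1 never arrives)."""
    if ready_at is None:
        return True
    before_ready = [(t, e) for t, e in extensions if t < ready_at]
    return ready_at >= start_deadline(timeout_s, before_ready, supports_extend)
```

In the misconfigured case described above, `is_killed(900, None, [], supports_extend=False)` is true and the kill only happens at `start_deadline(900, [], supports_extend=False)`, i.e. 900 seconds in. Conversely, a node that needs 300s to start on a systemd without extension support is killed under the old 75s default but survives under 900s.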

Member

@rjernst rjernst left a comment


This LGTM. @grcevski To answer your concern, I think the benefit (less startup churn/killing of nodes recovering large indices) outweighs the potential problems (a misconfiguration of systemd, which should be caught by our testing).

@kingherc kingherc added v8.7.0 and removed v8.6.0 labels Nov 16, 2022
@DaveCTurner DaveCTurner merged commit d956501 into elastic:main Nov 17, 2022
@DaveCTurner DaveCTurner deleted the 2022-11-07-extend-systemd-startup-timeout branch November 17, 2022 11:16
@DaveCTurner
Contributor Author

Thanks Ryan & Nikola. WDYT about backporting this to 7.17 too?

DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Nov 17, 2022
The docs introduced in elastic#91333 apply to older versions in which the
`systemd` startup timeout was 75s by default, but in elastic#91338 we extended
the `systemd` startup timeout to 900s from 8.7 onwards. This commit
adjusts the docs to match.
@rjernst
Member

rjernst commented Nov 17, 2022

> WDYT about backporting this to 7.17 too?

+1

DaveCTurner added a commit that referenced this pull request Nov 17, 2022
Extends the default `systemd` startup timeout from 75s to 900s.

Relates #86476
DaveCTurner added a commit that referenced this pull request Nov 17, 2022
The docs introduced in #91333 apply to older versions in which the `systemd`
startup timeout was 75s by default, but in #91338 we extended the `systemd`
startup timeout to 900s from 8.7.0 and 7.17.8 onwards. This commit adjusts the
docs to match.
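A small Python sketch encoding the default start timeout by version, using only the versions named in this thread: 75s before, and 900s from 8.7.0 onwards and from 7.17.8 within the 7.17 line. The function names are hypothetical, and versions before 7.x are deliberately rejected because the thread does not cover them.

```python
from typing import Tuple

Version = Tuple[int, int, int]


def parse_version(text: str) -> Version:
    """Parse a plain 'major.minor.patch' string such as '7.17.8'."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def default_start_timeout_s(version: Version) -> int:
    """Default systemd start timeout (seconds) per the versions in this thread."""
    if version[0] < 7:
        raise ValueError("only 7.x and later are covered by this thread")
    if version >= (8, 7, 0):
        return 900  # extended in 8.7.0 and kept from then on
    if (7, 17, 8) <= version < (8, 0, 0):
        return 900  # backported to the 7.17 line in 7.17.8
    return 75  # earlier 7.x and 8.0 to 8.6 releases
```

For example, an 8.6.x or 7.17.7 install still has the 75s default, whereas 8.7.0 and 7.17.8 have 900s.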
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Nov 17, 2022
elasticsearchmachine pushed a commit that referenced this pull request Nov 17, 2022
@grcevski
Contributor

+1

Labels
:Core/Infra/Core, >enhancement, Supportability, Team:Core/Infra, v7.17.8, v8.7.0

5 participants