
Handling timeout exceptions on watcher startup #90421

Merged — 4 commits merged into elastic:main on Sep 28, 2022

Conversation

@masseyke (Member)
Currently, if Watcher throws an exception during startup (for example, a TimeoutException while waiting for a refresh of .watches or .triggered_watches to complete), it gets into a state in which it will never be restarted automatically and is very difficult to start manually. This PR catches those exceptions and sets the state to STOPPED, so that Watcher attempts to start again when the next cluster change event comes through.
Closes #44981
Relates #69482
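The behavior described above can be sketched as a small state machine: if any exception escapes the startup path, reset the state to STOPPED so a later cluster change event can retry the start, rather than leaving the service stuck in STARTING forever. This is an illustrative sketch only; the class and method names (WatcherStartupSketch, refreshWatchIndices) are hypothetical and not the actual Elasticsearch Watcher code.

```java
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;

public class WatcherStartupSketch {
    enum State { STOPPED, STARTING, STARTED }

    private final AtomicReference<State> state = new AtomicReference<>(State.STOPPED);

    // Stand-in for waiting on a refresh of .watches / .triggered_watches.
    private void refreshWatchIndices(boolean failWithTimeout) throws TimeoutException {
        if (failWithTimeout) {
            throw new TimeoutException("refresh of .watches timed out");
        }
    }

    // Called on a cluster change event when Watcher should be running.
    public void start(boolean failWithTimeout) {
        if (state.compareAndSet(State.STOPPED, State.STARTING) == false) {
            return; // a start attempt is already in flight, or we are started
        }
        try {
            refreshWatchIndices(failWithTimeout);
            state.set(State.STARTED);
        } catch (Exception e) {
            // The fix: on any startup failure, fall back to STOPPED so the
            // next cluster change event can attempt the start again.
            state.set(State.STOPPED);
        }
    }

    public State state() {
        return state.get();
    }

    public static void main(String[] args) {
        WatcherStartupSketch watcher = new WatcherStartupSketch();
        watcher.start(true);   // startup fails with a TimeoutException
        System.out.println("after failed start: " + watcher.state());
        watcher.start(false);  // the retried start succeeds
        System.out.println("after retried start: " + watcher.state());
    }
}
```

The key design point is that the failure handler restores the exact state the retry path checks for; without it, a single timeout would leave the state machine permanently in STARTING, which is the stuck condition this PR fixes.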

@elasticsearchmachine (Collaborator)

Hi @masseyke, I've created a changelog YAML for you.

@masseyke (Member, Author)

@elasticmachine update branch

@masseyke masseyke marked this pull request as ready for review September 28, 2022 17:16
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-data-management (Team:Data Management)

@elasticsearchmachine elasticsearchmachine added the Team:Data Management Meta label for data/management team label Sep 28, 2022
@jakelandis (Contributor) left a comment

LGTM

@masseyke masseyke added v7.17.7 v8.5.1 auto-backport-and-merge Automatically create backport pull requests and merge when ready and removed v8.5.0 labels Sep 28, 2022
@masseyke masseyke merged commit 45a6490 into elastic:main Sep 28, 2022
@masseyke masseyke deleted the fix/watcher-hang-on-start branch September 28, 2022 18:04
masseyke added a commit to masseyke/elasticsearch that referenced this pull request Sep 28, 2022
Right now if watcher throws an exception while starting up (for example a TimeoutException
while waiting for a refresh of .watches or .triggered_watches to complete) then watcher gets
into a state where it will never be restarted automatically, and is incredibly difficult to start
manually. This PR catches those exceptions and sets the state to STOPPED so that when the
next cluster change event comes through it will attempt to start watcher again.
masseyke added a commit to masseyke/elasticsearch that referenced this pull request Sep 28, 2022
@elasticsearchmachine (Collaborator)

💚 Backport successful to branches 7.17 and 8.5

elasticsearchmachine pushed a commit that referenced this pull request Sep 28, 2022
elasticsearchmachine pushed a commit that referenced this pull request Sep 28, 2022
javanna pushed a commit to javanna/elasticsearch that referenced this pull request Oct 4, 2022
@csoulios csoulios added v8.5.0 and removed v8.5.1 labels Nov 1, 2022
Labels: auto-backport-and-merge, >bug, :Data Management/Watcher, Team:Data Management, v7.17.7, v8.5.0, v8.6.0
Projects: none yet

Development

Successfully merging this pull request may close these issues.

Watcher can get stuck while starting if there is an error while reading .watches or .triggered_watches
5 participants