
Handling timeout exceptions on watcher startup #90421

Merged — 4 commits merged into elastic:main on Sep 28, 2022

Conversation

@masseyke (Member)
Currently, if Watcher throws an exception during startup (for example, a TimeoutException while waiting for a refresh of .watches or .triggered_watches to complete), it gets into a state in which it will never be restarted automatically and is very difficult to start manually. This PR catches those exceptions and sets the state to STOPPED, so that Watcher attempts to start again when the next cluster change event comes through.
Closes #44981
Relates #69482
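The behavior described above can be sketched as a small state machine: if any exception escapes the startup path, reset the state to STOPPED so a later cluster change event can retry the start, rather than leaving the service stuck in STARTING forever. This is an illustrative sketch only; the class and method names (WatcherStartupSketch, refreshWatchIndices) are hypothetical and not the actual Elasticsearch Watcher code.

```java
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;

public class WatcherStartupSketch {
    enum State { STOPPED, STARTING, STARTED }

    private final AtomicReference<State> state = new AtomicReference<>(State.STOPPED);

    // Stand-in for waiting on a refresh of .watches / .triggered_watches.
    private void refreshWatchIndices(boolean failWithTimeout) throws TimeoutException {
        if (failWithTimeout) {
            throw new TimeoutException("refresh of .watches timed out");
        }
    }

    // Called on a cluster change event when Watcher should be running.
    public void start(boolean failWithTimeout) {
        if (state.compareAndSet(State.STOPPED, State.STARTING) == false) {
            return; // a start attempt is already in flight, or we are started
        }
        try {
            refreshWatchIndices(failWithTimeout);
            state.set(State.STARTED);
        } catch (Exception e) {
            // The fix: on any startup failure, fall back to STOPPED so the
            // next cluster change event can attempt the start again.
            state.set(State.STOPPED);
        }
    }

    public State state() {
        return state.get();
    }

    public static void main(String[] args) {
        WatcherStartupSketch watcher = new WatcherStartupSketch();
        watcher.start(true);   // startup fails with a TimeoutException
        System.out.println("after failed start: " + watcher.state());
        watcher.start(false);  // the retried start succeeds
        System.out.println("after retried start: " + watcher.state());
    }
}
```

The key design point is that the failure handler restores the exact state the retry path checks for; without it, a single timeout would leave the state machine permanently in STARTING, which is the stuck condition this PR fixes.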

@elasticsearchmachine (Collaborator)

Hi @masseyke, I've created a changelog YAML for you.

@masseyke (Member, Author)

@elasticmachine update branch

@masseyke masseyke marked this pull request as ready for review September 28, 2022 17:16
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-data-management (Team:Data Management)

@elasticsearchmachine elasticsearchmachine added the Team:Data Management Meta label for data/management team label Sep 28, 2022
@jakelandis (Contributor) left a comment

LGTM

@masseyke masseyke added v7.17.7 v8.5.1 auto-backport-and-merge Automatically create backport pull requests and merge when ready and removed v8.5.0 labels Sep 28, 2022
@masseyke masseyke merged commit 45a6490 into elastic:main Sep 28, 2022
@masseyke masseyke deleted the fix/watcher-hang-on-start branch September 28, 2022 18:04
masseyke added a commit to masseyke/elasticsearch that referenced this pull request Sep 28, 2022
Right now if watcher throws an exception while starting up (for example a TimeoutException
while waiting for a refresh of .watches or .triggered_watches to complete) then watcher gets
into a state where it will never be restarted automatically, and is incredibly difficult to start
manually. This PR catches those exceptions and sets the state to STOPPED so that when the
next cluster change event comes through it will attempt to start watcher again.
masseyke added a commit to masseyke/elasticsearch that referenced this pull request Sep 28, 2022
@elasticsearchmachine (Collaborator)

💚 Backport successful to branches 7.17 and 8.5

elasticsearchmachine pushed a commit that referenced this pull request Sep 28, 2022
elasticsearchmachine pushed a commit that referenced this pull request Sep 28, 2022
javanna pushed a commit to javanna/elasticsearch that referenced this pull request Oct 4, 2022
@csoulios csoulios added v8.5.0 and removed v8.5.1 labels Nov 1, 2022
Labels: auto-backport-and-merge, >bug, :Data Management/Watcher, Team:Data Management, v7.17.7, v8.5.0, v8.6.0
Projects: none yet

Development

Successfully merging this pull request may close these issues.

Watcher can get stuck while starting if there is an error while reading .watches or .triggered_watches
5 participants