scheduler: Host checks don't detect graceful shutdown #2182

lmars · 2015-11-26T03:27:41Z

If a flynn-host daemon is shutting down gracefully, it stops heartbeating before stopping its jobs and before closing its HTTP listener. When the scheduler receives the down event, it can therefore still get the status from the host (because the host is still listening for HTTP requests), so it marks the host as healthy again. Then, while all the jobs are stopping, the scheduler tries to restart them because it thinks the host is healthy.

I think the host should stop responding to status requests when it is shutting down, specifically before it stops heartbeating.
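To illustrate the ordering proposed above, here is a minimal sketch, not the actual flynn-host code. The `Host` type, the `ServeStatus` and `Shutdown` methods, the `statusCode` helper, and the `/host/status` path are all hypothetical names chosen for the example. The only point it makes is that the status endpoint starts failing before heartbeating stops.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// Host is a hypothetical stand-in for the flynn-host daemon.
type Host struct {
	shuttingDown int32 // set to 1 once graceful shutdown begins
}

// ServeStatus answers status checks, refusing them once shutdown has begun.
func (h *Host) ServeStatus(w http.ResponseWriter, r *http.Request) {
	if atomic.LoadInt32(&h.shuttingDown) == 1 {
		http.Error(w, "host is shutting down", http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
	fmt.Fprintln(w, "ok")
}

// Shutdown flips the flag first, then stops heartbeating, then stops jobs,
// so a status check triggered by the down event can no longer succeed.
func (h *Host) Shutdown(stopHeartbeat, stopJobs func()) {
	atomic.StoreInt32(&h.shuttingDown, 1)
	stopHeartbeat()
	stopJobs()
}

// statusCode performs an in-process status check and returns the HTTP code.
func statusCode(h *Host) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/host/status", nil) // hypothetical path
	h.ServeStatus(rec, req)
	return rec.Code
}

func main() {
	h := &Host{}
	fmt.Println(statusCode(h)) // prints 200

	h.Shutdown(
		// Simulate the scheduler checking status at the moment heartbeats stop.
		func() { fmt.Println(statusCode(h)) }, // prints 503
		func() {},
	)
}
```

With this ordering, a scheduler that rechecks the host on the down event would see a failed status check rather than a healthy one.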

lmars · 2015-11-26T03:38:37Z

FYI, due to #1922 (which I have fixed in #2171), when the host is finally marked as down, the "crashed" jobs are still in memory, so a rectify thinks we now have too many jobs (e.g. 4 discoverd jobs, 1 of them crashed, rather than 3) and kills a running job, which breaks the cluster.
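The miscount described above can be sketched as follows. This is an illustration only: `Job`, `surplus`, and the `skipCrashed` flag are hypothetical and do not reflect the scheduler's real data structures.

```go
package main

import "fmt"

// Job is a hypothetical, simplified view of a scheduler job.
type Job struct {
	ID      string
	Crashed bool
}

// surplus returns how many jobs a rectify would kill to get down to want.
// When skipCrashed is false, crashed jobs left in memory are counted as if
// they were still running, which is the miscount described in the comment.
func surplus(jobs []Job, want int, skipCrashed bool) int {
	n := 0
	for _, j := range jobs {
		if skipCrashed && j.Crashed {
			continue
		}
		n++
	}
	if n > want {
		return n - want
	}
	return 0
}

func main() {
	// 4 discoverd jobs in memory, 1 of them crashed, 3 wanted.
	jobs := []Job{
		{ID: "discoverd-1"},
		{ID: "discoverd-2"},
		{ID: "discoverd-3"},
		{ID: "discoverd-4", Crashed: true},
	}
	fmt.Println(surplus(jobs, 3, false)) // prints 1: a running job gets killed
	fmt.Println(surplus(jobs, 3, true))  // prints 0: nothing to kill
}
```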

I am noting this to make sure it is considered when testing that this issue is fixed.


lmars · 2016-02-07T04:18:24Z

I have made a potential fix in #2421 which sets shutdown=true in the host's service metadata, causing the scheduler to unfollow the host immediately on receipt of the down event, thus avoiding re-following and attempting to restart any jobs on that host.
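A rough sketch of the decision described in this comment, assuming the down event carries the host's service metadata as a string map. `HostEvent`, `Action`, and `onHostDown` are hypothetical names for illustration; only the `shutdown=true` metadata key comes from the comment itself, and the actual change is in #2421.

```go
package main

import "fmt"

// HostEvent is a hypothetical down event carrying the host's service metadata.
type HostEvent struct {
	HostID string
	Meta   map[string]string
}

// Action is what the scheduler does in response to a down event.
type Action string

const (
	Unfollow Action = "unfollow" // drop the host without re-checking it
	Recheck  Action = "recheck"  // fall back to the HTTP status check
)

// onHostDown unfollows immediately when the host has flagged a graceful
// shutdown, so it is never re-followed and its jobs are not restarted there.
func onHostDown(e HostEvent) Action {
	if e.Meta["shutdown"] == "true" {
		return Unfollow
	}
	return Recheck
}

func main() {
	graceful := HostEvent{HostID: "host1", Meta: map[string]string{"shutdown": "true"}}
	unexpected := HostEvent{HostID: "host2"}

	fmt.Println(onHostDown(graceful))   // prints unfollow
	fmt.Println(onHostDown(unexpected)) // prints recheck
}
```

Carrying the flag in the metadata means the scheduler does not depend on the timing of the host's HTTP listener closing.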

lmars added kind/bug component/host component/scheduler labels Nov 26, 2015

lmars added a commit that referenced this issue Feb 7, 2016

host,scheduler: Handle graceful host shutdown using service metadata

1542c91

Fixes #2182. Signed-off-by: Lewis Marshall <lewis@lmars.net>

lmars mentioned this issue Feb 7, 2016

Test fixes #2421

Merged

lmars closed this as completed in #2421 Feb 7, 2016
