[5.x] Fix supervisor reprovisioning #1288

PrinsFrank · 2023-06-21T13:46:22Z

When a supervisor is killed and its exit code is not 0, 2 or 13, the supervisor tries to restart by calling the reprovision method. That restart always fails, because the reprovision method puts the AddSupervisor on the queue, which results in the SupervisorCommand being executed. One of the first things that command does is call the ensureNoDuplicateSupervisors method, which checks whether a supervisor with that name already exists.

At the point of restarting, the key for that supervisor name still exists, so the duplicate check fails and reprovisioning doesn't work with the current code.
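To make the failure mode concrete, here is a minimal toy model in Python. It is not Horizon's code: the registry class and the function names are hypothetical stand-ins, and only the behaviour described above (a duplicate name check that rejects a restart while the old key is still present) is modelled.

```python
# Toy model only: names are hypothetical and do not mirror Horizon's real classes.
DUPLICATE_EXIT_CODE = 13  # exit code the description associates with a duplicate name


class SupervisorRegistry:
    """Stands in for wherever the supervisor names (the "key") are stored."""

    def __init__(self):
        self._names = set()

    def exists(self, name):
        return name in self._names

    def add(self, name):
        self._names.add(name)


def start_supervisor(registry, name):
    """Return 0 on a successful start, 13 if the name is already registered."""
    if registry.exists(name):  # the duplicate check described above
        return DUPLICATE_EXIT_CODE
    registry.add(name)
    return 0


def reprovision_without_cleanup(registry, name):
    """Restart path as described: the stale key is still present."""
    return start_supervisor(registry, name)
```

In this model, starting `supervisor-1` once succeeds, and any later call to `reprovision_without_cleanup` for the same name returns 13, which matches the pair of events reported in this PR.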

Why we were running into this issue specifically: we are running Horizon on multiple k8s pods. After a certain amount of time, we noticed a bunch of pods with only a few of the supervisors still running. With the help of #1284 we found a bunch of SuperVisorDiedEvents with exit code 137 (indicating the pod memory limit was reached), each immediately followed by the same event with exit code 13 (indicating that a supervisor with the same name already exists).
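For context, exit code 137 follows the common shell convention of 128 plus the signal number, so it corresponds to signal 9 (SIGKILL), which is what a process typically receives when it is killed for exceeding a memory limit. A small sketch of that arithmetic, using a hypothetical helper name:

```python
def signal_from_exit_code(exit_code):
    """Return the signal number implied by a 128+n exit code, else None."""
    if exit_code > 128:
        return exit_code - 128
    return None
```

Under this convention, 137 maps to signal 9, while 13 is an ordinary exit code with no signal attached.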

We have been running this fix in production: the second event with exit code 13 is no longer triggered in this scenario, and our supervisors restart gracefully after exiting with code 137.

This PR makes sure that the key is removed before attempting to restart the supervisor.
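The idea of removing the key before restarting can be sketched with a toy model in Python. This is an illustration under stated assumptions, not the actual change to SupervisorProcess.php: the set of registered names and both function names are hypothetical.

```python
# Toy model only: hypothetical names, not Horizon's actual implementation.
DUPLICATE_EXIT_CODE = 13  # exit code the description associates with a duplicate name


def start_supervisor(registered_names, name):
    """Return 0 on a successful start, 13 if the name is already registered."""
    if name in registered_names:
        return DUPLICATE_EXIT_CODE
    registered_names.add(name)
    return 0


def reprovision(registered_names, name):
    """Drop the stale key first, then start the supervisor again."""
    registered_names.discard(name)  # no error if the key is already gone
    return start_supervisor(registered_names, name)
```

With the stale entry removed first, `reprovision` returns 0 for a name that was registered before the crash, while a genuine duplicate start through `start_supervisor` is still rejected with 13.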

Also fixes the underlying issue in #1273.

PrinsFrank changed the title ~~Fix supervisor reprovisioning~~ [5.x] Fix supervisor reprovisioning Jun 21, 2023

PrinsFrank force-pushed the fix-supervisor-reprovisioning branch from 4b95069 to f68f79d Compare June 21, 2023 13:52

Fix supervisor reprovisioning

03a7adb

PrinsFrank force-pushed the fix-supervisor-reprovisioning branch from f68f79d to 03a7adb Compare June 21, 2023 13:55

Update SupervisorProcess.php

8b53dfd

taylorotwell merged commit 029cf34 into laravel:5.x Jun 21, 2023
10 checks passed
