[5.x] Fix supervisor reprovisioning #1288

Merged: 2 commits, Jun 21, 2023

PrinsFrank
Contributor

When a supervisor is killed and its exit code is not 0, 2, or 13, the supervisor tries to restart by calling the reprovision method. That restart always fails: the reprovision method puts AddSupervisor on the queue, which results in the SupervisorCommand being executed. One of the first things that command does is call the ensureNoDuplicateSupervisors method, which checks whether a supervisor with that name already exists.

At the point of restarting, the key for the killed supervisor still exists, so the duplicate check fails and reprovisioning doesn't work with the current code.
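The failure described above can be reproduced with a toy model. This is only a sketch in Python, not Horizon's PHP code: `ToyRegistry`, `start`, and `reprovision_without_cleanup` are hypothetical names standing in for the supervisor store, the duplicate check, and the restart path.

```python
class DuplicateSupervisorError(Exception):
    """Raised when a supervisor name is already registered."""


class ToyRegistry:
    """Hypothetical stand-in for the store that records supervisor names."""

    def __init__(self):
        self.names = set()

    def start(self, name):
        # Plays the role of the duplicate check: refuse to start a
        # supervisor whose name is still registered.
        if name in self.names:
            raise DuplicateSupervisorError(name)
        self.names.add(name)


def reprovision_without_cleanup(registry, name):
    # Models the behaviour before the fix: the restart is attempted
    # while the entry of the killed supervisor is still present.
    registry.start(name)


registry = ToyRegistry()
registry.start("supervisor-1")

# The supervisor is killed, but its entry is never removed.
try:
    reprovision_without_cleanup(registry, "supervisor-1")
    restarted = True
except DuplicateSupervisorError:
    restarted = False
```

In this model the restart can never succeed, because the stale entry is exactly what the duplicate check looks for.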

Why we ran into this issue specifically: we run Horizon on multiple k8s pods. After a certain amount of time, we noticed a number of pods with only a few of their supervisors running. With the help of #1284, we found a number of SuperVisorDiedEvents with exit code 137 (indicating the pod memory limit was reached), each immediately followed by the same event with exit code 13 (indicating that a supervisor with the same name already exists).
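For readers unfamiliar with exit code 137: by the common shell convention, an exit status above 128 means the process was terminated by signal number `status - 128`, so 137 corresponds to signal 9 (SIGKILL), which is what a container runtime typically sends when a memory limit is exceeded. The helper below is a small illustrative sketch; `describe_exit_code` is a hypothetical name and is not part of Horizon.

```python
import signal


def describe_exit_code(code):
    """Describe an exit status using the 128 + signal-number convention."""
    if code > 128:
        signum = code - 128
        try:
            # Signal names depend on the platform; fall back to the number.
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            return f"killed by signal {signum}"
    return f"exited with status {code}"


oom_description = describe_exit_code(137)
duplicate_description = describe_exit_code(13)
```

Exit code 13 is below 128, so it is treated as an ordinary status chosen by the program itself (here, the duplicate-name condition described above).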

We have been running this fixed code in production. The second event with status code 13 is no longer triggered in this scenario, and our supervisors now restart gracefully after exiting with code 137.

This PR makes sure that the key is removed before attempting to restart the supervisor.
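The idea behind the fix can be sketched with a toy model: remove the stale entry first, then restart. This is a Python illustration under stated assumptions, not the actual PHP change. `ToyRegistry`, `forget`, and `handle_exit` are hypothetical names, and the set of exit codes that skip the restart is taken from the description above.

```python
# Exit codes after which no restart is attempted, per the PR description.
NO_RESTART_EXIT_CODES = {0, 2, 13}


class DuplicateSupervisorError(Exception):
    """Raised when a supervisor name is already registered."""


class ToyRegistry:
    """Hypothetical stand-in for the store that records supervisor names."""

    def __init__(self):
        self.names = set()

    def start(self, name):
        # Duplicate check: refuse a name that is still registered.
        if name in self.names:
            raise DuplicateSupervisorError(name)
        self.names.add(name)

    def forget(self, name):
        # Remove the entry of a supervisor that is no longer running.
        self.names.discard(name)


def handle_exit(registry, name, exit_code):
    """Restart a dead supervisor unless its exit code rules that out."""
    if exit_code in NO_RESTART_EXIT_CODES:
        return False
    registry.forget(name)  # the step this PR adds, in spirit
    registry.start(name)   # the duplicate check now passes
    return True


registry = ToyRegistry()
registry.start("supervisor-1")
restarted_after_137 = handle_exit(registry, "supervisor-1", 137)
restarted_after_13 = handle_exit(registry, "supervisor-1", 13)
```

What should happen to the entry for exit codes 0, 2, or 13 is outside the scope of this sketch; the model simply leaves it untouched.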

Also fixes the underlying issue in #1273

@PrinsFrank PrinsFrank changed the title Fix supervisor reprovisioning [5.x] Fix supervisor reprovisioning Jun 21, 2023
@PrinsFrank PrinsFrank force-pushed the fix-supervisor-reprovisioning branch from 4b95069 to f68f79d Compare June 21, 2023 13:52
@PrinsFrank PrinsFrank force-pushed the fix-supervisor-reprovisioning branch from f68f79d to 03a7adb Compare June 21, 2023 13:55
@taylorotwell taylorotwell merged commit 029cf34 into laravel:5.x Jun 21, 2023
10 checks passed