[Rest Server] Update restart policy to avoid stuck pending pods #3856
Update restart policy to avoid stuck pending pods #3760
@@ -319,7 +319,7 @@ const generateTaskRole = (taskRole, labels, config) => {
       },
       spec: {
         privileged: false,
-        restartPolicy: 'Never',
+        restartPolicy: gangAllocation === 'true' ? 'Never' : 'OnFailure',
Could you add a comment for this? We should revert this change after we upgrade k8s to 16.2 or above.
added
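For readers following the review thread, the conditional in the diff above can be read as a small standalone helper. This is only an illustrative sketch: `selectRestartPolicy` is a hypothetical name that does not appear in the PR, and it assumes `gangAllocation` is a string flag, as the `=== 'true'` comparison in the diff suggests.

```javascript
// Hypothetical helper mirroring the conditional introduced in the diff.
// Assumption: gangAllocation is passed as the string 'true' or 'false'.
const selectRestartPolicy = (gangAllocation) => {
  // Gang-allocated task roles keep 'Never'; all other task roles use
  // 'OnFailure', which is the change this PR makes.
  return gangAllocation === 'true' ? 'Never' : 'OnFailure';
};
```

Note that under this reading any value other than the exact string `'true'` (including an undefined flag) selects `'OnFailure'`.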
Add comments
Users may not want to retry forever on failure. Could the alert automatically delete the pod when it detects such an issue? @Binyang2014
Maybe we can use an alert-manager webhook to achieve this. We could create another service that receives such alerts and takes some action. For a better experience, the admin could configure how this webhook reacts when an alert is received. For now, if we want to auto-delete the pod, I think we should find a way to let the admin know what will happen, and the admin should be able to turn the feature off if it causes other issues.
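The webhook idea in the comment above could be sketched roughly as follows. This is not code from the PR. It only shows the decision step of a receiver: given an Alertmanager webhook payload, return the pods that would be deleted, with the feature off by default so that an admin has to opt in. The alert name `PodStuckPending`, the `namespace` and `pod` labels, and the `autoDelete` switch are all assumptions made for illustration. The actual deletion call against the Kubernetes API is deliberately left out.

```javascript
// Sketch only: decide which pods an Alertmanager webhook payload refers to.
// Assumptions (not from the PR): the alert name 'PodStuckPending', the
// `namespace` and `pod` labels, and the `autoDelete` admin switch.
const DEFAULT_CONFIG = { autoDelete: false, alertName: 'PodStuckPending' };

const podsToDelete = (payload, config = DEFAULT_CONFIG) => {
  // Admins can turn the feature off; it is off unless explicitly enabled.
  if (!config.autoDelete) return [];
  const alerts = payload && Array.isArray(payload.alerts) ? payload.alerts : [];
  return alerts
    // Act only on alerts that are still firing, not on resolved ones.
    .filter((alert) => alert.status === 'firing')
    // Act only on the configured alert.
    .filter((alert) => alert.labels && alert.labels.alertname === config.alertName)
    // Skip alerts that do not identify a concrete pod.
    .filter((alert) => alert.labels.namespace && alert.labels.pod)
    .map((alert) => ({ namespace: alert.labels.namespace, pod: alert.labels.pod }));
};
```

A receiving service would call `podsToDelete` on each request body and could log the result before acting on it, which gives the admin visibility into what would happen.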