Proposal: Enable control plane deployment restart on demand #376

Closed
csrwng opened this issue Jul 23, 2021 · 4 comments · Fixed by #2150 or #3245

@csrwng

csrwng commented Jul 23, 2021

Service providers need a way to restart control plane pods on demand (see #236). When control plane pods reach a bad state, liveness probes should detect it so that the system restarts them automatically. Any bad state that is not detectable this way is a bug that should be fixed. In practice, however, there will be times when the pods need to be restarted to clear bad state.

When a deployment is restarted with kubectl rollout restart, the command adds an annotation (kubectl.kubernetes.io/restartedAt) to the deployment's pod template:
https://github.com/kubernetes/kubectl/blob/47df52af297ea787c44f3d4d7da11e7e4e0d83a8/pkg/polymorphichelpers/objectrestarter.go#L44-L52

For Hypershift, we can do something similar, in one of two ways:

  1. We allow a hypershift.openshift.io/restartedAt annotation on the HostedCluster to be propagated to the HostedControlPlane and then to each deployment's pod template in the control plane. We can then expose a command in the hypershift CLI, similar to the kubectl one, that populates this annotation: hypershift restart hostedcluster NAME -n NAMESPACE

  2. We add a field to the spec of HostedCluster that holds either a) a restartedAt value, or b) an integer that can be incremented, with each increment resulting in a new restartedAt value that we store in status.

Given that a restart timestamp is not necessarily part of the spec of a HostedCluster, my preference would be to go with option 1.
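
The propagation described in option 1 could look roughly like the sketch below. It is an assumption-laden illustration, not Hypershift code: the object type, the propagateRestart helper, and the deployment names are hypothetical, and a real controller would work with the actual HostedCluster, HostedControlPlane, and Deployment types. Only the annotation key comes from the proposal.

```go
package main

import "fmt"

// Annotation key proposed in option 1.
const restartedAtAnnotation = "hypershift.openshift.io/restartedAt"

// object is a hypothetical stand-in for any resource that carries
// annotations (HostedCluster, HostedControlPlane, or a pod template).
type object struct {
	Name        string
	Annotations map[string]string
}

// propagateRestart copies the restart annotation from src to dst and
// reports whether dst changed. An unchanged value is a no-op, so
// repeated reconciles do not trigger extra rollouts.
func propagateRestart(src, dst *object) bool {
	val, ok := src.Annotations[restartedAtAnnotation]
	if !ok {
		return false
	}
	if cur, had := dst.Annotations[restartedAtAnnotation]; had && cur == val {
		return false
	}
	if dst.Annotations == nil {
		dst.Annotations = make(map[string]string)
	}
	dst.Annotations[restartedAtAnnotation] = val
	return true
}

func main() {
	hc := &object{
		Name:        "hostedcluster",
		Annotations: map[string]string{restartedAtAnnotation: "2021-07-23T12:00:00Z"},
	}
	hcp := &object{Name: "hostedcontrolplane"}
	// Illustrative pod templates; the real set of deployments may differ.
	templates := []*object{{Name: "kube-apiserver"}, {Name: "kube-controller-manager"}}

	propagateRestart(hc, hcp)
	for _, tpl := range templates {
		if propagateRestart(hcp, tpl) {
			fmt.Printf("%s: pod template changed, rollout expected\n", tpl.Name)
		}
	}
}
```

Under this sketch, the proposed CLI command would only need to set the annotation on the HostedCluster; everything downstream follows from the copy step.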

@relyt0925

I agree; I also prefer option 1.

@relyt0925

/assign @csrwng

@relyt0925

/unassign

@ironcladlou
