If openshift-infra is in terminating state when restarting server, nothing works #3274
Comments
policy can't express denies, so you couldn't prevent a cluster admin from deleting via policy |
I think all of our controller clients will cease to work if these service accounts are deleted, no? Also, it's not possible to make a 'terminating' namespace stop terminating unless there is a way to nil out a DeletionTimestamp that I am missing. |
not all, but important ones |
I vote for option 2 as well. It might be useful to support a list of "protected" resources in the master-config. |
should probably move |
that would have helped clean things up, though it still might have taken a couple restarts |
I don't think we have a controller ensuring that our serviceaccounts are always present (I think it's only on startup), so we might want to express "don't delete these serviceaccounts" as well. |
not sure I care down to that level... a restart would fix that and that's unlikely to happen as a mass delete |
I need to modify upstream to expose Name on admission control since an object being deleted has no 'object' on input. |
As a simple fix, let's have the project command ignore deletes for a set of whitelisted project names. An admin can still delete them with the namespace REST API, but this prevents stupid stuff. |
the project API, you mean? |
also, we should move |
The API. |
Moving relevant discussion points from #4228:
|
We do have the concept of immortal namespaces in the NamespaceLifecycle admission controller. |
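To make the immortal-namespace idea concrete, here is a minimal sketch of a delete-time admission check. It is not the actual NamespaceLifecycle code: the `admit` and `delete_all` functions, the `Forbidden` exception, and the contents of `IMMORTAL_NAMESPACES` are all hypothetical names chosen for illustration. The check looks only at the resource kind and name, since a delete request has no object on input.

```python
# Hypothetical sketch: names and structure are assumptions for illustration,
# not the real NamespaceLifecycle admission controller.
IMMORTAL_NAMESPACES = frozenset({"default", "openshift-infra"})


class Forbidden(Exception):
    """Raised when the admission check rejects a request."""


def admit(operation: str, resource: str, name: str) -> None:
    """Reject DELETE requests that target an immortal namespace.

    Only the resource kind and the name are inspected, because a
    delete request carries no object body to examine.
    """
    if (
        operation == "DELETE"
        and resource == "namespaces"
        and name in IMMORTAL_NAMESPACES
    ):
        raise Forbidden(f"namespace {name!r} is immortal and cannot be deleted")


def delete_all(namespaces):
    """Simulate a bulk delete; return (deleted, rejected) name lists."""
    deleted, rejected = [], []
    for ns in namespaces:
        try:
            admit("DELETE", "namespaces", ns)
            deleted.append(ns)
        except Forbidden:
            rejected.append(ns)
    return deleted, rejected
```

With a check like this in the request path, a bulk delete would still remove ordinary projects, but the protected namespace (and the service accounts inside it) would be left alone.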
added openshift-infra to immortal namespaces list in #4318, I think we should close this |
I concur. Closing |
Long may openshift-infra live! |
I had a population with the following:
500 projects, each project has 1 rc, 3 pods, and the associated system stuff
Fat-fingered me did the following:
osc delete projects --all
This marked all the projects as terminating, but it also marked openshift-infra as terminating.
At this point, a lot of replication controllers were trying to create pods and being rejected by admission control, which was fine. BUT when the openshift-infra namespace was purged, its service accounts were deleted. The replication controller needs those service accounts to function, so this resulted in log messages saying the replication controller needs to present credentials.
I then wondered what was going on, and saw that openshift-infra was terminating. It may have eventually terminated if I had left it running, but there was a lot of client traffic contention, so I restarted openshift.
At this point, openshift had 498 projects terminating, and the openshift-infra project had no service accounts. You then get a message in the log reporting the following, and openshift fails to start:
I realize this was an operator error, but there is no way to recover, and I am strongly inclined to believe that I will not be the first operator to make this error.
I think one of the following:
@smarterclayton @liggitt @deads2k - opinions on preferred route? I vote for 2.