
If openshift-infra is in terminating state when restarting server, nothing works #3274

Closed
derekwaynecarr opened this issue Jun 17, 2015 · 18 comments
Labels: kind/bug, priority/P2

Comments

@derekwaynecarr
Member

I had a cluster populated with the following:

500 projects, each project has 1 rc, 3 pods, and the associated system stuff

I then fat-fingered the following:

osc delete projects --all

This marked all the projects as terminating, but it also marked openshift-infra as terminating.

At this point, a lot of replication controllers were trying to create pods and being rejected by admission control, which was fine. But when the openshift-infra namespace was purged, its service accounts were deleted. The replication controller needs those service accounts to function, so the log filled with messages saying the replication controller needs to present credentials.

Wondering what was going on, I saw that openshift-infra was terminating. It might eventually have finished terminating if I had left it running, but there was a lot of client traffic contention, so I restarted openshift.

At this point, openshift had 498 projects terminating, and the openshift-infra project had no service accounts. On restart, the log reports the following and openshift fails to start:

Jun 17 17:09:34 openshiftdev.local openshift[30991]: F0617 17:09:34.334335   30991 start_master.go:403] Could not get client for replication controller: Could not get token for openshift-infra/re...on-controller
Ju

I realize this was an operator error, but there is no way to recover, and I am strongly inclined to believe that I will not be the first operator to make this error.

I think we need one of the following:

  1. policy should never allow that namespace to be deleted
  2. admission control should never allow that namespace to be deleted (since it's essential to system function)
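
A minimal sketch of what option 2 could look like, assuming a simplified admission hook that receives the operation, resource, and name as plain strings. The `admitNamespaceDelete` function and the `protectedNamespaces` set are hypothetical names for illustration, not the actual admission plugin API:

```go
package main

import "fmt"

// protectedNamespaces is a hypothetical set of namespaces that must never be
// deleted. The issue only names openshift-infra as essential.
var protectedNamespaces = map[string]bool{
	"openshift-infra": true,
}

// admitNamespaceDelete rejects DELETE requests that target a protected
// namespace and admits every other request.
func admitNamespaceDelete(operation, resource, name string) error {
	if operation != "DELETE" || resource != "namespaces" {
		return nil
	}
	if protectedNamespaces[name] {
		return fmt.Errorf("namespace %q is protected and cannot be deleted", name)
	}
	return nil
}

func main() {
	for _, ns := range []string{"project-1", "openshift-infra"} {
		if err := admitNamespaceDelete("DELETE", "namespaces", ns); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("allowed:", ns)
	}
}
```

Because the check runs on every request, it would also apply to a cluster admin, which policy alone cannot guarantee.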

@smarterclayton @liggitt @deads2k - opinions on preferred route? I vote for 2.

@liggitt
Contributor

liggitt commented Jun 17, 2015

policy can't express denies, so you couldn't prevent a cluster admin from deleting via policy

@derekwaynecarr
Member Author

I think all of our controller clients will cease to work if these service accounts are deleted, no?

Also, it's not possible to make a 'terminating' namespace stop terminating unless there is a way to nil out a DeletionTimestamp that I am missing.

@liggitt
Contributor

liggitt commented Jun 17, 2015

> I think all of our controller clients will cease to work if these service accounts are deleted, no?

not all, but important ones

@deads2k
Contributor

deads2k commented Jun 17, 2015

I vote for option 2 as well. It might be useful to support a list of "protected" resources in the master-config.

@liggitt
Contributor

liggitt commented Jun 17, 2015

should probably move openshiftConfig.RunOriginNamespaceController() up to the special list of controllers that get started first

@liggitt
Contributor

liggitt commented Jun 17, 2015

that would have helped clean things up, though it still might have taken a couple restarts

@deads2k
Contributor

deads2k commented Jun 17, 2015

I don't think we have a controller ensuring that our serviceaccounts are always present (I think it's only on startup), so we might want to express: "don't delete these serviceaccounts" as well.

@liggitt
Contributor

liggitt commented Jun 17, 2015

not sure I care down to that level... a restart would fix that and that's unlikely to happen as a mass delete

@danmcp added the kind/bug and priority/P2 labels Jun 17, 2015
@derekwaynecarr
Member Author

I need to modify upstream to expose Name on admission control since an object being deleted has no 'object' on input.
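
To illustrate the gap, here is a rough sketch assuming a simplified request type. The `Attributes` and `Object` types and the `targetName` helper are hypothetical stand-ins, not the upstream admission interface. On a delete there is no submitted object, so the name has to come from the request itself:

```go
package main

import "fmt"

// Object is a stand-in for an API object submitted with a request.
type Object struct {
	Name string
}

// Attributes is a simplified, hypothetical view of an admission request.
type Attributes struct {
	Operation string
	Resource  string
	Name      string  // name of the target resource, taken from the request
	Object    *Object // nil on DELETE, because no object is submitted
}

// targetName returns the name of the resource the request acts on. It falls
// back to the request-level Name when no object is attached.
func targetName(a Attributes) string {
	if a.Object != nil {
		return a.Object.Name
	}
	return a.Name
}

func main() {
	create := Attributes{Operation: "CREATE", Resource: "namespaces", Object: &Object{Name: "demo"}}
	del := Attributes{Operation: "DELETE", Resource: "namespaces", Name: "openshift-infra"}
	fmt.Println(targetName(create))
	fmt.Println(targetName(del))
}
```

Without the `Name` field, a plugin handling the delete request would have nothing to compare against a list of protected namespaces.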


On Jun 17, 2015, at 1:43 PM, Dan McPherson notifications@github.com wrote:

Assigned #3274 to @derekwaynecarr.

@smarterclayton
Contributor

As a simple fix, let's have the project command ignore deletes for a set of whitelisted project names. An admin can still delete them with the namespace REST API, but this prevents stupid stuff.

@liggitt
Contributor

liggitt commented Aug 18, 2015

> the project command ignore deletes

the project API, you mean?

@liggitt
Contributor

liggitt commented Aug 18, 2015

also, we should move RunOriginNamespaceController() up to the special list of controllers that get started first, so it can clean things up even if the service account token fetcher has issues

@smarterclayton
Contributor

The API.


@liggitt
Contributor

liggitt commented Aug 18, 2015

Moving relevant discussion points from #4228:

  1. Move RunOriginNamespaceController() to first controller group
  2. Prevent delete project --all from deleting "default", "openshift-infra" (configurable), "kube-system" (maybe?), and others? Possible mechanisms:
    • annotation on a namespace. requires fetch before delete
    • special list of projects to disallow deleting via the API

@derekwaynecarr
Member Author

We do have the concept of immortal namespaces in the NamespaceLifecycle admission controller.

@liggitt
Contributor

liggitt commented May 10, 2016

added openshift-infra to immortal namespaces list in #4318, I think we should close this

@derekwaynecarr
Member Author

I concur. Closing

@derekwaynecarr
Member Author

Long may openshift-infra live!
