This repository has been archived by the owner on Aug 17, 2023. It is now read-only.

possible race/stall condition in operator when namespace terminating #465

Open
thoraxe opened this issue Dec 2, 2020 · 0 comments

thoraxe commented Dec 2, 2020

Scenario:

Create namespace ns1
Create a kfdef in ns1
Delete namespace ns1, which puts it into terminating status
Create namespace ns2
Create a kfdef in ns2

Result:
The operator stalls: it keeps retrying resource creation in the terminating namespace and makes no progress elsewhere. For example:

time="2020-12-02T14:34:11Z" level=warning msg="Encountered error applying application jupyterhub:  (kubeflow.error): Code 500 with message: Apply.Run : [error when creating \"/tmp/kout449770606\": secrets \"jupyterhub\" is forbidden: unable to create new content in namespace workflowsz because it is being terminated, error when creating \"/tmp/kout449770606\": configmaps \"parameters\" is forbidden: unable to create new content in namespace workflowsz because it is being terminated, error when creating \"/tmp/kout449770606\": configmaps \"spark-cluster-template\" is forbidden: unable to create new content in namespace workflowsz because it is being terminated, error when creating \"/tmp/kout449770606\": configmaps \"jupyter-singleuser-profiles\" is forbidden: unable to create new content in namespace workflowsz because it is being terminated, error when creating \"/tmp/kout449770606\": configmaps \"jupyterhub-cfg\" is forbidden: unable to create new content in namespace workflowsz because it is being termi...
time="2020-12-02T14:34:11Z" level=warning msg="Will retry in 19 seconds."
time="2020-12-02T14:34:31Z" level=warning msg="Encountered error applying application jupyterhub:  (kubeflow.error): Code 500 with message: Apply.Run : [error when creating \"/tmp/kout516991221\": secrets \"jupyterhub\" is forbidden: unable to create new content in namespace workflowsz because it is being terminated, error when creating \"/tmp/kout516991221\": configmaps \"parameters\" is forbidden: unable to create new content in namespace workflowsz because it is being terminated, error when creating \"/tmp/kout516991221\": configmaps \"spark-cluster-template\" is forbidden: unable to create new content in namespace workflowsz because it is being terminated, error when creating \"/tmp/kout516991221\": configmaps \"jupyter-singleuser-profiles\" is forbidden: unable to create new content in namespace workflowsz because it is being terminated, error when creating \"/tmp/kout516991221\": configmaps \"jupyterhub-cfg\" is forbidden: unable to create new content in namespace workflowsz because it is being termi...
time="2020-12-02T14:34:31Z" level=warning msg="Will retry in 38 seconds."

For whatever reason, the kfdef hangs around in ns1. As soon as I deleted the kfdef from ns1, the operator got unblocked, happily deleted the resources, and then finally created the resources in ns2.

In other words, the operator was stuck waiting for ns1/kfdef to reconcile successfully before it would go on to create the resources in ns2.

Suggestion:
If the operator notices that a namespace is in the Terminating state, it should delete the kfdef in that namespace, which would then cause the operator to delete the associated resources, and so on.
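
To make the first suggestion concrete, here is a minimal sketch of the decision it describes. It is written in Python purely for illustration and is not the operator's actual code; `Namespace`, `Action`, and `decide_action` are hypothetical names. The only Kubernetes fact it relies on is that a namespace reports either "Active" or "Terminating" in `status.phase`.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    """Hypothetical outcomes of looking at a kfdef's namespace."""
    APPLY = "apply"
    DELETE_KFDEF = "delete-kfdef"


@dataclass(frozen=True)
class Namespace:
    name: str
    phase: str  # Kubernetes reports "Active" or "Terminating" in status.phase


def decide_action(namespace: Namespace) -> Action:
    """Pick what to do with a kfdef based on its namespace's phase."""
    if namespace.phase == "Terminating":
        # Creating content here would be rejected by the API server
        # (see the "forbidden ... being terminated" errors above),
        # so clean up the kfdef instead of retrying the apply.
        return Action.DELETE_KFDEF
    return Action.APPLY
```

With this check in front of the apply step, a kfdef in ns1 would be routed to deletion as soon as ns1 starts terminating, instead of entering the retry loop shown in the log.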

OR

Don't get stuck/hung up on errors in one namespace preventing other namespaces from working.
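
The second suggestion amounts to backing off per kfdef rather than blocking the whole operator. A rough sketch of that idea, again in Python and under stated assumptions: `run_queue`, its parameters, and the `namespace/kfdef` keys are hypothetical, the clock is simulated, and the 19-second base delay with doubling simply mirrors the "Will retry in 19 seconds" / "38 seconds" lines in the log above.

```python
import heapq
from typing import Callable, Dict, List, Tuple


def run_queue(
    keys: List[str],
    reconcile: Callable[[str], bool],
    base_delay: float = 19.0,
    max_attempts: int = 3,
) -> List[Tuple[float, str, bool]]:
    """Reconcile each key on a simulated clock, backing off per key.

    Returns (time, key, succeeded) for every attempt, in the order made.
    """
    # Heap entries are (ready_time, sequence, key); the sequence number
    # keeps first-in-first-out order when two keys are ready at once.
    heap = [(0.0, i, key) for i, key in enumerate(keys)]
    heapq.heapify(heap)
    seq = len(keys)
    attempts: Dict[str, int] = {}
    history: List[Tuple[float, str, bool]] = []

    while heap:
        now, _, key = heapq.heappop(heap)
        ok = reconcile(key)
        history.append((now, key, ok))
        if ok:
            continue
        # Only the failing key is delayed; other keys stay schedulable.
        attempts[key] = attempts.get(key, 0) + 1
        if attempts[key] < max_attempts:
            delay = base_delay * 2 ** (attempts[key] - 1)
            heapq.heappush(heap, (now + delay, seq, key))
            seq += 1
    return history
```

For example, with a `reconcile` callback that always fails for `"ns1/kfdef"` and succeeds for `"ns2/kfdef"`, the ns2 kfdef is reconciled at time 0 while the ns1 kfdef is retried at 19 and 57 seconds, so the failure in ns1 never delays ns2.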

@kubeflow-bot kubeflow-bot added this to To Do in Needs Triage Dec 2, 2020