
Jupyter-role error applying kubeflow-core component with ksonnet #1353

Closed

IMBurbank opened this issue Aug 11, 2018 · 14 comments

@IMBurbank
Contributor

IMBurbank commented Aug 11, 2018

Hi,

I've been trying different Kubeflow tutorials for over a week, just trying to get anything working so I can upgrade the data pipeline and model from there.

I'm currently trying this tutorial on Google Cloud and keep getting the following error:

bassmanburbank@mnist-kubeflow:~/ksonnet-kubeflow-v2$ ks apply cloud -c kubeflow-core
INFO Applying configmaps default.kubeflow-version
INFO Creating non-existent configmaps default.kubeflow-version
INFO Applying services default.tf-hub-0
INFO Creating non-existent services default.tf-hub-0
INFO Applying services default.tf-hub-lb
INFO Creating non-existent services default.tf-hub-lb
INFO Applying clusterrolebindings default.centraldashboard
INFO Creating non-existent clusterrolebindings default.centraldashboard
INFO Applying roles default.jupyter-role
INFO Creating non-existent roles default.jupyter-role
ERROR handle object: creating object: creating object: roles.rbac.authorization.k8s.io "jupyter-role" is forbidden: attempt to grant extra privileges: [PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["get"]} PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["watch"]} PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["list"]} PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["create"]} PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["delete"]} PolicyRule{Resources:["persistentvolumeclaims"], APIGroups:[""], Verbs:["get"]} PolicyRule{Resources:["persistentvolumeclaims"], APIGroups:[""], Verbs:["watch"]} PolicyRule{Resources:["persistentvolumeclaims"], APIGroups:[""], Verbs:["list"]} PolicyRule{Resources:["persistentvolumeclaims"], APIGroups:[""], Verbs:["create"]} PolicyRule{Resources:["persistentvolumeclaims"], APIGroups:[""], Verbs:["delete"]} PolicyRule{Resources:["events"], APIGroups:[""], Verbs:["get"]} PolicyRule{Resources:["events"], APIGroups:[""], Verbs:["watch"]} PolicyRule{Resources:["events"], APIGroups:[""], Verbs:["list"]}] user=&{BassManBurbank@gmail.com  [system:authenticated] map[user-assertion.cloud.google.com:[AL7tVDfQsftDTy7sio55iXjLZCVRS+HSmDXfHSj6FbslEgeYPdLq2WO068Zoiacsx4Qnba6tyiKtkOerSwXooKO1nmpew8mggowrM7Ugj1TjNKSsUd8ZPBAiy3n7b0I5ImaPefHaRxSEbgeeORgJ4t52npcHBK0q3UYKKnuTyxk5lDarRt7H9OydcATAMGYSQNjnSdLIdkKh9DY4Fw7fYTJTnykeR7sxwa19BnwdpXqqLw==]]} ownerrules=[PolicyRule{Resources:["selfsubjectaccessreviews" "selfsubjectrulesreviews"], APIGroups:["authorization.k8s.io"], Verbs:["create"]} PolicyRule{NonResourceURLs:["/api" "/api/*" "/apis" "/apis/*" "/healthz" "/swagger-2.0.0.pb-v1" "/swagger.json" "/swaggerapi" "/swaggerapi/*" "/version"], Verbs:["get"]}] ruleResolutionErrors=[]

All issues I've found here recommend the following commands:
gcloud container clusters get-credentials $KUBENAME --zone $KUBEZONE

kubectl create clusterrolebinding default-admin \
  --clusterrole=cluster-admin --user=$(gcloud config get-value account)

But these commands are included in the tutorial and haven't helped me. I've wiped out this project and retried several times. I'm at a loss...

My only divergence from the tutorial is that I tried newer versions of the packages, since the versions in the tutorial seemed quite old.

Versions I used:
Kubernetes: (GKE Default) v1.9.7
Ksonnet: ks_0.12.0_linux_amd64
Kubeflow-core: v0.2.2
Kubeflow-TFserving: v0.2.2
TF-Job: NONE (appears to now be in Kubeflow-core)

Any help regarding this issue would be deeply appreciated!

@IMBurbank IMBurbank changed the title Kubeflow Startup Error [ERROR handle object: creating object: creating object: roles.rbac.authorization.k8s.io "jupyter-role" is forbidden] Kubeflow-core Error [ERROR handle object: creating object: creating object: roles.rbac.authorization.k8s.io "jupyter-role" is forbidden] Aug 11, 2018
@IMBurbank
Contributor Author

IMBurbank commented Aug 13, 2018

I have now tried this from scratch using the following version combinations and received the same error:

Versions:
Kubernetes: (GKE Default) v1.9.7
Ksonnet: ks_0.9.2_linux_amd64
Kubeflow-core: v0.2.2
Kubeflow-TFserving: v0.2.2
TF-Job: NONE (appears to now be in Kubeflow-core)

Versions (default versions from tutorial):
Kubernetes: (GKE Default) v1.9.7
Ksonnet: ks_0.9.2_linux_amd64
Kubeflow-core: v0.1.0-rc.0
Kubeflow-TFserving: v0.1.0-rc.0
TF-Job: v0.1.0-rc.0

@IMBurbank IMBurbank changed the title Kubeflow-core Error [ERROR handle object: creating object: creating object: roles.rbac.authorization.k8s.io "jupyter-role" is forbidden] Error Applying Kubeflow-Core Component with KSonnet Aug 13, 2018
@IMBurbank IMBurbank changed the title Error Applying Kubeflow-Core Component with KSonnet Jupyter-Role Error Applying Kubeflow-Core Component with KSonnet Aug 13, 2018
@IMBurbank IMBurbank changed the title Jupyter-Role Error Applying Kubeflow-Core Component with KSonnet Jupyter-role error applying kubeflow-core component with ksonnet Aug 13, 2018
@IMBurbank
Contributor Author

IMBurbank commented Aug 13, 2018

Digging through this repo as best I could, I figured that maybe the problem was that I needed to use ksonnet v0.11.0 instead of the v0.9.2 used in the tutorial or the current v0.12.0 release.

So... I started from scratch and tried this tutorial again using KS_VER=ks_0.11.0_linux_amd64

This time when I tried to apply kubeflow-core, it froze for over a minute before the process was killed. When I tried again, the same jupyter-role error came back.

bassmanburbank@mnist-kubeflow:~/kubeflow-introduction/ksonnet-kubeflow$ ks env add cloud
INFO Using context "gke_mnist-kubeflow_us-central1-a_kubeflow-codelab" from kubeconfig file "/home/bassmanburbank/.kube/config"
INFO Creating environment "cloud" with namespace "default", pointing to cluster at address "https://35.226.237.64"
bassmanburbank@mnist-kubeflow:~/kubeflow-introduction/ksonnet-kubeflow$ VERSION=v0.2.2
bassmanburbank@mnist-kubeflow:~/kubeflow-introduction/ksonnet-kubeflow$ ks registry add kubeflow github.com/kubeflow/kubeflow/tree/${VERSION}/kubeflow
bassmanburbank@mnist-kubeflow:~/kubeflow-introduction/ksonnet-kubeflow$ ks pkg install kubeflow/core@${VERSION}
INFO Retrieved 33 files
bassmanburbank@mnist-kubeflow:~/kubeflow-introduction/ksonnet-kubeflow$ ks pkg install kubeflow/tf-serving@${VERSION}
INFO Retrieved 5 files
bassmanburbank@mnist-kubeflow:~/kubeflow-introduction/ksonnet-kubeflow$ ks generate core kubeflow-core --name=kubeflow-core --cloud=gke
INFO Writing component at '/home/bassmanburbank/kubeflow-introduction/ksonnet-kubeflow/components/kubeflow-core.jsonnet'
bassmanburbank@mnist-kubeflow:~/kubeflow-introduction/ksonnet-kubeflow$ ks apply cloud -c kubeflow-core
Killed
bassmanburbank@mnist-kubeflow:~/kubeflow-introduction/ksonnet-kubeflow$ ks apply cloud -c kubeflow-core
INFO Updating configmaps default.kubeflow-version
INFO Updating services default.tf-hub-0
INFO Updating services default.tf-hub-lb
INFO Updating clusterrolebindings default.centraldashboard
INFO Updating roles default.jupyter-role
INFO  Creating non-existent roles default.jupyter-role
ERROR handle object: can't update roles default.jupyter-role: roles.rbac.authorization.k8s.io "jupyter-role" is forbidden: attempt to grant extra privileges: [PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["get"]} PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["watch"]} PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["list"]} PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["create"]} PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["delete"]} PolicyRule{Resources:["persistentvolumeclaims"], APIGroups:[""], Verbs:["get"]} PolicyRule{Resources:["persistentvolumeclaims"], APIGroups:[""], Verbs:["watch"]} PolicyRule{Resources:["persistentvolumeclaims"], APIGroups:[""], Verbs:["list"]} PolicyRule{Resources:["persistentvolumeclaims"], APIGroups:[""], Verbs:["create"]} PolicyRule{Resources:["persistentvolumeclaims"], APIGroups:[""], Verbs:["delete"]} PolicyRule{Resources:["events"], APIGroups:[""], Verbs:["get"]} PolicyRule{Resources:["events"], APIGroups:[""], Verbs:["watch"]} PolicyRule{Resources:["events"], APIGroups:[""], Verbs:["list"]}] user=&{BassManBurbank@gmail.com  [system:authenticated] map[user-assertion.cloud.google.com:[AL7tVDfqotXq7gjdNEb85td5BUjR8wpwyGtlUBrRpCs+dZI5IiJVKDkS7x2tRme1JU6CS1dqSoIKyJScAzq482gBbE2SrxphPufhwsY7LO/Vn6YQT23DBY5G53wSvF5GkogFMj96l3lG4VgTvevqyi5jIy0Zeeq6vkRWYcVzYXcVhFAe5YRIlhv0KRU6d4fea1IkHR1qnFVeXFuB3uiPqbDSkOJ7bPto7a8bFvSyVnbD4g==]]} ownerrules=[PolicyRule{Resources:["selfsubjectaccessreviews" "selfsubjectrulesreviews"], APIGroups:["authorization.k8s.io"], Verbs:["create"]} PolicyRule{NonResourceURLs:["/api" "/api/*" "/apis" "/apis/*" "/healthz" "/swagger-2.0.0.pb-v1" "/swagger.json" "/swaggerapi" "/swaggerapi/*" "/version"], Verbs:["get"]}] ruleResolutionErrors=[]

Versions:
Kubernetes: (GKE Default) v1.9.7
Ksonnet: ks_0.11.0_linux_amd64
Kubeflow-core: v0.2.2
Kubeflow-TFserving: v0.2.2
TF-Job: NONE (appears to now be in Kubeflow-core)

I am at an absolute loss, and all my machine learning work is for naught until I can actually deploy to a production environment.

Please help.

@jlewi
Contributor

jlewi commented Aug 13, 2018

See here:
https://www.kubeflow.org/docs/guides/troubleshooting/#rbac-clusters

You need to create an RBAC role to grant the user running the deployment sufficient permission to create the resources.
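A quick way to confirm the permission gap is kubectl's built-in authorization check, which reports yes/no for the credentials kubectl is currently using (a sketch; adjust the namespace to wherever you are deploying):

kubectl auth can-i create roles.rbac.authorization.k8s.io --namespace default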

@IMBurbank
Contributor Author

Hi,

Thank you for the response.

I tried gleaning everything I could from the troubleshooting section; the only suggestion I saw there was to create an appropriate cluster role binding using the command:

kubectl create clusterrolebinding default-admin --clusterrole=cluster-admin --user=your-user@acme.com

Every time I've tried again from scratch, I've used the command:

kubectl create clusterrolebinding default-admin \
  --clusterrole=cluster-admin --user=$(gcloud config get-value account)

to include my user account.

When you say that I need to create an RBAC role, are you referring to a step other than the one I pasted above?

I read through the GCloud RBAC section but didn't get any further in understanding how to resolve this RBAC issue. It seems that, starting from a clean slate, there isn't anything else I'm supposed to do. So I'm obviously missing something, and yet I'm at a complete loss...

@jlewi
Contributor

jlewi commented Aug 13, 2018

When you run deploy.sh, this should create a bunch of directories, e.g.
${DEPLOYMENT}_deployment_manager_configs
${DEPLOYMENT}_ks_app

At this point your GKE cluster should exist, and you have your Kubeflow ksonnet app in ${DEPLOYMENT}_ks_app.

If you're having trouble deploying the ksonnet app due to the RBAC issue, create an RBAC binding as mentioned above and then update the ksonnet app.

e.g.

kubectl create clusterrolebinding default-admin \
  --clusterrole=cluster-admin --user=$(gcloud config get-value account)
cd ${DEPLOYMENT}_ks_app
ks apply default

If you look at the original RBAC error, you can see the user it is using:

user=&{BassManBurbank@gmail.com

So make sure you are creating an RBAC role for that account.

@IMBurbank
Contributor Author

IMBurbank commented Aug 13, 2018

I don't run a deploy.sh script in this tutorial (that I know of), and don't understand how it would be incorporated.

That user account is the correct one, and I'm running the command
kubectl create clusterrolebinding default-admin --clusterrole=cluster-admin --user=$(gcloud config get-value account)
every time I've tried the tutorial.

However, the ks apply default command isn't mentioned in the tutorial, so I tried again, just from the point where I initialize the KSonnet app. I got the same error as before (output below).

I still don't fully understand if you're implying that I should be doing something more than kubectl create clusterrolebinding default-admin --clusterrole=cluster-admin --user=$(gcloud config get-value account) to create an RBAC role for this account.

When I tried ks apply default right after entering the initialized ksonnet directory, nothing seemed to happen. After installing kubeflow packages and failing to apply the core, I tried ks apply default again, and got the same error.

Output for this attempt:

bassmanburbank@mnist-kubeflow:~/kubeflow-introduction$ ls
CONTRIBUTING.md  initialized-ksonnet-kubeflow.tar.gz  ks_0.11.0_linux_amd64  ks_0.11.0_linux_amd64.tar.gz  ksonnet-kubeflow  LICENSE  README.md  tensorflow-model  web-ui
bassmanburbank@mnist-kubeflow:~/kubeflow-introduction$ rm -r ksonnet-kubeflow/
bassmanburbank@mnist-kubeflow:~/kubeflow-introduction$ echo $KS_VER
ks_0.11.0_linux_amd64
bassmanburbank@mnist-kubeflow:~/kubeflow-introduction$ echo $VERSION
v0.2.2
bassmanburbank@mnist-kubeflow:~/kubeflow-introduction$ ls
CONTRIBUTING.md  initialized-ksonnet-kubeflow.tar.gz  ks_0.11.0_linux_amd64  ks_0.11.0_linux_amd64.tar.gz  LICENSE  README.md  tensorflow-model  web-ui
bassmanburbank@mnist-kubeflow:~/kubeflow-introduction$ gcloud container clusters get-credentials kubeflow-codelab --zone us-central1-a
Fetching cluster endpoint and auth data.
kubeconfig entry generated for kubeflow-codelab.
bassmanburbank@mnist-kubeflow:~/kubeflow-introduction$ kubectl create clusterrolebinding default-admin \
>       --clusterrole=cluster-admin --user=$(gcloud config get-value account)
Your active configuration is: [cloudshell-3442]
Error from server (AlreadyExists): clusterrolebindings.rbac.authorization.k8s.io "default-admin" already exists
bassmanburbank@mnist-kubeflow:~/kubeflow-introduction$ ks init ksonnet-kubeflow
INFO Using context "gke_mnist-kubeflow_us-central1-a_kubeflow-codelab" from kubeconfig file "/home/bassmanburbank/.kube/config"
INFO Creating environment "default" with namespace "default", pointing to cluster at address "https://35.226.237.64"
INFO Generating ksonnet-lib data at path '/home/bassmanburbank/kubeflow-introduction/ksonnet-kubeflow/lib/v1.9.7'
bassmanburbank@mnist-kubeflow:~/kubeflow-introduction$ cd ksonnet-kubeflow/
bassmanburbank@mnist-kubeflow:~/kubeflow-introduction/ksonnet-kubeflow$ ks env add cloud
INFO Using context "gke_mnist-kubeflow_us-central1-a_kubeflow-codelab" from kubeconfig file "/home/bassmanburbank/.kube/config"
INFO Creating environment "cloud" with namespace "default", pointing to cluster at address "https://35.226.237.64"
bassmanburbank@mnist-kubeflow:~/kubeflow-introduction/ksonnet-kubeflow$ ks apply default
bassmanburbank@mnist-kubeflow:~/kubeflow-introduction/ksonnet-kubeflow$ ks registry add kubeflow github.com/kubeflow/kubeflow/tree/${VERSION}/kubeflow
bassmanburbank@mnist-kubeflow:~/kubeflow-introduction/ksonnet-kubeflow$ ks pkg install kubeflow/core@${VERSION}
INFO Retrieved 33 files
bassmanburbank@mnist-kubeflow:~/kubeflow-introduction/ksonnet-kubeflow$ ks pkg install kubeflow/tf-serving@${VERSION}
INFO Retrieved 5 files
bassmanburbank@mnist-kubeflow:~/kubeflow-introduction/ksonnet-kubeflow$ ks generate core kubeflow-core --name=kubeflow-core --cloud=gke
INFO Writing component at '/home/bassmanburbank/kubeflow-introduction/ksonnet-kubeflow/components/kubeflow-core.jsonnet'
bassmanburbank@mnist-kubeflow:~/kubeflow-introduction/ksonnet-kubeflow$ ks apply cloud -c kubeflow-core
INFO Updating configmaps default.kubeflow-version
INFO Updating services default.tf-hub-0
INFO Updating services default.tf-hub-lb
INFO Updating clusterrolebindings default.centraldashboard
INFO Updating roles default.jupyter-role
INFO  Creating non-existent roles default.jupyter-role
ERROR handle object: can't update roles default.jupyter-role: roles.rbac.authorization.k8s.io "jupyter-role" is forbidden: attempt to grant extra privileges: [PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["get"]} PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["watch"]} PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["list"]} PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["create"]} PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["delete"]} PolicyRule{Resources:["persistentvolumeclaims"], APIGroups:[""], Verbs:["get"]} PolicyRule{Resources:["persistentvolumeclaims"], APIGroups:[""], Verbs:["watch"]} PolicyRule{Resources:["persistentvolumeclaims"], APIGroups:[""], Verbs:["list"]} PolicyRule{Resources:["persistentvolumeclaims"], APIGroups:[""], Verbs:["create"]} PolicyRule{Resources:["persistentvolumeclaims"], APIGroups:[""], Verbs:["delete"]} PolicyRule{Resources:["events"], APIGroups:[""], Verbs:["get"]} PolicyRule{Resources:["events"], APIGroups:[""], Verbs:["watch"]} PolicyRule{Resources:["events"], APIGroups:[""], Verbs:["list"]}] user=&{BassManBurbank@gmail.com  [system:authenticated] map[user-assertion.cloud.google.com:[AL7tVDdUmliL3H50Ol6TaUXV0Iq2QTkM2kSvGCZQxFVeoaORs+zR3NHUfkH+Asr2Zuu+ZWlVy4N8pMysWLx/BWTH0vfIj/Tf97gfCJo4abOzS6Y/1PxEJUYgaU58woUCX92PmTfCqIBWijxCogdfbPKc6g8F4iR6LWTUeSjcMTOx7Ptr9Gg/HxecdLMAv9VjLh1DRLqeKxR0Nz666TyChLVIs9+k1CAdUNktuz02O36bRA==]]} ownerrules=[PolicyRule{Resources:["selfsubjectaccessreviews" "selfsubjectrulesreviews"], APIGroups:["authorization.k8s.io"], Verbs:["create"]} PolicyRule{NonResourceURLs:["/api" "/api/*" "/apis" "/apis/*" "/healthz" "/swagger-2.0.0.pb-v1" "/swagger.json" "/swaggerapi" "/swaggerapi/*" "/version"], Verbs:["get"]}] ruleResolutionErrors=[]
bassmanburbank@mnist-kubeflow:~/kubeflow-introduction/ksonnet-kubeflow$ ks apply default
INFO Updating configmaps default.kubeflow-version
INFO Updating services default.tf-hub-0
INFO Updating services default.tf-hub-lb
INFO Updating clusterrolebindings default.centraldashboard
INFO Updating roles default.jupyter-role
INFO  Creating non-existent roles default.jupyter-role
ERROR handle object: can't update roles default.jupyter-role: roles.rbac.authorization.k8s.io "jupyter-role" is forbidden: attempt to grant extra privileges: [PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["get"]} PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["watch"]} PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["list"]} PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["create"]} PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["delete"]} PolicyRule{Resources:["persistentvolumeclaims"], APIGroups:[""], Verbs:["get"]} PolicyRule{Resources:["persistentvolumeclaims"], APIGroups:[""], Verbs:["watch"]} PolicyRule{Resources:["persistentvolumeclaims"], APIGroups:[""], Verbs:["list"]} PolicyRule{Resources:["persistentvolumeclaims"], APIGroups:[""], Verbs:["create"]} PolicyRule{Resources:["persistentvolumeclaims"], APIGroups:[""], Verbs:["delete"]} PolicyRule{Resources:["events"], APIGroups:[""], Verbs:["get"]} PolicyRule{Resources:["events"], APIGroups:[""], Verbs:["watch"]} PolicyRule{Resources:["events"], APIGroups:[""], Verbs:["list"]}] user=&{BassManBurbank@gmail.com  [system:authenticated] map[user-assertion.cloud.google.com:[AL7tVDc7cmzo/rROM4W4gTUlQFAUO86vo0yhSUtxNUwAhlsH/SVgtpO4uTeXSKmh7qxAnI4urYXyq18QBpCgBJMEJeuO3msRuWLzLVWRjVTM9Da2BLuz8RcH6njLzfpD4QmbCOkiz7rom9jEXqVgJ9v5JkodtgpWcizxGa0N5UOiNUBTAFl3/PSSeXFDTJv0tlQm2m9YEQnVEUha3i0olmJiSbzyWu/WGaavAldvlXv6nQ==]]} ownerrules=[PolicyRule{Resources:["selfsubjectaccessreviews" "selfsubjectrulesreviews"], APIGroups:["authorization.k8s.io"], Verbs:["create"]} PolicyRule{NonResourceURLs:["/api" "/api/*" "/apis" "/apis/*" "/healthz" "/swagger-2.0.0.pb-v1" "/swagger.json" "/swaggerapi" "/swaggerapi/*" "/version"], Verbs:["get"]}] ruleResolutionErrors=[]

If there are any other working tutorials I could complete from my default GC platform, I'll abandon this particular one in a second. If there's something important that I'm not reading or am misreading, let me know and I'll happily start again from the beginning.

For now, I'll nuke my Google shell environment and try a few more combinations of app versions and setup scripts (as often as I can within GitHub API rate-limits) to try to get a basic working environment.

@jlewi
Contributor

jlewi commented Aug 13, 2018

You're getting an RBAC error. The way to fix this is to create an appropriate cluster role binding. We need to figure out why you don't have the appropriate RBAC permissions to create the JupyterHub role.

The output shows your attempt to create the cluster role binding failed:

kubectl create clusterrolebinding default-admin \
>       --clusterrole=cluster-admin --user=$(gcloud config get-value account)
Your active configuration is: [cloudshell-3442]
Error from server (AlreadyExists): clusterrolebindings.rbac.authorization.k8s.io "default-admin" already exists

I would suggest running the command

 kubectl get clusterrolebinding -o yaml default-admin

to see whether the clusterrolebinding is actually bound to the user BassManBurbank@gmail.com.
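If the full YAML is noisy, a jsonpath query can print just the subjects of the binding (a sketch; the field names follow the rbac.authorization.k8s.io/v1 output shown later in this thread):

kubectl get clusterrolebinding default-admin -o jsonpath='{range .subjects[*]}{.kind}{"\t"}{.name}{"\n"}{end}'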

Can you also run the following command and paste the output into the issue?

gcloud config list

@IMBurbank
Contributor Author

IMBurbank commented Aug 13, 2018

I believe I got that error because this was the first time I hadn't started from a fresh shell environment and Kubernetes cluster. I had previously executed the command to set my user as the default-admin.

My previous cloud shell session timed out. I reconnected and here's the config:

bassmanburbank@mnist-kubeflow:~$ gcloud config list
[component_manager]
disable_update_check = True
[compute]
gce_metadata_read_timeout_sec = 5
[core]
account = bassmanburbank@gmail.com
disable_usage_reporting = False
project = mnist-kubeflow
[metrics]
environment = devshell

Your active configuration is: [cloudshell-5523]

And the clusterrolebinding output:

bassmanburbank@mnist-kubeflow:~$ kubectl get clusterrolebinding -o yaml default-admin
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  creationTimestamp: 2018-08-13T00:44:53Z
  name: default-admin
  resourceVersion: "622"
  selfLink: /apis/rbac.authorization.k8s.io/v1/clusterrolebindings/default-admin
  uid: 17c30947-9e92-11e8-bd5a-42010a800248
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: bassmanburbank@gmail.com

@jlewi
Contributor

jlewi commented Aug 13, 2018

That looks correct.

Do you still have your ksonnet application somewhere? i.e. the directory

~/kubeflow-introduction/ksonnet-kubeflow

Let's confirm that ksonnet is pointing to the correct K8s cluster. What is the output of:

cd ~/kubeflow-introduction/ksonnet-kubeflow
ks env list

and what is the output of:

kubectl cluster-info

Are they both using the same Kubernetes master?
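If the cluster-info banner is hard to eyeball, this prints just the API server address kubectl is using, which should match the SERVER column from ks env list (a sketch; --minify limits the output to the current kubeconfig context):

kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'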

Can you also run

cd ~/kubeflow-introduction/ksonnet-kubeflow
ks show default

and share the output.

@jlewi
Contributor

jlewi commented Aug 13, 2018

BTW, if you join Slack it might be easier to debug this in chat:
https://kubeflow.slack.com/

@IMBurbank
Contributor Author

IMBurbank commented Aug 13, 2018

I found the link to get a Slack invite in the Community section of the Kubeflow docs and entered my email. I apologize for my ignorance. I'll join as soon as the invite comes through.

In the meantime, here's the output:

bassmanburbank@mnist-kubeflow:~$ cd ~/kubeflow-introduction/ksonnet-kubeflow
bassmanburbank@mnist-kubeflow:~/kubeflow-introduction/ksonnet-kubeflow$ ks env list
NAME    OVERRIDE KUBERNETES-VERSION NAMESPACE SERVER
====    ======== ================== ========= ======
cloud            v1.9.7             default   https://35.226.237.64
default          v1.9.7             default   https://35.226.237.64
bassmanburbank@mnist-kubeflow:~/kubeflow-introduction/ksonnet-kubeflow$ kubectl cluster-info
Kubernetes master is running at https://35.226.237.64
GLBCDefaultBackend is running at https://35.226.237.64/api/v1/namespaces/kube-system/services/default-http-backend:http/proxy
Heapster is running at https://35.226.237.64/api/v1/namespaces/kube-system/services/heapster/proxy
KubeDNS is running at https://35.226.237.64/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
kubernetes-dashboard is running at https://35.226.237.64/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy
Metrics-server is running at https://35.226.237.64/api/v1/namespaces/kube-system/services/https:metrics-server:/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
bassmanburbank@mnist-kubeflow:~/kubeflow-introduction/ksonnet-kubeflow$ cd ~/kubeflow-introduction/ksonnet-kubeflow
bassmanburbank@mnist-kubeflow:~/kubeflow-introduction/ksonnet-kubeflow$ ks show default
---
apiVersion: v1
data:
  jupyterhub_config.py: |
    import json
    import os
    from kubespawner.spawner import KubeSpawner
    from jhub_remote_user_authenticator.remote_user_auth import RemoteUserAuthenticator
    from oauthenticator.github import GitHubOAuthenticator


    class KubeFormSpawner(KubeSpawner):

        # relies on HTML5 for image datalist
        def _options_form_default(self):
            global registry, repoName
            return '''
        <label for='image'>Image</label>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
        <input list="image" name="image" placeholder='repo/image:tag'>
        <datalist id="image">
          <option value="{0}/{1}/tensorflow-1.4.1-notebook-cpu:v0.2.1">
          <option value="{0}/{1}/tensorflow-1.4.1-notebook-gpu:v0.2.1">
          <option value="{0}/{1}/tensorflow-1.5.1-notebook-cpu:v0.2.1">
          <option value="{0}/{1}/tensorflow-1.5.1-notebook-gpu:v0.2.1">
          <option value="{0}/{1}/tensorflow-1.6.0-notebook-cpu:v0.2.1">
          <option value="{0}/{1}/tensorflow-1.6.0-notebook-gpu:v0.2.1">
          <option value="{0}/{1}/tensorflow-1.7.0-notebook-cpu:v0.2.1">
          <option value="{0}/{1}/tensorflow-1.7.0-notebook-gpu:v0.2.1">
          <option value="{0}/{1}/tensorflow-1.8.0-notebook-cpu:v0.2.1">
          <option value="{0}/{1}/tensorflow-1.8.0-notebook-gpu:v0.2.1">
        </datalist>
        <br/><br/>

        <label for='cpu_guarantee'>CPU</label>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
        <input name='cpu_guarantee' placeholder='200m, 1.0, 2.5, etc'></input>
        <br/><br/>

        <label for='mem_guarantee'>Memory</label>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
        <input name='mem_guarantee' placeholder='100Mi, 1.5Gi'></input>
        <br/><br/>

        <label for='extra_resource_limits'>Extra Resource Limits</label>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
        <input name='extra_resource_limits' placeholder='{{&quot;nvidia.com/gpu&quot;: 3}}'></input>
        <br/><br/>
        '''.format(registry, repoName)

        def options_from_form(self, formdata):
            options = {}
            options['image'] = formdata.get('image', [''])[0].strip()
            options['cpu_guarantee'] = formdata.get(
                'cpu_guarantee', [''])[0].strip()
            options['mem_guarantee'] = formdata.get(
                'mem_guarantee', [''])[0].strip()
            options['extra_resource_limits'] = formdata.get(
                'extra_resource_limits', [''])[0].strip()
            return options

        @property
        def singleuser_image_spec(self):
            global cloud
            if cloud == 'ack':
                image = 'registry.aliyuncs.com/kubeflow-images-public/tensorflow-notebook-cpu'
            else:
                image = 'gcr.io/kubeflow-images-public/tensorflow-1.8.0-notebook-cpu:v0.2.1'
            if self.user_options.get('image'):
                image = self.user_options['image']
            return image

        @property
        def cpu_guarantee(self):
            cpu = '500m'
            if self.user_options.get('cpu_guarantee'):
                cpu = self.user_options['cpu_guarantee']
            return cpu

        @property
        def mem_guarantee(self):
            mem = '1Gi'
            if self.user_options.get('mem_guarantee'):
                mem = self.user_options['mem_guarantee']
            return mem

        @property
        def extra_resource_limits(self):
            extra = ''
            if self.user_options.get('extra_resource_limits'):
                extra = json.loads(self.user_options['extra_resource_limits'])
            return extra


    ###################################################
    # JupyterHub Options
    ###################################################
    c.JupyterHub.ip = '0.0.0.0'
    c.JupyterHub.hub_ip = '0.0.0.0'
    # Don't try to cleanup servers on exit - since in general for k8s, we want
    # the hub to be able to restart without losing user containers
    c.JupyterHub.cleanup_servers = False
    ###################################################

    ###################################################
    # Spawner Options
    ###################################################
    cloud = os.environ.get('CLOUD_NAME')
    registry = os.environ.get('REGISTRY')
    repoName = os.environ.get('REPO_NAME')
    c.JupyterHub.spawner_class = KubeFormSpawner
    c.KubeSpawner.singleuser_image_spec = '{0}/{1}/tensorflow-notebook'.format(registry, repoName)

    c.KubeSpawner.cmd = 'start-singleuser.sh'
    c.KubeSpawner.args = ['--allow-root']
    # gpu images are very large ~15GB. need a large timeout.
    c.KubeSpawner.start_timeout = 60 * 30
    # Increase timeout to 5 minutes to avoid HTTP 500 errors on JupyterHub
    c.KubeSpawner.http_timeout = 60 * 5

    # Volume setup
    c.KubeSpawner.singleuser_uid = 1000
    c.KubeSpawner.singleuser_fs_gid = 100
    c.KubeSpawner.singleuser_working_dir = '/home/jovyan'
    volumes = []
    volume_mounts = []
    ###################################################
    # Persistent volume options
    ###################################################
    # Using persistent storage requires a default storage class.
    # TODO(jlewi): Verify this works on minikube.
    # see https://github.com/kubeflow/kubeflow/pull/22#issuecomment-350500944
    pvc_mount = os.environ.get('NOTEBOOK_PVC_MOUNT')
    if pvc_mount and pvc_mount != 'null':
        c.KubeSpawner.user_storage_pvc_ensure = True
        # How much disk space do we want?
        c.KubeSpawner.user_storage_capacity = '10Gi'
        c.KubeSpawner.pvc_name_template = 'claim-{username}{servername}'
        volumes.append(
            {
                'name': 'volume-{username}{servername}',
                'persistentVolumeClaim': {
                    'claimName': 'claim-{username}{servername}'
                }
            }
        )
        volume_mounts.append(
            {
                'mountPath': pvc_mount,
                'name': 'volume-{username}{servername}'
            }
        )

    # ###################################################
    # ### Extra volumes for NVIDIA drivers (Azure)
    # ###################################################
    # # Temporary fix:
    # # AKS / acs-engine doesn't yet use device plugin so we have to mount the drivers to use GPU
    # # TODO(wbuchwalter): Remove once device plugin is merged
    if cloud == 'aks' or cloud == 'acsengine':
        volumes.append({
            'name': 'nvidia',
            'hostPath': {
                'path': '/usr/local/nvidia'
            }
        })
        volume_mounts.append({
            'name': 'nvidia',
            'mountPath': '/usr/local/nvidia'
        })

    c.KubeSpawner.volumes = volumes
    c.KubeSpawner.volume_mounts = volume_mounts

    ######## Authenticator ######
    c.JupyterHub.authenticator_class = 'dummyauthenticator.DummyAuthenticator'
kind: ConfigMap
metadata:
  name: jupyterhub-config
  namespace: default
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: tf-hub
  name: tf-hub-0
  namespace: default
spec:
  clusterIP: None
  ports:
  - name: hub
    port: 8000
  selector:
    app: tf-hub
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    getambassador.io/config: |-
      ---
      apiVersion: ambassador/v0
      kind:  Mapping
      name: tf-hub-lb-hub-mapping
      prefix: /hub/
      rewrite: /hub/
      timeout_ms: 300000
      service: tf-hub-lb.default
      ---
      apiVersion: ambassador/v0
      kind:  Mapping
      name: tf-hub-lb-user-mapping
      prefix: /user/
      rewrite: /user/
      timeout_ms: 300000
      service: tf-hub-lb.default
  labels:
    app: tf-hub-lb
  name: tf-hub-lb
  namespace: default
spec:
  ports:
  - name: hub
    port: 80
    targetPort: 8000
  selector:
    app: tf-hub
  type: ClusterIP
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: tf-hub
  namespace: default
spec:
  replicas: 1
  serviceName: ""
  template:
    metadata:
      labels:
        app: tf-hub
    spec:
      containers:
      - command:
        - jupyterhub
        - -f
        - /etc/config/jupyterhub_config.py
        env:
        - name: NOTEBOOK_PVC_MOUNT
          value: /home/jovyan
        - name: CLOUD_NAME
          value: gke
        - name: REGISTRY
          value: gcr.io
        - name: REPO_NAME
          value: kubeflow-images-public
        image: gcr.io/kubeflow/jupyterhub-k8s:v20180531-3bb991b1
        name: tf-hub
        ports:
        - containerPort: 8000
        - containerPort: 8081
        volumeMounts:
        - mountPath: /etc/config
          name: config-volume
      serviceAccountName: jupyter-hub
      volumes:
      - configMap:
          name: jupyterhub-config
        name: config-volume
  updateStrategy:
    type: RollingUpdate
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: Role
metadata:
  name: jupyter-role
  namespace: default
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - persistentvolumeclaims
  verbs:
  - get
  - watch
  - list
  - create
  - delete
- apiGroups:
  - ""
  resources:
  - events
  verbs:
  - get
  - watch
  - list
---
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app: jupyter-hub
  name: jupyter-hub
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
  name: jupyter-role
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: jupyter-role
subjects:
- kind: ServiceAccount
  name: jupyter-hub
  namespace: default
---
apiVersion: v1
data:
  controller_config_file.yaml: |-
    {
        "grpcServerFilePath": "/opt/mlkube/grpc_tensorflow_server/grpc_tensorflow_server.py"
    }
kind: ConfigMap
metadata:
  name: tf-job-operator-config
  namespace: default
---
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app: tf-job-operator
  name: tf-job-operator
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  labels:
    app: tf-job-operator
  name: tf-job-operator
rules:
- apiGroups:
  - tensorflow.org
  - kubeflow.org
  resources:
  - tfjobs
  verbs:
  - '*'
- apiGroups:
  - apiextensions.k8s.io
  resources:
  - customresourcedefinitions
  verbs:
  - '*'
- apiGroups:
  - storage.k8s.io
  resources:
  - storageclasses
  verbs:
  - '*'
- apiGroups:
  - batch
  resources:
  - jobs
  verbs:
  - '*'
- apiGroups:
  - ""
  resources:
  - configmaps
  - pods
  - services
  - endpoints
  - persistentvolumeclaims
  - events
  verbs:
  - '*'
- apiGroups:
  - apps
  - extensions
  resources:
  - deployments
  verbs:
  - '*'
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  labels:
    app: tf-job-operator
  name: tf-job-operator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: tf-job-operator
subjects:
- kind: ServiceAccount
  name: tf-job-operator
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  labels:
    app: tf-job-dashboard
  name: tf-job-dashboard
rules:
- apiGroups:
  - tensorflow.org
  - kubeflow.org
  resources:
  - tfjobs
  verbs:
  - '*'
- apiGroups:
  - apiextensions.k8s.io
  resources:
  - customresourcedefinitions
  verbs:
  - '*'
- apiGroups:
  - storage.k8s.io
  resources:
  - storageclasses
  verbs:
  - '*'
- apiGroups:
  - batch
  resources:
  - jobs
  verbs:
  - '*'
- apiGroups:
  - ""
  resources:
  - configmaps
  - pods
  - services
  - endpoints
  - persistentvolumeclaims
  - events
  verbs:
  - '*'
- apiGroups:
  - apps
  - extensions
  resources:
  - deployments
  verbs:
  - '*'
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  labels:
    app: tf-job-dashboard
  name: tf-job-dashboard
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: tf-job-dashboard
subjects:
- kind: ServiceAccount
  name: tf-job-dashboard
  namespace: default
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    getambassador.io/config: |-
      ---
      apiVersion: ambassador/v0
      kind:  Mapping
      name: tfjobs-ui-mapping
      prefix: /tfjobs/
      rewrite: /tfjobs/
      service: tf-job-dashboard.default
  name: tf-job-dashboard
  namespace: default
spec:
  ports:
  - port: 80
    targetPort: 8080
  selector:
    name: tf-job-dashboard
  type: ClusterIP
---
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app: tf-job-dashboard
  name: tf-job-dashboard
  namespace: default
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: tf-job-dashboard
  namespace: default
spec:
  template:
    metadata:
      labels:
        name: tf-job-dashboard
    spec:
      containers:
      - command:
        - /opt/tensorflow_k8s/dashboard/backend
        image: gcr.io/kubeflow-images-public/tf_operator:v0.2.0
        name: tf-job-dashboard
        ports:
        - containerPort: 8080
      serviceAccountName: tf-job-dashboard
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: tfjobs.kubeflow.org
spec:
  group: kubeflow.org
  names:
    kind: TFJob
    plural: tfjobs
    singular: tfjob
  validation:
    openAPIV3Schema:
      properties:
        spec:
          properties:
            tfReplicaSpecs:
              properties:
                Chief:
                  properties:
                    replicas:
                      maximum: 1
                      minimum: 1
                      type: integer
                PS:
                  properties:
                    replicas:
                      minimum: 1
                      type: integer
                Worker:
                  properties:
                    replicas:
                      minimum: 1
                      type: integer
  version: v1alpha2
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: tf-job-operator-v1alpha2
  namespace: default
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: tf-job-operator
    spec:
      containers:
      - command:
        - /opt/kubeflow/tf-operator.v2
        - --alsologtostderr
        - -v=1
        env:
        - name: MY_POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: MY_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        image: gcr.io/kubeflow-images-public/tf_operator:v0.2.0
        name: tf-job-operator
        volumeMounts:
        - mountPath: /etc/config
          name: config-volume
      serviceAccountName: tf-job-operator
      volumes:
      - configMap:
          name: tf-job-operator-config
        name: config-volume
---
apiVersion: v1
kind: Service
metadata:
  labels:
    service: ambassador
  name: ambassador
  namespace: default
spec:
  ports:
  - name: ambassador
    port: 80
    targetPort: 80
  selector:
    service: ambassador
  type: ClusterIP
---
apiVersion: v1
kind: Service
metadata:
  labels:
    service: ambassador-admin
  name: ambassador-admin
  namespace: default
spec:
  ports:
  - name: ambassador-admin
    port: 8877
    targetPort: 8877
  selector:
    service: ambassador
  type: ClusterIP
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: Role
metadata:
  name: ambassador
  namespace: default
rules:
- apiGroups:
  - ""
  resources:
  - services
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - create
  - update
  - patch
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - secrets
  verbs:
  - get
  - list
  - watch
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ambassador
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
  name: ambassador
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: ambassador
subjects:
- kind: ServiceAccount
  name: ambassador
  namespace: default
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: ambassador
  namespace: default
spec:
  replicas: 3
  template:
    metadata:
      labels:
        service: ambassador
      namespace: default
    spec:
      containers:
      - env:
        - name: AMBASSADOR_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: AMBASSADOR_SINGLE_NAMESPACE
          value: "true"
        image: quay.io/datawire/ambassador:0.30.1
        livenessProbe:
          httpGet:
            path: /ambassador/v0/check_alive
            port: 8877
          initialDelaySeconds: 30
          periodSeconds: 30
        name: ambassador
        readinessProbe:
          httpGet:
            path: /ambassador/v0/check_ready
            port: 8877
          initialDelaySeconds: 30
          periodSeconds: 30
        resources:
          limits:
            cpu: 1
            memory: 400Mi
          requests:
            cpu: 200m
            memory: 100Mi
      - image: quay.io/datawire/statsd:0.30.1
        name: statsd
      restartPolicy: Always
      serviceAccountName: ambassador
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    getambassador.io/config: |-
      ---
      apiVersion: ambassador/v0
      kind:  Mapping
      name: k8s-dashboard-ui-mapping
      prefix: /k8s/ui/
      rewrite: /
      tls: true
      service: kubernetes-dashboard.kube-system
  name: k8s-dashboard
  namespace: default
spec:
  ports:
  - port: 443
    targetPort: 8443
  selector:
    k8s-app: kubernetes-dashboard
  type: ClusterIP
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: centraldashboard
  name: centraldashboard
  namespace: default
spec:
  template:
    metadata:
      labels:
        app: centraldashboard
    spec:
      containers:
      - image: gcr.io/kubeflow-images-public/centraldashboard:v0.2.1
        name: centraldashboard
        ports:
        - containerPort: 8082
      serviceAccountName: centraldashboard
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    getambassador.io/config: |-
      ---
      apiVersion: ambassador/v0
      kind:  Mapping
      name: centralui-mapping
      prefix: /
      rewrite: /
      service: centraldashboard.default
  labels:
    app: centraldashboard
  name: centraldashboard
  namespace: default
spec:
  ports:
  - port: 80
    targetPort: 8082
  selector:
    app: centraldashboard
  sessionAffinity: None
  type: ClusterIP
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: centraldashboard
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  labels:
    app: centraldashboard
  name: centraldashboard
  namespace: default
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - pods/exec
  - pods/log
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - secrets
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  labels:
    app: centraldashboard
  name: centraldashboard
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: centraldashboard
subjects:
- kind: ServiceAccount
  name: centraldashboard
  namespace: default
---
apiVersion: v1
data:
  kubeflow-version: |
    {
      "Major": "0",
      "Minor": "2",
      "Patch": "devel",
      "GitCommit": "",
      "BuildDate": "",
      "ksonnetVersion": "0.9.2",
    }
kind: ConfigMap
metadata:
  name: kubeflow-version
  namespace: default

@IMBurbank
Contributor Author

Just confirming that I'm in the Kubeflow Slack workspace.

@IMBurbank
Contributor Author

I'm not sure exactly why I had issues using the default-admin clusterrolebinding, but jlewi did help me get around this issue by creating a clusterrolebinding with a non-default name, such as

kubectl create clusterrolebinding kfadmin \
  --clusterrole=cluster-admin --user=$(gcloud config get-value account)

Afterward, I got some component errors when trying to apply the kubeflow-core component:
ks apply cloud -c kubeflow-core
such as
ERROR handle object: can't update deployments default.ambassador: Operation cannot be fulfilled on deployments.extensions "ambassador": the object has been modified; please apply your changes to the latest version and try again

But these can be fixed by re-applying the changed component (ks apply <env-name> [-c <component-name>] [--dry-run] [flags]). In the example above, that means running
ks apply cloud -c ambassador
before applying kubeflow-core again with
ks apply cloud -c kubeflow-core
The full workaround is sketched below.
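Putting it together, the sequence that worked for me looks roughly like this (a sketch; kfadmin is just an arbitrary non-default binding name):

# create a cluster role binding with a non-default name
kubectl create clusterrolebinding kfadmin \
  --clusterrole=cluster-admin --user=$(gcloud config get-value account)
# re-apply any component that reported a modification conflict
ks apply cloud -c ambassador
# then retry the full core component
ks apply cloud -c kubeflow-core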

I was informed that the component modification errors will be fixed in ksonnet v0.12.0. ks apply works for now.

Thank you very much for the help!

@yizhexu

yizhexu commented Sep 12, 2018

I ran into the exact same issue and your workaround worked for me.

Retrospectively, I think the reason was:

echo $(gcloud config get-value account) shows my user name as: igracex@gmail.com

But the error message says my user name is: IGraceX@gmail.com

The user identification for the RBAC role binding is case sensitive, which is what caused the issue.
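If your gcloud account and the email in the error differ only by case, one workaround is to create an additional binding for the exact spelling shown in the error (a sketch; the binding name kfadmin-case and IGraceX@gmail.com are placeholders for your own values):

kubectl create clusterrolebinding kfadmin-case \
  --clusterrole=cluster-admin --user=IGraceX@gmail.com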
