Jupyter-role error applying kubeflow-core component with ksonnet #1353
I have now tried this from scratch using the following version combinations and received the same error: Versions: Versions (default versions from tutorial): |
Digging through this repo as best I could, I figured the problem might be that I needed ksonnet v0.11.0 instead of the v0.9.2 used in the tutorial or the current v0.12.0 release. So I started from scratch and tried the tutorial again using KS_VER=ks_0.11.0_linux_amd64. This time, when I tried to apply kubeflow-core, it froze for over a minute before the process was killed. When I tried again, the same jupyter-role error came back.
Versions: I am at an absolute loss, and all my machine learning work is for naught until I can actually deploy to a production environment. Please help. |
See here: You need to create an RBAC role to grant the user running the deployment sufficient permission to create the resources. |
Hi, Thank you for the response. I tried gleaning everything I could from the troubleshooting section. When I read that section, the only suggestion I saw was making sure to create an appropriate clusterrole binding using the command:
Every time I try again from scratch, I've used the command:
to include my user account. When you say that I need to create an RBAC role, are you referring to a step other than the one I pasted above? I read through the GCloud RBAC section but didn't make it any further in understanding how to resolve this RBAC issue. It seems that from a clean slate, there isn't anything else that I'm supposed to do. So, I'm obviously missing something. And yet, I'm at a complete loss... |
When you run deploy.sh this should create a bunch of directories e.g At this point your GKE cluster should exist. And you have your Kubeflow ksonnet app in ${DEPLOYMENT}_ks_app. At this point if you're having trouble deploying the ksonnet APP due to the RBAC issue you should create an RBAC binding as mentioned above and then update the ksonnet app. e.g.
If you look at the original RBAC error you can see the user it is using
So make sure you are creating an RBAC role for that account. |
I don't run a deploy.sh script in this tutorial (that I know of), and don't understand how it would be incorporated. That user account is the correct one, and I'm running the command However, the I still don't fully understand if you're implying that I should be doing something more than When I tried Output for this attempt:
If there are any other working tutorials I could complete from my default GC platform, I'll abandon this particular one in a second. If there's something important that I'm missing or misreading, let me know and I'll happily start again from the beginning. For now, I'll nuke my Google shell environment and try a few more combinations of app versions and setup scripts (as often as I can within GitHub API rate limits) to try to get a basic working environment. |
You're getting an RBAC error. The fix is to create an appropriate cluster role binding. We need to figure out why you don't have the RBAC permissions needed to create the JupyterHub role. The output shows that your attempt to create the cluster role binding failed
I would suggest running the command
To see whether the clusterrolebinding is actually bound to user Can you also run the following command and paste the output into the issue.
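A minimal sketch of one way to list the users a cluster role binding actually names. It assumes `kubectl` is already pointed at the cluster; `binding_users` is a hypothetical helper name, and `default-admin` is the binding name from the tutorial command quoted in this issue.

```shell
# Hypothetical helper: print the subject names of a ClusterRoleBinding,
# one per line. Assumes kubectl is configured for the target cluster.
binding_users() {
  # $1 = name of the clusterrolebinding (for example default-admin)
  kubectl get clusterrolebinding "$1" -o jsonpath='{.subjects[*].name}' | tr ' ' '\n'
}

# Example (not run here):
#   binding_users default-admin
```

The names printed can then be compared, character for character, with the user shown in the RBAC error.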
|
I believe I got that error because this was the first time I hadn't started from a fresh shell environment and kubernetes cluster. I had previously executed the command to set my user as the default-admin. My previous cloud shell session timed out. I reconnected and here's the config:
And the clusterrolebinding output:
|
That looks correct. Do you still have your ksonnet application somewhere? i.e. the directory
Let's confirm that ksonnet is pointing to the correct K8s cluster. What is the output of
and what is the output of
Are they both using the same Kubernetes master? Can you also run
and share the output. |
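A rough sketch of how the two API server addresses could be compared. It assumes `kubectl` is configured locally and that `ks env list` is run inside the ksonnet app directory; `kubectl_server` and `same_server` are hypothetical helper names, not part of either tool.

```shell
# Hypothetical helper: print the API server URL used by kubectl's current context.
kubectl_server() {
  kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'
}

# Hypothetical helper: succeed if two server URLs match, ignoring one trailing slash.
same_server() {
  [ "${1%/}" = "${2%/}" ]
}

# Example (not run here). Run `ks env list` inside the ksonnet app directory,
# read the server listed for the environment being applied, and pass it in:
#   ks env list
#   same_server "$(kubectl_server)" "<server from ks env list>" && echo "same master"
```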
BTW if you join slack it might be easier to debug this in chat |
I found the link to get a slack invite in the Kubeflow docs Community section and entered my email. I apologize for my ignorance. I'll join as soon as the invite comes through. In the meantime, here's the output:
|
Just confirming that I'm in the Kubeflow Slack workspace |
I'm not sure exactly why I had issues using the default-admin clusterrolebinding, but jlewi did help me get around this issue by creating a clusterrolebinding with a non-default name, such as
Afterward, I got some component errors when trying to apply the kubeflow-core component: But these can be fixed by re-applying the changed component. I was informed that the component modification errors will be fixed in ksonnet v0.12.0. Thank you very much for the help! |
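For anyone wanting to repeat the workaround, here is a sketch that wraps the tutorial's binding command so the binding name is chosen by the caller. `create_admin_binding` and the name `my-cluster-admin` are made up for illustration; the actual name used in this thread is not reproduced here. It assumes `gcloud` and `kubectl` are installed and authenticated.

```shell
# Hypothetical helper: create a cluster-admin binding under a caller-chosen name.
# Assumes gcloud and kubectl are installed and authenticated.
create_admin_binding() {
  # $1 = binding name; pick one that is not already in use in the cluster
  kubectl create clusterrolebinding "$1" \
    --clusterrole=cluster-admin \
    --user="$(gcloud config get-value account)"
}

# Example (not run here); "my-cluster-admin" is a made-up name:
#   create_admin_binding my-cluster-admin
```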
I ran into the exact same issue and your work around worked for me. Retrospectively I think the reason was:
But the error message says my user name is IGraceX@gmail.com. User identification is case-sensitive, and that mismatch caused the issue.
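A small sketch of how a case-only mismatch between two account strings could be detected before creating the binding. The helper names are hypothetical and the addresses are placeholders, not values from this thread.

```shell
# Hypothetical helper: lowercase a string.
lowercase() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]'
}

# Hypothetical helper: succeed only when the two names differ by letter case alone.
differs_only_by_case() {
  [ "$1" != "$2" ] && [ "$(lowercase "$1")" = "$(lowercase "$2")" ]
}

# Example with placeholder addresses:
if differs_only_by_case "Someone@example.com" "someone@example.com"; then
  echo "case mismatch"
fi
```

In practice the first argument would be the user printed in the RBAC error and the second the value passed to `--user`.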
Hi,
I've been trying different Kubeflow tutorials for over a week, just trying to get anything working so I can upgrade the data pipeline and model from there.
I'm currently trying this tutorial on Google Cloud and keep getting the following error:
All issues I've found here recommend the following commands:
gcloud container clusters get-credentials $KUBENAME --zone $KUBEZONE
kubectl create clusterrolebinding default-admin --clusterrole=cluster-admin --user=$(gcloud config get-value account)
But these commands are included in the tutorial and haven't helped me. I've wiped out this project and retried several times. I'm at a loss...
My only divergences are trying newer versions of the packages, since the versions in the tutorial seemed quite old.
Versions I used:
Kubernetes: (GKE Default) v1.9.7
Ksonnet: ks_0.12.0_linux_amd64
Kubeflow-core: v0.2.2
Kubeflow-TFserving: v0.2.2
TF-Job: NONE (appears to now be in Kubeflow-core)
Any help regarding this issue would be deeply appreciated!