cli script fails on setup #128

Closed
camille-rodriguez opened this issue Nov 4, 2019 · 6 comments · Fixed by #131

@camille-rodriguez

Hello everyone,

I am having issues with the cli.py script while deploying on AWS.
I ran `python3 scripts/cli.py ck setup --controller ckkf`, and it ran successfully until it hit this:

DEBUG:root:containerd is lead by containerd/0
DEBUG:root:easyrsa is lead by easyrsa/0
DEBUG:root:etcd is lead by etcd/0
+ juju kubectl apply -f storage/aws-ebs.yml
storageclass.storage.k8s.io/k8s-ebs created
+ juju scp -m ckkf:default kubernetes-master/0:~/config /tmp/tmp0rh5mvbs
+ juju add-k8s ckkf -c ckkf --region=aws/us-east-1 --storage juju-operator-storage

ERROR storage class "juju-operator-storage" not found
Command '('juju', 'add-k8s', 'ckkf', '-c', 'ckkf', '--region=aws/us-east-1', '--storage', 'juju-operator-storage')' returned non-zero exit status 1.

I reran the command manually afterwards, and it worked.

$ juju add-k8s ckkf -c ckkf --region=aws/us-east-1 --storage juju-operator-storage

k8s substrate "ec2/us-east-1" added as cloud "ckkf" with EBS Volume default storage provisioned
by the new "juju-operator-storage" storage class.
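
Since the same `juju add-k8s` command failed inside the script but succeeded when rerun by hand, one way to make such a step more tolerant is to retry it a few times. The following is only a minimal sketch under that assumption; the thread does not establish that timing was the cause, and `run_with_retry` with its `attempts`, `delay`, `run`, and `sleep` parameters is a hypothetical helper, not something taken from cli.py.

```python
import subprocess
import time


def run_with_retry(cmd, attempts=5, delay=2.0, run=subprocess.run, sleep=time.sleep):
    """Run `cmd`, retrying when it exits with a non-zero status.

    Hypothetical helper, not part of cli.py. `run` and `sleep` are
    injectable so the function can be exercised without Juju installed.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            # check=True makes subprocess.run raise CalledProcessError on failure
            return run(cmd, check=True)
        except subprocess.CalledProcessError as err:
            last_error = err
            if attempt < attempts:
                # Linear backoff: wait a little longer after each failure
                sleep(delay * attempt)
    raise last_error


# Example call, mirroring the command from the log above:
# run_with_retry(["juju", "add-k8s", "ckkf", "-c", "ckkf",
#                 "--region=aws/us-east-1", "--storage", "juju-operator-storage"])
```

If every attempt fails, the last `CalledProcessError` is re-raised, so the caller still sees the original error.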

Judging from juju status, my CDK deployment looks fine, so I ran the following script to deploy Kubeflow on it, and it hit another error. Is that because the previous script didn't finish, so I have to redeploy everything?

$ python3 scripts/cli.py deploy-to ckkf --cloud ckkf
Enter a password to set for the Kubeflow dashboard:
Repeat for confirmation:

  • juju add-model kubeflow ckkf
    Added 'kubeflow' model on ckkf/us-east-1 with credential 'ckkf' for user 'admin'
  • juju kubectl apply -f resources/katib-configmap.yaml
    Error from server (NotFound): error when creating "resources/katib-configmap.yaml": namespaces "kubeflow" not found
    Error: SubcommandError("kubectl --kubeconfig /tmp/.tmp1ss8AK apply -f resources/katib-configmap.yaml -n kubeflow", "exit code: 1")
    Command '('juju', 'kubectl', 'apply', '-f', 'resources/katib-configmap.yaml')' returned non-zero exit status 1.

I see how I can work around this (create the namespace myself), but it would be good to investigate and fix the issue in the script too.
@camille-rodriguez
Author

I believe a `kubectl create namespace kubeflow` command is missing here: https://github.com/juju-solutions/bundle-kubeflow/blob/3cf431b37a2f89b87830e46041aa58357f1e95c9/scripts/cli.py#L269
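
The manual workaround (creating the namespace before applying the configmap) could be sketched roughly as follows. This is an illustration only, not the fix that went into the script: `ensure_namespace` and its `run` parameter are hypothetical names, and the sketch assumes a plain `kubectl` on the PATH that is already pointed at the cluster, rather than the `juju kubectl` wrapper used in the log.

```python
import subprocess


def ensure_namespace(name, run=subprocess.run):
    """Create the Kubernetes namespace `name` if it does not already exist.

    Hypothetical helper, not part of cli.py. Returns True if the namespace
    was created and False if it was already present. `run` is injectable
    so the logic can be tested without a cluster.
    """
    # `kubectl get namespace <name>` exits non-zero when the namespace is missing
    probe = run(["kubectl", "get", "namespace", name], capture_output=True)
    if probe.returncode == 0:
        return False
    # If the probe failed for another reason (e.g. no connection), this
    # create call fails too and raises CalledProcessError via check=True.
    run(["kubectl", "create", "namespace", name], check=True)
    return True


# Example call before applying resources into the namespace:
# ensure_namespace("kubeflow")
```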

@camille-rodriguez
Author

I created the namespace manually and re-launched the kubeflow deployment. Now there is an error with the katib-controller regarding ephemeral storage.
[screenshot attached]

@camille-rodriguez
Author

I opened a separate bug for the ambassador-auth issue

@knkski
Contributor

knkski commented Nov 5, 2019

@camille-rodriguez: You shouldn't have to create the namespace; Juju should do that itself when you create a model. As for the storage error you're seeing, I assume you're using Juju 2.7-rc1, which recently changed something that causes that error. Can you try removing these lines from cli.py and rerunning the deploy?

https://github.com/juju-solutions/bundle-kubeflow/blob/c1073ad/scripts/cli.py#L399-L400

@camille-rodriguez
Author

@knkski I removed those two lines and CDK deployed successfully on AWS! And kubeflow on top of it. Thanks!

@camille-rodriguez
Author

And yes, I am using Juju 2.7 to avoid bug #125.
