kubeflow installation on macos m1 with rancher desktop #2416

Closed
yecohn opened this issue Mar 23, 2023 · 7 comments

@yecohn

yecohn commented Mar 23, 2023

I am trying to install Kubeflow from the manifests on the master branch, using the command:

while ! kustomize build example | awk '!/well-defined/' | kubectl apply -f -; do echo "Retrying to apply resources"; sleep 10; done
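
The loop above retries forever if kubectl apply never succeeds. A bounded variant could look like the sketch below. The helper names retry and apply_manifests are hypothetical (they are not part of the Kubeflow manifests), and the limit of 30 attempts with a 10-second delay in the usage line is an arbitrary assumption.

```shell
# retry MAX DELAY CMD...: run CMD until it succeeds, at most MAX times,
# sleeping DELAY seconds between attempts. Returns 1 if every attempt fails.
retry() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Giving up after $attempt attempts" >&2
      return 1
    fi
    echo "Retrying to apply resources (attempt $attempt of $max_attempts failed)" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Hypothetical wrapper around the install pipeline from this issue, so that
# every retry rebuilds and re-applies the whole kustomization.
apply_manifests() {
  kustomize build example | awk '!/well-defined/' | kubectl apply -f -
}

# Usage: retry 30 10 apply_manifests
```

Wrapping the pipeline in a function avoids nesting the awk program's quotes inside an sh -c string.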

I am using Kubernetes 1.24 from Rancher Desktop:

Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.4", GitCommit:"95ee5ab382d64cfe6c28967f36b53970b8374491", GitTreeState:"clean", BuildDate:"2022-08-17T18:54:23Z", GoVersion:"go1.18.5", Compiler:"gc", Platform:"darwin/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.11+k3s1", GitCommit:"c14436a9ecfffb3be553a06bb0a4fac6122579ce", GitTreeState:"clean", BuildDate:"2023-03-10T21:47:44Z", GoVersion:"go1.19.6", Compiler:"gc", Platform:"linux/arm64"}

and kustomize 5.0.0.1.

During the deployment I get errors; several pods are stuck in ImagePullBackOff:


NAMESPACE          NAME                                                     READY   STATUS             RESTARTS        AGE
kube-system        helm-install-traefik-crd-hs4r2                           0/1     Completed          0               43m
kube-system        helm-install-traefik-s2c7l                               0/1     Completed          2               43m
kube-system        traefik-64b96ccbcd-tjdz9                                 1/1     Running            1 (5m38s ago)   43m
auth               dex-8579644bbb-p5kc7                                     1/1     Running            1 (5m38s ago)   36m
istio-system       istiod-586fcd6677-nsfvh                                  1/1     Running            1 (5m38s ago)   38m
cert-manager       cert-manager-cainjector-d5dc6cd7f-qrjtt                  1/1     Running            1 (9m11s ago)   37m
kubeflow           metadata-envoy-deployment-76c587bd47-dpxv2               1/1     Running            1 (5m38s ago)   13m
kube-system        local-path-provisioner-687d6d7765-gqnlg                  1/1     Running            1 (5m38s ago)   43m
kube-system        svclb-traefik-1503cd1b-w69sd                             2/2     Running            2 (5m38s ago)   43m
kubeflow           kubeflow-pipelines-profile-controller-5dd5468d9b-nxv99   1/1     Running            0               13m
kube-system        coredns-7b5bbc6644-xd9xp                                 1/1     Running            1 (5m38s ago)   43m
kubeflow           kserve-controller-manager-7879bf6dd7-29bdj               2/2     Running            2 (5m38s ago)   13m
knative-eventing   eventing-controller-5b7bfc8895-vzb4x                     1/1     Running            1 (5m38s ago)   13m
knative-eventing   eventing-webhook-5896d776b-l4xb4                         1/1     Running            1 (5m38s ago)   13m
kubeflow           katib-controller-86d4d45478-pstv7                        1/1     Running            1 (5m38s ago)   13m
cert-manager       cert-manager-7475574-2w29b                               1/1     Running            1 (9m11s ago)   37m
cert-manager       cert-manager-webhook-6868bd8b7-lbvrx                     1/1     Running            1 (5m38s ago)   37m
kube-system        metrics-server-667586758d-59g4s                          1/1     Running            1 (5m38s ago)   43m
kubeflow           katib-db-manager-689cdf95c6-v7jl8                        1/1     Running            1 (7m47s ago)   13m
kubeflow           metacontroller-0                                         1/1     Running            0               12m
kubeflow           cache-server-86584db5d8-fvzq5                            2/2     Running            0               13m
kubeflow           ml-pipeline-persistenceagent-75bccd8b64-n2gfl            2/2     Running            0               13m
knative-serving    net-istio-webhook-6858cd8998-mznfm                       2/2     Running            5 (4m47s ago)   13m
istio-system       cluster-local-gateway-757849494c-cqv88                   1/1     Running            1 (5m38s ago)   13m
istio-system       authservice-0                                            1/1     Running            0               13m
istio-system       istio-ingressgateway-cf7bd56f-9lvmg                      1/1     Running            1 (5m38s ago)   38m
kubeflow           minio-6d6d45469f-8f7qt                                   2/2     Running            1 (5m38s ago)   13m
knative-serving    controller-657b7bb75c-gjxkm                              2/2     Running            4 (4m32s ago)   13m
knative-serving    webhook-76f9bc6584-kzm74                                 2/2     Running            5 (4m39s ago)   13m
knative-serving    domainmapping-webhook-f76bcd89f-qdzg7                    2/2     Running            5 (4m28s ago)   13m
knative-serving    domain-mapping-6c4878cc54-zvwz6                          2/2     Running            5 (4m26s ago)   13m
knative-serving    net-istio-controller-6cb499fccb-g7dvk                    2/2     Running            4 (4m33s ago)   13m
kubeflow           workflow-controller-78c979dc75-gl46c                     2/2     Running            4 (4m29s ago)   13m
kubeflow           katib-mysql-5bc98798b4-v5tbv                             1/1     Running            1               13m
kubeflow           ml-pipeline-scheduledworkflow-6dfcd5dd89-m4lmd           2/2     Running            1 (5m38s ago)   13m
kubeflow           ml-pipeline-viewer-crd-86cbc45d9b-8rrg8                  2/2     Running            4 (4m23s ago)   13m
knative-serving    autoscaler-5cc8b77f4d-ztbzd                              2/2     Running            3 (4m8s ago)    13m
knative-serving    activator-5bbf976855-979ch                               2/2     Running            3 (4m9s ago)    13m
kubeflow           katib-ui-b5d5cf978-djvs5                                 2/2     Running            5 (4m20s ago)   13m
kubeflow           mysql-6878bbff69-pzq2p                                   2/2     Running            0               13m
kubeflow           training-operator-7f768bbbdb-9cp57                       1/1     Running            2 (3m54s ago)   13m
kubeflow           metadata-writer-6c576c94b8-d7dhl                         2/2     Running            2 (3m8s ago)    13m
kubeflow           ml-pipeline-77d4d9974b-vx5sz                             2/2     Running            3 (2m12s ago)   13m
kubeflow           metadata-grpc-deployment-5c8599b99c-zp7qk                2/2     Running            5 (117s ago)    13m
kubeflow           ml-pipeline-visualizationserver-5577c64b45-d2v4b         2/2     Running            0               13m
kubeflow           admission-webhook-deployment-cb6db9648-78rtl             0/1     ImagePullBackOff   0               13m
kubeflow           kserve-models-web-app-f9c576856-88qdc                    1/2     ImagePullBackOff   1 (5m38s ago)   13m
kubeflow           centraldashboard-dd9c778b6-78snk                         1/2     ImagePullBackOff   1 (5m38s ago)   13m
kubeflow           ml-pipeline-ui-5ddb5b76d8-89hdf                          2/2     Running            7 (2m15s ago)   13m
kubeflow           jupyter-web-app-deployment-cc9cbc696-bvb48               1/2     ImagePullBackOff   1 (5m38s ago)   13m
kubeflow           volumes-web-app-deployment-7b998df674-765sm              1/2     ImagePullBackOff   0               13m
kubeflow           tensorboards-web-app-deployment-8474fd9569-4xnst         1/2     ImagePullBackOff   0               13m
kubeflow           notebook-controller-deployment-699589b4f9-bb6fd          1/2     ImagePullBackOff   1 (5m38s ago)   13m
kubeflow           profiles-deployment-74f656c59f-qbzlz                     1/3     ImagePullBackOff   1 (5m38s ago)   13m
kubeflow           tensorboard-controller-deployment-5655cc9dbb-5mvfg       2/3     ImagePullBackOff   0               13m
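
To extract only the failing pods from a listing like the one above, the output of kubectl get pods -A can be filtered on its STATUS column. This is a sketch with a hypothetical function name. It assumes the default column layout shown above, where STATUS is the fourth whitespace-separated field. Pods reported as Running but not fully ready (for example 1/2) are not flagged.

```shell
# list_stuck_pods: read `kubectl get pods -A` output on stdin and print
# "namespace/name STATUS" for every pod whose STATUS is neither Running nor Completed.
list_stuck_pods() {
  # NR > 1 skips the header row; $4 is the STATUS column.
  awk 'NR > 1 && $4 != "Running" && $4 != "Completed" { print $1 "/" $2, $4 }'
}

# Usage: kubectl get pods -A | list_stuck_pods
```

Applied to the listing above, this would print only the pods in ImagePullBackOff.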

When I inspect the problematic pods (for example tensorboard-controller-deployment-5655cc9dbb-5mvfg), I get:

Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  15m                    default-scheduler  0/1 nodes are available: 1 Insufficient cpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
  Warning  FailedScheduling  8m10s                  default-scheduler  0/1 nodes are available: 1 Insufficient cpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
  Normal   Scheduled         7m57s                  default-scheduler  Successfully assigned kubeflow/tensorboard-controller-deployment-5655cc9dbb-5mvfg to lima-rancher-desktop
  Warning  FailedScheduling  16m                    default-scheduler  0/1 nodes are available: 1 Insufficient cpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
  Normal   Created           7m43s                  kubelet            Created container istio-init
  Normal   Pulled            7m43s                  kubelet            Container image "docker.io/istio/proxyv2:1.16.0" already present on machine
  Normal   Started           7m42s                  kubelet            Started container istio-init
  Normal   Pulling           7m28s                  kubelet            Pulling image "gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0"
  Normal   Pulled            7m8s                   kubelet            Successfully pulled image "gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0" in 19.515117093s
  Normal   Created           7m7s                   kubelet            Created container kube-rbac-proxy
  Normal   Created           7m6s                   kubelet            Created container istio-proxy
  Normal   Pulled            7m6s                   kubelet            Container image "docker.io/istio/proxyv2:1.16.0" already present on machine
  Normal   Started           7m6s                   kubelet            Started container kube-rbac-proxy
  Normal   Started           7m5s                   kubelet            Started container istio-proxy
  Warning  Unhealthy         7m2s (x2 over 7m3s)    kubelet            Readiness probe failed: Get "http://10.42.0.104:15021/healthz/ready": dial tcp 10.42.0.104:15021: connect: connection refused
  Normal   Pulling           6m20s (x3 over 7m39s)  kubelet            Pulling image "docker.io/kubeflownotebookswg/tensorboard-controller:v1.7.0-rc.0"
  Warning  Failed            6m8s (x2 over 7m28s)   kubelet            Failed to pull image "docker.io/kubeflownotebookswg/tensorboard-controller:v1.7.0-rc.0": rpc error: code = NotFound desc = failed to pull and unpack image "docker.io/kubeflownotebookswg/tensorboard-controller:v1.7.0-rc.0": no match for platform in manifest: not found
  Warning  Failed            6m8s (x2 over 7m28s)   kubelet            Error: ErrImagePull
  Warning  Failed            5m55s (x3 over 6m47s)  kubelet            Error: ImagePullBackOff
  Normal   BackOff           81s (x20 over 6m47s)   kubelet            Back-off pulling image "docker.io/kubeflownotebookswg/tensorboard-controller:v1.7.0-rc

It looks like the Docker image cannot be found in the registry.
Any idea how I should proceed?

@kubeflow-bot kubeflow-bot added this to To Do in Needs Triage Mar 23, 2023
@annajung
Member

It might be that there aren't enough resources to fulfill the workload. You might want to describe the node and, based on what you see in the node description, try setting up your cluster with more CPU/memory.
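
To check this on the single Rancher Desktop node (named lima-rancher-desktop in the scheduler events above), the "Allocated resources" section of kubectl describe node shows how much CPU is already requested. The helper below is a hypothetical sketch that extracts the cpu row of that section. It assumes the usual Resource / Requests / Limits layout of that section.

```shell
# cpu_requests: read `kubectl describe node` output on stdin and print the CPU
# requests entry (a value such as "1950m (97%)") from the "Allocated resources" section.
cpu_requests() {
  awk '
    /^Allocated resources:/ { in_section = 1; next }
    # The first row whose first field is exactly "cpu" inside that section
    # holds the requested amount ($2) and its percentage ($3).
    in_section && $1 == "cpu" { print $2, $3; exit }
  '
}

# Usage: kubectl describe node lima-rancher-desktop | cpu_requests
```

If the percentage is close to 100%, "Insufficient cpu" scheduling failures are expected, and giving the Rancher Desktop VM more CPUs in its settings should help. In this issue the pod was eventually scheduled, so the image pull is what kept it from starting.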

As for debugging the image, you can run docker pull kubeflownotebookswg/tensorboard-controller:v1.7.0-rc.0 to check whether you can access the image.

@e-compagno

e-compagno commented Apr 13, 2023

If you have an M2, it could be that the image is not available for that architecture. Also check here.

@koseoyoung

I'm using an M1 and encountering a similar issue.
docker pull kubeflownotebookswg/tensorboard-controller:v1.7.0-rc.0 does not work without the --platform flag.

Does the image support the arm64 architecture?
What can I change in the current Kubeflow container and image configuration to make this work?

Thanks!

@koseoyoung

Update: I don't think the images support the arm64 architecture (e.g., https://hub.docker.com/r/kubeflownotebookswg/tensorboard-controller/tags).
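
The arm64 question can also be answered from the command line: docker manifest inspect prints a tag's manifest as JSON. For a multi-platform image, that JSON contains one platform block per published architecture. For a single-platform image, it contains no platform block at all, so the helper below prints nothing. The node in this issue reports Platform linux/arm64, so an image needs an arm64 entry for containerd on that node to pull it. The helper names are hypothetical, and grep/sed are used instead of jq only to keep the sketch dependency-free.

```shell
# list_archs: read the JSON printed by `docker manifest inspect IMAGE` on stdin
# and print each distinct "architecture" value found in it, one per line.
list_archs() {
  grep -o '"architecture": *"[^"]*"' | sed 's/.*"\([^"]*\)"$/\1/' | sort -u
}

# supports_arm64: succeed only if the manifest JSON on stdin lists arm64.
supports_arm64() {
  list_archs | grep -qx 'arm64'
}

# Usage (needs network access to Docker Hub):
#   docker manifest inspect docker.io/kubeflownotebookswg/tensorboard-controller:v1.7.0-rc.0 | list_archs
```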

If my understanding is correct, then not all Kubeflow components can be deployed to an M1 machine, right? Are there any other ways to deploy these components on an M1 machine?

@juliusvonkohout
Member

@koseoyoung @yecohn we are working on ARM support, with no timeline yet, but feel free to help and contribute. You can reach out to @kimwnasptd and join the Notebooks WG meeting.

@juliusvonkohout
Member

duplicate of #2472

/close

@google-oss-prow

@juliusvonkohout: Closing this issue.


Needs Triage automation moved this from To Do to Closed Aug 25, 2023
@kubeflow-bot kubeflow-bot removed this from Closed in Needs Triage Aug 25, 2023

5 participants