Cant start jupyter-notebook pod in kubeflow version 0.2.1 #1221

Closed
yeya24 opened this issue Jul 16, 2018 · 3 comments
Labels
area/jupyter Issues related to Jupyter kind/question

Contributor

yeya24 commented Jul 16, 2018

Hi guys.
I installed version 0.2.1 both on my local Ubuntu machine and on an Ubuntu machine in AWS. I hit the same problem on both: I can't start a server from the JupyterHub page.

Here is some output:

sudo kubectl get po -n kubeflow
NAME                                        READY     STATUS             RESTARTS   AGE
ambassador-849fb9c8c5-2rgf8                 2/2       Running            5          1h
ambassador-849fb9c8c5-brclh                 2/2       Running            5          1h
ambassador-849fb9c8c5-jhwpd                 2/2       Running            5          1h
centraldashboard-7d7744cccb-lbd9c           1/1       Running            0          1h
jupyter-admin                               0/1       CrashLoopBackOff   8          22m
spartakus-volunteer-68556f9b7d-pttlg        1/1       Running            2          1h
tf-hub-0                                    1/1       Running            0          1h
tf-job-dashboard-64fc6f5849-x5kss           1/1       Running            0          1h
tf-job-operator-v1alpha2-756cf9cb97-ddwx5   1/1       Running            0          1h

Then I looked at the jupyter-admin pod. The Ubuntu machine satisfies the request of 1 CPU, 1Gi of memory, and 1 GPU, but the pod can't start:

ubuntu@ip-172-31-3-138:~$ sudo kubectl describe pod jupyter-admin -n kubeflow
Name:         jupyter-admin
Namespace:    kubeflow
Node:         ip-172-31-3-138/172.31.3.138
Start Time:   Mon, 16 Jul 2018 18:26:38 +0000
Labels:       app=jupyterhub
              component=singleuser-server
              heritage=jupyterhub
Annotations:  cni.projectcalico.org/podIP=192.168.0.55/32
              hub.jupyter.org/username=admin
Status:       Running
IP:           192.168.0.55
Containers:
  notebook:
    Container ID:  docker://b2275e7af544c663dc612497c9254ce27ed01658983faa0d4c807c14e5cb1ac4
    Image:         gcr.io/kubeflow-images-public/tensorflow-1.7.0-notebook-gpu:v20180419-0ad94c4e
    Image ID:      docker-pullable://gcr.io/kubeflow-images-public/tensorflow-1.7.0-notebook-gpu@sha256:33b13e3de4a53854d8c52f172d58ef554e96a46e7e8d65cb27eaa33c3ca4a002
    Port:          8888/TCP
    Host Port:     0/TCP
    Args:
      start-singleuser.sh
      --ip="0.0.0.0"
      --port=8888
      --allow-root
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    127
      Started:      Mon, 16 Jul 2018 18:26:55 +0000
      Finished:     Mon, 16 Jul 2018 18:26:55 +0000
    Ready:          False
    Restart Count:  2
    Limits:
      nvidia.com/gpu:  1
    Requests:
      cpu:             1
      memory:          1Gi
      nvidia.com/gpu:  1
    Environment:
      JUPYTERHUB_API_TOKEN:           b16de297013b4b329da9dbcd40b37a22
      JPY_API_TOKEN:                  b16de297013b4b329da9dbcd40b37a22
      JUPYTERHUB_CLIENT_ID:           jupyterhub-user-admin
      JUPYTERHUB_HOST:                
      JUPYTERHUB_OAUTH_CALLBACK_URL:  /user/admin/oauth_callback
      JUPYTERHUB_USER:                admin
      JUPYTERHUB_API_URL:             http://tf-hub-0:8081/hub/api
      JUPYTERHUB_BASE_URL:            /
      JUPYTERHUB_SERVICE_PREFIX:      /user/admin/
      MEM_GUARANTEE:                  1.0Gi
      CPU_GUARANTEE:                  1
    Mounts:
      /home/jovyan from volume-admin (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from jupyter-notebook-token-sn4p7 (ro)
Conditions:
  Type           Status
  Initialized    True 
  Ready          False 
  PodScheduled   True 
Volumes:
  volume-admin:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  claim-admin
    ReadOnly:   false
  jupyter-notebook-token-sn4p7:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  jupyter-notebook-token-sn4p7
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                 Age                From                      Message
  ----     ------                 ----               ----                      -------
  Warning  FailedScheduling       37s (x2 over 37s)  default-scheduler         pod has unbound PersistentVolumeClaims
  Normal   Scheduled              36s                default-scheduler         Successfully assigned jupyter-admin to ip-172-31-3-138
  Normal   SuccessfulMountVolume  36s                kubelet, ip-172-31-3-138  MountVolume.SetUp succeeded for volume "jupyter-notebook-token-sn4p7"
  Normal   SuccessfulMountVolume  36s                kubelet, ip-172-31-3-138  MountVolume.SetUp succeeded for volume "nfs-pv"
  Normal   Pulled                 19s (x3 over 36s)  kubelet, ip-172-31-3-138  Container image "gcr.io/kubeflow-images-public/tensorflow-1.7.0-notebook-gpu:v20180419-0ad94c4e" already present on machine
  Normal   Created                19s (x3 over 36s)  kubelet, ip-172-31-3-138  Created container
  Normal   Started                19s (x3 over 35s)  kubelet, ip-172-31-3-138  Started container
  Warning  BackOff                5s (x4 over 34s)   kubelet, ip-172-31-3-138  Back-off restarting failed container

Here is more output:

ubuntu@ip-172-31-3-138:~$ sudo kubectl exec -n kubeflow -it jupyter-admin  sh 
error: unable to upgrade connection: container not found ("notebook")
ubuntu@ip-172-31-3-138:~$ sudo kubectl logs   jupyter-admin -p -n kubeflow
Execute the command
/usr/local/bin/start.sh: line 62: exec: jupyterhub-singleuser: not found

I'm sure the error above is the problem, but I don't know how to fix it.

My Docker version is 18.03.1-ce and my Kubernetes version is 1.10.2:

Client:
 Version:      18.03.1-ce
 API version:  1.37
 Go version:   go1.9.5
 Git commit:   9ee9f40
 Built:        Thu Apr 26 07:17:20 2018
 OS/Arch:      linux/amd64
 Experimental: false
 Orchestrator: swarm

Server:
 Engine:
  Version:      18.03.1-ce
  API version:  1.37 (minimum version 1.12)
  Go version:   go1.9.5
  Git commit:   9ee9f40
  Built:        Thu Apr 26 07:15:30 2018
  OS/Arch:      linux/amd64
  Experimental: false

Thank you very much!

Contributor

jlewi commented Jul 16, 2018

Your Jupyter images are too old.
The image

gcr.io/kubeflow-images-public/tensorflow-1.7.0-notebook-gpu:v20180419-0ad94c4

is not compatible with the latest changes to JupyterHub.

Try

gcr.io/kubeflow-images-public/tensorflow-1.7.0-notebook-gpu:v0.2.1
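
The exit code 127 in the pod description is consistent with this diagnosis: a POSIX shell exits with 127 when the command it is asked to exec cannot be found, which matches the `jupyterhub-singleuser: not found` log line. A minimal sketch that reproduces the exit code locally, using a made-up command name (an assumption for illustration only, not anything from Kubeflow):

```shell
# Hypothetical command name, chosen so that it is certainly absent from PATH.
missing_cmd="definitely-not-installed-cmd-$$"

# Run the exec in a child shell so the failure does not end the current session.
# "|| status=$?" captures the exit status without aborting under "set -e".
status=0
sh -c "exec $missing_cmd" 2>/dev/null || status=$?

echo "exit status: $status"
```

To check a specific image, something along the lines of `docker run --rm --entrypoint sh <image> -c 'command -v jupyterhub-singleuser'` should print the path of the binary if the image ships it, and print nothing otherwise.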

jlewi added the kind/question and area/jupyter labels on Jul 16, 2018
Contributor Author

yeya24 commented Jul 17, 2018

I updated all my TensorFlow images to v0.2.1, but there is still an error:

Events:
  Type     Reason                 Age              From               Message
  ----     ------                 ----             ----               -------
  Warning  FailedScheduling       5m (x7 over 6m)  default-scheduler  pod has unbound PersistentVolumeClaims
  Normal   Scheduled              4m               default-scheduler  Successfully assigned jupyter-ben to ubuntu
  Normal   SuccessfulMountVolume  4m               kubelet, ubuntu    MountVolume.SetUp succeeded for volume "jupyter-notebook-token-t6tdc"
  Normal   SuccessfulMountVolume  4m               kubelet, ubuntu    MountVolume.SetUp succeeded for volume "nfs-pv"
  Normal   Pulled                 3m (x5 over 4m)  kubelet, ubuntu    Container image "gcr.io/kubeflow-images-public/tensorflow-1.8.0-notebook-gpu:v0.2.1" already present on machine
  Normal   Created                3m (x5 over 4m)  kubelet, ubuntu    Created container
  Normal   Started                3m (x5 over 4m)  kubelet, ubuntu    Started container
  Warning  BackOff                3m (x8 over 4m)  kubelet, ubuntu    Back-off restarting failed container

And more output:

ahb@ubuntu:~$ sudo kubectl logs   jupyter-ben -p -n kubeflow
checking if /home/jovyan volume needs init...
...creating /home/jovyan/work
mkdir: cannot create directory ‘/home/jovyan/work’: Permission denied

How can I fix it? I switched to my local machine, but the environment is the same as the one on AWS. Both run Kubernetes 1.10.2.

Contributor

jlewi commented Jul 17, 2018

That line comes from here
https://github.com/kubeflow/kubeflow/blob/master/components/tensorflow-notebook-image/pvc-check.sh#L30

By default we mount a PV for your default storage class at /home/jovyan, and that seems to be causing permission problems in your case.
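
One way to confirm this kind of problem is to check whether the user the notebook runs as can write to the mount point. A minimal sketch, assuming a POSIX shell; `check_writable` is a hypothetical helper written for this illustration, not part of Kubeflow or the notebook image:

```shell
# Report whether the current user could create entries inside a directory.
# The numeric uid/gid are printed so they can be compared with the
# ownership of the directory backing the volume.
check_writable() {
    dir="$1"
    if [ -d "$dir" ] && [ -w "$dir" ]; then
        echo "writable: $dir (uid=$(id -u) gid=$(id -g))"
        return 0
    fi
    echo "not writable: $dir (uid=$(id -u) gid=$(id -g))" >&2
    return 1
}
```

Running `check_writable /home/jovyan` from a container that mounts the same claim would show which uid and gid are attempting the write. Treat the result as a hint rather than a guarantee, since the storage backend has the final say on whether a write is allowed.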

I would try the following:

ks param set kubeflow-core jupyterNotebookPVCMount /home/jovyan/work
ks apply default -c kubeflow-core

If ks apply gives an error about the resource being modified (see #980), just rerun ks apply until it succeeds.

Then try logging into JupyterHub again and recreating your pod.
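
The "rerun until it succeeds" advice can be sketched as a small retry loop. This is only an illustration: `retry` is a hypothetical helper, and the attempt limit and delay are assumptions rather than values recommended by the Kubeflow maintainers.

```shell
# Retry a command until it succeeds or the attempt budget is exhausted.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
    max_attempts="$1"
    delay="$2"
    shift 2
    attempt=1
    while true; do
        # Run the command; stop as soon as it exits with status 0.
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Example (commented out because it needs a configured ksonnet app):
# retry 5 2 ks apply default -c kubeflow-core
```

Capping the number of attempts keeps the loop from spinning forever if `ks apply` fails for a reason other than the transient conflict.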

@yeya24 yeya24 closed this as completed Jul 19, 2018