Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[0.7] Created notebook instance shows "no healthy upstream" on connect #4284

Closed
Ark-kun opened this issue Oct 12, 2019 · 7 comments
Closed
Labels

Comments

@Ark-kun
Copy link
Contributor

Ark-kun commented Oct 12, 2019

/kind bug

What steps did you take and what happened:
Installed KF 0.7 using kfctl with IAP
Created notebook instance in one of two namespaces.
When I click "Connect" I get: "no healthy upstream".
Some other notebook instances (same namespace, same deployment) work fine.

@issue-label-bot
Copy link

Issue Label Bot is not confident enough to auto-label this issue.
See dashboard for more details.

@Ark-kun
Copy link
Contributor Author

Ark-kun commented Oct 12, 2019

/cc @jlewi

@jlewi
Copy link
Contributor

jlewi commented Oct 12, 2019

@Ark-kun can you provide the following information

kubectl -n ${NAMESPACE} notebooks describe ${NOTEBOOK}
kubectl -n ${NAMESPACE} notebooks get -o yaml  ${NOTEBOOK}
kubectl -n ${NAMESPACE} pods describe ${NOTEBOOK-POD}
kubectl -n ${NAMESPACE} pods get -o yaml  ${NOTEBOOK-POD}

Can you also provide the URl that you are hitting in the web browser?

@Ark-kun
Copy link
Contributor Author

Ark-kun commented Oct 17, 2019

kubectl -n ${NAMESPACE} describe notebooks ${NOTEBOOK}

Name:         notebook
Namespace:    avolkov-google-com
Labels:       app=notebook
Annotations:  <none>
API Version:  kubeflow.org/v1beta1
Kind:         Notebook
Metadata:
  Creation Timestamp:  2019-10-11T03:29:35Z
  Generation:          1
  Resource Version:    145976
  Self Link:           /apis/kubeflow.org/v1beta1/namespaces/avolkov-google-com/notebooks/notebook
  UID:                 5934c474-ebd7-11e9-a983-42010a800202
Spec:
  Template:
    Spec:
      Containers:
        Env:
        Image:  gcr.io/kubeflow-images-public/tensorflow-2.0.0a-notebook-cpu:v-base-ef41372-1177829795472347138
        Name:   notebook
        Resources:
          Limits:
          Requests:
            Cpu:     0.5
            Memory:  1.0Gi
        Volume Mounts:
          Mount Path:              /home/jovyan
          Name:                    workspace-notebook
          Mount Path:              /dev/shm
          Name:                    dshm
      Service Account Name:        default-editor
      Ttl Seconds After Finished:  300
      Volumes:
        Name:  workspace-notebook
        Persistent Volume Claim:
          Claim Name:  workspace-notebook
        Empty Dir:
          Medium:  Memory
        Name:      dshm
Status:
  Conditions:
    Last Probe Time:  2019-10-11T03:29:54Z
    Type:             Running
    Last Probe Time:  2019-10-11T03:29:41Z
    Reason:           PodInitializing
    Type:             Waiting
  Container State:
    Running:
      Started At:  2019-10-11T03:29:53Z
  Ready Replicas:  0
Events:            <none>

kubectl -n ${NAMESPACE} get notebooks -o yaml ${NOTEBOOK}

apiVersion: kubeflow.org/v1beta1
kind: Notebook
metadata:
  creationTimestamp: 2019-10-11T03:29:35Z
  generation: 1
  labels:
    app: notebook
  name: notebook
  namespace: avolkov-google-com
  resourceVersion: "145976"
  selfLink: /apis/kubeflow.org/v1beta1/namespaces/avolkov-google-com/notebooks/notebook
  uid: 5934c474-ebd7-11e9-a983-42010a800202
spec:
  template:
    spec:
      containers:
      - env: []
        image: gcr.io/kubeflow-images-public/tensorflow-2.0.0a-notebook-cpu:v-base-ef41372-1177829795472347138
        name: notebook
        resources:
          limits: {}
          requests:
            cpu: "0.5"
            memory: 1.0Gi
        volumeMounts:
        - mountPath: /home/jovyan
          name: workspace-notebook
        - mountPath: /dev/shm
          name: dshm
      serviceAccountName: default-editor
      ttlSecondsAfterFinished: 300
      volumes:
      - name: workspace-notebook
        persistentVolumeClaim:
          claimName: workspace-notebook
      - emptyDir:
          medium: Memory
        name: dshm
status:
  conditions:
  - lastProbeTime: 2019-10-11T03:29:54Z
    type: Running
  - lastProbeTime: 2019-10-11T03:29:41Z
    reason: PodInitializing
    type: Waiting
  containerState:
    running:
      startedAt: 2019-10-11T03:29:53Z
  readyReplicas: 0

kubectl -n ${NAMESPACE} describe pods ${NOTEBOOK_POD}

Name:               notebook-0
Namespace:          avolkov-google-com
Priority:           0
PriorityClassName:  <none>
Node:               gke-kf07-kf07-cpu-pool-v1-819cb147-fc0g/10.128.0.37
Start Time:         Thu, 10 Oct 2019 20:29:41 -0700
Labels:             app=notebook
                    controller-revision-hash=notebook-885488c48
                    notebook-name=notebook
                    statefulset=notebook
                    statefulset.kubernetes.io/pod-name=notebook-0
Annotations:        sidecar.istio.io/status={"version":"5f3ae3613c7945ef767cb9fd594596bc001ff3ab915f12e4379c0cb5648d2729","initContainers":["istio-init"],"containers":["istio-proxy"],"volumes":["istio-envoy","istio-certs...
Status:             Pending
IP:                 10.32.1.39
Controlled By:      StatefulSet/notebook
Init Containers:
  istio-init:
    Container ID:  docker://29f61ed40a015bf62248f6b4e730a1a0d898c2452878bfdd867190bc379fa875
    Image:         docker.io/istio/proxy_init:1.1.6
    Image ID:      docker-pullable://istio/proxy_init@sha256:54d89fb2b3b0a2365f2d2b0a8862f1f8320a63ab6a09c637c60f13f6021c4609
    Port:          <none>
    Host Port:     <none>
    Args:
      -p
      15001
      -u
      1337
      -m
      REDIRECT
      -i
      *
      -x
      
      -b
      8888
      -d
      15020
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 10 Oct 2019 20:29:51 -0700
      Finished:     Thu, 10 Oct 2019 20:29:52 -0700
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     100m
      memory:  50Mi
    Requests:
      cpu:        10m
      memory:     10Mi
    Environment:  <none>
    Mounts:       <none>
Containers:
  notebook:
    Container ID:   
    Image:          gcr.io/kubeflow-images-public/tensorflow-2.0.0a-notebook-cpu:v-base-ef41372-1177829795472347138
    Image ID:       
    Port:           8888/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ErrImagePull
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:     500m
      memory:  1Gi
    Environment:
      NB_PREFIX:  /notebook/avolkov-google-com/notebook
    Mounts:
      /dev/shm from dshm (rw)
      /home/jovyan from workspace-notebook (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-editor-token-ggps2 (ro)
  istio-proxy:
    Container ID:  docker://833a22f68984c1e6ef5dc54bd31bd7176dbeba1a7d67e5152b12633c96d33531
    Image:         docker.io/istio/proxyv2:1.1.6
    Image ID:      docker-pullable://istio/proxyv2@sha256:e7ee1ad38bd5b556ad0527ac691a9f647b66835960417b154c5d28b2ed9219cb
    Port:          15090/TCP
    Host Port:     0/TCP
    Args:
      proxy
      sidecar
      --domain
      $(POD_NAMESPACE).svc.cluster.local
      --configPath
      /etc/istio/proxy
      --binaryPath
      /usr/local/bin/envoy
      --serviceCluster
      notebook.$(POD_NAMESPACE)
      --drainDuration
      45s
      --parentShutdownDuration
      1m0s
      --discoveryAddress
      istio-pilot.istio-system:15010
      --zipkinAddress
      zipkin.istio-system:9411
      --connectTimeout
      10s
      --proxyAdminPort
      15000
      --concurrency
      2
      --controlPlaneAuthPolicy
      NONE
      --statusPort
      15020
      --applicationPorts
      8888
    State:          Running
      Started:      Thu, 10 Oct 2019 20:29:53 -0700
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     2
      memory:  128Mi
    Requests:
      cpu:      10m
      memory:   40Mi
    Readiness:  http-get http://:15020/healthz/ready delay=1s timeout=1s period=2s #success=1 #failure=30
    Environment:
      POD_NAME:                      notebook-0 (v1:metadata.name)
      POD_NAMESPACE:                 avolkov-google-com (v1:metadata.namespace)
      INSTANCE_IP:                    (v1:status.podIP)
      ISTIO_META_POD_NAME:           notebook-0 (v1:metadata.name)
      ISTIO_META_CONFIG_NAMESPACE:   avolkov-google-com (v1:metadata.namespace)
      ISTIO_META_INTERCEPTION_MODE:  REDIRECT
      ISTIO_METAJSON_LABELS:         {"app":"notebook","controller-revision-hash":"notebook-885488c48","notebook-name":"notebook","statefulset":"notebook","statefulset.kubernetes.io/pod-name":"notebook-0"}

    Mounts:
      /etc/certs/ from istio-certs (ro)
      /etc/istio/proxy from istio-envoy (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-editor-token-ggps2 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  workspace-notebook:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  workspace-notebook
    ReadOnly:   false
  dshm:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:  Memory
  default-editor-token-ggps2:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-editor-token-ggps2
    Optional:    false
  istio-envoy:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:  Memory
  istio-certs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  istio.default-editor
    Optional:    true
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason   Age                   From                                              Message
  ----     ------   ----                  ----                                              -------
  Warning  Failed   16m (x37178 over 5d)  kubelet, gke-kf07-kf07-cpu-pool-v1-819cb147-fc0g  Error: ImagePullBackOff
  Normal   BackOff  1m (x37245 over 5d)   kubelet, gke-kf07-kf07-cpu-pool-v1-819cb147-fc0g  Back-off pulling image "gcr.io/kubeflow-images-public/tensorflow-2.0.0a-notebook-cpu:v-base-ef41372-1177829795472347138"

kubectl -n ${NAMESPACE} get pods ${NOTEBOOK_POD}

NAME         READY     STATUS             RESTARTS   AGE
notebook-0   1/2       ImagePullBackOff   0          5d20h

@shanidchamal
Copy link

@Ark-kun seems like the docker image gcr.io/kubeflow-images-public/tensorflow-2.0.0a-notebook-cpu:v-base-ef41372-1177829795472347138 is not present in your cluster and it is not getting pulled as you can see the ImagePullBackoff error

@Ark-kun
Copy link
Contributor Author

Ark-kun commented Oct 21, 2019

BTW, I just chose one of the standard images offered by Kubeflow.

@jlewi jlewi added this to To Do in Needs Triage Oct 22, 2019
@jlewi
Copy link
Contributor

jlewi commented Oct 23, 2019

Duplicate of #4267

@jlewi jlewi marked this as a duplicate of #4267 Oct 23, 2019
@jlewi jlewi closed this as completed Oct 23, 2019
Needs Triage automation moved this from To Do to Closed Oct 23, 2019
@jlewi jlewi removed this from Closed in Needs Triage Nov 4, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants