
Update README to reflect install experience #617

Merged: 1 commit into kubeflow:master on May 21, 2018

Conversation

ykevinc (Contributor) commented Apr 9, 2018

Hi,

I learned about Kubeflow recently and gave it a run, but a couple of issues came up:

  1. If there aren't enough nodes in the cluster, creating a server in the browser (tf-hub-0) will wait forever.
  2. While the Kubeflow Docker image is being pulled, if the node doesn't have enough storage, the kubectl port-forward tf-hub-0 process fails:
E0405 13:41:46.006868   61040 portforward.go:331] an error occurred forwarding 8000 -> 8000: error forwarding port 8000 to pod 237c9ddb56660461a4328110bdbb2624eea8ef4dc5055031bd00d0c6d393a831, uid : container not running (237c9ddb56660461a4328110bdbb2624eea8ef4dc5055031bd00d0c6d393a831)

This PR updates the user guide to work around those issues. At the same time, I want to raise them in case they can be solved without touching the user guide (e.g., pre-checking the cluster size before deploying the pod, or making the Kubeflow image smaller).
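(Editorial aside: for anyone hitting the same symptoms, a rough pre-flight check along these lines would surface both problems before the install. This is generic kubectl usage sketched for illustration, not something from the PR itself; the kubeflow namespace matches the pod description later in this thread.)

# Confirm the cluster is up and count its nodes
kubectl get nodes

# Check allocatable CPU/memory per node; the notebook pod shown below
# requests 1 CPU, so a node that is already busy may have no room for it
kubectl describe nodes | grep -A 5 Allocatable

# If port-forward reports "container not running", inspect the hub pod and
# recent events instead of retrying the forward
kubectl -n kubeflow describe pod tf-hub-0
kubectl -n kubeflow get events --sort-by=.lastTimestamp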



user_guide.md Outdated
@@ -9,7 +9,7 @@ This guide will walk you through the basics of deploying and interacting with Ku
## Requirements
* Kubernetes >= 1.8 [see here](https://github.com/kubeflow/tf-operator#requirements)
* ksonnet version [0.9.2](https://ksonnet.io/#get-started) or later. (See [below](#why-kubeflow-uses-ksonnet) for an explanation of why we use ksonnet)

* An existing Kubernetes cluster with at least 2 nodes. Nodes need storage >= 20 GB because of the ML libraries and third-party packages bundled into the Kubeflow Docker images
inc0 commented:
Why do you need 2 nodes? I've tested Kubeflow with a 1-node minikube multiple times with success. Granted, you need a lot of disk storage, mostly because of the Jupyter notebook image.
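(Editorial aside: on minikube, that disk can be provisioned at start time with minikube's standard --disk-size flag; the 20g value below simply mirrors the figure proposed in this PR's diff.)

minikube start --disk-size=20g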

ykevinc (Contributor, author) replied:

@inc0 After looking at the logs, my wording should have referred to CPUs rather than nodes. I used an AWS t2.medium node to go through the Kubeflow user guide.

If the node had had more than 2 CPUs, spawning would have been fine. The t2.medium is documented to have 2 vCPUs, but that just wasn't enough to spawn another pod.

Please let me know whether I should update the wording or whether I can safely close the PR, assuming these are common prerequisites in the Kubernetes/Jupyter world.

Name:         jupyter-mimi
Namespace:    kubeflow
Node:         <none>
Labels:       app=jupyterhub
              component=singleuser-server
              heritage=jupyterhub
              hub.jupyter.org/username=mimi
Annotations:  <none>
Status:       Pending
IP:           
Containers:
  notebook:
    Image:  gcr.io/kubeflow-images-staging/tensorflow-1.7.0-notebook-gpu:v20180403-1f854c44
    Port:   8888/TCP
    Args:
      start-singleuser.sh
      --ip="0.0.0.0"
      --port=8888
      --allow-root
    Requests:
      cpu:     1
      memory:  1G
    Environment:
      JUPYTERHUB_API_TOKEN:           X
      JPY_API_TOKEN:                  X
      JUPYTERHUB_CLIENT_ID:           user-mimi
      JUPYTERHUB_HOST:                
      JUPYTERHUB_OAUTH_CALLBACK_URL:  /user/mimi/oauth_callback
      JUPYTERHUB_USER:                mimi
      JUPYTERHUB_API_URL:             http://tf-hub-0:8081/hub/api
      JUPYTERHUB_BASE_URL:            /
      JUPYTERHUB_SERVICE_PREFIX:      /user/mimi/
      MEM_GUARANTEE:                  1G
      CPU_GUARANTEE:                  1.0
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from no-api-access-please (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  no-api-access-please:
    Type:        EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:      
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.alpha.kubernetes.io/notReady:NoExecute for 300s
                 node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  1m (x37 over 11m)  default-scheduler  No nodes are available that match all of the predicates: Insufficient cpu (2), PodToleratesNodeTaints (1).
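(Editorial aside: the FailedScheduling event above can be cross-checked against each node's allocatable CPU with plain kubectl. This sketch is generic and not from the PR thread; <node-name> is a placeholder.)

# Print each node's allocatable CPU
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.allocatable.cpu}{"\n"}{end}'

# Compare against what is already reserved on the node; the notebook pod
# requests cpu: 1 on top of the hub and system pods, so a 2-vCPU node can
# come up short exactly as the scheduler message reports
kubectl describe node <node-name> | grep -A 8 "Allocated resources"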

Another contributor commented:

It would be good to specify a minimum requirement for the number of CPUs.

jlewi (Contributor) commented May 9, 2018

@ykevinc Can you sync and make the requested changes please?

jlewi (Contributor) commented May 14, 2018

Thank you. Sorry the review process was so lengthy.

/lgtm
/approve

k8s-ci-robot commented:
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jlewi


jlewi (Contributor) commented May 21, 2018

/ok-to-test

k8s-ci-robot merged commit dfad46e into kubeflow:master on May 21, 2018
saffaalvi pushed a commit to StatCan/kubeflow that referenced this pull request Feb 11, 2021
yanniszark pushed a commit to arrikto/kubeflow that referenced this pull request Feb 15, 2021