WIP: A deployment story - Using GPUs on GKE #994

consideRatio opened this issue Oct 25, 2018 · 65 comments

@consideRatio
Member

consideRatio commented Oct 25, 2018

GPU powered machine learning on GKE

To enable GPUs on GKE, this is what I've done. Note that this post is a Work In Progress and will be edited from time to time. To see when the last edit was made, see the header of this post.

Prerequisite knowledge

Kubernetes nodes, pods and daemonsets

A node represents actual hardware in the cloud, a pod represents something running on a node, and a daemonset ensures that one pod running something is created for each node. If you lack knowledge about Kubernetes, I'd recommend learning more at their concepts page.
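
If you want to poke around a cluster while reading, the following kubectl commands list these three kinds of objects (a minimal sketch; it assumes kubectl is already configured to talk to your cluster):

# List the nodes (the actual machines) in the cluster
kubectl get nodes

# List pods in all namespaces, including which node each one runs on
kubectl get pods --all-namespaces -o wide

# List daemonsets, for example the system ones living in kube-system
kubectl get daemonsets --all-namespaces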

Bonus knowledge:

This video provides background that helps you understand why additional steps are required for this to work: https://www.youtube.com/watch?v=KplFFvj3XRk

NOTE: Regarding taints: GPU nodes on GKE will get them automatically, and pods requesting GPUs will get matching tolerations, without any additional setup.
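
You can see this for yourself once a GPU node exists by inspecting the taints on the node and the tolerations on a spawned user pod (a sketch with placeholder names; on GKE the taint is typically nvidia.com/gpu=present:NoSchedule):

# Inspect the taints GKE put on a GPU node
kubectl describe node <your-gpu-node-name> | grep -A 3 -i taints

# Inspect the tolerations automatically added to a user pod that requested a GPU
kubectl get pod -n <your-hub-namespace> <jupyter-user-pod-name> -o jsonpath='{.spec.tolerations}'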

1. GKE Kubernetes cluster on a GPU enabled zone

Google has various zones (datacenters), and some do not have GPUs. First you must have a GKE cluster coupled with a zone that has GPU access. To find out which zones have GPUs and what kind of GPUs they have, see this page. In overall performance and cost, K80 < P100 < V100. Note that there are also TPUs and that their availability is also zone dependent. This documentation will not address utilizing TPUs though.

Note that GKE Kubernetes clusters come pre-installed with some parts needed for GPUs to be utilized:

  1. A daemonset in your Kubernetes cluster called nvidia-gpu-device-plugin. I don't fully know what this does yet.
  2. A custom resource controller plugin, which is enabled by default, that will handle extra resource requests such as nvidia.com/gpu: 1 properly.
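
To confirm that these pieces are present, and that the cluster actually exposes GPUs as an allocatable resource once a GPU node is up and has its drivers, you can run something like the following (the node name is a placeholder):

# The GKE-managed device plugin daemonset should exist in kube-system
kubectl get daemonset -n kube-system nvidia-gpu-device-plugin

# A driver-equipped GPU node should advertise nvidia.com/gpu among its allocatable resources
kubectl describe node <your-gpu-node-name> | grep -i nvidia.com/gpu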

2. JupyterHub installation

This documentation assumes you have deployed a JupyterHub already by following the https://z2jh.jupyter.org guide on your Kubernetes cluster.

3. Docker image for the JupyterHub users

I built an image for a basic Hello World with GPU-enabled TensorFlow. If you are fine with utilizing this, you don't need to do anything further. My image is available as consideratio/singleuser-gpu:v0.3.0.

About the Dockerfile

I build on top of a jupyter/docker-stacks image so that the image integrates well with JupyterHub. I also pinned cudatoolkit=9.0; it is a dependency of tensorflow-gpu but would otherwise install at an even newer version that is unsupported by the GPUs I'm aiming to use, namely Tesla K80 or Tesla P100. To learn more about these compatibility issues see: https://docs.anaconda.com/anaconda/user-guide/tasks/gpu-packages/

Dockerfile reference

NOTE: To make this run without a GPU available, you must still install an NVIDIA driver. This can be done using apt-get install nvidia-384, but if you do, it must not conflict with the nvidia-driver-installer daemonset described later, which as far as I know sadly still needs to run. This is a rabbit hole and hard to maintain, I think.

# For the latest tag, see: https://hub.docker.com/r/jupyter/datascience-notebook/tags/
FROM jupyter/datascience-notebook:f2889d7ae7d6

# GPU powered ML
# ----------------------------------------
RUN conda install -c conda-forge --yes --quiet \
    tensorflow-gpu \
    cudatoolkit=9.0 && \
    conda clean -tipsy && \
    fix-permissions $CONDA_DIR && \
    fix-permissions /home/$NB_USER

# Allow drivers installed by the nvidia-driver-installer to be located
ENV LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/nvidia/lib64
# Also, utilities like `nvidia-smi` are installed here
ENV PATH=${PATH}:/usr/local/nvidia/bin

# To build and push a Dockerfile (in current working directory) to dockerhub under your username
DOCKER_USERNAME=my-docker-username
TAG=v0.3.0
docker build --tag ${DOCKER_USERNAME}/singleuser-gpu:${TAG} . && docker push ${DOCKER_USERNAME}/singleuser-gpu:${TAG}

3B. Create an image using repo2docker (WIP)

jupyterhub/team-compass#96 (comment)

4. Create a GPU node pool

Create a new node pool for your Kubernetes cluster. I chose an n1-highmem-2 node with a Tesla K80 GPU. These instructions are written and tested for K80 and P100.

Note that there is an issue with autoscaling from 0 nodes, and that scaling up a GPU node is a slow process, as it needs to start, install drivers, and download the image file - each step takes quite a while. I'm expecting 5-10 minutes of startup for this. I recommend you start out with a single fixed node while setting this up initially.

For details on how to set up a node pool with attached GPUs on the nodes, see: https://cloud.google.com/kubernetes-engine/docs/how-to/gpus#create
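
For reference, creating such a node pool from the command line looks roughly like this (a sketch; the pool name, cluster name, zone and sizes are examples - check the linked GKE documentation for the authoritative flags):

# Create a node pool where each n1-highmem-2 node has one K80 GPU attached
gcloud container node-pools create user-gpu \
    --cluster my-cluster \
    --zone europe-west1-b \
    --machine-type n1-highmem-2 \
    --accelerator type=nvidia-tesla-k80,count=1 \
    --num-nodes 1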

5. Daemonset: nvidia-driver-installer

You need to make sure the GPU nodes get appropriate drivers installed. This is what the nvidia-driver-installer daemonset will do for you! It will install drivers and utilities in /usr/local/nvidia, which is required for, for example, the conda package tensorflow-gpu to function properly.

NOTE: TensorFlow has a pinned dependency on cudatoolkit, and a given cudatoolkit requires a minimum NVIDIA driver version. For example, tensorflow=1.11 and tensorflow=1.12 require cudatoolkit=9.0 and tensorflow=1.13 will require cudatoolkit=10.0; cudatoolkit=9.0 requires an NVIDIA driver of at least version 384.81 and cudatoolkit=10.0 requires an NVIDIA driver of at least version 410.48.
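
When debugging such version mismatches, a quick way to check which driver version a node actually got (once the installer below has run) is the following, executed wherever nvidia-smi is available:

# Print only the installed driver version, e.g. 396.26
nvidia-smi --query-gpu=driver_version --format=csv,noheader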

# To install the daemonset:
kubectl create -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/daemonset.yaml
# Verify that it is installed; you should find it with:
kubectl get -n kube-system ds/nvidia-driver-installer
# To verify the daemonset pods are successfully installing drivers on the node(s):
# 1. See that there is one pod per GPU node, running or attempting to run
kubectl get pods -n kube-system | grep nvidia-driver-installer

# 2. Print the relevant logs of the nvidia-driver-installer pods
kubectl logs -n kube-system ds/nvidia-driver-installer -c nvidia-driver-installer

# 3. Verify it ends with something like this:

WARNING: nvidia-installer was forced to guess the X library path '/usr/lib'
         and X module path '/usr/lib/xorg/modules'; these paths were not
         queryable from the system.  If X fails to find the NVIDIA X driver
         module, please install the `pkg-config` utility and the X.Org
         SDK/development package for your distribution and reinstall the
         driver.

/
[INFO    2018-11-12 08:05:35 UTC] Updated cached version as:
CACHE_BUILD_ID=10895.52.0
CACHE_NVIDIA_DRIVER_VERSION=396.26
[INFO    2018-11-12 08:05:35 UTC] Verifying Nvidia installation
Mon Nov 12 08:05:37 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.26                 Driver Version: 396.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   56C    P0    86W / 149W |      0MiB / 11441MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
[INFO    2018-11-12 08:05:38 UTC] Finished installing the drivers.
[INFO    2018-11-12 08:05:38 UTC] Updating host's ld cache

Set a driver version for the nvidia-driver-installer daemonset to install

The default driver for the daemonset above is, as of writing, 396.26. I struggled with installing that version without this daemonset, so I ended up using 384.145 instead.

Option 1: Use a one liner

kubectl patch daemonset -n kube-system nvidia-driver-installer --patch '{"spec":{"template":{"spec":{"initContainers":[{"name":"nvidia-driver-installer","env":[{"name":"NVIDIA_DRIVER_VERSION","value":"384.145"}]}]}}}}'

Option 2: manually edit the daemonset manifest...

kubectl edit daemonset -n kube-system nvidia-driver-installer

# ... and then add the following entries to the init container's `env` (`spec.template.spec.initContainers[0].env`):
# - name: NVIDIA_DRIVER_VERSION
#   value: "384.145"

Reference: https://github.com/GoogleCloudPlatform/container-engine-accelerators/tree/master/cmd/nvidia_gpu
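
After patching or editing, you can verify that the environment variable actually ended up in the daemonset's spec (a small sketch using kubectl's jsonpath output):

# Should print an env list containing NVIDIA_DRIVER_VERSION with the value you set
kubectl get daemonset -n kube-system nvidia-driver-installer \
    -o jsonpath='{.spec.template.spec.initContainers[0].env}'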

6. Configure some spawn options

Perhaps the user does not always need a GPU, so it is good to allow the user to choose instead. This can be done with the following configuration.

WARNING: The most recent z2jh Helm chart utilizes KubeSpawner 0.10.1, which has deprecated image_spec in favor of image, and the configuration below is updated to match. Tweak this configuration to use image_spec if you want to deploy with an older version of the z2jh chart or KubeSpawner.

singleuser:
  profileList:
    - display_name: "Default: Shared, 8 CPU cores"
      description: "By selecting this choice, you will be assigned an environment that will run on a shared machine with CPU only."
      default: True
    - display_name: "Dedicated, 2 CPU cores & 13GB RAM, 1 NVIDIA Tesla K80 GPU"
      description: "By selecting this choice, you will be assigned an environment that will run on a dedicated machine with a single GPU, just for you."
      kubespawner_override:
        image: consideratio/singleuser-gpu:v0.3.0
        extra_resource_limits:
          nvidia.com/gpu: "1"

Result

Note that this displays a screenshot of the configuration I've utilized, which differs slightly from the example configuration and setup documented in this post.

[screenshot: the spawn options page showing the profile list]

7. Verify GPU functionality

After you have got a Jupyter GPU pod launched and running, you can verify that your GPU works as intended by doing the following.

  1. Open a terminal and run:
# Verify that the following command ...
nvidia-smi

# has an output like below:
Mon Nov 12 10:38:07 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.26                 Driver Version: 396.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   61C    P0    72W / 149W |  10877MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

Could not find nvidia-smi?

  • Perhaps the PATH variable was not configured properly? Try inspecting it and looking inside the folder where the nvidia-smi binary was supposed to be installed.
  • Perhaps the nvidia-driver-installer failed to install the driver?
  2. Clone this repo:
git clone git@github.com:aymericdamien/TensorFlow-Examples.git
  3. Open a demonstration notebook, for example TensorFlow-Examples/notebooks/convolutional_network.ipynb, and run all cells.
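
As an extra check beyond nvidia-smi, you can ask TensorFlow itself whether it sees the GPU from a terminal in the user pod (a sketch assuming the tensorflow-gpu install from the image above, i.e. TensorFlow 1.x):

# Should list a device of type GPU in addition to the CPU device
python -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())"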

Previous issues

Autoscaling - no longer an issue?

UPDATE: I'm not sure why this happened, but it doesn't happen any more for me.

I've had massive trouble autoscaling. I managed to autoscale from 1 to 2 nodes, but it took 37 minutes... Scaling down worked as it should, with the unused GPU node being scaled down after 10 minutes.

To handle the long scale up time, you can configure a long timeout for kubespawner's spawning procedure like this:

singleuser:
  startTimeout: 3600

Latest update (2018-11-15)

I got autoscaling to work, but it is still slow; it takes about 9 minutes plus the time for your image to be pulled to the new node. Some lessons learned:

  1. The cluster autoscaler runs simulations using a hardcoded copy of kube-scheduler default configuration logic, so utilizing a custom kube-scheduler configuration with different predicates could cause issues. See Getting the CA to play well with a custom scheduler kubernetes/autoscaler#1406 for more info.

  2. I stopped using a dynamically applied label (cloud.google.com/gke-accelerator=nvidia-tesla-k80) as a label selector. I don't remember whether this worked at all with the cluster autoscaler, or whether it worked to scale both from 0->1 node and from 1->2 nodes. If you want to select a specific GPU from multiple node pools, I'd recommend adding your own pre-defined labels like gpu: k80 and selecting on them with a nodeSelector (see the sketch after this list).

  3. I started using the default-scheduler instead of the jupyterhub user-scheduler, as I figured it would be safer not to risk a difference in what predicates they use, even though they may have the exact same predicates configured. NOTE: a predicate is, in this case, a function that takes information about a node and returns true or false depending on whether the node is a candidate to be scheduled on.
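
Regarding point 2, a sketch of what labeling the GPU node pool looks like (the pool, cluster, zone and label names are examples):

# Attach a pre-defined label to every node in the GPU node pool at creation time
gcloud container node-pools create user-k80 \
    --cluster my-cluster \
    --zone europe-west1-b \
    --accelerator type=nvidia-tesla-k80,count=1 \
    --node-labels=gpu=k80 \
    --num-nodes 1

# Spot-check that the label ended up on the nodes
kubectl get nodes --show-labels | grep gpu=k80

The label can then be targeted from a spawner profile with KubeSpawner's node_selector setting, for example node_selector: {gpu: k80} inside a kubespawner_override.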

To debug the autoscaler:

  1. Inspect the spawning pod's events using kubectl describe pod -n jhub jupyter-erik-2esundell
  2. Inspect the cluster autoscaler's status configmap by running:
kubectl get cm -n kube-system cluster-autoscaler-status -o yaml
  3. Look for the node pool in the output, mine was named user-k80
      Name:        https://content.googleapis.com/compute/v1/projects/ds-platform/zones/europe-west1-b/instanceGroups/gke-floss-user-k80-dd296e90-grp
  4. Inspect the status of your node pool regarding cloudProviderTarget, registered and ready.
      Name:        https://content.googleapis.com/compute/v1/projects/ds-platform/zones/europe-west1-b/instanceGroups/gke-floss-user-k80-dd296e90-grp
      Health:      Healthy (ready=2 unready=0 notStarted=0 longNotStarted=0 registered=2 longUnregistered=0 cloudProviderTarget=2 (minSize=1, maxSize=2))
  5. You want all of them to become ready.

  6. You can also inspect the node events with kubectl describe node the-name-of-the-node:

Events:
  Type    Reason                   Age   From                                              Message
  ----    ------                   ----  ----                                              -------
  Normal  Starting                 20m   kubelet, gke-floss-user-k80-dd296e90-99fw         Starting kubelet.
  Normal  NodeHasSufficientDisk    20m   kubelet, gke-floss-user-k80-dd296e90-99fw         Node gke-floss-user-k80-dd296e90-99fw status is now: NodeHasSufficientDisk
  Normal  NodeHasSufficientMemory  20m   kubelet, gke-floss-user-k80-dd296e90-99fw         Node gke-floss-user-k80-dd296e90-99fw status is now: NodeHasSufficientMemory
  Normal  NodeHasNoDiskPressure    20m   kubelet, gke-floss-user-k80-dd296e90-99fw         Node gke-floss-user-k80-dd296e90-99fw status is now: NodeHasNoDiskPressure
  Normal  NodeHasSufficientPID     20m   kubelet, gke-floss-user-k80-dd296e90-99fw         Node gke-floss-user-k80-dd296e90-99fw status is now: NodeHasSufficientPID
  Normal  NodeAllocatableEnforced  20m   kubelet, gke-floss-user-k80-dd296e90-99fw         Updated Node Allocatable limit across pods
  Normal  NodeReady                19m   kubelet, gke-floss-user-k80-dd296e90-99fw         Node gke-floss-user-k80-dd296e90-99fw status is now: NodeReady
  Normal  Starting                 19m   kube-proxy, gke-floss-user-k80-dd296e90-99fw      Starting kube-proxy.
  Normal  UnregisterNetDevice      14m   kernel-monitor, gke-floss-user-k80-dd296e90-99fw  Node condition FrequentUnregisterNetDevice is now: False, reason: UnregisterNetDevice

Potentially related:

I'm using Kubernetes 1.11.2-gke.9, but my GPU nodes apparently have 1.11.2-gke.15.
Autoscaling from 0 nodes: kubernetes/autoscaler#903

User placeholders for GPU nodes

Currently the user placeholders can only target one kind of node pool, and it would make sense to allow the admin to configure how many placeholders to run for a normal pool and how many for a GPU pool. They are needed to autoscale ahead of arriving users so that users are not forced to wait for a new node, and this could be extra relevant for GPU nodes, which without placeholders may need to be created on the fly every time a real user arrives.

We could perhaps instantiate multiple placeholder deployment/statefulsets based on a template and some extra specifications.

Pre pulling images specifically for GPU nodes

Currently we can only specify one kind of image puller, pulling all kinds of images to a single type of node. It is pointless to pull, and especially to wait for the pulling of, unneeded images, so it would be nice to optimize this somehow.

This is tracked in #992 (thanks @jzf2101!)

The future - Shared GPUs

Users cannot share GPUs like they can share CPUs, which is an issue. But in the future, perhaps? From what I've heard, this is something that is progressing right now.

@minrk
Member

minrk commented Oct 25, 2018

Awesome! 🍰

@koustuvsinha

@consideRatio Hi, do you think swapping in pytorch in place of tensorflow in the dockerfile will work? (changing the conda channel and package to pytorch)

@consideRatio
Member Author

@koustuvsinha yepp, installing both would also work i think.

@ablekh

ablekh commented Oct 26, 2018

Cool. It sure will be fun to try to use GPUs on Azure AKS. Will report after having a chance to work on it.

@consideRatio
Member Author

The post is now updated; I think it is easier to read and has a more logical order to the steps taken. It also has some extra verification steps, but still not enough of them, I think.

@jzf2101

jzf2101 commented Nov 13, 2018

This is related to #992 correct?

@consideRatio
Member Author

consideRatio commented Nov 14, 2018

@jzf2101 This is #994 :D You meant #1021? Yeah, those that get their own GPU etc. could certainly be the kind of users that would appreciate being able to do sudo apt-get install ... I'm glad you raised that issue, it is something very relevant for me to get more knowledgeable about as well.

@jzf2101

jzf2101 commented Nov 14, 2018

Correction- I meant #992

@consideRatio
Member Author

@jzf2101 ah! yepp thanks for connecting this

@consideRatio
Member Author

Made an update to the text: I added information about autoscaling the GPU nodes. Something resolved itself, I'm not sure what; now it "only" takes 9 minutes + image pulling to get a GPU node ready.

@jzf2101

jzf2101 commented Nov 15, 2018

Which version of Ubuntu is in the Docker Images? I can't find it in the notes.

@consideRatio
Member Author

@jzf2101 the image I provide in this post is built from jupyter/datascience-notebook (1), built on top of scipy-notebook (2), on top of minimal-notebook (3), on top of base-notebook (4), on top of ubuntu 18.04 aka bionic.

  1. jupyter/datascience-notebook (https://github.com/jupyter/docker-stacks/blob/master/datascience-notebook/Dockerfile)
  2. jupyter/scipy-notebook (https://github.com/jupyter/docker-stacks/blob/master/scipy-notebook/Dockerfile)
  3. jupyter/minimal-notebook (https://github.com/jupyter/docker-stacks/blob/master/minimal-notebook/Dockerfile)
  4. jupyter/base-notebook (https://github.com/jupyter/docker-stacks/blob/master/base-notebook/Dockerfile)

@amanda-tan

@consideRatio Thank you for putting this together! I am currently stuck at Step #5. I get an error when I try to run kubectl logs

error: cannot get the logs from *extensions.DaemonSet

kubectl get -n kube-system ds/nvidia-driver-installer gets me this:

NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
nvidia-driver-installer 1 1 1 1 1 45m

Suggestions?

@consideRatio
Member Author

@amanda-tan hmm clueless, but you could do a more explicit command:

kubectl logs -n kube-system nvidia-driver-installer-alskdjf

Where you would enter your actual pod name
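
To find the actual pod name to plug in there, something like this should work:

# List the driver installer pods (and the nodes they run on); copy a pod name into the logs command
kubectl get pods -n kube-system -o wide | grep nvidia-driver-installer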

@clkao
Contributor

clkao commented Dec 7, 2018 via email

@amanda-tan

@clkao Yes! That worked thank you!

Also, just wanted to add that I got this to work -- the profileList config did not work for me; I probably made an error somewhere, but just whittling it down to:

extraConfig: |-
  c.KubeSpawner.extra_resource_limits = {"nvidia.com/gpu": "1"}

worked like a charm. Thank you so much.

@amanda-tan

amanda-tan commented Jan 24, 2019

  1. Has anyone tried provisioning pre-emptible GPU instances with this setup? I am having a hard time getting beyond one instance of pre-emptible GPU.

  2. Also, I am trying to use this setup for a classroom and it seems extremely cost-ineffective; are there suggestions on how to lower the overall costs?

ETA: I guess there is also a pre-emptible GPU quota which must be increased! That solved point 1.

@consideRatio
Member Author

@amanda-tan yepp this will cost a lot. I don't know how to reduce the cost much, but the experience for the users can be improved greatly with user placeholders, as found in the z2jh 0.8-dev releases available already. In the best case, users would not have to wait for the scale-up at all with these. See the "optimizations" section of z2jh.jupyter.org for more info about such autoscaling optimizations. Requires k8s 1.11+ and Helm 2.11+.

Having multiple GPUs per node is also a reasonable idea; then the users could share some CPU even though they don't share the GPUs.

@djgagne

djgagne commented Jan 24, 2019

I ran a short course using Jupyterhub and Kubernetes with pre-emptible GPUs and scaled up to about 50 users. I ran the nodes for 8 hours with a total cost of about $75 on Google Cloud. Using 10 CPU/8 GPU clusters worked well for me so that each user had 1 CPU and 1 GPU available. You do need an extra 2 CPUs per node to manage the sub-cluster, otherwise you will have 1 GPU sitting idle per cluster. Use K80 GPUs to keep costs minimized and make sure you are running in a region and zone that has them available. Adding extra RAM to a node is really cheap, so don't be afraid to do that beyond the 6.5 GB per CPU standard for the highmem instances.

Make sure you have your quota increase requests in well before you need the nodes for the course because that was one of the more challenging parts for me to get through. You will need the GPUs (all regions) and regional GPU quotas increased. There are also separate quotas for preemptible GPUs versus regular GPUs, so be aware of those. You may also run into issues with quotas on the number of CPUs and the number of IP addresses you can have, so check on all those.

@consideRatio
Member Author

consideRatio commented Mar 27, 2020

@astrajohn I got that issue recently myself, and for me it was because:

  1. I start the container up as root
  2. I use the jupyter/docker-stacks images' script called start.sh (invoked through start-singleuser.sh)
  3. The start.sh script switches from the root user to the jovyan user in a sudo -E -u jovyan command.

This is the issue: the switch from the root user to the jovyan user will reset a certain set of paths, as can be spotted with sudo sudo -V. I've submitted a fix for this in jupyter/docker-stacks#1052.


@rahuldave there are no GPU-specific user placeholder pods as part of the JupyterHub helm chart, but you can create them yourself by mimicking the statefulset called user-placeholder that is part of the helm chart and adjusting it slightly. Here is my attempt.

# Purpose:
# --------
# To have a way to ensure there are always X number of slots available for users
# that quickly need some GPU.
#
# Usage:
# ------
# 1. Update metadata.namespace, spec.template.spec.priorityClassName,
#    spec.template.spec.schedulerName with your namespace and helm release name
#    respectively. Verify the namespace matches with where you deployed your
#    JupyterHub helm chart by inspecting `kubectl get namespace`, and verify the
#    helm release name with `kubectl get priorityclass`.
# 2. Optionally update spec.template.spec.affinity.nodeAffinity to match how
#    your JupyterHub helm chart was configured.
# 3. Optionally configure your resource requests to align with what you
#    provision your users in the Helm chart.
# 4. kubectl apply -f user-placeholder-gpu-daemonset.yaml
# 5. kubectl scale -n <namespace> sts/user-placeholder-gpu --replicas 1
# 6. You may want to ensure your continuous image puller has a toleration for
#    `nvidia.com/gpu=present:NoSchedule` as well to prepull the images, assuming
#    you have the same images for the GPU nodes as the CPU nodes.
#    
#    kubectl patch -n <namespace> ds/continuous-image-puller --type=json --patch '[{"op":"add", "path":"/spec/template/spec/tolerations/-", "value": {"effect":"NoSchedule", "key":"nvidia.com/gpu", "operator":"Equal", "value":"present"}}]'
#
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: jupyterhub
    component: user-placeholder-gpu
  name: user-placeholder-gpu
  namespace: jupyterhub
spec:
  podManagementPolicy: Parallel
  replicas: 0
  selector:
    matchLabels:
      app: jupyterhub
      component: user-placeholder-gpu
  serviceName: user-placeholder-gpu
  template:
    metadata:
      labels:
        app: jupyterhub
        component: user-placeholder-gpu
    spec:
      affinity:
        nodeAffinity:
          # Make this either requiredDuring... or preferredDuring... so it
          # matches how you configured your Helm chart in
          # scheduling.userPods.nodeAffinity.matchNodePurpose.
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: hub.jupyter.org/node-purpose
                operator: In
                values:
                - user
      containers:
      - image: gcr.io/google_containers/pause:3.1
        name: pause
        resources:
          limits:
            # OPTIONALLY: Configure these to align with how you configure your
            # users resource limits and requests in the JupyterHub Helm chart.
            # This is only relevant if you will have a mix of gpu and non-gpu
            # users on this GPU node though, as the limiting resource then could
            # end up being something else than the GPUs.
            nvidia.com/gpu: 1
          requests:
            nvidia.com/gpu: 1
      priorityClassName: jupyterhub-user-placeholder-priority
      schedulerName: jupyterhub-user-scheduler
      tolerations:
      - effect: NoSchedule
        key: hub.jupyter.org_dedicated
        operator: Equal
        value: user
      - effect: NoSchedule
        key: hub.jupyter.org/dedicated
        operator: Equal
        value: user
  updateStrategy:
    rollingUpdate:
      partition: 0
    type: RollingUpdate

@snickell
Contributor

snickell commented Jun 7, 2020

I've been playing with NVIDIA's helm chart for injecting GPU drivers etc as an alternative to Google's daemonset, worth looking into: https://github.com/NVIDIA/gpu-operator

I think this could provide an upstream helm chart dependency that could be included in response to a values.yaml setting to enable GPUs, which seems like a more z2jh approach?
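
For anyone who wants to experiment with it, installing the operator with Helm looks roughly like this (a sketch; the chart repository URL and recommended values change between releases, so check the gpu-operator README for current instructions):

# Add NVIDIA's Helm repository and install the gpu-operator chart
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm install gpu-operator nvidia/gpu-operator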

@consideRatio
Member Author

Thanks for describing that as an option @snickell! I think it is an approach that makes sense to document, but not to have as an optional chart dependency. Adding this would be a maintenance challenge too big to keep current, I think.

  • it's not testable locally or on Travis
  • it would only be used by a subset of users
  • it's NVIDIA only (also GKE only?)

Having a working example in docs with a timestamp on when it worked seems like a good path in between to me.

@snickell
Contributor

Totally makes sense to me to document the setup nicely and move on.

Seems like GPU support out of the box would be a "nice to have someday" in the core, or perhaps in a companion chart?

It's too bad the GPU setup story is so complex (at least on GKE) in 2020; it's deceptively easy to click the "yes give me a GPU in my pool!" button and surprisingly hard to get it all going correctly. IMO a great starting point would be if GKE would allow the GPU node taints to be optional, but of course that's not in our control 🤷🏾‍♀️ It'd sure be nice if NVIDIA and AMD got together and released gpu-operator :-P

In case its helpful:

  • Not GKE only: it works fine on bare metal, and I'm guessing it works for EKS and AKS as well?
  • NVIDIA only, though the flip side is I suspect that covers well over 3/4 of the GPUs deployed in clusters in 2020, since EKS, GKE and AKS all support only NVIDIA GPUs to the best of my knowledge. CUDA was/is a pretty annoying but successful gambit on NVIDIA's part.
  • It does seem like anywhere doing ML will probably end up needing to set up GPUs, and the overlap between "Z2JH users" and "ML users" is probably non-trivial? It's getting harder to think of scientific computing without running into at least a few models you want to run on a GPU, even if you don't train.
  • Side note: I'm curious to read about the Z2JH travis setup, is there somewhere I can look to learn more about that?

@snickell
Contributor

snickell commented Jun 10, 2020

No pressure thought: On the companion chart front, one idea would be to have z2jh-experimental-gpu that pulls z2jh as its helm dependency, and sprinkles in 'best effort no guarantees' GPU bits. In my experience docs rot a lot faster than repos, because repos tend to get issues / PRs faster (for better or worse 🤣)

Having an experimental companion chart would also start the process of building a foundation for GPUs that could someday be folded into z2jh mainstream. E.g. if it's 2023 and it's been a couple of years since z2jh-experimental-gpu had a major change and a ton of people are using it, you have info on how good the setup is and might think "let's put that in the main chart". Whereas with a documented example, you never quite know who's doing what, and how well it's working.

@manics
Member

manics commented Jun 10, 2020

Side note: I'm curious to read about the Z2JH travis setup, is there somewhere I can look to learn more about that?

It's being refactored at the moment, see #1664

On the companion chart front, one idea would be to have z2jh-experimental-gpu that pulls z2jh as its helm dependency, and sprinkles in 'best effort no guarantees' GPU bits

That sounds sensible, it's the model used by the BinderHub Helm chart.

@meeseeksmachine

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/cuda-and-images-problem-on-jupyterhub-k8s/5139/2

@Guanzhou-Ke

@meeseeksmachine Thank you for your attention! @consideRatio My dear friend, I followed your tutorial to build my singlegpuuser image as shown below:

Dockerfile

# For the latest tag, see: https://hub.docker.com/r/jupyter/datascience-notebook/tags/
FROM jupyter/datascience-notebook:5197709e9f23

RUN conda config --set channel_priority flexible

# GPU powered ML
# ----------------------------------------
RUN conda install --yes --quiet --force-reinstall \
    tensorflow-gpu \
    cudatoolkit=10.2 && \
    conda clean -tipsy && \
    fix-permissions $CONDA_DIR && \
    fix-permissions /home/$NB_USER

# Allow drivers installed by the nvidia-driver-installer to be located
ENV LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/nvidia/lib64
# Also, utilities like `nvidia-smi` are installed here
ENV PATH=${PATH}:/usr/local/nvidia/bin

But, unfortunately, I still could not find CUDA information when I start a notebook with my singlegpuuser image. I could see the NVIDIA driver information by using nvidia-smi.

Is it because I did not complete step 5? I saw the NVIDIA information before I built and used this image. I just added the information below into the jhub config.yaml.

singleuser:
  profileList:
    - display_name: "Dedicated, 2 CPU cores & 13GB RAM, 1 NVIDIA Tesla K80 GPU"
      description: "By selecting this choice, you will be assigned a environment that will run on a dedicated machine with a single GPU, just for you."
      kubespawner_override:
        image: consideratio/singleuser-gpu:v0.3.0
        extra_resource_limits:
          nvidia.com/gpu: "1"

So, could you tell me what to do? Thanks!

@jkovalski

Has anyone done something similar to this using AWS EKS?

I currently have GPU nodes running in my cluster with the official Amazon EKS optimized accelerated Amazon Linux AMIs, which, from my understanding, have the NVIDIA drivers and the nvidia-container-runtime (as the default runtime) already installed.

I can successfully run a pod based off of the nvidia/cuda:9.2-devel image, which gives me some output for nvidia-smi.

However, I am having trouble constructing my own Docker image on top of one of the jupyter/docker-stacks images, as in this demonstration. I tried using the same Dockerfile shown above, but I cannot successfully run nvidia-smi, which gives me the following output: Failed to initialize NVML: Unknown Error. I noticed that I do not have a /usr/local/nvidia directory. I do have /usr/bin/nvidia-smi, however.

I don't believe there is an equivalent to the nvidia-driver-installer daemonset for AWS EKS, as described above, so that is one difference here, but as described in the AWS docs, it sounds like the official AMI that I'm using should take care of the drivers already.

It's hard to tell what's missing here. I'd really appreciate any help!

@jeffliu-LL

@jkovalski I was able to get this working on EKS.

There's an nvidia daemonset that you need to run on kubernetes in order for the kubernetes containers to use the nvidia GPUs. I think the AWS AMI includes the drivers, but not the kubernetes daemonset. I followed this tutorial: https://aws.amazon.com/blogs/compute/running-gpu-accelerated-kubernetes-workloads-on-p3-and-p2-ec2-instances-with-amazon-eks/

Then for the user notebook image, I used this: https://hub.docker.com/r/cschranz/gpu-jupyter/ which uses the NVIDIA CUDA image as the base, and installs the jupyter/docker-stacks dependencies on top.
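
For reference, the daemonset step in that tutorial boils down to applying NVIDIA's device plugin manifest, roughly like this (a sketch; the version tag is only an example - use whatever the tutorial or the k8s-device-plugin README currently recommends):

# Deploy the NVIDIA device plugin daemonset so Kubernetes can schedule nvidia.com/gpu resources
kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.9.0/nvidia-device-plugin.yml

# Confirm the GPU nodes now advertise nvidia.com/gpu
kubectl describe nodes | grep -i nvidia.com/gpu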

@jkovalski

jkovalski commented Aug 24, 2020

@jeffliu-LL Hm, so I have the NVIDIA device plugin daemonset already running in my cluster, and the Jupyter singleuser notebook pods are successfully getting scheduled onto my GPU node (p3.2xlarge), so that part is working. I'm also using the official AWS AMI: amazon-eks-gpu-node-1.16-*.

I also tried using that image that you linked, but when I try to run nvidia-smi, I get: Failed to initialize NVML: Unknown Error. Specifically, I am using cschranz/gpu-jupyter:v1.1_cuda-10.1_ubuntu-18.04_python-only.

My issue seems to be related to the Docker image / the pod having access to the underlying GPU. Did you have to add anything to the Docker image to make it work?

@jkovalski

@jeffliu-LL Following up - the image seems okay. I spun up a pod using that base image I referenced above, and I was able to successfully run the nvidia-smi command, which I saw from the pod's logs. It only seems to not work when I'm running it in the context of JupyterHub. Is there anything that you had to add in the JupyterHub configuration, maybe for singleuser specifically?

@jeffliu-LL

@jkovalski My singleuser config entry looked like:

singleuser:
  image:
    # default image
    name: {account_name}/deep-learning-img
    tag: {tag_id}
  profileList:
    - display_name: "Default GPU environment"
      description: "Environment with GPU dependencies installed"
      default: true
      kubespawner_override:
        extra_resource_limits:
          nvidia.com/gpu: "1"

The line that might be relevant is the nvidia.com/gpu: "1" entry.
I had some other options under singleuser for mounting drives and configuring memory limits per user, but I doubt those are relevant.

@scottyhq
Contributor

scottyhq commented Aug 25, 2020

@jkovalski and @jeffliu-LL, I think there are probably various ways to get it done, and the exact settings likely change with EKS and CUDA versions etc. We have this working on GKE and EKS currently and described the setup in a blog post: https://medium.com/pangeo/deep-learning-with-gpus-on-pangeo-9466e25bfd74. It links to our images and config settings which are all open source. One other kubespawner setting we needed was 'environment': {'NVIDIA_DRIVER_CAPABILITIES': 'compute,utility'}.

@jkovalski

@jeffliu-LL @scottyhq Thanks guys. Unfortunately, I still cannot get this working. I suspect it might have to do with the versions of NVIDIA/CUDA, etc. I am using the official Amazon Linux 2 AMI (amazon-eks-gpu-node-1.16-*), but I'm having trouble finding anything about the NVIDIA drivers that are installed on it. I'm guessing I need to make sure whatever is on that AMI is compatible with what's on the Docker images I'm trying to use.

@snickell
Contributor

snickell commented Dec 16, 2020

I'm having trouble getting user-placeholder pods to tolerate the nvidia/gpu taint that GKE applies to cluster pool nodes with a GPU. How do I affect the tolerations section of user-placeholder pods?

UPDATE:

  1. Tolerations are applied to the user-placeholder pods as part of their stateful set here: https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/master/jupyterhub/templates/scheduling/user-placeholder/statefulset.yaml#L37
  2. Which in turn references the helm template value jupyterhub.userTolerations defined here: https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/master/jupyterhub/templates/scheduling/_scheduling-helpers.tpl#L5:L17
  3. Which in turn inserts the value of singleuser.extraTolerations which one can set in the z2jh helm config yaml, and many people do when enabling GPUs.

People having trouble with user-placeholder, like me, probably have at least two pools (with GPU, without GPU) and don't enable the GPU tolerations in singleuser.extraTolerations; instead they enable it on a per-profile basis using kubespawner_override, and the extra_resource_limits automatically adds the required toleration:

    profileList:
      - display_name: "Improc analyst profile"
        description: "Setup for improc analysts, now with magic GPU 🧙🏿‍♂️-dust"
        default: true
        kubespawner_override:
          extra_resource_limits:
            nvidia.com/gpu: "1"

But unless I'm mistaken, this DOES NOT set the toleration on the user-placeholder pods; it would have to be added to singleuser.extraTolerations, but then EVERY pod would tolerate the taint, and non-GPU profiles could potentially start getting allocated to GPU nodes.

The problem is that, as far as I can tell, there's no way to accept the GPU taint on user-placeholder without accidentally accepting it on ALL user pods. @consideRatio does that seem correct to you?

@consideRatio
Member Author

consideRatio commented Dec 16, 2020

Comment update

The problem is that, as far as I can tell, there's no way to accept the GPU taint on user-placeholder without accidentally accepting it on ALL user pods. @consideRatio does that seem correct to you?

Correct, you are required to have two separate statefulsets with user placeholders for this. Below I provide some code for you to add that to a helm chart that has the JupyterHub helm chart as a dependency, without needing to copy-paste much code.


@snickell, a toleration for that taint is typically provided automatically as part of requesting a GPU alongside CPU/memory. I'm copy-pasting a solution.

Assuming you have a local chart that in turn depends on the JupyterHub Helm chart, you can add the following parts to it.

values.yaml

userPlaceholderGPU:
  enabled: true
  replicas: 0

_helpers.tpl

{{/*
NOTE: This utility template is needed until https://git.io/JvuGN is resolved.

Call a template from the context of a subchart.

Usage:
  {{ include "call-nested" (list . "<subchart_name>" "<subchart_template_name>") }}
*/}}
{{- define "call-nested" }}
{{- $dot := index . 0 }}
{{- $subchart := index . 1 | splitList "." }}
{{- $template := index . 2 }}
{{- $values := $dot.Values }}
{{- range $subchart }}
{{- $values = index $values . }}
{{- end }}
{{- include $template (dict "Chart" (dict "Name" (last $subchart)) "Values" $values "Release" $dot.Release "Capabilities" $dot.Capabilities) }}
{{- end }}

Dedicated user-placeholder template for GPUs

{{- if .Values.userPlaceholderGPU.enabled }}
# Purpose:
# --------
# To ensure there are always X number of slots available for users that quickly
# need a GPU pod.
#
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: user-placeholder-gpu-p100
spec:
  podManagementPolicy: Parallel
  replicas: {{ .Values.userPlaceholderGPU.replicas }}
  selector:
    matchLabels:
      component: user-placeholder-gpu-p100
  serviceName: user-placeholder-gpu-p100
  template:
    metadata:
      labels:
        component: user-placeholder-gpu-p100
    spec:
      nodeSelector:
        gpu: p100
      {{- if .Values.jupyterhub.scheduling.podPriority.enabled }}
      priorityClassName: {{ .Release.Name }}-user-placeholder-priority
      {{- end }}
      {{- if .Values.jupyterhub.scheduling.userScheduler.enabled }}
      schedulerName: {{ .Release.Name }}-user-scheduler
      {{- end }}
      tolerations:
        {{- include "call-nested" (list . "jupyterhub" "jupyterhub.userTolerations") | nindent 8 }}
      {{- if include "call-nested" (list . "jupyterhub" "jupyterhub.userAffinity") }}
      affinity:
        {{- include "call-nested" (list . "jupyterhub" "jupyterhub.userAffinity") | nindent 8 }}
      {{- end }}
      terminationGracePeriodSeconds: 0
      automountServiceAccountToken: false
      containers:
      - image: gcr.io/google_containers/pause:3.1
        name: pause
        resources:
          limits:
            nvidia.com/gpu: 1
          requests:
            nvidia.com/gpu: 1
  updateStrategy:
    rollingUpdate:
      partition: 0
    type: RollingUpdate
{{- end }}

@snickell
Contributor

@consideRatio oh cool, processing now, sorry I replied to myself in an update to my above comment, confusing, my bad

@snickell
Contributor

Wow, thank you for collecting that information so clearly @consideRatio , I appreciate it immensely, your workaround makes very good sense to me, and directly addresses my issue, I'll be trying it in a few minutes 🙏🏽🙏🏽🙏🏽

@mohammedi-haroune

We're having the same issue with LD_LIBRARY_PATH as @astrajohn, and for the exact reasons @consideRatio stated: LD_LIBRARY_PATH is ignored when running start.sh as the root user because of sudo, as noted in https://www.sudo.ws/man/1.7.4p6/sudo.man.html

Note that the dynamic linker on most operating systems will remove variables that can control dynamic linking from the environment of setuid executables, including sudo. Depending on the operating system this may include RLD, DYLD, LD_, LDR_, LIBPATH, SHLIB_PATH, and others. These type of variables are removed from the environment before sudo even begins execution and, as such, it is not possible for sudo to preserve them.

We're using KubeSpawner to run notebook servers on GKE, our images are based on docker-stacks images, and the GPU daemonset is created in our cluster as stated in the issue description. But I noticed the daemonset installs CUDA in /home/kubernetes/bin/nvidia (see the GKE docs) and mounts it into the containers that need a GPU at /usr/local/nvidia, which makes the containers rely on the LD_LIBRARY_PATH environment variable to find CUDA. That's why notebook servers can't detect the GPU if they are running start.sh as the root user.

Even in @consideRatio's pull request on docker-stacks, sudo is still used here, so I don't think that'll fix the issue.

Also, I think @jkovalski is having the same issue.

How are you guys dealing with that?

@consideRatio
Member Author

consideRatio commented May 23, 2021

@mohammedi-haroune I've used the closed PR's changes, replacing the start scripts for my image. It would be great to land similar changes in jupyter/docker-stacks.

@mohammedi-haroune

@mohammedi-haroune I've used the closed PRs changes, replacing the start scripts for my image. It would be great to land similar changes to jupyter/docker-stacks.

You're using root user to run start.sh, right?

So, this is the line responsible for keeping LD_LIBRARY_PATH ?
https://github.com/jupyter/docker-stacks/pull/1052/files#diff-41f90d7afcdae13f8516195e078d0a203972b5c5105851eaecfc0a98e9739107R172

echo 'Defaults env_delete -= "PATH LD_* PYTHON*"' >> /etc/sudoers.d/added-by-start-script

@consideRatio
Member Author

consideRatio commented May 23, 2021

When I use a root user and switch to another user, for example to first enable sudo for that user, retaining the LD_ vars or the PATH var is a challenge.

It is in the transition that environment variables can be stripped, and I think that change is what ensures those aren't stripped.

@mohammedi-haroune

Adding this to my Dockerfile solved the issue; notebook servers running on GKE are now able to detect the GPU. Thank you @consideRatio.

RUN echo 'Defaults env_delete -= "LD_*"' >> /etc/sudoers.d/added-by-dockerfile

@snickell
Contributor

snickell commented Jul 7, 2021

@mohammedi-haroune thanks for posting this, this has been giving us massive grief too

@MattPChoy

@consideRatio Hey mate, thanks for all of the amazing work on this. Is your Docker image meant to have the libcuda.so CUDA libraries installed? I get this error when I try to import tensorflow, which comes down to the fact that there is no libcuda.so symlink.

Additionally, it doesn't look like there are CUDA drivers installed in the same directory.

$ ls | grep cuda
libicudata.a
libicudata.so
libicudata.so.60
libicudata.so.60.2
$ python
Python 3.6.3 |Anaconda, Inc.| (default, Nov  9 2017, 00:19:18)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/opt/conda/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/opt/conda/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/__init__.py", line 22, in <module>
    from tensorflow.python import pywrap_tensorflow  # pylint: disable=unused-import
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 74, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/opt/conda/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/opt/conda/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory


Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/install_sources#common_installation_problems

for some common reasons and solutions.  Include the entire stack trace
above this error message when asking for help.

@vizeit

vizeit commented Oct 1, 2024

If anyone wants to use GPU-enabled JupyterHub on GKE Autopilot, I have described the details at this link.
