JupyterHub fails to load image properly, but starts a notebook anyway #226

bkungfoo · 2018-02-09T00:27:04Z

I've encountered this several times during deployment, both on minikube and GKE. When starting JupyterHub, a server spawned with a valid image (e.g. gcr.io/kubeflow/tensorflow-notebook-gpu:8fbc341245695e482848ac3c2034a99f7c1e5763) sometimes ends up as a container without any libraries installed.

kubectl logs tf-hub-0 -n $NAMESPACE shows the following error:

[W 2018-02-08 23:51:44.573 JupyterHub configurable:168] Config option singleuser_image_spec not recognized by KubeFormSpawner. Did you mean one of: singleuser_image_pull_policy, singleuser_image_pull_secrets, singleuser_node_selector?


jlewi · 2018-02-10T02:32:57Z

When you say no libraries are installed, do you mean Python libraries?
Does Jupyter start running?

jlewi · 2018-02-12T14:38:43Z

@bkungfoo ping? Any more info?

bkungfoo · 2018-02-12T19:42:31Z

Jupyter starts running, but tf-gpu is not properly installed. Here is what I get when I create a notebook and run "import tensorflow as tf":

ImportError Traceback (most recent call last)
/opt/conda/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py in <module>()
57
---> 58 from tensorflow.python.pywrap_tensorflow_internal import *
59 from tensorflow.python.pywrap_tensorflow_internal import __version__

/opt/conda/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py in <module>()
27 return _mod
---> 28 _pywrap_tensorflow_internal = swig_import_helper()
29 del swig_import_helper

/opt/conda/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py in swig_import_helper()
23 try:
---> 24 _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
25 finally:

/opt/conda/lib/python3.6/imp.py in load_module(name, file, filename, details)
242 else:
--> 243 return load_dynamic(name, filename, file)
244 elif type_ == PKG_DIRECTORY:

/opt/conda/lib/python3.6/imp.py in load_dynamic(name, path, file)
342 name=name, loader=loader, origin=path)
--> 343 return _load(spec)
344

ImportError: libcuda.so.1: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

ImportError Traceback (most recent call last)
in ()
----> 1 import tensorflow as tf

/opt/conda/lib/python3.6/site-packages/tensorflow/__init__.py in <module>()
22
23 # pylint: disable=wildcard-import
---> 24 from tensorflow.python import *
25 # pylint: enable=wildcard-import
26

/opt/conda/lib/python3.6/site-packages/tensorflow/python/__init__.py in <module>()
47 import numpy as np
48
---> 49 from tensorflow.python import pywrap_tensorflow
50
51 # Protocol buffers

/opt/conda/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py in <module>()
71 for some common reasons and solutions. Include the entire stack trace
72 above this error message when asking for help.""" % traceback.format_exc()
---> 73 raise ImportError(msg)
74
75 # pylint: enable=wildcard-import,g-import-not-at-top,unused-import,line-too-long

ImportError: Traceback (most recent call last):
File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
from tensorflow.python.pywrap_tensorflow_internal import *
File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
_pywrap_tensorflow_internal = swig_import_helper()
File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
File "/opt/conda/lib/python3.6/imp.py", line 243, in load_module
return load_dynamic(name, filename, file)
File "/opt/conda/lib/python3.6/imp.py", line 343, in load_dynamic
return _load(spec)
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory

Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/install_sources#common_installation_problems

for some common reasons and solutions. Include the entire stack trace
above this error message when asking for help.

jlewi · 2018-02-12T23:00:57Z

This usually means GPUs aren't properly configured.

I'm assuming you are running on GKE?

1. Did you follow the GKE instructions to install the NVIDIA drivers via daemonset?
2. When you spawned the Jupyter server via JupyterHub, did you specify GPUs in the resource requirements?
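
Both questions can be partly checked from inside the notebook itself. The following is only a rough diagnostic sketch, not an official Kubeflow or TensorFlow tool: it tries to load the same `libcuda.so.1` that the traceback complains about and lists any GPU device nodes it can see. The function name `check_gpu_runtime` is made up for illustration, and the `/dev/nvidia*` pattern is an assumption about where device nodes show up when a GPU is attached to the container.

```python
import ctypes
import glob


def check_gpu_runtime(lib_name="libcuda.so.1", device_glob="/dev/nvidia*"):
    """Report whether the CUDA driver library loads and which GPU device nodes exist.

    `device_glob` is an assumed location for NVIDIA device nodes; adjust it
    if your nodes expose GPUs differently.
    """
    try:
        # Same dynamic load that fails inside `import tensorflow`.
        ctypes.CDLL(lib_name)
        driver_loaded = True
        error = None
    except OSError as exc:
        driver_loaded = False
        error = str(exc)

    return {
        "driver_loaded": driver_loaded,
        "error": error,
        "devices": sorted(glob.glob(device_glob)),
    }
```

Running `check_gpu_runtime()` in a notebook cell and getting `driver_loaded` as `False` would be consistent with the drivers not being installed on the node; an empty `devices` list may suggest that no GPU was requested for the pod.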

aronchick · 2018-02-13T00:41:14Z

It's an interesting question: is there something we can do in K6w to enforce running something when you run on cloud <X>? E.g. if we detect you're running on GCP, you need to install/run x,y,z; if we detect you're running on Azure, you need to install/run t,u,v; if we can't detect, we run nothing.
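
To make the idea concrete, here is a minimal sketch of the dispatch part only. Everything in it is hypothetical: the `SETUP_STEPS` table and `steps_for` function are invented names, the step names are the placeholders from the comment above, and how the provider would actually be detected is left open.

```python
# Hypothetical provider -> required setup steps table.
# The step names are placeholders, not real commands.
SETUP_STEPS = {
    "gcp": ["x", "y", "z"],
    "azure": ["t", "u", "v"],
}


def steps_for(provider):
    """Return the setup steps for a detected provider.

    An unknown or undetected provider (including None) yields an empty
    list, i.e. "if we can't detect, we run nothing".
    """
    return list(SETUP_STEPS.get(provider, []))
```

For GCP, one of those steps would presumably be the NVIDIA driver daemonset discussed in this thread.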


bkungfoo · 2018-02-13T21:08:45Z

This problem is likely due to not following the instructions linked below for deploying the NVIDIA driver daemonset on the GKE cluster. Closing the issue.
https://cloud.google.com/kubernetes-engine/docs/concepts/gpus


bkungfoo closed this as completed Feb 13, 2018

yanniszark pushed a commit to arrikto/kubeflow that referenced this issue Feb 15, 2021

fix slice range (kubeflow#226)

7eeea12

elenzio9 pushed a commit to arrikto/kubeflow that referenced this issue Oct 31, 2022

Add chesu@ to drive-content-managers.members.txt (kubeflow#226)

2e140cb
