This repository has been archived by the owner on Jun 25, 2022. It is now read-only.

gpu Docker image #10

Closed
MarkEdmondson1234 opened this issue Feb 12, 2019 · 23 comments

Comments

@MarkEdmondson1234
Contributor

The image builds and I can log in, but when I try to run the keras toy example I get:

> library(keras)
> 
> mnist <- dataset_mnist()
ImportError: No module named keras
Use the install_keras() function to install the core Keras library
Error: Error loading Python module keras
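
For reference, the install call the message points to (and which I had originally run) is roughly:

library(keras)
# "gpu" requests the tensorflow-gpu pip build rather than the CPU-only one
install_keras(tensorflow = "gpu")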
@MarkEdmondson1234
Contributor Author

Other info:

I launched via:

nvidia-docker run -d -p 80:8787 -e USER=gpu -e PASSWORD=gpu --name gpu2 rocker/gpu

(I have port 80 open for testing)

Logged in OK to RStudio with gpu/gpu.

I go to the terminal and can see the GPU is live:

gpu@5ec17f23ba93:~$ nvidia-smi
Tue Feb 12 23:36:21 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.44                 Driver Version: 396.44                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   42C    P0    23W /  75W |      0MiB /  7611MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
gpu@5ec17f23ba93:~$

I tried install_keras(tensorflow = "gpu") again; it installed OK, but when trying to run the example I got:

> library(keras)
> 
> mnist <- dataset_mnist()
Using TensorFlow backend.
Error: ImportError: Traceback (most recent call last):
  File "/home/gpu/.virtualenvs/r-tensorflow/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/home/gpu/.virtualenvs/r-tensorflow/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/home/gpu/.virtualenvs/r-tensorflow/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory


Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/install_sources#common_installation_problems

for some common reasons and solutions.  Include the entire stack trace
above this error 
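
A quick way to confirm which CUDA libraries the container actually provides (a sketch from the container terminal; paths assume the usual /usr/local/cuda layout):

ldconfig -p | grep libcublas              # cublas versions the loader can see
ls /usr/local/cuda*/lib64/libcublas.so*   # which CUDA toolkits are installed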

@MarkEdmondson1234
Contributor Author

From what I remember of the R tensorflow install docs, I think you need CUDA version 9.0, not 9.2?

## CUDA Version
ENV CUDA_MAJOR_VERSION=9.2
ENV CUDA_MAJOR_VERSION_HYP=9.2
ENV CUDA_MINOR_VERSION=9.2.148-1
ENV NVIDIA_REQUIRE_CUDA="cuda>=9.2"

@MarkEdmondson1234
Contributor Author

Yep from here: https://tensorflow.rstudio.com/tools/local_gpu.html

Note that it’s important to download cuDNN v7.0 for CUDA 9.0 (rather than CUDA 9.1 or 9.2, which may be the choice initially presented) as v7.0 is what TensorFlow is built against.
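
Once the versions line up, a quick sanity check from R (assuming the tensorflow R package with a TF 1.x runtime) would be something like:

library(tensorflow)
# TRUE only if the native runtime loads and a GPU device is visible
tf$test$is_gpu_available()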

@cboettig
Member

@MarkEdmondson1234 thanks, yeah, I was just puzzling over this too, though I believe @noamross has TF working with 9.2. I believe this can be addressed either by getting the right symlinks in place or possibly by getting pip to install the right tensorflow libs (i.e. those compiled against 9.2)? Or maybe I'm wrong.

@seabbs may have looked at this as well.
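
(By "the right tensorflow libs" I mean something like pinning the pip package to a release built against the CUDA version in the image, e.g.

pip install tensorflow-gpu==1.12.0   # current release, built against CUDA 9.0

though I haven't verified that a 9.2 build exists on PyPI.)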

@cboettig
Member

It does look like, circa Sept 2018 at least, pip-based tensorflow-gpu was only built against 9.0. Not sure if that's still the case, but it seems so. Conda gives some suggestion that its version supports 9.2 (https://docs.anaconda.com/anaconda/user-guide/tasks/gpu-packages/), and there are some recommendations for building tensorflow from source: https://www.pytorials.com/how-to-install-tensorflow-gpu-with-cuda-9-2-for-python-on-ubuntu/

@MarkEdmondson1234
Contributor Author

I tried changing the CUDA environment args to 9.0; the image built, but it failed at runtime, so I guess there's more to do:

## CUDA Version
ENV CUDA_MAJOR_VERSION=9.0
ENV CUDA_MAJOR_VERSION_HYP=9.0
ENV CUDA_MINOR_VERSION=9.0.176-1
ENV NVIDIA_REQUIRE_CUDA="cuda==9.0"
nvidia-docker run -d -p 80:8787 -e USER=gpu -e PASSWORD=gpu --name gpu4 gcr.io/gcer-public/gpu-r:0e74959 
66c303d29bfe0d69f8cc3b259ed8bbb221c99fe4897f195c76998a1a68d4bd34
docker: Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux
.go:402: container init caused \"process_linux.go:385: running prestart hook 1 caused \\\"error running hook: exit status 1, stdout: , 
stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig --device=all --compute --utility
 --require=cuda==9.0 --pid=29606 /var/lib/docker/overlay2/56cfb8c992d62cdd223cc74ebd56b82e30e5043248e90660d7411058ca9c1d01/merged]\\\\n
nvidia-container-cli: requirement error: invalid expression\\\\n\\\"\"": unknown.
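
Possibly the == operator is what nvidia-container-cli rejects as an "invalid expression"; the requirement grammar in the original Dockerfile used >=, so presumably the last ENV line should read:

ENV NVIDIA_REQUIRE_CUDA="cuda>=9.0"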

@cboettig
Member

@MarkEdmondson1234 so it does look like building the different cuda versions will be necessary, or at least convenient, here.

I've taken the recipes for the official nvidia/cuda stack and overlaid them on the rocker images here: https://github.com/rocker-org/ml/tree/master/cuda

I've then put the machine learning side of things on top of this as a separate file here: https://github.com/rocker-org/ml/tree/master/ml (currently only for 9.0, but I noticed the tensorflow tf-nightly-gpu wheels are now built against cuda 10.0, so I hope to support that soon too).

So rocker/ml (when it builds later tonight) is now my working candidate for a gpu-enabled image. Clearly things are still a bit in flux here, but I think we're moving in a good direction at least; more feedback and testing are always welcome.

At least in my test, I'm able to build the current rocker/cuda:9.0-based rocker/ml image and run the mnist example on my Nvidia GPU machine.

@MarkEdmondson1234
Contributor Author

This will be awesome, thanks. My motivation is to be able to work through the Deep Learning with R book using a GCP deeplearning GPU VM. I hope to add a template to googleComputeEngineR so folks can get started via:

library(googleComputeEngineR) # assume auto-auth, project settings etc

vm <- gce_vm(template = "gpu-ml-rstudio",
             name = "deeplearning-ml",
             username = "gpu", password = "gpu")

@MarkEdmondson1234
Contributor Author

MarkEdmondson1234 commented Feb 13, 2019

I tried it again this morning with the new image, and I think it worked :D

nvidia-docker run -d -p 80:8787 -e USER=gpu -e PASSWORD=gpu --name ml rocker/ml
> library(keras)
> mnist <- dataset_mnist()
Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz
11493376/11490434 [==============================] - 1s 0us/step
> train_images <- mnist$train$x
> train_labels <- mnist$train$y
> test_images <- mnist$test$x
> test_labels <- mnist$test$y
> 
> network <- keras_model_sequential() %>% 
+     layer_dense(units = 512, activation = "relu", input_shape = c(28*28)) %>% 
+     layer_dense(units = 10, activation = "softmax")
> 
> network %>% compile(
+     optimizer = "rmsprop",
+     loss = "categorical_crossentropy",
+     metrics = c("accuracy")
+ )
> 
> train_images <- array_reshape(train_images, c(60000, 28*28))
> train_images <- train_images / 255
> 
> test_images <- array_reshape(test_images, c(10000, 28*28))
> test_images <- test_images / 255
> 
> train_labels <- to_categorical(train_labels)
> test_labels <- to_categorical(test_labels)
> 
> network %>% fit(train_images, train_labels, epochs = 5, batch_size = 128)
2019-02-13 13:50:36.047596: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2019-02-13 13:50:36.905061: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-02-13 13:50:36.905392: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
name: Tesla P4 major: 6 minor: 1 memoryClockRate(GHz): 1.1135
pciBusID: 0000:00:04.0
totalMemory: 7.43GiB freeMemory: 7.31GiB
2019-02-13 13:50:36.905424: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-02-13 13:50:37.288155: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-02-13 13:50:37.288227: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2019-02-13 13:50:37.288238: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2019-02-13 13:50:37.288482: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7055 MB memory) -> physical GPU (device: 0, name: Tesla P4, pci bus id: 0000:00:04.0, compute capability: 6.1)
Epoch 1/5
60000/60000 [==============================] - 5s 78us/step - loss: 0.2563 - acc: 0.9259
Epoch 2/5
60000/60000 [==============================] - 2s 32us/step - loss: 0.1041 - acc: 0.9692
Epoch 3/5
60000/60000 [==============================] - 2s 31us/step - loss: 0.0679 - acc: 0.9795
Epoch 4/5
60000/60000 [==============================] - 2s 31us/step - loss: 0.0504 - acc: 0.9850
Epoch 5/5
60000/60000 [==============================] - 2s 31us/step - loss: 0.0381 - acc: 0.9889
> 

@MarkEdmondson1234
Contributor Author

MarkEdmondson1234 commented Feb 13, 2019

Just a heads up: I'm still monkeying around with the rocker/ml image a bit, partly so I can get a sensible tag scheme that supports both different versions of cuda and different versions of R. There will still be a rocker/ml:latest that does something reasonable though (e.g. probably cuda 9.0 and latest R for now). Thoughts on a sane way to do this are welcome.

Lots of moving parts on this one, as I see it:

  • R version
  • Rstudio installed or not
  • CUDA version
  • Python version (2.7 or 3.5?)
  • Tensorflow version
  • Which packages (keras, h2o, xgboost, etc.)

Whilst it's nice to have xgboost/h2o in there for the future, they are very heavy (40 mins+ to build), and I appreciated having only Tensorflow/Keras in one image.

Also, the Tensorflow build emitted this message about CPU features:

2019-02-13 13:50:36.047596: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA

Perhaps it's safe to ignore, but it did tempt me to rebuild with support for the CPU features as well; at the very least it's going to trigger questions.

But for me, the most flexible setup with sensible defaults would be:

  • CUDA images with versions that are supported by Tensorflow, for those who want to build themselves
  • If possible, python 3 only
  • Tensorflow as modern as possible given the CUDA dependency (Tensorflow 2.0 is coming soon)
  • CPU and GPU Tensorflow
  • An image with everything for R deep learning loaded, plus slimmer versions with just, say, Tensorflow/keras or xgboost. I think for dev it's nice to have all the deep learning packages in one image, but for production one package per image makes more sense.
  • A plumber image as a common next step is to serve models over an API
  • RStudio installed for development, but only base-R for production

In all that, I don't think R versions will be the most important factor unless it's really bleeding edge, as in my experience most R packages still work when updating R versions, whereas TF/Python breaks quickly, so I would be inspecting those versions more closely.

As a suggestion then:

  • rocker/ml-cuda:{CUDA_VERSION}-{R_VERSION} - R and R+RStudio versions
  • rocker/ml-tf:{gpu|cpu}-{TF_VERSION} (built on the highest supported CUDA, also installs keras)
  • rocker/ml:{gpu|cpu}-{all|xgboost|keras|h2o} (RStudio-installed and R-only options)

Then we add plumber ourselves, not too much hassle.
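
Under that scheme, pulls would look something like (hypothetical tags, purely to illustrate):

docker pull rocker/ml-cuda:9.0-3.5.2
docker pull rocker/ml-tf:gpu-1.12
docker pull rocker/ml:gpu-keras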

@cboettig
Member

@MarkEdmondson1234 thanks much for this, it is a huge help to bounce ideas off you here.

I like the three levels you outline here. For the base image names, I'm tempted to simplify them to: rocker/cuda, rocker/tf, and rocker/ml, even if on the GitHub side all the Dockerfiles live in rocker-org/ml.

For the python version, I believe I have everything on python3, and python2.7 is not even installed on the system.

Re the tensorflow CPU message, I see that too; I believe that's up to the team that builds the tensorflow pip package / wheel thingy. (I looked at installing TF from source when we were trying to build on CUDA 9.2, and it looks hairy, particularly because the build system is super-interactive and so hard to automate.) Let's just assume they know what they are doing.

For the tags, I'm still struggling. I like the notion of gpu/cpu tags, I agree that access to CPU machine-learning stack that can bypass the NVIDIA stuff and be much lighter is a big win.

With regards to version-type tags, the whole rest of the R versioned stack pins versions by R's release dates -- e.g. you get the version of pandoc, RStudio, and all R packages that were current when said R version was last current. We've tried to promote the notion that a user can do something like rocker/verse:3.5.1 or rocker/binder:3.5.1 and know that stuff that worked on that image once will have a very high probability of still working a year from now. It's not obvious how this promise translates to something like rocker/tf:1.12.0-gpu. I suppose that always means R 3.5.2? Also, is locking the tensorflow version like this an implicit promise of locking the cuda version (i.e. does tensorflow 1.12.0 also mean CUDA 9.0)?

For the base rocker/ml image, I think it would still be easier to just have it include h2o, keras, and xgboost out of the box. In terms of space, all of these images would be much smaller if I get xgboost multi-GPU support to install from pip instead of from source; then I can drop some 2 GB of NVIDIA devel libs. I think adding too many options there makes it harder for new users to know where to start and is more for us to maintain, and an experienced user will always be able to build a smaller image custom-fit to their own needs than we will ever be able to provide. The version issue raises its head here too -- how do I specify which version of tensorflow this is getting? Or to put it another way -- a year from now, what tag will I use to rebuild the rocker/ml image with the environment that rocker/ml is currently creating today?

Thanks again, really like your ideas on this and appreciate your feedback!

@noamross

noamross commented Feb 14, 2019

I think we should stick with tags being R-version numbers, though of course only a limited set of more recent R versions. Then we have a single cuda/tensorflow stack that is the latest that can be built against hardware compatible with each major cuda version, e.g.,

  • rocker/cuda9
  • rocker/cuda10
  • rocker/tf (Tensorflow, cpu only)
  • rocker/tf-gpu9 (The last/latest Tensorflow and R/Keras versions built against cuda 9.0)
  • rocker/tf-gpu10 (The last/latest Tensorflow and R/Keras versions built against cuda 10.0)
  • rocker/ml (TF, xgboost, h2o)
  • rocker/ml-gpu9
  • rocker/ml-gpu10

Practically, the keras R package version available at a given R release date will determine which R versions will be available in this stack. So, rocker/tf-gpu9:3.5.1 will have the version of keras last available for 3.5.1, and the last version of tensorflow that the keras package supported. There will not be a rocker/tf-gpu10:3.5.1, because as of the end of 3.5.1 there wasn't a cuda-10 version of Tensorflow that was supported by the R keras package, etc.

We can of course do some documentation on how to customize your stack.

@noamross

all of these images would be much smaller if I get xgboost multi-GPU support to install from pip instead of from source

Could we try some multi-stage build magic here? Though having those libs might be good for users who want to install other software from source, like mxnet.
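
Something like this minimal sketch (image names and build details are illustrative only, not the actual rocker Dockerfiles, and assume git/cmake are present on the devel image):

# build stage: compile xgboost with GPU support against the CUDA devel libs
FROM rocker/cuda-dev AS builder
RUN git clone --recursive https://github.com/dmlc/xgboost /opt/xgboost \
 && cd /opt/xgboost && mkdir build && cd build \
 && cmake .. -DUSE_CUDA=ON && make -j4

# runtime stage: copy only the compiled library onto the slimmer base
FROM rocker/cuda
COPY --from=builder /opt/xgboost/lib/libxgboost.so /usr/local/lib/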

@cboettig
Member

Thanks @noamross! After waffling and thinking it over, I agree about having the gpu/cpu part in the image name instead of the image tag. rocker/ml-gpu is more consistent with our existing use of version tags in this stack. In general, overloading all this onto tags seems more common, but at least on the package side, python distinguishes gpu versions with the hyphen-gpu suffix in tensorflow-gpu, so that's consistent.

I am also tempted to just stick with one CUDA version per R version. Again looking to the precedent on the Python side, tensorflow-gpu versions 1.5 - 1.12 (current) are all on CUDA 9; CUDA 10 is only available in the nightlies (i.e. we could put it on our devel tag). tensorflow-gpu v1.0 to 1.4 were CUDA 8, with about a year between the CUDA bumps. This also avoids having to create entirely new docker repos on hub to accommodate CUDA releases; instead we can just update the tags.

Regarding the multistage builds, yes, I think that's possible, but I also think I can already get away with a binary python wheel for xgboost and drop the cuda devel dependencies. I've now separated out those devel libs into a separate Dockerfile and build tensorflow directly on cuda-base.

So, the current directory structure looks like:

├── cuda
│   ├── base
│   │   └── Dockerfile
│   └── devel
│       └── Dockerfile
├── ml
│   ├── cpu
│   │   └── Dockerfile
│   └── gpu
│       └── Dockerfile
├── tf
│   ├── cpu
│   │   └── Dockerfile
│   └── gpu
│       └── Dockerfile
├── LICENSE
├── Makefile
└── README.md

Also, how do folks feel about going with rocker/tensorflow instead of rocker/tf ?

So, my new proposed image stack would be

  • rocker/cuda,
  • rocker/cuda-dev, (though no longer used as a base)
  • rocker/tensorflow,
  • rocker/tensorflow-gpu,
  • rocker/ml,
  • rocker/ml-gpu

with tags devel, latest == 3.5.2 on all images.

latest/3.5.2 would be CUDA 9, devel would use CUDA 10 (if I can even get CUDA 10 to build on debian stretch instead of ubuntu 18.04...)
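
So a user would eventually pull, e.g.:

docker pull rocker/tensorflow-gpu:3.5.2   # CUDA 9, R 3.5.2
docker pull rocker/ml-gpu:devel           # CUDA 10 (if it builds), R devel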

@noamross

I can already get away with a binary python wheel for xgboost and drop the cuda devel dependencies.

Does the xgboost R package work with the GPU this way? I'll test but I don't think so.

@noamross

rocker/tensorflow works for me

@cboettig
Member

Just following up that the images described above should all be built now.

Note that on the rocker/ml-gpu image, xgboost is now built using multistage builds to pull in the cuda dev libs. Images can be requested using either the tag latest or 3.5.2 (which give the identical image). The images need more widespread testing though! Thanks!

@MarkEdmondson1234
Contributor Author

Looks great, will start putting them through their paces.

@MarkEdmondson1234
Contributor Author

I've got a template together, so hopefully some other folks will start testing the images as well: https://cloudyr.github.io/googleComputeEngineR/articles/gpu.html

@cboettig
Member

@MarkEdmondson1234 nice, thanks! I need to update the READMEs in this repo to give some better documentation on getting started (and better acknowledge the contributions from you, @noamross and others!), so reminding myself to link to that as well. thanks!

@MarkEdmondson1234
Contributor Author

Another note for any documentation: having this image now means you can train R models serverlessly on GPU-accelerated instances via Cloud ML, which is super: https://cloud.google.com/ml-engine/docs/using-containers. The demo at the end of this video shows using R in containers to train and test: https://www.youtube.com/watch?v=XpNVixSN-Mg&feature=youtu.be, and the repo with code is here: https://github.com/gmikels/google-cloud-R-examples

@cboettig
Member

Thanks @MarkEdmondson1234 , that's really cool!

@eitsupi
Member

eitsupi commented Feb 26, 2022

I think this issue has been resolved, so I will close it.

@eitsupi eitsupi closed this as completed Feb 26, 2022