gpu Docker image #10
Other info: I launched via:

(I have port 80 open for testing.) Logging in to RStudio with gpu/gpu works fine. I go to the terminal and can see the GPU is live:
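For context, a generic way to confirm GPU visibility from a container terminal (a sketch of the usual check, not necessarily the exact command used here) is:

```shell
# Confirm the NVIDIA driver and GPU are visible from inside the container.
# Assumes the container was started with GPU access (e.g. via nvidia-docker);
# prints a fallback message when no GPU stack is present.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi
else
  echo "nvidia-smi not found: no GPU/driver access in this environment"
fi
```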
I tried `install_keras(tensorflow = "gpu")` again; it installed OK, but when trying to run the example I got:

```
> library(keras)
> mnist <- dataset_mnist()
Using TensorFlow backend.
Error: ImportError: Traceback (most recent call last):
  File "/home/gpu/.virtualenvs/r-tensorflow/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/home/gpu/.virtualenvs/r-tensorflow/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/home/gpu/.virtualenvs/r-tensorflow/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory

Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/install_sources#common_installation_problems
for some common reasons and solutions. Include the entire stack trace
above this error
``` |
From what I remember of the R TensorFlow install docs, I think you need CUDA version 9.0, not 9.2?
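As a side note, the missing soname in the traceback already tells you which CUDA version the pip wheel was linked against; pulling it out of the error string (copied from above) makes that explicit:

```shell
# The failing import names the exact CUDA 9.0 library the wheel expects,
# so the required CUDA version can be read straight from the message.
err='ImportError: libcublas.so.9.0: cannot open shared object file'
echo "$err" | grep -o 'libcublas\.so\.[0-9]*\.[0-9]*'
# -> libcublas.so.9.0
```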
|
Yep from here: https://tensorflow.rstudio.com/tools/local_gpu.html
|
@MarkEdmondson1234 thanks, yeah, I was puzzling over this too, though I believe @noamross has TF working with 9.2. I believe this can be addressed either by getting the right symlinks in place or possibly by getting pip to install the right tensorflow libs (i.e. those compiled against 9.2)? Or maybe I'm wrong. @seabbs may have looked at this as well. |
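A sketch of the symlink idea mentioned above, assuming CUDA 9.2 is installed at the usual location (paths are assumptions; ABI compatibility across CUDA minor versions is not guaranteed, so a wheel built for your CUDA version is generally the safer route):

```shell
# Give the CUDA 9.2 cuBLAS library an additional 9.0 soname so the loader
# can resolve libcublas.so.9.0. Override CUDA_LIB to match your install;
# ABI compatibility is NOT guaranteed.
CUDA_LIB="${CUDA_LIB:-/usr/local/cuda-9.2/lib64}"
ln -sf "$CUDA_LIB/libcublas.so.9.2" "$CUDA_LIB/libcublas.so.9.0"
export LD_LIBRARY_PATH="$CUDA_LIB:${LD_LIBRARY_PATH:-}"
```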
It does look like, circa Sept 2018 at least, pip-based tensorflow-gpu was only built against 9.0; not sure if that's still the case, but it seems so. Conda gives some suggestion that its version supports 9.2: https://docs.anaconda.com/anaconda/user-guide/tasks/gpu-packages/ And there are some recommendations for building tensorflow from source: https://www.pytorials.com/how-to-install-tensorflow-gpu-with-cuda-9-2-for-python-on-ubuntu/ |
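To illustrate the distinction (commands are illustrative of the 2018-era packaging, not verified against current channels): the pip wheel assumes a matching system CUDA, while conda brings its own CUDA runtime:

```shell
# pip wheels of tensorflow-gpu from late 2018 were linked against CUDA 9.0
# and expect the matching libraries already on the system (version pin is an
# illustrative example):
pip install 'tensorflow-gpu==1.12.0'

# conda instead bundles a cudatoolkit package, so other CUDA versions can be
# selected at install time (package/version names here are assumptions):
conda install tensorflow-gpu 'cudatoolkit=9.2'
```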
I tried changing the CUDA environment args to 9.0, which built but did not execute at runtime, so I guess there is more to do:
|
@MarkEdmondson1234 so it does look like building the different CUDA versions will be necessary, or at least convenient, here. I've taken the recipes for the official nvidia/cuda stack and overlaid them on the rocker images here: https://github.com/rocker-org/ml/tree/master/cuda I've then put the machine-learning side of things on top of this as a separate file here: https://github.com/rocker-org/ml/tree/master/ml (currently only for 9.0, but I noticed the tensorflow tf-nightly-gpu builds are now built against CUDA 10.0, so I hope to support that soon too). So at least in my test, I'm able to build the current, |
This will be awesome, thanks. My motivation is to be able to work through the *Deep Learning with R* book using a GCP deeplearning GPU VM - hope to add a template to
|
I tried it again this morning with the new image, and I think it worked :D
|
Lots of moving parts on this one, as I see it:

Whilst it's nice to have xgboost/h2o in there for the future, they are very heavy (40 mins+ to build), and I appreciated having only TensorFlow/Keras in one image. Also, the TensorFlow version had this message about CPU

Perhaps safe to ignore, but it did tempt me to rebuild with the CPU features supported as well; at the very least it's going to trigger questions. But for me, the most flexible setup with a sensible default would be:

In all that, I don't think R versions will be the most important thing unless it's really bleeding edge, since in my experience most R packages still work on updating R versions, whereas TF/Python breaks quickly, so I would be inspecting those versions more closely. As a suggestion then:

Then we add plumber ourselves, not too much hassle. |
@MarkEdmondson1234 thanks much for this, it is a huge help to bounce ideas off you here. I like the three levels you outline. For the base image names, I'm tempted to simplify them to:

For the Python version, I believe I have everything at python3, and python2.7 is not even installed on the system.

Re the TensorFlow CPU message, I see that too; I believe that's up to the team that builds the tensorflow pip package / wheel. (I looked at installing TF from source when we were trying to build on CUDA 9.2, and it looks hairy, particularly because the build system is super-interactive, so it's hard to see how to automate.) Let's just assume they know what they are doing.

For the tags, I'm still struggling. I like the notion of

With regards to version-type tags, the whole rest of the R versioned stack pins versions by R's release dates -- e.g. you get the version of pandoc, RStudio, and all R packages that were current when said R version was last current. We've tried to promote the notion that a user can do something like

For the base

Thanks again, really like your ideas on this and appreciate your feedback! |
I think we should stick with tags being R version numbers, though of course only a limited set of more recent R versions. Then we have a single cuda/tensorflow stack that is the latest that can be built against hardware compatible with major-version CUDA, e.g.,

Practically, the keras R package version available at a given R release date will determine which R versions will be available in this stack. So,

We can of course do some documentation on how to customize your stack. |
Could we try some multi-stage build magic here? Though having those libs might be good for users who want to install other software from source, like |
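A minimal sketch of what that multi-stage approach could look like (image names, stage names, and paths are illustrative, not the repo's actual Dockerfiles): compile against the heavy devel image, then copy only the built artifacts into a slimmer runtime image.

```shell
# Write an illustrative two-stage Dockerfile: build in the CUDA devel image,
# then copy only the wheels into an image based on the smaller runtime base.
cat > Dockerfile.multistage <<'EOF'
FROM nvidia/cuda:9.0-devel AS build
# ... build xgboost / other wheels that need the CUDA devel headers ...

FROM nvidia/cuda:9.0-base
COPY --from=build /wheels /wheels
# ... pip install the prebuilt wheels; no devel libs in the final image ...
EOF
```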
Thanks @noamross! After waffling and thinking it over, I agree about having the

I am also tempted to just stick with one CUDA version per R version. Again looking to the precedent on the Python side,

Regarding the multi-stage builds: yes, I think that's possible, but I also think I can already get away with a binary Python wheel for xgboost and drop the CUDA devel dependencies. I've now separated those devel libs out into a separate Dockerfile and build tensorflow directly on cuda-base. So, the current directory structure looks like:
Also, how do folks feel about going with

So, my new proposed image stack would be

with tags
|
Does the xgboost R package work with the GPU this way? I'll test, but I don't think so. |
`rocker/tensorflow` works for me |
Just following up that the instances as described above should all be built now. Note that on the |
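For anyone wanting to try the built images, pulling and running one would look roughly like this (image and tag names follow the proposal above and are assumptions, as is the runtime flag):

```shell
# Pull a CPU image and a GPU image (tags here are illustrative R versions):
docker pull rocker/tensorflow:3.5.0
docker pull rocker/ml-gpu:3.5.0

# GPU images additionally need the NVIDIA runtime on the host, e.g.:
docker run --runtime=nvidia -p 8787:8787 rocker/ml-gpu:3.5.0
```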
Looks great, will start putting them through their paces. |
I've got a template together, so hopefully some other folks will start testing the images as well: https://cloudyr.github.io/googleComputeEngineR/articles/gpu.html |
@MarkEdmondson1234 nice, thanks! I need to update the READMEs in this repo to give better documentation on getting started (and to better acknowledge the contributions from you, @noamross, and others!), so reminding myself to link to that as well. Thanks! |
Another note for any documentation: having this image now means you can train R models on GPU-accelerated instances serverlessly via Cloud ML, which is super: https://cloud.google.com/ml-engine/docs/using-containers - the demo at the end of this video shows using R in containers to train and test: https://www.youtube.com/watch?v=XpNVixSN-Mg&feature=youtu.be and the repo with the code is here: https://github.com/gmikels/google-cloud-R-examples |
Thanks @MarkEdmondson1234 , that's really cool! |
I think this issue has been resolved, so I will close it. |
It works in build and I can log in, but when I try to use keras for the toy example I get: