
Dockerfile not working: ImportError: libcuda.so.1: cannot open shared object file: No such file or directory #31

Closed
francoisruty opened this issue Jun 29, 2019 · 11 comments

Comments

@francoisruty

Hello, this project looks promising, but I can't get the Dockerfile to yield a successful build.

I run this:
sudo docker build --build-arg CUDA_BASE_VERSION=9.0 --build-arg CUDNN_VERSION=7.6.0.64 --build-arg UBUNTU_VERSION=16.04 --build-arg TENSORFLOW_VERSION=1.12.0 -t dirt .

and I get this during the build process:
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory

I think it's due to an incompatibility between tensorflow version and cuda version, but I've tried various tensorflow versions, with the same result.
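
For reference, one way to check whether the driver library is visible at build time is to add a temporary diagnostic line to the Dockerfile (this RUN line is a hypothetical addition of mine, not part of the project's Dockerfile):

# hypothetical diagnostic step, added near the end of the Dockerfile
RUN ldconfig -p | grep libcuda || echo "libcuda.so.1 not visible at build time"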

Could you share the build arguments that work for you? You list the driver versions and OS version, but not the tensorflow version.

thanks!

@pmh47
Owner

pmh47 commented Jun 29, 2019

This is docker-specific; DIRT should work fine with those versions. @dboyle25-40124186, could you take a look please?

@DomhnallBoyle

Apologies @francoisruty, I won't be able to properly take a look at this until Monday.

Going by the CUDA compatibility versions here, the tensorflow and CUDA versions you've given should be compatible.

I'm assuming you've got nvidia-docker installed and have either edited the docker daemon configuration or run the image with --runtime=nvidia to allow GPU access to the container?
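
For reference, a quick way to sanity-check that the nvidia runtime itself is working for docker run (the CUDA base image tag is just an example):

sudo docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi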

@francoisruty
Author

@DomhnallBoyle Hello guys, thanks for the quick response; no worry on my side about waiting a few days, and thanks for all your work :)
Yes, I've installed nvidia-docker and configured docker properly (I'm running other tensorflow-gpu containers on the same machine).

I can't use the --runtime=nvidia flag for docker build (the flag exists for docker run but not for docker build). AFAIK it's not possible to use nvidia-docker during docker builds.

Did one of you manage to complete a docker build? If so, I'm interested in the OS, the exact Dockerfile, and the docker build command; at this point I'm just trying to use docker to get the provided samples up and running
(then I'm planning to edit the code so it works with python 3 and TensorFlow 2.0 alpha; I'll gladly share my edits).

@DomhnallBoyle

@francoisruty I was able to get this working. My host machine had Ubuntu 18.04 with CUDA 9.0 and I'm pretty sure the commands I used were exactly like yours above using tensorflow 1.12:

sudo docker build -t <image_name> --build-arg CUDA_BASE_VERSION=9.0 --build-arg CUDNN_VERSION=7.6.0.64 --build-arg UBUNTU_VERSION=16.04 --build-arg TENSORFLOW_VERSION=1.12.0 .

Could you just double-check that you've added "default-runtime": "nvidia" to /etc/docker/daemon.json and restarted the daemon? I'm pretty sure this allows nvidia runtime access during docker build.
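
For reference, a minimal /etc/docker/daemon.json along those lines would look something like this (a sketch assuming nvidia-container-runtime is installed and on the PATH; note it overwrites any existing daemon.json):

# write the config with nvidia as the default runtime, then restart docker
sudo tee /etc/docker/daemon.json <<'EOF'
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
sudo systemctl restart docker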

Apologies if that doesn't help; let me try this tomorrow and I'll get back to you.

@francoisruty
Author

Hello, my apologies: I had indeed enabled the nvidia runtime, but I wasn't aware you could set it as the default, which makes the GPU available during docker builds!
I tried your command and it works!
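
As a quick sanity check of the built image, something like the following should work (the import one-liner is just an example, not one of the project's tests):

sudo docker run --runtime=nvidia --rm dirt python -c "import dirt; print('dirt imported OK')"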

I'll try the samples, and see how it goes with python 3 and tensorflow 2, I'll create other issues or PR if needed!

all the best

@francoisruty
Author

francoisruty commented Jun 30, 2019

Just another comment: it doesn't work with CUDA 10.0 and tensorflow 1.14.0 (those two versions are supposed to be compatible). I get:

Installing collected packages: dirt
  Running setup.py install for dirt: started
    Running setup.py install for dirt: finished with status 'error'
    Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-sNYYMd-build/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-L9TnEM-record/install-record.txt --single-version-externally-managed --compile:
    /usr/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'python_requires'
      warnings.warn(msg)
    running install
    running build
    -- The CXX compiler identification is GNU 5.4.0
    -- The CUDA compiler identification is NVIDIA 10.0.130
    -- Check for working CXX compiler: /usr/bin/c++
    -- Check for working CXX compiler: /usr/bin/c++ -- works
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc
    -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc -- works
    -- Detecting CUDA compiler ABI info
    -- Detecting CUDA compiler ABI info - done
    CMake Warning (dev) at /usr/local/share/cmake-3.14/Modules/FindOpenGL.cmake:275 (message):
      Policy CMP0072 is not set: FindOpenGL prefers GLVND by default when
      available.  Run "cmake --help-policy CMP0072" for policy details.  Use the
      cmake_policy command to set the policy and suppress this warning.
    
      FindOpenGL found both a legacy GL library:
    
        OPENGL_gl_LIBRARY: /usr/local/lib/x86_64-linux-gnu/libGL.so
    
      and GLVND libraries for OpenGL and GLX:
    
        OPENGL_opengl_LIBRARY: /usr/local/lib/x86_64-linux-gnu/libOpenGL.so
        OPENGL_glx_LIBRARY: /usr/local/lib/x86_64-linux-gnu/libGLX.so
    
      OpenGL_GL_PREFERENCE has not been set to "GLVND" or "LEGACY", so for
      compatibility with CMake 3.10 and below the legacy GL library will be used.
    Call Stack (most recent call first):
      CMakeLists.txt:5 (find_package)
    This warning is for project developers.  Use -Wno-dev to suppress it.
    
    -- Found OpenGL: /usr/local/lib/x86_64-linux-gnu/libOpenGL.so
    CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
    Please set them or make sure they are set and tested correctly in the CMake files:
    Tensorflow_LIBRARY
        linked by target "rasterise" in directory /tmp/pip-sNYYMd-build/csrc
    
    -- Configuring incomplete, errors occurred!
    See also "/tmp/pip-sNYYMd-build/build/CMakeFiles/CMakeOutput.log".
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-sNYYMd-build/setup.py", line 50, in <module>
        'Programming Language :: Python :: 3.7',
      File "/usr/lib/python2.7/distutils/core.py", line 151, in setup
        dist.run_commands()
      File "/usr/lib/python2.7/distutils/dist.py", line 953, in run_commands
        self.run_command(cmd)
      File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command
        cmd_obj.run()
      File "/usr/lib/python2.7/dist-packages/setuptools/command/install.py", line 61, in run
        return orig.install.run(self)
      File "/usr/lib/python2.7/distutils/command/install.py", line 601, in run
        self.run_command('build')
      File "/usr/lib/python2.7/distutils/cmd.py", line 326, in run_command
        self.distribution.run_command(command)
      File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command
        cmd_obj.run()
      File "/tmp/pip-sNYYMd-build/setup.py", line 24, in run
        build_csrc()
      File "/tmp/pip-sNYYMd-build/setup.py", line 18, in build_csrc
        subprocess.check_call(['cmake', os.path.join(base_path, 'csrc')], cwd=build_path)
      File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
        raise CalledProcessError(retcode, cmd)
    subprocess.CalledProcessError: Command '['cmake', '/tmp/pip-sNYYMd-build/csrc']' returned non-zero exit status 1

The important part seems to be these lines:
CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
Tensorflow_LIBRARY

@pmh47
Owner

pmh47 commented Jun 30, 2019

That new error is due to a change made in tf 1.14; I've created #32 to track it and will fix soon.
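
For context, tf 1.14 ships the framework library as libtensorflow_framework.so.1 rather than libtensorflow_framework.so, which is why CMake's Tensorflow_LIBRARY lookup comes back NOTFOUND. You can confirm what your installed wheel ships with something like:

python -c "import tensorflow as tf; print(tf.sysconfig.get_lib())"
ls "$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_lib())')"/libtensorflow_framework*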

@francoisruty
Author

awesome, many thanks!

@pmh47
Owner

pmh47 commented Jul 1, 2019

@francoisruty Support for tf 1.14 is now fixed; please open another ticket if there are further problems.

@francoisruty
Author

awesome!

@DanyEle

DanyEle commented Apr 16, 2020

I bumped into the same error when running a project requiring tensorflow-gpu 1.1.0 on an Azure Ubuntu 16.04 LTS virtual machine. nvidia-smi showed the CUDA version as N/A.

I solved it by installing the CUDA toolkit (which provides nvcc) with the following command:

sudo apt-get install nvidia-cuda-toolkit

Afterwards, nvidia-smi reported the CUDA version as 10.1 and my program ran successfully, with no errors.
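
For completeness, the steps were roughly (Ubuntu 16.04; the package name is the one in the standard Ubuntu repositories):

nvidia-smi                                  # CUDA version showed as N/A
sudo apt-get update
sudo apt-get install -y nvidia-cuda-toolkit
nvidia-smi                                  # CUDA version now reported as 10.1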
