
Dockerfile not working: ImportError: libcuda.so.1: cannot open shared object file: No such file or directory #31

Closed
francoisruty opened this issue Jun 29, 2019 · 11 comments

Comments

@francoisruty

Hello, this project looks promising, but I can't get the Dockerfile to yield a successful build.

I run this:
sudo docker build --build-arg CUDA_BASE_VERSION=9.0 --build-arg CUDNN_VERSION=7.6.0.64 --build-arg UBUNTU_VERSION=16.04 --build-arg TENSORFLOW_VERSION=1.12.0 -t dirt .

and I get this during the build process:
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory

I think it's due to an incompatibility between tensorflow version and cuda version, but I've tried various tensorflow versions, with the same result.
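
For reference, one way to check whether the driver library is visible at build time is to add a temporary diagnostic line to the Dockerfile (this RUN line is a hypothetical addition of mine, not part of the project's Dockerfile):

# hypothetical diagnostic step, added near the end of the Dockerfile
RUN ldconfig -p | grep libcuda || echo "libcuda.so.1 not visible at build time"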

Could you share the build arguments that work for you? You list the driver versions and OS version, but not the tensorflow version.

thanks!

@pmh47
Owner

pmh47 commented Jun 29, 2019

This is docker-specific; DIRT should work fine with those versions. @dboyle25-40124186, could you take a look please?

@DomhnallBoyle

Apologies @francoisruty, I won't be able to properly take a look at this until Monday.

Going by the CUDA compatibility versions here, the tensorflow and CUDA versions you've given should be compatible.

I'm assuming you've got nvidia-docker installed and have either edited the docker daemon configuration or run the image with --runtime=nvidia to allow GPU access to the container?
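
For reference, a quick way to sanity-check that the nvidia runtime itself is working for docker run (the CUDA base image tag is just an example):

sudo docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi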

@francoisruty
Author

@DomhnallBoyle Hello guys, thanks for the quick response; no worry on my side about waiting a few days, and thanks for all your work :)
Yes, I've installed nvidia-docker and configured docker properly (I'm running other tensorflow-gpu containers on the same machine).

I can't use the --runtime=nvidia flag for docker build (the flag exists for docker run but not for docker build). AFAIK it's not possible to use nvidia-docker during docker builds.

Did one of you manage to complete a docker build? If so, I'm interested in the OS, the exact Dockerfile, and the docker build command; at this point I'm just trying to use docker to get the provided samples up and running
(then I'm planning to edit the code so it works with python 3 and TensorFlow 2.0 alpha; I'll gladly share my edits).

@DomhnallBoyle

@francoisruty I was able to get this working. My host machine had Ubuntu 18.04 with CUDA 9.0 and I'm pretty sure the commands I used were exactly like yours above using tensorflow 1.12:

sudo docker build -t <image_name> --build-arg CUDA_BASE_VERSION=9.0 --build-arg CUDNN_VERSION=7.6.0.64 --build-arg UBUNTU_VERSION=16.04 --build-arg TENSORFLOW_VERSION=1.12.0 .

Could you just double-check that you've added "default-runtime": "nvidia" to /etc/docker/daemon.json and restarted the daemon? I'm pretty sure this allows nvidia runtime access during docker build.
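
For reference, a minimal /etc/docker/daemon.json along those lines would look something like this (a sketch assuming nvidia-container-runtime is installed and on the PATH; note it overwrites any existing daemon.json):

# write the config with nvidia as the default runtime, then restart docker
sudo tee /etc/docker/daemon.json <<'EOF'
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
sudo systemctl restart docker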

Apologies if that doesn't help; let me try this tomorrow and I'll get back to you.

@francoisruty
Author

Hello, my apologies: I had indeed enabled the nvidia runtime, but I wasn't aware you could set it as the default, which makes the GPU available during docker builds!
I tried your command and it works!
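
As a quick sanity check of the built image, something like the following should work (the import one-liner is just an example, not one of the project's tests):

sudo docker run --runtime=nvidia --rm dirt python -c "import dirt; print('dirt imported OK')"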

I'll try the samples, and see how it goes with python 3 and tensorflow 2, I'll create other issues or PR if needed!

all the best

@francoisruty
Author

francoisruty commented Jun 30, 2019

Just another comment: it doesn't work with CUDA 10.0 and tensorflow 1.14.0 (those two versions are supposed to be compatible). I get:

Installing collected packages: dirt
  Running setup.py install for dirt: started
    Running setup.py install for dirt: finished with status 'error'
    Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-sNYYMd-build/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-L9TnEM-record/install-record.txt --single-version-externally-managed --compile:
    /usr/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'python_requires'
      warnings.warn(msg)
    running install
    running build
    -- The CXX compiler identification is GNU 5.4.0
    -- The CUDA compiler identification is NVIDIA 10.0.130
    -- Check for working CXX compiler: /usr/bin/c++
    -- Check for working CXX compiler: /usr/bin/c++ -- works
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc
    -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc -- works
    -- Detecting CUDA compiler ABI info
    -- Detecting CUDA compiler ABI info - done
    CMake Warning (dev) at /usr/local/share/cmake-3.14/Modules/FindOpenGL.cmake:275 (message):
      Policy CMP0072 is not set: FindOpenGL prefers GLVND by default when
      available.  Run "cmake --help-policy CMP0072" for policy details.  Use the
      cmake_policy command to set the policy and suppress this warning.
    
      FindOpenGL found both a legacy GL library:
    
        OPENGL_gl_LIBRARY: /usr/local/lib/x86_64-linux-gnu/libGL.so
    
      and GLVND libraries for OpenGL and GLX:
    
        OPENGL_opengl_LIBRARY: /usr/local/lib/x86_64-linux-gnu/libOpenGL.so
        OPENGL_glx_LIBRARY: /usr/local/lib/x86_64-linux-gnu/libGLX.so
    
      OpenGL_GL_PREFERENCE has not been set to "GLVND" or "LEGACY", so for
      compatibility with CMake 3.10 and below the legacy GL library will be used.
    Call Stack (most recent call first):
      CMakeLists.txt:5 (find_package)
    This warning is for project developers.  Use -Wno-dev to suppress it.
    
    -- Found OpenGL: /usr/local/lib/x86_64-linux-gnu/libOpenGL.so
    CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
    Please set them or make sure they are set and tested correctly in the CMake files:
    Tensorflow_LIBRARY
        linked by target "rasterise" in directory /tmp/pip-sNYYMd-build/csrc
    
    -- Configuring incomplete, errors occurred!
    See also "/tmp/pip-sNYYMd-build/build/CMakeFiles/CMakeOutput.log".
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-sNYYMd-build/setup.py", line 50, in <module>
        'Programming Language :: Python :: 3.7',
      File "/usr/lib/python2.7/distutils/core.py", line 151, in setup
        dist.run_commands()
      File "/usr/lib/python2.7/distutils/dist.py", line 953, in run_commands
        self.run_command(cmd)
      File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command
        cmd_obj.run()
      File "/usr/lib/python2.7/dist-packages/setuptools/command/install.py", line 61, in run
        return orig.install.run(self)
      File "/usr/lib/python2.7/distutils/command/install.py", line 601, in run
        self.run_command('build')
      File "/usr/lib/python2.7/distutils/cmd.py", line 326, in run_command
        self.distribution.run_command(command)
      File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command
        cmd_obj.run()
      File "/tmp/pip-sNYYMd-build/setup.py", line 24, in run
        build_csrc()
      File "/tmp/pip-sNYYMd-build/setup.py", line 18, in build_csrc
        subprocess.check_call(['cmake', os.path.join(base_path, 'csrc')], cwd=build_path)
      File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
        raise CalledProcessError(retcode, cmd)
    subprocess.CalledProcessError: Command '['cmake', '/tmp/pip-sNYYMd-build/csrc']' returned non-zero exit status 1

The important part seems to be these lines:
CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
Tensorflow_LIBRARY

@pmh47
Owner

pmh47 commented Jun 30, 2019

That new error is due to a change made in tf 1.14; I've created #32 to track it and will fix soon.
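
For context, tf 1.14 ships the framework library as libtensorflow_framework.so.1 rather than libtensorflow_framework.so, which is why CMake's Tensorflow_LIBRARY lookup comes back NOTFOUND. You can confirm what your installed wheel ships with something like:

python -c "import tensorflow as tf; print(tf.sysconfig.get_lib())"
ls "$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_lib())')"/libtensorflow_framework*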

@francoisruty
Author

awesome, many thanks!

@pmh47
Owner

pmh47 commented Jul 1, 2019

@francoisruty Support for tf 1.14 is now fixed; please open another ticket if there are further problems.

@francoisruty
Author

awesome!

@DanyEle

DanyEle commented Apr 16, 2020

I bumped into the same error when running a project requiring tensorflow-gpu 1.1.0 on an Azure Ubuntu 16.04 LTS virtual machine. nvidia-smi showed the CUDA version as N/A.

I solved it by installing the CUDA toolkit (which provides nvcc) with the following command:

sudo apt-get install nvidia-cuda-toolkit

Afterwards, nvidia-smi reported the CUDA version as 10.1 and my program ran successfully, with no errors.
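
For completeness, the steps were roughly (Ubuntu 16.04; the package name is the one in the standard Ubuntu repositories):

nvidia-smi                                  # CUDA version showed as N/A
sudo apt-get update
sudo apt-get install -y nvidia-cuda-toolkit
nvidia-smi                                  # CUDA version now reported as 10.1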
