Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error with nvidia driver version in Dockerfile #32

Open
K25801 opened this issue Mar 11, 2024 · 9 comments
Open

Error with nvidia driver version in Dockerfile #32

K25801 opened this issue Mar 11, 2024 · 9 comments

Comments

@K25801
Copy link

K25801 commented Mar 11, 2024

Hi, I'm trying to test the model, when I run test.py, an error occurs.
aab016cefd728ab772efb52cef8b64a
Then I tried nvidia-smi, I get:
33025e45b0145118442f2b808f9cc16
After getting into the driver's version, I found them incompatible:
9eb6f5190dbf611ffef700054e18529
As I'm using the Dockerfile, this issue only happens in this image and I believe it is a problem within the Dockerfile, as nvidia-smi is well-working on my host. Do you have any suggestions in this case?

@oneformer3d-contributor
Copy link
Collaborator

oneformer3d-contributor commented Mar 11, 2024

Hi @K25801 ,
Which GPU are you using? Is it compatible with cuda 11.6, we are using in the Dockerfile?

Also have you tried this advise from Dockefile?

# Feel free to skip nvidia-cuda-dev if minkowski installation is fine

@K25801
Copy link
Author

K25801 commented Mar 11, 2024

Hi, I think the problem is caused by
nvidia-cuda-dev
There will be no problem when skipping this line. But that causes the installation of MinkowskiEngine to fail.
Like this:
[11/21] /opt/conda/bin/nvcc -I/opt/conda/lib/python3.10/site-packages/torch/include -I/opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.10/site-packages/torch/include/TH -I/opt/conda/lib/python3.10/site-packages/torch/include/THC -I/opt/conda/include -I/workspace/MinkowskiEngine/src -I/workspace/MinkowskiEngine/src/3rdparty -I/opt/conda/include/python3.10 -c -c /workspace/MinkowskiEngine/src/interpolation_gpu.cu -o /workspace/MinkowskiEngine/build/temp.linux-x86_64-cpython-310/workspace/MinkowskiEngine/src/interpolation_gpu.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' --expt-relaxed-constexpr --expt-extended-lambda -O3 -Xcompiler=-fno-gnu-unique -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++14 FAILED: /workspace/MinkowskiEngine/build/temp.linux-x86_64-cpython-310/workspace/MinkowskiEngine/src/interpolation_gpu.o /opt/conda/bin/nvcc -I/opt/conda/lib/python3.10/site-packages/torch/include -I/opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.10/site-packages/torch/include/TH -I/opt/conda/lib/python3.10/site-packages/torch/include/THC -I/opt/conda/include -I/workspace/MinkowskiEngine/src -I/workspace/MinkowskiEngine/src/3rdparty -I/opt/conda/include/python3.10 -c -c /workspace/MinkowskiEngine/src/interpolation_gpu.cu -o /workspace/MinkowskiEngine/build/temp.linux-x86_64-cpython-310/workspace/MinkowskiEngine/src/interpolation_gpu.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' --expt-relaxed-constexpr --expt-extended-lambda -O3 -Xcompiler=-fno-gnu-unique -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++14 In file included from /workspace/MinkowskiEngine/src/allocators.cuh:36:0, from /workspace/MinkowskiEngine/src/kernel_region.hpp:40, from /workspace/MinkowskiEngine/src/coordinate_map.hpp:30, from /workspace/MinkowskiEngine/src/interpolation_gpu.cu:26: /opt/conda/lib/python3.10/site-packages/torch/include/ATen/cuda/CUDAContext.h:10:10: fatal error: cusolverDn.h: No such file or directory #include <cusolverDn.h> ^~~~~~~~~~~~~~ compilation terminated.

I'm using RTX A6000 and Quadro RTX 8000, here is my nvidia-smi infos
39162aafec38bcc10fbc79a6c43278f

Yes, I'm using the Dockerfile.

@oneformer3d-contributor
Copy link
Collaborator

fatal error: cusolverDn.h: No such file or directory

Yes this is exactly the error, why in some configurations we need to install extra cuda headers from nvidia-cuda-dev . However this package pulls tons of dependencies including another version of nvidia drivers or smth. You can may be try install libcusolver* instead of the whole nvidia-cuda-dev. I do not know the good solution as unfortunatly minkowskiengine is not supported for a couple of years now :(

@K25801
Copy link
Author

K25801 commented Mar 11, 2024

Thanks for your update! I immediately tried
apt install libcusolver*

After successfully installing, I tried to install MinkowskiEng again, but got the same error.
[11/21] /opt/conda/bin/nvcc -I/opt/conda/lib/python3.10/site-packages/torch/include -I/opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.10/site-packages/torch/include/TH -I/opt/conda/lib/python3.10/site-packages/torch/include/THC -I/opt/conda/include -I/tmp/pip-req-build-_1eo4q2g/src -I/tmp/pip-req-build-_1eo4q2g/src/3rdparty -I/opt/conda/include/python3.10 -c -c /tmp/pip-req-build-_1eo4q2g/src/broadcast_gpu.cu -o /tmp/pip-req-build-_1eo4q2g/build/temp.linux-x86_64-cpython-310/tmp/pip-req-build-_1eo4q2g/src/broadcast_gpu.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' --expt-relaxed-constexpr --expt-extended-lambda -O3 -Xcompiler=-fno-gnu-unique -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++14 FAILED: /tmp/pip-req-build-_1eo4q2g/build/temp.linux-x86_64-cpython-310/tmp/pip-req-build-_1eo4q2g/src/broadcast_gpu.o /opt/conda/bin/nvcc -I/opt/conda/lib/python3.10/site-packages/torch/include -I/opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.10/site-packages/torch/include/TH -I/opt/conda/lib/python3.10/site-packages/torch/include/THC -I/opt/conda/include -I/tmp/pip-req-build-_1eo4q2g/src -I/tmp/pip-req-build-_1eo4q2g/src/3rdparty -I/opt/conda/include/python3.10 -c -c /tmp/pip-req-build-_1eo4q2g/src/broadcast_gpu.cu -o /tmp/pip-req-build-_1eo4q2g/build/temp.linux-x86_64-cpython-310/tmp/pip-req-build-_1eo4q2g/src/broadcast_gpu.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' --expt-relaxed-constexpr --expt-extended-lambda -O3 -Xcompiler=-fno-gnu-unique -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++14 In file included from /tmp/pip-req-build-_1eo4q2g/src/allocators.cuh:36:0, from /tmp/pip-req-build-_1eo4q2g/src/kernel_region.hpp:40, from /tmp/pip-req-build-_1eo4q2g/src/coordinate_map.hpp:30, from /tmp/pip-req-build-_1eo4q2g/src/broadcast_gpu.cu:28: /opt/conda/lib/python3.10/site-packages/torch/include/ATen/cuda/CUDAContext.h:10:10: fatal error: cusolverDn.h: No such file or directory #include <cusolverDn.h> ^~~~~~~~~~~~~~ compilation terminated.

@K25801
Copy link
Author

K25801 commented Mar 11, 2024

If I use nvidia-cuda-dev, the installation will be fine, but nvidia-smi doesn't works

@xiongcs
Copy link

xiongcs commented Apr 16, 2024

2
Can such a GPU run through the program?Is cuda 11.6 the minimum requirement?
Because I always encounter such an error in training [runtime error: cuda error: the provided PTX was compiled with an unsupported toolkit.
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.】

@oneformer3d-contributor
Copy link
Collaborator

Cuda 11.2 should be fine, and rtx 3090 also. If you change cuda version just make sure you also changed it in precompiled versions of pytorch, mmcv, torch-scatter and spconv.

@iris0329
Copy link

I use rtx3090 and create docker with the command docker build -t my-tag, the dockerfile put at the current path. all is fine.

@jth5566
Copy link

jth5566 commented Jul 16, 2024

Before installing MinkowskiEngine, enter this command.
export CPATH=/usr/local/cuda/include:$CPATH

I referenced here : microsoft/DeepSpeed#2684 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants