spconv v1.1 unknown device type #70

Closed
MartinHahner opened this issue May 19, 2020 · 11 comments

@MartinHahner
Contributor

MartinHahner commented May 19, 2020

Has anyone run into this problem and been able to solve it?

It only occurs when training SECOND and PartA^2Net,
because they use spconv in their RPN backbone.
Training PointPillars, on the other hand, works fine because its RPN (PointPillarsScatter)
does not use spconv.

Does this only occur with spconv v1.1?

Traceback (most recent call last):
  File "train.py", line 215, in <module>
    main()
  File "train.py", line 210, in main
    max_ckpt_save_num=arguments.max_ckpt_save_num)
  File "/scratch_net/hox/mhahner/repositories/PCDet/tools/train_utils/train_utils.py", line 80, in train_model
    leave_pbar=(cur_epoch + 1 == total_epochs)
  File "/scratch_net/hox/mhahner/repositories/PCDet/tools/train_utils/train_utils.py", line 36, in train_one_epoch
    loss, tb_dict, disp_dict = model_func(model, batch)
  File "/scratch_net/hox/mhahner/repositories/PCDet/pcdet/models/__init__.py", line 25, in model_func
    ret_dict, tb_dict, disp_dict = model(input_dict)
  File "/home/mhahner/scratch/apps/anaconda3/envs/apex/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/scratch_net/hox/mhahner/repositories/PCDet/pcdet/models/detectors/PartA2_net.py", line 106, in forward
    rpn_ret_dict = self.forward_rpn(**input_dict)
  File "/scratch_net/hox/mhahner/repositories/PCDet/pcdet/models/detectors/PartA2_net.py", line 33, in forward_rpn
    **kwargs
  File "/home/mhahner/scratch/apps/anaconda3/envs/apex/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/scratch_net/hox/mhahner/repositories/PCDet/pcdet/models/rpn/rpn_unet.py", line 471, in forward
    x = self.conv_input(input_sp_tensor)
  File "/home/mhahner/scratch/apps/anaconda3/envs/apex/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/mhahner/scratch/apps/anaconda3/envs/apex/lib/python3.7/site-packages/spconv/modules.py", line 130, in forward
    input = module(input)
  File "/home/mhahner/scratch/apps/anaconda3/envs/apex/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/mhahner/scratch/apps/anaconda3/envs/apex/lib/python3.7/site-packages/spconv/conv.py", line 177, in forward
    use_hash=self.use_hash)
  File "/home/mhahner/scratch/apps/anaconda3/envs/apex/lib/python3.7/site-packages/spconv/ops.py", line 93, in get_indice_pairs
    stride, padding, dilation, out_padding, int(subm), int(transpose), int(use_hash))
ValueError: /scratch_net/hox/mhahner/repositories/spconv/include/spconv/spconv_ops.h 132
false assert faild. unknown device type
# packages in environment at /home/mhahner/scratch/apps/anaconda3/envs/apex:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                      1_llvm    conda-forge
bzip2                     1.0.8                h516909a_2    conda-forge
ca-certificates           2020.1.1                      0
certifi                   2020.4.5.1               py37_0
cmake                     3.17.0               h28c56e5_0    conda-forge
coloredlogs               14.0                     pypi_0    pypi
cudatoolkit               10.1.243             h6bb024c_0
cudatoolkit-dev           10.1.243             h516909a_3    conda-forge
cudnn                     7.6.5                cuda10.1_0
cycler                    0.10.0                   pypi_0    pypi
decorator                 4.4.2                    pypi_0    pypi
easydict                  1.9                      pypi_0    pypi
expat                     2.2.9                he1b5a44_2    conda-forge
humanfriendly             8.2                      pypi_0    pypi
imagecodecs               2020.2.18                pypi_0    pypi
imageio                   2.8.0                    pypi_0    pypi
kiwisolver                1.2.0                    pypi_0    pypi
krb5                      1.17.1               h2fd8d38_0    conda-forge
libblas                   3.8.0               16_openblas    conda-forge
libcblas                  3.8.0               16_openblas    conda-forge
libcurl                   7.69.1               hf7181ac_0    conda-forge
libedit                   3.1.20181209         hc058e9b_0
libffi                    3.2.1             he1b5a44_1007    conda-forge
libgcc-ng                 9.2.0                h24d8f2e_2    conda-forge
libgfortran-ng            7.5.0                hdf63c60_6    conda-forge
liblapack                 3.8.0               16_openblas    conda-forge
libopenblas               0.3.9                h5ec1e0e_0    conda-forge
libssh2                   1.8.2                h22169c7_2    conda-forge
libstdcxx-ng              9.2.0                hdf63c60_2    conda-forge
libuv                     1.34.0               h516909a_0    conda-forge
llvm-openmp               10.0.0               hc9558a2_0    conda-forge
llvmlite                  0.32.1                   pypi_0    pypi
matplotlib                3.2.1                    pypi_0    pypi
mkl                       2020.1                      217    conda-forge
ncurses                   6.1               hf484d3e_1002    conda-forge
networkx                  2.4                      pypi_0    pypi
ninja                     1.10.0               hc9558a2_0    conda-forge
numba                     0.49.1                   pypi_0    pypi
numpy                     1.18.4           py37h8960a57_0    conda-forge
openssl                   1.1.1g               h7b6447c_0
pcdet                     0.1.0+2244be4             dev_0    <develop>
pillow                    7.1.2                    pypi_0    pypi
pip                       20.1               pyh9f0ad1d_0    conda-forge
protobuf                  3.11.3                   pypi_0    pypi
pyparsing                 2.4.7                    pypi_0    pypi
python                    3.7.5                h0371630_0
python-dateutil           2.8.1                    pypi_0    pypi
python_abi                3.7                     1_cp37m    conda-forge
pytorch                   1.4.0           py3.7_cuda10.1.243_cudnn7.6.3_0    pytorch
pywavelets                1.1.1                    pypi_0    pypi
pyyaml                    5.3.1                    pypi_0    pypi
readline                  7.0               hf8c457e_1001    conda-forge
rhash                     1.3.6             h14c3975_1001    conda-forge
scikit-image              0.17.2                   pypi_0    pypi
scipy                     1.4.1                    pypi_0    pypi
setuptools                46.3.0           py37hc8dfbb8_0    conda-forge
six                       1.14.0                   pypi_0    pypi
spconv                    1.1                      pypi_0    pypi
sqlite                    3.31.1               h7b6447c_0
tensorboardx              2.0                      pypi_0    pypi
tifffile                  2020.5.11                pypi_0    pypi
tk                        8.6.10               hed695b0_0    conda-forge
tqdm                      4.46.0                   pypi_0    pypi
wheel                     0.34.2                     py_1    conda-forge
xz                        5.2.5                h516909a_0    conda-forge
zlib                      1.2.11            h516909a_1006    conda-forge
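
When comparing environments like the one listed above, it can help to pull out just the packages that matter for the build. The following is a minimal sketch, assuming the text comes from `conda list`; the function names `parse_conda_list` and `pytorch_cuda_matches_toolkit` are made up for this illustration and are not part of conda, PyTorch, or spconv.

```python
import re


def parse_conda_list(text):
    """Map package name -> (version, build) from `conda list` output."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the commented header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 3:
            packages[fields[0]] = (fields[1], fields[2])
    return packages


def pytorch_cuda_matches_toolkit(packages):
    """Compare the CUDA version in the pytorch build string with cudatoolkit.

    Returns None when either package or the CUDA tag is missing.
    """
    if "pytorch" not in packages or "cudatoolkit" not in packages:
        return None
    match = re.search(r"cuda(\d+(?:\.\d+)*)", packages["pytorch"][1])
    if match is None:
        return None
    return match.group(1) == packages["cudatoolkit"][0]


# A few lines copied from the listing above.
sample = """\
# Name                    Version                   Build  Channel
cudatoolkit               10.1.243             h6bb024c_0
cudnn                     7.6.5                cuda10.1_0
pytorch                   1.4.0           py3.7_cuda10.1.243_cudnn7.6.3_0    pytorch
spconv                    1.1                      pypi_0    pypi
"""

packages = parse_conda_list(sample)
print("spconv:", packages["spconv"][0])
print("pytorch CUDA matches cudatoolkit:", pytorch_cuda_matches_toolkit(packages))
```

For this listing the two CUDA versions agree, so a version mismatch between PyTorch and the toolkit does not look like the cause here.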
@BalajiGobinathan

Hi MartinHahner88,
I am facing the same issue with the SECOND network, and test_conv.py throws the same error.
I have been stuck on this error for a month.

Was your build of spconv 1.1 successful (python setup.py bdist_wheel)? Could you list the versions you used for the build, especially the CUDA and torch versions?

Thanks
Balaji Gobinathan
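
For anyone answering a question like this, the versions can be gathered in one go. This is only a sketch: `collect_build_versions` and its dictionary keys are hypothetical names, and it assumes `nvcc` is on the `PATH` when a CUDA toolkit is installed. It uses `torch.__version__`, `torch.version.cuda` and `torch.cuda.is_available()`, and degrades to `None` when PyTorch or `nvcc` is missing.

```python
import platform
import shutil
import subprocess


def collect_build_versions():
    """Collect the Python, PyTorch, and nvcc versions relevant to a build."""
    info = {
        "python": platform.python_version(),
        "torch": None,
        "torch_cuda": None,
        "cuda_available": None,
        "nvcc": None,
    }

    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None:
        info["torch"] = torch.__version__
        # CUDA version PyTorch was compiled against (None for CPU-only builds).
        info["torch_cuda"] = torch.version.cuda
        info["cuda_available"] = torch.cuda.is_available()

    nvcc = shutil.which("nvcc")
    if nvcc is not None:
        result = subprocess.run(
            [nvcc, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
        lines = result.stdout.strip().splitlines()
        # Keep only the last line of the version banner.
        info["nvcc"] = lines[-1] if lines else None

    return info


for key, value in collect_build_versions().items():
    print(f"{key}: {value}")
```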

@MartinHahner
Contributor Author

Yes, my build of spconv v1.1 was successful.
Here are the instructions for how I was able to build spconv v1.1 (but not spconv v1.0):

  • install conda
  • then execute the following commands
    Note: the name apex can be replaced by any other name you wish throughout all commands
  • conda create --name apex python=3.7.5 pytorch=1.4.0 cudatoolkit=10.1.243 cudatoolkit-dev=10.1.243 cmake --channel pytorch --channel=conda-forge
  • conda activate apex
  • conda install cudnn
  • git clone https://github.com/traveller59/spconv --recursive
  • cd spconv
  • if necessary, apply this fix to setup.py
  • CUDA_ROOT=<path_to_your_conda_installation>/envs/apex python setup.py bdist_wheel
    Note: <path_to_your_conda_installation> has to be replaced by your actual installation path
  • cd dist/
  • pip install *
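
The `CUDA_ROOT` value in the steps above is the directory of the conda environment itself. As a rough sketch, it can be derived from the `CONDA_PREFIX` variable that conda sets on activation instead of typing the path by hand. The helper name `resolve_cuda_root` is invented here, and the check for `bin/nvcc` assumes `cudatoolkit-dev` places the compiler there, which is where the build log in this thread finds it.

```python
import os


def resolve_cuda_root(environ=os.environ):
    """Return the active conda prefix if it contains bin/nvcc, else None."""
    prefix = environ.get("CONDA_PREFIX")
    if not prefix:
        return None
    nvcc = os.path.join(prefix, "bin", "nvcc")
    return prefix if os.path.isfile(nvcc) else None


cuda_root = resolve_cuda_root()
if cuda_root is None:
    print("No active conda environment with bin/nvcc was found")
else:
    print(f"CUDA_ROOT={cuda_root} python setup.py bdist_wheel")
```

Run inside the activated environment, this prints the build command with the path filled in.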

When I try to build spconv v1.0 (commit 8da6f96) I am stuck with this error message:

running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.7
creating build/lib.linux-x86_64-3.7/spconv
copying spconv/functional.py -> build/lib.linux-x86_64-3.7/spconv
copying spconv/ops.py -> build/lib.linux-x86_64-3.7/spconv
copying spconv/pool.py -> build/lib.linux-x86_64-3.7/spconv
copying spconv/modules.py -> build/lib.linux-x86_64-3.7/spconv
copying spconv/conv.py -> build/lib.linux-x86_64-3.7/spconv
copying spconv/test_utils.py -> build/lib.linux-x86_64-3.7/spconv
copying spconv/__init__.py -> build/lib.linux-x86_64-3.7/spconv
creating build/lib.linux-x86_64-3.7/spconv/utils
copying spconv/utils/__init__.py -> build/lib.linux-x86_64-3.7/spconv/utils
running build_ext
/scratch_net/hox/mhahner/repositories/spconv_v1.0/build/lib.linux-x86_64-3.7
Release
-- The CXX compiler identification is GNU 6.3.0
-- The CUDA compiler identification is NVIDIA 10.1.243
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ - works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Check for working CUDA compiler: /scratch_net/hox/mhahner/apps/anaconda3/envs/spconv/bin/nvcc
-- Check for working CUDA compiler: /scratch_net/hox/mhahner/apps/anaconda3/envs/spconv/bin/nvcc - works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Found CUDA: /scratch_net/hox/mhahner/apps/anaconda3/envs/spconv (found version "10.1") 
-- Caffe2: CUDA detected: 10.1
-- Caffe2: CUDA nvcc is: /scratch_net/hox/mhahner/apps/anaconda3/envs/spconv/bin/nvcc
-- Caffe2: CUDA toolkit directory: /scratch_net/hox/mhahner/apps/anaconda3/envs/spconv
-- Caffe2: Header version is: 10.1
-- Found CUDNN: /scratch_net/hox/mhahner/apps/anaconda3/envs/spconv/lib/libcudnn.so  
-- Found cuDNN: v7.6.5  (include: /scratch_net/hox/mhahner/apps/anaconda3/envs/spconv/include, library: /scratch_net/hox/mhahner/apps/anaconda3/envs/spconv/lib/libcudnn.so)
-- Automatic GPU detection failed. Building for common architectures.
-- Autodetected CUDA architecture(s): 3.5;5.0;5.2;6.0;6.1;7.0;7.0+PTX;7.5;7.5+PTX
-- Added CUDA NVCC flags for: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_70,code=compute_70;-gencode;arch=compute_75,code=compute_75
CMake Warning (dev) at /scratch_net/hox/mhahner/apps/anaconda3/envs/spconv/share/cmake-3.17/Modules/FindPackageHandleStandardArgs.cmake:272 (message):
  The package name passed to `find_package_handle_standard_args` (torch) does
  not match the name of the calling package (Torch).  This can lead to
  problems in calling code that expects `find_package` result variables
  (e.g., `_FOUND`) to follow a certain pattern.
Call Stack (most recent call first):
  /home/mhahner/scratch/apps/anaconda3/envs/spconv/lib/python3.7/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:122 (find_package_handle_standard_args)
  CMakeLists.txt:23 (find_package)
This warning is for project developers.  Use -Wno-dev to suppress it.

-- Found torch: /home/mhahner/scratch/apps/anaconda3/envs/spconv/lib/python3.7/site-packages/torch/lib/libtorch.so  
-- Found PythonInterp: /scratch_net/hox/mhahner/apps/anaconda3/envs/spconv/bin/python3.7 (found suitable version "3.7.5", minimum required is "3.7") 
-- Found PythonLibs: /home/mhahner/scratch/apps/anaconda3/envs/spconv/lib/libpython3.7m.so
-- Performing Test HAS_CPP14_FLAG
-- Performing Test HAS_CPP14_FLAG - Success
-- pybind11 v2.3.dev0
-- Configuring done
-- Generating done
-- Build files have been written to: /scratch_net/hox/mhahner/repositories/spconv_v1.0/build/temp.linux-x86_64-3.7
Scanning dependencies of target spconv_nms
Scanning dependencies of target spconv
[  7%] Building CUDA object src/utils/CMakeFiles/spconv_nms.dir/nms.cu.o
[ 15%] Building CXX object src/spconv/CMakeFiles/spconv.dir/all.cc.o
[ 23%] Building CXX object src/spconv/CMakeFiles/spconv.dir/indice.cc.o
[ 30%] Building CUDA object src/spconv/CMakeFiles/spconv.dir/indice.cu.o
[ 38%] Linking CUDA shared library ../../../lib.linux-x86_64-3.7/spconv/libspconv_nms.so
[ 38%] Built target spconv_nms
[ 46%] Building CXX object src/spconv/CMakeFiles/spconv.dir/reordering.cc.o
/scratch_net/hox/mhahner/repositories/spconv_v1.0/src/spconv/all.cc:20:91: error: no matching function for call to ‘torch::jit::RegisterOperators::RegisterOperators(const char [28], <unresolved overloaded function type>)’
     torch::jit::RegisterOperators("spconv::get_indice_pairs_2d", &spconv::getIndicePair<2>)
                                                                                           ^
In file included from /home/mhahner/scratch/apps/anaconda3/envs/spconv/lib/python3.7/site-packages/torch/include/torch/script.h:6:0,
                 from /scratch_net/hox/mhahner/repositories/spconv_v1.0/include/spconv/pool_ops.h:20,
                 from /scratch_net/hox/mhahner/repositories/spconv_v1.0/src/spconv/all.cc:16:
/home/mhahner/scratch/apps/anaconda3/envs/spconv/lib/python3.7/site-packages/torch/include/torch/csrc/jit/custom_operator.h:20:3: note: candidate: torch::jit::RegisterOperators::RegisterOperators(std::vector<torch::jit::Operator>)
   RegisterOperators(std::vector<Operator> operators) {
   ^~~~~~~~~~~~~~~~~
/home/mhahner/scratch/apps/anaconda3/envs/spconv/lib/python3.7/site-packages/torch/include/torch/csrc/jit/custom_operator.h:20:3: note:   candidate expects 1 argument, 2 provided
/home/mhahner/scratch/apps/anaconda3/envs/spconv/lib/python3.7/site-packages/torch/include/torch/csrc/jit/custom_operator.h:17:3: note: candidate: constexpr torch::jit::RegisterOperators::RegisterOperators()
   RegisterOperators() = default;
   ^~~~~~~~~~~~~~~~~
/home/mhahner/scratch/apps/anaconda3/envs/spconv/lib/python3.7/site-packages/torch/include/torch/csrc/jit/custom_operator.h:17:3: note:   candidate expects 0 arguments, 2 provided
/home/mhahner/scratch/apps/anaconda3/envs/spconv/lib/python3.7/site-packages/torch/include/torch/csrc/jit/custom_operator.h:16:18: note: candidate: constexpr torch::jit::RegisterOperators::RegisterOperators(const torch::jit::RegisterOperators&)
 struct TORCH_API RegisterOperators {
                  ^~~~~~~~~~~~~~~~~
/home/mhahner/scratch/apps/anaconda3/envs/spconv/lib/python3.7/site-packages/torch/include/torch/csrc/jit/custom_operator.h:16:18: note:   candidate expects 1 argument, 2 provided
/home/mhahner/scratch/apps/anaconda3/envs/spconv/lib/python3.7/site-packages/torch/include/torch/csrc/jit/custom_operator.h:16:18: note: candidate: constexpr torch::jit::RegisterOperators::RegisterOperators(torch::jit::RegisterOperators&&)
/home/mhahner/scratch/apps/anaconda3/envs/spconv/lib/python3.7/site-packages/torch/include/torch/csrc/jit/custom_operator.h:16:18: note:   candidate expects 1 argument, 2 provided
src/spconv/CMakeFiles/spconv.dir/build.make:79: recipe for target 'src/spconv/CMakeFiles/spconv.dir/all.cc.o' failed
make[2]: *** [src/spconv/CMakeFiles/spconv.dir/all.cc.o] Error 1
make[2]: *** Waiting for unfinished jobs....
Scanning dependencies of target spconv_utils
[ 53%] Building CXX object src/utils/CMakeFiles/spconv_utils.dir/all.cc.o
Segmentation fault
src/spconv/CMakeFiles/spconv.dir/build.make:105: recipe for target 'src/spconv/CMakeFiles/spconv.dir/indice.cu.o' failed
make[2]: *** [src/spconv/CMakeFiles/spconv.dir/indice.cu.o] Error 139
CMakeFiles/Makefile2:154: recipe for target 'src/spconv/CMakeFiles/spconv.dir/all' failed
make[1]: *** [src/spconv/CMakeFiles/spconv.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
[ 61%] Linking CXX shared library ../../../lib.linux-x86_64-3.7/spconv/spconv_utils.cpython-37m-x86_64-linux-gnu.so
[ 61%] Built target spconv_utils
Makefile:146: recipe for target 'all' failed
make: *** [all] Error 2
Traceback (most recent call last):
  File "setup.py", line 98, in <module>
    zip_safe=False,
  File "/home/mhahner/scratch/apps/anaconda3/envs/spconv/lib/python3.7/site-packages/setuptools/__init__.py", line 144, in setup
    return distutils.core.setup(**attrs)
  File "/home/mhahner/scratch/apps/anaconda3/envs/spconv/lib/python3.7/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/home/mhahner/scratch/apps/anaconda3/envs/spconv/lib/python3.7/distutils/dist.py", line 966, in run_commands
    self.run_command(cmd)
  File "/home/mhahner/scratch/apps/anaconda3/envs/spconv/lib/python3.7/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/home/mhahner/scratch/apps/anaconda3/envs/spconv/lib/python3.7/site-packages/wheel/bdist_wheel.py", line 223, in run
    self.run_command('build')
  File "/home/mhahner/scratch/apps/anaconda3/envs/spconv/lib/python3.7/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/home/mhahner/scratch/apps/anaconda3/envs/spconv/lib/python3.7/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/home/mhahner/scratch/apps/anaconda3/envs/spconv/lib/python3.7/distutils/command/build.py", line 135, in run
    self.run_command(cmd_name)
  File "/home/mhahner/scratch/apps/anaconda3/envs/spconv/lib/python3.7/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/home/mhahner/scratch/apps/anaconda3/envs/spconv/lib/python3.7/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "setup.py", line 39, in run
    self.build_extension(ext)
  File "setup.py", line 82, in build_extension
    subprocess.check_call(['cmake', '--build', '.'] + build_args, cwd=self.build_temp)
  File "/home/mhahner/scratch/apps/anaconda3/envs/spconv/lib/python3.7/subprocess.py", line 363, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--config', 'Release', '--', '-j4']' returned non-zero exit status 2.
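
A log of this length is easier to triage when the interesting lines are pulled out first. The sketch below scans a build log for a handful of markers that appear in the output above. The marker names and the `scan_build_log` function are my own choices, not part of CMake or spconv, and the list of patterns is surely incomplete.

```python
import re

# Patterns taken from lines that occur in the log above.
MARKERS = {
    "gpu_autodetect_failed": re.compile(r"Automatic GPU detection failed"),
    "compiler_error": re.compile(r"\berror:"),
    "segfault": re.compile(r"Segmentation fault"),
    "make_failure": re.compile(r"^make(\[\d+\])?: \*\*\*"),
}


def scan_build_log(text):
    """Return marker name -> list of 1-based line numbers where it occurs."""
    findings = {name: [] for name in MARKERS}
    for number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in MARKERS.items():
            if pattern.search(line):
                findings[name].append(number)
    return findings


# Sample lines from the log above (the compiler error line is shortened).
sample_log = "\n".join([
    "-- Automatic GPU detection failed. Building for common architectures.",
    "src/spconv/all.cc:20:91: error: no matching function for call to RegisterOperators",
    "Segmentation fault",
    "make[2]: *** [src/spconv/CMakeFiles/spconv.dir/indice.cu.o] Error 139",
    "make: *** [all] Error 2",
])

for name, lines in scan_build_log(sample_log).items():
    print(name, lines)
```

In the log above, this kind of scan would surface two separate problems: the `RegisterOperators` compile error in `all.cc` and the segmentation fault while compiling `indice.cu`.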

@BalajiGobinathan

BalajiGobinathan commented May 19, 2020

Hi,
I tried to install conda using the following instructions.
By the way, I am using Google Colab.

!wget -c https://repo.continuum.io/archive/Anaconda3-5.1.0-Linux-x86_64.sh
!chmod +x Anaconda3-5.1.0-Linux-x86_64.sh
!bash ./Anaconda3-5.1.0-Linux-x86_64.sh -b -f -p /usr/local
import sys
sys.path.append('/usr/local/lib/python3.6/site-packages')

Then I ran the first command you gave, and it throws the following error:

ERROR conda.core.link:_execute(481): An error occurred while installing package 'conda-forge::cudatoolkit-dev-10.1.243-h516909a_3'

Could you please help?

Could you also share the build log of your successful spconv 1.1 build, so that we can check whether the following line reports that the GPU was detected?

-- Automatic GPU detection failed. Building for common architectures.

Thanks
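
To make the "Building for common architectures" message more concrete: in the v1.0 build log earlier in this thread, the fallback architecture list is turned into a set of `-gencode` flags for nvcc. The sketch below reproduces that mapping for the list shown in the log. It is an illustration written for this thread, not the CMake code that performs the step, and `gencode_flags` is a made-up name.

```python
def gencode_flags(arch_list):
    """Turn an architecture list such as "6.1;7.0+PTX" into nvcc -gencode flags.

    Every entry yields a binary (sm_XY) target; entries marked "+PTX"
    additionally yield a PTX (compute_XY) target, emitted after the binaries.
    """
    sm, ptx = [], []
    for entry in arch_list.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        is_ptx = entry.endswith("+PTX")
        number = entry[:-len("+PTX")] if is_ptx else entry
        digits = number.replace(".", "")
        if digits not in sm:
            sm.append(digits)
        if is_ptx and digits not in ptx:
            ptx.append(digits)

    flags = []
    for digits in sm:
        flags += ["-gencode", f"arch=compute_{digits},code=sm_{digits}"]
    for digits in ptx:
        flags += ["-gencode", f"arch=compute_{digits},code=compute_{digits}"]
    return flags


# The architecture list reported in the build log.
archs = "3.5;5.0;5.2;6.0;6.1;7.0;7.0+PTX;7.5;7.5+PTX"
print(";".join(gencode_flags(archs)))
```

So a failed auto-detection does not by itself mean the build ignores the GPU; it means the build targets a generic list instead of the one architecture of the local card.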

@MartinHahner
Contributor Author

MartinHahner commented May 19, 2020

I think your issue is related to Google Colab, and I don't think this repository is really meant to be used on Google Colab, but I don't have much experience with that.

Did you make sure that you are connected to a GPU under
Runtime > Change runtime type > GPU
and then hit "connect" at the top right of your IPython notebook?
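
Whether the runtime actually exposes a GPU can also be checked from a notebook cell. This is a small sketch that assumes the NVIDIA driver tools are installed and relies on `nvidia-smi -L`, which prints one line per GPU. The function name `list_visible_gpus` is invented for this example.

```python
import shutil
import subprocess


def list_visible_gpus():
    """Return the GPU lines printed by `nvidia-smi -L`, or [] if none are visible."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []
    result = subprocess.run(
        [exe, "-L"],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,
    )
    if result.returncode != 0:
        return []
    return [
        line.strip()
        for line in result.stdout.splitlines()
        if line.strip().startswith("GPU ")
    ]


gpus = list_visible_gpus()
if gpus:
    print("\n".join(gpus))
else:
    print("No GPU visible; check Runtime > Change runtime type > GPU")
```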

@fregu856

@MartinHahner88 I got exactly the same error when trying to use spconv 1.1 (in a different project that originally used spconv 1.0).

I also had trouble installing spconv 1.0, but switching to pytorch 1.1.0 solved that issue for me.

@MartinHahner
Contributor Author

Unfortunately, downgrading to pytorch 1.1.0 did not resolve the issue for me;
I then ran into a bunch of other CUDA-related issues.

Instead, I was able to build spconv v1.0 with a fairly up-to-date conda environment, namely:

  • python 3.7.6
  • pytorch 1.4.0
  • CUDA 10.1.243
  • cudnn 7.6.5

For more details, see here.

Now, after downgrading to spconv v1.0, I can train PointPillars and SECOND successfully,
but PartA^2Net still crashes with:

Traceback (most recent call last):
  File "train.py", line 215, in <module>
    main()
  File "train.py", line 210, in main
    max_ckpt_save_num=arguments.max_ckpt_save_num)
  File "/scratch_net/hox/mhahner/repositories/PCDet/tools/train_utils/train_utils.py", line 80, in train_model
    leave_pbar=(cur_epoch + 1 == total_epochs)
  File "/scratch_net/hox/mhahner/repositories/PCDet/tools/train_utils/train_utils.py", line 36, in train_one_epoch
    loss, tb_dict, disp_dict = model_func(model, batch)
  File "/scratch_net/hox/mhahner/repositories/PCDet/pcdet/models/__init__.py", line 25, in model_func
    ret_dict, tb_dict, disp_dict = model(input_dict)
  File "/home/mhahner/scratch/apps/anaconda3/envs/spconv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/scratch_net/hox/mhahner/repositories/PCDet/pcdet/models/detectors/PartA2_net.py", line 112, in forward
    batch_size, voxel_centers, coords, rpn_ret_dict, input_dict
  File "/scratch_net/hox/mhahner/repositories/PCDet/pcdet/models/detectors/PartA2_net.py", line 98, in forward_rcnn
    rcnn_ret_dict = self.rcnn_net.forward(rcnn_input_dict)
  File "/scratch_net/hox/mhahner/repositories/PCDet/pcdet/models/rcnn/partA2_rcnn_net.py", line 323, in forward
    targets_dict = self.assign_targets(batch_size, rcnn_dict)
  File "/scratch_net/hox/mhahner/repositories/PCDet/pcdet/models/rcnn/partA2_rcnn_net.py", line 27, in assign_targets
    targets_dict = proposal_target_layer(rcnn_dict, roi_sampler_cfg=self.rcnn_target_config)
  File "/scratch_net/hox/mhahner/repositories/PCDet/pcdet/models/model_utils/proposal_target_layer.py", line 14, in proposal_target_layer
    sample_rois_for_rcnn(rois, gt_boxes, roi_raw_scores, roi_labels, roi_sampler_cfg)
  File "/scratch_net/hox/mhahner/repositories/PCDet/pcdet/models/model_utils/proposal_target_layer.py", line 82, in sample_rois_for_rcnn
    cur_gt[:, 0:7], cur_gt_labels)
  File "/scratch_net/hox/mhahner/repositories/PCDet/pcdet/models/model_utils/proposal_target_layer.py", line 183, in get_maxiou3d_with_same_class
    iou3d = iou3d_nms_utils.boxes_iou3d_gpu(cur_roi, cur_gt) # (M, N)
  File "/scratch_net/hox/mhahner/repositories/PCDet/pcdet/ops/iou3d_nms/iou3d_nms_utils.py", line 47, in boxes_iou3d_gpu
    overlaps_h = torch.clamp(min_of_max - max_of_min, min=0)
RuntimeError: CUDA error: no kernel image is available for execution on the device
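
The "no kernel image is available" error generally means that the binary launching the kernel contains no code the GPU can run. As a sketch of the usual CUDA compatibility rule (a binary built for `sm_X.y` runs on devices `X.z` with `z >= y`, while embedded PTX can be JIT-compiled for any equal or newer capability), the check below tests a device against an architecture list. The function names are hypothetical, and the second list is an invented example of a binary built without Kepler support, not the actual list of any PyTorch or spconv release.

```python
def parse_arch_list(arch_list):
    """Split "6.1;7.0+PTX" into (binary targets, PTX targets) as (major, minor) sets."""
    sm, ptx = set(), set()
    for entry in arch_list.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        is_ptx = entry.endswith("+PTX")
        number = entry[:-len("+PTX")] if is_ptx else entry
        major, minor = (int(part) for part in number.split("."))
        sm.add((major, minor))
        if is_ptx:
            ptx.add((major, minor))
    return sm, ptx


def has_kernel_image(capability, arch_list):
    """True if a device with this (major, minor) capability can run the build."""
    sm, ptx = parse_arch_list(arch_list)
    major, minor = capability
    # Binary code: same major version, built for an equal or lower minor version.
    binary_ok = any(m == major and n <= minor for m, n in sm)
    # PTX: can be JIT-compiled for any device of equal or higher capability.
    ptx_ok = any((m, n) <= (major, minor) for m, n in ptx)
    return binary_ok or ptx_ok


# Architecture list from the spconv v1.0 build log earlier in this thread.
log_archs = "3.5;5.0;5.2;6.0;6.1;7.0;7.0+PTX;7.5;7.5+PTX"
# Hypothetical list for a binary that was built without Kepler targets.
no_kepler_archs = "5.0;6.0;6.1;7.0;7.5"

tesla_k40 = (3, 5)          # compute capability 3.5
titan_x_pascal = (6, 1)     # compute capability 6.1

print(has_kernel_image(tesla_k40, log_archs))
print(has_kernel_image(tesla_k40, no_kepler_archs))
print(has_kernel_image(titan_x_pascal, no_kepler_archs))
```

Note that the traceback above fails inside `torch.clamp`, so the binary missing a suitable kernel may be PyTorch itself rather than spconv or the PCDet ops.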

My conda environment looks like this:

# packages in environment at /home/mhahner/scratch/apps/anaconda3/envs/spconv:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                      1_llvm    conda-forge
beautifulsoup4            4.9.1                    pypi_0    pypi
bzip2                     1.0.8                h516909a_2    conda-forge
ca-certificates           2020.1.1                      0
cachetools                4.1.0                    pypi_0    pypi
certifi                   2020.4.5.1               py37_0
chardet                   3.0.4                    pypi_0    pypi
cmake                     3.17.0               h28c56e5_0    conda-forge
coloredlogs               14.0                     pypi_0    pypi
cudatoolkit               10.1.243             h6bb024c_0
cudatoolkit-dev           10.1.243             h516909a_3    conda-forge
cudnn                     7.6.5                cuda10.1_0
cycler                    0.10.0                   pypi_0    pypi
decorator                 4.4.2                    pypi_0    pypi
easydict                  1.9                      pypi_0    pypi
expat                     2.2.9                he1b5a44_2    conda-forge
google                    2.0.3                    pypi_0    pypi
google-auth               1.15.0                   pypi_0    pypi
google-auth-oauthlib      0.4.1                    pypi_0    pypi
gspread                   3.6.0                    pypi_0    pypi
httplib2                  0.18.0                   pypi_0    pypi
humanfriendly             8.2                      pypi_0    pypi
idna                      2.9                      pypi_0    pypi
imagecodecs               2020.2.18                pypi_0    pypi
imageio                   2.8.0                    pypi_0    pypi
kiwisolver                1.2.0                    pypi_0    pypi
krb5                      1.17.1               h2fd8d38_0    conda-forge
ld_impl_linux-64          2.34                 h53a641e_0    conda-forge
libblas                   3.8.0               16_openblas    conda-forge
libcblas                  3.8.0               16_openblas    conda-forge
libcurl                   7.69.1               hf7181ac_0    conda-forge
libedit                   3.1.20170329      hf8c457e_1001    conda-forge
libffi                    3.2.1             he1b5a44_1007    conda-forge
libgcc-ng                 9.2.0                h24d8f2e_2    conda-forge
libgfortran-ng            7.5.0                hdf63c60_6    conda-forge
liblapack                 3.8.0               16_openblas    conda-forge
libopenblas               0.3.9                h5ec1e0e_0    conda-forge
libssh2                   1.9.0                hab1572f_2    conda-forge
libstdcxx-ng              9.2.0                hdf63c60_2    conda-forge
libuv                     1.34.0               h516909a_0    conda-forge
llvm-openmp               10.0.0               hc9558a2_0    conda-forge
llvmlite                  0.32.1                   pypi_0    pypi
matplotlib                3.2.1                    pypi_0    pypi
mkl                       2020.1                      219    conda-forge
ncurses                   6.1               hf484d3e_1002    conda-forge
networkx                  2.4                      pypi_0    pypi
ninja                     1.10.0               hc9558a2_0    conda-forge
numba                     0.49.1                   pypi_0    pypi
numpy                     1.18.4           py37h8960a57_0    conda-forge
oauth2client              4.1.3                    pypi_0    pypi
oauthlib                  3.1.0                    pypi_0    pypi
openssl                   1.1.1g               h7b6447c_0
pillow                    7.1.2                    pypi_0    pypi
pip                       20.1.1             pyh9f0ad1d_0    conda-forge
protobuf                  3.12.0                   pypi_0    pypi
pyasn1                    0.4.8                    pypi_0    pypi
pyasn1-modules            0.2.8                    pypi_0    pypi
pyparsing                 2.4.7                    pypi_0    pypi
python                    3.7.6           h8356626_5_cpython    conda-forge
python-dateutil           2.8.1                    pypi_0    pypi
python_abi                3.7                     1_cp37m    conda-forge
pytorch                   1.4.0           py3.7_cuda10.1.243_cudnn7.6.3_0    pytorch
pywavelets                1.1.1                    pypi_0    pypi
pyyaml                    5.3.1                    pypi_0    pypi
readline                  8.0                  hf8c457e_0    conda-forge
requests                  2.23.0                   pypi_0    pypi
requests-oauthlib         1.3.0                    pypi_0    pypi
rhash                     1.3.6             h14c3975_1001    conda-forge
rsa                       4.0                      pypi_0    pypi
scikit-image              0.17.2                   pypi_0    pypi
scipy                     1.4.1                    pypi_0    pypi
setuptools                46.4.0           py37hc8dfbb8_0    conda-forge
six                       1.14.0                   pypi_0    pypi
soupsieve                 2.0.1                    pypi_0    pypi
spconv                    1.0                      pypi_0    pypi
sqlite                    3.30.1               hcee41ef_0    conda-forge
tensorboardx              2.0                      pypi_0    pypi
tifffile                  2020.5.11                pypi_0    pypi
tk                        8.6.10               hed695b0_0    conda-forge
tqdm                      4.46.0                   pypi_0    pypi
urllib3                   1.25.9                   pypi_0    pypi
wheel                     0.34.2                     py_1    conda-forge
xz                        5.2.5                h516909a_0    conda-forge
zlib                      1.2.11            h516909a_1006    conda-forge

@MartinHahner
Contributor Author

I also had trouble installing spconv 1.0, but switching to pytorch 1.1.0 solved that issue for me.

@fregu856: Which version of gcc were you using to build spconv? (gcc --version)
Was it version 5.4 or another one?

@fregu856

fregu856 commented Jun 3, 2020

gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609

@MartinHahner
Contributor Author

Finally SOLVED!

The final issue was that we tried to build spconv on a Tesla K40 GPU,
which has a Kepler architecture and is seemingly too old to build spconv.

The reason why we always tried to build spconv on a Tesla K40 is that in our lab, it is hard to get a GPU on our GPU cluster interactively. (Usually, you can only submit jobs via qsub/SLURM.)
So we always tried to build spconv either on our local Linux clients or on old nodes which only have Tesla K40s. Then, out of despair, we tried to build spconv on a Titan X (Pascal) and we finally found a combination of requirements that worked.
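To see which architecture a build node actually has before starting a long spconv build, a small check like the following may help. It is only a sketch: `arch_name` and `describe_local_gpu` are hypothetical helper names, the check assumes PyTorch is already installed in the environment, and it only reports the architecture. The thread does not state which minimum compute capability spconv needs, so the script does not try to decide that.

```python
# Hypothetical helper: map a CUDA compute capability to its architecture family.
# Tesla K40 reports 3.5 (Kepler); Titan X (Pascal) reports 6.1.
ARCH_BY_MAJOR = {3: "Kepler", 5: "Maxwell", 6: "Pascal"}


def arch_name(major, minor):
    """Return the architecture family for a compute capability (major, minor)."""
    if major == 7:
        # 7.0 and 7.2 are Volta, 7.5 is Turing.
        return "Turing" if minor >= 5 else "Volta"
    return ARCH_BY_MAJOR.get(major, "unknown")


def describe_local_gpu():
    """Return (name, capability, architecture) for GPU 0, or None if no GPU is usable."""
    try:
        import torch  # assumption: PyTorch is installed in the active environment
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    major, minor = torch.cuda.get_device_capability(0)
    return torch.cuda.get_device_name(0), (major, minor), arch_name(major, minor)


if __name__ == "__main__":
    print(describe_local_gpu())
```

If this prints `None` or a Kepler device, the node matches the situation described above, and building on a newer GPU may be worth trying.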

So here is our solution:

conda create --name PCDet python=3.6 pytorch=1.1 cudatoolkit=9.2 cudatoolkit-dev=9.2 \
cmake --channel pytorch --channel=conda-forge
conda activate PCDet
conda install cudnn
conda install boost
git clone https://github.com/traveller59/spconv spconv_8da6f96 --recursive
cd spconv_8da6f96
git checkout 8da6f967fb9a054d8870c3515b1b44eca2103634

If necessary:
Download and extract
0001-Allow-to-specifiy-CUDA_ROOT-directory-and-pick-corre.patch.zip
and patch spconv via:
git am <PATH_TO_EXTRACTED_FILE>/0001-Allow-to-specifiy-CUDA_ROOT-directory-and-pick-corre.patch

CUDA_ROOT=<PATH_TO_YOUR_CONDA_INSTALLATION>/conda_envs/PCDet python setup.py bdist_wheel
cd dist/
pip install *

Test spconv via:
python -c 'import spconv'
(should just return and not raise any errors)

cd ../..
git clone https://github.com/sshaoshuai/PCDet.git
cd PCDet/
pip install -r requirements.txt
CUDA_ROOT=<PATH_TO_YOUR_CONDA_INSTALLATION>/conda_envs/PCDet python setup.py develop

Done!

I hope these instructions help someone else who struggles to build spconv as well.
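The `python -c 'import spconv'` test in the instructions above can also be written as a small script that prints the actual error when the import fails. This is a sketch, not part of the original instructions; `try_import` is a hypothetical helper name.

```python
import importlib


def try_import(module_name):
    """Return (True, None) if the module imports, else (False, error text)."""
    try:
        importlib.import_module(module_name)
        return True, None
    except Exception as exc:
        # A broken native build can raise more than ImportError,
        # so every exception type is reported here.
        return False, "{}: {}".format(type(exc).__name__, exc)


if __name__ == "__main__":
    # torch is checked first because spconv depends on it.
    for name in ("torch", "spconv"):
        ok, err = try_import(name)
        status = "OK" if ok else "FAILED ({})".format(err)
        print(name + ": " + status)
```

Running it inside the activated PCDet environment should print one line per module.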

@Nyte-BK201

Damn, I am stuck in the same place, and now this with that old K40m cluster...

@mikecheninoulu

mikecheninoulu commented Sep 16, 2021

I got it working by recompiling spconv with a GPU available (I had not requested a GPU at first, so it failed).
Specifically, I'm using a national server platform where I have to request the GPUs explicitly every time; otherwise, spconv is compiled for CPU by default, and the code then cannot recognize the device when you start training on GPUs.

In case anyone is in the same situation as me: make sure a GPU is available when you compile spconv.

Don't forget to clean the cache of the previous build with 'python setup.py clean' and to run 'pip uninstall spconv' before recompiling.
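A quick pre-build check along these lines might catch the case where the build node has no GPU allocated. It is a heuristic sketch under stated assumptions: `gpu_build_warnings` is a hypothetical helper, and it only looks for `nvidia-smi` on the PATH and at the `CUDA_VISIBLE_DEVICES` variable. It does not inspect how spconv's build actually detects CUDA.

```python
import os
import shutil


def gpu_build_warnings(env, nvidia_smi_path):
    """Collect reasons why a build on this node might not see a GPU.

    env is a mapping of environment variables; nvidia_smi_path is the
    result of shutil.which("nvidia-smi"), or None if it was not found.
    """
    warnings = []
    if nvidia_smi_path is None:
        warnings.append("nvidia-smi not found on PATH")
    visible = env.get("CUDA_VISIBLE_DEVICES")
    # An empty value or -1 hides all GPUs from CUDA applications.
    if visible is not None and visible.strip() in ("", "-1"):
        warnings.append("CUDA_VISIBLE_DEVICES hides all GPUs")
    return warnings


if __name__ == "__main__":
    problems = gpu_build_warnings(os.environ, shutil.which("nvidia-smi"))
    if problems:
        print("This node may not have a usable GPU:")
        for problem in problems:
            print("  - " + problem)
    else:
        print("No obvious GPU visibility problems found.")
```

An empty result is not proof that the build will pick up the GPU; it only rules out the two most obvious causes.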
