RuntimeError: src/spconv/indice.cu 120 cuda execution failed with error 98 #20

Closed
vignesh628 opened this issue Mar 12, 2021 · 9 comments

@vignesh628

ERROR WHILE RUNNING THE INFERENCE

python /home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/pytorch/train.py evaluate --config_path=/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/configs/car.fhd.config --model_dir=/home/developer/deep_learning/deepti_ubuntu20/CLOCs/CLOCs_SecCas_pretrained --measure_time=True --batch_size=1

Predict_test: False
sparse_shape: [ 41 1600 1408]
num_class is : 1
load existing model
load existing model for fusion layer
Restoring parameters from /home/developer/deep_learning/deepti_ubuntu20/CLOCs/CLOCs_SecCas_pretrained/fusion_layer-11136.tckpt
remain number of infos: 3769
Generate output labels...
Traceback (most recent call last):
File "/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/pytorch/train.py", line 920, in <module>
fire.Fire()
File "/opt/conda/lib/python3.6/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/opt/conda/lib/python3.6/site-packages/fire/core.py", line 471, in _Fire
target=component.__name__)
File "/opt/conda/lib/python3.6/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/pytorch/train.py", line 658, in evaluate
for example in iter(eval_dataloader):
File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
data = self._next_data()
File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 557, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/pytorch/builder/input_reader_builder.py", line 18, in __getitem__
return self._dataset[idx]
File "/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/data/dataset.py", line 70, in __getitem__
prep_func=self._prep_func)
File "/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/data/preprocess.py", line 313, in _read_and_prep_v9
count=-1).reshape([-1, num_point_features])
FileNotFoundError: [Errno 2] No such file or directory: '/home/developer/deep_learning/Projects/KITTI_DATASET_ROOT/KITTI_DATASET_ROOT/training/velodyne_reduced/000001.bin'
developer@f0f2e49fc4a2:/deep_learning/deepti_ubuntu20/CLOCs/second$
developer@f0f2e49fc4a2:/deep_learning/deepti_ubuntu20/CLOCs/second$ python /home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/pytorch/train.py evaluate --config_path=/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/configs/car.fhd.config --model_dir=/home/developer/deep_learning/deepti_ubuntu20/CLOCs/CLOCs_SecCas_pretrained --measure_time=True --batch_size=1
Predict_test: False
sparse_shape: [ 41 1600 1408]
num_class is : 1
load existing model
load existing model for fusion layer
Restoring parameters from /home/developer/deep_learning/deepti_ubuntu20/CLOCs/CLOCs_SecCas_pretrained/fusion_layer-11136.tckpt
remain number of infos: 3769
Generate output labels...
/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/core/geometry.py:146: NumbaWarning:
Compilation is falling back to object mode WITH looplifting enabled because Function "points_in_convex_polygon_jit" failed type inference due to: No implementation of function Function(<built-in function getitem>) found for signature:

getitem(array(float32, 3d, C), Tuple(slice<a:b>, list(int64)<iv=None>, slice<a:b>))

There are 22 candidate implementations:

  • Of which 20 did not match due to:
    Overload of function 'getitem': File: <numerous>: Line N/A.
    With argument(s): '(array(float32, 3d, C), Tuple(slice<a:b>, list(int64)<iv=None>, slice<a:b>))':
    No match.
  • Of which 2 did not match due to:
    Overload in function 'GetItemBuffer.generic': File: numba/core/typing/arraydecl.py: Line 162.
    With argument(s): '(array(float32, 3d, C), Tuple(slice<a:b>, list(int64)<iv=None>, slice<a:b>))':
    Rejected as the implementation raised a specific error:
    TypeError: unsupported array index type list(int64)<iv=None> in Tuple(slice<a:b>, list(int64)<iv=None>, slice<a:b>)
    raised from /opt/conda/lib/python3.6/site-packages/numba/core/typing/arraydecl.py:69

During: typing of intrinsic-call at /home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/core/geometry.py (162)

File "core/geometry.py", line 162:
def points_in_convex_polygon_jit(points, polygon, clockwise=True):

vec1 = polygon - polygon[:, [num_points_of_polygon - 1] +
list(range(num_points_of_polygon - 1)), :]
^

@numba.jit
/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/core/geometry.py:146: NumbaWarning:
Compilation is falling back to object mode WITHOUT looplifting enabled because Function "points_in_convex_polygon_jit" failed type inference due to: Cannot determine Numba type of <class 'numba.core.dispatcher.LiftedLoop'>

File "core/geometry.py", line 170:
def points_in_convex_polygon_jit(points, polygon, clockwise=True):

cross = 0.0
for i in range(num_points):
^

@numba.jit
/opt/conda/lib/python3.6/site-packages/numba/core/object_mode_passes.py:152: NumbaWarning: Function "points_in_convex_polygon_jit" was compiled in object mode without forceobj=True, but has lifted loops.

File "core/geometry.py", line 157:
def points_in_convex_polygon_jit(points, polygon, clockwise=True):

# first convert polygon to directed lines
num_points_of_polygon = polygon.shape[1]
^

state.func_ir.loc))
/opt/conda/lib/python3.6/site-packages/numba/core/object_mode_passes.py:162: NumbaDeprecationWarning:
Fall-back from the nopython compilation path to the object mode compilation path has been detected, this is deprecated behaviour.

For more information visit https://numba.pydata.org/numba-doc/latest/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit

File "core/geometry.py", line 157:
def points_in_convex_polygon_jit(points, polygon, clockwise=True):

# first convert polygon to directed lines
num_points_of_polygon = polygon.shape[1]
^

state.func_ir.loc))
Traceback (most recent call last):
File "/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/pytorch/train.py", line 920, in <module>
fire.Fire()
File "/opt/conda/lib/python3.6/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/opt/conda/lib/python3.6/site-packages/fire/core.py", line 471, in _Fire
target=component.__name__)
File "/opt/conda/lib/python3.6/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/pytorch/train.py", line 671, in evaluate
model_cfg.lidar_input,global_set)
File "/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/pytorch/train.py", line 462, in predict_kitti_to_anno
all_3d_output_camera_dict, all_3d_output, top_predictions, fusion_input,torch_index = net(example,detection_2d_path)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/pytorch/models/voxelnet.py", line 304, in forward
voxel_features, coors, batch_size_dev)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/pytorch/models/middle.py", line 545, in forward
ret = self.middle_conv(ret)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/spconv/modules.py", line 123, in forward
input = module(input)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/spconv/conv.py", line 151, in forward
self.stride, self.padding, self.dilation, self.output_padding, self.subm, self.transposed, grid=input.grid)
File "/opt/conda/lib/python3.6/site-packages/spconv/ops.py", line 89, in get_indice_pairs
stride, padding, dilation, out_padding, int(subm), int(transpose))
RuntimeError: /home/developer/deep_learning/deepti_ubuntu20/spconv-8da6f967fb9a054d8870c3515b1b44eca2103634/src/spconv/indice.cu 120
cuda execution failed with error 98

@vignesh628
Author

I am using Ubuntu 20. CUDA is failing with error 98. Is there any solution for this? By the way, I am using spconv 1.0 and PyTorch 1.8.0.
cuda:11.1.1-cudnn8

@pangsu0613
Owner

Hello @vignesh628, most of the output you posted is Numba warnings, and you don't need to worry about them. The actual error is "FileNotFoundError: [Errno 2] No such file or directory: '/home/developer/deep_learning/Projects/KITTI_DATASET_ROOT/KITTI_DATASET_ROOT/training/velodyne_reduced/000001.bin'", which means that your KITTI dataset directory is not configured correctly; please check that. BTW, you could add the following code at the beginning of train.py to ignore these Numba warnings:
import warnings
warnings.simplefilter('ignore', category=NumbaDeprecationWarning)
warnings.simplefilter('ignore', category=NumbaPendingDeprecationWarning)
warnings.simplefilter('ignore', category=NumbaPerformanceWarning)
warnings.simplefilter('ignore', category=NumbaWarning)
warnings.simplefilter('ignore')
warnings.filterwarnings('ignore')
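
One way to check the dataset directory before rerunning the evaluation is to confirm that the point-cloud files exist where the config points. This is only a sketch: the helper name `missing_lidar_files` is hypothetical, and the `training/velodyne_reduced` layout with zero-padded 6-digit file names is assumed from the path in the error message, not taken from the CLOCs code.

```python
from pathlib import Path


def missing_lidar_files(kitti_root, frame_ids, subdir="training/velodyne_reduced"):
    """Return the expected .bin paths that do not exist under kitti_root.

    Hypothetical helper: the directory layout and the zero-padded
    6-digit file names are assumed from the path in the error message.
    """
    lidar_dir = Path(kitti_root) / subdir
    expected = [lidar_dir / f"{int(frame_id):06d}.bin" for frame_id in frame_ids]
    return [path for path in expected if not path.is_file()]


if __name__ == "__main__":
    # Placeholder root; replace it with the dataset root used in your config.
    root = "/path/to/KITTI_DATASET_ROOT"
    for path in missing_lidar_files(root, frame_ids=[1, 2, 3]):
        print("missing:", path)
```

An empty result for a handful of frame ids suggests the root is set correctly; a doubled directory name, as in the error above, would show up as every file missing.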

@pangsu0613
Owner

For ignoring the warnings, I missed one line:
from numba.core.errors import NumbaDeprecationWarning, NumbaPendingDeprecationWarning, NumbaPerformanceWarning, NumbaWarning
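
Note that the import has to run before the filters that reference the warning classes. A minimal sketch of the combined snippet, assuming a Numba version that provides `numba.core.errors` (the tracebacks in this thread show that package layout); the function name `silence_numba_warnings` is hypothetical, and the `ImportError` fallback is there only so the snippet also runs where Numba is not installed:

```python
import warnings


def silence_numba_warnings():
    """Install 'ignore' filters for the Numba warning categories.

    Returns the list of categories that were filtered (empty if Numba
    cannot be imported).
    """
    try:
        # The import must come before any filter that names these classes.
        from numba.core.errors import (
            NumbaDeprecationWarning,
            NumbaPendingDeprecationWarning,
            NumbaPerformanceWarning,
            NumbaWarning,
        )
    except ImportError:
        return []

    categories = [
        NumbaDeprecationWarning,
        NumbaPendingDeprecationWarning,
        NumbaPerformanceWarning,
        NumbaWarning,
    ]
    for category in categories:
        warnings.simplefilter("ignore", category=category)
    return categories


if __name__ == "__main__":
    silenced = silence_numba_warnings()
    print("filtered categories:", [c.__name__ for c in silenced])
```

As far as I know, the more specific classes derive from `NumbaWarning`, so filtering that one class is probably enough. The blanket `warnings.simplefilter('ignore')` also hides warnings from every other library, so it may be better left out.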

@vignesh628
Author

Hello @pangsu0613, thanks for your reply. I have suppressed the Numba warnings and provided the correct path. I am still facing the "cuda execution failed with error 98" issue.

python /home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/pytorch/train.py evaluate --config_path=/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/configs/car.fhd.config --model_dir=/home/developer/deep_learning/deepti_ubuntu20/CLOCs/CLOCs_SecCas_pretrained --measure_time=True --batch_size=1
Predict_test: False
sparse_shape: [ 41 1600 1408]
num_class is : 1
load existing model
load existing model for fusion layer
Restoring parameters from /home/developer/deep_learning/deepti_ubuntu20/CLOCs/CLOCs_SecCas_pretrained/fusion_layer-11136.tckpt
remain number of infos: 3769
Generate output labels...
Traceback (most recent call last):
File "/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/pytorch/train.py", line 928, in <module>
fire.Fire()
File "/opt/conda/lib/python3.6/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/opt/conda/lib/python3.6/site-packages/fire/core.py", line 471, in _Fire
target=component.__name__)
File "/opt/conda/lib/python3.6/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/pytorch/train.py", line 679, in evaluate
model_cfg.lidar_input,global_set)
File "/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/pytorch/train.py", line 470, in predict_kitti_to_anno
all_3d_output_camera_dict, all_3d_output, top_predictions, fusion_input,torch_index = net(example,detection_2d_path)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/pytorch/models/voxelnet.py", line 304, in forward
voxel_features, coors, batch_size_dev)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/pytorch/models/middle.py", line 545, in forward
ret = self.middle_conv(ret)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/spconv/modules.py", line 123, in forward
input = module(input)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/spconv/conv.py", line 151, in forward
self.stride, self.padding, self.dilation, self.output_padding, self.subm, self.transposed, grid=input.grid)
File "/opt/conda/lib/python3.6/site-packages/spconv/ops.py", line 89, in get_indice_pairs
stride, padding, dilation, out_padding, int(subm), int(transpose))
RuntimeError: /home/developer/deep_learning/deepti_ubuntu20/spconv-8da6f967fb9a054d8870c3515b1b44eca2103634/src/spconv/indice.cu 120
cuda execution failed with error 98

@pangsu0613
Owner

I have not encountered this type of error before; maybe this will help: open-mmlab/OpenPCDet#442

@vignesh628
Author

@pangsu0613, thanks for this. I ran it on Ubuntu 20 with a Pascal-architecture GPU, and everything is running fine.

@pangsu0613
Owner

That's awesome !

@jayadeepk

I got it working on the Maxwell architecture using juimoisnono's method.
To compile CUDA files for a specific architecture with nvcc, the -gencode flag needs to be specified appropriately. For example, for compute capability 5.0, the nvcc flag is -gencode arch=compute_50,code=sm_50.
For spconv (commit 8da6f96), this flag can be specified through -DCMAKE_CUDA_FLAGS in spconv's setup.py:

diff --git a/setup.py b/setup.py
index 1e68a29..71b2d95 100644
--- a/setup.py
+++ b/setup.py
@@ -45,7 +45,7 @@ class CMakeBuild(build_ext):
                       '-DCMAKE_PREFIX_PATH=' + LIBTORCH_ROOT,
                       '-DPYBIND11_PYTHON_VERSION={}'.format(PYTHON_VERSION),
                       '-DSPCONV_BuildTests=OFF',
-                      '-DCMAKE_CUDA_FLAGS="--expt-relaxed-constexpr"']
+                      '-DCMAKE_CUDA_FLAGS=--expt-relaxed-constexpr -gencode arch=compute_50,code=sm_50']
 
         cfg = 'Debug' if self.debug else 'Release'
         # cfg = 'Debug'
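
For other GPUs, the value to put in the flag can be derived from the device's compute capability. With CUDA 11, runtime error 98 should correspond to `cudaErrorInvalidDeviceFunction`, which typically means that no kernel was built for the GPU in use, so matching the flag to the device is the relevant check. The sketch below is an illustration, not part of spconv: `gencode_flag` and `detect_gencode_flag` are hypothetical helpers, and the lookup assumes PyTorch is installed and uses `torch.cuda.get_device_capability()`, which returns a `(major, minor)` tuple.

```python
def gencode_flag(major, minor):
    """Build the nvcc -gencode flag for one compute capability, e.g. (5, 0)."""
    cc = f"{major}{minor}"
    return f"-gencode arch=compute_{cc},code=sm_{cc}"


def detect_gencode_flag(device_index=0):
    """Return the flag for the given GPU, or None if PyTorch/CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    major, minor = torch.cuda.get_device_capability(device_index)
    return gencode_flag(major, minor)


if __name__ == "__main__":
    print(gencode_flag(5, 0))  # -gencode arch=compute_50,code=sm_50
    print(detect_gencode_flag())
```

The printed string can then replace the `compute_50`/`sm_50` pair in the setup.py change above before rebuilding spconv.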

@pangsu0613
Owner

Thank you very much for sharing your solution @jayadeepk !
