RuntimeError: src/spconv/indice.cu 120 cuda execution failed with error 98 #20

vignesh628 · 2021-03-12T18:26:00Z

ERROR WHILE RUNNING THE INFERENCE

python /home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/pytorch/train.py evaluate --config_path=/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/configs/car.fhd.config --model_dir=/home/developer/deep_learning/deepti_ubuntu20/CLOCs/CLOCs_SecCas_pretrained --measure_time=True --batch_size=1

Predict_test: False
sparse_shape: [ 41 1600 1408]
num_class is : 1
load existing model
load existing model for fusion layer
Restoring parameters from /home/developer/deep_learning/deepti_ubuntu20/CLOCs/CLOCs_SecCas_pretrained/fusion_layer-11136.tckpt
remain number of infos: 3769
Generate output labels...
Traceback (most recent call last):
File "/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/pytorch/train.py", line 920, in <module>
fire.Fire()
File "/opt/conda/lib/python3.6/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/opt/conda/lib/python3.6/site-packages/fire/core.py", line 471, in _Fire
target=component.__name__)
File "/opt/conda/lib/python3.6/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/pytorch/train.py", line 658, in evaluate
for example in iter(eval_dataloader):
File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
data = self._next_data()
File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 557, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/pytorch/builder/input_reader_builder.py", line 18, in __getitem__
return self._dataset[idx]
File "/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/data/dataset.py", line 70, in __getitem__
prep_func=self._prep_func)
File "/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/data/preprocess.py", line 313, in _read_and_prep_v9
count=-1).reshape([-1, num_point_features])
FileNotFoundError: [Errno 2] No such file or directory: '/home/developer/deep_learning/Projects/KITTI_DATASET_ROOT/KITTI_DATASET_ROOT/training/velodyne_reduced/000001.bin'
developer@f0f2e49fc4a2:/deep_learning/deepti_ubuntu20/CLOCs/second$
developer@f0f2e49fc4a2:/deep_learning/deepti_ubuntu20/CLOCs/second$ python /home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/pytorch/train.py evaluate --config_path=/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/configs/car.fhd.config --model_dir=/home/developer/deep_learning/deepti_ubuntu20/CLOCs/CLOCs_SecCas_pretrained --measure_time=True --batch_size=1
Predict_test: False
sparse_shape: [ 41 1600 1408]
num_class is : 1
load existing model
load existing model for fusion layer
Restoring parameters from /home/developer/deep_learning/deepti_ubuntu20/CLOCs/CLOCs_SecCas_pretrained/fusion_layer-11136.tckpt
remain number of infos: 3769
Generate output labels...
/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/core/geometry.py:146: NumbaWarning:
Compilation is falling back to object mode WITH looplifting enabled because Function "points_in_convex_polygon_jit" failed type inference due to: No implementation of function Function() found for signature:

getitem(array(float32, 3d, C), Tuple(slice<a:b>, list(int64)<iv=None>, slice<a:b>))

There are 22 candidate implementations:

Of which 20 did not match due to:
Overload of function 'getitem': File: : Line N/A.
With argument(s): '(array(float32, 3d, C), Tuple(slice<a:b>, list(int64)<iv=None>, slice<a:b>))':
No match.
Of which 2 did not match due to:
Overload in function 'GetItemBuffer.generic': File: numba/core/typing/arraydecl.py: Line 162.
With argument(s): '(array(float32, 3d, C), Tuple(slice<a:b>, list(int64)<iv=None>, slice<a:b>))':
Rejected as the implementation raised a specific error:
TypeError: unsupported array index type list(int64)<iv=None> in Tuple(slice<a:b>, list(int64)<iv=None>, slice<a:b>)
raised from /opt/conda/lib/python3.6/site-packages/numba/core/typing/arraydecl.py:69

During: typing of intrinsic-call at /home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/core/geometry.py (162)

File "core/geometry.py", line 162:
def points_in_convex_polygon_jit(points, polygon, clockwise=True):

vec1 = polygon - polygon[:, [num_points_of_polygon - 1] +
list(range(num_points_of_polygon - 1)), :]
^

@numba.jit
/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/core/geometry.py:146: NumbaWarning:
Compilation is falling back to object mode WITHOUT looplifting enabled because Function "points_in_convex_polygon_jit" failed type inference due to: Cannot determine Numba type of <class 'numba.core.dispatcher.LiftedLoop'>

File "core/geometry.py", line 170:
def points_in_convex_polygon_jit(points, polygon, clockwise=True):

cross = 0.0
for i in range(num_points):
^

@numba.jit
/opt/conda/lib/python3.6/site-packages/numba/core/object_mode_passes.py:152: NumbaWarning: Function "points_in_convex_polygon_jit" was compiled in object mode without forceobj=True, but has lifted loops.

File "core/geometry.py", line 157:
def points_in_convex_polygon_jit(points, polygon, clockwise=True):

# first convert polygon to directed lines
num_points_of_polygon = polygon.shape[1]
^

state.func_ir.loc))
/opt/conda/lib/python3.6/site-packages/numba/core/object_mode_passes.py:162: NumbaDeprecationWarning:
Fall-back from the nopython compilation path to the object mode compilation path has been detected, this is deprecated behaviour.

For more information visit https://numba.pydata.org/numba-doc/latest/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit

File "core/geometry.py", line 157:
def points_in_convex_polygon_jit(points, polygon, clockwise=True):

# first convert polygon to directed lines
num_points_of_polygon = polygon.shape[1]
^

state.func_ir.loc))
Traceback (most recent call last):
File "/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/pytorch/train.py", line 920, in <module>
fire.Fire()
File "/opt/conda/lib/python3.6/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/opt/conda/lib/python3.6/site-packages/fire/core.py", line 471, in _Fire
target=component.__name__)
File "/opt/conda/lib/python3.6/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/pytorch/train.py", line 671, in evaluate
model_cfg.lidar_input,global_set)
File "/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/pytorch/train.py", line 462, in predict_kitti_to_anno
all_3d_output_camera_dict, all_3d_output, top_predictions, fusion_input,torch_index = net(example,detection_2d_path)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/pytorch/models/voxelnet.py", line 304, in forward
voxel_features, coors, batch_size_dev)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/pytorch/models/middle.py", line 545, in forward
ret = self.middle_conv(ret)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/spconv/modules.py", line 123, in forward
input = module(input)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/spconv/conv.py", line 151, in forward
self.stride, self.padding, self.dilation, self.output_padding, self.subm, self.transposed, grid=input.grid)
File "/opt/conda/lib/python3.6/site-packages/spconv/ops.py", line 89, in get_indice_pairs
stride, padding, dilation, out_padding, int(subm), int(transpose))
RuntimeError: /home/developer/deep_learning/deepti_ubuntu20/spconv-8da6f967fb9a054d8870c3515b1b44eca2103634/src/spconv/indice.cu 120
cuda execution failed with error 98

vignesh628 · 2021-03-12T18:26:37Z

I am using Ubuntu 20. CUDA is failing with error 98. Is there any solution for this? By the way, I am using spconv 1.0, PyTorch 1.8.0, and cuda:11.1.1-cudnn8.

pangsu0613 · 2021-03-12T18:41:37Z

Hello @vignesh628, most of the output you posted here is numba warnings, and you don't need to worry about them. The actual error in your first run is "FileNotFoundError: [Errno 2] No such file or directory: '/home/developer/deep_learning/Projects/KITTI_DATASET_ROOT/KITTI_DATASET_ROOT/training/velodyne_reduced/000001.bin'", which means that your KITTI dataset directory is not configured correctly, so please check that. BTW, you can add the following code at the beginning of train.py to ignore these numba warnings:
import warnings
warnings.simplefilter('ignore', category=NumbaDeprecationWarning)
warnings.simplefilter('ignore', category=NumbaPendingDeprecationWarning)
warnings.simplefilter('ignore', category=NumbaPerformanceWarning)
warnings.simplefilter('ignore', category=NumbaWarning)
warnings.simplefilter('ignore')
warnings.filterwarnings('ignore')
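
A quick way to confirm the dataset layout before launching the evaluation is to check that the directory named in the FileNotFoundError exists. The following is only a sketch: the helper name `missing_kitti_dirs` is hypothetical, and the only subdirectory it checks by default is `training/velodyne_reduced`, the one that appears in the traceback. Other directories the config may need are not covered.

```python
from pathlib import Path


def missing_kitti_dirs(root, subdirs=("training/velodyne_reduced",)):
    """Return the expected dataset directories that do not exist under root.

    `subdirs` defaults to the one directory named in the traceback;
    extend it if your config reads from other locations.
    """
    root = Path(root)
    return [str(root / sub) for sub in subdirs if not (root / sub).is_dir()]


# Example call, using the dataset root from the error message:
# print(missing_kitti_dirs(
#     "/home/developer/deep_learning/Projects/KITTI_DATASET_ROOT/KITTI_DATASET_ROOT"))
```

An empty list means the checked directories are present; any returned path is one the data loader would fail on.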

pangsu0613 · 2021-03-12T18:50:18Z

To ignore the warnings, I missed one line, which has to come before the filter calls:
from numba.core.errors import NumbaDeprecationWarning, NumbaPendingDeprecationWarning,NumbaPerformanceWarning,NumbaWarning

vignesh628 · 2021-03-13T03:37:01Z

Hello @pangsu0613, thanks for your reply. I have suppressed the numba warnings and provided the correct path. I am still facing the "cuda execution failed with error 98" issue.

python /home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/pytorch/train.py evaluate --config_path=/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/configs/car.fhd.config --model_dir=/home/developer/deep_learning/deepti_ubuntu20/CLOCs/CLOCs_SecCas_pretrained --measure_time=True --batch_size=1
Predict_test: False
sparse_shape: [ 41 1600 1408]
num_class is : 1
load existing model
load existing model for fusion layer
Restoring parameters from /home/developer/deep_learning/deepti_ubuntu20/CLOCs/CLOCs_SecCas_pretrained/fusion_layer-11136.tckpt
remain number of infos: 3769
Generate output labels...
Traceback (most recent call last):
File "/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/pytorch/train.py", line 928, in <module>
fire.Fire()
File "/opt/conda/lib/python3.6/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/opt/conda/lib/python3.6/site-packages/fire/core.py", line 471, in _Fire
target=component.__name__)
File "/opt/conda/lib/python3.6/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/pytorch/train.py", line 679, in evaluate
model_cfg.lidar_input,global_set)
File "/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/pytorch/train.py", line 470, in predict_kitti_to_anno
all_3d_output_camera_dict, all_3d_output, top_predictions, fusion_input,torch_index = net(example,detection_2d_path)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/pytorch/models/voxelnet.py", line 304, in forward
voxel_features, coors, batch_size_dev)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/pytorch/models/middle.py", line 545, in forward
ret = self.middle_conv(ret)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/spconv/modules.py", line 123, in forward
input = module(input)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/spconv/conv.py", line 151, in forward
self.stride, self.padding, self.dilation, self.output_padding, self.subm, self.transposed, grid=input.grid)
File "/opt/conda/lib/python3.6/site-packages/spconv/ops.py", line 89, in get_indice_pairs
stride, padding, dilation, out_padding, int(subm), int(transpose))
RuntimeError: /home/developer/deep_learning/deepti_ubuntu20/spconv-8da6f967fb9a054d8870c3515b1b44eca2103634/src/spconv/indice.cu 120
cuda execution failed with error 98

pangsu0613 · 2021-03-13T04:35:19Z

I have not run into or seen this type of error before. Maybe this will help: open-mmlab/OpenPCDet#442

vignesh628 · 2021-03-16T10:58:11Z

@pangsu0613, thanks for this. I ran it on Ubuntu 20 with a Pascal-architecture GPU, and everything is running fine.

pangsu0613 · 2021-03-16T14:42:47Z

That's awesome !

jayadeepk · 2022-01-02T06:01:35Z

I got it working on the Maxwell architecture using juimoisnono's method.
To compile CUDA files for a specific architecture with nvcc, the -gencode flag needs to be set appropriately. For example, for compute capability 5.0, the nvcc flag is -gencode arch=compute_50,code=sm_50.
For spconv (commit 8da6f96), this flag can be passed through -DCMAKE_CUDA_FLAGS in spconv's setup.py.

diff --git a/setup.py b/setup.py
index 1e68a29..71b2d95 100644
--- a/setup.py
+++ b/setup.py
@@ -45,7 +45,7 @@ class CMakeBuild(build_ext):
                       '-DCMAKE_PREFIX_PATH=' + LIBTORCH_ROOT,
                       '-DPYBIND11_PYTHON_VERSION={}'.format(PYTHON_VERSION),
                       '-DSPCONV_BuildTests=OFF',
-                      '-DCMAKE_CUDA_FLAGS="--expt-relaxed-constexpr"']
+                      '-DCMAKE_CUDA_FLAGS=--expt-relaxed-constexpr -gencode arch=compute_50,code=sm_50']
 
         cfg = 'Debug' if self.debug else 'Release'
         # cfg = 'Debug'
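
For GPUs other than Maxwell, the same flag can be derived from the device's compute capability. In recent CUDA toolkits, error 98 corresponds to `cudaErrorInvalidDeviceFunction`, which usually indicates that the compiled binary contains no code for the GPU in use, so matching the flag to the device is the relevant step. The sketch below is an assumption-laden helper, not part of spconv: `gencode_flag` and `detect_gencode_flag` are hypothetical names, and the detection relies on PyTorch's `torch.cuda.get_device_capability`.

```python
def gencode_flag(major, minor):
    """Build the nvcc -gencode flag for a compute capability such as (5, 0)."""
    cc = "{}{}".format(major, minor)
    return "-gencode arch=compute_{cc},code=sm_{cc}".format(cc=cc)


def detect_gencode_flag(device=0):
    """Return the flag for the given CUDA device, or None if it cannot be queried."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    major, minor = torch.cuda.get_device_capability(device)
    return gencode_flag(major, minor)


# Compute capability 5.0 reproduces the flag used in the diff above.
print(gencode_flag(5, 0))
```

The printed value can then be substituted into the `-DCMAKE_CUDA_FLAGS` entry before rebuilding spconv.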

pangsu0613 · 2022-01-15T01:31:26Z

Thank you very much for sharing your solution @jayadeepk !

pangsu0613 closed this as completed Mar 16, 2021
