RuntimeError: src/spconv/indice.cu 120 cuda execution failed with error 98 #20
I am using Ubuntu 20, and CUDA is failing with error 98. Is there any solution for this? By the way, I am using spconv 1.0 and PyTorch 1.8.0.
Hello @vignesh628, most of the output you showed here is numba warnings, which you don't need to worry about. The actual error is "FileNotFoundError: [Errno 2] No such file or directory: '/home/developer/deep_learning/Projects/KITTI_DATASET_ROOT/KITTI_DATASET_ROOT/training/velodyne_reduced/000001.bin'", which means that your KITTI dataset directory is not configured correctly; please check that. BTW, you can add the following code at the beginning of train.py to ignore these numba warnings: import warnings
To ignore the warnings, I missed one line: from numba.core.errors import NumbaDeprecationWarning, NumbaPendingDeprecationWarning, NumbaPerformanceWarning, NumbaWarning
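Only the import lines of the suggested snippet survive in this thread, so the following is a minimal sketch of how the suppression could look rather than the exact code that was posted. The helper name `silence_numba_warnings` and the use of `warnings.simplefilter` are assumptions; the four warning classes are the ones named above.

```python
import warnings


def silence_numba_warnings():
    """Ignore numba's warning categories and return the ones silenced.

    Hypothetical helper: returns an empty list when numba (or one of the
    warning classes) is not importable, so it is safe to call anywhere.
    """
    try:
        from numba.core.errors import (
            NumbaDeprecationWarning,
            NumbaPendingDeprecationWarning,
            NumbaPerformanceWarning,
            NumbaWarning,
        )
    except ImportError:
        return []

    categories = [
        NumbaDeprecationWarning,
        NumbaPendingDeprecationWarning,
        NumbaPerformanceWarning,
        NumbaWarning,
    ]
    for category in categories:
        # Install an "ignore" filter for each numba warning category.
        warnings.simplefilter("ignore", category=category)
    return categories


# Call once, before any numba-jitted function is compiled.
silence_numba_warnings()
```

This only hides the messages; the jitted functions still fall back to object mode exactly as before.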
Hello @pangsu0613, thanks for your reply. I have suppressed the numba warnings and provided the correct path, but I am still facing the "cuda execution failed with error 98" issue. python /home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/pytorch/train.py evaluate --config_path=/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/configs/car.fhd.config --model_dir=/home/developer/deep_learning/deepti_ubuntu20/CLOCs/CLOCs_SecCas_pretrained --measure_time=True --batch_size=1
I have not met or seen this type of error before; maybe this will help: open-mmlab/OpenPCDet#442
@pangsu0613, thanks for this. I ran it on Ubuntu 20 with a Pascal-architecture GPU, and everything is running fine.
That's awesome!
I got it working for the Maxwell architecture using juimoisnono's method.
diff --git a/setup.py b/setup.py
index 1e68a29..71b2d95 100644
--- a/setup.py
+++ b/setup.py
@@ -45,7 +45,7 @@ class CMakeBuild(build_ext):
'-DCMAKE_PREFIX_PATH=' + LIBTORCH_ROOT,
'-DPYBIND11_PYTHON_VERSION={}'.format(PYTHON_VERSION),
'-DSPCONV_BuildTests=OFF',
- '-DCMAKE_CUDA_FLAGS="--expt-relaxed-constexpr"']
+ '-DCMAKE_CUDA_FLAGS=--expt-relaxed-constexpr -gencode arch=compute_50,code=sm_50']
cfg = 'Debug' if self.debug else 'Release'
# cfg = 'Debug'
Thank you very much for sharing your solution, @jayadeepk!
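The patch above hard-codes `compute_50`/`sm_50`, which matches a Maxwell GPU with compute capability 5.0. In recent CUDA toolkits, error 98 appears to correspond to `cudaErrorInvalidDeviceFunction`, which usually indicates a binary that was not compiled for the GPU it runs on, so other cards would need a different value. As a rough sketch, the flag can be derived from the capability that PyTorch reports; the helper names `gencode_flag` and `detect_capability` are hypothetical, and only `torch.cuda.is_available` and `torch.cuda.get_device_capability` are real PyTorch calls.

```python
def gencode_flag(major: int, minor: int) -> str:
    """Build an nvcc -gencode flag for compute capability major.minor."""
    arch = f"{major}{minor}"
    return f"-gencode arch=compute_{arch},code=sm_{arch}"


def detect_capability():
    """Return (major, minor) for GPU 0, or None if PyTorch/CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


if __name__ == "__main__":
    capability = detect_capability()
    if capability is None:
        print("No CUDA device visible; pass the capability by hand.")
    else:
        # Paste the printed flag into CMAKE_CUDA_FLAGS in setup.py.
        print(gencode_flag(*capability))
```

For the Maxwell case in this thread, `gencode_flag(5, 0)` yields the same `-gencode arch=compute_50,code=sm_50` string used in the diff.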
ERROR WHILE RUNNING THE INFERENCE
python /home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/pytorch/train.py evaluate --config_path=/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/configs/car.fhd.config --model_dir=/home/developer/deep_learning/deepti_ubuntu20/CLOCs/CLOCs_SecCas_pretrained --measure_time=True --batch_size=1
Predict_test: False
sparse_shape: [ 41 1600 1408]
num_class is : 1
load existing model
load existing model for fusion layer
Restoring parameters from /home/developer/deep_learning/deepti_ubuntu20/CLOCs/CLOCs_SecCas_pretrained/fusion_layer-11136.tckpt
remain number of infos: 3769
Generate output labels...
Traceback (most recent call last):
File "/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/pytorch/train.py", line 920, in <module>
fire.Fire()
File "/opt/conda/lib/python3.6/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/opt/conda/lib/python3.6/site-packages/fire/core.py", line 471, in _Fire
target=component.__name__)
File "/opt/conda/lib/python3.6/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/pytorch/train.py", line 658, in evaluate
for example in iter(eval_dataloader):
File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
data = self._next_data()
File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 557, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/pytorch/builder/input_reader_builder.py", line 18, in __getitem__
return self._dataset[idx]
File "/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/data/dataset.py", line 70, in __getitem__
prep_func=self._prep_func)
File "/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/data/preprocess.py", line 313, in _read_and_prep_v9
count=-1).reshape([-1, num_point_features])
FileNotFoundError: [Errno 2] No such file or directory: '/home/developer/deep_learning/Projects/KITTI_DATASET_ROOT/KITTI_DATASET_ROOT/training/velodyne_reduced/000001.bin'
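The traceback above ends in a missing `velodyne_reduced/000001.bin`, and the path repeats `KITTI_DATASET_ROOT` twice, which suggests the dataset root was entered incorrectly. A quick way to check a candidate root before rerunning the evaluation might look like the sketch below. The function `missing_lidar_files` is a hypothetical helper, not part of CLOCs or SECOND; the six-digit `.bin` naming is taken from the file name in the error message.

```python
from pathlib import Path


def missing_lidar_files(kitti_root, frame_ids, subdir="training/velodyne_reduced"):
    """Return the expected .bin paths under kitti_root that do not exist.

    frame_ids are integer frame indices; files are assumed to be named
    with six zero-padded digits, e.g. 000001.bin.
    """
    base = Path(kitti_root) / subdir
    expected = [base / f"{int(frame_id):06d}.bin" for frame_id in frame_ids]
    return [path for path in expected if not path.is_file()]


if __name__ == "__main__":
    # Replace with the dataset root used in your config; this path is a placeholder.
    for path in missing_lidar_files("/path/to/KITTI_DATASET_ROOT", range(3)):
        print("missing:", path)
```

An empty result for the frames you plan to evaluate means the point cloud files are where the data loader expects them.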
developer@f0f2e49fc4a2:~/deep_learning/deepti_ubuntu20/CLOCs/second$ python /home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/pytorch/train.py evaluate --config_path=/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/configs/car.fhd.config --model_dir=/home/developer/deep_learning/deepti_ubuntu20/CLOCs/CLOCs_SecCas_pretrained --measure_time=True --batch_size=1
Predict_test: False
sparse_shape: [ 41 1600 1408]
num_class is : 1
load existing model
load existing model for fusion layer
Restoring parameters from /home/developer/deep_learning/deepti_ubuntu20/CLOCs/CLOCs_SecCas_pretrained/fusion_layer-11136.tckpt
remain number of infos: 3769
Generate output labels...
/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/core/geometry.py:146: NumbaWarning:
Compilation is falling back to object mode WITH looplifting enabled because Function "points_in_convex_polygon_jit" failed type inference due to: No implementation of function Function() found for signature:
There are 22 candidate implementations:
Overload of function 'getitem': File: : Line N/A.
With argument(s): '(array(float32, 3d, C), Tuple(slice<a:b>, list(int64)<iv=None>, slice<a:b>))':
No match.
Overload in function 'GetItemBuffer.generic': File: numba/core/typing/arraydecl.py: Line 162.
With argument(s): '(array(float32, 3d, C), Tuple(slice<a:b>, list(int64)<iv=None>, slice<a:b>))':
Rejected as the implementation raised a specific error:
TypeError: unsupported array index type list(int64)<iv=None> in Tuple(slice<a:b>, list(int64)<iv=None>, slice<a:b>)
raised from /opt/conda/lib/python3.6/site-packages/numba/core/typing/arraydecl.py:69
During: typing of intrinsic-call at /home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/core/geometry.py (162)
File "core/geometry.py", line 162:
def points_in_convex_polygon_jit(points, polygon, clockwise=True):
@numba.jit
/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/core/geometry.py:146: NumbaWarning:
Compilation is falling back to object mode WITHOUT looplifting enabled because Function "points_in_convex_polygon_jit" failed type inference due to: Cannot determine Numba type of <class 'numba.core.dispatcher.LiftedLoop'>
File "core/geometry.py", line 170:
def points_in_convex_polygon_jit(points, polygon, clockwise=True):
@numba.jit
/opt/conda/lib/python3.6/site-packages/numba/core/object_mode_passes.py:152: NumbaWarning: Function "points_in_convex_polygon_jit" was compiled in object mode without forceobj=True, but has lifted loops.
File "core/geometry.py", line 157:
def points_in_convex_polygon_jit(points, polygon, clockwise=True):
state.func_ir.loc))
/opt/conda/lib/python3.6/site-packages/numba/core/object_mode_passes.py:162: NumbaDeprecationWarning:
Fall-back from the nopython compilation path to the object mode compilation path has been detected, this is deprecated behaviour.
For more information visit https://numba.pydata.org/numba-doc/latest/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit
File "core/geometry.py", line 157:
def points_in_convex_polygon_jit(points, polygon, clockwise=True):
state.func_ir.loc))
Traceback (most recent call last):
File "/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/pytorch/train.py", line 920, in <module>
fire.Fire()
File "/opt/conda/lib/python3.6/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/opt/conda/lib/python3.6/site-packages/fire/core.py", line 471, in _Fire
target=component.__name__)
File "/opt/conda/lib/python3.6/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/pytorch/train.py", line 671, in evaluate
model_cfg.lidar_input,global_set)
File "/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/pytorch/train.py", line 462, in predict_kitti_to_anno
all_3d_output_camera_dict, all_3d_output, top_predictions, fusion_input,torch_index = net(example,detection_2d_path)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/pytorch/models/voxelnet.py", line 304, in forward
voxel_features, coors, batch_size_dev)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/developer/deep_learning/deepti_ubuntu20/CLOCs/second/pytorch/models/middle.py", line 545, in forward
ret = self.middle_conv(ret)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/spconv/modules.py", line 123, in forward
input = module(input)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/spconv/conv.py", line 151, in forward
self.stride, self.padding, self.dilation, self.output_padding, self.subm, self.transposed, grid=input.grid)
File "/opt/conda/lib/python3.6/site-packages/spconv/ops.py", line 89, in get_indice_pairs
stride, padding, dilation, out_padding, int(subm), int(transpose))
RuntimeError: /home/developer/deep_learning/deepti_ubuntu20/spconv-8da6f967fb9a054d8870c3515b1b44eca2103634/src/spconv/indice.cu 120
cuda execution failed with error 98