nsm_impl error during validation process with PyTorch 2.0.0 #674

ttppss · 2023-03-17T16:09:07Z

Hi, I just installed a fresh MMYOLO environment following the installation instructions. Training my model on a custom dataset runs fine, but validation raises an error. Could someone take a look and help me figure out what is wrong? I have already checked similar issues reported in MMDetection, but the fixes suggested there did not resolve my problem.

environment

`sys.platform: linux
Python: 3.8.16 (default, Mar 2 2023, 03:21:46) [GCC 11.2.0]
CUDA available: True
numpy_random_seed: 2147483648
GPU 0,1,2,3: NVIDIA RTX A6000
CUDA_HOME: /usr
NVCC: Cuda compilation tools, release 11.6, V11.6.124
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
PyTorch: 2.0.0+cu118
PyTorch compiling details: PyTorch built with:

GCC 9.3
C++ Version: 201703
Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
Intel(R) MKL-DNN v2.7.3 (Git Hash 6dbeffbae1f23cbbeae17adb7b5b13f1f37c080e)
OpenMP 201511 (a.k.a. OpenMP 4.5)
LAPACK is enabled (usually provided by MKL)
NNPACK is enabled
CPU capability usage: AVX2
CUDA Runtime 11.8
NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
CuDNN 8.7
Magma 2.6.1
Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.8, CUDNN_VERSION=8.7.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,

TorchVision: 0.15.1+cu118
OpenCV: 4.7.0
MMEngine: 0.7.0
MMCV: 2.0.0rc4
MMDetection: 3.0.0rc6
MMYOLO: 0.5.0+dc85144
`

Then I got the following error message (it appears when running image_demo.py, and also when I run my own code):

(yolomm) ➜ mmyolo git:(main) ✗ python demo/image_demo.py demo/demo.jpg yolov5_s-v61_syncbn_fast_8xb16-300e_coco.py yolov5_s-v61_syncbn_fast_8xb16-300e_coco_20220918_084700-86e02187.pth
Loads checkpoint by local backend from path: yolov5_s-v61_syncbn_fast_8xb16-300e_coco_20220918_084700-86e02187.pth
03/17 11:08:07 - mmengine - WARNING - "FileClient" will be deprecated in future. Please use io functions in https://mmengine.readthedocs.io/en/latest/api/fileio.html#file-io
03/17 11:08:07 - mmengine - WARNING - "HardDiskBackend" is the alias of "LocalBackend" and the former will be deprecated in future.
03/17 11:08:28 - mmengine - WARNING - Visualizer backend is not initialized because save_dir is None.
[ ] 0/1, elapsed: 0s, ETA:Traceback (most recent call last):
File "demo/image_demo.py", line 168, in <module>
main()
File "demo/image_demo.py", line 123, in main
result = inference_detector(model, file)
File "/home/user/anaconda3/envs/yolomm/lib/python3.8/site-packages/mmdet/apis/inference.py", line 177, in inference_detector
results = model.test_step(data_)[0]
File "/home/user/anaconda3/envs/yolomm/lib/python3.8/site-packages/mmengine/model/base_model/base_model.py", line 145, in test_step
return self._run_forward(data, mode='predict') # type: ignore
File "/home/user/anaconda3/envs/yolomm/lib/python3.8/site-packages/mmengine/model/base_model/base_model.py", line 326, in _run_forward
results = self(**data, mode=mode)
File "/home/user/anaconda3/envs/yolomm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/anaconda3/envs/yolomm/lib/python3.8/site-packages/mmdet/models/detectors/base.py", line 94, in forward
return self.predict(inputs, data_samples)
File "/home/user/anaconda3/envs/yolomm/lib/python3.8/site-packages/mmdet/models/detectors/single_stage.py", line 110, in predict
results_list = self.bbox_head.predict(
File "/home/user/anaconda3/envs/yolomm/lib/python3.8/site-packages/mmdet/models/dense_heads/base_dense_head.py", line 197, in predict
predictions = self.predict_by_feat(
File "/home/user/mmyolo/mmyolo/models/dense_heads/yolov5_head.py", line 430, in predict_by_feat
results = self._bbox_post_process(
File "/home/user/anaconda3/envs/yolomm/lib/python3.8/site-packages/mmdet/models/dense_heads/base_dense_head.py", line 479, in _bbox_post_process
det_bboxes, keep_idxs = batched_nms(bboxes, results.scores,
File "/home/user/anaconda3/envs/yolomm/lib/python3.8/site-packages/mmcv/ops/nms.py", line 302, in batched_nms
dets, keep = nms_op(boxes_for_nms, scores, **nms_cfg)
File "/home/user/anaconda3/envs/yolomm/lib/python3.8/site-packages/mmengine/utils/misc.py", line 354, in new_func
output = old_func(*args, **kwargs)
File "/home/user/anaconda3/envs/yolomm/lib/python3.8/site-packages/mmcv/ops/nms.py", line 127, in nms
inds = NMSop.apply(boxes, scores, iou_threshold, offset, score_threshold,
File "/home/user/anaconda3/envs/yolomm/lib/python3.8/site-packages/torch/autograd/function.py", line 506, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/home/user/anaconda3/envs/yolomm/lib/python3.8/site-packages/mmcv/ops/nms.py", line 27, in forward
inds = ext_module.nms(
RuntimeError: nms_impl: implementation for device cuda:0 not found.
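
The traceback ends inside MMCV's compiled extension (`ext_module.nms`), so it can help to compare the versions PyTorch and MMCV report side by side. The following is a minimal diagnostic sketch, not an official MMCV tool. The helper name `collect_versions` is hypothetical, and the sketch assumes `mmcv.ops` exposes `get_compiling_cuda_version()` as it does in the full (non-lite) MMCV package. Missing or broken packages are recorded as `None` instead of raising.

```python
import importlib


def collect_versions():
    """Gather version facts relevant to a missing CUDA op.

    Hypothetical helper: every lookup is guarded so the function still
    returns a complete dict when torch or mmcv cannot be imported.
    """
    info = {
        "torch": None,
        "torch_cuda": None,
        "cuda_available": None,
        "mmcv": None,
        "mmcv_ops_cuda": None,
    }
    try:
        torch = importlib.import_module("torch")
        info["torch"] = torch.__version__
        # CUDA version PyTorch itself was built against (None for CPU builds).
        info["torch_cuda"] = torch.version.cuda
        info["cuda_available"] = torch.cuda.is_available()
    except Exception:
        # torch is absent or failed to load; leave the fields as None.
        pass
    try:
        mmcv = importlib.import_module("mmcv")
        info["mmcv"] = mmcv.__version__
        # Assumption: the full MMCV build provides this function in mmcv.ops.
        ops = importlib.import_module("mmcv.ops")
        info["mmcv_ops_cuda"] = ops.get_compiling_cuda_version()
    except Exception:
        # mmcv or its compiled ops are absent or failed to load.
        pass
    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```

If `torch_cuda` and `mmcv_ops_cuda` disagree, or `mmcv_ops_cuda` is empty while `cuda_available` is true, that points to an MMCV build that does not match the installed PyTorch.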

One of the references I checked is open-mmlab/mmdetection#6765 (comment)

Please let me know if you need more details from me.

Thank you.


ttppss · 2023-03-17T17:25:27Z

Downgrading PyTorch from 2.0.0 to 1.13.1 works around this issue for now, but the installation instructions may need to be updated, since the default pip or conda command installs 2.0.0.

RangeKing · 2023-03-19T04:30:23Z

> Downgrading PyTorch from 2.0.0 to 1.13.1 works around this issue for now, but the installation instructions may need to be updated, since the default pip or conda command installs 2.0.0.

For now, MMCV doesn't support PyTorch 2.0.0. We'll refine the docs. Thank you for your suggestion.
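
Until the docs are updated, a script can check the installed PyTorch version string before running inference. The following is a rough sketch under stated assumptions: the helper names `release_tuple` and `torch_too_new_for_mmcv` are hypothetical, and the `(2, 0, 0)` threshold is an assumption taken from this thread only (2.0.0 fails and 1.13.1 works with MMCV 2.0.0rc4). It uses only the standard library, so the version string is passed in as plain text.

```python
import re


def release_tuple(version):
    """Turn a string such as '2.0.0+cu118' into (2, 0, 0).

    Local build suffixes like '+cu118' are ignored; a missing patch
    number is treated as 0.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def torch_too_new_for_mmcv(torch_version, first_unsupported=(2, 0, 0)):
    """Return True when torch_version is at or above the assumed cutoff.

    Assumption: the cutoff reflects what this thread reports for
    MMCV 2.0.0rc4; later MMCV releases may behave differently.
    """
    return release_tuple(torch_version) >= first_unsupported


if __name__ == "__main__":
    for candidate in ("2.0.0+cu118", "1.13.1"):
        print(candidate, torch_too_new_for_mmcv(candidate))
```

In a real environment, the argument would be `torch.__version__`. A `True` result suggests pinning PyTorch to 1.13.1, as described earlier in the thread.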

hhaAndroid · 2023-03-20T11:47:53Z

> Downgrading PyTorch from 2.0.0 to 1.13.1 works around this issue for now, but the installation instructions may need to be updated, since the default pip or conda command installs 2.0.0.

Thank you for your feedback. We will support PyTorch 2.0 in the near future.

ttppss changed the title ~~nsm_impl error during validation process~~ nsm_impl error during validation process with PyTorch 2.0.0 Mar 17, 2023

hhaAndroid added the planned feature label Mar 20, 2023

hhaAndroid closed this as completed May 4, 2023
