nsm_impl error during validation process with PyTorch 2.0.0 #674

Closed
ttppss opened this issue Mar 17, 2023 · 3 comments

ttppss commented Mar 17, 2023

Hi, I just set up a new MMYOLO environment following the installation instructions. Training my model on a custom dataset runs fine, but validation raises an error. Could someone take a look and help me figure out what is wrong? I have already checked similar issues reported in MMDetection, but their fixes do not resolve my problem.

environment

`sys.platform: linux
Python: 3.8.16 (default, Mar 2 2023, 03:21:46) [GCC 11.2.0]
CUDA available: True
numpy_random_seed: 2147483648
GPU 0,1,2,3: NVIDIA RTX A6000
CUDA_HOME: /usr
NVCC: Cuda compilation tools, release 11.6, V11.6.124
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
PyTorch: 2.0.0+cu118
PyTorch compiling details: PyTorch built with:

  • GCC 9.3
  • C++ Version: 201703
  • Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v2.7.3 (Git Hash 6dbeffbae1f23cbbeae17adb7b5b13f1f37c080e)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • LAPACK is enabled (usually provided by MKL)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 11.8
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
  • CuDNN 8.7
  • Magma 2.6.1
  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.8, CUDNN_VERSION=8.7.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,

TorchVision: 0.15.1+cu118
OpenCV: 4.7.0
MMEngine: 0.7.0
MMCV: 2.0.0rc4
MMDetection: 3.0.0rc6
MMYOLO: 0.5.0+dc85144
`

I then got the following error. The traceback below comes from running image_demo.py, but the same error appears when I run my own code:

(yolomm) ➜ mmyolo git:(main) ✗ python demo/image_demo.py demo/demo.jpg yolov5_s-v61_syncbn_fast_8xb16-300e_coco.py yolov5_s-v61_syncbn_fast_8xb16-300e_coco_20220918_084700-86e02187.pth
Loads checkpoint by local backend from path: yolov5_s-v61_syncbn_fast_8xb16-300e_coco_20220918_084700-86e02187.pth
03/17 11:08:07 - mmengine - WARNING - "FileClient" will be deprecated in future. Please use io functions in https://mmengine.readthedocs.io/en/latest/api/fileio.html#file-io
03/17 11:08:07 - mmengine - WARNING - "HardDiskBackend" is the alias of "LocalBackend" and the former will be deprecated in future.
03/17 11:08:28 - mmengine - WARNING - Visualizer backend is not initialized because save_dir is None.
[ ] 0/1, elapsed: 0s, ETA:
Traceback (most recent call last):
  File "demo/image_demo.py", line 168, in <module>
    main()
  File "demo/image_demo.py", line 123, in main
    result = inference_detector(model, file)
  File "/home/user/anaconda3/envs/yolomm/lib/python3.8/site-packages/mmdet/apis/inference.py", line 177, in inference_detector
    results = model.test_step(data_)[0]
  File "/home/user/anaconda3/envs/yolomm/lib/python3.8/site-packages/mmengine/model/base_model/base_model.py", line 145, in test_step
    return self._run_forward(data, mode='predict')  # type: ignore
  File "/home/user/anaconda3/envs/yolomm/lib/python3.8/site-packages/mmengine/model/base_model/base_model.py", line 326, in _run_forward
    results = self(**data, mode=mode)
  File "/home/user/anaconda3/envs/yolomm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/anaconda3/envs/yolomm/lib/python3.8/site-packages/mmdet/models/detectors/base.py", line 94, in forward
    return self.predict(inputs, data_samples)
  File "/home/user/anaconda3/envs/yolomm/lib/python3.8/site-packages/mmdet/models/detectors/single_stage.py", line 110, in predict
    results_list = self.bbox_head.predict(
  File "/home/user/anaconda3/envs/yolomm/lib/python3.8/site-packages/mmdet/models/dense_heads/base_dense_head.py", line 197, in predict
    predictions = self.predict_by_feat(
  File "/home/user/mmyolo/mmyolo/models/dense_heads/yolov5_head.py", line 430, in predict_by_feat
    results = self._bbox_post_process(
  File "/home/user/anaconda3/envs/yolomm/lib/python3.8/site-packages/mmdet/models/dense_heads/base_dense_head.py", line 479, in _bbox_post_process
    det_bboxes, keep_idxs = batched_nms(bboxes, results.scores,
  File "/home/user/anaconda3/envs/yolomm/lib/python3.8/site-packages/mmcv/ops/nms.py", line 302, in batched_nms
    dets, keep = nms_op(boxes_for_nms, scores, **nms_cfg)
  File "/home/user/anaconda3/envs/yolomm/lib/python3.8/site-packages/mmengine/utils/misc.py", line 354, in new_func
    output = old_func(*args, **kwargs)
  File "/home/user/anaconda3/envs/yolomm/lib/python3.8/site-packages/mmcv/ops/nms.py", line 127, in nms
    inds = NMSop.apply(boxes, scores, iou_threshold, offset, score_threshold,
  File "/home/user/anaconda3/envs/yolomm/lib/python3.8/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/user/anaconda3/envs/yolomm/lib/python3.8/site-packages/mmcv/ops/nms.py", line 27, in forward
    inds = ext_module.nms(
RuntimeError: nms_impl: implementation for device cuda:0 not found.

One of the references I checked is open-mmlab/mmdetection#6765 (comment).

Please let me know if you need more details from me.
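For anyone hitting the same error, the details most relevant to it can be gathered with a short script. The sketch below is only a suggestion and not something from the MMYOLO docs. It assumes that `mmcv.ops` exposes `get_compiler_version` and `get_compiling_cuda_version` (the helpers MMCV uses for its own environment report), and the function name `collect_nms_debug_info` is made up for this example. It records errors instead of raising, so it also runs on a broken install.

```python
import importlib


def collect_nms_debug_info():
    """Gather version details relevant to the nms_impl error; never raises."""
    info = {}

    try:
        torch = importlib.import_module("torch")
        info["torch"] = torch.__version__
        # CUDA version PyTorch was built against (None for CPU-only builds).
        info["torch_cuda"] = torch.version.cuda
        info["cuda_available"] = torch.cuda.is_available()
    except Exception as exc:  # broad on purpose: a broken install can fail in many ways
        info["torch_error"] = repr(exc)

    try:
        mmcv = importlib.import_module("mmcv")
        info["mmcv"] = mmcv.__version__
        ops = importlib.import_module("mmcv.ops")
        # Assumed helpers: they report how the compiled MMCV extension was built.
        info["mmcv_compiler"] = ops.get_compiler_version()
        info["mmcv_cuda"] = ops.get_compiling_cuda_version()
    except Exception as exc:
        info["mmcv_error"] = repr(exc)

    return info


if __name__ == "__main__":
    for key, value in collect_nms_debug_info().items():
        print(f"{key}: {value}")
```

If the CUDA version reported for MMCV is missing or differs from the one reported for PyTorch, that mismatch is worth including in the report, since it may explain why no CUDA implementation of `nms` is found.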

Thank you.

ttppss changed the title from "nsm_impl error during validation process" to "nsm_impl error during validation process with PyTorch 2.0.0" on Mar 17, 2023

ttppss commented Mar 17, 2023

Downgrading PyTorch from 2.0.0 to 1.13.1 can temporarily fix this issue, but the installation instructions may need to be modified, since the pip or conda command will install 2.0.0 by default.
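Until the instructions change, a project could guard against the default install with a small version check. This is a minimal sketch, assuming the only known data points are the ones in this thread (2.0.0 fails, 1.13.1 works). The helper names `parse_major_minor` and `needs_downgrade` are hypothetical and not part of MMYOLO or MMCV.

```python
import re


def parse_major_minor(version):
    """Return (major, minor) from strings such as '2.0.0+cu118' or '1.13.1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def needs_downgrade(torch_version):
    """True when the version is 2.0 or newer, the case that failed in this thread.

    Assumption: the cut-off comes only from this issue (2.0.0 failed, 1.13.1
    worked) and goes stale once MMCV supports PyTorch 2.0.
    """
    return parse_major_minor(torch_version) >= (2, 0)
```

In practice the argument would be `torch.__version__`; here `needs_downgrade("2.0.0+cu118")` is true and `needs_downgrade("1.13.1")` is false.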

@RangeKing
Collaborator

> Downgrading PyTorch from 2.0.0 to 1.13.1 can temporarily fix this issue, but the installation instructions may need to be modified, since the pip or conda command will install 2.0.0 by default.

For now, MMCV doesn't support PyTorch 2.0.0. We'll update the docs. Thank you for the suggestion.

@hhaAndroid
Collaborator

> Downgrading PyTorch from 2.0.0 to 1.13.1 can temporarily fix this issue, but the installation instructions may need to be modified, since the pip or conda command will install 2.0.0 by default.

Thank you for your feedback. We will support PyTorch 2.0 in the near future.
