ERROR Trying to train 3DSSD #426

RolandoAvides · 2021-04-08T08:47:55Z

Hello,

I got this error when trying to run train.py for the 3DSSD model:

File "/home/rmoreira/.local/lib/python3.8/site-packages/mmdet3d-0.11.0-py3.8-linux-x86_64.egg/mmdet3d/ops/ball_query/ball_query.py", line 4, in <module>
    from . import ball_query_ext
ImportError: libcudart.so.10.1: cannot open shared object file: No such file or directory

I hope someone has some insight into this.

Thanks in advance!
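One quick way to see whether the dynamic loader can find the CUDA runtime that the compiled extension was linked against is to try loading it directly. This is a minimal sketch using only the standard library. The library name `libcudart.so.10.1` comes from the traceback above, and the helper name `can_load` is hypothetical.

```python
import ctypes
import ctypes.util


def can_load(libname: str) -> bool:
    """Return True if the dynamic loader can open `libname` (hypothetical helper)."""
    try:
        ctypes.CDLL(libname)
        return True
    except OSError:
        # Same failure mode as the ImportError above: library not on the loader path.
        return False


if __name__ == "__main__":
    # The ball_query extension in the traceback was linked against CUDA 10.1's runtime.
    print("libcudart.so.10.1 loadable:", can_load("libcudart.so.10.1"))
    # find_library reports some cudart the system knows about, or None.
    print("cudart found by find_library:", ctypes.util.find_library("cudart"))
```

If the first line prints False while `find_library` reports a different runtime (or none), the extension was probably built against a CUDA version that is not available where the code now runs.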


Tai-Wang · 2021-04-08T11:20:08Z

It seems something is wrong with your CUDA environment. Please check the compatibility of your CUDA, GPU driver and PyTorch versions.

RolandoAvides · 2021-04-08T11:23:38Z

Thanks! I'll look into it.

However, I strictly followed the installation steps.

Tai-Wang · 2021-04-08T11:36:55Z

But there may still be problems when deciding which versions of CUDA/PyTorch to install. You can check whether torch.cuda.is_available() returns True.
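The check suggested above can be extended slightly so that it also shows which CUDA version PyTorch itself was built with. A minimal sketch, assuming only the public `torch.__version__`, `torch.version.cuda` and `torch.cuda.is_available()` attributes; the function name is hypothetical, and it degrades gracefully when PyTorch is not installed.

```python
def torch_cuda_report() -> dict:
    """Collect the PyTorch/CUDA facts worth comparing (hypothetical helper)."""
    try:
        import torch
    except ImportError:
        return {"torch": None, "built_with_cuda": None, "cuda_available": False}
    return {
        "torch": torch.__version__,
        # CUDA version PyTorch was compiled against (None for CPU-only builds).
        "built_with_cuda": torch.version.cuda,
        # False on a machine without a visible GPU/driver.
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in torch_cuda_report().items():
        print(f"{key}: {value}")
```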

RolandoAvides · 2021-04-11T15:34:14Z

Does mmdet3d have specific requirements for the CUDA and PyTorch versions?

I am currently using CUDA 11.0 and PyTorch 1.7.1.

Tai-Wang · 2021-04-12T01:16:48Z

Ah yes, if you use PyTorch 1.7, you may need to update to the latest master or at least v0.12.0, because we fixed some errors in scatter_points_cuda.cu that are triggered with PyTorch >= 1.7. But that does not seem related to your bug. I think you should double-check the CUDA environment (e.g., the CUDA version and driver, as well as the CUDA versions used to compile mmdet3d and PyTorch).
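The two checks in the comment above can be written down as a small script. This is a sketch with hypothetical helper names: the "PyTorch >= 1.7 needs mmdet3d >= 0.12.0" rule is quoted from the comment, while requiring an identical major.minor CUDA version for PyTorch and for the compiled ops is a conservative assumption (an exact match is the safest choice). The example values come from this thread: PyTorch 1.7.1 on CUDA 11.0, and an mmdet3d 0.11.0 egg whose ops were linked against libcudart.so.10.1.

```python
import re


def version_tuple(version: str, parts: int = 3) -> tuple:
    """Turn '1.7.1+cu110' into (1, 7, 1) or '11.0' (parts=2) into (11, 0)."""
    numbers = [int(n) for n in re.findall(r"\d+", version.split("+")[0])]
    numbers += [0] * (parts - len(numbers))  # pad short versions like '11'
    return tuple(numbers[:parts])


def check_versions(torch_version: str, mmdet3d_version: str,
                   torch_cuda: str, ops_cuda: str) -> list:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems = []
    # Rule quoted in the thread: PyTorch >= 1.7 needs mmdet3d >= 0.12.0.
    torch_is_1_7_plus = version_tuple(torch_version) >= (1, 7, 0)
    mmdet3d_pre_0_12 = version_tuple(mmdet3d_version) < (0, 12, 0)
    if torch_is_1_7_plus and mmdet3d_pre_0_12:
        problems.append("PyTorch >= 1.7 with mmdet3d < 0.12.0")
    # Assumption: require the same major.minor CUDA for PyTorch and the compiled ops.
    if version_tuple(torch_cuda, 2) != version_tuple(ops_cuda, 2):
        problems.append(f"CUDA mismatch: PyTorch {torch_cuda} vs ops {ops_cuda}")
    return problems


if __name__ == "__main__":
    for problem in check_versions("1.7.1", "0.11.0", "11.0", "10.1"):
        print(problem)
```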

RolandoAvides · 2021-04-14T11:38:01Z

I checked my versions and everything seems to be alright. Do you suggest using another pytorch and cuda version?

Tai-Wang · 2021-04-14T11:45:51Z

Can you run python mmdet3d/utils/collect_env.py to collect the environment information? Please post the output if it succeeds. You can also refer to #438 and give it a try.

RolandoAvides · 2021-04-14T11:58:05Z

Traceback (most recent call last):
  File "mmdet3d/utils/collect_env.py", line 5, in <module>
    import mmdet3d
ModuleNotFoundError: No module named 'mmdet3d'

I reinstalled mmdet3d, but I get this error, even though mmdet3d is installed in my env.

RolandoAvides · 2021-04-14T11:58:41Z

After conda list:

mmdet3d 0.12.0 dev_0

Tai-Wang · 2021-04-14T12:05:39Z

Try running: PYTHONPATH="$(dirname $0)":$PYTHONPATH python mmdet3d/utils/collect_env.py
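The command above works by putting the repository root on PYTHONPATH so that `import mmdet3d` resolves to the checkout. The same idea can be expressed in Python with an explicit repository path instead of `$(dirname $0)`. This is a sketch: the helper name is hypothetical, and it assumes the script is launched from the mmdetection3d checkout (on a Slurm cluster it would itself need to be launched with srun to see a GPU).

```python
import os
import subprocess
import sys


def env_with_repo_on_path(repo_root: str, base_env=None) -> dict:
    """Copy the environment and put repo_root first on PYTHONPATH (hypothetical)."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PYTHONPATH", "")
    env["PYTHONPATH"] = repo_root + (os.pathsep + existing if existing else "")
    return env


if __name__ == "__main__":
    repo = os.path.abspath(".")  # assumption: current directory is the checkout
    subprocess.run(
        [sys.executable, "mmdet3d/utils/collect_env.py"],
        cwd=repo, env=env_with_repo_on_path(repo), check=False,
    )
```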

RolandoAvides · 2021-04-14T12:08:25Z

/home/rmoreira/.conda/envs/thesis/lib/python3.8/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at /opt/conda/conda-bld/pytorch_1607370172916/work/c10/cuda/CUDAFunctions.cpp:100.)
  return torch._C._cuda_getDeviceCount() > 0
sys.platform: linux
Python: 3.8.8 (default, Feb 24 2021, 21:46:12) [GCC 7.3.0]
CUDA available: False
GCC: gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
PyTorch: 1.7.1
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

TorchVision: 0.8.2
OpenCV: 4.5.1
MMCV: 1.3.0
MMCV Compiler: GCC 9.3
MMCV CUDA Compiler: not available
MMDetection: 2.11.0
MMDetection3D: 0.12.0+d055876

Since I am using a Slurm server, I have to run each Python script with srun. That is why no GPU was found at first.
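The situation described above (CUDA available: False because the script ran outside a Slurm job) can be detected from inside Python. A minimal sketch, assuming the `SLURM_JOB_ID` environment variable, which Slurm sets for processes started through sbatch, salloc or srun; the helper name is hypothetical, and the GPU request flag (`--gres`) depends on how your cluster is configured.

```python
import os


def slurm_gpu_hint(cuda_available: bool, env=None) -> str:
    """Explain a 'CUDA available: False' result by where the code runs (hypothetical)."""
    env = os.environ if env is None else env
    # Slurm sets SLURM_JOB_ID inside jobs and job steps.
    in_job = "SLURM_JOB_ID" in env
    if cuda_available:
        return "ok: GPU visible"
    if not in_job:
        return "no GPU: probably on the login node, re-run with srun"
    return "no GPU inside the Slurm job: check the GPU request (e.g. --gres)"
```

Called as `slurm_gpu_hint(torch.cuda.is_available())` at the top of a script, it turns a silent CPU fallback into an explicit message.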

RolandoAvides · 2021-04-14T12:09:30Z

Also, when I start python from the command line, I can import mmcv, mmdet and mmdet3d successfully.

Tai-Wang · 2021-04-14T12:11:35Z

Then please use srun to run the command.

RolandoAvides · 2021-04-14T12:18:43Z

I got an error. I then reinstalled mmdetection3d and got a new error:

File "/home/rmoreira/.conda/envs/thesis/lib/python3.8/site-packages/mmcv/ops/nms.py", line 19, in forward
    inds = ext_module.nms(
RuntimeError: nms is not compiled with GPU support

Thanks for your help.
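The "nms is not compiled with GPU support" error and the "MMCV CUDA Compiler: not available" line in the environment report point to the same thing: mmcv's ops were built without CUDA. A small check, assuming `mmcv.ops.get_compiling_cuda_version`, which mmcv-full 1.x exposes and which, to my knowledge, is what feeds the "MMCV CUDA Compiler" field of collect_env; the wrapper name is hypothetical.

```python
def mmcv_ops_cuda_version() -> str:
    """Report the CUDA version mmcv's compiled ops were built with (hypothetical wrapper)."""
    try:
        # Available in mmcv-full; fails to import for mmcv builds without compiled ops.
        from mmcv.ops import get_compiling_cuda_version
    except ImportError:
        return "mmcv ops not importable"
    # Returns a version string such as '11.0', or 'not available' for CPU-only builds.
    return get_compiling_cuda_version()


if __name__ == "__main__":
    print("mmcv ops CUDA:", mmcv_ops_cuda_version())
```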

Tai-Wang · 2021-04-14T12:19:41Z

You need to use srun to compile mmcv/mmdet/mmdet3d. Otherwise, the local environment without a GPU will only compile the lite, CPU-only versions of the packages.
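Why a build on a GPU-less node silently produces CPU-only ops can be illustrated with the kind of check the setup scripts perform. As far as I know, mmcv-full and mmdet3d compile their CUDA extensions when PyTorch sees a GPU or when the environment variable `FORCE_CUDA` is set to `1`; the function below is a simplified re-statement of that assumption for illustration, not the actual setup.py, so check it against the version you install.

```python
import os


def will_build_cuda_ops(cuda_available: bool, env=None) -> bool:
    """Simplified, assumed version of the setup.py decision (illustrative only)."""
    env = os.environ if env is None else env
    # GPU visible at build time, or CUDA build forced via FORCE_CUDA=1.
    return cuda_available or env.get("FORCE_CUDA", "0") == "1"
```

On a login node `cuda_available` is False, so without `FORCE_CUDA=1` (and a CUDA toolkit to compile against) only the CPU variant is built. Building under srun, as suggested above, makes `cuda_available` True instead.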

RolandoAvides · 2021-04-14T12:24:01Z

Ok. I'm running this command:

srun python demo/pcd_demo.py /home/rmoreira/mmdetection3d/demo/kitti_000008.bin /home/rmoreira/mmdetection3d/configs/3dssd/3dssd_kitti-3d-car.py /home/rmoreira/mmdetection3d/checkpoints/3dssd_kitti-3d-car_20210324_122002-07e9a19b.pth

So I need to reinstall mmcv and mmdet with srun, right? Can I use srun pip ...?

Tai-Wang · 2021-04-14T12:31:20Z

Typically I use srun python -m pip install ..., but you can give it a try.
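The `srun python -m pip install ...` pattern can be assembled programmatically so the same interpreter is used on the compute node. A sketch with a hypothetical helper: it assumes the compute node sees the conda environment at the same path (a shared filesystem), and the `--gres=gpu:1` request in the example is cluster-specific. The package arguments stay generic, since the exact install targets are elided in the comment above.

```python
import shlex
import sys


def srun_pip_install(*targets: str, pip_args=(), srun_args=()) -> list:
    """Build an argv list for `srun python -m pip install ...` (hypothetical helper)."""
    # sys.executable pins the interpreter of the active conda env.
    return ["srun", *srun_args, sys.executable, "-m", "pip", "install",
            *pip_args, *targets]


if __name__ == "__main__":
    # Example: editable reinstall of a local checkout on a GPU node.
    cmd = srun_pip_install(".", pip_args=("-e",), srun_args=("--gres=gpu:1",))
    print(shlex.join(cmd))  # inspect first, then e.g. subprocess.run(cmd, check=True)
```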

RolandoAvides · 2021-04-14T12:33:04Z

OK, I will try! Thank you! I'll report back soon.

RolandoAvides · 2021-04-14T12:47:56Z

@Tai-Wang I managed to run the demo! Thank you!

I'll take this opportunity to show you the results. I found the first warning, about the mismatch between the model and the loaded state dict, strange...

srun python demo/pcd_demo.py /home/rmoreira/mmdetection3d/demo/kitti_000008.bin /home/rmoreira/mmdetection3d/configs/3dssd/3dssd_kitti-3d-car.py /home/rmoreira/mmdetection3d/checkpoints/3dssd_kitti-3d-car_20210324_122002-07e9a19b.pth
Use load_from_local loader
The model and loaded state dict do not match exactly

size mismatch for bbox_head.conv_pred.conv_cls.weight: copying a param with shape torch.Size([1, 128, 1]) from checkpoint, the shape in current model is torch.Size([3, 128, 1]).
size mismatch for bbox_head.conv_pred.conv_cls.bias: copying a param with shape torch.Size([1]) from checkpoint, the shape in current model is torch.Size([3]).
[{'boxes_3d': LiDARInstance3DBoxes(
    tensor([[ 55.7404,  -8.6077,  -0.4224,  ...,   4.5783,   1.8301,   1.0405],
        [ 25.6243, -10.9809,  -1.5819,  ...,   3.7168,   1.4777,  -1.0689],
        [ 66.4182, -26.8103,  -1.1402,  ...,   3.7059,   1.5027,   1.6769],
        ...,
        [ 29.5463, -12.6929,  -3.2484,  ...,   4.1319,   1.8958,   3.1369],
        [ 33.4877,  -7.3111,  -1.3107,  ...,   4.0879,   1.6466,   1.7741],
        [ 20.6169,   0.4046,  -1.3966,  ...,   3.7587,   1.5981,   1.9495]])), 'scores_3d': tensor([0.7574, 0.6495, 0.6280, 0.6921, 0.7645, 0.7266, 0.7165, 0.5575, 0.5733,
        0.7091, 0.8246, 0.7201, 0.5388, 0.7428, 0.6678, 0.7330, 0.4847, 0.6177,
        0.7114, 0.6585, 0.6402, 0.5568, 0.7016, 0.5209, 0.6761, 0.6874, 0.7599,
        0.6817, 0.4953, 0.6884, 0.7174, 0.5856, 0.4783, 0.7312, 0.5483, 0.5400,
        0.5401, 0.4533, 0.6866, 0.6345, 0.7415, 0.7145, 0.5942, 0.7193, 0.6310,
        0.6636, 0.7538, 0.7477, 0.5689, 0.7458, 0.7574, 0.6495, 0.6280, 0.6921,
        0.7645, 0.7266, 0.7165, 0.5575, 0.5733, 0.7091, 0.8246, 0.7201, 0.5388,
        0.7428, 0.6678, 0.7330, 0.4847, 0.6177, 0.7114, 0.6585, 0.6402, 0.5568,
        0.7016, 0.5209, 0.6761, 0.6874, 0.7599, 0.6817, 0.4953, 0.6884, 0.7174,
        0.5856, 0.4783, 0.7312, 0.5483, 0.5400, 0.5401, 0.4533, 0.6866, 0.6345,
        0.7415, 0.7145, 0.5942, 0.7193, 0.6310, 0.6636, 0.7538, 0.7477, 0.5689,
        0.7458, 0.7574, 0.6495, 0.6280, 0.6921, 0.7645, 0.7266, 0.7165, 0.5575,
        0.5733, 0.7091, 0.8246, 0.7201, 0.5388, 0.7428, 0.6678, 0.7330, 0.4847,
        0.6177, 0.7114, 0.6585, 0.6402, 0.5568, 0.7016, 0.5209, 0.6761, 0.6874,
        0.7599, 0.6817, 0.4953, 0.6884, 0.7174, 0.5856, 0.4783, 0.7312, 0.5483,
        0.5400, 0.5401, 0.4533, 0.6866, 0.6345, 0.7415, 0.7145, 0.5942, 0.7193,
        0.6310, 0.6636, 0.7538, 0.7477, 0.5689, 0.7458]), 'labels_3d': tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
        2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
        2, 2, 2, 2, 2, 2])}]
/home/rmoreira/mmcv/mmcv/cnn/bricks/conv_module.py:107: UserWarning: ConvModule has norm and bias at the same time
  warnings.warn('ConvModule has norm and bias at the same time')
/home/rmoreira/.conda/envs/thesis/lib/python3.8/site-packages/torch/nn/functional.py:1639: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
  warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
/home/rmoreira/mmcv/mmcv/utils/misc.py:303: UserWarning: "iou_thr" is deprecated in `nms`, please use "iou_threshold" instead
  warnings.warn(

Whereas with PointPillars I got this:

[{'boxes_3d': LiDARInstance3DBoxes(
    tensor([[  7.6368,   6.0337,  -0.7049,   0.5380,   1.6368,   1.6816,   1.1690],
        [  8.0813,   1.2360,  -1.5221,   1.5556,   3.5950,   1.5098,  -1.2870],
        [  6.4173,  -3.8148,  -1.7652,   1.4713,   3.1078,   1.4624,   1.8872],
        [ 14.7695,  -1.1130,  -1.5709,   1.5431,   3.8069,   1.4875,   1.8945],
        [ 33.3250,  -7.0627,  -1.2610,   1.6582,   4.0907,   1.6339,  -1.2844],
        [ 20.3003,  -8.4587,  -1.6727,   1.5208,   2.6914,   1.6241,   1.9094],
        [  3.6622,   2.7378,  -1.5460,   1.5642,   3.6437,   1.4970,   1.7988],
        [ 28.6279,  -1.6022,  -1.0425,   1.5140,   3.8013,   1.4418,   0.3095],
        [ 55.6062, -20.1399,  -1.3588,   1.6496,   4.0910,   1.6164,  -1.2947],
        [ 24.9041, -10.1028,  -1.6406,   1.6510,   3.6680,   1.4994,   1.8840],
        [ 40.7069,  -9.7905,  -1.3112,   1.5934,   3.9164,   1.5893,  -1.2865]])), 'scores_3d': tensor([0.4375, 0.9567, 0.9528, 0.9437, 0.8850, 0.8328, 0.7940, 0.7637, 0.6648,
        0.6040, 0.4795]), 'labels_3d': tensor([1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])}]
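To compare the two printed results more easily, the PointPillars detections can be filtered by confidence. A minimal sketch in plain Python with the values copied from the output above. The 0.5 threshold, the helper `keep_confident`, and the class order `['Pedestrian', 'Cyclist', 'Car']` (which I believe the KITTI 3-class configs use) are assumptions. With the real tensors, `result['scores_3d'] >= thr` gives a boolean mask that can index the scores and labels directly.

```python
# Values copied from the PointPillars output above (tensors turned into lists).
scores = [0.4375, 0.9567, 0.9528, 0.9437, 0.8850, 0.8328, 0.7940, 0.7637, 0.6648,
          0.6040, 0.4795]
labels = [1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
# Assumption: class order of the KITTI 3-class configs.
CLASS_NAMES = ["Pedestrian", "Cyclist", "Car"]


def keep_confident(scores, labels, thr=0.5):
    """Return (class_name, score) pairs whose score is at least thr."""
    return [(CLASS_NAMES[label], s) for s, label in zip(scores, labels) if s >= thr]


for name, score in keep_confident(scores, labels):
    print(f"{name}: {score:.4f}")
```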

Thanks!

RolandoAvides · 2021-04-14T12:52:31Z

Never mind the first warning. I forgot that I had changed the 3DSSD config file.
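The size-mismatch warning in the 3DSSD run can be checked mechanically by comparing parameter shapes. A sketch using the shapes from that warning; `shape_mismatches` is a hypothetical helper. The first dimension of `conv_cls` presumably equals the number of classes: 1 in the car-only checkpoint (3dssd_kitti-3d-car) versus 3 in the edited config, which fits the explanation above. With real objects, the dicts could be built as `{k: tuple(v.shape) for k, v in state_dict.items()}` from the checkpoint's state dict and from `model.state_dict()`.

```python
def shape_mismatches(ckpt_shapes: dict, model_shapes: dict) -> dict:
    """Map parameter name -> (checkpoint shape, model shape) where they differ."""
    return {
        name: (ckpt_shapes[name], model_shapes[name])
        for name in ckpt_shapes.keys() & model_shapes.keys()
        if tuple(ckpt_shapes[name]) != tuple(model_shapes[name])
    }


# Shapes taken from the warning printed by the 3DSSD demo run above.
ckpt = {"bbox_head.conv_pred.conv_cls.weight": (1, 128, 1),
        "bbox_head.conv_pred.conv_cls.bias": (1,)}
model = {"bbox_head.conv_pred.conv_cls.weight": (3, 128, 1),
         "bbox_head.conv_pred.conv_cls.bias": (3,)}

for name, (c, m) in sorted(shape_mismatches(ckpt, model).items()):
    print(f"{name}: checkpoint {c} vs model {m}")
```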


Tai-Wang added the reimplementation label Apr 10, 2021

RolandoAvides closed this as completed Apr 14, 2021

ZCMax mentioned this issue Jan 18, 2022 in: ImportError: libcudart.so.10.1: cannot open shared object file: No such file or directory #1180 (Closed)

tpoisonooo pushed a commit to tpoisonooo/mmdetection3d that referenced this issue Sep 5, 2022

[Fix] Fix ci (open-mmlab#426)

59453dc

* fix ci * add nvidia key * remote torch * recover pytorch
