ValueError when training CenterPoint on Kitti Dataset #1497

ASarrouj · 2022-05-21T07:17:13Z

Describe the bug
When running tools/train.py, I get an error while unpacking values from a tensor that is smaller than expected. Specifically, a 3D bounding box is expected to contain 9 values but contains only 7. Printing the tensor yields tensor([27.3678, -2.0306, 0.2165, 1.2211, 0.6933, 1.9455, 0.2747], device='cuda:0'). The error occurs after training has 'started', i.e. during the first loss calculation.
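The failure can be reproduced in isolation. The following is a minimal sketch, not code from the repository: it uses a plain Python list in place of the CUDA tensor and assumes the usual layouts, i.e. 7 values for a KITTI-style box (x, y, z, dx, dy, dz, yaw) and 9 values for a box that also carries a velocity (vx, vy). The zero velocities appended at the end are made up for illustration.

```python
# Box printed in the report: 7 values, assumed to be (x, y, z, dx, dy, dz, yaw).
kitti_box = [27.3678, -2.0306, 0.2165, 1.2211, 0.6933, 1.9455, 0.2747]

# Slicing past the end is legal and simply yields an empty sequence ...
velocity = kitti_box[7:]
print(len(velocity))

# ... so the two-name unpacking is the step that raises the error.
try:
    vx, vy = kitti_box[7:]
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# A 9-value box (hypothetical zero velocities appended) unpacks without error.
box_with_velocity = kitti_box + [0.0, 0.0]
vx, vy = box_with_velocity[7:]
```

This suggests the head is consuming boxes in a layout that includes velocity, while the dataset supplies boxes without it.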

Reproduction

What command or script did you run?

python tools/train.py configs/centerpoint/custom_kitti.py

Did you make any modifications on the code or config? Did you understand what you have modified?
- I am using the config listed in "Issue implementing Centerpoint on KITTI dataset #871" with minor modifications, namely lowering samples_per_gpu and increasing the gpu_ids range. I believe I understand the two fields I modified, but not much else.

What dataset did you use?
- KITTI, using the link in the data_preparation.md doc

Environment

Please run python mmdet3d/utils/collect_env.py to collect the necessary environment information and paste it here.
You may add additional information that may be helpful for locating the problem, such as
- How you installed PyTorch [e.g., pip, conda, source] (Conda)
- Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)

sys.platform: linux
Python: 3.7.13 (default, Mar 29 2022, 02:18:16) [GCC 7.5.0]
CUDA available: True
GPU 0,1: Tesla V100S-PCIE-32GB
CUDA_HOME: /apps/cuda/11.1.1
NVCC: Cuda compilation tools, release 11.1, V11.1.105
GCC: gcc (GCC) 10.3.0
PyTorch: 1.10.1+cu111
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - CUDA Runtime 11.1
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  - CuDNN 8.0.5
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 

TorchVision: 0.11.2+cu111
OpenCV: 4.5.5
MMCV: 1.5.1
MMCV Compiler: GCC 8.4
MMCV CUDA Compiler: 11.1
MMDetection: 2.23.0
MMSegmentation: 0.24.1
MMDetection3D: 1.0.0rc2+76e351a
spconv2.0: False

Error traceback
Traceback (most recent call last):
File "tools/train.py", line 263, in <module>
main()
File "tools/train.py", line 259, in main
meta=meta)
File "PATHTO/mmdetection3d/mmdet3d/apis/train.py", line 351, in train_model
meta=meta)
File "PATHTO/mmdetection3d/mmdet3d/apis/train.py", line 319, in train_detector
runner.run(data_loaders, cfg.workflow)
File "PATHTO/mmcv/mmcv/runner/epoch_based_runner.py", line 127, in run
epoch_runner(data_loaders[i], **kwargs)
File "PATHTO/mmcv/mmcv/runner/epoch_based_runner.py", line 50, in train
self.run_iter(data_batch, train_mode=True, **kwargs)
File "PATHTO/mmcv/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
**kwargs)
File "PATHTO/mmcv/mmcv/parallel/data_parallel.py", line 75, in train_step
return self.module.train_step(*inputs[0], **kwargs[0])
File "PATHTO/.conda/envs/open-mmlab/lib/python3.7/site-packages/mmdet/models/detectors/base.py", line 248, in train_step
losses = self(**data)
File "PATHTO/.conda/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "PATHTO/mmcv/mmcv/runner/fp16_utils.py", line 110, in new_func
return old_func(*args, **kwargs)
File "PATHTO/mmdetection3d/mmdet3d/models/detectors/base.py", line 60, in forward
return self.forward_train(**kwargs)
File "PATHTO/mmdetection3d/mmdet3d/models/detectors/mvx_two_stage.py", line 279, in forward_train
gt_bboxes_ignore)
File "PATHTO/mmdetection3d/mmdet3d/models/detectors/centerpoint.py", line 73, in forward_pts_train
losses = self.pts_bbox_head.loss(*loss_inputs)
File "PATHTO/mmcv/mmcv/runner/fp16_utils.py", line 198, in new_func
return old_func(*args, **kwargs)
File "PATHTO/mmdetection3d/mmdet3d/models/dense_heads/centerpoint_head.py", line 586, in loss
gt_bboxes_3d, gt_labels_3d)
File "PATHTO/mmdetection3d/mmdet3d/models/dense_heads/centerpoint_head.py", line 418, in get_targets
self.get_targets_single, gt_bboxes_3d, gt_labels_3d)
File "PATHTO/.conda/envs/open-mmlab/lib/python3.7/site-packages/mmdet/core/utils/misc.py", line 30, in multi_apply
return tuple(map(list, zip(*map_results)))
File "PATHTO/mmdetection3d/mmdet3d/models/dense_heads/centerpoint_head.py", line 552, in get_targets_single
vx, vy = task_boxes[idx][k][7:]
ValueError: not enough values to unpack (expected 2, got 0)
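
The last frame unpacks a velocity unconditionally. One way to make such a step tolerate both box layouts is to branch on the box length. The sketch below is a hypothetical helper written for illustration, not the mmdet3d implementation and not the change made in any PR; the name split_velocity and the 7/9-value layouts are assumptions.

```python
from typing import Optional, Sequence, Tuple


def split_velocity(
    box: Sequence[float],
) -> Tuple[Sequence[float], Optional[Tuple[float, float]]]:
    """Hypothetical helper: separate box geometry from an optional velocity.

    Assumes the first 7 values are (x, y, z, dx, dy, dz, yaw) and that a
    velocity, when present, occupies positions 7 and 8.
    """
    geometry = box[:7]
    if len(box) >= 9:
        vx, vy = box[7:9]
        return geometry, (vx, vy)
    # No velocity stored (e.g. a 7-value box): signal that explicitly
    # instead of failing on the unpack.
    return geometry, None


kitti_box = [27.3678, -2.0306, 0.2165, 1.2211, 0.6933, 1.9455, 0.2747]
geometry, velocity = split_velocity(kitti_box)
print(velocity)
```

A caller would then build the regression target with or without the two velocity terms depending on whether the second return value is None.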


Tai-Wang · 2022-05-29T03:16:55Z

The current version of CenterPoint may need some modifications to be compatible with KITTI training. You can refer to this PR for more details and practical experience.
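
As a rough illustration of what such modifications might involve (a sketch under assumptions, not the content of the referenced PR): the nuScenes CenterPoint configs declare a `vel` entry in `common_heads`, and a KITTI variant would presumably drop it, which changes the number of regression channels. The head names below follow the nuScenes config; the KITTI variant is an assumption.

```python
# Each entry is (output_channels, num_conv); names follow the nuScenes
# CenterPoint config.
heads_with_velocity = dict(
    reg=(2, 2), height=(1, 2), dim=(3, 2), rot=(2, 2), vel=(2, 2))

# Assumed KITTI variant: the same heads without the velocity branch.
heads_without_velocity = {
    name: spec for name, spec in heads_with_velocity.items() if name != 'vel'
}


def regression_channels(common_heads):
    """Sum the output channels over all regression heads."""
    return sum(channels for channels, _ in common_heads.values())


print(regression_channels(heads_with_velocity))     # 10
print(regression_channels(heads_without_velocity))  # 8
```

Any per-channel settings tied to that count, such as `code_weights` in the training config, would likely need to shrink to match, and the target-building code in the head would need to skip the velocity terms as well.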

Tai-Wang added the usage label May 29, 2022

Tai-Wang closed this as completed Jun 5, 2022
