ValueError when training CenterPoint on Kitti Dataset #1497

Closed
ASarrouj opened this issue May 21, 2022 · 1 comment

@ASarrouj

Describe the bug
When running tools/train.py, training fails while unpacking values from a tensor that is smaller than expected. Specifically, the code expects each 3D bounding box to contain 9 values, but the boxes contain only 7. Printing the tensor yields tensor([27.3678, -2.0306, 0.2165, 1.2211, 0.6933, 1.9455, 0.2747], device='cuda:0'). The error occurs after training has started, i.e., during the first loss calculation.
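The failure can be reproduced in isolation. The sketch below uses a plain Python list in place of the CUDA tensor (an assumption made only to keep it dependency-free; the slicing and unpacking behave the same way here). It also assumes the usual reading of the 7 values as x, y, z, three box dimensions, and yaw, with the two missing entries being the velocity components vx and vy.

```python
# Values copied from the report; a plain list stands in for the CUDA tensor.
box = [27.3678, -2.0306, 0.2165, 1.2211, 0.6933, 1.9455, 0.2747]

# Indices 7 and 8 (assumed to be vx, vy) do not exist in a 7-value box,
# so the slice is empty rather than raising an IndexError.
velocity = box[7:]
print(len(box), velocity)

try:
    vx, vy = velocity  # same unpacking pattern as in the traceback
except ValueError as err:
    message = str(err)
    print(message)
```

The slice itself succeeds silently, which is why the error only surfaces at the unpacking step.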

Reproduction

  1. What command or script did you run?
python tools/train.py configs/centerpoint/custom_kitti.py
  2. Did you make any modifications on the code or config? Did you understand what you have modified?
    • I implemented the config listed in "Issue implementing Centerpoint on KITTI dataset" (#871) with minor modifications, namely lowering samples_per_gpu and increasing the gpu_ids range. I believe I understand the two fields I modified, but not much else.
  3. What dataset did you use?
    • KITTI, using the link in the data_preparation.md doc
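For context, the two fields mentioned above are set in the MMDetection-style Python config. The fragment below is only a sketch of what such overrides might look like. The values are illustrative assumptions, since the report does not state them; they were picked to match the two GPUs listed in the environment section.

```python
# Illustrative values only; the report does not give the actual numbers.
data = dict(
    samples_per_gpu=2,  # lowered per-GPU batch size (assumed value)
)

# Two GPUs with IDs 0 and 1 (assumed, based on "GPU 0,1" in the environment).
gpu_ids = range(0, 2)
```

Neither field changes the number of values per bounding box, so these modifications are unlikely to be the cause of the error.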

Environment

  1. Please run python mmdet3d/utils/collect_env.py to collect necessary environment information and paste it here.
  2. You may add additional information that may be helpful for locating the problem, such as
    • How you installed PyTorch [e.g., pip, conda, source] (Conda)
    • Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)
sys.platform: linux
Python: 3.7.13 (default, Mar 29 2022, 02:18:16) [GCC 7.5.0]
CUDA available: True
GPU 0,1: Tesla V100S-PCIE-32GB
CUDA_HOME: /apps/cuda/11.1.1
NVCC: Cuda compilation tools, release 11.1, V11.1.105
GCC: gcc (GCC) 10.3.0
PyTorch: 1.10.1+cu111
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - CUDA Runtime 11.1
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  - CuDNN 8.0.5
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 

TorchVision: 0.11.2+cu111
OpenCV: 4.5.5
MMCV: 1.5.1
MMCV Compiler: GCC 8.4
MMCV CUDA Compiler: 11.1
MMDetection: 2.23.0
MMSegmentation: 0.24.1
MMDetection3D: 1.0.0rc2+76e351a
spconv2.0: False

Error traceback
Traceback (most recent call last):
  File "tools/train.py", line 263, in <module>
    main()
  File "tools/train.py", line 259, in main
    meta=meta)
  File "PATHTO/mmdetection3d/mmdet3d/apis/train.py", line 351, in train_model
    meta=meta)
  File "PATHTO/mmdetection3d/mmdet3d/apis/train.py", line 319, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "PATHTO/mmcv/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "PATHTO/mmcv/mmcv/runner/epoch_based_runner.py", line 50, in train
    self.run_iter(data_batch, train_mode=True, **kwargs)
  File "PATHTO/mmcv/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
    **kwargs)
  File "PATHTO/mmcv/mmcv/parallel/data_parallel.py", line 75, in train_step
    return self.module.train_step(*inputs[0], **kwargs[0])
  File "PATHTO/.conda/envs/open-mmlab/lib/python3.7/site-packages/mmdet/models/detectors/base.py", line 248, in train_step
    losses = self(**data)
  File "PATHTO/.conda/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "PATHTO/mmcv/mmcv/runner/fp16_utils.py", line 110, in new_func
    return old_func(*args, **kwargs)
  File "PATHTO/mmdetection3d/mmdet3d/models/detectors/base.py", line 60, in forward
    return self.forward_train(**kwargs)
  File "PATHTO/mmdetection3d/mmdet3d/models/detectors/mvx_two_stage.py", line 279, in forward_train
    gt_bboxes_ignore)
  File "PATHTO/mmdetection3d/mmdet3d/models/detectors/centerpoint.py", line 73, in forward_pts_train
    losses = self.pts_bbox_head.loss(*loss_inputs)
  File "PATHTO/mmcv/mmcv/runner/fp16_utils.py", line 198, in new_func
    return old_func(*args, **kwargs)
  File "PATHTO/mmdetection3d/mmdet3d/models/dense_heads/centerpoint_head.py", line 586, in loss
    gt_bboxes_3d, gt_labels_3d)
  File "PATHTO/mmdetection3d/mmdet3d/models/dense_heads/centerpoint_head.py", line 418, in get_targets
    self.get_targets_single, gt_bboxes_3d, gt_labels_3d)
  File "PATHTO/.conda/envs/open-mmlab/lib/python3.7/site-packages/mmdet/core/utils/misc.py", line 30, in multi_apply
    return tuple(map(list, zip(*map_results)))
  File "PATHTO/mmdetection3d/mmdet3d/models/dense_heads/centerpoint_head.py", line 552, in get_targets_single
    vx, vy = task_boxes[idx][k][7:]
ValueError: not enough values to unpack (expected 2, got 0)
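The last frame shows the velocity being unpacked unconditionally from index 7 onward. One possible way to guard that step is sketched below. The helper `unpack_velocity` is hypothetical and is not part of MMDetection3D; the box layouts (7 values without velocity, 9 with vx and vy appended) are assumptions based on the report.

```python
from typing import Optional, Sequence, Tuple


def unpack_velocity(box: Sequence[float]) -> Optional[Tuple[float, float]]:
    """Hypothetical helper: return (vx, vy) if the box carries velocity, else None."""
    if len(box) >= 9:
        vx, vy = box[7:9]
        return vx, vy
    return None


# 7-value box from the report (assumed layout: x, y, z, 3 dimensions, yaw).
kitti_box = [27.3678, -2.0306, 0.2165, 1.2211, 0.6933, 1.9455, 0.2747]
# Made-up 9-value box with velocity appended, for comparison.
box_with_velocity = kitti_box + [0.5, -0.1]

print(unpack_velocity(kitti_box))
print(unpack_velocity(box_with_velocity))
```

A guard like this would only avoid the crash. The regression targets built from the box, and any velocity branch in the head, would presumably need matching changes so that target and prediction sizes still agree.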

@Tai-Wang
Member

The current version of CenterPoint may need some modifications to be compatible with KITTI training. You can refer to this PR for more details and for others' experience with it.

2 participants