
I got a problem when I use the KITTI dataset to train the D2Det-mmdet2.1 model #4700

Closed
Machine97 opened this issue Mar 2, 2021 · 9 comments

@Machine97

I tried to train the D2Det-mmdet2.1 model on the KITTI dataset, and the following error occurs every time:

2021-03-01 09:19:43,961 - mmdet - INFO - Epoch [1][440/1856] lr: 5.067e-06, eta: 8:32:39, time: 0.400, data_time: 0.100, memory: 4264, loss_rpn_cls: 0.3128, loss_rpn_bbox: 0.2060, loss_cls: 0.2960, acc: 96.9043, loss_reg: 0.2682, loss_mask: 0.6795, loss: 1.7624
2021-03-01 09:19:47,923 - mmdet - INFO - Epoch [1][450/1856] lr: 5.177e-06, eta: 8:32:01, time: 0.396, data_time: 0.095, memory: 4264, loss_rpn_cls: 0.3051, loss_rpn_bbox: 0.1585, loss_cls: 0.2783, acc: 96.7188, loss_reg: 0.2841, loss_mask: 0.6796, loss: 1.7056
2021-03-01 09:19:51,800 - mmdet - INFO - Epoch [1][460/1856] lr: 5.286e-06, eta: 8:31:11, time: 0.388, data_time: 0.087, memory: 4264, loss_rpn_cls: 0.2838, loss_rpn_bbox: 0.1638, loss_cls: 0.2632, acc: 96.6406, loss_reg: 0.2799, loss_mask: 0.6799, loss: 1.6707
Traceback (most recent call last):
File "train.py", line 161, in
main()
File "train.py", line 157, in main
meta=meta)
File "/work_dirs/D2Det_mmdet2.1/mmdet/apis/train.py", line 179, in train_detector
runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 122, in run
epoch_runner(data_loaders[i], **kwargs)
File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 43, in train
self.call_hook('after_train_iter')
File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/base_runner.py", line 282, in call_hook
getattr(hook, fn_name)(self)
File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/hooks/optimizer.py", line 21, in after_train_iter
runner.outputs['loss'].backward()
File "/opt/conda/lib/python3.7/site-packages/torch/tensor.py", line 198, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/opt/conda/lib/python3.7/site-packages/torch/autograd/__init__.py", line 100, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: shape mismatch: value tensor of shape [8, 256, 7, 7] cannot be broadcast to indexing result of shape [9, 256, 7, 7] (make_index_put_iterator at /opt/conda/conda-bld/pytorch_1587428398394/work/aten/src/ATen/native/TensorAdvancedIndexing.cpp:215)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x4e (0x7f43abb90b5e in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: at::native::index_put_impl(at::Tensor&, c10::ArrayRef<at::Tensor>, at::Tensor const&, bool, bool) + 0x712 (0x7f43d38d0b82 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #2: + 0xee23de (0x7f43d3c543de in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #3: at::native::index_put_(at::Tensor&, c10::ArrayRef<at::Tensor>, at::Tensor const&, bool) + 0x135 (0x7f43d38c0255 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #4: + 0xee210e (0x7f43d3c5410e in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #5: + 0x288fa88 (0x7f43d5601a88 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #6: torch::autograd::generated::IndexPutBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) + 0x251 (0x7f43d53cc201 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #7: + 0x2ae8215 (0x7f43d585a215 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #8: torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&) + 0x16f3 (0x7f43d5857513 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #9: torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&, bool) + 0x3d2 (0x7f43d58582f2 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #10: torch::autograd::Engine::thread_init(int) + 0x39 (0x7f43d5850969 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #11: torch::autograd::python::PythonEngine::thread_init(int) + 0x38 (0x7f43d8b97558 in
/opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #12: + 0xc819d (0x7f43db5ff19d in /opt/conda/lib/python3.7/site-packages/torch/lib/../../../.././libstdc++.so.6)
frame #13: + 0x76db (0x7f43fbfdf6db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #14: clone + 0x3f (0x7f43fbd0888f in /lib/x86_64-linux-gnu/libc.so.6)

This error occurs randomly at different iterations. In addition, every time the error occurred, the first dimension of the tensor [8, 256, 7, 7] was different.
Do you know the possible reasons for this error?
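
For reference, this message comes from PyTorch's advanced-indexing assignment (index_put), which is what the IndexPutBackward frames above refer to. A minimal sketch of what the message means (illustrative only, not D2Det code; the sizes are made up):

import torch

# Hypothetical illustration only, not code from D2Det: the mask selects 9
# RoI feature maps, but only 8 replacement values are supplied, so the
# assignment cannot broadcast [8, 256, 7, 7] to the indexing result of
# shape [9, 256, 7, 7].
feats = torch.zeros(16, 256, 7, 7)
mask = torch.zeros(16, dtype=torch.bool)
mask[:9] = True                      # 9 positions selected
values = torch.randn(8, 256, 7, 7)   # only 8 values provided
feats[mask] = values                 # RuntimeError: shape mismatch ...

So somewhere the number of RoI feature maps selected by an index and the number of values written into them disagree; the traceback only shows where the gradient of such an assignment is computed.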

@hhaAndroid
Collaborator

Sorry, I also don't know why 'shape mismatch: value tensor of shape [8, 256, 7, 7] cannot be broadcast to indexing result of shape [9, 256, 7, 7]' appears. Can you provide more information using the issue template?

@Machine97
Author

I think this problem appears during back propagation. Every time the error is reported, the last three dimensions of the tensor are always 256, 7 and 7, but the first dimension (8 here) changes. I don't know whether that number represents the number of RoI feature maps.

@Machine97
Author

> Sorry, I also don't know why 'shape mismatch: value tensor of shape [8, 256, 7, 7] cannot be broadcast to indexing result of shape [9, 256, 7, 7]' appears. Can you provide more information using the issue template?

Thank you for your reply. I think this problem appears during back propagation. Every time the error is reported, the last three dimensions of the tensor are always 256, 7 and 7, but the first dimension (8 here) changes. I don't know whether that number represents the number of RoI feature maps.

@hhaAndroid
Collaborator

> Sorry, I also don't know why 'shape mismatch: value tensor of shape [8, 256, 7, 7] cannot be broadcast to indexing result of shape [9, 256, 7, 7]' appears. Can you provide more information using the issue template?
>
> Thank you for your reply. I think this problem appears during back propagation. Every time the error is reported, the last three dimensions of the tensor are always 256, 7 and 7, but the first dimension (8 here) changes. I don't know whether that number represents the number of RoI feature maps.

I think so.

@Machine97
Author

> Sorry, I also don't know why 'shape mismatch: value tensor of shape [8, 256, 7, 7] cannot be broadcast to indexing result of shape [9, 256, 7, 7]' appears. Can you provide more information using the issue template?
>
> Thank you for your reply. I think this problem appears during back propagation. Every time the error is reported, the last three dimensions of the tensor are always 256, 7 and 7, but the first dimension (8 here) changes. I don't know whether that number represents the number of RoI feature maps.
>
> I think so.

Could you please give me some suggestions to solve this problem?

@hhaAndroid
Collaborator

Can you provide more information using the error report template? @Machine97

@Machine97
Author

@hhaAndroid What I've shown is the whole error report. The environment information and config are as follows:

/opt/conda/lib/python3.7/site-packages/mmcv/utils/registry.py:64: UserWarning: The old API of register_module(module, force=False) is deprecated and will be removed, please use the new API register_module(name=None, force=False, module=None) instead.
'The old API of register_module(module, force=False) '
2021-03-01 09:16:33,083 - mmdet - INFO - Environment info:

sys.platform: linux
Python: 3.7.7 (default, Mar 23 2020, 22:36:06) [GCC 7.3.0]
CUDA available: True
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.1, V10.1.243
GPU 0,1,2,3: GeForce RTX 2080 Ti
GCC: gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
PyTorch: 1.5.0
PyTorch compiling details: PyTorch built with:

  • GCC 7.3
  • C++ Version: 201402
  • Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 10.1
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  • CuDNN 7.6.3
  • Magma 2.5.2
  • Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_INTERNAL_THREADPOOL_IMPL -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

TorchVision: 0.6.0a0+82fd1c8
OpenCV: 4.5.1
MMCV: 0.6.1
MMDetection: 2.1.0+unknown
MMDetection Compiler: GCC 7.4
MMDetection CUDA Compiler: 10.1

2021-03-01 09:16:33,083 - mmdet - INFO - Distributed training: False
2021-03-01 09:16:33,867 - mmdet - INFO - Config:
dataset_type = 'KittiDataset'
data_root = '/dataset/kitti/'
class_names = ['Pedestrian', 'Cyclist', 'Car']
img_norm_cfg = dict(
    mean=[103.53, 116.28, 123.675], std=[1.0, 1.0, 1.0], to_rgb=True)
input_modality = dict(use_lidar=True, use_camera=True)
voxel_size = [0.1, 0.1, 0.1]
point_cloud_range = [0, -40, -3, 70, 40, 1]
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(
        type='Resize',
        img_scale=(1242, 375),
        multiscale_mode='value',
        keep_ratio=False),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(
        type='Normalize',
        mean=[103.53, 116.28, 123.675],
        std=[1.0, 1.0, 1.0],
        to_rgb=True),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='CollectKitti', keys=['img', 'gt_bboxes', 'gt_labels'])
]
test_pipeline = [
    dict(type='LoadPointsFromFile', coord_type='LIDAR', load_dim=4, use_dim=4),
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1242, 375),
        flip=False,
        transforms=[
            dict(
                type='Resize',
                img_scale=(1242, 375),
                multiscale_mode='value',
                keep_ratio=False),
            dict(type='RandomFlip'),
            dict(type='PointsRange2DFilter', target_size=[384, 1248]),
            dict(
                type='Normalize',
                mean=[103.53, 116.28, 123.675],
                std=[1.0, 1.0, 1.0],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='PointsToTensor', keys=['points']),
            dict(type='CollectKitti', keys=['points', 'img'])
        ])
]
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=0,
    train=dict(
        type='KittiDataset',
        data_root='/dataset/kitti/',
        ann_file='/dataset/kitti/kitti_infos_train.pkl',
        split='training',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', with_bbox=True),
            dict(
                type='Resize',
                img_scale=(1242, 375),
                multiscale_mode='value',
                keep_ratio=False),
            dict(type='RandomFlip', flip_ratio=0.5),
            dict(
                type='Normalize',
                mean=[103.53, 116.28, 123.675],
                std=[1.0, 1.0, 1.0],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='CollectKitti', keys=['img', 'gt_bboxes', 'gt_labels'])
        ],
        modality=dict(use_lidar=True, use_camera=True),
        classes=['Pedestrian', 'Cyclist', 'Car'],
        test_mode=False),
    val=dict(
        type='KittiDataset',
        data_root='/dataset/kitti/',
        ann_file='/dataset/kitti/kitti_infos_val.pkl',
        split='training',
        pipeline=[
            dict(
                type='LoadPointsFromFile',
                coord_type='LIDAR',
                load_dim=4,
                use_dim=4),
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1242, 375),
                flip=False,
                transforms=[
                    dict(
                        type='Resize',
                        img_scale=(1242, 375),
                        multiscale_mode='value',
                        keep_ratio=False),
                    dict(type='RandomFlip'),
                    dict(type='PointsRange2DFilter', target_size=[384, 1248]),
                    dict(
                        type='Normalize',
                        mean=[103.53, 116.28, 123.675],
                        std=[1.0, 1.0, 1.0],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='PointsToTensor', keys=['points']),
                    dict(type='CollectKitti', keys=['points', 'img'])
                ])
        ],
        modality=dict(use_lidar=True, use_camera=True),
        classes=['Pedestrian', 'Cyclist', 'Car'],
        test_mode=True),
    test=dict(
        type='KittiDataset',
        data_root='/dataset/kitti/',
        ann_file='/dataset/kitti/kitti_infos_val.pkl',
        split='training',
        pipeline=[
            dict(
                type='LoadPointsFromFile',
                coord_type='LIDAR',
                load_dim=4,
                use_dim=4),
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1242, 375),
                flip=False,
                transforms=[
                    dict(
                        type='Resize',
                        img_scale=(1242, 375),
                        multiscale_mode='value',
                        keep_ratio=False),
                    dict(type='RandomFlip'),
                    dict(type='PointsRange2DFilter', target_size=[384, 1248]),
                    dict(
                        type='Normalize',
                        mean=[103.53, 116.28, 123.675],
                        std=[1.0, 1.0, 1.0],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='PointsToTensor', keys=['points']),
                    dict(type='CollectKitti', keys=['points', 'img'])
                ])
        ],
        modality=dict(use_lidar=True, use_camera=True),
        classes=['Pedestrian', 'Cyclist', 'Car'],
        test_mode=True))
model = dict(
    type='D2Det',
    pretrained='torchvision://resnet50',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch'),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],
            ratios=[0.5, 1.0, 2.0],
            strides=[4, 8, 16, 32, 64]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[0.0, 0.0, 0.0, 0.0],
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(
            type='SmoothL1Loss', beta=0.1111111111111111, loss_weight=1.0)),
    roi_head=dict(
        type='D2DetRoIHead',
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(
                type='DeformRoIPoolingPack',
                out_size=7,
                sample_per_part=2,
                out_channels=256,
                no_trans=False,
                group_size=1,
                trans_std=0.1),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=dict(
            type='Shared2FCBBoxHead',
            with_reg=False,
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=80,
            bbox_coder=dict(
                type='DeltaXYWHBBoxCoder',
                target_means=[0.0, 0.0, 0.0, 0.0],
                target_stds=[0.1, 0.1, 0.2, 0.2]),
            reg_class_agnostic=False,
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=2.0)),
        reg_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', out_size=14, sample_num=2),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        d2det_head=dict(
            type='D2DetHead',
            num_convs=1,
            in_channels=256,
            num_classes=80,
            norm_cfg=dict(type='GN', num_groups=36),
            MASK_ON=False)))
train_cfg = dict(
    rpn=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.7,
            neg_iou_thr=0.3,
            min_pos_iou=0.3,
            ignore_iof_thr=-1),
        sampler=dict(
            type='RandomSampler',
            num=256,
            pos_fraction=0.5,
            neg_pos_ub=-1,
            add_gt_as_proposals=False),
        allowed_border=0,
        pos_weight=-1,
        debug=False),
    rpn_proposal=dict(
        nms_across_levels=False,
        nms_pre=2000,
        nms_post=2000,
        max_num=2000,
        nms_thr=0.7,
        min_bbox_size=0),
    rcnn=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.5,
            neg_iou_thr=0.5,
            min_pos_iou=0.5,
            ignore_iof_thr=-1),
        sampler=dict(
            type='RandomSampler',
            num=512,
            pos_fraction=0.25,
            neg_pos_ub=-1,
            add_gt_as_proposals=True),
        pos_radius=1,
        pos_weight=-1,
        max_num_reg=192,
        mask_size=28,
        debug=False))
test_cfg = dict(
    rpn=dict(
        nms_across_levels=False,
        nms_pre=1000,
        nms_post=1000,
        max_num=1000,
        nms_thr=0.7,
        min_bbox_size=0),
    rcnn=dict(
        score_thr=0.03, nms=dict(type='nms', iou_thr=0.5), max_per_img=125))
optimizer = dict(type='SGD', lr=2e-05, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=1800,
    warmup_ratio=0.0125,
    step=[20, 23])
checkpoint_config = dict(interval=1)
log_config = dict(interval=10, hooks=[dict(type='TextLoggerHook')])
evaluation = dict(interval=1)
total_epochs = 40
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = '/work_dirs/D2Det_mmdet2.1/work_dir/'
load_from = None
resume_from = None
workflow = [('train', 1)]
gpu_ids = [0]

2021-03-01 09:16:34,738 - mmdet - INFO - load model from: torchvision://resnet50
2021-03-01 09:16:35,128 - mmdet - WARNING - The model and loaded state dict do not match exactly

unexpected key in source state_dict: fc.weight, fc.bias

/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py:168: UserWarning: Runner was deprecated, please use EpochBasedRunner instead
'Runner was deprecated, please use EpochBasedRunner instead')
/opt/conda/lib/python3.7/site-packages/mmcv/runner/base_runner.py:59: UserWarning: batch_processor is deprecated, please implement train_step() and val_step() in the model instead.
warnings.warn('batch_processor is deprecated, please implement '
2021-03-01 09:16:40,538 - mmdet - INFO - Start running, host: root@6176961423d3, work_dir: /work_dirs/D2Det_mmdet2.1/work_dir
2021-03-01 09:16:40,538 - mmdet - INFO - workflow: [('train', 1)], max: 40 epochs
2021-03-01 09:16:47,118 - mmdet - INFO - Epoch [1][10/1856] lr: 3.487e-07, eta: 13:30:46, time: 0.655, data_time: 0.278, memory: 4260, loss_rpn_cls: 1.1617, loss_rpn_bbox: 0.8447, loss_cls: 48.0927, acc: 42.9688, loss_reg: 1.0057, loss_mask: 0.6919, loss: 51.7967
2021-03-01 09:16:51,577 - mmdet - INFO - Epoch [1][20/1856] lr: 4.585e-07, eta: 11:21:03, time: 0.446, data_time: 0.133, memory: 4262, loss_rpn_cls: 0.9803, loss_rpn_bbox: 0.9297, loss_cls: 1.3802, acc: 98.9551, loss_reg: 1.0286, loss_mask: 0.6909, loss: 5.0097
2021-03-01 09:16:55,695 - mmdet - INFO - Epoch [1][30/1856] lr: 5.682e-07, eta: 10:23:49, time: 0.412, data_time: 0.100, memory: 4262, loss_rpn_cls: 0.7760, loss_rpn_bbox: 0.6778, loss_cls: 1.3409, acc: 99.1309, loss_reg: 1.0046, loss_mask: 0.6911, loss: 4.4905
...

The config for testing is not used, so you don't have to pay attention to it. Thank you very much!

@Machine97
Author

@hhaAndroid When the above error occurs, the batch size is 2. When I set the batch size to 1, the following error also occurs in addition to the one above:

Traceback (most recent call last):
File "/work_dirs/D2Det_mmdet2.1/tools/train.py", line 161, in
main()
File "/work_dirs/D2Det_mmdet2.1/tools/train.py", line 157, in main
meta=meta)
File "/work_dirs/D2Det_mmdet2.1/mmdet/apis/train.py", line 179, in train_detector
runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 122, in run
epoch_runner(data_loaders[i], **kwargs)
File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 43, in train
self.call_hook('after_train_iter')
File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/base_runner.py", line 282, in call_hook
getattr(hook, fn_name)(self)
File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/hooks/optimizer.py", line 21, in after_train_iter
runner.outputs['loss'].backward()
File "/opt/conda/lib/python3.7/site-packages/torch/tensor.py", line 198, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/opt/conda/lib/python3.7/site-packages/torch/autograd/init.py", line 100, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: Function IndexPutBackward returned an invalid gradient at index 1 - got [353, 256, 7, 7] but expected shape compatible with [355, 256, 7, 7] (validate_outputs at /opt/conda/conda-bld/pytorch_1587428398394/work/torch/csrc/autograd/engine.cpp:472)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x4e (0x7fc24d54fb5e in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: + 0x2ae3134 (0x7fc277214134 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #2: torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&) + 0x548 (0x7fc277215368 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #3: torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&, bool) + 0x3d2 (0x7fc2772172f2 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #4: torch::autograd::Engine::thread_init(int) + 0x39 (0x7fc27720f969 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #5: torch::autograd::python::PythonEngine::thread_init(int) + 0x38 (0x7fc27a556558 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #6: + 0xc819d (0x7fc28fae019d in /opt/conda/bin/../lib/libstdc++.so.6)
frame #7: + 0x76db (0x7fc29e1746db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #8: clone + 0x3f (0x7fc29de9d88f in /lib/x86_64-linux-gnu/libc.so.6)

Process finished with exit code 1

@Machine97
Author

@hhaAndroid The problem has been solved. The reason is that during preprocessing of the KITTI dataset, classes other than Car, Pedestrian, Cyclist and DontCare are marked with -1. In mmdetection 2.1 there is no error reminder such as "assertion `cur_target >= 0 && cur_target < n_classes` failed", so the invalid labels go unnoticed until training fails.
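
In case it helps anyone else, a small sanity check at the point where the KITTI annotations are converted to gt_bboxes/gt_labels can catch this early. This is only a sketch of the idea, not code from D2Det or mmdetection; filter_kitti_annotations and its arguments are hypothetical names for wherever that conversion happens in the dataset class:

import numpy as np

VALID_CLASSES = ['Pedestrian', 'Cyclist', 'Car']

def filter_kitti_annotations(gt_bboxes, gt_labels):
    """Drop boxes whose label is -1, i.e. classes outside VALID_CLASSES.

    gt_bboxes: (N, 4) float array; gt_labels: (N,) int array in which
    classes other than Pedestrian/Cyclist/Car were mapped to -1 during
    preprocessing (hypothetical setup mirroring the situation above).
    """
    keep = gt_labels >= 0
    gt_bboxes = gt_bboxes[keep]
    gt_labels = gt_labels[keep]
    # Fail fast here instead of hitting a shape-mismatch error in backward().
    assert (gt_labels >= 0).all() and (gt_labels < len(VALID_CLASSES)).all()
    return gt_bboxes, gt_labels

# Example: two valid boxes and one box labelled -1 that gets dropped.
boxes = np.array([[0, 0, 10, 10], [5, 5, 20, 20], [1, 1, 3, 3]], dtype=np.float32)
labels = np.array([2, 0, -1])
boxes, labels = filter_kitti_annotations(boxes, labels)

Filtering out (or asserting on) the -1 labels before they reach the sampler avoids the confusing failure inside loss.backward().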
