I got a problem when using the KITTI dataset to train the D2Det-mmdet2.1 model #4700
Comments
Sorry, I also don't know why 'shape mismatch: value tensor of shape [8, 256, 7, 7] cannot be broadcast to indexing result of shape [9, 256, 7, 7]' appears. Can you provide more information using the issue template?
Thank you for your reply. I think this problem appears during back-propagation. Every time the error is reported, the last three dimensions of the tensor are always 256, 7 and 7, but the first dimension (8 here) changes. I don't know whether that number represents the number of RoI feature maps.
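For reference, the same kind of error can be reproduced outside D2Det with a plain indexed assignment whose value tensor has fewer rows than the indexing result selects. This is only an illustrative sketch with made-up tensor names, not the actual code path inside D2Det:

```python
import torch

# Illustration only: the index selects 9 RoIs, but only 8 value rows are
# provided, which raises the same "shape mismatch ... cannot be broadcast
# to indexing result" RuntimeError seen in the log.
roi_feats = torch.zeros(16, 256, 7, 7)
keep_inds = torch.arange(9)             # indexing result: 9 RoIs
new_vals = torch.randn(8, 256, 7, 7)    # value tensor: only 8 RoIs
roi_feats[keep_inds] = new_vals         # RuntimeError: shape mismatch
```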
I think so.
Could you please give me some suggestions to solve this problem?
Can you provide more information from the error report? @Machine97
@hhaAndroid What I've shown is the whole error report. The rest of the environment information and config are as follows: /opt/conda/lib/python3.7/site-packages/mmcv/utils/registry.py:64: UserWarning: The old API of
@hhaAndroid When the above error occurs, the batch size is 2. Once I set the batch size to 1, the following error also occurs in addition to the one above: Traceback (most recent call last): Process finished with exit code 1
@hhaAndroid The problem has been solved. The reason is that during processing of the KITTI dataset, classes other than Car, Pedestrian, Cyclist and DontCare are marked with -1. In mmdetection version 2.1 there is no error reminder such as 'Assertion `cur_target >= 0 && cur_target < n_classes` failed'.
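For anyone hitting the same issue: the fix amounts to dropping the KITTI categories that are not trained on during annotation conversion instead of writing them out with label -1. Below is a minimal sketch of such a filter; the helper name `convert_frame` and the annotation keys are illustrative, not mmdetection or D2Det API:

```python
# Illustrative filter for a KITTI -> mmdetection-style conversion script.
# Only Car, Pedestrian and Cyclist are kept; every other class (Van,
# Truck, Misc, DontCare, ...) is skipped instead of being stored as -1.
KITTI_CLASSES = ('Car', 'Pedestrian', 'Cyclist')
NAME2LABEL = {name: i for i, name in enumerate(KITTI_CLASSES)}

def convert_frame(objects):
    """`objects` is a list of dicts with 'name' and 'bbox' keys (assumed)."""
    bboxes, labels = [], []
    for obj in objects:
        label = NAME2LABEL.get(obj['name'], -1)
        if label < 0:        # unknown class: drop it, do not emit -1
            continue
        bboxes.append(obj['bbox'])
        labels.append(label)
    return dict(bboxes=bboxes, labels=labels)
```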
I tried to train D2Det-mmdet2.1 on the KITTI dataset, and the following error occurs every time:
2021-03-01 09:19:43,961 - mmdet - INFO - Epoch [1][440/1856] lr: 5.067e-06, eta: 8:32:39, time: 0.400, data_time: 0.100, memory: 4264, loss_rpn_cls: 0.3128, loss_rpn_bbox: 0.2060, loss_cls: 0.2960, acc: 96.9043, loss_reg: 0.2682, loss_mask: 0.6795, loss: 1.7624
2021-03-01 09:19:47,923 - mmdet - INFO - Epoch [1][450/1856] lr: 5.177e-06, eta: 8:32:01, time: 0.396, data_time: 0.095, memory: 4264, loss_rpn_cls: 0.3051, loss_rpn_bbox: 0.1585, loss_cls: 0.2783, acc: 96.7188, loss_reg: 0.2841, loss_mask: 0.6796, loss: 1.7056
2021-03-01 09:19:51,800 - mmdet - INFO - Epoch [1][460/1856] lr: 5.286e-06, eta: 8:31:11, time: 0.388, data_time: 0.087, memory: 4264, loss_rpn_cls: 0.2838, loss_rpn_bbox: 0.1638, loss_cls: 0.2632, acc: 96.6406, loss_reg: 0.2799, loss_mask: 0.6799, loss: 1.6707
Traceback (most recent call last):
File "train.py", line 161, in <module>
main()
File "train.py", line 157, in main
meta=meta)
File "/work_dirs/D2Det_mmdet2.1/mmdet/apis/train.py", line 179, in train_detector
runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 122, in run
epoch_runner(data_loaders[i], **kwargs)
File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 43, in train
self.call_hook('after_train_iter')
File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/base_runner.py", line 282, in call_hook
getattr(hook, fn_name)(self)
File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/hooks/optimizer.py", line 21, in after_train_iter
runner.outputs['loss'].backward()
File "/opt/conda/lib/python3.7/site-packages/torch/tensor.py", line 198, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/opt/conda/lib/python3.7/site-packages/torch/autograd/__init__.py", line 100, in backward
allow_unreachable=True)  # allow_unreachable flag
RuntimeError: shape mismatch: value tensor of shape [8, 256, 7, 7] cannot be broadcast to indexing result of shape [9, 256, 7, 7] (make_index_put_iterator at /opt/conda/conda-bld/pytorch_1587428398394/work/aten/src/ATen/native/TensorAdvancedIndexing.cpp:215)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x4e (0x7f43abb90b5e in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: at::native::index_put_impl(at::Tensor&, c10::ArrayRef<at::Tensor>, at::Tensor const&, bool, bool) + 0x712 (0x7f43d38d0b82 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #2: + 0xee23de (0x7f43d3c543de in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #3: at::native::index_put_(at::Tensor&, c10::ArrayRef<at::Tensor>, at::Tensor const&, bool) + 0x135 (0x7f43d38c0255 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #4: + 0xee210e (0x7f43d3c5410e in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #5: + 0x288fa88 (0x7f43d5601a88 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #6: torch::autograd::generated::IndexPutBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) + 0x251 (0x7f43d53cc201 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #7: + 0x2ae8215 (0x7f43d585a215 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #8: torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node, torch::autograd::InputBuffer&) + 0x16f3 (0x7f43d5857513 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #9: torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&, bool) + 0x3d2 (0x7f43d58582f2 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #10: torch::autograd::Engine::thread_init(int) + 0x39 (0x7f43d5850969 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #11: torch::autograd::python::PythonEngine::thread_init(int) + 0x38 (0x7f43d8b97558 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #12: + 0xc819d (0x7f43db5ff19d in /opt/conda/lib/python3.7/site-packages/torch/lib/../../../.././libstdc++.so.6)
frame #13: + 0x76db (0x7f43fbfdf6db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #14: clone + 0x3f (0x7f43fbd0888f in /lib/x86_64-linux-gnu/libc.so.6)
This error occurs randomly in different iterations. In addition, every time the error occurs, the first dimension of the [8, 256, 7, 7] tensor is different.
Do you know the possible reasons for this error?
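One way to catch this kind of labeling problem before training is a quick scan over the converted annotations. This is only a hedged sketch: the file name 'kitti_train.pkl' and the 'ann'/'labels'/'filename' keys are placeholders for whatever middle-format annotation file the conversion script actually produced.

```python
import pickle

NUM_CLASSES = 3  # Car, Pedestrian, Cyclist (assumed training classes)

# Load the converted annotations (hypothetical file name and layout).
with open('kitti_train.pkl', 'rb') as f:
    annotations = pickle.load(f)

# Fail fast if any ground-truth label is negative or out of range,
# instead of hitting a shape mismatch deep inside backward().
for item in annotations:
    for label in item['ann']['labels']:
        assert 0 <= label < NUM_CLASSES, (
            f"invalid label {label} in {item.get('filename', '<unknown>')}")
print('all ground-truth labels are in range')
```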